July 23, 2020

Billionaire Tech Entrepreneur Tom Siebel Built A Massive Compendium Of Covid-19 Datasets. Some 2,000 Researchers Now Use It.

Billionaire tech entrepreneur Tom Siebel struck gold with Siebel Systems, which he sold to Oracle in 2006, and is trying again with artificial intelligence firm C3.ai, valued at $3.3 billion. But as the pandemic hit, business slowed, and he spent weeks immersed in figuring out how to use data to help Covid-19 researchers. He set up a so-called “data lake” of Covid-19 information that researchers could access in one place for free, culled from Johns Hopkins, the World Health Organization, the Institute for Health Metrics and Evaluation, the Covid Tracking Project and dozens of other organizations.

All told, he says, some 2,000 active users from around the world are now working with this compendium of datasets to research the course of the disease and ways to mitigate it. Among the users, he says, are researchers at the National Institutes of Health, MIT and various pharmaceutical companies.

“What’s difficult about these data sets is making all the connections. All of these data sets are extraordinarily large with tens of thousands of fields, and hundreds of millions of records. In order to make them useful for analytics you need to connect issues like co-morbidity and infection rates,” he says. “The number of things we have connected is mind-numbing.”

Siebel, 67, is in a unique position to create a compendium of datasets. He spent more than a decade and, he says, nearly $1 billion building the technology underlying C3.ai, which offers predictive analytics to customers that include 3M, Royal Dutch Shell and the U.S. Air Force. His Redwood City, California-based business has grown rapidly, passing $160 million in revenue for the fiscal year that ended in April.

C3.ai cleaned up the datasets using the automated tools it developed for its corporate customers, so that researchers could access data that is structured, machine-readable and free of anomalies. The effort began with 11 datasets, published in April, and had expanded to 32 by June. Siebel says he intends to keep adding new datasets to the data lake, which is hosted on AWS.

“This is a natural application of AI,” Siebel says. “There are a lot of applications of AI that we both know are a little scary and onerous, and this is one that is potentially enormously socially beneficial.”

The data effort is one of two Covid-19 projects that Siebel launched this spring. The other, called the C3.ai Digital Transformation Institute, is giving away, in partnership with Microsoft, more than $300 million in grants and in-kind resources to data-driven Covid-19 research projects. The University of California, Berkeley, and the University of Illinois at Urbana-Champaign are managing that consortium, which has funded 26 projects to date.

“We’re doing our best to help advance the underlying science that will make this problem go away,” Siebel says. “Until we make this problem go away, I don’t think we’re going to get this economy back on its feet.”
