If AI is to help turn back the COVID-19 tide in a timeframe that saves lives and livelihoods, data scientists will need to build machine learning models faster than usual and run them on an AI platform that scales to the global pandemic’s enormous complexities. Even more, vast amounts of Coronavirus-related data will be needed so models can be trained to produce valid epidemiological and treatment research results. And given the time pressures of this crisis, data scientists must be freed from the shackles of data wrangling and cleansing, which can consume 80 percent of their time.
But does such an AI development environment exist? And where can the data, in a cleansed and usable format, be found? Thomas M. Siebel, Chairman and CEO of C3.ai, says his company has both, and he’s giving them away for free.
C3.ai, which boasts some of the world’s largest at-scale enterprise AI implementations (Con Edison, Enel, U.S. Air Force, Royal Dutch Shell), is leading a two-pronged effort to fight COVID-19. On the technology side, the C3.ai Digital Transformation Institute, a consortium of research universities, supercomputing centers and Microsoft Azure, will pursue research projects using the C3 AI Suite and its model-driven architecture designed for enterprise-scale AI applications; Siebel told us the results will be released into the public domain. On the data side, the company on April 13 will release, for public use, the first of two tranches of an aggregated COVID-19 Data Lake drawn from more than 20 information sources, which will have been unified and federated using the C3 AI Suite’s data wrangling capabilities and made ready for use by data scientists.
The effort leverages experience gained by Siebel and his team not only at C3.ai but also from their years running Siebel Systems, a creator of CRM software formed in 1993 that became a $2 billion company and merged with Oracle in 2006. The platform that evolved into the C3 AI Suite, funded by Siebel when he formed the company in 2009, was eight years in development before becoming commercially available. C3.ai now comprises more than 500 employees and grew by roughly 100 percent last year, Siebel said.
“Some of these projects we’re getting involved in, like building these large-scale discrete event simulations, taking a massive amount of data, and predicting what it will look like in seven days, OK, that’s a hard process,” Siebel said. “So the amount of data that you need to be able to aggregate, synthesize, and process – the number of CPU cycles that you need, to be able to do that with acceptable levels of precision – this is a computationally extraordinarily extensive problem. In terms of scaling, it’s mind numbing. When we start mapping these large genome sequence databases, you’re going to run into scaling issues on data size and data processing capability, people are going to spool up, they’re going to build machine learning models, that will take tens of thousands of virtual machines operating in parallel process.”
The most compute-intensive workloads will run at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, on its Blue Waters system, and on the Perlmutter supercomputer, due for completion by spring 2021, at Lawrence Berkeley National Laboratory’s National Energy Research Scientific Computing Center.
The C3.ai-led effort is one of many resource sharing, crowdsourcing efforts formed to combat COVID-19. Data analytics and business intelligence specialist Tibco has released its COVID-19 Visual Analysis Hub, a site for using the company’s Spotfire analytics software to track the pandemic’s spread and impact based on data from the Center for Systems Science and Engineering at Johns Hopkins University (also used in C3.ai’s COVID-19 Data Lake) and other sources.
Among other efforts that emerged this week, Domino Data Lab, provider of an open data science platform for large enterprises, announced complimentary access to its Domino data science platform for organizations advancing collective understanding of COVID-19. Data scientists at WellAI released a software application for COVID-19 researchers based on machine learning algorithms that read and summarize vast amounts of medical literature, available at https://wellai.health. From China, Huawei Cloud announced, as part of its Anti-COVID-19 Partner Program, free access to cloud and AI services, including its EIHealth service, which offers viral genome detection, antiviral drug in silico screening and AI-assisted CT patient screening, as well as free cloud resources worth up to $30,000 (US). (For other COVID-19-related analytics and data sources available at no cost, see “COVID-19 Spurs Offers for Free Software, Data, and Training” at sister publication Datanami.)
At C3.ai, Siebel said the C3.ai Digital Transformation Institute (C3DTI.ai) has issued its first call for COVID-19 research proposals dealing with such challenges as slowing the pandemic’s spread, speeding development of medical treatments, and designing and repurposing drugs or clinical trials. C3.ai DTI will initially fund 26 research projects annually. C3.ai will provide more than $57 million in cash over five years along with $310 million in the form of in-kind contributions from C3.ai and its C3 AI Suite and Microsoft Azure cloud resources. Winning proposals will be selected by June 1, Siebel said.
He’s optimistic the project work, when released publicly, will quickly be accepted and adopted because “it’s been blessed by Berkeley, Princeton, Carnegie Mellon so, I mean, the National Institutes of Health and the CDC are going to like it.”
While compute and machine learning resources are valuable to projects of this type, Siebel said they’re not the most valuable.
“When you’re dealing with AI at research institutions,” he said, “the scarcest resource isn’t computing capacity going into bioinformatics, and it’s not human capital. It’s availability of real data. So these data scientists and researchers, because they do not have access to large public health databases, due to HIPAA regulations and what have you, they’re forced to synthesize data.”
The first tranche of the COVID-19 Data Lake will be released next week, and the second will be released in May. The open data sets will be accessible at https://c3.ai/covid via utilities that support access through a RESTful API using common tools such as Python, R, Ex Machina, and Microsoft Power BI. C3.ai said researchers and developers are invited to help expand the data lake by enhancing its functionality, developing analytics and predictive models and contributing additional data sets through a crowdsourcing model.
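For readers wondering what access "through a RESTful API" from Python might look like, here is a minimal sketch using only the standard library. The article gives only the landing page https://c3.ai/covid, so everything specific below is an assumption made for illustration: the base URL, the `location` type name, the `fetch` path, the `spec` request wrapper and the `objs` response key are hypothetical placeholders, not the documented C3.ai interface.

```python
import json
import urllib.request

# Hypothetical placeholder; the real base URL is not given in the article.
BASE_URL = "https://example.invalid/covid/api"


def build_fetch_request(type_name, spec):
    """Build (but do not send) a JSON POST request for one data type.

    The "{type}/fetch" path and the {"spec": ...} body are assumed shapes.
    """
    body = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{type_name}/fetch",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def parse_records(raw_bytes):
    """Decode a JSON response body and return its list of records.

    Assumes records arrive under an "objs" key; returns [] if it is absent.
    """
    payload = json.loads(raw_bytes.decode("utf-8"))
    return payload.get("objs", [])


def fetch(type_name, spec, timeout=30):
    """Send the request and return the parsed records (needs a real endpoint)."""
    request = build_fetch_request(type_name, spec)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_records(response.read())
```

With a real endpoint substituted for the placeholder, a call such as `fetch("location", {"limit": 5})` would return a list of dictionaries that can be handed to an analysis library. Keeping request construction separate from the network call lets the request shape be inspected without contacting any server.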
“We started working with NIH, the CDC and all of these research institutions to basically aggregate the largest unified, federated data image that consists of all the data that we’re able to find on COVID-19,” said Siebel, adding that C3.ai partnered with Amazon Web Services on this aspect of the Coronavirus project. “And by a unified aggregate image…, it’s not simply that all of these data are in one place, they’re in one place and fully connected. This is an extremely large dataset where we’ve connected the articles on the disease to the patient who has the disease to the CT scan that indicates the disease. All of these pointers are there in a unified data image that we can navigate using a knowledge graph…and perform data science.”
Siebel said 50 C3.ai employees have been assigned to COVID-19 project work.
“In many ways, I think this crisis is a test,” Siebel said. “It’s a test of us as individuals and how we behave. It’s a test of the strength of our social fabric and how well it holds up under crisis, (because) it might get pretty tense out there in the next month. It’s a test of the strength of our government institutions, and at a less significant level, it’s going to be a test of the resilience of corporate leaders.”
“And you know, if we have some small impact at the edge of this crisis, I’ll be honest with you, if this is all the company ever accomplishes, I’ll be happy. If this is all we accomplish in the history of this company, that we make a positive contribution to this COVID-19 problem, I’ll feel the last 10 years will have been successful.”