C3.ai COVID-19 Data Lake Doubles in Size – Now One of the World’s Largest Unified Sources of COVID-19 Data
Early Adopters of Data Lake Include Researchers from Top Universities, Leading Hospitals, and Government Agencies Working to Mitigate the Spread of the Pandemic
Redwood City, CA—(BUSINESS WIRE)— May 15, 2020 – C3.ai, a leading enterprise AI software provider for accelerating digital transformation, today announced the addition of 11 new integrated COVID-19 data sets to the C3.ai™ COVID-19 Data Lake, making it one of the largest pre-integrated and free sources of COVID-19 data in the world. C3.ai COVID-19 Data Lake offers researchers access to normalized, unified data to accelerate efforts in the fight against COVID-19.
Researchers are in a race to predict the virus’ trajectory, forecast demand for ICU bed capacity, analyze the efficacy of COVID-19 guidelines, support COVID-19 diagnosis, and speed the development of medical treatments. The challenge is that most data are dispersed in a variety of different locations and in unusable formats. Absent rich, integrated data sets, it is impossible to develop meaningful and accurate artificial intelligence models.
The C3.ai COVID-19 Data Lake – developed by C3.ai in three weeks and accessible at https://c3.ai/covid – is a unified source of comprehensive, integrated COVID-19 data that C3.ai has made publicly available, at no cost, to global research and scientific communities. The data lake is unique and structurally different from other COVID-19 data collections in that it provides analysis-ready data that researchers can use immediately to enhance new or ongoing COVID projects.
Early Adopters Fast Track COVID-19 Projects
Researchers and data scientists from top universities, leading hospitals, and government agencies are among the early adopters using the C3.ai COVID-19 Data Lake to support a variety of efforts, including:
- Supply chain analysis at Massachusetts Institute of Technology (MIT) Humanitarian Supply Chain Lab, MIT Center for Transportation & Logistics: Researchers at MIT, in collaboration with the Federal Emergency Management Agency (FEMA) and other agencies, are focused on the analysis of critical supply chain issues to understand the distribution and availability of COVID-19 testing equipment and personal protective equipment (PPE) – and the pandemic’s impact on freight flows throughout the country.
“Having access to an integrated set of diverse COVID-19 data sources with a common data model can help accelerate analysis of critical supply chain issues in our work with FEMA and other agencies,” said Tim Russell, Research Engineer at the MIT Humanitarian Supply Chain Lab, MIT Center for Transportation & Logistics. “The C3.ai COVID-19 Data Lake provides a valuable resource in unifying and simplifying access to the necessary data without having to waste time on finding, cleaning, and preparing the data for analysis.”
- COVID-19 search engine affiliated with Lawrence Berkeley National Laboratory (Berkeley Lab): A team of materials scientists at Berkeley Lab has launched a COVID-19 publications search engine that synthesizes hundreds of scientific papers every day for information extraction using text mining algorithms and natural language processing. Berkeley Lab scientists used the C3.ai COVID-19 Data Lake to incorporate Milken Institute data on therapeutics.
- Media portrayal of COVID-19 in the U.S. at Arizona State University (ASU): Researchers at Arizona State want to understand the social psychology behind people’s responses to the pandemic based on media portrayal of COVID-19. Specifically, they will be evaluating the impact of news and social media posts on the population’s compliance with local mandates over time.
- Pandemic strategies and response scenarios at a government agency: Data scientists are developing pandemic strategies, response scenarios, and risk assessments by building predictive models that will validate other publicly available models.
“With the addition of these 11 important data sets, we are proud to continue enhancing the scope and exponentially increasing the value of the C3.ai COVID-19 Data Lake as a no-cost resource for the global research community,” said Thomas M. Siebel, CEO of C3.ai. “We are excited by the enthusiastic response among researchers and we are confident that their creativity, innovation, and imaginative use of this resource will yield significant results toward mitigating this and future pandemics.”
C3.ai also is encouraging researchers to recommend data sources they would like to see added to the C3.ai COVID-19 Data Lake for future research. For example, a physician from a leading hospital has requested C3.ai add all U.S. vaccination data to the data lake to study the impact of previous vaccinations on the rate of hospitalizations and infections. Additionally, researchers affiliated with a leading university have requested C3.ai populate de-identified patient data into the data lake to improve an app that informs users with pre-existing conditions of COVID-related morbidity risks.
C3.ai COVID-19 Data Lake data sources currently include:
- Johns Hopkins University: COVID-19 Data Repository
- The COVID Tracking Project
- World Health Organization: Daily Situation Reports
- The New York Times: COVID-19 Data in the United States
- European Centre for Disease Prevention and Control: Worldwide Situation Updates
- University of Washington – Institute for Health Metrics and Evaluation: COVID-19 Projections
- Data Science for COVID-19: South Korea Dataset
- Dipartimento della Protezione Civile – Emergenza Coronavirus
- COVID-19 India
- nCoV-2019 Data Working Group: Epidemiology Data
- MOBS Lab: COVID-19 Situation Report
- National Center for Biotechnology Information Virus Database
- Allen Institute for AI: COVID-19 Open Research Dataset (CORD-19)
- Milken Institute COVID-19 Treatment and Vaccine Tracker
- World Health Organization COVID-19 R&D
- University of Montreal: COVID-19 Image Data Collection
- Carbon Health & Braid Health: COVID-19 Clinical Data Repository
- Definitive Healthcare – Hospital Beds
- Kaiser Family Foundation: Social Distancing Policies
- Apple: COVID-19 Mobility Trends
- US Census Bureau: Demographic & Housing Estimates
- The World Bank – Global Health Statistics
The C3.ai COVID-19 Data Lake unifies data sources into a single, federated cloud image that is updated in real time and has pre-established linkages, so researchers can easily navigate and explore all of the associations within and across the data sources through a knowledge graph. Researchers can then apply advanced data science methods against the corpus of all COVID-19 data.
By unifying the data sources, the C3.ai COVID-19 Data Lake helps researchers generate insights faster and more easily than is possible with other data collections. Researchers can access the C3.ai COVID-19 Data Lake through a RESTful API from any utility that supports it, including common tools such as Python, R, Ex Machina, and Microsoft Power BI. C3.ai intends to release future data sets bi-weekly.
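As a rough illustration of what access through a RESTful API from Python might look like, the sketch below builds a JSON POST request using only the standard library. The base URL, the `outbreaklocation` type name, the `/fetch` path, and the `spec` payload shape are assumptions made for illustration and are not details given in this announcement; the actual endpoints are described in the documentation at https://c3.ai/covid.

```python
import json
import urllib.request

# Assumption: placeholder base URL, not the real service address.
BASE_URL = "https://example.invalid/covid/api/1"


def build_fetch_request(type_name, spec):
    """Build (but do not send) a JSON POST request for one data type.

    `type_name` and the {"spec": ...} wrapper are hypothetical names
    chosen for this sketch.
    """
    body = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{type_name}/fetch",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Prepare a request for the first five records of a hypothetical type.
request = build_fetch_request("outbreaklocation", {"limit": 5})
```

Calling `send(request)` would transmit the request once `BASE_URL` points at a real endpoint. Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.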
Amazon Web Services (AWS) is co-sponsor of the open data initiative and is providing cloud infrastructure services in support of this initiative.
This news follows the COVID-19 Data Lake announcement on April 22, 2020 and the March 26, 2020 launch of C3.ai Digital Transformation Institute (C3.ai DTI), a research consortium dedicated to accelerating the application of artificial intelligence to speed the pace of digital transformation in business, government, and society.
For additional information about the C3.ai COVID-19 Data Lake, please visit: https://c3.ai/covid
To learn more about the C3.ai DTI program, award opportunities, and the first call for research proposals, focusing on AI techniques to mitigate COVID-19 and future pandemics, please visit C3DTI.ai.
C3.ai is a leading AI software provider for accelerating digital transformation. C3.ai delivers the C3 AI Suite for developing, deploying, and operating large-scale AI, predictive analytics, and IoT applications in addition to an increasingly broad portfolio of turn-key AI applications. The core of the C3.ai offering is a revolutionary, model-driven AI architecture that dramatically enhances data science and application development.