Accelerate Research with Unified, Integrated COVID-19 Data

The COVID-19 Data Lake uniquely integrates multiple data sources in a unified data model, ready for analysis – not just a list of links or a collection of data sets. Stop wasting time wrangling data and focus instead on generating insights. Access data at no charge with any utility that supports RESTful APIs.

The COVID-19 Data Lake pre-establishes the important linkages among disparate COVID-19 data sets sourced from around the globe, so that researchers can easily navigate and explore data features of interest (e.g., diagnosis, age, locale, preexisting condition) and perform sophisticated data science on those data.

How to Access the COVID-19 Data Lake

Get started by downloading the R and Python quickstart notebooks and accessing the documentation for the COVID-19 RESTful APIs.

Get Access

Data Sources

Daily Case Reports
Epidemiology Line Lists


Unified data image

Get COVID-19 data in a form that is more easily accessible and useful for applying advanced analytics and artificial intelligence.

Available at no charge

Access, use, and share multiple important COVID-19 data sets for research purposes at no cost.

Single, secure cloud image

Access data from a private, highly scalable, distributed cloud infrastructure.

Accessible through RESTful API

Access COVID-19 data via any utility that supports RESTful APIs (e.g., common tools such as Python, R, or Microsoft Power BI).
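As a rough illustration of what RESTful access from Python can look like, the sketch below builds a JSON POST request using only the standard library. It is not the official client: the base URL, the `dailycasereport` type name, the `fetch` action, and the `{"spec": ...}` request body are all hypothetical placeholders, so consult the API documentation for the real endpoints and payload shape.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the base URL given in the API documentation.
BASE_URL = "https://example.invalid/api/1"


def build_request(type_name, action, spec):
    """Build (but do not send) a JSON POST request for a REST endpoint.

    The URL layout and the {"spec": ...} wrapper are assumptions made
    for illustration only.
    """
    url = f"{BASE_URL}/{type_name}/{action}"
    body = json.dumps({"spec": spec}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical type and action names; nothing is sent until send() is called.
request = build_request("dailycasereport", "fetch", {"limit": 5})
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made; once the real base URL and type names are filled in, `send(request)` would return the decoded JSON response.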

Expandable via crowdsourcing

Help expand the scale of the COVID-19 Data Lake and enhance its functionality by contributing additional open data sets through a crowdsourcing model.

Available to users of C3 AI Suite

Access the COVID-19 Data Lake through the C3 AI Suite, in addition to public access through RESTful APIs.

Data sets from global sources

Leverage COVID-19 data sourced from curated global sources and unified into an interconnected data image.

Continuously updated data

Enable researchers to work with the latest data sets from all sources.



“I was expecting something similar to many of the other COVID data resources out there. Instead of getting a list of URLs or folders full of CSVs, this data lake provides a comprehensive interconnected data model. Previously disconnected data sources can easily be integrated – including time series data – with a single, simple API request. Navigating, pulling, and aggregating data like country statistics, patient diagnosis, age, and even time-based location movements can be done in minutes instead of days or weeks. This is incredibly valuable when time is critical and breakthroughs can mean thousands of lives saved. This is why I created an open-source Python connector for the data lake to provide even more access to this information.”

Connor Makowski

Project Manager, MIT Computational and Visual Education (CAVE) Lab

“Our goal is to make it easier for researchers, data scientists, and developers to build, train, and run custom machine learning models on massive amounts of COVID-19 data for greater and faster insights. The COVID-19 Data Lake has the potential to globally impact research efforts and speed breakthroughs to come.”

Mike Clayville

Vice President-Worldwide Commercial Sales and BD, AWS


Contribute Data

One of the primary objectives of the COVID-19 Data Lake is to continuously expand the corpus of the data lake, both in size and in conceptual scope. Please contact us if you: 1) are able to directly contribute specific data sets related to COVID-19, or 2) have a specific request for data sets you want to see included.

Your Feedback is Important

  • Please contribute your questions, answers, and insights to the COVID-19 Data Lake community on Stack Overflow (make sure to tag c3ai-datalake and mention that you are using the COVID-19 Data Lake).
  • For support, please send email to:


Access unified, analysis-ready COVID-19 data, at no charge