How Big Data is Helping in Big COVID-19 Pandemic Situation


Photo by Clay Banks on Unsplash

Health and technology are inseparable: technology makes healthcare more accessible, while breakthroughs in science and health are a core reason technology keeps improving. Today, as the world struggles with the novel coronavirus (COVID-19) pandemic, technology has never felt so important and game-changing, from bringing education to the comfort of a mobile device to the far more complex process of contact tracing.

New cases of COVID-19 continue to grow at alarming rates worldwide: at the time of writing, more than 28 million people have been infected and more than 905,000 have died. At the core of containing the pandemic is big data, the health data gathered from these cases, which has become a valuable source of information that governments and health organizations analyze to improve their response.

What is big data and how is it helping?

Photo by Franki Chamaki on Unsplash

Big data refers to the technologies used to store, process, and analyze volumes of information too large for traditional software techniques to handle. In the health sector, big data for the coronavirus includes digitally stored patient data. Combined with artificial intelligence (AI), computational analysis of this data can reveal patterns, trends, correlations, and discrepancies, and may yield insights into how the virus spreads and how it can be controlled. All of this data feeds research into the virus itself, as well as efforts to tackle it and its after-effects.

With comprehensive data capture, big data can be used to reduce the risk of transmitting the virus. Such a system stores data on every COVID-19 case, whether infected, recovered, or deceased. This data can then be used to classify cases and to help allocate resources for better public health security. Several data modalities, including patient location, proximity to others, patient-reported travel, patient physiology, comorbidities, and current symptoms, can be digitized and used to produce actionable insights at both the demographic and community levels.
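To make the idea of classifying cases and allocating resources concrete, here is a minimal Python sketch. It assumes a hypothetical list of (region, status) records with made-up entries and a simple proportional rule for splitting hospital beds; real systems use far richer schemas and allocation policies.

```python
from collections import Counter

# Hypothetical, made-up case records: (region, status).
# Real datasets have their own schemas and many more fields.
cases = [
    ("Region A", "infected"),
    ("Region A", "recovered"),
    ("Region A", "infected"),
    ("Region B", "deceased"),
    ("Region B", "infected"),
]


def active_cases_by_region(records):
    """Count records whose status is 'infected', grouped by region."""
    return dict(Counter(region for region, status in records if status == "infected"))


def allocate_beds(active_by_region, total_beds):
    """Split beds in proportion to active cases.

    Floor division keeps counts whole; any remainder is left unassigned.
    """
    total_active = sum(active_by_region.values())
    if total_active == 0:
        return {region: 0 for region in active_by_region}
    return {
        region: total_beds * count // total_active
        for region, count in active_by_region.items()
    }


active = active_cases_by_region(cases)
print(active)                     # {'Region A': 2, 'Region B': 1}
print(allocate_beds(active, 30))  # {'Region A': 20, 'Region B': 10}
```

Proportional splitting is only one possible policy; a real allocation would also weigh factors such as comorbidities and existing hospital capacity.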

Leveraging public datasets

Run a quick search for publicly available COVID-19 datasets and you will come across thousands of them, continuously updated and analyzed to improve how nations and the health industry respond to the pandemic.

Here are some of the datasets you can access to understand the scope and reach of big data:


The COVID-19 Data Lake contains COVID-19 related datasets from various sources, covering testing and patient outcome tracking data, social distancing policy, hospital capacity, mobility, etc.

  • Bing COVID-19 Data: Bing COVID-19 data includes confirmed, fatal, and recovered cases from all regions, updated daily.

  • COVID Tracking Project: The COVID Tracking Project dataset provides the latest numbers on tests, confirmed cases, hospitalizations, and patient outcomes from every US state and territory.

  • European Centre for Disease Prevention and Control (ECDC) COVID-19 Cases: The latest available public data on the geographic distribution of COVID-19 cases worldwide from the ECDC. Each row contains the number of new cases reported per day and per country or region.

  • Oxford COVID-19 Government Response Tracker: The Oxford COVID-19 Government Response Tracker (OxCGRT) dataset contains systematic information on which governments have taken which measures, and when.

  • COVID-19 Open Research Dataset Challenge (CORD-19): CORD-19 is a resource of over 200,000 scholarly articles, including over 100,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

  • Coronavirus Genome Sequence: Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China.

  • Coronavirus (COVID-19) Tweets: This dataset contains the tweets of users who applied the following hashtags: #coronavirus, #coronavirusoutbreak, #coronavirusPandemic, #covid19, #covid_19. From about 17 March, the dataset also included the additional hashtags #epitwitter and #ihavecorona.
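Datasets laid out like the ECDC one, with one row per country per day, are easy to aggregate. The sketch below assumes hypothetical column names (date, country, new_cases) and made-up numbers; the real files use their own column names, so check the schema before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in a "one row per country per day" layout.
# Column names are hypothetical; the real ECDC file uses its own.
SAMPLE_CSV = """date,country,new_cases
2020-09-02,Country X,12
2020-09-01,Country X,10
2020-09-01,Country Y,5
2020-09-02,Country Y,7
"""


def total_cases_by_country(csv_text):
    """Sum new_cases over all days for each country."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["country"]] += int(row["new_cases"])
    return dict(totals)


def cumulative_series(csv_text, country):
    """Return [(date, cumulative_cases), ...] for one country, in date order."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["country"] == country]
    rows.sort(key=lambda r: r["date"])  # ISO-8601 date strings sort chronologically
    running, series = 0, []
    for r in rows:
        running += int(r["new_cases"])
        series.append((r["date"], running))
    return series


print(total_cases_by_country(SAMPLE_CSV))          # {'Country X': 22, 'Country Y': 12}
print(cumulative_series(SAMPLE_CSV, "Country X"))  # [('2020-09-01', 10), ('2020-09-02', 22)]
```

For the full-size files, a dataframe library would be the more practical choice; the standard-library version just keeps the example self-contained.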

COVID data lake on CloudFront

This contains numerous time-series datasets reporting daily cases by country. There is also a Full Dashboard that visualizes this data.
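Daily counts in time-series datasets like these are noisy (for example, reporting can dip on weekends), so dashboards commonly smooth them with a trailing 7-day average. A minimal sketch, assuming the daily counts for one country are already in a Python list; the numbers below are made up.

```python
def rolling_average(daily_counts, window=7):
    """Trailing moving average: one value per complete window of days."""
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    return [
        sum(daily_counts[end - window + 1 : end + 1]) / window
        for end in range(window - 1, len(daily_counts))
    ]


# Made-up daily new-case counts for eight consecutive days.
daily = [100, 120, 90, 110, 130, 80, 70, 140]
print(rolling_average(daily))  # first value is 700 / 7 = 100.0
```

Days before the first complete window are simply omitted rather than averaged over fewer points.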


A community-curated collection of COVID-19 dataset sources from Reddit (Spreadsheets and Datasets):

Other good sources:

These publicly available COVID-19 datasets are a valuable resource for the public, doctors, other healthcare professionals, and researchers tracking the virus and analyzing how it infects people. Now let us look at how these datasets support ongoing COVID-19 research and analysis.

Identification of infected cases

Some publicly available datasets, such as the one published by Microsoft, provide information on infected cases by region. Combined with patients’ medical histories, this kind of big data helps identify infected cases and supports further risk-level analysis.

Travel history

One of the first indicators of possible infection is an individual’s travel history. From the moment you book a ticket, your data is stored with the airline, with taxi aggregators, and in mandatory government apps such as Aarogya Setu, which identify whether you are coming from a contaminated zone. Big datasets such as these store people’s travel history for risk analysis and help identify individuals who may have been in contact with an infected person.
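As an illustration of travel-history risk analysis, the sketch below flags a traveler who visited a hotspot zone within an assumed 14-day look-back window, then lists everyone who shared a booking with them. The zone names, the window, and the booking structure are all hypothetical; this is not how Aarogya Setu or any airline actually implements it.

```python
from datetime import date, timedelta

# Hypothetical inputs: zone names, the look-back window, and booking IDs are made up.
HOTSPOT_ZONES = {"Zone 1", "Zone 3"}
LOOKBACK = timedelta(days=14)  # assumed window for "recent" travel


def is_high_risk(trips, today, hotspots=HOTSPOT_ZONES, lookback=LOOKBACK):
    """trips: list of (zone, date_visited). True if any hotspot visit falls within the window."""
    return any(
        zone in hotspots and timedelta(0) <= today - visited <= lookback
        for zone, visited in trips
    )


def co_travelers(bookings, person):
    """bookings: dict of booking_id -> set of passenger IDs.

    Returns everyone who shared at least one booking with `person`.
    """
    contacts = set()
    for passengers in bookings.values():
        if person in passengers:
            contacts |= passengers
    contacts.discard(person)
    return contacts


trips = [("Zone 2", date(2020, 8, 20)), ("Zone 1", date(2020, 9, 1))]
bookings = {"FLIGHT-1": {"p1", "p2", "p3"}, "CAB-7": {"p1", "p4"}, "FLIGHT-2": {"p5"}}
if is_high_risk(trips, today=date(2020, 9, 10)):
    print(sorted(co_travelers(bookings, "p1")))  # ['p2', 'p3', 'p4']
```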

Fever symptoms

Most of us have an app on our phones, installed or built in, that keeps a record of our health. These apps, too, rely on big data to identify and record symptoms of possible illness. The associated datasets record a patient’s fever and other symptoms and help determine when medical treatment is required.
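A symptom-tracking app might apply a simple rule to a patient’s fever log. This sketch assumes a 38.0 °C fever threshold and a hypothetical rule that three consecutive days of fever warrant medical attention; both numbers are illustrative choices, not clinical guidance.

```python
FEVER_THRESHOLD_C = 38.0  # assumed fever threshold in degrees Celsius
PERSISTENT_DAYS = 3       # hypothetical rule: this many fever days in a row


def longest_fever_streak(daily_max_temps_c):
    """Length of the longest run of consecutive days at or above the threshold."""
    longest = current = 0
    for temp in daily_max_temps_c:
        current = current + 1 if temp >= FEVER_THRESHOLD_C else 0
        longest = max(longest, current)
    return longest


def needs_medical_attention(daily_max_temps_c):
    """Flag a patient whose fever has persisted for PERSISTENT_DAYS or more."""
    return longest_fever_streak(daily_max_temps_c) >= PERSISTENT_DAYS


# Made-up daily maximum temperatures logged by a patient.
log = [36.8, 38.2, 38.5, 38.1, 37.0]
print(needs_medical_attention(log))  # True: three fever days in a row
```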

Identification at an early stage

In a pandemic, time is of the utmost importance: accurate, timely identification can save millions of lives. Big data has made it possible for health authorities to move swiftly in identifying infected people at an early stage. For instance, if a patient logs symptoms associated with COVID-19 in his or her doctor’s appointment app, the infection can be identified more easily and quickly. Big data also helps examine and classify individuals who could become infected with the virus in the future.

In India’s fight against the pandemic, big data is at play in many places. For instance, some food delivery apps, such as Zomato and Swiggy, let you see the delivery person’s body temperature even before that large pizza you ordered reaches your doorstep. Meanwhile, the government-backed Aarogya Setu app tracks the movement of citizens and notifies individuals if they have come into contact with an infected person, and for how long. At the heart of all this is big data.
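The “for how long” part of a contact-tracing notification can be sketched as counting how many minutes a phone saw an infected person’s device nearby. Everything here is an assumption for illustration, not Aarogya Setu’s actual algorithm: the once-per-minute scan, the (minute, device_id) log format, and the 15-minute notification threshold are all hypothetical.

```python
from collections import defaultdict

SCAN_INTERVAL_MIN = 1      # assumption: nearby devices are logged once per minute
NOTIFY_THRESHOLD_MIN = 15  # assumed exposure needed before notifying


def exposure_minutes(sightings, infected_ids):
    """sightings: iterable of (minute, device_id) pairs logged by this phone.

    Returns {infected_device_id: minutes spent nearby}; duplicate logs of
    the same device in the same minute are counted once.
    """
    minutes_seen = defaultdict(set)
    for minute, device_id in sightings:
        if device_id in infected_ids:
            minutes_seen[device_id].add(minute)
    return {d: len(ms) * SCAN_INTERVAL_MIN for d, ms in minutes_seen.items()}


def should_notify(sightings, infected_ids, threshold=NOTIFY_THRESHOLD_MIN):
    """True if exposure to any single infected device reaches the threshold."""
    return any(m >= threshold for m in exposure_minutes(sightings, infected_ids).values())


# Made-up log: device "d1" nearby for 10 minutes, "d2" for 20 minutes.
log = [(t, "d1") for t in range(10)] + [(t, "d2") for t in range(20)]
print(exposure_minutes(log, infected_ids={"d2"}))  # {'d2': 20}
print(should_notify(log, infected_ids={"d2"}))     # True
```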

Big data analytics: a tool towards a healthier tomorrow

Image by Gerd Altmann from Pixabay

Big data analytics can serve as a tool for COVID-19 monitoring, control, study, and prevention, and it can broaden research and help improve vaccine development. With the help of data collection reinforced by AI, China suppressed COVID-19 and kept its spread rate low. The pandemic has many big data components in which AI plays an important role, from biomedical testing to mining the scientific literature, helping to speed up containment of the spread.

Access to public information has led many organizations to build big-data dashboards that track the virus continuously. Facial recognition and infrared temperature measurement systems have been deployed in all the leading cities. Here’s how China used big data, in seemingly Big Brother ways, to contain the virus.

  • Chinese AI companies such as Hanwang Technology and SenseTime claim to have developed a special face recognition technology that can accurately identify people even if they are wearing a mask.

  • Smartphone applications are also used to keep a watch on the movements of citizens and to determine if they have been in touch with an infected person or not.

  • Al Jazeera reported that China Mobile, a telecom provider, sent text messages to state-owned media agencies identifying those who were infected. The messages included the full travel histories of these citizens.

  • CCTV cameras are placed at several locations to ensure that quarantined individuals do not step out.

Big Data vs Privacy

At the root of any dataset is information collection, and this collection often sets privacy advocates on edge over infringements of citizens’ rights. However, it needs to be widely acknowledged and accepted that, in health, a lack of data could lead to outbreaks even bigger than COVID-19.

Even as critics object to data collection, big data is poised to play a crucial role in the coming years in analyzing global data on detected viruses, modeling disease, monitoring human activity, and visualizing the data. As more data piles up into massive datasets, data scientists will have a better chance of preventing such outbreaks altogether. Meanwhile, publicly available datasets ensure greater transparency and accessibility for all stakeholders, including the very public they are meant to benefit.
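“Modeling disease” often starts with a compartmental model such as SIR, which splits a population of size N into Susceptible (S), Infected (I), and Recovered (R) groups governed by dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI. The sketch below integrates these equations with simple Euler steps. The parameter values in the usage lines are illustrative assumptions, not estimates fitted to COVID-19 data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Euler integration of the SIR model.

    beta: transmission rate per day; gamma: recovery rate per day.
    dt should divide one day evenly. Returns one (S, I, R) tuple per day,
    starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters only: beta / gamma = 3 gives R0 = 3 in this model.
history = simulate_sir(population=1_000_000, initial_infected=10, beta=0.3, gamma=0.1, days=180)
peak_day, (_, peak_infected, _) = max(enumerate(history), key=lambda item: item[1][1])
print(f"Peak of about {peak_infected:,.0f} infected on day {peak_day}")
```

Euler steps are the simplest choice and are accurate enough here with a small dt; a production model would more likely use an adaptive ODE solver and parameters estimated from case data.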

Follow Me on LinkedIn & Twitter

If you are interested in similar content, do follow me on Twitter and LinkedIn.