Exploring the Data on Coronavirus Outbreaks

Intro to Coronaviruses

History

I was disappointed to find that the origin of the term “coronaviruses” has nothing to do with the beer; instead, it refers to the appearance of these viruses. The word “corona” is Latin for “crown” or “garland”, which is how coronaviruses look under an electron microscope. This does, however, explain why the Corona beer logo has a crown on top of it. Visually, though, the virus looks more like the sun’s corona, the outer part of the sun’s atmosphere. There are currently seven known human coronaviruses.

According to the CDC (2020), these are:

  • 229E
  • OC43
  • NL63
  • HKU1
  • SARS-CoV
  • MERS-CoV
  • 2019 Novel Coronavirus

The first human coronaviruses (229E and OC43) were discovered in the late 1960s (Geller, Varbanov, and Duval, 2012). It wasn’t until the 2003 SARS outbreak that the next one (SARS-CoV) was found. This was followed by NL63 in 2004 and HKU1 in 2005. Virologists then decided to take a seven-year break before discovering the next one (only kidding; there simply were no newly discovered ones during this time). The next coronavirus (MERS-CoV) was identified during the MERS outbreak in 2012. This brings us to the present, when the seventh coronavirus (2019 Novel Coronavirus) was discovered. I know “2019 Novel Coronavirus” doesn’t have a great ring to it, but to be fair, MERS-CoV was called “Novel Coronavirus 2012” before getting its final name.

The “Common” Coronaviruses

The first four of the seven viruses (229E, NL63, OC43, and HKU1) are often associated with the common cold (CDC, 2020), but they can also lead to more severe complications, including pneumonia. According to Gaunt, Hardie, Claas, Simmonds, and Templeton (2010), infections are much more common during the winter months, and cases drop to almost none during the summer. Gaunt et al. tested 11,661 sick people between 2006 and 2009 and found that 267 (2.3%) tested positive for at least one of these four coronaviruses.

The “Outbreak” Coronaviruses

The other three coronaviruses (MERS-CoV, SARS-CoV, and 2019-nCoV) are quite different. They are new viruses that emerge after “jumping” from animals to humans. These viruses are known to infect animals, but a random mutation allows them to suddenly start infecting humans (animal-to-human transmission). An outbreak occurs when an infected person is then able to infect other people (human-to-human transmission). These new viruses are more dangerous for a variety of reasons, one of which is that, because they are truly new, humans have no natural immunity to them. To me, this is what makes these viruses so scary. The rest of this post will focus on these “outbreak” coronaviruses.

Severe Acute Respiratory Syndrome (SARS-CoV)

The first case of SARS-CoV occurred in Guangdong, China, in November 2002 (WHO, 2013). However, China did not officially report the outbreak to the World Health Organization (WHO) until February 11, 2003 (WHO, 2003). The WHO issued a global alert on March 12 and reported 150 known cases on March 15 (WHO, 2003). The number of cases had increased to 1,622 by the end of March (WHO, 2003) and ballooned to 5,663 by the end of April (WHO, 2003). On May 22, 2003, the cumulative count crossed the 8,000 mark, and new cases had already started to taper off.

Finally, on July 5, 2003, the WHO declared that human-to-human transmission of SARS had been stopped and contained (WHO, 2003). The total came to 8,096 cases and 774 deaths in 17 countries, for a final mortality rate of 9.56% (774 / 8,096). It was eventually found that the disease originated in horseshoe bats but passed through civets (small mammals about the size of a house cat that look like a long, black-and-white cross between a fox and a raccoon) before making the jump to humans. It wasn’t until 2017 (14 years later) that Chinese scientists traced the likely source of SARS to a single cave in Yunnan Province, China (Hu et al., 2017). The map below contains the final infection and death counts for SARS-CoV by country. It is interactive: you can zoom in and out, and hovering over a country shows its counts.
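The mortality rate above is simply deaths divided by confirmed cases (often called the case fatality rate). As a minimal Python sketch of that arithmetic, using the final WHO totals quoted above (the function name is our own, not from any library):

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100 * deaths / cases


# Final WHO totals for SARS-CoV quoted in the text
sars_cfr = case_fatality_rate(deaths=774, cases=8096)
print(f"SARS-CoV case fatality rate: {sars_cfr:.2f}%")  # SARS-CoV case fatality rate: 9.56%
```

Note that this ratio only reflects confirmed cases; undetected mild infections would push the true rate lower.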

Middle East Respiratory Syndrome (MERS-CoV)

Unlike SARS-CoV, MERS-CoV is still active, with new infections and deaths still being reported. According to the latest update from WHO (2019), five new cases were reported during December 2019. The total number of confirmed cases since 2012 currently stands at 2,499, with 861 deaths, for a mortality rate of about 34.5%. MERS-CoV was first identified during an outbreak in 2012. As its name suggests, most of the infections have originated within and affected the Middle East, with 2,106 cases occurring in Saudi Arabia alone.
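The same kind of arithmetic summarizes the MERS figures: the mortality rate and the share of all cases that occurred in Saudi Arabia. A small sketch using only the WHO (2019) numbers quoted above; the dictionary keys are our own labels:

```python
# WHO (2019) figures for MERS-CoV quoted in the text
mers = {"cases": 2499, "deaths": 861, "saudi_arabia_cases": 2106}

# Deaths as a percentage of confirmed cases
cfr = 100 * mers["deaths"] / mers["cases"]

# Fraction of all confirmed cases that occurred in Saudi Arabia
saudi_share = 100 * mers["saudi_arabia_cases"] / mers["cases"]

print(f"Case fatality rate: {cfr:.2f}%")                    # Case fatality rate: 34.45%
print(f"Share of cases in Saudi Arabia: {saudi_share:.2f}%")  # Share of cases in Saudi Arabia: 84.27%
```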

It is suspected that dromedary camels are a natural reservoir for MERS-CoV, and at least one study has found genetically identical virus in a patient and his camels (Azhar et al., 2014). Reusken et al. (2013) found that 100% of Omani camels and 14% of Spanish camels in their study had antibodies against MERS-CoV, meaning they had been infected at some point in the past. I do want to emphasize that this link is not conclusive, and other animals (including bats) are also suspected carriers. This continued animal-to-human transmission is why MERS-CoV keeps infecting people, unlike SARS-CoV. It remains to be seen whether 2019-nCoV will be one-and-done like SARS or will remain an issue after it is contained, like MERS.

While researching MERS, I came across a really cool publication by Ramshaw et al. (2019). To better understand the geographic distribution of MERS, the authors created a database of geopositioned MERS-CoV occurrences. This makes it possible to map MERS infections far more precisely than the map above. Ramshaw et al. geolocated 882 occurrences, but I am going to focus on 402 of their cases, for two reasons:

  1. They are all human cases (233 were animal, 3 were environmental)
  2. They have complete longitude and latitude data (244 of the humans were missing this data)
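The selection above amounts to two filters plus some bookkeeping. As a sketch, the counts in the text reconcile as shown below. The record layout (`host`, `latitude`, `longitude` keys) and the sample coordinates are hypothetical stand-ins, not the actual columns or rows of the Ramshaw et al. database:

```python
from typing import Optional, TypedDict


class Occurrence(TypedDict):
    # Hypothetical record layout; the real database uses its own column names.
    host: str                   # "human", "animal", or "environmental"
    latitude: Optional[float]
    longitude: Optional[float]


def keep_for_map(occ: Occurrence) -> bool:
    """Keep human cases that have both coordinates."""
    return (
        occ["host"] == "human"
        and occ["latitude"] is not None
        and occ["longitude"] is not None
    )


# Reconciling the counts quoted in the text
total_occurrences = 882
human = total_occurrences - 233 - 3  # drop animal and environmental records
mapped = human - 244                 # drop human records missing coordinates
print(human, mapped)                 # 646 402

# Made-up records, only to show the filter in action
sample: list[Occurrence] = [
    {"host": "human", "latitude": 24.7, "longitude": 46.7},
    {"host": "animal", "latitude": 23.6, "longitude": 58.4},
    {"host": "human", "latitude": None, "longitude": None},
]
print(sum(keep_for_map(o) for o in sample))  # 1
```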

This accounts for 16% of the 2,499 total cases (402 / 2,499), but Ramshaw et al. do warn that some cases may be duplicates. The map below shows these cases, with a slider for each year. Hovering over a case shows its “patient type”, which describes how the patient was infected.

  • Patient Types:
    • Absent - Suspected case that was negative for MERS-CoV
    • Import - Virus brought from another country
    • Index - Animal-to-human infection
    • Secondary - Human-to-human infection
    • Unspecified
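The year slider and the hover labels boil down to grouping the mapped cases by year and decoding each patient type. A minimal sketch with made-up records; the `year` and `patient_type` field names and the helper functions are hypothetical, and only the type descriptions come from the list above:

```python
from collections import Counter

# Descriptions taken from the patient-type list above
PATIENT_TYPES = {
    "Absent": "Suspected case that was negative for MERS-CoV",
    "Import": "Virus brought from another country",
    "Index": "Animal-to-human infection",
    "Secondary": "Human-to-human infection",
    "Unspecified": "Unspecified",
}

# Made-up example cases (not from the dataset)
cases = [
    {"year": 2013, "patient_type": "Index"},
    {"year": 2013, "patient_type": "Secondary"},
    {"year": 2014, "patient_type": "Secondary"},
    {"year": 2014, "patient_type": "Import"},
    {"year": 2014, "patient_type": "Secondary"},
]


def counts_for_year(records, year):
    """Count patient types for one slider position (one year)."""
    return Counter(r["patient_type"] for r in records if r["year"] == year)


def hover_label(record):
    """Text a map tooltip might show for a single case."""
    ptype = record["patient_type"]
    return f"{ptype}: {PATIENT_TYPES.get(ptype, 'Unknown')}"


print(counts_for_year(cases, 2014))  # Counter({'Secondary': 2, 'Import': 1})
print(hover_label(cases[0]))         # Index: Animal-to-human infection
```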

Novel Coronavirus 2019 (2019-nCoV)

This finally brings us to the current coronavirus outbreak, 2019-nCoV. At the time of this writing, the outbreak is still ongoing, so please keep in mind that this information may change. I like to think of it as taking a picture: it’s a snapshot of the moment, but it does not tell the whole story.

I’m going to start with my favorite source on this topic: the WHO. On December 31, 2019, China informed the WHO about a cluster of 44 pneumonia cases of unknown cause in Wuhan City, Hubei Province (WHO, 2020). On January 7, 2020, a new coronavirus was identified as the cause. By the time the WHO published its first situation report on January 20, 2020, the number of cases had increased to 282, and the virus had already spread to Thailand, Japan, and South Korea. The next day, the count jumped again to 314 cases (WHO, 2020), and it has continued to increase rapidly since then.

I was able to find a great day-by-day time-series dataset on Kaggle. Its data come from a really cool Johns Hopkins University dashboard, which in turn draws on the CDC, WHO, and other health organizations. The data span January 22, 2020, through February 11, 2020. On January 21, 2020, there were 314 cases and 6 deaths, so the data begin after this point. The chart below shows the total infections by day.
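Because the chart shows total infections by day, turning it into new infections per day is a matter of differencing consecutive days. A standard-library sketch, assuming the series is stored as cumulative totals keyed by date; the function name is our own, and the input reuses the two WHO counts from the previous paragraph:

```python
from datetime import date


def daily_new_cases(cumulative: dict[date, int]) -> dict[date, int]:
    """Convert cumulative case totals into new cases per day.

    The first day has no previous total, so it is left out of the result.
    """
    days = sorted(cumulative)
    return {
        today: cumulative[today] - cumulative[yesterday]
        for yesterday, today in zip(days, days[1:])
    }


# WHO counts quoted in the text (January 20 and 21, 2020)
totals = {date(2020, 1, 20): 282, date(2020, 1, 21): 314}
print(daily_new_cases(totals))  # {datetime.date(2020, 1, 21): 32}
```

Sorting the dates first means the function still works if the rows arrive out of order.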

The virus started in Hubei Province, and the majority of cases are in that province (33,366 out of 45,117, or about 74%). However, the virus has since spread widely, and this dataset has 11,751 cases outside Hubei. The map below shows the location of those cases, with the Hubei cases removed. The size of each circle represents the number of cases (bigger circle = more cases), while the color represents the number of deaths. Clicking on the legend adds or removes circles based on the number of deaths. Hovering over a circle shows the number of infections, deaths, and recoveries for that area.
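Two bits of arithmetic sit behind this map: the Hubei share and the circle sizes. The sketch below checks the share from the counts in the text, then shows one common way to size circles: scaling the radius with the square root of the case count so that circle area, rather than radius, tracks cases. That scaling rule, the `max_radius` parameter, and the example regions and counts are assumptions for illustration, not necessarily what the original map does:

```python
import math

# Counts quoted in the text
total_cases = 45_117
hubei_cases = 33_366
outside_hubei = total_cases - hubei_cases
print(outside_hubei)                             # 11751
print(f"{100 * hubei_cases / total_cases:.1f}%")  # 74.0%


def circle_radius(cases: int, max_cases: int, max_radius: float = 30.0) -> float:
    """Radius such that circle area is proportional to the case count."""
    return max_radius * math.sqrt(cases / max_cases)


# Hypothetical regions with made-up counts (Hubei already excluded)
regions = {"Region A": 1200, "Region B": 300, "Region C": 75}
largest = max(regions.values())
for name, n in regions.items():
    print(name, circle_radius(n, largest))
# Region A 30.0
# Region B 15.0
# Region C 7.5
```

With square-root scaling, a region with a quarter of the cases gets half the radius, so the eye compares areas fairly instead of exaggerating large counts.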

The outbreak is still ongoing, and all the numbers have gone up since I last pulled the data. I plan on coming back to this and creating a new post with updated numbers and some new visuals (Coronaviruses part 2). As I said earlier, this is just a snapshot in time, but it will be interesting to see how well this post ages.

Sancho Sequeira
Senior Research Analyst

I am interested in data wrangling, statistics, automating workflows, and data visualization using R.