Why is Coronavirus Data Visualization So Bad?
As a former CDC outbreak investigator (in the Epidemic Intelligence Service), I’ve followed the current outbreak of novel coronavirus with interest, especially the data visualization.
If you’ve been following the outbreak, you have probably noticed the predominance of mapping in data visualization about the virus. Maps are great, of course, but I suspect that the huge explosion in GPS/GIS/mapping capabilities over the last decade has given us all a “hammer in search of a nail”: we reach for maps even when better dataviz tools are available.
How about this much-shared map from Johns Hopkins Center for Systems Science and Engineering (CSSE), which has been promoted by CNET, ZDNet, and the NY Times, among many others:
Or these from CNN and the NY Times:
Although these maps are promoted to help readers “track the outbreak”, they all illustrate why maps are not the primary method used by epidemiologists for tracking outbreaks:
Maps show a single point in time, not a progression
You can’t look at a snapshot and know what happened before that snapshot. Likewise, a single map just shows you “now”. To follow progression using maps you need to view multiple copies of the same map over time and ascertain the changes. The NYT/Hopkins maps, however, do not allow you to view changes over time.
Undaunted, I took to the Internet Archive to get the NYT map for the last six days — which required 30 minutes of compiling, cutting, and pasting. Here’s what I found:
The New York Times tells us that we can use this map to “track the spread of the outbreak”. Can we? I challenge you to look at those six days of maps and tell me what has changed in this outbreak.
But the outbreak has changed, and we’ll see that below.
Maps distort the data
As noted above, the maps make it look like the entire country of China is engulfed by nCoV, but there are only about 35k cases in a population of 1.4 billion. Only about 0.0025% of people in China have coronavirus!
Likewise the CNN map of “coronavirus’ global spread”, above, seems to show that North America, Europe, Australia, and Russia are, along with China, solidly infected — when there are a total of 12 cases, for example, in the US, out of a population of 330 million.
Sequential maps might be a good way of showing the international expansion of the disease IF it were spreading substantially on an international basis, but it isn’t (at least not yet): there are fewer than 400 cases of this coronavirus outside of China. In the US, for example, there are 12. One week ago there were 7. Not exactly a wildfire.
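The proportions in this section are easy to check by hand, but here is a minimal Python sketch of the arithmetic. The case counts and populations are the rounded figures quoted above; the function name `percent_infected` is just an illustrative choice.

```python
def percent_infected(cases: int, population: int) -> float:
    """Return confirmed cases as a percentage of the population."""
    return 100 * cases / population


# Rounded figures quoted in the text above.
china = percent_infected(35_000, 1_400_000_000)
us = percent_infected(12, 330_000_000)

print(f"China: {china:.4f}% of the population")
print(f"US:    {us:.7f}% of the population")
```

Both results are tiny fractions of one percent, which is the scale a solidly shaded country on a map fails to convey.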
What’s better than a map? An epi curve
What makes the overwhelming reliance on maps so strange is that we have a tried and tested tool to compare values over time. It’s called a column chart — or in the context of an epidemic, an “epi curve.” According to CDC:
An epi curve is a visual display of the onset of illness among cases associated with an outbreak.
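To make that definition concrete, here is a rough sketch of how an epi curve is built: count the cases by date of onset, then draw one column per day. The dates below are invented purely for illustration (they are not outbreak data), and the helper names `epi_curve` and `render` are my own.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onset_dates):
    """Count cases per day, including days with zero cases."""
    counts = Counter(onset_dates)
    day, last = min(counts), max(counts)
    curve = []
    while day <= last:
        curve.append((day, counts[day]))  # a Counter gives 0 for missing days
        day += timedelta(days=1)
    return curve


def render(curve):
    """Draw one row of '#' per day: a column chart turned on its side."""
    return "\n".join(f"{day.isoformat()} {'#' * n} {n}" for day, n in curve)


# Hypothetical line list: one onset date per case (made-up values).
onsets = [date(2020, 1, 1)] * 2 + [date(2020, 1, 2)] * 5 + [date(2020, 1, 4)] * 3
curve = epi_curve(onsets)
print(render(curve))
```

Days with no cases are kept in the output on purpose: a gap in the columns is information, and dropping empty days would distort the shape of the curve.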
And it turns out that we do have an epi curve for novel coronavirus, from the hardworking public health pros at the World Health Organization (technically, since this is "date of report" it's not a true epi curve, but it still works for this example). In the epi curve below, I’ve circled the same six days depicted in the six sequential maps above.
What? Hey? Wait a minute: the number of cases is trending down?!
Yes, the number of reported cases has dropped for 3 of the last 6 days. There could be many explanations for this, but the most likely one (especially at a stage when the response in China has now had time to gin up testing capabilities and hospital beds, etc) is the simplest: that cases are dropping. And that information appears nowhere at the CSSE dashboard, or at the NY Times coronavirus page. Or at any of the other major news sites that I’ve seen.
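Counting "days with a drop" is the kind of question a series of daily counts answers directly and a map does not. A minimal sketch, using made-up placeholder numbers rather than the WHO figures, and a hypothetical helper `days_with_decline`:

```python
def days_with_decline(daily_counts):
    """Count days whose reported total is lower than the previous day's."""
    pairs = zip(daily_counts, daily_counts[1:])
    return sum(1 for previous, current in pairs if current < previous)


# Placeholder values only: the day before the window, then six days.
reported = [100, 120, 110, 130, 125, 140, 135]
declines = days_with_decline(reported)
print(f"{declines} of the last {len(reported) - 1} days showed a drop")
```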
WHO has another epi curve (this one is by date of onset of symptoms), somewhat buried in its daily “sitrep” (situation report), showing the cases outside China:
And you will note that this epi curve is ALSO going down.
If maps are so bad at tracking outbreaks, why is the news full of them?
Of course, I’m not against maps. I’m a map nerd from way back. But to include maps and exclude epi curves leaves readers and viewers dangerously under-informed.
We can see above that:
Maps can exaggerate the extent of the coronavirus outbreak: China has a lot of cases, but they amount to a tiny fraction of the population. The most-shared maps nonetheless make it look like everyone in the country is infected.
The most useful dataviz tool — the epi curve — is missing in action, and not shown on any of those media sites.
Very likely the most important fact about the novel coronavirus outbreak in China — that it may have peaked — is completely obscured by the data visualization tool selected by every single major media outlet.
I don’t know why the media is making these maps the center of coronavirus data visualization, but if I had to guess I would say it’s because the last ten years have given us a LOT of legitimately cool map technology and maps look a lot cooler than column charts (especially in dramatic black and red).
And as the old journalism saying goes, “if it bleeds it leads” — and maps make things look worse.
But including maps with no epi curve distorts and hides crucial info about this important story, which is exactly the kind of public disservice that an outfit like the New York Times should want to avoid.
Read more about the problems with coronavirus and data visualizations in the followup piece: CoronaGeddon 2019: Why Every Map You've Seen of the Outbreak is Wrong