Lies, damn lies, and statistics

I confess that my headline is intended to grab a few additional readers, because the topic of today’s post – data – isn’t very sexy. In our alternative-facts world, a bit more respect for actual data would be helpful. But the other end of the spectrum, where “data-driven” becomes an obsession or, worse, a shield for making difficult decisions without any accountability (“we just followed the data”), is just as bad. And not understanding what the data actually means is worst of all.

All of these problems existed, and continue to exist, when it comes to the COVID-19 pandemic. A year ago, the federal government basically gave up on trying to compile data on COVID testing, infection rates, and deaths. It was left to a handful of reporters and editors at The Atlantic magazine – all working remotely – to come up with what they called the “COVID Tracking Project.” It became the de facto standard for reporting on the pandemic, so much so that the federal government started using its charts in presentations because it had no comparable source of its own, and it spawned hundreds of imitators at the state and local level.

Launched by reporters Robinson Meyer and Alexis C. Madrigal along with editor Erin Kissane and data scientist Jeff Hammerbacher, the COVID Tracking Project eventually added several other Atlantic staff members and others to help compile the massive amount of raw data, make sense of and standardize the differing measurements, and figure out how to communicate the results graphically as clearly and effectively as possible. It was an immense undertaking, and a very successful one; frankly, we owe these folks a huge thank you for journalism that filled a critical need while our national leaders were playing politics with the pandemic response.

The COVID Tracking Project has stopped compiling data and will shut down for good later this spring. Meyer and Madrigal have written a look back at its origins, as well as a warning that we’re still not getting the pandemic data right even today. It’s worth a read.

Significantly, they point out that the speed at which data comes in varies greatly and is affected by many factors, all of which need to be considered before drawing conclusions or reacting to changes in the numbers. We’re using that data to make important decisions about when to reopen (or re-close) schools, retail stores, restaurants and bars, and about other public health issues, but those decisions are still reactions to data that isn’t fully understood.
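To make the reporting-lag point concrete, here is a minimal sketch with made-up numbers – not real COVID data, and not the COVID Tracking Project’s methodology. It shows how, even when the true number of new cases is flat, the most recent days always look artificially low until late reports are backfilled, which can be misread as a genuine decline. The completeness fractions below are invented purely for illustration.

```python
# Illustrative sketch only: synthetic numbers, not real COVID data.
# Shows how reporting lag can make the "latest daily counts" look like a
# decline even when underlying infections are flat, because the most
# recent days are still incomplete and will be backfilled later.

# Hypothetical scenario: 1,000 true new cases every day for two weeks.
true_daily_cases = [1000] * 14

# Assumed reporting completeness by age of the data point (day 0 = today).
# These fractions are made up for illustration.
completeness_by_age = {0: 0.40, 1: 0.60, 2: 0.75, 3: 0.85, 4: 0.95}

reported_so_far = []
for age, true_count in enumerate(reversed(true_daily_cases)):
    fraction = completeness_by_age.get(age, 1.0)
    reported_so_far.append(round(true_count * fraction))
reported_so_far.reverse()  # back to chronological order

print("True new cases per day:    ", true_daily_cases)
print("Reported so far (with lag):", reported_so_far)
# The tail of the reported series "drops off" even though nothing changed –
# the same artifact that can be misread as a real decline in infections.
```

Running it, the last five days of the reported series fall from 950 down to 400 while the true counts never move – a reminder that reacting to the newest data points without knowing how complete they are can lead to exactly the wrong conclusion.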

Additionally, as I’ve written about before, there continue to be those who argue that the pandemic won’t be “over” and we can’t go back to “normal” until the COVID-19 virus is completely eradicated. That might happen eventually, though it seems unlikely. What is more likely is that the protection provided by the vaccines will reduce the severity of illness from COVID-19 to not much more than a case of the flu or the “common cold” (some cases of which, interestingly, are also caused by coronaviruses, which leads to some optimism that the mRNA approach to developing vaccines might eventually “cure the common cold”). While people do die from the flu, especially in years when the influenza strain is particularly virulent, we don’t shut down society when that happens. Instead, we formulate a new flu shot and hope that most people will bother to get it, which in the past hasn’t been the case. Perhaps moving forward we’ll be more diligent about those other available vaccines as well. Perhaps.

Accurate data is useful. Inaccurate data, data whose origins aren’t understood, or just plain made-up numbers masquerading as data are useless – yet they’re sometimes used as justification for questionable decisions. The pandemic has shown us that; it would be nice to think we could learn from our mistakes this time.