Weekend Open Thread: Is OC Flattening Its Curve?

1. How Do Points Become a Curve?

Among the many health data visualizations I’ve seen, none provides a clear graphic addressing the question of whether OC is “flattening its curve” of new COVID-19 cases (or ICU admissions, or deaths, all of which are useful things to track), so I decided to roll my own.  Before getting to that, let’s take a look at some of the useful resources that are out there.

The chart above is what you’ll get from OC’s official COVID-19 site.  (You’ll also find a similar graph of total deaths.)

This second graphic is a really good contribution by Jason Kraft of Brea, whose work specialty includes data visualization (though this is a freelance effort from home), and I’m very happy to showcase it here.  He offers data for each city, showing cases both in absolute terms and as a proportion of that city’s data.  (The icons of Brea, Placentia, and Fullerton show his local interests.)

The admonition at the bottom of the top chart points to what I think has been missing.  (I’d love to be wrong, since it would save me a lot of future work, so if you know of someone else who’s doing this, please let me know!)  Statisticians remove some of the “noise” (or random variation) in these sorts of graphs by recognizing that the exact date on which a case appears in the database is an imprecise measure of when it actually met the criterion for inclusion.

Given that, we can represent the value for each day as being the average of the period — usually three, five, or seven days — in which it is centered.  This is called “smoothing the curve” — and that sends us into a little detour on introductory statistics.
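
As a rough sketch in notation (the symbols are my own shorthand, not anything taken from the county's site): if y_t is the number of cases reported on day t, then the centered average over a window of 2k+1 days is the plain mean of that day and the k days on either side of it.

```latex
\bar{y}_t = \frac{1}{2k+1} \sum_{j=-k}^{k} y_{t+j},
\qquad k = 1, 2, 3 \ \text{for 3-, 5-, and 7-day windows}
```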

2. What is this Curve We’re Trying to Flatten?

When we don’t know what a curve looks like, in most cases — and for good reasons I won’t get into here — we approximate it with something called a normal distribution, which tells us what proportion of the cases will have a given value on the horizontal axis.  (In this case, as is common in epidemiology, the horizontal axis denotes “time.”)  Plotting the points on a normal distribution gives us a “normal curve” — the most famous of the “bell-shaped” curves.

Normal curves can differ in important ways. The first two of these, mean and “standard deviation” (related to variance), are fairly well known.  (This graphic from Wikipedia illustrates them — the standard version of the normal curve, with standard deviation =1, is in red.)  We don’t know where the mean of our curve for COVID-19 cases in OC is at this point, so ignore the green variation on the standard curve.  We’re looking at the red version, which we don’t want to turn into the blue version, and which we do want to turn into the gold version.  In other words, the red version is the one that we want to “flatten.”
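
For reference, here is the textbook formula for a normal curve with mean μ and standard deviation σ. The thing worth noticing is that the height of the peak is inversely proportional to σ, so a curve that is more spread out is necessarily a lower one. That is all "flattening" means here.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \approx \frac{0.399}{\sigma}
```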

Why is that?  Let’s say that the faint line beginning at the “0.2” mark on the left (which is when 20% of the population is affected) refers to the proportion of people whom we can treat in our medical system in one day.  (In other words, pretend for now that we could treat 20% of the people in OC who will eventually get the virus at once.  The number is actually much lower than that.)  Everyone beyond that limit would be on their own.  Not everyone dies without treatment, but a certain proportion, based on factors such as pre-existing conditions, is expected to die.

Looking at the red curve, a lot of people die.  Looking at the blue curve, which we’d see if people actively sought out the virus to get it over with, as used to be the custom with chickenpox, a much larger proportion of people die.  But with the gold curve, literally no one dies for lack of treatment, which is another way of saying that the only people who die are those who don’t respond to treatment.  The trade-off is that we then have to live with the disease for a lot longer, but we’ll be able to treat everyone who gets it, rather than denying some people treatment due to scarcity.  And if we put off as many infections as we can for long enough, by then we may have a vaccine or a cure.
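
To put rough numbers on that comparison, here is a minimal sketch in Python. It assumes the three curves have variances of 0.2 (blue), 1 (red), and 5 (gold), which is how I read the Wikipedia graphic, and it uses the pretend capacity line of 0.2 from above. The function names are my own. It estimates what share of all cases would land above the capacity line, meaning the people left without treatment.

```python
import math

def normal_pdf(x, sigma, mu=0.0):
    """Height of a normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def share_over_capacity(sigma, capacity, steps=20_000, span=10.0):
    """Approximate share of all cases that fall above the capacity line.

    Integrates max(pdf - capacity, 0) over [-span*sigma, span*sigma]
    with the midpoint rule.
    """
    lo, hi = -span * sigma, span * sigma
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += max(normal_pdf(x, sigma) - capacity, 0.0) * dx
    return total

# ASSUMPTION: variances as read off the graphic, not measured from any real data
CURVES = {"blue": math.sqrt(0.2), "red": 1.0, "gold": math.sqrt(5.0)}
CAPACITY = 0.2  # the hypothetical treatment limit used in the text

for name, sigma in CURVES.items():
    peak = normal_pdf(0.0, sigma)
    over = share_over_capacity(sigma, CAPACITY)
    print(f"{name}: peak height {peak:.3f}, share of cases over capacity {over:.1%}")
```

Under those assumptions, roughly 29% of cases on the red curve and roughly 61% on the blue curve fall above the line, while the gold curve never reaches it at all.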

This is why we are engaging in social distancing: to keep the number of cases at any one time below the level at which we can no longer cope with them.

Until we get to the peak, we don’t know where we are on the curve.  Being at the 0.2 level could mean that we’re peaking on the gold curve, or about halfway to peaking on the red curve, or less than a quarter of the way to peaking on the blue curve.  That’s why people are asking the big question: “when will we peak?”

It’s more complicated than this, of course.  Technically, we are also looking for a curve that has negative excess kurtosis (loosely, a flatter and broader hump than a normal curve has) and positive skewness (where the right tail is longer and thicker than the left tail).  And the curve that we would derive by the end would likely depart from normality because we’re trying to change it in midstream.  But if you understand the general concept (that we are trying to stuff down that too-large group of people in the peak so that a goodly number of them will end up in the right tail of the distribution, where they don’t overwhelm our capacity), then you should be able to follow the level of discussion for non-experts.
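
For anyone who wants to see what those two terms actually measure, here is a small sketch. The function name and the three example curves are made up for illustration and are not OC data. It treats a curve as a list of daily counts and computes its skewness and its excess kurtosis, meaning kurtosis measured relative to a normal curve, which scores zero.

```python
def shape_stats(counts):
    """Return (skewness, excess kurtosis) of a curve given as counts per day.

    Day indices 0, 1, 2, ... are the values; the counts are their weights.
    """
    total = sum(counts)
    mean = sum(day * c for day, c in enumerate(counts)) / total

    def central_moment(power):
        return sum(c * (day - mean) ** power for day, c in enumerate(counts)) / total

    variance = central_moment(2)
    skewness = central_moment(3) / variance ** 1.5
    excess_kurtosis = central_moment(4) / variance ** 2 - 3.0
    return skewness, excess_kurtosis

# Made-up curves for illustration only
symmetric = [1, 4, 6, 4, 1]
long_right_tail = [1, 6, 4, 3, 2, 1, 1]
flat_top = [3, 3, 3, 3, 3]

for name, curve in [("symmetric", symmetric),
                    ("long right tail", long_right_tail),
                    ("flat top", flat_top)]:
    skew, kurt = shape_stats(curve)
    print(f"{name}: skewness {skew:+.2f}, excess kurtosis {kurt:+.2f}")
```

The symmetric curve has zero skewness, the one that trails off to the right has positive skewness, and the completely flat one has the most negative excess kurtosis of the three.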

OK, now I’ll take a question from our imaginary audience:  Yes, we could try to “get it over with,” but only at the cost of a lot of people dying unnecessarily.  And those who think that they will surely be able to buy their way into the ICU with plenty of oxygen and sure access to a ventilator should not be smug: people will die even with treatment, but fewer of them will die in the future, when our knowledge of prevention, treatment, and cure will be better than it is right now.  (If you think that hydroxychloroquine will save you, you might turn out to be right, but you’re far more likely to have gambled your life on snake oil.  We’ll have much more information in the months ahead.)

But how do we get from that bar graph at the top to a curve that we can actually examine to see what it’s doing?

Glad you asked!

3. Smoothing the Curve

At the bottom of that top graphic, it warns you that you don’t want to put too much emphasis on single data points, each representing the number of cases on a single day.  But what’s the alternative?  What statisticians do with what is called “noisy” data (see how it’s jumping all around in that top graphic from day to day?) is called “Smoothing the Curve.”

The idea is this: for any given case, small chance factors (like how much sleep the person happened to get, or whether they were too busy to get tested, or where their test ended up in the pile of tests to be processed in the lab) may move the date when they get confirmed as positive or negative for the virus.  Rather than let that random variation drive us mad, we can instead represent each day as the average of itself and the small number of days on either side of it.  So what I’ve done is to take that array of dots and turn them into something more informative.

In this graph, starting with March 12 (which means that I’m leaving out the first cluster of cases that dropped back to 0), I created a bar for each day through April 9, representing the average of that day, the two days before it, and the two days after it, along with dots representing the individual amount for each day (that is, the same information you see in the top graph).  (You may note that there’s a dot but no bar for April 10, the evening of which is when this was written, because the likelihood is too great that it will be significantly changed.  I included the bar for April 9, even though it has only four of the five days accounted for, because I doubt it will change by that much.)
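
For the curious, the bars can be sketched in a few lines of Python. This is a rough approximation of the approach rather than the exact spreadsheet work behind the graph; the function name, the `min_days` rule, and the sample numbers are all hypothetical. A day gets a bar if at least four of its five days are available, which is why the next-to-last day gets one and the last day does not. In this sketch the same rule applies at the start of the list.

```python
def smooth_centered(counts, window=5, min_days=4):
    """Centered moving average over `window` days.

    Near the ends of the series fewer days are available. A day gets a value
    only if at least `min_days` of its window exist; otherwise it gets None.
    """
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        days = counts[lo:hi]
        smoothed.append(sum(days) / len(days) if len(days) >= min_days else None)
    return smoothed

# Hypothetical daily counts, not the real OC numbers
daily = [10, 20, 30, 40, 50, 60, 70]
print(smooth_centered(daily))
# [None, 25.0, 30.0, 40.0, 50.0, 55.0, None]
```

Passing `window=3` or `window=7` gives the other common choices; a wider window smooths more but leaves more days at each end without a full set of neighbors.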

This, finally, gives us something close to a curve that we can inspect and use to make predictions.

With the smoothed curve, we can see what looks like an inflection point (where it changes from concave to convex) around March 24, though it’s both difficult and unnecessary to pinpoint an exact date.  We have an extreme outlier of reports on April 1 that is too large to fit onto the scale, and a similar but smaller outlier on April 8, both of which serve to produce a gap on April 4 and 5, where the first has departed and the second has not yet arrived.  (A 7-day smoothed curve would not have shown as much gapping.)  What it looks like to me is that confirmations of tests are coming with gaps between batches (with a smaller outlier on April 4 and a previous one on March 28), but that may not be the right explanation for this noise.  (Note that the cases reported on a given day consist only of ones counted through the previous day, so the April 1 report, for example, includes only cases reported through March 31.)
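
Here is a small sketch of why the gap lands on April 4 and 5 with a 5-day window and would close up with a 7-day one. The helper names are made up, and the days are simply day-of-April numbers. The point is that a single big spike lifts the average of every day whose window includes it, and with a 5-day window that reach is only two days in either direction.

```python
def days_touched_by_spike(spike_day, window):
    """Days whose centered `window`-day average includes the spike."""
    half = window // 2
    return set(range(spike_day - half, spike_day + half + 1))

def gap_days(spike_days, window, first_day, last_day):
    """Days in [first_day, last_day] whose average includes none of the spikes."""
    touched = set()
    for day in spike_days:
        touched |= days_touched_by_spike(day, window)
    return [d for d in range(first_day, last_day + 1) if d not in touched]

# The two large outliers mentioned in the text: April 1 and April 8
spikes = [1, 8]
print(gap_days(spikes, window=5, first_day=1, last_day=8))  # [4, 5]
print(gap_days(spikes, window=7, first_day=1, last_day=8))  # []
```

Two spikes a week apart leave exactly two days untouched under 5-day smoothing and none under 7-day smoothing, which is consistent with batches of test results arriving about a week apart.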

If we are nearing a peak, though — even one in a distribution with an asymmetric fatter right tail — there’s still a problem.  As Kraft notes in his graphic, the total number of cases as of the April 9 report was 1,138, which resulted in 17 deaths.  That seems far too low to be anywhere near the halfway point for a county as large as OC (though of course it doesn’t include mild cases or those that were mistaken for Influenza A or B, the strains that were running roughshod here during the winter.)
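
To put a rough number on "far too low": assuming a county population of about 3.2 million (my own round figure, not something taken from the county's data), the confirmed cases come to a few hundredths of one percent of residents.

```latex
\frac{1{,}138}{3{,}200{,}000} \approx 0.00036 \approx 0.036\%
```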

To me, and here I’m beyond my comfort zone, it suggests that our social distancing has been extremely effective, but also that a large part of the population remains vulnerable to new waves of infection coming from outside the county.  If so, we can prepare for them now, figuring out in which industrial sectors restrictions can be relaxed as well as how we’ll apply the best lessons we’ve learned about safe distribution of goods to people in a more careful and systematic way.  If what we’ve experienced so far is a foreshock, we have been blessed with time to prepare for the Big One and its aftershocks.

Speaking of “blessed” — I don’t know if any churches here are planning huge gatherings along the lines of what we’ve seen elsewhere in the country, but that’s the sort of thing that could generate a new wave as this one rolls over us.  So turn on your TV, get in contact with your friends and family online and by phone, and have a safe and happy Easter at a good distance from harm!

My current plan is to do these updates once a week — but that too is hard to predict.

This is your Weekend Open Thread – talk about this or whatever else you’d like, consistent with applicable guidelines.


About Greg Diamond

Somewhat verbose attorney, semi-disabled and semi-retired, residing in northwest Brea. Occasionally ran for office against jerks who otherwise would have gone unopposed. Got 45% of the vote against Bob Huff for State Senate in 2012; Josh Newman then won the seat in 2016. In 2014 became the first attorney to challenge OCDA Tony Rackauckas since 2002; Todd Spitzer then won that seat in 2018. Every time he's run against some rotten incumbent, the *next* person to challenge them wins! He's OK with that. Corrupt party hacks hate him. He's OK with that too. He does advise some local campaigns informally and (so far) without compensation. (If that last bit changes, he will declare the interest.) His daughter is a professional campaign treasurer. He doesn't usually know whom she and her firm represent. Whether they do so never influences his endorsements or coverage. (He does have his own strong opinions.) But when he does check campaign finance forms, he is often happily surprised to learn that good candidates he respects often DO hire her firm. (Maybe bad ones are scared off by his relationship with her, but they needn't be.)