Corona Virus and Exponential Growth

Eladar

Science

21 Mar 20 02:04

Eladar

Joined: 12 Jul 08
Moves: 13814

06 Apr 20 06:37

4 edits

With today being day 37, I think I found a decent logistic curve. Using my quadratic model, I estimated out to day 54, with day 48 as the day of maximum death rate.

Total deaths = c/(1+ae^(-bx))
a=1522.548562
b=.1606481609
c=41495

I looked at the table and found on day 73 deaths drop below 10 per day. So my Logistic model predicts May 7 as the day the pandemic is over for the US with total deaths about 41500.

Margin of Error based on the data is 741, but since much data was estimated, not sure how valid any of this is lol.
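For anyone who wants to tabulate the curve above rather than read it off a spreadsheet, here is a minimal sketch using only the three quoted parameters. The function names are made up for illustration, and "day" is whatever day count the post uses (the calendar date of day 0 is not stated). The peak of the death rate for a logistic curve sits at x = ln(a)/b, where the cumulative total is exactly c/2.

```python
import math

# Parameters quoted in the post above.
A = 1522.548562
B = 0.1606481609
C = 41495.0


def total_deaths(day):
    """Cumulative deaths on a given day: c / (1 + a*e^(-b*x))."""
    return C / (1.0 + A * math.exp(-B * day))


def daily_deaths(day):
    """Instantaneous death rate, the derivative of total_deaths."""
    s = total_deaths(day) / C  # fraction of the final total reached so far
    return C * B * s * (1.0 - s)


def peak_day():
    """Day of maximum death rate (the inflection point), x = ln(a) / b."""
    return math.log(A) / B


def first_day_below(threshold, start_day, max_day=1000):
    """First whole day at or after start_day with daily_deaths < threshold."""
    for day in range(start_day, max_day + 1):
        if daily_deaths(day) < threshold:
            return day
    return None


if __name__ == "__main__":
    peak = peak_day()
    print("peak day:", round(peak, 1))
    print("deaths per day at peak:", round(daily_deaths(peak)))
    print("first day under 10 per day:", first_day_below(10, math.ceil(peak)))
```

Note that daily_deaths is the continuous derivative; a spreadsheet that takes day-to-day differences of the cumulative column will give slightly different numbers.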

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

06 Apr 20 19:56

1 edit

@eladar said
With today being day 37, I think I found a decent logistic curve. Using my quadratic model, I estimated out to day 54, with day 48 as the day of maximum death rate.

Total deaths = c/(1+ae^(-bx))
a=1522.548562
b=.1606481609
c=41495

I looked at the table and found on day 73 deaths drop below 10 per day. So my Logistic model predicts May 7 as the day the pandemic is over for the ...[text shortened]... based on the data is 741, but since much data was estimated, not sure how valid any of this is lol.

Well, I can't get close to reproducing the US curve, but have a reasonable fit for the UK curve. I plotted your function against actual data and it isn't far off, but a note of caution. For the UK data I did linear regression on (daily deaths/cumulative deaths) vs cumulative deaths using various different start dates for the regression and got the following sets of outputs.

Date of initial point | Total deaths | m | Date of inflection point
5th March | 7,490 | 0.2872 | 3rd April
14th March | 9,592 | 0.2595 | 5th April
26th March | 15,468 | 0.2255 | 8th April
28th March | 15,868 | 0.2235 | 8th April

All of these curves fit the UK data reasonably well and they differ in number of deaths by a factor of 2. It's possible to get reasonable looking answers and be way out.

Since the number of deaths in the UK looks like it's falling and the second entry in the list has the lowest root adjusted mean square error compared with the actual data - I expect deaths in the UK to continue to fall. So I think with some luck the UK might narrowly avoid 10,000 deaths. It remains to be seen.

With US data it might be better to analyse New York separately as it's dominating the figures, and so you're adding the logistic curve for NY to the logistic curve for the rest of the US and they probably have different parameters. Especially the exponent can be different.

Eladar

Joined: 12 Jul 08
Moves: 13814

06 Apr 20 22:10

@DeepThought

If this thing is seasonal, then it will not matter.

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

07 Apr 20 00:55

I've discovered this [1] which handily gives time series (in inordinate detail) for the entire world as well as a database of deaths by county in the US. So it should be possible to do the analysis by individual State or County.

One problem is that they're losing track of home deaths of covid-19 in New York as they are no longer testing people who died at home [2]. This can severely skew the figures.

I'll do the analysis tomorrow.

[1] https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
[2] https://twitter.com/MarkLevineNYC/status/1247155576221716480
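A rough sketch of how the county-level file could be rolled up into one cumulative series per State. It assumes the US deaths CSV in [1] has a "Province_State" column and one column per date headed like 4/6/20; both are assumptions about the file layout, so check the header row before relying on it. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def is_date_column(name):
    """True for headers that look like 4/6/20 (month/day/two-digit year)."""
    try:
        datetime.strptime(name, "%m/%d/%y")
        return True
    except ValueError:
        return False


def deaths_by_state(csv_text, state_column="Province_State"):
    """Sum cumulative county-level deaths into one time series per state."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if is_date_column(c)]
    totals = defaultdict(lambda: [0] * len(date_columns))
    for row in reader:
        series = totals[row[state_column]]
        for i, col in enumerate(date_columns):
            # Empty cells are treated as zero (an assumption).
            series[i] += int(float(row[col] or 0))
    return date_columns, dict(totals)
```

Usage would be something like reading a downloaded copy of the file into a string and calling deaths_by_state on it; the returned dictionary maps each State name to its list of cumulative deaths in date order.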

Eladar

Joined: 12 Jul 08
Moves: 13814

07 Apr 20 00:59

Lol you are ridiculous. You are about to lose the "this will wipe out 200k people" argument and all you have is that they are not testing people who die at home.

Eladar

Joined: 12 Jul 08
Moves: 13814

07 Apr 20 02:06

Looks like my quadratic is not catching up! That would mean my logistic regression will be a lower bound.

~~Removed~~

07 Apr 20 18:58

1 edit

@joe-shmo said
I'm getting behind on the models! I'm having trouble seeing this algebra (currently not sure if I'm losing it, or you are again doing some algebraic manipulation above my head):

If

g(t) = m f(t) ( 1 - f(t)/a )

Then

g(t)/f(t) = m ( 1 - f(t)/a )

g(t)/f(t) = m - (m/a)*f(t)

How are you getting the following?

g(t)/f(t) = m*f(t) - m/a

@Deepthought

I'll just assume that was a clerical error.

If I perform the procedure you suggest on the US data (from worldometers), the figures closely resemble one another, with -m/a as the slope of the regression and m as the intercept. For data up to 4/6, your method yields:

m = 0.24534
m/a = 1.2005e-5
a = 20436

Again, very nice fit for the data. Awesome job, thanks for figuring this out. It has been a valuable learning experience about the perils of non-linear regressions! Seems to be coming up a bit shy on projected death totals? I obviously haven't done anything with error.
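The procedure described above (regress g/f on f, read m off the intercept and -m/a off the slope) can be sketched in a few lines. This is only an illustration of the algebra g(t)/f(t) = m - (m/a)*f(t), not the spreadsheet either poster actually used; the function names are made up. As a check on the quoted figures, 0.24534 / 1.2005e-5 is about 20436, matching the stated a.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_logistic_by_ratio(cumulative, daily):
    """Estimate m and a from g/f = m - (m/a) * f.

    cumulative: f(t), total deaths to date (must be non-zero)
    daily:      g(t), deaths per day on the same dates
    """
    ratios = [g / f for f, g in zip(cumulative, daily)]
    slope, intercept = linear_fit(cumulative, ratios)
    m = intercept
    a = -intercept / slope  # slope = -m/a, so a = m / (m/a)
    return m, a
```

With real data the daily column is a day-to-day difference rather than an exact derivative, so the points will only lie approximately on a line.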

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

08 Apr 20 00:07

2 edits

@joe-shmo said
@Deepthought

I'll just assume that was a clerical error.

If I perform the procedure you suggest on the US data (from worldometers), the figures closely resemble one another, with -m/a as the slope of the regression and m as the intercept. For data up to 4/6, your method yields:

m = 0.24534
m/a = 1.2005e-5
a = 20436

Again, very nice fit for the data. Awesome job, ...[text shortened]... to be coming up a bit shy on projected death totals? I obviously haven't done anything with error.

The problem is that the error in (m/a) absolutely kills the predictive power of the method. Suppose our estimate for (m/a) is M and the error in (m/a) is Mε. If the estimate for m is C and the error in m is Cδ, then we have:

a = m/(m/a) = (C + Cδ)/(M + Mε) = (C/M)(1 + δ)/(1 + ε) = (C/M)(1 + δ)(1 - ε + O(ε²))

δ is coming out on the order of a couple of percent, which is fine. But ε is big: depending on where I start the regression, the error can be larger than the estimate for (m/a) itself, which means that the error in the estimate for total deaths is arbitrarily large.

By the way, you can download decent time series data below, which has the benefit that you can also break up the analysis by individual State.

[1] https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
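To see how quickly the range for a opens up, here is a small sketch that evaluates the worst-case bounds of a = (C/M)(1 + δ)/(1 + ε) directly instead of using the series expansion, which is only valid for |ε| < 1. C and M are the intercept and slope magnitude quoted earlier in the thread; the ε values in the loop are hypothetical, chosen only to show the trend.

```python
import math


def a_bounds(C, M, delta, eps):
    """Range of a = m/(m/a) when m = C(1 ± delta) and m/a = M(1 ± eps).

    Returns (low, high). If eps >= 1 the denominator can reach zero,
    so the upper bound is unbounded and math.inf is returned.
    """
    low = C * (1.0 - delta) / (M * (1.0 + eps))
    high = math.inf if eps >= 1.0 else C * (1.0 + delta) / (M * (1.0 - eps))
    return low, high


if __name__ == "__main__":
    C, M = 0.24534, 1.2005e-5  # values quoted earlier in the thread
    for eps in (0.02, 0.25, 0.5, 0.9, 1.0):  # hypothetical relative errors in m/a
        low, high = a_bounds(C, M, delta=0.02, eps=eps)
        print(f"eps={eps:4.2f}  a between {low:10.0f} and {high:10.0f}")
```

Once ε reaches 1 the slope is consistent with zero, so no finite upper limit on total deaths can be read off the regression.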

~~Removed~~

08 Apr 20 13:50

@deepthought said
The problem is that the error in (m/a) absolutely kills the predictive power of the method. Suppose our estimate for (m/a) is M and the error in (m/a) is Mε. If the estimate for m is C and the error in m is Cδ, then we have:

a = m/(m/a) = (C + Cδ )/(M + Mε ) = C/M (1 + δ )/(1 + ε ) = C/M (1 + δ )(1 - ε + O(ε² ))

δ is coming out of the order of a couple of percent, which i ...[text shortened]... https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

Do you have any reason to suspect the model will gain stability after the inflection point has been reached?

~~Removed~~

08 Apr 20 15:48

@deepthought said
The problem is that the error in (m/a) absolutely kills the predictive power of the method. Suppose our estimate for (m/a) is M and the error in (m/a) is Mε. If the estimate for m is C and the error in m is Cδ, then we have:

a = m/(m/a) = (C + Cδ )/(M + Mε ) = C/M (1 + δ )/(1 + ε ) = C/M (1 + δ )(1 - ε + O(ε² ))

δ is coming out of the order of a couple of percent, which i ...[text shortened]... https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series

Using a Med-Med Linear regression seems to stabilize both parameters, as opposed to best fit method. Perhaps it is a better estimation?
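For reference, a minimal sketch of a Med-Med (median-median) line as it is usually defined in textbooks and on graphing calculators: sort by x, split into three groups, take the median x and median y of each group, get the slope from the two outer groups, and average the three intercepts. This is an assumption about what is meant here; it is not the VBA program mentioned in the thread, and the function name is made up.

```python
from statistics import median


def med_med_line(xs, ys):
    """Median-median (resistant) line: returns (slope, intercept)."""
    points = sorted(zip(xs, ys))
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    # Split into three groups; a remainder of 1 goes to the middle group,
    # a remainder of 2 goes one each to the outer groups.
    k, r = divmod(n, 3)
    left_size = k + (1 if r == 2 else 0)
    mid_size = k + (1 if r == 1 else 0)
    groups = [
        points[:left_size],
        points[left_size:left_size + mid_size],
        points[left_size + mid_size:],
    ]
    mx = [median(p[0] for p in g) for g in groups]
    my = [median(p[1] for p in g) for g in groups]
    # Slope from the outer summary points, intercept averaged over all three.
    slope = (my[2] - my[0]) / (mx[2] - mx[0])
    intercept = (sum(my) - slope * sum(mx)) / 3.0
    return slope, intercept
```

Because only medians enter the calculation, a few wild points in any one group leave the line unchanged, which is why it tends to be steadier than a least-squares fit on noisy early data.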

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

09 Apr 20 18:57

@joe-shmo said
Do you have any reason to suspect the model will gain stability after the inflection point has been reached?

Yes, once a country is past the inflection point, predictions become much more reliable. I did predictions for the entire world yesterday. The model correctly predicts the number of deaths in China. It's getting the same number of deaths as IHME for Italy, and now for the US with the "by county" calculation; I get 40,000 using data for the US as a whole. I think State level is probably the best for making predictions for the US. I'm getting 14,500 for the UK. I think IHME penalize incomplete lockdowns too much; they might also assume overkill due to running out of hospital beds, and the NHS hasn't run out of beds.

The model is pretty poor early on in the curve, where an exponential estimator is better, at least for "m". Comedy results were for Cuba: I had an average of 64 quadrillion deaths, but a median figure of 23 deaths. So I need some way of eliminating obviously stupid data points from the averaging.

I'm having problems with the date of the point of inflection for some reason, although this could be me goofing up, as I've got to enter a formula and then copy and paste it over an entire page of the spreadsheet, and it's easy to forget a $ sign and reference data from the wrong place.

I've been playing a little with data smoothing. Instead of using daily deaths as the derivative we can do three point linear regression to get an estimate of the derivative. That seems to produce slightly more believable answers. Especially early in the data some way of smoothing out fluctuations and delays in recording deaths will help a lot - especially if the method tends to reduce errors rather than magnify them.

I'll put the med-med regression you mentioned on the list of things to look at.
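The three-point regression mentioned above has a simple closed form when the points are one day apart: the least-squares slope through days t-1, t, t+1 is just (f(t+1) - f(t-1))/2, the central difference. A sketch that generalises to any odd window is below; the function name and the window parameter are illustrative, not taken from the spreadsheet.

```python
def windowed_slope(values, window=3):
    """Least-squares slope through each run of `window` consecutive points.

    Points are assumed equally spaced (one per day). Returns one slope per
    interior point; the first and last window//2 days have no estimate.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    denom = sum(k * k for k in offsets)  # sum of squared x offsets
    slopes = []
    for i in range(half, len(values) - half):
        # Offsets sum to zero, so the slope is sum(k * y) / sum(k * k).
        slopes.append(sum(k * values[i + k] for k in offsets) / denom)
    return slopes
```

Feeding the cumulative deaths column through this gives a smoothed daily-deaths estimate; a wider window smooths more at the cost of losing more days at each end.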

~~Removed~~

09 Apr 20 20:12

@deepthought said
Yes, once a country is past the inflection point, predictions become much more reliable. I did predictions for the entire world yesterday. The model correctly predicts the number of deaths in China. It's getting the same number of deaths as IHME for Italy and now the US for the "by county" calculation, I get 40,000 for data for the US as a whole. I think at State level ...[text shortened]... han magnify them.

I'll put the med-med regression you mentioned on the list of things to look at.

I was thinking that would be the case. Yeah, it seems the data does need to be sanded a bit... The Med-Med does that for the linear regression (I wrote a VBA program to generate the coefficients for it yesterday; it's a bit of a process). So, other than the specifics of the process, it sounds like we are on the same page on that.

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

10 Apr 20 01:44

@joe-shmo said
@Deepthought

I'll just assume that was a clerical error.

If I perform the procedure you suggest on the US data (from worldometers), the figures closely resemble one another, with -m/a as the slope of the regression and m as the intercept. For data up to 4/6, your method yields:

m = 0.24534
m/a = 1.2005e-5
a = 20436

Again, very nice fit for the data. Awesome job, ...[text shortened]... to be coming up a bit shy on projected death totals? I obviously haven't done anything with error.

Regarding the clerical error. Yes, I goofed up typing the equation in.

DeepThought

Losing the Thread

Quarantined World

Joined: 27 Oct 04
Moves: 87415

10 Apr 20 01:49

1 edit

@joe-shmo said
I was thinking that would be the case. Yeah, it seems the data does need to be sanded a bit... The Med-Med does that for the linear regression (I wrote a VBA program to generate the coefficients for it yesterday; it's a bit of a process). So, other than the specifics of the process, it sounds like we are on the same page on that.

One interesting thing. IHME are not doing regression with a logistic function, they're doing regression using the error function (the integral of exp[-x²]), as they claim it fits the data better [1]. We could have a look at seeing if this is possible on a spreadsheet, but I'd rather see if smoothing data makes the predictions better.

[1] https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1
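As a starting point for experimenting with an error-function curve, here is a sketch of one common way to write it: cumulative deaths p/2 * (1 + erf(α(t - β))), whose derivative is a Gaussian in t. The parameter names p, alpha and beta and this exact parametrisation are assumptions for illustration, and nothing here is claimed to be IHME's fitting procedure. One handy consequence is that the log of daily deaths is a parabola in t, so its second difference is the constant -2α².

```python
import math


def erf_cumulative(t, p, alpha, beta):
    """Error-function sigmoid: rises from 0 to p, with its midpoint at t = beta."""
    return 0.5 * p * (1.0 + math.erf(alpha * (t - beta)))


def erf_daily(t, p, alpha, beta):
    """Derivative of erf_cumulative: a Gaussian bump in daily deaths."""
    return p * alpha / math.sqrt(math.pi) * math.exp(-(alpha * (t - beta)) ** 2)


def alpha_from_daily(daily):
    """Estimate alpha from three consecutive daily values.

    log(daily) is a parabola in t with second difference -2 * alpha**2.
    Raises ValueError if the values are not log-concave or contain zeros.
    """
    y0, y1, y2 = (math.log(d) for d in daily)
    second_difference = y2 - 2.0 * y1 + y0
    return math.sqrt(-second_difference / 2.0)
```

Since log daily deaths is quadratic in t under this model, an ordinary polynomial regression on that column would be one spreadsheet-friendly route, though noisy daily counts would need smoothing first.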

~~Removed~~

10 Apr 20 19:25

1 edit

@deepthought said
One interesting thing. IHME are not doing regression with a logistic function, they're doing regression using the error function (the integral of exp[-x²]), as they claim it fits the data better [1]. We could have a look at seeing if this is possible on a spreadsheet, but I'd rather see if smoothing data makes the predictions better.

[1] https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1

For a purely academic experience I would like to see what the methodology might be for setting up that regression based on the error function ( just to understand what they were grappling with). Surely it has to be a more complicated procedure?

On a side note: neither has been very effective at making accurate predictions for any sizable future date. The IHME model is constantly being downgraded; I think its current projection has moved well outside the (very large) initial error bars in the model. I personally think that has to say something negative about what the experts actually know about the spread of an epidemic and the factors at play. They will undoubtedly have better methods and understanding in the future, but I'm sure there will be some controversy over it (and rightfully so) when this subsides.