With today being day 37, I think I've found a decent logistic curve. Using my quadratic model, I estimated values out to day 54, with day 48 as the day of maximum death rate.
Total deaths = c/(1 + a e^(-bx)), where x is the day number
a = 1522.548562
b = 0.1606481609
c = 41495
I looked at the table and found that deaths drop below 10 per day on day 73. So my logistic model predicts May 7 as the day the pandemic is over for the US, with total deaths of about 41,500.
The margin of error based on the data is 741, but since much of the data was estimated, I'm not sure how valid any of this is, lol.
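The table values can be checked directly against the fitted curve. This is a minimal Python sketch using the parameters exactly as posted, with daily deaths taken as the day-over-day change in the curve (an assumption about how the table was built). The peak death rate of a logistic c/(1 + a e^(-bx)) falls where a e^(-bx) = 1, i.e. at x = ln(a)/b.

```python
import math

# Parameters exactly as posted above; x is the day number.
A = 1522.548562
B = 0.1606481609
C = 41495


def total_deaths(x: float) -> float:
    """Cumulative deaths from the logistic curve c / (1 + a e^(-b x))."""
    return C / (1 + A * math.exp(-B * x))


def daily_deaths(day: int) -> float:
    """Deaths on a given day, taken as the day-over-day change in the curve."""
    return total_deaths(day) - total_deaths(day - 1)


# The death rate peaks where a e^(-b x) = 1, i.e. x = ln(a) / b.
inflection_day = math.log(A) / B

# First day after the peak on which the model gives fewer than 10 deaths.
first_quiet_day = next(
    d for d in range(math.ceil(inflection_day), 1000) if daily_deaths(d) < 10
)

print(f"peak death rate around day {inflection_day:.1f}")
print(f"first day with fewer than 10 deaths: {first_quiet_day}")
print(f"total deaths by then: {total_deaths(first_quiet_day):,.0f}")
```

Run with the parameters as posted, this puts the first sub-10 day noticeably later than day 73, so either the table or the transcribed parameters may be worth a second look.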
Replying to @eladar:
Well, I can't get close to reproducing the US curve, but I have a reasonable fit for the UK curve. I plotted your function against actual data and it isn't far off, but a note of caution. For the UK data I did a linear regression of (daily deaths/cumulative deaths) against cumulative deaths, using various different start dates for the regression, and got the following sets of outputs:
Start date of regression | predicted total deaths | m | predicted date of inflection point
5th March | 7,490 | 0.2872 | 3rd April
14th March | 9,592 | 0.2595 | 5th April
26th March | 15,468 | 0.2255 | 8th April
28th March | 15,868 | 0.2235 | 8th April
All of these curves fit the UK data reasonably well, yet they differ in predicted total deaths by a factor of 2. It's possible to get reasonable-looking answers and still be way off.
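The regression described above can be sketched in Python. For a logistic with growth rate m and final toll a, daily deaths satisfy g = m f (1 - f/a), so g/f plotted against f is a straight line with intercept m and slope -m/a. This is a minimal sketch using ordinary least squares. Dividing each day's deaths by the average of the cumulative totals at either end of that day is my own choice to reduce the bias from whole-day differences, and the synthetic series is hypothetical, not UK data.

```python
import math


def ols(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_logistic_ratio(cumulative, start=1):
    """Estimate (m, a) for g = m f (1 - f/a) from a cumulative death series.

    Daily deaths g are day-over-day differences, each divided by the average
    cumulative total over that day. Regressing g/f on f gives intercept m and
    slope -m/a. `start` is the first day index used in the regression.
    """
    xs, ys = [], []
    for i in range(max(start, 1), len(cumulative)):
        g = cumulative[i] - cumulative[i - 1]
        f = 0.5 * (cumulative[i] + cumulative[i - 1])
        if f > 0:
            xs.append(f)
            ys.append(g / f)
    slope, intercept = ols(xs, ys)
    m = intercept
    a = -m / slope  # a = m / (m/a)
    return m, a


# Synthetic logistic series (hypothetical parameters, not real data).
true_m, true_a, t0 = 0.25, 20000.0, 30.0
series = [true_a / (1 + math.exp(-true_m * (t - t0))) for t in range(61)]

for start in (0, 20, 35):
    m, a = fit_logistic_ratio(series, start)
    print(f"start day {start:2d}: m = {m:.4f}, a = {a:,.0f}")
```

On noiseless synthetic data every start day recovers nearly the same m and a; on real counts it is the day-to-day noise that makes the estimates swing with the start date, as in the table above.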
Since the number of daily deaths in the UK looks like it's falling, and the second entry in the list has the lowest root adjusted mean square error against the actual data, I expect deaths in the UK to continue to fall. So with some luck I think the UK might narrowly avoid 10,000 deaths. It remains to be seen.
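Comparing candidate fits by their error against the observed totals could look like the following Python sketch. It uses plain root mean square error, since the exact "adjusted" variant isn't specified here, and the candidate parameter sets (m, a, t0) and the synthetic "observed" series are hypothetical.

```python
import math


def logistic(t, m, a, t0):
    """Cumulative deaths a / (1 + exp(-m (t - t0)))."""
    return a / (1 + math.exp(-m * (t - t0)))


def rmse(days, observed, m, a, t0):
    """Root mean square error of a logistic curve against observed cumulative deaths."""
    errors = [(logistic(t, m, a, t0) - y) ** 2 for t, y in zip(days, observed)]
    return math.sqrt(sum(errors) / len(errors))


def best_candidate(days, observed, candidates):
    """Return the (m, a, t0) triple whose curve has the lowest RMSE."""
    return min(candidates, key=lambda c: rmse(days, observed, *c))


# Hypothetical candidates and synthetic "observed" data built from the second one.
candidates = [(0.29, 7500, 30), (0.26, 9600, 32), (0.22, 15500, 35)]
days = list(range(40))
observed = [logistic(t, *candidates[1]) for t in days]

for c in candidates:
    print(c, round(rmse(days, observed, *c), 1))
print("best:", best_candidate(days, observed, candidates))
```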
With US data it might be better to analyse New York separately, as it dominates the figures: you're effectively adding the logistic curve for NY to the logistic curve for the rest of the US, and the two probably have different parameters. The exponent especially can be different.
I've found this source [1], which handily gives time series (in inordinate detail) for the entire world as well as a database of deaths by county in the US. So it should be possible to do the analysis by individual State or county.
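Summing the county-level rows of that data into per-State series might look like this Python sketch. The column name `Province_State` and the M/D/YY date headers are assumptions about the US deaths file's layout and should be checked against the actual CSV; the sample below is made up.

```python
import csv
import io
from collections import defaultdict


def deaths_by_state(csv_text, state_column="Province_State"):
    """Sum county rows of a CSSE-style US deaths CSV into one series per State.

    Date columns are recognised by having two slashes (e.g. "4/6/20"); every
    other column is treated as metadata. Returns (date_columns, {state: series}).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames if c.count("/") == 2]
    totals = defaultdict(lambda: [0] * len(date_cols))
    for row in reader:
        series = totals[row[state_column]]
        for i, col in enumerate(date_cols):
            series[i] += int(float(row[col] or 0))  # blank cells count as 0
    return date_cols, dict(totals)


# Made-up sample in the assumed layout.
sample = """Admin2,Province_State,4/5/20,4/6/20
Kings,New York,100,130
Queens,New York,120,150
Cook,Illinois,30,40
"""
dates, states = deaths_by_state(sample)
print(dates)
print(states)
```

Each State series can then be fed to the same regression separately, which avoids mixing New York's parameters with everyone else's.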
One problem is that New York is losing track of COVID-19 deaths at home, as they are no longer testing people who died at home [2]. This can severely skew the figures.
I'll do the analysis tomorrow.
[1] https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
[2] https://twitter.com/MarkLevineNYC/status/1247155576221716480
@joe-shmo said:
I'm getting behind on the models! I'm having trouble seeing this algebra (currently not sure if I'm losing it, or if you are again doing some algebraic manipulation above my head):
If
g(t) = m f(t) (1 - f(t)/a)
then
g(t)/f(t) = m (1 - f(t)/a)
g(t)/f(t) = m - (m/a) f(t)
How are you getting this?
g(t)/f(t) = m f(t) - m/a
@Deepthought
I'll just assume that was a clerical error.
If I perform the procedure you suggest on the US data (from worldometers), the figures closely resemble one another, with -m/a as the slope of the regression and m as the intercept. For data up to 4/6, your method yields:
m = 0.24534
m/a = 1.2005e-5
a = 20436
Again, a very nice fit for the data. Awesome job, and thanks for figuring this out. It has been a valuable learning experience about the perils of non-linear regression! It seems to be coming up a bit short on projected death totals, though? I obviously haven't done anything with the error yet.
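As a quick numerical check of the corrected line g(t)/f(t) = m - (m/a) f(t) and of a = m/(m/a), this Python sketch differentiates a logistic built from the fitted m and m/a and compares g/f with the line. The inflection day t0 is a hypothetical placeholder; the identity holds for any value of it.

```python
import math


def logistic(t, m, a, t0):
    """Cumulative deaths f(t) = a / (1 + exp(-m (t - t0)))."""
    return a / (1 + math.exp(-m * (t - t0)))


m, m_over_a = 0.24534, 1.2005e-5  # intercept and minus the slope, as quoted above
a = m / m_over_a                  # final death toll implied by the fit
t0 = 0.0                          # hypothetical inflection day; it does not affect the check
h = 1e-4                          # step for the numerical derivative

for t in (-10.0, 0.0, 10.0):
    f = logistic(t, m, a, t0)
    g = (logistic(t + h, m, a, t0) - logistic(t - h, m, a, t0)) / (2 * h)  # f'(t)
    # g/f should match m - (m/a) f, not m f - m/a.
    print(f"t={t:+.0f}: g/f = {g / f:.6f}, m - (m/a) f = {m - m_over_a * f:.6f}")

print(f"a = {a:.0f}")
```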
Replying to @joe-shmo:
The problem is that the error in (m/a) absolutely kills the predictive power of the method. Suppose our estimate for (m/a) is M with an error of Mε, and our estimate for m is C with an error of Cδ. Then we have:
a = m/(m/a) = (C + Cδ)/(M + Mε) = (C/M)(1 + δ)/(1 + ε) = (C/M)(1 + δ)(1 - ε + O(ε²))
δ is coming out at around a couple of percent, which is fine. But ε is big: depending on where I start the regression, the error in (m/a) can be larger than the estimate for (m/a) itself (ε > 1, where the expansion above no longer holds), which means that the error in the estimate for total deaths is arbitrarily large.
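The effect of ε can be seen by bounding a directly rather than through the series expansion. This Python sketch pairs the extreme values of m and m/a; C and M are the US estimates quoted earlier in the thread, while δ = 2% and the list of ε values are hypothetical.

```python
import math


def a_bounds(C, delta, M, eps):
    """Range of a = m/(m/a) when m = C(1 ± delta) and m/a = M(1 ± eps).

    The lowest a pairs the smallest m with the largest m/a, and vice versa.
    Once eps >= 1 the smallest m/a is zero or negative, so a is unbounded above.
    """
    low = C * (1 - delta) / (M * (1 + eps))
    if eps >= 1:
        return low, math.inf
    return low, C * (1 + delta) / (M * (1 - eps))


C, M, delta = 0.24534, 1.2005e-5, 0.02
for eps in (0.1, 0.5, 0.9, 1.2):
    low, high = a_bounds(C, delta, M, eps)
    high_txt = "unbounded" if math.isinf(high) else f"{high:,.0f}"
    print(f"eps = {eps:.1f}: a between {low:,.0f} and {high_txt}")
```

The upper bound grows like 1/(1 - ε), so it is already huge well before ε reaches 1.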
By the way, you can download decent time series data from the link below, which has the benefit that you can also break the analysis up by individual State.
[1] https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
Replying to @deepthought:
Do you have any reason to suspect the model will gain stability once the inflection point has been reached?
Replying to @deepthought:
Using a Med-Med linear regression seems to stabilize both parameters, as opposed to the least-squares best-fit method. Perhaps it gives a better estimate?
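For reference, the standard median-median line can be sketched in Python as below; it could replace the least-squares step in the g/f versus f regression. This is the textbook procedure, not anyone's spreadsheet or VBA implementation, and the sample points are made up.

```python
from statistics import median


def med_med_line(xs, ys):
    """Median-median (resistant) line through the points; returns (slope, intercept).

    Points are sorted by x and split into three groups. With n points,
    n % 3 == 1 puts the extra point in the middle group and n % 3 == 2
    puts one extra point in each outer group.
    """
    pts = sorted(zip(xs, ys))
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    k, r = divmod(n, 3)
    outer = k + (1 if r == 2 else 0)
    groups = (pts[:outer], pts[outer:n - outer], pts[n - outer:])
    # Summary point of each group: median x and median y.
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(p[0] for p in g), median(p[1] for p in g)) for g in groups
    ]
    slope = (y3 - y1) / (x3 - x1)
    # Average the intercepts implied by the three summary points.
    intercept = ((y1 - slope * x1) + (y2 - slope * x2) + (y3 - slope * x3)) / 3
    return slope, intercept


xs = list(range(9))
ys = [2 * x + 1 for x in xs]
print(med_med_line(xs, ys))
ys[-1] = 1000  # a wild data point
print(med_med_line(xs, ys))
```

Replacing the largest y value with 1000 leaves the med-med line unchanged here, whereas a least-squares line would swing sharply; that resistance to outliers is a plausible reason it stabilises the parameters.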
Replying to @joe-shmo:
Yes, once a country is past the inflection point, predictions become much more reliable. I did predictions for the entire world yesterday. The model correctly predicts the number of deaths in China. It gets the same number of deaths as IHME for Italy, and now for the US as well with the "by county" calculation; using data for the US as a whole, I get 40,000. I think the State level is probably best for making predictions for the US. I'm getting 14,500 for the UK. I think IHME penalize incomplete lockdowns too much; they might also be assuming extra deaths from running out of hospital beds, and the NHS hasn't run out of beds. The method is pretty poor early on in the curve, but an exponential estimator is better there, at least for "m". The comedy result was Cuba: I had an average of 64 quadrillion deaths, but a median of 23 deaths. So I need some way of eliminating obviously stupid data points from the averaging.
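One simple way to stop a single absurd estimate from wrecking an average is to drop values far from the median before averaging. This Python sketch uses the median absolute deviation (MAD); the cut-off of 3 MADs and the sample estimates are assumptions for illustration.

```python
import math
from statistics import median


def robust_average(estimates, k=3.0):
    """Average a list of estimates after discarding outliers.

    Non-finite and non-positive values are dropped first; then anything more
    than k median absolute deviations (MADs) from the median is discarded.
    """
    vals = [v for v in estimates if math.isfinite(v) and v > 0]
    if not vals:
        raise ValueError("no usable estimates")
    med = median(vals)
    mad = median(abs(v - med) for v in vals)
    if mad == 0:
        return med
    kept = [v for v in vals if abs(v - med) <= k * mad]
    return sum(kept) / len(kept)


# Hypothetical per-start-date estimates of total deaths, one of them absurd.
estimates = [20, 23, 25, 22, 6.4e16]
print(sum(estimates) / len(estimates))  # plain mean, dominated by the outlier
print(robust_average(estimates))
```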
I'm having problems with the date of the point of inflection for some reason, although that may just be me goofing up: I have to enter a formula and then copy and paste it over an entire page of the spreadsheet, and it's easy to forget a $ sign and reference data from the wrong place.
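Given m and a from the regression, the inflection date can also be estimated point by point. Writing the logistic as f(t) = a/(1 + e^(-m(t - t0))), where t0 is the inflection day, gives a/f - 1 = e^(-m(t - t0)), so t0 = t + ln(a/f - 1)/m for every day with 0 < f < a. This Python sketch takes the median of those per-day estimates; it is an alternative to the spreadsheet formula, not a reconstruction of it, and the synthetic data are hypothetical.

```python
import math
from statistics import median


def inflection_day(days, cumulative, m, a):
    """Median of per-day estimates t0 = t + ln(a/f - 1) / m.

    Assumes f(t) = a / (1 + exp(-m (t - t0))); days with f <= 0 or f >= a
    have no estimate and are skipped.
    """
    estimates = [
        t + math.log(a / f - 1) / m
        for t, f in zip(days, cumulative)
        if 0 < f < a
    ]
    return median(estimates)


# Synthetic check with hypothetical parameters: m = 0.2, a = 10,000, t0 = 40.
days = list(range(61))
series = [10_000 / (1 + math.exp(-0.2 * (t - 40))) for t in days]
print(inflection_day(days, series, m=0.2, a=10_000))
```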
I've been playing a little with data smoothing. Instead of using daily deaths as the derivative, we can do a three-point linear regression to get an estimate of the derivative. That seems to produce slightly more believable answers. Early in the data in particular, some way of smoothing out fluctuations and delays in recording deaths will help a lot, especially if the method tends to reduce errors rather than magnify them.
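The three-point regression has a neat closed form: for equally spaced days t-1, t, t+1, the least-squares slope reduces to (y[t+1] - y[t-1])/2, a central difference. This Python sketch generalises to any odd window width (the wider windows are an assumption beyond the three points mentioned); the first and last width // 2 days of the series get no estimate, and the sample data are hypothetical.

```python
def window_slopes(values, width=3):
    """Least-squares slope through each point and its neighbours (odd width).

    With offsets centred on zero, the slope is sum(o * y) / sum(o * o);
    for width 3 this is (y[i+1] - y[i-1]) / 2. The first and last width // 2
    points have no full window and are skipped.
    """
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd number >= 3")
    half = width // 2
    offsets = range(-half, half + 1)
    denom = sum(o * o for o in offsets)
    return [
        sum(o * values[i + o] for o in offsets) / denom
        for i in range(half, len(values) - half)
    ]


# Hypothetical cumulative deaths; the slopes estimate daily deaths.
cumulative = [0, 1, 4, 9, 16, 25, 36]
print(window_slopes(cumulative))           # three-point slopes
print(window_slopes(cumulative, width=5))  # wider smoothing window
```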
I'll put the med-med regression you mentioned on the list of things to look at.
Replying to @deepthought:
I was thinking that would be the case. Yeah, it seems the data does need to be sanded down a bit... The Med-Med does that for the linear regression (I wrote a VBA program to generate the coefficients for it yesterday; it's a bit of a process). So, other than the specifics of the process, it sounds like we're on the same page on that.
Replying to @joe-shmo:
Regarding the clerical error: yes, I goofed up typing the equation in.
Replying to @joe-shmo:
One interesting thing: IHME are not doing regression with a logistic function; they're doing regression using the error function (the integral of exp[-x²]), as they claim it fits the data better [1]. We could look at whether this is possible on a spreadsheet, but I'd rather see if smoothing the data makes the predictions better.
[1] https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1
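Fitting an error-function curve on a spreadsheet looks possible if the final toll is handled by a scan. Assuming the cumulative curve has the form f(t) = (p/2)(1 + erf(α(t - β))), which is one reading of an ERF-type model and not IHME's actual code, fixing p makes the problem linear: f/p = Φ(√2 α(t - β)) for the standard normal CDF Φ, so Φ⁻¹(f/p)/√2 is a straight line in t with slope α. This Python sketch fits that line by least squares for each candidate p and keeps the p with the smallest squared error; the candidate values and synthetic data are hypothetical.

```python
import math
from statistics import NormalDist


def erf_curve(t, p, alpha, beta):
    """Cumulative deaths p/2 * (1 + erf(alpha (t - beta)))."""
    return p / 2 * (1 + math.erf(alpha * (t - beta)))


def fit_erf(days, cumulative, p):
    """For a fixed final toll p, fit alpha and beta by least squares.

    Since erf(x) = 2*Phi(x*sqrt(2)) - 1, f/p = Phi(sqrt(2)*alpha*(t - beta)),
    so z = Phi^-1(f/p) / sqrt(2) = alpha*t - alpha*beta is linear in t.
    """
    inv = NormalDist().inv_cdf
    ts, zs = [], []
    for t, f in zip(days, cumulative):
        if 0 < f < p:  # inv_cdf needs 0 < f/p < 1
            ts.append(t)
            zs.append(inv(f / p) / math.sqrt(2))
    n = len(ts)
    mt, mz = sum(ts) / n, sum(zs) / n
    alpha = sum((t - mt) * (z - mz) for t, z in zip(ts, zs)) / sum(
        (t - mt) ** 2 for t in ts
    )
    beta = mt - mz / alpha  # intercept is -alpha*beta
    return alpha, beta


def best_erf_fit(days, cumulative, candidate_ps):
    """Scan candidate p values; keep the one whose curve has the smallest squared error."""
    best = None
    for p in candidate_ps:
        alpha, beta = fit_erf(days, cumulative, p)
        sse = sum((erf_curve(t, p, alpha, beta) - f) ** 2 for t, f in zip(days, cumulative))
        if best is None or sse < best[0]:
            best = (sse, p, alpha, beta)
    return best[1:]


# Synthetic data with hypothetical parameters p = 30,000, alpha = 0.08, beta = 40.
days = list(range(61))
data = [erf_curve(t, 30_000, 0.08, 40) for t in days]
print(best_erf_fit(days, data, [20_000, 25_000, 30_000, 35_000]))
```

On a spreadsheet the same steps map onto NORM.S.INV for Φ⁻¹ and a linear fit over a column of trial p values.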
Replying to @deepthought:
As a purely academic exercise, I would like to see what the methodology might be for setting up that regression based on the error function (just to understand what they were grappling with). Surely it has to be a more complicated procedure?
On a side note: neither approach has been very effective at making accurate predictions for any sizable future date. The IHME model is constantly being downgraded; I think its current projection has moved well outside the (very large) initial error bars in the model. I personally think that says something negative about what the experts actually know about the spread of an epidemic and the factors at play. They will undoubtedly have better methods and understanding in the future, but I'm sure there will be some controversy over it (and rightfully so) when this subsides.