@joe-shmo said
That's not really helpful, because that is not the difficult part. The difficulty arises in trying to determine the parameters: what are they, how can they be formed, etc.?
If we say that our Cumulative Deaths is f(x), then we have to solve
f(x) = Int ( e^(-x^2) ) for some group of parameters that fully manipulates e^(-x^2)
The result of that should be some non- ...[text shortened]... g some "x" into the integral and evaluating. The parameters will affect the value of that integral.

OK, suppose daily deaths is f(x) = A exp[-m(x - a)²], where A is the maximum daily deaths, x counts days since whatever date you want to count from, and a is the date of turnaround. We can take the log to get:
ln (f(x)) = ln(A) - m (x - a)²
Differentiating (we can do this on a spreadsheet by working out running differences, three-point gradients, or however) we get:
f'(x)/f(x) = -2m (x - a)
So we can do linear regression to get the date of turnaround and the constant m. I think we'll get pretty severe errors estimating A though. Of course, if we know m and the turnaround date pretty precisely, then we can get A via the error function and cumulative deaths so far, rather than trying to exponentiate some quantity we've got by averaging over daily deaths.
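The log-derivative regression described above can be sketched in a few lines of Python. This is only an illustration on synthetic data: the function name `fit_turnaround` and the test curve (A = 2000, m = 0.006, a = 44) are assumptions made for the example, not values from the thread.

```python
import numpy as np

def fit_turnaround(daily):
    """Estimate m and a from the slope of ln(daily deaths).

    Assumes daily[i] is the count on day i and that every count is positive.
    """
    x = np.arange(len(daily), dtype=float)
    log_f = np.log(daily)
    # np.gradient uses three-point (central) differences at interior points
    slope = np.gradient(log_f, x)
    # Drop the two end points, where only one-sided differences are available
    k, c = np.polyfit(x[1:-1], slope[1:-1], 1)
    # slope = -2m*x + 2m*a, so k = -2m and c = 2m*a
    m = -k / 2.0
    a = c / (2.0 * m)
    return m, a

# Synthetic check (hypothetical parameters, not real data)
x = np.arange(60, dtype=float)
daily = 2000.0 * np.exp(-0.006 * (x - 44.0) ** 2)
m_est, a_est = fit_turnaround(daily)
print(m_est, a_est)
```

On real counts the differences are noisy, and any day with zero deaths has to be excluded before taking the log.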
@deepthought
And there it is. The way forward. I suspected the natural log (which is fairly straightforward thinking) would be pretty useful here.
So if we plot Y = ln( f(x) ) vs x, we expect a quadratic:
Y = b*x² + c*x + d
b = -m
c = 2am
d = ln ( A ) - ma²
When x = 0
Y(0) = d = ln ( A ) - ma²
A = e^(d + ma²)
You expect this to have large error due to potentially large errors in "m" and "a", compounded by the exponentiation. Am I on the right track? Fixing "a" at the turnaround date we now know should help, but I agree the estimate is potentially bad early on, without the hindsight we currently have.
So how would it be better to use the Error Function to estimate A for the early data? It still seems like there is going to be the potential for substantial error there.
@joe-shmo
What I'm thinking is that if we try estimating ln(A) from the linear regression it'll have some error, and whatever that error is will be magnified when we exponentiate it. A should be an overall multiplier for both daily deaths and cumulative deaths. So if we get A by matching the Gaussian to daily deaths, or the error function to cumulative deaths, that'll work better. My preference is to use the error function, since cumulative deaths is sort of naturally smoothed, being the sum of daily deaths.
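A minimal sketch of the error-function route to A, assuming m and a have already been estimated and that the curve is integrated from minus infinity (so deaths before day 0 are treated as negligible). The helper names `unit_cumulative` and `amplitude_from_cumulative` are made up for this example.

```python
import math

def unit_cumulative(t, m, a):
    """Integral of exp(-m*(x - a)**2) from -infinity to t (i.e. with A = 1)."""
    return 0.5 * math.sqrt(math.pi / m) * (1.0 + math.erf(math.sqrt(m) * (t - a)))

def amplitude_from_cumulative(cum_deaths, t, m, a):
    """Scale factor A that makes the model's cumulative total match the data at day t."""
    return cum_deaths / unit_cumulative(t, m, a)

# Round-trip check with hypothetical parameters
true_A, m, a = 2000.0, 0.006, 44.0
cum_at_40 = true_A * unit_cumulative(40.0, m, a)
A_est = amplitude_from_cumulative(cum_at_40, 40.0, m, a)
print(A_est)
```

Because A enters only as a multiplier, no exponentiation of a fitted intercept is needed here; the remaining error comes from m, a, and the cumulative count itself.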
@deepthought
I performed the analysis:
Let the deaths per day be g(x) ( to keep consistent with prior notations used)
g(x) = A*e^(-m*( x-a )² )
ln( g(x) ) = ln(A) -m*( x -a )²
ln( g(x) ) = -m*x² + 2*m*a*x +[ ln(A) - m*a² ]
So we should expect a plot of Y = ln( g(x) ) vs x to yield a quadratic:
Y = b*x² + c*x + d
When I perform the plot, it is indeed a very nice fit for the data (x = 25 to 41, using the 1000-death threshold). I get the following:
b = -0.0059983 , c = 0.5322812
So:
m = -b = 0.0059983
2*m*a = c ==> a = c/(2*m) = 44.3693
Coefficient of determination:
R² = 0.984
To get "A" I evaluated the function at x = 25, g(25) = 1028
ln ( g(25) ) = 5.5513429 = ln(A) - m*a² + 2*m*a*25 - m*25²
A = 2354
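The quadratic fit of ln(g(x)) can be reproduced with `numpy.polyfit`. The sketch below runs on synthetic data, since the thread's raw daily counts are not given; the function name and the test parameters are assumptions for illustration only.

```python
import numpy as np

def fit_gaussian_log_quadratic(x, daily):
    """Fit ln(daily) = b*x**2 + c*x + d and convert to (m, a, A)."""
    b, c, d = np.polyfit(x, np.log(daily), 2)
    m = -b                      # b = -m
    a = c / (2.0 * m)           # c = 2*m*a
    A = np.exp(d + m * a ** 2)  # d = ln(A) - m*a**2
    return m, a, A

# Synthetic check over the same window as the post (x = 25 to 41)
x = np.arange(25, 42, dtype=float)
daily = 2354.0 * np.exp(-0.006 * (x - 44.4) ** 2)
m_est, a_est, A_est = fit_gaussian_log_quadratic(x, daily)
print(m_est, a_est, A_est)
```

With noise-free input the parameters come back essentially exactly; with real data the estimate of A is the most fragile of the three, because it depends on the exponential of d + m*a².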
The last thing to do is numerically integrate g(x) from 0 to t to get the cumulative deaths f(t).
As Eladar said, that part is handled easily by a graphing calculator.
For 3 days into the future:
f(45) = Int[ g(x) dx ] from 0 to 45 = 28,761
For the effective end:
f(100) = 53,872
That is currently aligning pretty well with projections.
Thanks for the derivation @Deepthought!
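For readers without a graphing calculator, the integration step can be sketched in Python. The block compares a simple trapezoid rule with the closed form via the error function, using the parameters quoted in the post (A = 2354, a = 44.3693) and m = 0.0059983, i.e. -b. The function names and the grid size are choices made for this example.

```python
import math
import numpy as np

def daily(x, A, m, a):
    """Gaussian model for deaths per day."""
    return A * np.exp(-m * (x - a) ** 2)

def cumulative_numeric(t, A, m, a, n=10001):
    """Trapezoid-rule integral of daily() from 0 to t."""
    x = np.linspace(0.0, t, n)
    y = daily(x, A, m, a)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def cumulative_erf(t, A, m, a):
    """Closed-form integral of daily() from 0 to t using the error function."""
    s = math.sqrt(m)
    scale = 0.5 * A * math.sqrt(math.pi / m)
    return scale * (math.erf(s * (t - a)) - math.erf(s * (0.0 - a)))

A, m, a = 2354.0, 0.0059983, 44.3693
total_numeric = cumulative_numeric(100.0, A, m, a)
total_closed = cumulative_erf(100.0, A, m, a)
print(total_numeric, total_closed)
```

Both routes give roughly 53,870 for t = 100, in line with the f(100) figure quoted in the post.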
@joe-shmo
I forgot to mention the "d". What I did was a bit of sleight of hand.
Y = b*x² + c*x + d
Strictly from the regression the value of d:
d = -4.1340986
I had reservations about using it because normally to solve for d ( the y intercept ) we let x = 0.
That's all fine as far as the quadratic is concerned:
Y(0) = d = ln(A) - m*a²
But then I have to solve for ln(A), and as far as the logarithm is concerned, x = 0 is outside the allowed domain:
Y(0) = ln ( g(0) ) = ln(0) = - ∞
We know that g(0) = 0 in the data, so ln( g(0) ) is undefined there. If you just ignore that and take d straight from the regression:
d = ln(A) - m*a²
A = 2152
So what I did in the earlier post was reset the Y intercept using a piece of data (x = 25). I think I got lucky because the R² for the fit was very high.
This model using the intercept as is from the regression:
f( 45) = 26,305
f(100) = 49,272
So it seems like a more closely aligned model with IHME earlier on, but seems to be losing some deaths in the distant future compared to IHME.
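The arithmetic behind the regression-intercept version can be checked directly from the three coefficients quoted in the thread (b, c, d). This is a sketch of that check only; the variable names are chosen for the example, and the total is taken over the whole Gaussian rather than from day 0 to day 100.

```python
import math

# Regression coefficients as quoted in the thread
b, c, d = -0.0059983, 0.5322812, -4.1340986

m = -b                        # b = -m
a = c / (2.0 * m)             # c = 2*m*a
A = math.exp(d + m * a ** 2)  # d = ln(A) - m*a**2

# Area under the whole Gaussian, A * sqrt(pi / m)
total = A * math.sqrt(math.pi / m)
print(a, A, total)
```

This gives a of about 44.37 and A of about 2152, matching the values in the posts, and a total a little under 49,300.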
@joe-shmo
Their death per day model does not appear very symmetrical. It seems steeper on the way up, almost linear on the way down.
Their model projects twice as many deaths after the max death rate date as before. We are on the decline, but we will experience twice as many deaths on the way down.
@eladar
Yeah, I can see there must be some kind of skew applied to the distribution. As far as I'm concerned I'm pretty much done tinkering around with different fits. I've been playing around with plotting different ranges of the data, and one thing I'm now certain of is that all the models seem to poorly reflect future values when we are early in the curve.
@Eladar
Well, thanks. It's been a fun, frustrating, but ultimately valuable learning experience. The methodology that gets us in the ballpark is obviously owed to @Deepthought. I'm just a self-appointed stooge with an obsessive personality disorder.
I will say this about @Ponderable's exponential model: at this stage in the epidemic it's not very accurate, so his real error was in his confidence about the domain over which it would hold. He had a valid line of reasoning, I believe, but it has now been shown to be an overreach. On the other hand, it's taken until the middle of the epidemic for us to really get in the ballpark with more sophisticated models. It's not clear (at least to me) that what we are doing is that much better for the beginning of the data set, when his predictions were made. Our models, however, would have the advantage of containing key features like inflection points and maximum deaths. Those features are obviously not going to show up in a reduced model like Ponderable's.
@eladar said
Knowing the patterns of past pandemics, flu outbreaks as well as China's timeline was information we did not have, but the experts did.
By the way, if drops continue as they are, the experts will need to drop their predictions again.

IHME hasn't updated their model with physical information since April 9th. I suspect you are correct.
@joe-shmo
I implemented the IHME method for the UK and get vastly different answers depending on where I start the linear regression. I had the entire population of the UK dying if I included the earlier data. My conclusion from this is that it's difficult to get total deaths from curve fitting. Using the logistic curve, my best fit gives about 18,000 deaths. Fitting the data by eye, I got 33,000 UK deaths using the error function in the way IHME did; using an automatic averaging system that excludes obviously ridiculous answers, I get 22,250 deaths. But I don't think their function fits UK data well: the overall answers are reasonable, but the results bunch too much.
What I think is going on is that their model was matched to Chinese information first. China is a huge country and similar to the US in the sense that there are mega-cities with a lot of space between them. Most European countries are much smaller and have one or two mega-cities with a relatively high population density between them, so a method tuned to China or the US will not work well in the UK.
I think the IHME figures for the UK are overly pessimistic and my result of 18,000 is a little optimistic. The Imperial group's prediction of 20,000 deaths is looking pretty good to me, but it could be higher. The top of my 95% plausibility interval was 25,000, but I can get any result you want depending on which data I include in the linear regression.
@ponderable said
Taking the data from
https://www.worldometers.info/coronavirus/country/us/
which report 256 dead as of yesterday (and who give sources) and their daily number starting with one dead at 29th of February I can model the curve death over day using the exponential curve
1.61*exp(0.24*d) with a r-value of 0.97 which is reasonable.
If I calculate to day 42 (which is three we ...[text shortened]... Worldometers.info just changed to 276 death today it should be around 316 if the curve was correct.

So time has fled.
The very trivial first shot I did here proved to overestimate after day 34.
The overestimation over 21 days was about 50%, which is okay for a very lean data fit of an exponential curve.
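For reference, the quoted exponential fit 1.61*exp(0.24*d) is easy to evaluate. The sketch below assumes d counts days since 29 February, which is how the quoted post appears to define it; the function name and the choice of day 21 are made for this example.

```python
import math

def exponential_model(day, scale=1.61, rate=0.24):
    """Cumulative deaths predicted by the quoted fit 1.61*exp(0.24*d)."""
    return scale * math.exp(rate * day)

# Doubling time implied by the growth rate, in days
doubling_time = math.log(2.0) / 0.24

day21 = exponential_model(21)
print(day21, doubling_time)
```

The fit gives about 249 at day 21 and doubles roughly every 2.9 days. A pure exponential has no inflection point, which is why it overshoots once the real curve starts to bend.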