Corona Virus and Exponential Growth


@joe-shmo said
That's not really helpful, because that is not the difficult part. The difficulty arises in trying to determine the parameters, what are they, how can they be formed, etc...?

If we say that our Cumulative Deaths is f(x), then we have to solve

f(x) = Int ( e^(-x^2) ) for some group of parameters that fully manipulates e^(-x^2)

The result of that should be some non- ...[text shortened]... g some "x" into the integral and evaluating. The parameters will affect the value of that integral.
OK, suppose daily deaths is f(x) = A exp[-m(x - a)²], where A is the maximum daily deaths, x counts days since whatever date you want to count from, and a is the date of turnaround. We can take the log to get:

ln (f(x)) = ln(A) - m (x - a)²

Differentiating (we can do this on a spreadsheet by working out running differences or three-point gradients or however) we get:

f'(x)/f(x) = -2m (x - a)

So we can do linear regression to get the date of turnaround and the constant m. I think we'll get pretty severe errors estimating A though. Of course, if we know m and the turnaround date pretty precisely then we can get A via the error function and cumulative deaths so far rather than trying to exponentiate some quantity we've got by averaging over daily deaths.
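
A minimal sketch of that regression, assuming the Gaussian form f(x) = A exp[-m(x - a)²], for which f'(x)/f(x) = -2m(x - a): a straight-line fit of f'/f against x then has slope -2m and intercept 2ma. The function name `fit_turnaround` and the test parameters are made up for illustration, and the data is synthetic and noise-free, so this only shows the mechanics.

```python
import math

def fit_turnaround(days, daily):
    """Estimate m and a from f'(x)/f(x) = -2m(x - a) by least squares."""
    # Three-point (central) gradient of f, divided by f, at interior points.
    xs, ys = [], []
    for i in range(1, len(days) - 1):
        grad = (daily[i + 1] - daily[i - 1]) / (days[i + 1] - days[i - 1])
        xs.append(days[i])
        ys.append(grad / daily[i])
    # Ordinary least squares for ys = slope * xs + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    m = -slope / 2.0          # slope = -2m
    a = -intercept / slope    # intercept = 2ma
    return m, a

# Synthetic check with hypothetical parameters (not real data).
TRUE_A, TRUE_M, TRUE_TURN = 1000.0, 0.006, 44.0
days = list(range(20, 41))
daily = [TRUE_A * math.exp(-TRUE_M * (x - TRUE_TURN) ** 2) for x in days]
m_est, a_est = fit_turnaround(days, daily)
print(m_est, a_est)
```

The finite-difference gradient is only approximate, so even on clean data the recovered m and a are slightly off; with real, noisy daily counts the scatter in f'/f would be much larger.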


@deepthought said
OK, suppose daily deaths is f(x) = A exp[-m(x - a)²], where A is the maximum daily deaths, x counts days since whatever date you want to count from, and a is the date of turnaround. We can take the log to get:

ln (f(x)) = ln(A) - m (x - a)²

Differentiating (we can do this on a spreadsheet by working out running differences or three point gradients or however) we get ...[text shortened]... hs so far rather than trying to exponentiate some quantity we've got by averaging over daily deaths.
And there it is. The way forward. I suspected the natural log (which is fairly straightforward thinking) would be pretty useful here.

So if we plot Y = ln( f(x) ) vs x, we expect a quadratic:

Y = b*x² + c*x + d

b = -m
c = 2am
d = ln ( A ) - ma²

When x = 0 (taking Y(0) = 0):

ln ( A ) - ma² = 0

A = e^(ma²)

You expect this to have large error because of the potentially large errors in "m" and "a", compounded by the exponentiation. Am I on the right track? Forcing "a" to be where we now know it is should help, but I agree it is potentially bad early on, without the knowledge of the future that we currently have.

So how would using the error function be a better way to estimate A from the early data? It still seems like there is potential for substantial error there.


@joe-shmo said
And there it is. The way forward. I suspected the natural log (which is fairly straightforward thinking) would be pretty useful here.

So if we plot Y = ln( f(x) ) vs x, we expect a quadratic:

Y = b*x² + c*x + d

b = -m
c = 2am
d = ln ( A ) - ma²

When x = 0 (taking Y(0) = 0):

ln ( A ) - ma² = 0

A = e^(ma²)

You expect this to have large error due to potentially larg ...[text shortened]... the early data? It still seems like there is going to be the potential for substantial error there?
What I'm thinking is that if we try estimating ln(A) from the linear regression it'll have some error, and whatever that error is will be magnified when we exponentiate it. A should be an overall multiplier for both daily deaths and cumulative deaths. So if we get A by matching the Gaussian to daily deaths, or the error function to cumulative deaths, that'll work better. My preference is to use the error function, since cumulative deaths is sort of naturally smoothed, being the sum of daily deaths.
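
A minimal sketch of the error-function route, assuming m and a are already known: since the integral of exp[-m(x - a)²] from -∞ to t is (1/2)·sqrt(π/m)·[1 + erf(sqrt(m)·(t - a))], A is just cumulative deaths divided by that area. The helper names and the test values are hypothetical, and the "data" is a synthetic curve summed in small steps.

```python
import math

def gaussian_area_to(t, m, a):
    """Integral of exp(-m*(x - a)**2) from -infinity to t."""
    return 0.5 * math.sqrt(math.pi / m) * (1.0 + math.erf(math.sqrt(m) * (t - a)))

def amplitude_from_cumulative(cumulative, t, m, a):
    """Solve cumulative = A * gaussian_area_to(t, m, a) for A."""
    return cumulative / gaussian_area_to(t, m, a)

# Synthetic check with made-up values (not real data): build cumulative
# deaths by summing a known curve in small steps, then recover A from it.
TRUE_A, M, TURN = 1000.0, 0.006, 44.0
STEP = 0.01
cumulative_40 = sum(
    TRUE_A * math.exp(-M * (-100.0 + (k + 0.5) * STEP - TURN) ** 2) * STEP
    for k in range(14000)          # midpoint rule from x = -100 to x = 40
)
amp_recovered = amplitude_from_cumulative(cumulative_40, 40.0, M, TURN)
print(round(amp_recovered, 3))
```

No exponentiation of a fitted intercept is involved here, so an error in the inputs enters A roughly proportionally rather than through exp().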


@deepthought said
What I'm thinking is that if we try estimating ln(A) from the linear regression it'll have some error, but whatever the error is in that will be magnified when we exponentiate it. A should be an overall multiplier for both daily deaths and cumulative deaths. So if we use, respectively the gaussian or error function to get A that'll work better. My preference is to use ...[text shortened]... error function since cumulative deaths is sort of naturally smoothed being the sum of daily deaths.
I performed the analysis:

Let the deaths per day be g(x) ( to keep consistent with prior notations used)

g(x) = A*e^(-m*( x-a )² )

ln( g(x) ) = ln(A) -m*( x -a )²

ln( g(x) ) = -m*x² + 2*m*a*x +[ ln(A) - m*a² ]

So we should expect a plot of Y = ln( g(x) ) vs x to yield a quadratic:

Y = b*x² + c*x + d

When I plot it, the quadratic is indeed a very nice fit for the data (x = 25 to 41, using the 1000-death threshold). I get the following:

b = -0.0059983 , c = 0.5322812

So:

m = -b = 0.0059983

2*m*a = c ==> a = c/(2*m) = 44.3693

Correlation Coefficient:
R² = 0.984

To get "A" I evaluated the function at x = 25, g(25) = 1028

ln ( g(25) ) = 5.5513429 = ln(A) - m*a² + 2*m*a*25 - m*25²

A = 2354

The last thing to do is numerically integrate g(x) to get f(x) from 0 to t.

As Eladar said, that part is handled easily by a graphing calculator.

for 3 days into the future:

Int[g(x)dx] = f(x) = f(45) = 28,761

For the effective end

f(100) = 53,872

That is currently aligning pretty well with projections.

Thanks for the derivation @Deepthought!
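
As a cross-check on the numerical integration, here is a minimal sketch that plugs the regression values quoted above (b, c, and A = 2354) into the closed-form integral of the Gaussian, using the error function instead of a graphing calculator. The variable and function names are illustrative only.

```python
import math

# Regression coefficients and amplitude as reported in the post above.
B, C = -0.0059983, 0.5322812
A = 2354.0

m = -B                 # from ln g = -m*x^2 + 2*m*a*x + [ln(A) - m*a^2]
a = C / (2.0 * m)      # turnaround day

def cumulative(t):
    """Closed-form integral of A*exp(-m*(x - a)**2) from x = 0 to x = t."""
    root = math.sqrt(m)
    scale = 0.5 * A * math.sqrt(math.pi / m)
    return scale * (math.erf(root * (t - a)) - math.erf(root * (0.0 - a)))

print(round(a, 4))               # turnaround day
print(round(cumulative(100.0)))  # effective end of the curve
```

With these inputs the turnaround day and the value at x = 100 come out close to the a and f(100) figures quoted above; `cumulative(t)` can be evaluated at any other day in the same way.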


@joe-shmo said
I performed the analysis:

Let the deaths per day be g(x) ( to keep consistent with prior notations used)

g(x) = A*e^(-m*( x-a )² )

ln( g(x) ) = ln(A) -m*( x -a )²

ln( g(x) ) = -m*x² + 2*m*a*x +[ ln(A) - m*a² ]

So we should expect a plot of Y = ln( g(x) ) vs x to yield a quadratic:

Y = b*x² + c*x + d

When I perform the plot it is indeed a very nice fit ...[text shortened]...
That is currently aligning pretty well with projections.

Thanks for the derivation @Deepthought!
I forgot to mention the "d". What I did was a bit of sleight of hand.

Y = b*x² + c*x + d

Strictly from the regression the value of d:

d = -4.1340986

I had reservations about using it because normally to solve for d ( the y intercept ) we let x = 0.

That's all fine as far as the quadratic is concerned:

Y(0) = d = ln(A) - m*a²

But then I have to solve for ln(A), and as far as the logarithm is concerned x = 0 is outside the allowed domain:

Y(0) = ln ( g(0) ) = ln(0) = - ∞

We know that g(0) = 0, but if you just ignore that and take d straight from the regression:

-4.1340986 = ln(A) - m*a²

A = 2152

So what I did was reset the Y intercept using a piece of data. I think I got lucky because the R^2 for the fit was very high.

Using the intercept as-is from the regression, this model gives:

f( 45) = 26,305
f(100) = 49,272

So it seems like a more closely aligned model with IHME earlier on, but seems to be losing some deaths in the distant future compared to IHME.
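
The intercept route can be sketched in a few lines, assuming the relation d = ln(A) - m*a² from the quadratic above, so that A = exp(d + m*a²). The coefficients are the regression values quoted in the thread; the helper `relative_error_in_a` is a hypothetical addition that shows how an error in ln(A) is magnified by exponentiation.

```python
import math

# Regression values quoted in the thread.
B, C, D = -0.0059983, 0.5322812, -4.1340986

m = -B
a = C / (2.0 * m)
amplitude = math.exp(D + m * a * a)          # from d = ln(A) - m*a^2
total = amplitude * math.sqrt(math.pi / m)   # area under the whole Gaussian

def relative_error_in_a(delta_ln_a):
    """Relative error in A caused by an absolute error in ln(A)."""
    return math.exp(delta_ln_a) - 1.0

print(round(amplitude), round(total))
print(relative_error_in_a(0.1))
```

Because m*a² is close to 12 here, a small relative error in m or a shifts ln(A) noticeably, and an error of 0.1 in ln(A) already moves A by a little over 10%.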


@joe-shmo

Their death per day model does not appear very symmetrical. It seems steeper on the way up, almost linear on the way down.

Their model projects twice as many deaths after the max death rate date as before. We are on the decline, but we will experience twice as many deaths on the way down.


@eladar said
@joe-shmo

Their death per day model does not appear very symmetrical. It seems steeper on the way up, almost linear on the way down.

Their model projects twice as many deaths after the max death rate date as before. We are on the decline, but we will experience twice as many deaths on the way down.
Yeah, I can see there must be some kind of skew applied to the distribution. As far as I'm concerned I'm pretty much done tinkering around with different fits. I've been playing around with plotting different ranges of the data, and one thing I'm now certain of is all the models seem to poorly reflect future values when we are early in the curve.


@joe-shmo

That is why part of the stats curriculum I teach concerning data and models is that the model is only valid for the original data's domain.


@joe-shmo

You and deep thought have done a great job of following and projecting this thing. I was a bit disappointed when ponderable seemed to fall off the face of the earth.

I appreciate the time and energy that both of you put into this thing.


@Eladar

Well, thanks. It's been a fun, frustrating, but ultimately valuable learning experience. The methodology that gets us in the ballpark is obviously owed to @Deepthought. I'm just a self-appointed stooge with an obsessive personality disorder.

I will say this about @Ponderable's exponential model: at this stage in the epidemic it's not very accurate, so his real error was in his confidence about the domain over which it would hold. He had a valid line of reasoning, I believe, but it has now been shown to be an overreach. On the other hand, it's taken until the middle of the epidemic for us to really get in the ballpark with more sophisticated models. It's not clear (at least to me) that what we are doing is that much better for the beginning of the data set, when his predictions were made. Our models, however, would have the advantage of containing key features like inflection points and maximum deaths. Those features are obviously not going to show up in a reduced model like Ponderable's.


Knowing the patterns of past pandemics, flu outbreaks as well as China's timeline was information we did not have, but the experts did.

By the way, if drops continue as they are, the experts will need to drop their predictions again.


@eladar said
Knowing the patterns of past pandemics, flu outbreaks as well as China's timeline was information we did not have, but the experts did.

By the way, if drops continue as they are, the experts will need to drop their predictions again.
IHME hasn't updated their model with physical information since April 9th. I suspect you are correct.


Also, the logistic model and the Gaussian appear to be converging to a central value. Our logistic model has been climbing, and the Gaussian has been dropping. With the latest data:

L ( ∞ ) ≈ 39,400

and

G ( ∞ ) ≈ 41,500

Qualitatively, the gap between them has been closing as new data is added.


@joe-shmo said
@Eladar

Well, thanks. It's been a fun, frustrating, but ultimately valuable learning experience. The methodology that gets us in the ballpark is obviously owed to @Deepthought. I'm just a self-appointed stooge with an obsessive personality disorder.

I will say this about @Ponderable's exponential model: at this stage in the epidemic it's not very accurate, so his real ...[text shortened]... imum deaths. Those features are obviously not going to show up in a reduced model like Ponderable's.
I implemented the IHME method for the UK and get vastly different answers depending on where I start the linear regression. I had the entire population of the UK dying if I included the earlier data. My conclusion from this is that it's difficult to get total deaths from curve fitting. Using the logistic curve, my best fit gives about 18,000 deaths. By fitting data by eye I got 33,000 UK deaths using the error function in the way IHME did; using an automatic averaging system that excludes obviously ridiculous answers I get 22,250 deaths. But I don't think their function fits UK data well: the overall answer's reasonable but the results bunch too much.

What I think is going on is that their model was matched to Chinese information first. China is a huge country and similar to the US in the sense that there are mega-cities with a lot of space between them. Most European countries are much smaller and have one or two mega-cities with a relatively high population density between them, so a method tuned to China or the US will not work well in the UK.

I think the IHME figures for the UK are overly pessimistic and my result of 18,000 is a little optimistic. The Imperial group's prediction of 20,000 deaths is looking pretty good to me, but it could be higher. The top of my 95% plausibility interval was 25,000, but I can get any result you want depending on which data I include in the linear regression.
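
The sensitivity to the regression window can be illustrated with a toy example. This is not the IHME method and uses no real data: it assumes synthetic daily deaths that follow the derivative of a logistic curve (hypothetical total `K`, rate `RATE`, peak day `MID`), and applies the Gaussian log-quadratic fit from earlier in the thread to two different windows of days.

```python
import math

K, RATE, MID = 20000.0, 0.2, 40.0   # hypothetical total, growth rate, peak day

def logistic_daily(x):
    """Daily deaths for a logistic cumulative curve (synthetic, not real data)."""
    e = math.exp(-RATE * (x - MID))
    return K * RATE * e / (1.0 + e) ** 2

def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def gaussian_total(days):
    """Fit ln(daily) = p*t^2 + q*t + r with t = day - mean(day) by least
    squares, then return the total deaths implied by the fitted Gaussian."""
    t0 = sum(days) / len(days)
    ts = [x - t0 for x in days]
    ys = [math.log(logistic_daily(x)) for x in days]
    s = [sum(t ** k for t in ts) for k in range(5)]                     # power sums
    v = [sum((t ** k) * y for t, y in zip(ts, ys)) for k in range(3)]
    normal = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [v[2], v[1], v[0]]
    base = det3(normal)
    coeffs = []
    for col in range(3):                      # Cramer's rule for (p, q, r)
        mat = [row[:] for row in normal]
        for i in range(3):
            mat[i][col] = rhs[i]
        coeffs.append(det3(mat) / base)
    p, q, r = coeffs
    m = -p                                    # Gaussian width parameter
    peak = math.exp(r + q * q / (4.0 * m))    # fitted maximum daily deaths
    return peak * math.sqrt(math.pi / m)      # area under the fitted Gaussian

early_total = gaussian_total(list(range(10, 31)))   # window well before the peak
late_total = gaussian_total(list(range(20, 46)))    # window that reaches the peak
print(early_total, late_total)
```

On this synthetic curve the window that stops well short of the peak sees almost no curvature in ln(daily deaths), so the fitted m is tiny and the implied total is far above the true one, while the window that reaches past the peak lands in the right neighbourhood.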


@ponderable said
Taking the data from
https://www.worldometers.info/coronavirus/country/us/
which report 256 dead as of yesterday (and who give sources) and their daily number starting with one dead at 29th of February I can model the curve death over day using the exponential curve
1.61*exp(0.24*d) with a r-value of 0.97 which is reasonable.
If I calculate to day 42 (which is three we ...[text shortened]... Worldometers.info just changed to 276 death today it should be around 316 if the curve was correct.
So time has flown.

The very trivial first shot I did here proved to overestimate after day 34.
The overestimation over 21 days was about 50%, which is okay for a very lean fit of an exponential curve to the data.
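
For reference, the exponential curve quoted above can be evaluated directly. This is a minimal sketch: the function name is made up, and day 22 is an assumption about the day numbering, chosen because it reproduces the 316 figure in the quoted post.

```python
import math

def exponential_model(day):
    """Cumulative deaths from the quoted fit 1.61*exp(0.24*d)."""
    return 1.61 * math.exp(0.24 * day)

doubling_time = math.log(2.0) / 0.24   # days for the modelled count to double

print(round(exponential_model(22)))
print(round(doubling_time, 2))
```

A fixed doubling time of under three days is what makes a pure exponential overshoot once the real growth rate starts to fall.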