Saturday, March 28

COVID19 and exponential growth

Unconstrained, the spread of a new epidemic often follows an exponential growth curve (at least in the early days). When this happens, the cumulative number of cases ($totalCases_t$) at time ($t$) follows a formula like one of the following expressions for exponential growth:

$$\begin{align}
totalCases_t & = a b^t \\
 & = a(1+r)^t \\
 & = a e^{kt}; \ \  (where \ k=ln(1+r)=ln(b)) \\
\end{align}$$
In these equations, $a$ is the total cases at the start (when $t=0$). The equation is subject to the constraints ($a>0, b>1, r>0, k>0$). Some of you will recognise $a(1+r)^t$ as the formula for compound interest. Compound interest is another form of exponential growth.

The interesting thing about exponential growth in cumulative cases is that it depends on exponential growth in new cases. Let's take a moment to prove this. We can solve the number of new cases for any point in time with a difference equation. For example, if our time period is days, then the number of new cases today is the total number of cases we have at the end of today, minus the total number of cases we had at the end of yesterday.

$$\begin{align}
newCases_t & = totalCases_t - totalCases_{t-1} \\
 & = a b^t - a b^{t-1}\\
 & = a(b^t - b^{t-1})\\
 & = a((b-1)b^{t-1})\\
 & = (ab-a) \left(\frac{b^t}{b}\right) \\
 & = a\left( 1-\frac{1}{b}\right) b^t \\
 & = c b^t; \ \  (where \  c=a\left( 1-\frac{1}{b}\right))
\end{align}$$
This last equation is demonstrably an exponential equation, in the same general form as the very first line in the previous block of equations. Also, it maintains the original constrains, as if ($a>0$, and $b>1$), then ($c>0$). Therefore, if the cumulative number of cases is growing exponentially over time, then the new cases on each day are also growing exponetially (QED).

We can quickly check our algebra with some python code.
In  [1]: import pandas as pd 
    ...: a = 7 # day 0 staring point - selected at random - must be greater than 0 
    ...: b = 1.5 # growth rate - selected at random - must be greater than 1 
    ...: t = pd.Series(list(range(1,11))) 
    ...: total_cases_f = a * (b ** t) 
    ...: nu_cases_diff = total_cases_f - total_cases_f.shift(1).fillna(a) 
    ...: nu_cases_f = (a * (1 - 1/b)) * (b ** t) 
    ...: print(pd.DataFrame([t, total_cases_f, nu_cases_diff, nu_cases_f], 
    ...:                     index=['t', 'Total Cases',  
    ...:                     'New Cases (by difference)', 
    ...:                     'New Cases (by formula)']).T) 
    ...:                                                                                                                
      t  Total Cases  New Cases (by difference)  New Cases (by formula)
0   1.0    10.500000                   3.500000                3.500000
1   2.0    15.750000                   5.250000                5.250000
2   3.0    23.625000                   7.875000                7.875000
3   4.0    35.437500                  11.812500               11.812500
4   5.0    53.156250                  17.718750               17.718750
5   6.0    79.734375                  26.578125               26.578125
6   7.0   119.601562                  39.867188               39.867188
7   8.0   179.402344                  59.800781               59.800781
8   9.0   269.103516                  89.701172               89.701172
9  10.0   403.655273                 134.551758              134.551758

And thankfully, it checks out. We get the same answer for new cases from the difference method, and from the exponential equation for new cases, we derived above.

Why all that boring math. Well when we over-plot total cases and new cases on the same chart (but to different scales), it is easy to pick if both series are growing exponentially, or in some other way.

We can demonstrate this with a couple of synthetic examples, and then apply what we have learnt from the synthetic charts to real data from the COVID-19 pandemic. For the synthetic data, we will produce two quick plots, the first will be an exponential growth of 20 per cent per day for thirty days. In the second chart, we will do exponential growth for 25 days, followed by linear growth for the last 10 days.

The first plot - of pure exponential growth - looks like this. The blue line of cumulative exponential growth (with a scale on the right-hand side) maps perfectly to the red bars of exponential growth in new cases reported each day (with a scale on the left-hand side). They overlap perfectly because the axes for the two charts are scaled to fill the available chart space (and because of the relationship identified above).


In the next chart, we have 25 days of exponential growth followed by five days of constant growth. Just because it looks like more red on the chart, notice that the left-hand axis only goes to 16 in this chart. It went to 40 in the previous chart. Also note that the blue-line stops curving upwards from day 25, and tracks in a linear fashion (remember $y=mx+b$ from high school).


The key thing to note when we have less than exponential growth, the chart has a lot of red above the dotted blue line for the cumulative total (which is scaled on the right-hand side)

Let's apply this observation to some real-world examples.

If we look at the growth in cases in the United States, it looks pretty close to being exponential. While there is some red above the blue line, that is probably a result of noise in the data, and perhaps constraints with making testing available.


Conversely, if we look at the growth in cases in Italy, we can see a lot of red above the blue line. The spike on 16 March, is probably making up for the limited additional cases reported on 15 March: a data reporting issue. But more generally, it looks like exponential growth progressed to around 22 March, and since then we have seen linear growth in the order of 5000 to 6000 new cases a day. While that might seem a lot, it is much less than it would have been if the prior exponential growth had continued unabated.


And that is why I am a fan of these charts that plot cumulative growth over time alongside new cases or new deaths for each day.

The full set of charts for each nation can be found on my GitHub gist site. But please note, you won't be able to view the gist from a mobile phone.

Finally, the python code I used to generate the synthetic charts above.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

def plot_incidence(name, incidence, cumulative, mode):
        
    assert((incidence.index==cumulative.index).all())
        
    # plot daily incidence
    ax = plt.subplot(111)
    labels = [f'{int(p)}' for p in incidence.index]
    ax.bar(labels, incidence.values, 
               color='#dd0000', label=f'New {mode}')
    ax.set_title(f'Synthetic: {name}')
    ax.set_xlabel('Day')
    ax.set_ylabel(None)
        
    # plot cumulative total
    axr = ax.twinx()
    axr.plot(labels, cumulative.values,
             lw=2.0, color='#0000dd', ls='--',
             label=f'Cumulative {mode}')
    axr.set_ylabel(None)
    axr.grid(False)

    # manually label the x-axis
    MAX_LABELS = 9
    ticks = ax.xaxis.get_major_ticks()
    if len(ticks):
        modulus = int(np.floor(len(ticks) / MAX_LABELS) + 1)
        for i in range(len(ticks)):
            if i % modulus:
                ticks[i].label1.set_visible(False)
        
    # put in a legend
    ax.legend(loc='upper left')
    axr.legend(loc='upper center')

    # align the base of the left and right scales
    yliml = list(ax.get_ylim())
    yliml[0] = 0
    ax.set_ylim(yliml)
    ylimr = list(axr.get_ylim())
    ylimr[0] = 0
    axr.set_ylim(ylimr)
        
    # adjust plot width
    xspace = 0.01 * len(ticks)
    setting = (-0.5-xspace, len(ticks)-0.5+xspace)
    ax.set_xlim(*setting)
    axr.set_xlim(*setting)

    # start work on the broader canvas
    fig = ax.figure
    fig.set_size_inches(8, 4)
    fig.tight_layout(pad=1)
        
    # y-axis labels - the hard way
    lHeight = 0.96
    lInstep = 0.02
    fig.text(1.0-lInstep, lHeight, f'Cumulative {mode}',
            ha='right', va='top', fontsize=11,
            color='#333333')
    fig.text(lInstep, lHeight, f'New {mode}',
            ha='left', va='top', fontsize=11,
            color='#333333')
        
    # save - goodbye
    fig.savefig(f'./incidence-{name}.png', dpi=125)
    plt.close()

# fake up the synthetic data
n = 30
days = list(range(1,n+1))
t = pd.Series(days, index=days)
a = 1
b = 1.2
cumulative = a * (b ** t)
incidence = cumulative - cumulative.shift(1).fillna(a)
plot_incidence('1 - Exponential Growth', incidence, cumulative, 'cases')
incidence.iloc[25:] = incidence.iloc[24]
cumulative = incidence.cumsum()
plot_incidence('2 - Exponential then Linear Growth', incidence, cumulative, 'cases')

No comments:

Post a Comment