Mark the Graph: March 2020

Saturday, March 28

COVID19 and exponential growth

Unconstrained, the spread of a new epidemic often follows an exponential growth curve (at least in the early days). When this happens, the cumulative number of cases ($totalCases_t$) at time ($t$) follows a formula like one of the following expressions for exponential growth:

$$\begin{align}
totalCases_t & = a b^t \\
& = a(1+r)^t \\
& = a e^{kt}; \ \ (where \ k=ln(1+r)=ln(b)) \\
\end{align}$$
In these equations, $a$ is the total cases at the start (when $t=0$). The equation is subject to the constraints ($a>0, b>1, r>0, k>0$). Some of you will recognise $a(1+r)^t$ as the formula for compound interest. Compound interest is another form of exponential growth.

The interesting thing about exponential growth in cumulative cases is that it depends on exponential growth in new cases. Let's take a moment to prove this. We can solve the number of new cases for any point in time with a difference equation. For example, if our time period is days, then the number of new cases today is the total number of cases we have at the end of today, minus the total number of cases we had at the end of yesterday.

$$\begin{align}
newCases_t & = totalCases_t - totalCases_{t-1} \\
& = a b^t - a b^{t-1}\\
& = a(b^t - b^{t-1})\\
& = a((b-1)b^{t-1})\\
& = (ab-a) \left(\frac{b^t}{b}\right) \\
& = a\left( 1-\frac{1}{b}\right) b^t \\
& = c b^t; \ \ (where \ c=a\left( 1-\frac{1}{b}\right))
\end{align}$$
This last equation is demonstrably an exponential equation, in the same general form as the very first line in the previous block of equations. Also, it maintains the original constrains, as if ($a>0$, and $b>1$), then ($c>0$). Therefore, if the cumulative number of cases is growing exponentially over time, then the new cases on each day are also growing exponetially (QED).

We can quickly check our algebra with some python code.

In  [1]: import pandas as pd 
    ...: a = 7 # day 0 staring point - selected at random - must be greater than 0 
    ...: b = 1.5 # growth rate - selected at random - must be greater than 1 
    ...: t = pd.Series(list(range(1,11))) 
    ...: total_cases_f = a * (b ** t) 
    ...: nu_cases_diff = total_cases_f - total_cases_f.shift(1).fillna(a) 
    ...: nu_cases_f = (a * (1 - 1/b)) * (b ** t) 
    ...: print(pd.DataFrame([t, total_cases_f, nu_cases_diff, nu_cases_f], 
    ...:                     index=['t', 'Total Cases',  
    ...:                     'New Cases (by difference)', 
    ...:                     'New Cases (by formula)']).T) 
    ...:                                                                                                                
      t  Total Cases  New Cases (by difference)  New Cases (by formula)
0   1.0    10.500000                   3.500000                3.500000
1   2.0    15.750000                   5.250000                5.250000
2   3.0    23.625000                   7.875000                7.875000
3   4.0    35.437500                  11.812500               11.812500
4   5.0    53.156250                  17.718750               17.718750
5   6.0    79.734375                  26.578125               26.578125
6   7.0   119.601562                  39.867188               39.867188
7   8.0   179.402344                  59.800781               59.800781
8   9.0   269.103516                  89.701172               89.701172
9  10.0   403.655273                 134.551758              134.551758

And thankfully, it checks out. We get the same answer for new cases from the difference method, and from the exponential equation for new cases, we derived above.

Why all that boring math. Well when we over-plot total cases and new cases on the same chart (but to different scales), it is easy to pick if both series are growing exponentially, or in some other way.

We can demonstrate this with a couple of synthetic examples, and then apply what we have learnt from the synthetic charts to real data from the COVID-19 pandemic. For the synthetic data, we will produce two quick plots, the first will be an exponential growth of 20 per cent per day for thirty days. In the second chart, we will do exponential growth for 25 days, followed by linear growth for the last 10 days.

The first plot - of pure exponential growth - looks like this. The blue line of cumulative exponential growth (with a scale on the right-hand side) maps perfectly to the red bars of exponential growth in new cases reported each day (with a scale on the left-hand side). They overlap perfectly because the axes for the two charts are scaled to fill the available chart space (and because of the relationship identified above).

In the next chart, we have 25 days of exponential growth followed by five days of constant growth. Just because it looks like more red on the chart, notice that the left-hand axis only goes to 16 in this chart. It went to 40 in the previous chart. Also note that the blue-line stops curving upwards from day 25, and tracks in a linear fashion (remember $y=mx+b$ from high school).

The key thing to note when we have less than exponential growth, the chart has a lot of red above the dotted blue line for the cumulative total (which is scaled on the right-hand side)

Let's apply this observation to some real-world examples.

If we look at the growth in cases in the United States, it looks pretty close to being exponential. While there is some red above the blue line, that is probably a result of noise in the data, and perhaps constraints with making testing available.

Conversely, if we look at the growth in cases in Italy, we can see a lot of red above the blue line. The spike on 16 March, is probably making up for the limited additional cases reported on 15 March: a data reporting issue. But more generally, it looks like exponential growth progressed to around 22 March, and since then we have seen linear growth in the order of 5000 to 6000 new cases a day. While that might seem a lot, it is much less than it would have been if the prior exponential growth had continued unabated.

And that is why I am a fan of these charts that plot cumulative growth over time alongside new cases or new deaths for each day.

The full set of charts for each nation can be found on my GitHub gist site. But please note, you won't be able to view the gist from a mobile phone.

Finally, the python code I used to generate the synthetic charts above.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

def plot_incidence(name, incidence, cumulative, mode):
        
    assert((incidence.index==cumulative.index).all())
        
    # plot daily incidence
    ax = plt.subplot(111)
    labels = [f'{int(p)}' for p in incidence.index]
    ax.bar(labels, incidence.values, 
               color='#dd0000', label=f'New {mode}')
    ax.set_title(f'Synthetic: {name}')
    ax.set_xlabel('Day')
    ax.set_ylabel(None)
        
    # plot cumulative total
    axr = ax.twinx()
    axr.plot(labels, cumulative.values,
             lw=2.0, color='#0000dd', ls='--',
             label=f'Cumulative {mode}')
    axr.set_ylabel(None)
    axr.grid(False)

    # manually label the x-axis
    MAX_LABELS = 9
    ticks = ax.xaxis.get_major_ticks()
    if len(ticks):
        modulus = int(np.floor(len(ticks) / MAX_LABELS) + 1)
        for i in range(len(ticks)):
            if i % modulus:
                ticks[i].label1.set_visible(False)
        
    # put in a legend
    ax.legend(loc='upper left')
    axr.legend(loc='upper center')

    # align the base of the left and right scales
    yliml = list(ax.get_ylim())
    yliml[0] = 0
    ax.set_ylim(yliml)
    ylimr = list(axr.get_ylim())
    ylimr[0] = 0
    axr.set_ylim(ylimr)
        
    # adjust plot width
    xspace = 0.01 * len(ticks)
    setting = (-0.5-xspace, len(ticks)-0.5+xspace)
    ax.set_xlim(*setting)
    axr.set_xlim(*setting)

    # start work on the broader canvas
    fig = ax.figure
    fig.set_size_inches(8, 4)
    fig.tight_layout(pad=1)
        
    # y-axis labels - the hard way
    lHeight = 0.96
    lInstep = 0.02
    fig.text(1.0-lInstep, lHeight, f'Cumulative {mode}',
            ha='right', va='top', fontsize=11,
            color='#333333')
    fig.text(lInstep, lHeight, f'New {mode}',
            ha='left', va='top', fontsize=11,
            color='#333333')
        
    # save - goodbye
    fig.savefig(f'./incidence-{name}.png', dpi=125)
    plt.close()

# fake up the synthetic data
n = 30
days = list(range(1,n+1))
t = pd.Series(days, index=days)
a = 1
b = 1.2
cumulative = a * (b ** t)
incidence = cumulative - cumulative.shift(1).fillna(a)
plot_incidence('1 - Exponential Growth', incidence, cumulative, 'cases')
incidence.iloc[25:] = incidence.iloc[24]
cumulative = incidence.cumsum()
plot_incidence('2 - Exponential then Linear Growth', incidence, cumulative, 'cases')

Friday, March 13

Covid-19 doubling times by country

I have been looking at the time it takes for the number of Covid-19 cases to double.

The short answer is that, in most countries, the cumulative case total doubles every two to three days. This assumes sustained local transmission where the local counter-measures have not slowed that transmission.

If we assume a doubling every three days, then 100 cases today will grow to around 13,000 in three weeks and 1.6 million in six weeks. If one to ten per cent of those with the virus need hospital treatment, uncontrolled growth in infections would be challenging for the health system.

Some countries (eg. China and Korea) have arrested growth. Others (eg. Singapore and Japan) have a slower growth rate.

For this analysis, I have taken the case numbers from Our World in Data (OWID). The latest records were dated 11 March 2020. [Update: I have updated the charts for the latest data to 12 March 2020; again for the latest data to 14 March 2020; again for the latest data to 16 March 2020; and 17 March 2020].

Update: From 16 March, I have added charts with data from the European Centre for Disease Prevention and Control (EU). While similar, the two data sources are not aligned in many cases.

I have focused on those countries with more than 100 cases in the most recent report, and those countries with at least four days of data with over 50 cases each. I wanted to focus on countries where local transmission had been established, and where I had enough data points to fit a regression.

For each country, I have fitted a regression equation for exponential growth in the form:
$$y=\alpha e^{\beta x}$$
where $x$ is the number of days with 30 or more cases, and $y$ is the total number of cases on that day (cumulative).

To calculate the doubling time, I have taken the $\beta$ exponent from the above regression equation and applied the following:
$$Doubling\ in=\frac{ln(2)}{\beta}\ days$$

I have excluded China, as an exponential growth model no longer describes what is happening there, given the effectiveness of its containment measures.

The results follow. I have annotated the anomalies.

Of note: the doubling rate in Australia does not appear to be as fast as other counties.

The Iranian data does not look like exponential growth; I am not sure why, and I have no direct information about what Iran is doing, but it is possible that containment strategies in Iran are having an impact.

Japan has a slower doubling time than most other countries.

Singapore has a particularly slow growth rate. However, this may be changing with the most recent data.

This is not the best regression fit. We can see that South Korea is arresting the growth rate in cases. This is an example of when the exponential growth regression no longer applies.

Some further comparisons can be seen in the following charts (all based on data from OWID).

The code I used to generate the above plots is saved as a GitHub gist. [Note: To see the Jupyter notebook with the Python3 code, you will need to access the gist from a desktop or laptop computer. It often does not work from a mobile device].