Can we get accurate short-term forecasts of coronavirus cases and deaths?

In a pandemic, policy-makers need to plan healthcare provision carefully and adjust the intensity of measures to mitigate the spread of the virus. That requires real-time forecasts of cases and deaths that are timely and accurate indicators of what will happen over the next week or so.

We all forecast many things every day, even if we rarely think of calculations about how long a shopping trip will take or how much it is likely to cost as forecasts. All such forecasts must come with a health warning: they are uncertain. Numerous caveats also apply when interpreting forecasts. Most forecasts made last year are now terribly wrong because of Covid-19: that event was not part of any model, but it is unreasonable to blame the forecaster for that.

Currently, short-term forecasts of Covid-19 cases and deaths are invaluable. They help to put the daily released data into context. The rapid increases in reported cases and deaths during the initial expansionary phase of the epidemic were often presented as a surprise by the media, yet they were partly predictable based on their trends. Forecasts also provide policy-makers with advance warnings, helping to allocate scarce public health resources and guide lockdown policy.

One aspect of Covid-19 for which we need forecasts is to understand when we might reach a peak in the epidemic. This is essential in indicating whether the disease is under control, and whether various policies are working.

Forecasts are also important for monitoring whether there is a second wave of the disease. Many countries seem to have passed their initial peak. By mid-April, 21 of the 24 European countries that are monitored at doornik.com/COVID-19 had reached a peak in reported new cases, and by the end of June that peak had still not been exceeded. This allows those countries to relax lockdown and discuss the opening of borders.

The situation in the United States is quite different. By the middle of May, it seemed that new Covid-19 cases had peaked in most states. Unfortunately, by late June, we can see that rapid growth has resumed in more than 20 states. Short-term forecasts are essential in monitoring such events.

How do we produce forecasts?

There are broadly two approaches to forecasting the Covid-19 pandemic. The first builds mathematical models of disease transmission based on tested theories of how diseases spread. These are called epidemiological models and they rely on a set of underlying assumptions (such as the frequency with which individuals make contact with other individuals). Projections are made by extrapolating these deterministic relationships. An example of this approach is the one-week-ahead forecasts by the Imperial College London Covid-19 Response Team.
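
To make the idea of a deterministic epidemiological model concrete, here is a minimal sketch of the textbook SIR (susceptible, infected, removed) model, stepped forward one day at a time. It is not the Imperial College model, and the function name and parameter values (`beta` for the contact rate, `gamma` for the removal rate, the population size) are illustrative assumptions rather than estimates.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a textbook SIR model forward in daily increments.

    beta  : assumed rate at which an infected person infects others per day
    gamma : assumed rate at which infected people leave the infected group per day
    Returns a list of (susceptible, infected, removed) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    path = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        path.append((s, i, r))
    return path


# Illustrative values only: they are not fitted to any Covid-19 data.
path = simulate_sir(population=1_000_000, initial_infected=10,
                    beta=0.3, gamma=0.1, days=200)
peak_day = max(range(len(path)), key=lambda day: path[day][1])
print("Day on which infections peak in this scenario:", peak_day)
```

Changing `beta` is how such a model would represent a policy that reduces contacts: the whole projected path shifts because the assumption behind it has shifted.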

The second approach uses the observed history of pandemic data to find past relationships in those data. The projections are made by extrapolating the relationships into the future. These models allow chance variations or randomness in the behaviour of the disease (in the jargon, they are ‘stochastic models’). An example of this approach is the short-term forecasts of cases and deaths produced by Jurgen Doornik, Jennifer Castle and David Hendry at the University of Oxford.
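
As a rough illustration of the data-based approach, the sketch below fits a straight line to the logarithm of the most recent observations and extends it a few days ahead. This is a deliberately simple stand-in, not the method used by the Oxford team; the function name, the window length and the example data are assumptions, and a real stochastic model would also report the uncertainty around each forecast.

```python
import math


def log_linear_forecast(counts, window, horizon):
    """Extrapolate a least-squares trend fitted to log(counts) over the last `window` days.

    Assumes every value in the window is positive and that window >= 2.
    Returns `horizon` point forecasts for the days after the sample ends.
    """
    recent = counts[-window:]
    xs = list(range(window))
    ys = [math.log(c) for c in recent]
    x_mean = sum(xs) / window
    y_mean = sum(ys) / window
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # The last observed day sits at x = window - 1, so forecasts start one step later.
    return [math.exp(intercept + slope * (window - 1 + h))
            for h in range(1, horizon + 1)]


# Made-up cumulative counts growing by 10% a day, purely for illustration.
counts = [100 * 1.1 ** day for day in range(14)]
print(log_linear_forecast(counts, window=7, horizon=3))
```

Because only the last `window` days enter the fit, the forecast follows the recent trend rather than the whole history, which is the sense in which such models lean on the recent past.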

Is one type of forecasting model better?

No. All forecasting models have different underlying assumptions and different ways of using past data. All are ‘wrong’ to some extent – the future is unknown and forecasts are uncertain. Forecasts are certainly not statements about what will definitely occur in the future: unfortunately, we do not have a crystal ball.

The epidemiological models can take changing conditions into account by adjusting the underlying assumptions. The data-based models can adapt to the changing recent past. Both can be informative. Using a range of different models means that you aren’t too reliant on the underlying assumptions or extrapolations of a specific model.
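
One simple way to use a range of models is to average their forecasts day by day. The sketch below does this with an equal-weight average; the model names and numbers are invented for illustration, and equal weighting is an assumption rather than a recommendation from the forecasting teams mentioned above.

```python
from statistics import mean


def combine_forecasts(forecasts_by_model):
    """Average several models' forecasts, horizon by horizon.

    `forecasts_by_model` maps a model name to a list of forecasts.
    If the lists differ in length, only the horizons common to all are used.
    """
    return [mean(step) for step in zip(*forecasts_by_model.values())]


# Hypothetical three-day-ahead forecasts of daily deaths from two models.
forecasts = {
    "epidemiological": [120, 130, 145],
    "data_based": [110, 112, 115],
}
print(combine_forecasts(forecasts))
```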

Why are some forecasts poor?

Most data-based models have a normal point, often the historical average of the data, around which there can be deviations. This point can be called the model’s equilibrium. Forecasts from these models tend to predict a reversion to the past equilibrium.

Epidemiological models also have built-in equilibria based on constant rates of change of people entering the susceptible, infected, recovered and death categories. These models often perform well at explaining the observed data, but they may not produce good forecasts if the equilibrium has changed.

Imagine that you are going on a journey from London to Durham (via Barnard Castle) and we track your journey and try to forecast your itinerary. At Darlington, there is a road closure, so you briefly detour. Our forecasts of your route are wrong for a while, but then recover. Our forecasts are predicting that your journey will revert to the previous equilibrium.

Usually that works well, so after the detour, the journey continues to Durham. But now you hear that you shouldn’t visit Durham and decide to go to the Lake District instead. If our model continues forecasting on the basis that you are still aiming for Durham, our forecasts become increasingly poor. The model needs to be able to recover from such a ‘structural break’.

Models that help us to understand the past may not be the best models to use for forecasting: models can be too tightly constrained by their theoretical formulations. All forecasts will be inaccurate when faced with unanticipated shifts such as those just described. But models that do not depend on the built-in reversion to a past normal may forecast better after the shift has occurred.

Many epidemiological models and statistical models enforce constant relationships, so they struggle to adapt to structural breaks until well after the shift has occurred. When our forecasting model sees your car turn off the A1(M) and head west, it will perform best if it can adapt immediately.
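
The difference between reverting to a past equilibrium and adapting to a shift can be seen in a toy comparison. In the sketch below, one forecaster predicts the historical average and the other simply repeats the latest observation. Both rules, the function names and the made-up series with a jump in level are assumptions chosen for illustration, not the models discussed in this article.

```python
from statistics import mean


def equilibrium_forecast(history):
    """One-step forecast that reverts to the historical average."""
    return mean(history)


def adaptive_forecast(history):
    """One-step forecast that repeats the most recent observation."""
    return history[-1]


def mean_abs_error(series, forecaster, start, end):
    """Average one-step-ahead absolute error for days start..end-1.

    Each forecast uses only the observations before the day being forecast.
    """
    return mean(abs(series[t] - forecaster(series[:t]))
                for t in range(start, end))


# Made-up series: stable around 100, then a structural break to around 150.
series = [100, 102, 98, 101, 99, 100, 150, 152, 148, 151, 149, 150]
break_day = 6

for name, rule in [("equilibrium", equilibrium_forecast),
                   ("adaptive", adaptive_forecast)]:
    before = mean_abs_error(series, rule, 1, break_day)
    after = mean_abs_error(series, rule, break_day + 1, len(series))
    print(name, "error before break:", before, "after break:", after)
```

In this example the equilibrium rule does better while the level is stable, but once the level has shifted it keeps pulling forecasts back towards the old average, whereas the adaptive rule recovers as soon as the shift has been observed.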

How accurate are the forecasts?

The data-based methods have tended to be more accurate in the initial phase of the pandemic. They can adapt more rapidly to the many shocks and shifts in the data. We expect the forecasts to fail when interventions and policy changes come into effect, so such forecasts can provide a useful benchmark to evaluate the effectiveness of interventions over time.

The epidemiological models allow policy-makers to see what could happen in the event of no, or some, changes in policy, so they have the advantage of modelling possible scenarios. They become more reliable when there are sufficient data to get reasonable estimates of the initial conditions.

Models that combine the two approaches are also feasible, and could include the increasingly popular agent-based models that allow individuals to act in differing ways. These models require a huge amount of data, but allow for more refined projections, for example, by estimating contact matrices that depend on age, context and type of contact.

How good are the data?

Forecasts are only as good as the available data, and the data are often unreliable and non-representative, particularly at the start of the pandemic. The data show similar patterns across countries, with a slow start, an exponential increase and a gradual slowing, with possible additional waves.

We call data that are drawn from the same distribution over time ‘stationary’, which means that future data look much like the past – the time dimension is irrelevant. Pandemic data are definitely not stationary. The distribution of the data looks very different depending on the point in the pandemic evolution at which you look.
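
An informal way to see non-stationarity is to compare summary statistics from the first and second halves of a series. The sketch below does only that; it is a rough diagnostic rather than a formal statistical test, and the function name and both example series are invented for illustration.

```python
from statistics import mean, pstdev


def summarise_halves(series):
    """Return (mean, standard deviation) for the first and second half of a series."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (mean(first), pstdev(first)), (mean(second), pstdev(second))


stable = [100, 102, 98, 101, 99, 100, 101, 99]   # looks much the same throughout
epidemic = [1, 2, 4, 8, 16, 32, 64, 128]          # doubling each period

print("stable:  ", summarise_halves(stable))
print("epidemic:", summarise_halves(epidemic))
```

For the stable series the two halves look alike, so the past is a reasonable guide to the future. For the doubling series both the average and the spread change substantially between the halves, which is what makes the time dimension matter.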

A further problem is that the methodology used to report the pandemic data is changing over time. The reporting process is also non-stationary. There are reporting delays, changing definitions and data errors – for example, the ramping up of infection and antibody testing, and the sudden inclusion of care home cases for the UK. There are also lags in data releases, and corrections of earlier reporting errors that lead to negative case numbers being reported on some days.
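
Before forecasting, it can help to flag observations that look like reporting artefacts rather than genuine changes in the epidemic. The sketch below marks negative daily counts and sudden jumps relative to the recent median. The function name, the seven-day window, the threshold of three times the median and the example data are all assumptions for illustration, not rules used by any of the forecasting teams mentioned here.

```python
from statistics import median


def flag_reporting_anomalies(daily_counts, window=7, jump_factor=3.0, min_history=3):
    """Flag days whose reported count is negative or far above the recent median.

    Returns a list of (day_index, reason) tuples, where reason is
    "negative" or "jump". Negative values are ignored when computing the median.
    """
    flags = []
    for t, value in enumerate(daily_counts):
        if value < 0:
            flags.append((t, "negative"))
            continue
        recent = [v for v in daily_counts[max(0, t - window):t] if v >= 0]
        if len(recent) >= min_history:
            typical = median(recent)
            if typical > 0 and value > jump_factor * typical:
                flags.append((t, "jump"))
    return flags


# Made-up daily counts with one definitional jump and one negative correction.
daily = [100, 110, 105, 120, 115, 125, 130, 900, 135, -40, 140]
print(flag_reporting_anomalies(daily))
```

Flagging is only a first step: whether a jump should be smoothed out or treated as the new starting level depends on whether it reflects a one-off correction or a lasting change in definition.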

The non-stationarity of the underlying data is compounded by the non-stationarity of the reporting process, which makes Covid-19 data even harder to model and forecast. Forecasting models need to be able to adapt to these data constraints. If there is a sudden jump in the number of cases – for example, due to a change in definition – then forecasts that start at the revised data will perform better than forecasts that give little weight to the latest data.

Adaptive forecasts are forecasts that adjust rapidly to the latest information in the data. This could backfire in the short run if revisions such as negative new cases are reported. But in general, rapid updating in response to changes in how the data are measured is needed, in much the same way that forecasts need to adapt to shifts as described above.

Where can I find out more?

Data on confirmed cases and deaths are available from Johns Hopkins/CSSE for countries and US states. See also Our World in Data for daily data updates.
More detail on the short-term data-based forecasts of cases and deaths can be found in this VoxEU article by Jennifer Castle, Jurgen Doornik and David Hendry, which argues that shifts in distributions can lead to systematic mis-forecasting.
Adaptive data-based models that are ‘robust’ after distributional shifts can be useful. See also the International Institute of Forecasters blog.
There are many other short-term forecasts available, for example:
- From the Institute for Health Metrics and Evaluation for the United States.
- Regional forecasts for both Italy and the UK have been produced by Roberto Pancrazi.
- Fotios Petropoulos, Spyros Makridakis and Neophytos Stylianou have been producing short-term statistical forecasts.
- This article assembles 12 models published by scientists to illustrate the possible trajectories of the pandemic’s death toll in the United States.
Nassim Nicholas Taleb argues that forecasters shouldn’t even produce point forecasts of pandemic processes, but John Ioannidis argues that despite their unreliability, forecasts are necessary to help policy-makers – see here.
Flexible models that can produce robust forecasts clearly have a role in the current pandemic: see Financial Times (11 April).