Forecast COVID-19 cases in the US under different levels of social distancing

Our Research

  • We provide forecasts of COVID-19 cases under different reopening strategies, based on the findings and methods described in detail in our paper published in Scientific Reports.
  • Here is a non-technical summary of the paper.
  • Link to our model’s code and data

Other COVID Prediction Models


Summary of Methodologies

We measure how much social distancing slows the spread of COVID-19, and then run simulations to forecast its spread under different levels of social distancing. We find that COVID-19 spreads less than proportionately with the number of contagious individuals, whereas standard models assume that spread is proportional to that number. We also find that social distancing greatly reduces the spread of COVID-19.

The model we estimate is a modified version of a susceptible-infected-recovered (SIR) model:

$$ y_{i,t} = R_{i,t}\, S_{i,t} \left( Y_{i,t-2} - Y_{i,t-8} \right)^{\omega} \qquad (1) $$

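where $y_{i,t}$ is the number of new infections in county $i$ on day $t$, $R_{i,t}$ is the rate at which infectious individuals transmit the disease, $S_{i,t}$ is the fraction of the county population that has not yet had COVID-19, and $Y_{i,t}$ is the cumulative number of individuals who have been infected by day $t$. The difference $Y_{i,t-2} - Y_{i,t-8}$ therefore counts the cases that appeared between days $t-7$ and $t-2$, which the model treats as the currently contagious individuals.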
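A minimal Python sketch of equation (1) for a single county, assuming the cumulative case series is stored as a list indexed by day. The function name `new_infections`, the case counts, and the values of $R$ and $S$ are hypothetical. The sketch holds $R$ and $S$ fixed and doubles every cumulative count, which doubles the contagious term $Y_{t-2} - Y_{t-8}$; new infections then scale by $2^{\omega}$ rather than by 2.

```python
def new_infections(R, S, cumulative, t, omega):
    """Equation (1) for one county: y_t = R * S * (Y_{t-2} - Y_{t-8}) ** omega.

    cumulative[k] is the (hypothetical) cumulative case count Y on day k.
    """
    contagious = cumulative[t - 2] - cumulative[t - 8]  # cases from days t-7 .. t-2
    return R * S * contagious ** omega


cumulative = [100, 110, 125, 140, 160, 185, 215, 250, 290]  # hypothetical Y_0 .. Y_8
doubled = [2 * c for c in cumulative]                        # doubles the contagious term

ratios = {}
for omega in (1.0, 0.57):
    base = new_infections(R=0.3, S=0.99, cumulative=cumulative, t=8, omega=omega)
    more = new_infections(R=0.3, S=0.99, cumulative=doubled, t=8, omega=omega)
    ratios[omega] = more / base
    print(f"omega={omega}: doubling contagious cases multiplies new cases by {ratios[omega]:.2f}")
```

Run as is, it prints a factor of 2.00 for $\omega = 1$ and 1.48 for $\omega = 0.57$, which is what "less than proportionately" means in practice.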

The most important difference between this model and standard SIR models is that standard SIR models constrain $\omega = 1$. However, a model with this constraint does not perform well out of sample; we instead estimate $\omega = 0.57$. As we note in our paper, such a result would be expected if contagious individuals expose many of the same unexposed individuals, which could occur if cases are clustered within households, nursing homes, or workplaces. This concavity implies that although cases may initially grow exponentially, whether at the beginning of the pandemic or when social restrictions are eased, the number of new cases will fairly quickly level off and then remain relatively steady, declining slowly over time (unless a further intervention occurs).
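To illustrate this concavity, the minimal Python sketch below iterates equation (1) for one hypothetical county of 1,000,000 people, holding $R$ fixed at an arbitrary value of 2 and seeding 10 new cases per day for the first eight days. These inputs are illustrative assumptions, not estimates from the paper, and holding $R$ constant ignores the covariates that drive transmission. With $\omega = 0.57$, daily cases rise and then flatten near the steady state $y^* = (R \cdot 6^{\omega})^{1/(1-\omega)} \approx 54$ implied by $S \approx 1$ and a six-day contagious window. With $\omega = 1$, the same $R$ produces explosive growth that stops only when most of the population has been infected.

```python
def simulate_new_cases(R, omega, population, days, seed_cases=10.0):
    """Iterate y_t = R * S_t * (Y_{t-2} - Y_{t-8}) ** omega with a constant R.

    Returns lists of daily new cases y and cumulative cases Y.
    """
    # Seed the first 8 days with a constant number of new cases so that the
    # lag window Y_{t-2} - Y_{t-8} is defined from day 8 onward.
    y = [seed_cases] * 8
    Y = [seed_cases * (k + 1) for k in range(8)]
    for t in range(8, days):
        S = 1.0 - Y[t - 1] / population             # fraction never infected
        contagious = max(Y[t - 2] - Y[t - 8], 0.0)  # cases from days t-7 .. t-2
        new = R * S * contagious ** omega
        new = min(new, population - Y[t - 1])       # cannot infect more people than remain
        y.append(new)
        Y.append(Y[t - 1] + new)
    return y, Y


N = 1_000_000
concave_y, concave_Y = simulate_new_cases(R=2.0, omega=0.57, population=N, days=120)
linear_y, linear_Y = simulate_new_cases(R=2.0, omega=1.0, population=N, days=120)

for day in (10, 30, 60, 119):
    print(f"day {day:3d}: omega=0.57 -> {concave_y[day]:10.1f} new cases, "
          f"omega=1 -> {linear_y[day]:10.1f} new cases")
```

$R$ is not directly comparable across the two values of $\omega$, so the point of the comparison is the shape of the curves (a plateau versus a boom-and-bust epidemic), not their levels.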

We allow Ri,t to vary with a number of factors instead of treating it as a constant parameter:

$$ R_{i,t} = \exp\left( \alpha_i + \beta_t + \lambda d_{i,t} + \theta h_{i,t} + \mu m_{i,t} + \varepsilon_{i,t} \right) \qquad (2) $$

This specification allows transmission rates to differ across counties (county fixed effects $\alpha_i$ reflect differences in population density and demographics) and across dates (date fixed effects $\beta_t$ accommodate different rates of testing over time and the different reporting rates on weekdays vs. weekends). It also lets transmission depend on the level of social distancing $d_{i,t}$, on temperature $h_{i,t}$, and on humidity $m_{i,t}$; $\varepsilon_{i,t}$ is the error term. The social distancing measure is based on cellphone GPS location data that SafeGraph provides free of charge to researchers studying COVID-19.
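Because equation (2) exponentiates a linear index, each coefficient has a multiplicative interpretation: raising $d_{i,t}$ by $\Delta$ units multiplies $R_{i,t}$ by $e^{\lambda \Delta}$, whatever the values of the other terms. The Python sketch below checks this with arbitrary coefficient values and covariate levels; none are estimates from the paper, and the function name `transmission_rate` and the units of the distancing measure are assumptions for illustration.

```python
import math


def transmission_rate(alpha_i, beta_t, lam, d, theta, h, mu, m, eps=0.0):
    """R_{i,t} = exp(alpha_i + beta_t + lam*d + theta*h + mu*m + eps), as in equation (2)."""
    return math.exp(alpha_i + beta_t + lam * d + theta * h + mu * m + eps)


# Hypothetical coefficients and covariate levels, for illustration only.
coefs = dict(alpha_i=0.2, beta_t=-0.1, lam=-0.03, theta=-0.01, mu=-0.005)
base = transmission_rate(d=10.0, h=20.0, m=60.0, **coefs)
more_distancing = transmission_rate(d=20.0, h=20.0, m=60.0, **coefs)

# The fixed effects and weather terms cancel in the ratio, leaving exp(lam * 10).
ratio = more_distancing / base
print(f"R is multiplied by {ratio:.3f}")
```

With $\lambda = -0.03$ and a 10-unit increase in distancing, the ratio is $e^{-0.3} \approx 0.74$, i.e., a transmission rate roughly 26% lower under these made-up numbers.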

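We estimate equation (1), with $R_{i,t}$ given by equation (2), by taking the logarithm of both sides and then subtracting $\ln(S_{i,t})$ from both sides. Because a county can report zero new cases on a given day, we add 1 to the number of new cases so that the logarithm on the left-hand side is well-defined.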
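Spelled out, these steps give the following estimating equation (a worked rendering of the algebra described above, under the specification in equations (1) and (2)). The exponent $\omega$ becomes the coefficient on the log of the contagious term, so it can be estimated jointly with $\lambda$, $\theta$, and $\mu$ in a regression that is linear in the parameters, with county and date fixed effects:

$$
\begin{aligned}
\ln y_{i,t} &= \ln R_{i,t} + \ln S_{i,t} + \omega \ln\left( Y_{i,t-2} - Y_{i,t-8} \right) \\
\ln\left( y_{i,t} + 1 \right) - \ln S_{i,t} &= \alpha_i + \beta_t + \lambda d_{i,t} + \theta h_{i,t} + \mu m_{i,t} + \omega \ln\left( Y_{i,t-2} - Y_{i,t-8} \right) + \varepsilon_{i,t}
\end{aligned}
$$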

Observed social distancing levels and social distancing regulations are not determined in a vacuum: people social distance more in areas that are hit harder by COVID-19. Thus $\varepsilon_{i,t}$ may be correlated with social distancing, which would bias the estimated impact of social distancing on the rate of contagion. We therefore use an instrumental variables (IV) technique to address this endogeneity bias, with the amount of rain as our instrument for social distancing. The first-stage F-statistic for the strength of rain as an instrument is 214.44, which is highly significant and well above the common rule-of-thumb threshold of 10, indicating that rain is a strong instrument.
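The logic of the IV correction can be sketched on simulated data. The Python example below is a minimal two-stage least squares (2SLS) illustration using NumPy, not the paper's estimation code: it drops the county and date fixed effects and the weather controls, and it invents a data-generating process in which an unobserved severity shock raises both social distancing and case growth, while rain shifts distancing but affects the outcome only through distancing. The variable `outcome` stands in for the left-hand side of the log-linearized model; all names and coefficients are hypothetical. Naive OLS then understates how strongly distancing reduces transmission, whereas 2SLS, which regresses the outcome on the distancing values predicted by rain, recovers the true coefficient of -1.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (no fixed effects or weather controls).
rain = rng.normal(size=n)       # instrument
severity = rng.normal(size=n)   # unobserved, part of the error term
distancing = 0.5 * rain + 0.8 * severity + rng.normal(size=n)
true_lambda = -1.0
outcome = true_lambda * distancing + 1.5 * severity + rng.normal(size=n)

const = np.ones(n)

# Naive OLS: biased because distancing is correlated with the unobserved severity.
ols_lambda = ols(np.column_stack([const, distancing]), outcome)[1]

# First stage: regress distancing on rain and form fitted values.
Z = np.column_stack([const, rain])
first = ols(Z, distancing)
fitted_distancing = Z @ first
resid = distancing - fitted_distancing
sigma2 = resid @ resid / (n - Z.shape[1])
var_rain_coef = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
first_stage_F = first[1] ** 2 / var_rain_coef  # equals t^2 with one instrument (homoskedastic SE)

# Second stage: regress the outcome on the rain-driven part of distancing.
iv_lambda = ols(np.column_stack([const, fitted_distancing]), outcome)[1]

print(f"OLS: {ols_lambda:.3f}  2SLS: {iv_lambda:.3f}  first-stage F: {first_stage_F:.1f}")
```

The manual second stage gives the correct 2SLS point estimate, but its standard errors are not valid; dedicated IV routines correct them. With a single instrument, the first-stage F-statistic is the square of the t-statistic on rain, which is what the sketch computes under homoskedastic errors.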