The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, has caused a series of asynchronous regional hotspots that place uneven strain on healthcare and economic systems. In the USA, federal and local governments have consistently lagged in their efforts to control the spread of SARS-CoV-2, adopting non-pharmaceutical interventions (NPIs) only after local surges are well underway. This reactive, albeit necessary, response has been inadequate in the USA, as evidenced by its accumulation of the world’s highest number of infections and deaths 2 years into the pandemic [1].

Reliable forecasts of the next local hotspot can be a powerful tool for decision-makers planning resource allocation and public health measures, particularly now that all state-level governments have lifted most or all NPIs as of May 2022. In this commentary, we discuss the technical obstacles to setting proactive policy and the potential of artificial intelligence, specifically deep learning (DL), for forecasting the geographic spread of COVID-19 and, more generally, other infectious diseases. We focus on these technical obstacles, which are rooted in the methodology and reliability of forecasts as the foundation for decision making, rather than on the social aspects of encouraging the broader public to adhere to prescribed NPIs. We highlight the implicit assumption of spatial homogeneity in conventional models, which hinders forecasting performance, and point to DL’s demonstrated potential for improving the quality of forecasts by capturing spatial dependence, heterogeneity, and interactions.

Forecasting COVID-19 in Practice

Forecasting COVID-19 in the USA has been a collective effort: several teams of infectious disease modelers submit forecasts for each county and state to the COVID-19 Forecast Hub [2], coordinated by researchers at the University of Massachusetts Amherst, who then generate an ensemble forecast from the teams’ inputs. The US Centers for Disease Control and Prevention (CDC) relies on this ensemble forecast to inform how health officials formulate policy, communicate with the public, and devise clinical guidelines. However, the extent to which the forecasts are used in setting policy is uncertain. The lack of widespread adoption of these models for concrete planning stems from the limited reliability of their forecasts. For instance, the case and hospitalization forecasts generally failed to capture the timing of surges and disease peaks within a 95% prediction interval [3]. The reasons for this underperformance, and for the subsequent lack of use in setting policy, are several and mostly technical.

Conventional Models

The large majority of teams contributing to the COVID-19 Forecast Hub use variants of the long-used compartmental SIR models, which generally divide the population into susceptible (S), infected (I), and recovered (R) compartments. The rates of transition between the compartments are governed by a system of differential equations, which, once solved for a sample (theoretically representing a homogeneous population), has two use cases: (1) deriving epidemiological parameters such as the basic reproduction number (R0) [4], which helps with evaluating the efficacy of various interventions implemented in different populations; and (2) forecasting the future incidence of the disease using the calibrated parameters. While SIR-derived models provide a framework for interpretable epidemiological parameters and for setting retroactive policy (use case 1), their forecasting skill (use case 2) has been outperformed by deep learning (DL) algorithms in several studies, including one of ours [5,6,7,8]. Reliable forecasting is essential for setting proactive policy.
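As a minimal sketch of the two use cases, the standard SIR equations (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I) can be integrated numerically as shown below. The function name, the forward Euler scheme, and the parameter values (beta = 0.3, gamma = 0.1, a population of 10,000) are illustrative assumptions and are not taken from any Forecast Hub model; a calibrated model would estimate beta and gamma from observed data.

```python
def simulate_sir(beta, gamma, susceptible, infected, recovered, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta is the transmission rate and gamma the recovery rate (per day).
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    population = susceptible + infected + recovered
    s, i, r = float(susceptible), float(infected), float(recovered)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameter values, chosen only for illustration.
beta, gamma = 0.3, 0.1
trajectory = simulate_sir(beta, gamma, susceptible=9990, infected=10,
                          recovered=0, days=160)

# Use case 1: an interpretable parameter derived from the rates.
basic_reproduction_number = beta / gamma

# Use case 2: a forecast, here the day on which infections peak.
peak_day = max(range(len(trajectory)), key=lambda day: trajectory[day][1])

print(f"R0 = {basic_reproduction_number:.1f}, infections peak on day {peak_day}")
```

Note that a single pair of rates describes the whole population, which is the homogeneity assumption discussed in this commentary.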

In practice, the spatial heterogeneity and temporal changes of population and disease characteristics pose challenges to forecasting with SIR models, which need to be recalibrated (or mathematically modified) to incorporate temporally dynamic (or new) variables such as vaccination rates, social distancing, masking, or waning immunity of the population. Even though these modifications are technically feasible, they require consistent surveillance and secondary data collection in each jurisdiction (e.g., masking adherence levels) in addition to expert monitoring and evaluation, and even then may not improve on simple SIR models in practice [9].

Deep Learning

Deep learning (DL), a family of statistical learning models under artificial intelligence, provides a more flexible structure because it learns optimal values for its parameters from the patterns embedded directly in the primary observed data, i.e., reported case or hospitalization incidence in each jurisdiction. Furthermore, some DL algorithms, such as long short-term memory (LSTM) networks [10] or graph neural networks (GNN) [11], have been developed specifically for capturing complex nonlinear relationships across changes in space and/or time [12, 13]. For disease forecasting, this means that a forecast incorporates both previous (historical) observations in the geographic area of interest (temporal data) and previous observations in all connected areas (spatiotemporal data) [14]. This approach inherently captures the spatiotemporally dynamic characteristics of the disease and population, such as waning immunity over time, vaccination rates, levels of physical interaction, or the results of other NPI measures. DL methods also easily accommodate ancillary variables, which might quantify phenomena such as population movement, air temperature, and socio-demographic characteristics (e.g., age structure). In addition to modeling space and time, DL methods are capable of modeling feedback loops as captured in observed data; e.g., an increase in human movement and mobility leads to more new cases, and conversely, an increase in new cases may lead individuals to voluntarily stay at home [6].
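To make the distinction between temporal and spatiotemporal inputs concrete, the sketch below assembles training samples in which each input combines an area's own recent history with the average history of its connected areas. It is only an illustration under stated assumptions: the function name, the three toy areas, their counts, and the adjacency lists are all hypothetical, every series is assumed to have the same length, and the resulting samples would still need to be fed to an LSTM, GNN, or other model of choice.

```python
def build_samples(cases, neighbors, lookback, horizon):
    """Build (area, features, target) samples from per-area time series.

    cases:     dict mapping area -> list of counts, all of equal length
    neighbors: dict mapping area -> list of connected areas
    lookback:  number of past time steps used as input
    horizon:   how many steps ahead the target lies (1 = next step)
    """
    samples = []
    for area, series in cases.items():
        for t in range(lookback, len(series) - horizon + 1):
            window = range(t - lookback, t)
            # Temporal part: the area's own recent observations.
            own_history = [series[k] for k in window]
            # Spatial part: mean of the connected areas over the same window.
            connected = neighbors.get(area, [])
            neighbor_history = [
                sum(cases[nb][k] for nb in connected) / len(connected)
                if connected else 0.0
                for k in window
            ]
            target = series[t + horizon - 1]
            samples.append((area, own_history + neighbor_history, target))
    return samples


# Hypothetical weekly counts for three areas and their connections.
cases = {
    "A": [1, 2, 4, 8, 16, 32],
    "B": [0, 1, 2, 4, 8, 16],
    "C": [5, 5, 5, 5, 5, 5],
}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

samples = build_samples(cases, neighbors, lookback=3, horizon=1)
for area, features, target in samples[:3]:
    print(area, features, "->", target)
```

Ancillary variables such as mobility or temperature could be appended to each feature list in the same way, which is one reason this input structure is flexible.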

DL has revolutionized many fields, such as weather prediction and medical diagnosis, and it now presents immense potential for improving disease forecasting. Today’s success of DL methods is partly due to the increase in data collection, in this case data reported by healthcare centers. Although DL is data intensive, even before the first peak in the USA in the fall of 2020 (when fewer time series of COVID-19 incidence had been recorded in the USA), DL-based methods generated more accurate forecasts than SIR-based models [5,6,7]. Nevertheless, how much data each family of models requires (especially once a new variant emerges) is an open research question. Another open research question is whether transfer learning, a well-established approach in which, for example, a DL model is trained on one disease or variant and incrementally updated on a new one, is a promising approach [15].

More Fundamental Work for Actionable Science

Devising more effective ways of quantifying interactions and regional connectivity between spatial units remains a promising area for investigation. Aggregate and anonymized cell phone movement data (which are not available in every country) or Facebook’s Social Connectedness Index (available in more than 35 countries) can help create predictive variables that capture the influence of connected counties on the increase of cases in a specific county [16]. With close to 3 billion users worldwide, the Facebook Social Connectedness Index is one of the most representative graphs available for approximating social ties (and other derivatives) among populations at different geographic levels (e.g., counties, states, or countries). Recent studies have demonstrated that the Social Connectedness Index can be used to characterize social capital [17] and can be turned into one of the most predictive variables of upward economic mobility [18]. Analyses like these were not possible until recently because representative, large-scale social network data were lacking. Incorporating social ties (as reflected in the Facebook Social Connectedness Index) across geographic boundaries into spatial weight formulas or more advanced neural network architectures may prove effective in capturing the spatial dependence between jurisdictions, resulting in more reliable forecasts. To what degree such spatial models can improve forecasts in data-rich and data-sparse scenarios is an open question for future research.
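One simple way to turn pairwise connectedness into a predictive variable is a row-normalized spatial weight matrix and the resulting spatial lag, i.e., the connectedness-weighted average of cases in the other areas. The sketch below illustrates that idea only; the function names, the three areas, and the connectedness and case values are hypothetical and are not drawn from the actual index or from the cited studies.

```python
def row_normalize(connectedness):
    """Convert pairwise connectedness scores into weights summing to 1 per row.

    connectedness: dict mapping origin -> {destination: score}.
    Self-connections are ignored so that the lag reflects other areas only.
    """
    weights = {}
    for origin, row in connectedness.items():
        others = {dest: score for dest, score in row.items() if dest != origin}
        total = sum(others.values())
        weights[origin] = {
            dest: (score / total if total > 0 else 0.0)
            for dest, score in others.items()
        }
    return weights


def spatial_lag(weights, values):
    """Weighted average of the values observed in each area's connected areas."""
    return {
        origin: sum(weight * values[dest] for dest, weight in row.items())
        for origin, row in weights.items()
    }


# Hypothetical connectedness scores and case counts for three counties.
connectedness = {
    "A": {"B": 30.0, "C": 10.0},
    "B": {"A": 30.0, "C": 20.0},
    "C": {"A": 10.0, "B": 20.0},
}
new_cases = {"A": 100, "B": 40, "C": 0}

weights = row_normalize(connectedness)
lagged_cases = spatial_lag(weights, new_cases)
print(lagged_cases)
```

In this toy example, county C reports no cases of its own but has a high lagged value because it is tied to counties A and B, which is the kind of signal a spatially aware model could use.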

Another worthwhile area of focus is devising hybrid models that integrate DL (and thus its forecasting skill) with SIR to preserve explainable parameters such as the reproduction number [19, 20]. Limited interpretability is generally an obstacle to the adoption of DL algorithms, and more transdisciplinary innovation in this space is justified to achieve interpretability [21,22,23].

SIR models have provided epidemiological parameters for characterizing infectious diseases. However, DL algorithms exhibit state-of-the-art forecasting skill, the key enabling factor for setting proactive policy. The high potential of deep learning for forecasting the spread and burden of infectious diseases in general, and COVID-19 in particular, needs urgent attention. Given that SARS-CoV-2 is here to stay and that renewed surges are likely, there is a need for interdisciplinary collaboration and investment in this family of models, aimed at answering several impactful and policy-relevant research questions, to arm decision-makers with timely predictions for crafting advanced public policy and investment. We call on the scientific community to consider innovation crossing epidemiological modeling and deep learning.