Emerging infectious diseases are among the most destructive and costly natural forces. As new diseases emerge and old diseases re-emerge, and as pathogens and their vectors are transported worldwide through trade and travel, it is time to improve global warning systems for emerging infectious diseases in general (1). The best countermeasure is an early warning that gives affected regions or communities more time to prepare for the impact. Anticipating and responding to disease risk requires interpreting disease events, such as outbreaks and epidemics, as emergent properties of a complex system and gathering infectious disease intelligence from that system. Producing intelligence involves identifying actionable and biologically meaningful patterns in data, developing predictions about future risk and epidemic trajectories, and characterizing possible losses under a range of intervention scenarios. Infectious disease intelligence therefore relies fundamentally on data from multiple sources, which together provide a stream of information that can be examined with modeling and real-time analytics to inform decisions about prevention, surveillance, or emergency responses to outbreaks.
During disease emergencies, the major goals are quick containment and damage control in the human population. To achieve these objectives, forecasting and scenario analysis must focus on estimating the amount of control or containment effort (such as the number of treatment centers, the spatial and temporal extent of quarantine, or the mobilization of existing vaccines) needed to achieve a desired outcome, which might be measured in deaths averted, reduction in disability-adjusted life years (DALYs), or some other societal value.
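As a minimal illustration of estimating how much control effort is needed, consider the textbook homogeneous-mixing result for vaccination. If each case infects R0 others in a fully susceptible population and a vaccine with efficacy e is given to a fraction p of the population, the effective reproduction number is roughly R0(1 - p·e), so transmission is expected to decline once p exceeds (1 - 1/R0)/e. The sketch below assumes this simple model; the function name and the example values are hypothetical, and real coverage targets also depend on contact structure, waning immunity, and vaccine type.

```python
import math


def critical_coverage(r0: float, efficacy: float) -> float:
    """Minimal vaccinated fraction that brings R_eff = r0 * (1 - p * efficacy) to 1.

    Assumes homogeneous mixing and an all-or-nothing vaccine (a hypothetical
    simplification). Returns math.inf when even full coverage is not enough.
    """
    if r0 <= 1:
        return 0.0  # the outbreak is already expected to die out
    if not 0 < efficacy <= 1:
        raise ValueError("efficacy must be in (0, 1]")
    needed = (1 - 1 / r0) / efficacy
    return needed if needed <= 1 else math.inf


# Hypothetical example: R0 = 2.5 and a vaccine with 90% efficacy
print(round(critical_coverage(2.5, 0.9), 3))
```

Returning `math.inf` rather than a value above 1 makes it explicit that vaccination alone cannot meet the target, so other measures from the list above (treatment centers, quarantine) would have to contribute.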
The goal of forecasting is to predict the short-term trajectory of a given situation. Data streams essential to improving forecasts include real-time case counts with location data; results of outbreak investigations; genetic sequences of viral or bacterial isolates, which can be used to estimate both the evolutionary potential of the pathogen and the actual case burden; and records of actions taken, such as school closures, quarantines, or deployments. Combined, these data can be used to triangulate the current status and trajectory of an evolving epidemic.
Scenario analysis, in contrast, does not aim to make quantitative predictions but explores the possible medium- or long-term outcomes of the available courses of action (see Sidebar A). To provide useful guidance, modelers need information about infrastructure and equipment, such as transportation networks and the locations and capabilities of hospitals and laboratories; about available technologies, including diagnostic tests, instruments, and vaccines; and about supply chains. To predict the potential effectiveness of interventions, it is important to know their expected efficacy. Effectiveness is further modulated by individual behaviors, which are in turn shaped by factors such as education about protective measures or government policies, and these factors may have unintended side effects. Most approaches to modeling epidemics fall into one of two camps. Highly abstract models may elegantly illuminate the underlying principles governing disease dynamics but lack the flexibility to represent idiosyncratic conditions on the ground. Detailed “tactical models” characterize the most likely outcomes but may give a false sense of precision, particularly when data are scarce.
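Scenario analysis can be sketched with a textbook SIR (susceptible-infectious-recovered) model in which an intervention reduces transmission by its nominal efficacy multiplied by the fraction of people who actually adhere to it, reflecting the point that behavior modulates effectiveness. The model choice, parameter values, and scenario names below are hypothetical assumptions, and the output is meant as a comparison of outcomes across courses of action rather than a quantitative prediction.

```python
def sir_attack_rate(r0: float, infectious_days: float, reduction: float,
                    days: int = 365, i0: float = 1e-4, dt: float = 0.1) -> float:
    """Fraction of the population infected within `days` in a basic SIR model.

    The transmission rate beta = r0 / infectious_days is scaled by
    (1 - reduction) for the whole run. Integrated with forward Euler steps of
    size dt; the result includes the initially infected fraction i0.
    """
    gamma = 1.0 / infectious_days
    beta = r0 * gamma * (1.0 - reduction)
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(round(days / dt)):
        new_inf = beta * s * i * dt   # new infections in this step
        new_rec = gamma * i * dt      # new recoveries in this step
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return 1.0 - s  # everyone who has left S was infected at some point


# Hypothetical scenarios: nominal intervention efficacy times adherence
scenarios = {
    "no intervention": 0.0,
    "60% efficacy, 50% adherence": 0.6 * 0.5,
    "60% efficacy, 90% adherence": 0.6 * 0.9,
}
for name, reduction in scenarios.items():
    print(f"{name}: {sir_attack_rate(2.5, 5.0, reduction):.2f}")
```

With the lowest effective transmission, the epidemic may not have run its course within the simulated year, which illustrates how scenario results depend on the time horizon chosen as well as on the assumed adherence.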
Modeling and analytics are key to generating infectious disease intelligence, but models are not a panacea. One should bear in mind what we might call the First Law of Information: there is no information without data. This version of you-can’t-get-something-from-nothing states that models cannot make up for ignorance. Modeling is not a magical bridge across an information gap. The hard limits to forecasting are set by the volume and quality of basic scientific information.
Big Data and machine learning have generated a plethora of methodologies that are useful for infectious disease intelligence. For instance, statistical learning algorithms can identify patterns in datasets and detect anomalies. Decision support analysis incorporates the models identified by such algorithms into an organizational or policy-making decision process that links possible actions to empirical outcomes, such as deaths averted. Mathematical models of social, epidemiologic, and evolutionary dynamics are useful for explaining how individual decisions (such as seeking hospital care) and events (such as transmission) “scale up” to generate emergent phenomena at the population level. Such models are often too simplistic for tactical use but may serve as the core of more complex simulations. Simulation models may then be used prospectively for short-term forecasting (predicting the number of cases in the next one to four weeks) or for long-term scenario analysis. Simulation models can also be used inversely to test hypotheses about the underlying biological mechanisms or to evaluate the plausibility of alternative theories. The answers generated by modeling and analytics for surveillance (watch), response (warning), and intervention (emergency) may be imprecise, but they can be improved.
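The anomaly-detection idea can be illustrated with a rule far simpler than a statistical learning algorithm: flag a day when its count exceeds the mean of the preceding days plus a multiple of their standard deviation. The sketch below is a hypothetical stand-in chosen for clarity, not a specific published surveillance method; the window length, the multiplier `k`, the floor on the standard deviation, and the counts are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts: list[float], window: int = 7, k: float = 3.0) -> list[int]:
    """Return indices whose count exceeds baseline mean + k * baseline sd.

    The baseline for day t is the preceding `window` days. The sd is floored
    at 1.0 (a hypothetical choice) so that a nearly constant baseline does not
    flag every small fluctuation.
    """
    flagged = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        sd = max(stdev(baseline), 1.0)
        if counts[t] > mean(baseline) + k * sd:
            flagged.append(t)
    return flagged


# Hypothetical daily syndromic counts with a spike on day 10
daily = [5, 6, 4, 5, 7, 5, 6, 5, 6, 5, 19, 6, 5]
print(flag_anomalies(daily))
```

Once a spike enters the baseline window, it inflates the threshold for the following days; a common refinement is to leave a gap ("guard band") between the baseline window and the day being tested.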
Looking ahead, an ambitious question is how we could use recent advances in shared platforms to facilitate data use for focused infectious disease epidemiology queries. Two major challenges related to applying Big Data analytics to disease control need to be considered:
- Diverse data challenges: how to control, manage, and manipulate the multiple data sources relating to infectious disease epidemiology, accounting for both known and uncertain disease characteristics and drawing on recent advances in data management methods.
- Methodological challenges: how best to integrate advanced data-mining, disease modeling, and syndromic surveillance methods to maximize our understanding of emerging and re-emerging diseases based on multiple data sources.
Several key limitations and concerns in using AI models to enhance decision-making for disease preparedness should also be taken into consideration. First, every infectious disease exhibits unique natural characteristics (for instance, transmission route, infectivity, and incubation period). Second, our understanding of a newly emerging infectious disease may be limited in the early phase of an outbreak. Third, disease-related data may reside in different formats, which would require substantial information extraction efforts. AI algorithms may therefore require specific calibration to disease-specific scenarios. In other words, a one-size-fits-all approach may not be applicable across all disease contexts.
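Why calibration must be disease-specific can be made concrete with a single relation. Under a basic SIR model (exponentially distributed infectious period, no latent period), the early exponential growth rate r satisfies r = (R0 - 1)/D, where D is the mean infectious period, so R0 = 1 + r·D. Two diseases with identical early case curves can therefore imply very different transmission potential. The disease labels and parameter values below are hypothetical, and models with a latent period or a different generation-interval distribution give different formulas, which is itself an instance of the calibration problem.

```python
import math


def r0_from_growth(growth_rate_per_day: float, infectious_days: float) -> float:
    """R0 implied by an early exponential growth rate under a basic SIR model.

    Uses r = gamma * (R0 - 1) with gamma = 1 / infectious_days, i.e.
    R0 = 1 + r * D. Valid only for that model structure (an assumption).
    """
    return 1.0 + growth_rate_per_day * infectious_days


# Same observed doubling time (7 days) for two hypothetical diseases
r = math.log(2) / 7  # growth rate implied by the doubling time
for name, d in {"disease A (D = 3 days)": 3.0, "disease B (D = 10 days)": 10.0}.items():
    print(f"{name}: R0 ~ {r0_from_growth(r, d):.2f}")
```

The same observed growth thus translates into different control requirements depending on a disease characteristic that may be poorly known early in an outbreak.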
(1) Barbara A Han & John M Drake, Future directions in analytics for infectious disease intelligence, EMBO Reports (2016).