
14.1 Introduction

During the last few decades, emerging infectious diseases have become an increasingly important global public health problem and a major cause of morbidity and mortality. Emerging infectious diseases are characterized by a rapid increase in incidence or geographical range [17]. Examples of such outbreaks are Severe Acute Respiratory Syndrome (SARS), which originated in Asia in 2003, avian influenza H5N1, and the 2009 H1N1 pandemic [17]. Effective surveillance systems that provide early warning of outbreaks are crucial. Traditional surveillance systems rely on laboratory identification of the pathogen responsible for the disease. Such systems are passive by nature, since they require a bottom-up process: clinicians identify a possible infectious disease and report it to the appropriate authorities, the disease is confirmed by laboratory tests, and the authorities disseminate the information.

Another prominent limitation of traditional surveillance systems is that they cannot detect epidemics in real time. This limitation drove the development of new surveillance systems. One impetus was the adoption of the revised World Health Organization (WHO) International Health Regulations (IHR) of 2005, which required national capability for surveillance and reporting of both familiar and previously unfamiliar infectious diseases [25]. As a result, digital surveillance, that is, digital systems for the detection of infectious disease epidemics, has evolved dramatically. Morse has defined “digital disease detection” as “the use of the internet and computer technologies for collecting and processing health information, including outbreak reports and surveillance data” [17].

Digital systems can be classified into two types: formal and informal. Both types are based on syndromes rather than on laboratory identification of the pathogen, and do not require laboratory confirmation [17]. Formal digital systems are based on data arriving from formal organizations such as hospitals, healthcare clinics, and health agencies, whereas informal digital systems are based on data collected through media sources such as news reports on the Internet, mailing lists, and RSS (Really Simple Syndication) feeds, as well as data collected through official sources. Informal digital systems are characterized by their ability to mine, categorize, filter, and visualize online information regarding epidemics [4]. They exploit the easy access to online information, as well as freely available mapping technology, to produce globally available information on ongoing infectious diseases that may not be captured through traditional surveillance and may be useful to governments and health agencies [4]. These systems are designed to function during all phases of a disease outbreak and are intended to increase sensitivity and timeliness. However, the role of such systems before, during and after infectious disease epidemics, and in particular whether they are currently capable of early detection of epidemics, remains unclear.

14.2 Methods

A literature review was carried out to compare informal digital systems with regard to their sources of information, the manner in which they process and disseminate the information, their role in each phase of an epidemic, and whether and to what extent they are capable of early detection of epidemics. The systems evaluated were ProMED-mail, the Global Public Health Intelligence Network (GPHIN), HealthMap, MedISys, EpiSPIDER, BioCaster, the H5N1 Google Earth mashup, the Avian Influenza Daily Digest and Blog, Google Flu Trends and Argus.

14.3 Results

14.3.1 Description of the Systems

14.3.1.1 ProMED-mail

ProMED-mail is “an internet based reporting system aimed at rapidly disseminating information on infectious disease outbreaks and acute exposures to toxins that affect human health, including those in animals and in plants grown for food or animal feed” [21] (ProMED-mail website). ProMED-mail receives information from a number of sources, such as media reports, official reports, online summaries and local observers. The reports are reviewed and investigated by the ProMED-mail expert team, then distributed by e-mail to ProMED subscribers and published on the ProMED-mail website (ProMED-mail website). In addition to filtering the received information, the expert team may also add related information from media, government and other sources [23]. ProMED-mail proved to be an efficient system during the 2003 SARS outbreak, when information about outbreak locations, including additional information from a British Medical Journal article, was efficiently disseminated [23]. It should be stressed that ProMED-mail collects, filters, disseminates and archives information. It does not carry out formal analysis of the information, although it provides some evaluation.

14.3.1.2 Global Public Health Intelligence Network (GPHIN)

The Global Public Health Intelligence Network (GPHIN) is a biosurveillance system developed by Health Canada in collaboration with the WHO. GPHIN receives as input information about disease outbreaks from news services, ProMED-mail, electronic discussion groups and selected websites, and disseminates information to subscribers using the following decision algorithm. A relevance score is computed for each information item, and two thresholds, a high one and a low one, are set. If the item's relevance score is greater than the high threshold, the item is immediately disseminated to subscribers. If the score is lower than the low threshold, the item is automatically “trashed”. Otherwise (if the score lies between the low and high thresholds), the item undergoes human analysis and is then disseminated to subscribers [23].
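The two-threshold decision rule can be sketched as follows. This is a minimal illustration, not GPHIN's implementation: how the relevance score is computed is not described here, so the score is taken as a given number, and the threshold values (0.3 and 0.8 on a hypothetical 0 to 1 scale) and the handling of scores exactly equal to a threshold are assumptions.

```python
from enum import Enum


class Action(Enum):
    DISSEMINATE = "disseminate immediately"
    TRASH = "discard automatically"
    HUMAN_REVIEW = "send to analyst, then disseminate"


def triage(relevance: float, low: float, high: float) -> Action:
    """Route one item with a two-threshold rule.

    Scores strictly above `high` are disseminated, scores strictly below
    `low` are discarded, and everything in between (including scores equal
    to either threshold, an assumption) goes to a human analyst.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if relevance > high:
        return Action.DISSEMINATE
    if relevance < low:
        return Action.TRASH
    return Action.HUMAN_REVIEW


# Hypothetical thresholds on a 0-1 relevance scale.
LOW, HIGH = 0.3, 0.8
for score in (0.95, 0.10, 0.55):
    print(score, triage(score, LOW, HIGH).value)
```

The design keeps analysts for the uncertain middle band only, so the width of the band between the two thresholds directly trades analyst workload against the risk of automatic misrouting.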

A prominent limitation of GPHIN is that its efficiency depends on the time at which information about an outbreak or other event is published in one of its data sources. Nevertheless, GPHIN is considered efficient in providing earlier warning of events of interest to the international community compared with other systems, as 56% of the 578 outbreaks verified by WHO between July 1998 and August 2001 were initially picked up by GPHIN [23].

14.3.1.3 HealthMap

HealthMap is a freely accessible automated electronic information system aimed at facilitating knowledge management and early detection of infectious disease outbreaks by aggregating, extracting, categorizing, filtering and integrating reports on new and ongoing infectious disease outbreaks. Data on outbreaks are organized according to geography, time, and infectious disease agent [5].

HealthMap receives as input reports from a variety of electronic sources, including online news aggregated by websites such as Google News, reporting systems such as ProMED-mail, and validated official reports from organizations such as the WHO [5, 11]. HealthMap searches the Internet every hour, 24 h a day, to obtain the required information. Search criteria include disease names (scientific and common), symptoms, keywords, and phrases. After collecting the reports, HealthMap uses text mining algorithms to characterize them. Characterization includes the following stages: (1) Categorization: reports are categorized by disease and location, and their relevance is determined. (2) Clustering: similar reports are grouped together and exact duplicates are removed, based on the similarity of the reports’ headlines, body text, and disease and location categories. (3) Filtering: reports are reviewed and corrected by an analyst, and then filtered into five categories: breaking news, warning, old news, context, and not disease related.
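The clustering stage can be sketched in a few lines. HealthMap's actual text mining algorithms are not described here; this sketch assumes that "similarity" means Jaccard overlap of word sets (shared words divided by all distinct words), that reports must share both disease and location categories to be grouped, and a hypothetical similarity threshold of 0.5. The `Report` type and the example reports are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    headline: str
    body: str
    disease: str   # category assigned in the categorization stage
    location: str  # category assigned in the categorization stage


def tokens(text: str) -> set[str]:
    """Lower-case word set used as a crude representation of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two texts have in common (0 = none, 1 = same word set)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(reports: list[Report], threshold: float = 0.5) -> list[list[Report]]:
    """Remove exact duplicates, then greedily group similar reports.

    A report joins an existing cluster only if its disease and location
    categories equal those of the cluster's first report and its headline
    plus body text is at least `threshold` similar (Jaccard) to that report.
    """
    unique = list(dict.fromkeys(reports))  # drops exact duplicates, keeps order
    clusters: list[list[Report]] = []
    for rep in unique:
        words = tokens(rep.headline + " " + rep.body)
        for group in clusters:
            first = group[0]
            same_category = (rep.disease, rep.location) == (first.disease, first.location)
            if same_category and jaccard(words, tokens(first.headline + " " + first.body)) >= threshold:
                group.append(rep)
                break
        else:  # no similar cluster found: start a new one
            clusters.append([rep])
    return clusters


# Hypothetical reports, for illustration only.
a = Report("Cholera cases rise in Artibonite", "Officials report new cholera cases", "cholera", "Haiti")
b = Report("Cholera cases rise in Artibonite region", "Officials report new cholera cases", "cholera", "Haiti")
c = Report("Dengue alert issued", "Health ministry warns of dengue", "dengue", "Brazil")
for group in cluster([a, a, b, c]):
    print(len(group), group[0].headline)
```

Comparing only against each cluster's first report keeps the sketch simple; a production system would more likely use a more robust text representation and clustering method.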

To reduce information overload and to focus on disseminating information about high-impact outbreaks, only reports classified as breaking news are overlaid on an interactive geographic map on the HealthMap website [5].

Among the users of HealthMap are the WHO, the US Centers for Disease Control and Prevention, and the European Centre for Disease Prevention and Control, all of which use its information for surveillance activities [5, 11].

14.3.1.4 MedISys (Medical Information System)

The Medical Information System (MedISys) is an informal, automatic public health surveillance system, designed and operated by the Joint Research Centre (JRC) of the European Commission in cooperation with the Health Threat Unit at the European Union Directorate General for Health and Consumer Affairs and the University of Helsinki. MedISys collects its information from open-source news media, mainly articles from news pages. It categorizes the collected information according to predefined categories and disseminates it to subscribed users by e-mail. The system also provides its users with features and statistics on its website, including a world map on which event locations are highlighted, graphs of the aggregated news count for each geographic location, and the most significant event location for the last 25 h. The system collects information in 45 languages, and its website is available in 26 languages. Users can filter the information by language, disease and location, as well as by outbreaks, treatments and legislation. MedISys users can also assign articles to predefined categories, add comments and information to these articles, and disseminate them to user-defined groups [12].
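Two of the functions described above, assigning articles to predefined categories and aggregating the news count per location, can be sketched as follows. How MedISys actually defines its categories is not described here; this sketch assumes each category is a hypothetical set of keywords matched by plain substring search, and the articles are invented.

```python
from collections import Counter

# Hypothetical category definitions: an article belongs to a category if any
# of the category's keywords occurs in its text (plain substring match).
CATEGORIES = {
    "avian influenza": {"h5n1", "avian influenza", "bird flu"},
    "cholera": {"cholera"},
    "legislation": {"legislation", "regulation", "directive"},
}


def categorize(text: str) -> set[str]:
    """Return every predefined category whose keywords occur in the text."""
    lowered = text.lower()
    return {name for name, keywords in CATEGORIES.items()
            if any(keyword in lowered for keyword in keywords)}


def count_by_location(articles: list[tuple[str, str]], category: str) -> Counter:
    """Aggregate, per location, the number of articles in one category."""
    return Counter(location for text, location in articles
                   if category in categorize(text))


# Hypothetical (headline, location) pairs.
articles = [
    ("Bird flu confirmed on poultry farm", "Egypt"),
    ("New H5N1 cases reported", "Egypt"),
    ("Cholera outbreak spreads", "Haiti"),
    ("H5N1 detected in wild birds", "Romania"),
]
print(count_by_location(articles, "avian influenza"))
```

An article can fall into several categories at once, which is why `categorize` returns a set rather than a single label.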

14.3.1.5 Argus

Argus is an informal biosurveillance system aimed at detecting and monitoring biological events that may pose a global health threat to humans, plants and animals. The system is hosted at the Georgetown University Medical Center (Washington, DC, United States) and funded by the United States Government. Argus collects information in 40 native languages from media sources, including printed newspapers, electronic media, Internet-based newsletters and blogs, as well as from official sources (the World Health Organization (WHO) and the World Organisation for Animal Health (OIE)). The system uses Bayesian analysis tools to select and filter the collected articles. This process is carried out by about 40 regional professional analysts, who monitor several thousand internet sources daily. Using the Bayesian analysis tools, the analysts select reports from a dynamic database of media reports. Relevance is determined according to a specific set of terms and keywords applicable to infectious disease surveillance. After filtering, events that may indicate the initiation of an outbreak are disseminated to the system's users, as are events that may require investigation [12, 20].
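The chapter does not specify which Bayesian tools Argus uses. As one familiar example of Bayesian text filtering, the sketch below trains a two-class multinomial naive Bayes classifier (with add-one smoothing) on a handful of invented, labeled snippets and then scores new text as relevant or not. The training texts and the `NaiveBayesFilter` class are hypothetical.

```python
import math
import re
from collections import Counter


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text: str, relevant: bool) -> None:
        self.word_counts[relevant].update(words(text))
        self.doc_counts[relevant] += 1

    def log_posterior(self, text: str, relevant: bool) -> float:
        """log P(class) + sum of log P(word | class), up to a shared constant."""
        if 0 in self.doc_counts.values():
            raise ValueError("train on at least one example of each class")
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        class_words = sum(self.word_counts[relevant].values())
        score = math.log(self.doc_counts[relevant] / sum(self.doc_counts.values()))
        for w in words(text):
            count = self.word_counts[relevant][w]
            score += math.log((count + 1) / (class_words + vocab_size))
        return score

    def is_relevant(self, text: str) -> bool:
        return self.log_posterior(text, True) > self.log_posterior(text, False)


# Invented training snippets, for illustration only.
f = NaiveBayesFilter()
f.train("unusual cluster of severe respiratory illness reported in hospital", True)
f.train("outbreak of fever and deaths in village health officials investigate", True)
f.train("football team wins championship final", False)
f.train("stock market rises after earnings report", False)

print(f.is_relevant("Hospital reports cluster of respiratory illness and fever"))
print(f.is_relevant("Championship final tickets on sale"))
```

In practice such a filter only ranks or pre-sorts reports; the text above makes clear that in Argus the final selection is made by analysts.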

14.3.1.6 BioCaster

BioCaster is an informal surveillance system aimed at collecting information on disease outbreaks, filtering the information, and disseminating it to users. The system is part of a research project developed and managed by the National Institute of Informatics in Japan, which involves five institutes in three countries. BioCaster focuses mainly on the Asia-Pacific region. The system collects information through Really Simple Syndication (RSS) feeds from more than 1700 sources, mainly Google News, Yahoo! News, and the European Media Monitor. The information is filtered and disseminated in a fully automated manner, with no human analysis at any stage. Filtered information (about 90 articles per day) is published in three languages (English, Japanese and Vietnamese). Articles are processed and disseminated every hour. In addition, BioCaster has created an ontology covering approximately 117 infectious diseases and six syndromes. The ontology is produced in eight languages (English, Japanese, Vietnamese, Chinese, Thai, Korean, Spanish and French) and is used as an input to the Global Health Monitor web portal, which offers its users maps and graphs of health-related events [12].
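The role of a multilingual ontology, mapping terms in different languages to the same language-independent disease concept so that reports written in different languages can be grouped together, can be sketched as a small lookup table. The entries, the syndrome labels and the `concepts_in` helper below are hypothetical illustrations, not BioCaster's actual ontology or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    disease: str
    syndrome: str  # hypothetical syndrome label


INFLUENZA = Concept("influenza", "respiratory")
CHOLERA = Concept("cholera", "gastrointestinal")

# Hypothetical ontology fragment: surface terms in several languages that all
# point to the same language-independent concept.
TERMS = {
    "influenza": INFLUENZA,       # English
    "インフルエンザ": INFLUENZA,  # Japanese
    "cúm": INFLUENZA,             # Vietnamese
    "流感": INFLUENZA,            # Chinese
    "cholera": CHOLERA,           # English
    "コレラ": CHOLERA,            # Japanese
    "霍乱": CHOLERA,              # Chinese
}


def concepts_in(text: str) -> set[Concept]:
    """Map every ontology term found in the text to its concept.

    Plain substring matching is used because Japanese and Chinese text has no
    spaces between words; a real system would need proper tokenization.
    """
    lowered = text.lower()
    return {concept for term, concept in TERMS.items() if term in lowered}


print(concepts_in("東京でインフルエンザの患者が増加"))
print(concepts_in("Cholera and influenza reported"))
```

Because reports are reduced to shared concepts, a Japanese and an English report on the same disease can be counted together on one map or graph.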

14.3.1.7 EpiSPIDER

The Semantic Processing and Integration of Distributed Electronic Resources for Epidemiology (EpiSPIDER) is a web-based tool that integrates information gathered from electronic media resources containing health information, as well as from informal surveillance systems such as ProMED-mail, with the aim of enhancing the surveillance of infectious disease outbreaks. EpiSPIDER takes as input ProMED-mail reports, as well as health news sources that provide RSS feeds. Using natural language processing, it extracts location information from the input sources and geocodes it using the Yahoo and Google geocoding services. After a filtering process, the system generates daily summaries of ProMED reports, which are available on the EpiSPIDER website [13].
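The extract-then-geocode step can be sketched as below. EpiSPIDER used natural language processing and external geocoding services; this sketch replaces both with a hypothetical gazetteer (a small dictionary of place names and approximate coordinates), so it only recognizes single-word names already in its list.

```python
import re

# Hypothetical gazetteer standing in for an external geocoding service:
# lower-case place name -> approximate (latitude, longitude) in decimal degrees.
GAZETTEER = {
    "hanoi": (21.03, 105.85),
    "jakarta": (-6.21, 106.85),
    "cairo": (30.04, 31.24),
}


def extract_locations(text: str) -> list[str]:
    """Return known single-word place names in order of first appearance."""
    found: list[str] = []
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if key in GAZETTEER and key not in found:
            found.append(key)
    return found


def geocode(text: str) -> list[tuple[str, float, float]]:
    """Attach coordinates to every place name extracted from the text."""
    return [(place, *GAZETTEER[place]) for place in extract_locations(text)]


print(geocode("New H5N1 cases in Jakarta; Cairo reports none; Hanoi on alert"))
```

A real pipeline would also need to resolve ambiguous names (many towns share a name) and multi-word places, which is where language processing and full geocoding services come in.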

14.3.1.8 H5N1 Google Earth Mashup

Google Earth combines satellite images, aerial photography and map data to create a 3D interactive template of the world. Anyone can use this template to add and share information about any subject that involves geographical elements. Nature, the international weekly journal of science, uses Google Earth to track the spread of the H5N1 avian flu virus around the globe and to present a geographic visualization of that spread [19] (Nature website).
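Google Earth reads geographic annotations written in KML, an XML-based format. The sketch below turns one outbreak report into a KML placemark using only Python's standard library. The report text and coordinates are hypothetical, and the chapter does not describe how the Nature mashup was actually built.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize without an "ns0:" prefix


def tag(name: str) -> str:
    """Qualify a KML element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def outbreak_placemark(name: str, description: str, lat: float, lon: float) -> str:
    """Return a KML document containing one point placemark."""
    kml = ET.Element(tag("kml"))
    placemark = ET.SubElement(kml, tag("Placemark"))
    ET.SubElement(placemark, tag("name")).text = name
    ET.SubElement(placemark, tag("description")).text = description
    point = ET.SubElement(placemark, tag("Point"))
    # KML coordinates are written longitude first, then latitude.
    ET.SubElement(point, tag("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = outbreak_placemark("H5N1 report (hypothetical)", "Poultry outbreak reported", lat=-6.21, lon=106.85)
print(kml_text)
```

Saving the output to a `.kml` file and opening it in Google Earth should show a single pin; a mashup tracking spread over time would add many such placemarks, possibly with time stamps.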

14.3.1.9 Avian Influenza Daily Digest

The Avian Influenza Daily Digest is produced by the United States government. The digest collects raw open-source content regarding avian influenza and disseminates it to subscribers without any processing. Users are encouraged to provide updates and/or clarifications, which are posted in subsequent issues of the digest [2].

14.3.1.10 Google Flu Trends

Google Flu Trends was designed by Google as a near real-time tool for detecting influenza outbreaks. It exploits the fact that millions of people worldwide search online for health-related information every day. The tool is based on the assumption that the number of people searching for influenza-related topics is associated with the number of people who actually have influenza symptoms; therefore, an unusual increase in the number of web searches for influenza-related topics may reflect an increase in influenza-like illness. Studies performed by Google and Yahoo have shown that plotting data on searches using influenza-related keywords produced an epidemic curve that closely matched the one generated by traditional influenza surveillance [4]. Google Flu Trends analyzes a fraction of all Google searches over a period of time and extrapolates from it to estimate the total search volume. The information is displayed in a graph called the “search volume index graph”. The tool’s designers claim that, based on testing, it can detect influenza outbreaks 7–10 days before they are detected by conventional CDC surveillance [4].
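Two ideas from this paragraph, agreement between search volume and conventional surveillance and flagging an "unusual increase" in searches, can be illustrated with a short sketch. It is not Google's model: the weekly series are invented, and the alert rule (baseline mean plus two sample standard deviations) is an assumption chosen for simplicity.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def unusual_weeks(series: list[float], baseline_weeks: int, k: float = 2.0) -> list[int]:
    """Indices of post-baseline weeks whose value exceeds mean + k * sd of the baseline."""
    base = series[:baseline_weeks]
    if len(base) < 2:
        raise ValueError("baseline needs at least two weeks")
    mean = sum(base) / len(base)
    sd = math.sqrt(sum((v - mean) ** 2 for v in base) / (len(base) - 1))
    threshold = mean + k * sd
    return [i for i, v in enumerate(series) if i >= baseline_weeks and v > threshold]


# Invented weekly data: flu-related queries per 100,000 searches, and the
# percentage of outpatient visits for influenza-like illness (ILI).
queries = [10, 11, 10, 12, 11, 18, 25, 31]
ili = [1.0, 1.1, 1.0, 1.2, 1.1, 1.6, 2.4, 3.0]
print(round(pearson(queries, ili), 3))
print(unusual_weeks(queries, baseline_weeks=5))
```

A high correlation over past seasons is what retrospective validations measure; the alert function shows why search data can, in principle, signal a rise as soon as the week's queries are counted, before clinical reports are compiled.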

14.3.2 Comparison Between Systems

All the studied digital resources use similar sources of data: official reports, as well as media reports, including global media resources, news aggregators, eyewitness reports, internet-based newsletters and blogs. However, they use different algorithms to create their output, cover different geographic areas, and filter and analyze the information differently, so their outputs may differ. They therefore complement each other with respect to information completeness.

14.3.3 The Role of Informal Digital Systems in Each Phase of the Epidemic

14.3.3.1 Before the Epidemic (Early Detection)

Retrospective studies of some systems have shown a theoretical decrease in the time to outbreak detection compared with conventional surveillance. However, evidence of such an ability in real time is sparse and unclear. Chan et al. [8] analyzed the average interval between the estimated start of an outbreak and the earliest dates of its discovery and publication, using WHO-confirmed outbreak reports as well as ProMED-mail, GPHIN and HealthMap reports. The analysis showed a decrease in these intervals over 14 years, which was partially attributed to the emergence of informal digital resources [8]. A retrospective study of Argus reports on respiratory disease in Mexico showed a significant increase in reporting frequency during the 2008–2009 influenza season relative to the 2007–2008 season. Based on these retrospective results, the authors suggest that respiratory disease was prevalent in Mexico and reported as unusual much earlier than the H1N1 pandemic virus was formally identified. However, the connection between these reports and the 2009 pandemic is unclear [20].

The Google Flu Trends tool was also tested retrospectively. According to retrospective testing, influenza epidemics can be detected with Google Flu Trends 7–10 days before they are detected by conventional surveillance [6, 7]; however, there is still no prospective evidence of such a capability. A retrospective study from China reported that Google Flu Trends search data are correlated with traditional surveillance methods [14]. Another retrospective study tested the real-time detection ability of six informal digital systems: Argus, BioCaster, GPHIN, HealthMap, MedISys and ProMED-mail. Data from these systems were used to detect epidemics and were compared with official data. The results suggested that all the tested systems showed, retrospectively, a real-time detection ability. Moreover, the combined capabilities of the systems provided better early detection than the systems individually [3].
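One simple way to see why combining systems can only help timeliness: if an event counts as detected as soon as any system reports it, the combined detection date is the earliest of the systems' first-report dates, which is never later than any single system's. The sketch assumes this "earliest report wins" rule, which is not necessarily how the study in [3] combined the systems; the system names and dates are hypothetical.

```python
from datetime import date
from typing import Optional


def earliest_detection(first_reports: dict[str, Optional[date]]) -> Optional[tuple[str, date]]:
    """Return the system with the earliest first report of an event, and that date.

    Systems that never reported the event are given as None and ignored.
    Returns None if no system reported the event.
    """
    reported = {system: day for system, day in first_reports.items() if day is not None}
    if not reported:
        return None
    system = min(reported, key=reported.__getitem__)
    return system, reported[system]


# Hypothetical first-report dates for one event in three unnamed systems.
first_reports = {
    "System A": date(2014, 3, 10),
    "System B": date(2014, 3, 7),
    "System C": None,
}
print(earliest_detection(first_reports))
```

The same rule also explains the completeness benefit mentioned earlier: an event missed by one system (System C here) is still detected if any other system reports it.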

In contrast with retrospective evidence, prospective evidence of the capability of informal digital systems for early detection of epidemics is sparse. Some epidemics have been claimed to be first reported by ProMED-mail, before they were officially reported by the WHO [15]. These reports proved reliable, since they were later confirmed by the WHO. However, most of them were first published by ProMED-mail not because the information was unavailable to the WHO at that time, but because the WHO was not authorized to publish it without confirmation [15]. The SARS outbreak in China (February 10, 2003) is the best-known outbreak first reported on ProMED-mail [17].

Real-time detection was also demonstrated by GPHIN during the SARS outbreak of 2002. GPHIN detected SARS and issued the first alert to the WHO more than 2 months before the outbreak was first published by the WHO [16, 18, 24]. However, the long interval between the GPHIN alert and the first WHO report implies that the GPHIN alert did not shorten the overall detection process.

Retrospective reviews of the polio outbreaks of 2013 and 2014 and the Ebola outbreak of 2014 showed that informal digital detection preceded official detection by an average of 14.6 days. For example, ProMED and GPHIN reported the 2013 polio epidemic in Cameroon 23 days after the outbreak began, whereas the official WHO report was published 51 days after the outbreak began [1]. However, detection by the digital systems did not contribute in real time to the overall process of outbreak detection and declaration. Hence, in real time, it did not constitute early detection.
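The lead time discussed above can be written explicitly. Here $t$ denotes days from the start of outbreak $i$; the average over $N$ reviewed outbreaks is the quantity reported as 14.6 days in [1] (assuming it was computed as a simple mean), and the Cameroon example gives:

```latex
\begin{aligned}
\Delta t_i &= t_{\text{official},i} - t_{\text{informal},i},
\qquad \overline{\Delta t} = \frac{1}{N}\sum_{i=1}^{N} \Delta t_i \\
\Delta t_{\text{Cameroon 2013}} &= 51 - 23 = 28 \text{ days}
\end{aligned}
```

A positive $\Delta t$ only measures how much earlier a report appeared; as argued above, it becomes early detection only if the report actually shortens the official detection and declaration process.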

14.3.3.2 During the Outbreak

There is evidence in the literature of the systems’ usefulness in communicating information during previous outbreaks to public health professionals, as well as to the general public. ProMED-mail and GPHIN played critical roles in updating public health officials about the SARS outbreak in 2002 [4]. Such systems can also provide officials, clinicians and the general public with guidance for medical decision making, including on the importance of vaccination and other preventive actions [4]. The first report on SARS, published by ProMED-mail on February 10, 2003, and the hundreds of subsequent ProMED-mail reports helped health professionals worldwide gather critical details about SARS, and thereby recognize SARS and discover its cause [15]. An assessment of the correlation between HealthMap reports and official government reports during the first 100 days of the 2010 Haitian cholera outbreak confirmed that data from informal digital systems were well correlated with data officially reported by the Haitian health authorities. Moreover, this study showed that informal digital systems can be used in the early stages of an outbreak not only as an indicator that an outbreak is occurring, but also as a predictive tool, by providing a reliable estimate of the reproductive number, a major epidemic parameter [9].
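How a case series can yield a reproductive number is sketched below. The method used in [9] is not described here; this sketch uses a standard textbook route instead: estimate the exponential growth rate r from the log of daily case counts by least squares, then convert it to R under a simple generation-interval assumption (R = exp(rT) for a fixed interval T, or R = 1 + rT for an exponentially distributed interval with mean T). The case counts and the 5-day generation time are synthetic, not cholera values.

```python
import math


def growth_rate(daily_cases: list[float]) -> float:
    """Exponential growth rate r (per day): least-squares slope of log(cases) on day.

    Days with zero cases are skipped, since their logarithm is undefined.
    """
    points = [(t, math.log(c)) for t, c in enumerate(daily_cases) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with positive counts")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den


def reproductive_number(r: float, generation_time: float, fixed_interval: bool = True) -> float:
    """Convert a growth rate to R under a simple generation-interval assumption.

    fixed_interval=True:  every generation lasts exactly T days, R = exp(r * T).
    fixed_interval=False: exponentially distributed interval with mean T, R = 1 + r * T.
    """
    if fixed_interval:
        return math.exp(r * generation_time)
    return 1 + r * generation_time


# Synthetic counts following exact exponential growth with r = 0.2 per day,
# and a hypothetical 5-day generation time.
cases = [10 * math.exp(0.2 * day) for day in range(10)]
r = growth_rate(cases)
print(round(r, 3), round(reproductive_number(r, 5), 2), round(reproductive_number(r, 5, False), 2))
```

The two conversions give different answers from the same growth rate, which is one reason estimates of R depend on assumptions about the disease's generation interval as well as on the quality of the case data.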

14.3.3.3 After the Outbreak

There is no evidence in the literature of the use of informal digital systems after an epidemic. Nevertheless, we believe that data collected during outbreaks through informal digital systems are being used by public health agencies to study the dynamics of the epidemic retrospectively and to draw conclusions about its management.

14.4 Discussion

There has been impressive progress in the development of informal digital systems for disease surveillance. Informal digital systems are widely used by the general public, as well as by health officials. A good example is GOARN (the Global Outbreak Alert and Response Network), developed by the WHO, which gathers information from a number of sources, both governmental and informal, including GPHIN and ProMED-mail [17].

One of the most prominent suggested advantages of digital systems is their role in early notification of infectious disease outbreaks, ahead of official notifications, and their contribution to the epidemiological investigation of a disease before official data are available. During epidemics, data gathered and disseminated by official public health authorities usually do not reach public health officials and policy makers for some time, sometimes because of political and logistic limitations. This period is critical for estimating the epidemic dynamics and implementing the response plan [9]. Unlike official data, data collected by digital systems are available in near real time and may be used for epidemiological assessment.

A prerequisite for using digital systems data in the epidemiological investigation of an outbreak is that the data be reliable and equivalent to official data. In other words, the number of cases derived from the informal data should match the number of cases officially reported by public health authorities. Our results include one such example: a correlation between digital systems data and official data in the first stages of an epidemic was confirmed for the data collected by HealthMap on the 2010 Haitian cholera outbreak. However, as the authors note, epidemiological measurements based on digital systems data should also be tested in other epidemics to confirm the method’s reliability [9]. The number of subscribers to digital systems increases every year [15], which makes these systems an efficient tool for spreading information globally, as well as a tool for epidemiological investigation that complements official data. However, despite their theoretical advantage over traditional surveillance, there is no evidence in the literature that information collected through digital systems has influenced public health policy makers.

Although we did not find evidence in the literature, we believe that digital systems may also contribute to the public health community after an outbreak ends. The abundance of reports collected and disseminated by these systems during outbreaks creates an epidemiological reservoir which, being available worldwide, may be used for post-pandemic investigation and for drawing conclusions.

As for early detection of infectious disease outbreaks, we did not find any prospective evidence that digital systems can detect infectious disease outbreaks in real time. Our results are consistent with the conclusions of other studies, which point out that digital systems are currently not capable of detecting an outbreak [17, 4]. Although there is evidence of informal digital systems publishing reports on outbreaks before official detection (such as in the polio outbreaks of 2013 and 2014 described in the Results section) [1], these reports did not actually affect the detection process. The formal process of detection includes receiving, processing and using the information. The early digital systems reports were not used in any of these phases and did not change the process. This is analogous to screening tests, which are effective only if they can change the natural history of a disease. Since there is so far no evidence that informal digital systems can change the “natural history of an outbreak”, they cannot be considered useful for early detection.

Informal digital systems may nevertheless have an important role in disease surveillance. Incorporating them into existing formal systems may improve their performance. A study in the United States showed that combining information gathered from informal digital systems with information from the Texas Influenza-Like-Illness Surveillance Network (ILINet) improved the ability to predict influenza hospitalizations [22]. Another study in the United States showed a good correlation between Google flu searches and emergency department visits for influenza-like illness [10].

Moreover, since digital sources usually contain data not captured through traditional methods, they are used by public health organizations, including the WHO's Global Outbreak Alert and Response Network, which uses digital sources for surveillance on a daily basis [4].

However, using digital systems as a surveillance tool has some limitations. First, most systems accumulate a huge mass of information on a large variety of diseases, making it difficult to extract critical information. In other words, the information is not integrated to yield actionable knowledge. The challenge is to present critical information clearly and concisely. Second, digital systems are less specific than traditional surveillance systems, mostly because of false alarms, misinformation and information based on rumors [9, 17]. Therefore, they should not be used on their own, but only as a complement to traditional surveillance systems [17]. A third limitation is the lack of a response system for early warnings. Without such a system, early warning is not useful, since no practical action follows publication of the information. Such a response system may include triggers and decision criteria that would lead to an appropriate and proportionate response to the threat [17].
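The effect of low specificity can be made concrete with the usual 2×2 table of alarms versus true outbreaks. The sketch below computes sensitivity, specificity and positive predictive value (the share of alarms that correspond to real outbreaks). What counts as one "alarm" or one "outbreak" (for example, a region-week) is an assumption that any real evaluation would have to define, and the counts are invented.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity and positive predictive value from a 2x2 table.

    tp: alarms raised when an outbreak was occurring
    fp: alarms raised when no outbreak was occurring (false alarms)
    fn: outbreaks missed
    tn: no alarm and no outbreak
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts for illustration only.
print(detection_metrics(tp=40, fp=60, fn=10, tn=890))
```

With these made-up numbers, sensitivity is 0.80 and specificity is about 0.94, yet only 40% of alarms correspond to real outbreaks. This is why even a modest false-alarm rate can make a system's warnings hard to act on without confirmation from traditional surveillance.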

To summarize, considerable effort and resources have been invested in the development of informal digital systems for the detection of infectious disease outbreaks. As a result, a new generation of informal digital systems has emerged. The most prominent advantage of such systems is their ability to report on an outbreak in near real time, that is, before the information is officially reported, so that public health decision makers can use it for epidemiological assessment and pandemic preparation. Currently, there is no evidence in the literature of their capability to detect an outbreak at its onset. In addition, there are no hard data demonstrating the benefits of using such systems before and during an outbreak. We do not believe that they can be used to identify early cases; rather, they should be used as a support system for describing the spread of the disease. The challenge is to empirically assess the efficiency of informal digital systems and their use for decision making and interventions during crises, as well as to test the systems’ sensitivity and specificity. A more general informal system, providing syndrome-based analysis of the reports disseminated by all currently existing systems, may be the next step toward disease outbreak detection based on informal systems.