FluDetWeb: an interactive web-based system for the early detection of the onset of influenza epidemics
The early identification of influenza outbreaks has become a priority in public health practice. A large variety of statistical algorithms for the automated monitoring of influenza surveillance data have been proposed, but most of them demand not only considerable computational effort but also the use of software that is not always user-friendly.
In this paper, we introduce FluDetWeb, an implementation of a prospective influenza surveillance methodology based on a client-server architecture with a thin (web-based) client application design. Users can enter and edit their own data, consisting of a series of weekly influenza incidence rates. The system returns the probability of being in an epidemic phase (via e-mail if desired). When this probability is greater than 0.5, it also returns the probability of an increase in the incidence rate during the following week. The system also provides two complementary graphs. It has been implemented using free statistical software (R and WinBUGS), a web server environment for Java code (Tomcat) and a software module created by us (Rdp) that is responsible for managing internal tasks; MySQL has been used to construct the database management system. The implementation is available on-line from: http://www.geeitema.org/meviepi/fludetweb/.
The ease of use of FluDetWeb and its on-line availability can make it a valuable tool for public health practitioners who want to obtain information about the probability that their system is in an epidemic phase. Moreover, the architecture described can also be useful for developers of systems based on computationally intensive methods.
Keywords: Influenza; Posterior Probability; Markov Chain Monte Carlo; Public Health Practitioner; Influenza Surveillance
Public Health agencies use disease surveillance tools in order to monitor the incidence or prevalence of specific health problems over time. This knowledge allows them to detect changes in the estimated incidence rates, which produces better planning and allocation of resources and the possibility of avoiding breakdowns in Health Care Systems. In addition, a good surveillance infrastructure can be very useful in preparing for pandemics and for monitoring new emerging diseases.
An important matter of concern when dealing with the surveillance of infectious diseases is that of detecting the onset of an epidemic as soon as possible. The early identification of infectious disease outbreaks would enable prompt intervention which could have, for example, a great impact on the number of lives saved. Several statistical methods have been proposed (and most of them applied) over recent decades for detecting outbreaks and informing health authorities of the presence and spread of disease (see LeStrat , Buckeridge  and Burkom  for comprehensive surveys of these kinds of methods and Bravata et al.  for a critical evaluation of the potential utility of surveillance systems for illnesses and syndromes related to bioterrorism up to that date).
Among other diseases, influenza has been of special interest among researchers as influenza epidemics occur virtually every year and result in substantial disease, death and expense. Moreover, genetic changes in the influenza virus make vaccine effectiveness questionable every year and give this disease pandemic potential. Although the extent and severity of such epidemics vary greatly, it is worth noting that approximately 10–15% of people get influenza around the world every year and that the disease is responsible for up to 50 million illnesses and up to 47,200 deaths in the United States each year, with a similar situation in Europe , . With all these figures in mind, it is quite understandable why the control of influenza has become a priority in public health practice.
As a result, a large variety of statistical algorithms for the automated monitoring of influenza surveillance have been proposed. The most widely used approaches are based on historical limit methods or on Serfling's method . For instance, these methods are used, respectively, in Europe by the European Influenza Surveillance Scheme (EISS) and in the United States by the Centers for Disease Control and Prevention (CDC) Influenza Branch. Although both methods are very easy to implement, they have some drawbacks (see Rath et al.  and Martínez-Beneito et al.  for more details). Many other solutions have been proposed and we just highlight here some of the most recent: LeStrat and Carrat , Rath et al. , Viboud et al. , Cowling et al. , Nuño and Pagano , Bock et al.  and Jégat et al. .
The complexity of disease surveillance methods has been increasing progressively. In fact, most of the above mentioned methods are not easy to implement. On the contrary, most of them and, in general, most advanced surveillance systems require skilled personnel to implement, fine-tune and maintain them. These requirements have kept these new developments from common usage. In order to resolve this issue, there has been a recent interest in enhancing existing disease surveillance methodologies by using tools for presenting data and information to users. Hauenstein et al.  describe in detail the processes and tools (such as system architecture, web-based applications, etc.) needed to do so. Two examples of how web-based surveillance systems can enhance the ability for identifying, estimating and assessing public health hazards are a web application by Pelat et al. , which allows users to analyze seasonal time series with periodic regression models, and Berchialla et al. , who present a web-based tool for injury risk assessment of foreign body injuries in children. Lewis et al.  review other existing automated disease surveillance systems in use by health departments (ESSENCE, RODS, EARS, RedBat and SYRIS).
The main purpose of this paper is to provide an enhanced web implementation of a novel prospective influenza surveillance methodology . The method uses a Bayesian Markov switching model to determine the epidemic and non-epidemic periods from influenza surveillance data, and so detect influenza epidemics during the first onset week or as soon as the data allow. Nevertheless, this methodology requires a lot of computational effort and knowledge of sometimes not-so-friendly software. In particular, in order to estimate the parameters of the model, Markov Chain Monte Carlo (MCMC) methods are necessary, WinBUGS  being our choice to carry out the inference.
The surveillance methodology has been implemented using a client-server architecture with a web-based client application design. Through a friendly interface, users can enter and edit their own data, consisting of a series of weekly influenza incidence rates. Users may also obtain estimates of the probability of being in an epidemic phase for weeks of interest. The estimation process is not immediate, so the system has been designed to respond to requests from a multi-user environment on a first-come, first-served basis. After completion of the process, the system returns the probability of being in an epidemic phase and, when that probability is greater than 0.5, the probability of an increase in the incidence rate during the following week. It also provides two graphs. The first one shows the weekly rates of the last two seasons, indicating whether the posterior probability of being in an epidemic phase in the analyzed week is greater than 0.5 or not. The second one shows all the weekly rates, with flags only for the requested weeks; these flags indicate whether the posterior probability of being in an epidemic phase is greater than 0.5 or not. The ease of use and on-line availability of the resulting application should make it a valuable tool for public health practitioners.
In what follows, we introduce the kind of data sets that could be analyzed using our prospective surveillance method , we briefly review the method itself and we describe the client-server architecture and the client application design used to implement our surveillance methodology.
The method was originally developed to analyze data from the Valencian Sentinel Network (VSN) for influenza surveillance, a system which collects information on influenza-like illness (ILI) in the Comunitat Valenciana, one of the 17 autonomous regions in Spain. Like other sentinel Networks, the VSN is formed by volunteer practitioners that report weekly the number of ILI cases (usually defined as fever plus acute respiratory symptoms such as cough and/or sore throat) in seasons (each one lasting 30 weeks) that extend over two consecutive years, as the epidemic activity usually extends across both of them. It is worth mentioning that each weekly rate is obtained by considering the population covered by those sentinels that report information on the corresponding week.
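The way each weekly rate is formed can be illustrated with a minimal sketch. The function name and the dictionary layout below are assumptions made for illustration only and are not part of FluDetWeb; the only point taken from the text is that the denominator covers just those sentinels that reported in the week.

```python
def weekly_ili_rate(cases_by_sentinel, population_by_sentinel, per=100_000):
    """ILI incidence rate for one week, expressed per `per` inhabitants.

    cases_by_sentinel: cases reported this week, keyed by sentinel id;
        sentinels that did not report are simply absent (hypothetical layout).
    population_by_sentinel: population covered by every sentinel in the network.
    """
    reporting = list(cases_by_sentinel)
    cases = sum(cases_by_sentinel[s] for s in reporting)
    # Only the population covered by reporting sentinels enters the denominator.
    covered = sum(population_by_sentinel[s] for s in reporting)
    if covered == 0:
        raise ValueError("no population covered by reporting sentinels")
    return per * cases / covered
```

For example, if sentinels "a" and "b" report 3 and 1 cases and cover 1,500 and 2,500 people, the rate is 100 per 100,000 regardless of how many people the non-reporting sentinels cover.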
Nevertheless, the usefulness of a surveillance method is measured by its adaptability to the environment in which it operates. As stated above, our method was developed to analyze weekly incidence rates (as is usual in all the Spanish Sentinel Networks). But it can be adapted (with slight modifications) to work with data coming from Sentinel Networks in which providers report weekly the percentage of patients with ILI from the total number of patients seen and the number of those patients with ILI. Moreover, the method is applicable not only for Western countries, but for any other network in which the identified periods of high possibility of influenza activity last the whole year. In this latter case, seasons could be defined as the whole year.
Instead of modelling the mean of the influenza incidence rates series, it has been discussed in  that it is more appropriate to model the first-order differenced series (formed by the differences between rates in consecutive weeks). In particular, the underlying prospective influenza surveillance method is based on a modelling which segments the series of differences into two phases, epidemic and non-epidemic, using a Markov switching model (see  for a detailed description of the method).
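As a small illustration of the differenced series, the sketch below computes the differences between rates in consecutive weeks. It assumes that differencing is applied separately within each season; the function name is hypothetical.

```python
def first_differences(rates):
    """First-order differences of one season's weekly rates.

    Element j of the result is rates[j + 1] - rates[j], so a season with
    n weekly rates yields n - 1 differences.
    """
    return [later - earlier for earlier, later in zip(rates, rates[1:])]
```

A season with rates 10.0, 12.5, 20.0 and 18.0 gives the differences 2.5, 7.5 and -2.0.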
The parameters are estimated under the Bayesian paradigm using the whole data set, which requires the specification of the priors and their corresponding hyperpriors (see  for more details). Nevertheless, the resulting posterior distribution of the parameters, P(parameters|data), has no analytical form, so Markov Chain Monte Carlo (MCMC) methods are necessary to estimate the parameters of the model, WinBUGS  being our choice to carry out the inference. More details and the WinBUGS code can be downloaded from the following web page: http://www.geeitema.org/doc/meviepi/influenza.html. The simulation from the posterior distribution of all the parameters provides a lot of information. In particular, it can be used to identify the epidemic weeks during the whole period analyzed and, most importantly, to obtain the distribution of the state of the last week analyzed. Knowing whether the system is in an epidemic phase during the analyzed week is important because it allows an on-line use of the method, which can be crucial for detecting the time step at which the epidemic phase starts.
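Given MCMC output, the posterior probability of being in an epidemic phase in a given week can be approximated by the share of posterior draws in which that week's state is epidemic. The sketch below assumes the draws of the state indicator have already been exported as a list of 0/1 values (1 meaning epidemic); this encoding and the function name are illustrative assumptions, not the format used by the WinBUGS code.

```python
def epidemic_probability(state_draws):
    """Monte Carlo estimate of P(state = epidemic | data) for one week.

    state_draws: posterior draws of the week's state indicator, assumed
        to be coded 1 for epidemic and 0 for non-epidemic.
    """
    if not state_draws:
        raise ValueError("at least one posterior draw is required")
    epidemic = sum(1 for s in state_draws if s == 1)
    return epidemic / len(state_draws)
```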
where P(Y_{i,j}|parameters) ~ [formula given as an image in the original; not recoverable here].
Neither the posterior predictive distribution nor the posterior distribution of the parameters has an analytical form. Nevertheless, it is not difficult to simulate from the predictive distribution P(Y_{i,j}|data) by first simulating from the posterior distribution of the parameters, P(parameters|data), and then simulating from the distribution of the difference Y_{i,j} conditional on those previously simulated values of the parameters (see, for instance, Gelman et al.  for a description of how to simulate from posterior predictive distributions).
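The two-step simulation just described can be sketched as follows. The conditional distribution of the difference is not reproduced in this text, so the sketch assumes, purely for illustration, that it is Gaussian with a mean and standard deviation supplied by each posterior draw; the function name and the (mean, sd) layout are likewise hypothetical. The probability of an increase is then estimated as the share of simulated differences that are positive.

```python
import random

def probability_of_increase(param_draws, rng=None):
    """Estimate P(Y > 0 | data) by simulating from the predictive distribution.

    param_draws: posterior draws of the parameters, each an assumed
        (mean, sd) pair for the next difference Y.
    rng: optional random.Random instance, for reproducibility.
    """
    if not param_draws:
        raise ValueError("at least one posterior draw is required")
    rng = rng or random.Random()
    increases = 0
    for mean, sd in param_draws:
        # One predictive draw of Y conditional on this draw of the parameters.
        y = rng.gauss(mean, sd)
        if y > 0:
            increases += 1
    return increases / len(param_draws)
```

Passing a seeded generator, for instance `random.Random(1)`, makes the estimate reproducible between runs.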
Architecture of the system
As Hauenstein et al.  state, "the cornerstone of a robust and effective electronic information system is a carefully designed architecture that meets the needs of its users for reliability, performance, and usability and the requirements of the development team for cost, scalability, security and maintainability".
Following their comments, one of the first issues to consider when building an information system is to choose an appropriate architecture.
The second tier is the business logic tier, which is the core of the system as it controls the running of our prospective influenza surveillance algorithm. This tier consists of two components. The first one is Tomcat , a web container that functions as a web application server supporting servlets and JSPs whose function is to insert and edit data in the database and send information to the visualization tier. The second one is Rdp (R Distributed and Persistent), a software module created by us and implemented in Java using Apache Commons Daemon, Rclient and Java Mail libraries. We call it "distributed" because tasks are distributed between slaves, and "persistent" because all the necessary information for recovering the system is stored in the database via the Application Programming Interface (API) JDBC.
Basically, Rdp is responsible for managing tasks and controlling the availability of slaves, so that it can send tasks to free slaves and recover the results from them when each task is finished. In particular, when a user requests the probability of being in the epidemic phase, the request is stored in the database in a list of tasks to be done. Rdp checks both the list of tasks and the list of free slaves, and as soon as it detects one free slave and one task on the list, it sends the task to that slave. As the tasks take some time to complete, the system has been designed to respond to demands on a first-come, first-served basis. The Rclient module is used to connect the server with Rserve, a package for R installed on each slave. This package is ultimately responsible for sending the tasks to R and WinBUGS. When the task is done, the results are sent (if desired) to the user by e-mail, with an attached PDF document generated using the API JasperReports .
Using all the computers in the department for the calculations means that any member of the department can check the list of tasks to be done at any moment and, if necessary, run Rserve on his or her PC and add that PC to the list of free slaves.
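The first-come, first-served assignment of tasks to free slaves can be sketched with two queues. This is an in-memory illustration only: Rdp itself is written in Java and persists its state in the database, and the class and method names below are hypothetical (the code calls the slaves "workers").

```python
from collections import deque

class Dispatcher:
    """Toy first-come, first-served dispatcher of tasks to free workers."""

    def __init__(self):
        self.pending = deque()        # tasks waiting, oldest first
        self.free_workers = deque()   # workers currently idle
        self.running = {}             # worker id -> task id in progress

    def submit(self, task_id):
        self.pending.append(task_id)

    def register_worker(self, worker_id):
        self.free_workers.append(worker_id)

    def dispatch(self):
        """Pair the oldest pending tasks with free workers; return the pairs."""
        assigned = []
        while self.pending and self.free_workers:
            task = self.pending.popleft()
            worker = self.free_workers.popleft()
            self.running[worker] = task
            assigned.append((task, worker))
        return assigned

    def finish(self, worker_id):
        """Mark a worker's task as done and return the worker to the free list."""
        task = self.running.pop(worker_id)
        self.free_workers.append(worker_id)
        return task
```

With three submitted tasks and a single registered worker, the first call to `dispatch()` assigns only the oldest task; the next task is assigned once `finish()` has freed the worker.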
The final layer is the data tier and, as mentioned above, it is responsible for data storage, not only of the influenza rates but also of the user's personal information, availability and state of slaves, IP addresses, assigned tasks, etc. In order to construct our relational database, we have used MySQL© software .
In what follows we present a case study to demonstrate how our web-based application allows users (epidemiologists, public health officials, etc.) to obtain the posterior probability of being in an epidemic phase, and so rapidly detect when the annual flu epidemic period starts. To do that, we will use the data set introduced above, consisting of the thirteen time series formed by the weekly ILI incidence rates provided by the VSN during the seasons from 1996–1997 to 2008–2009. All the WinBUGS and R codes are freely available in Additional file 1.
Using the system
After registering (when using the system for the first time) and logging on, users automatically enter the initial page from which they can access the four main pages. From the first page, users can edit and modify their personal information, while the second page is where users can enter and/or edit their own influenza data. As mentioned above, weekly ILI incidence rates must be per 100,000 inhabitants.
The process for obtaining the results could take several minutes, depending on how busy the system is. If users select the option "Send results via e-mail" in the personal data, they will get the results in a PDF file. A second option is to look at the View Results page when calculations are finished. This page is similar to the application launcher page, but instead of showing the rates it shows the posterior probability of being in the epidemic phase (with the same colour code mentioned above) for all the weeks for which it has been requested (following the above mentioned condition of using only information from the weeks previous to the one analyzed). FluDetWeb shows a separate page of results for each week analyzed. This page presents the posterior probability of being in the epidemic phase. Values exceeding 0.5 indicate that, in that week, the probability of being in an epidemic phase is higher than that of being in a non-epidemic one, and so an alarm could be triggered if considered necessary. If this probability exceeds 0.5, the program also shows the probabilities of an increase and of a decrease in the incidence rate for the coming week. Otherwise, no other probabilities are shown.
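The reporting rule described above can be summarised in a few lines. The function name and the dictionary keys are hypothetical; the sketch only mirrors the stated behaviour, namely that the probabilities of an increase and of a decrease are shown only when the epidemic probability exceeds 0.5.

```python
def weekly_report(p_epidemic, p_increase, p_decrease, threshold=0.5):
    """Assemble the values shown on the results page for one analyzed week."""
    report = {"p_epidemic": p_epidemic, "alarm": p_epidemic > threshold}
    if report["alarm"]:
        # Shown only when the epidemic phase is the more probable one.
        report["p_increase"] = p_increase
        report["p_decrease"] = p_decrease
    return report
```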
This information should be sufficient for users to detect when the annual flu epidemic period starts. But bearing in mind that the best way of communicating information to users is by using visualization components , FluDetWeb also provides two graphs. The first one is a comparison graph of the weekly influenza incidence rates during the current and the previous season indicating if the posterior probability of being in an epidemic phase in the analyzed week is lower than 0.5 (black spot) or greater (red spot). The second one shows the weekly rates of all the seasons and indicates, in a similar manner to the application launcher page, in which weeks it is not possible to obtain the posterior probability (showing the weekly rate in black), in which ones it has not been obtained (in white), and, for those in which it has been calculated, if probability is greater than 0.5 (in red) or less than 0.5 (in blue).
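The colour legend of the second graph can be written as a small lookup. The status labels below are invented for illustration, and the text does not say how a probability of exactly 0.5 is displayed, so treating it as blue is an assumption.

```python
def week_colour(status, p_epidemic=None, threshold=0.5):
    """Colour used for one week in the all-seasons graph.

    status: "not_available" (probability cannot be obtained),
            "not_requested" (probability has not been obtained), or
            "computed" (probability given in p_epidemic).
    """
    if status == "not_available":
        return "black"
    if status == "not_requested":
        return "white"
    if status == "computed":
        if p_epidemic is None:
            raise ValueError("p_epidemic is required for computed weeks")
        # Assumption: a probability of exactly 0.5 is shown in blue.
        return "red" if p_epidemic > threshold else "blue"
    raise ValueError(f"unknown status: {status!r}")
```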
Analyzing the data from the VSN
The Valencian Sentinel Network collects weekly ILI incidence rates in seasons that extend over two consecutive years, each season lasting 30 weeks (from the 42nd week of one year to the 19th week of the following), and has been reporting information on ILI cases since 1996. As can be appreciated in Figure 1, at the time of writing this paper (October 29th, 2008), data consist of twelve complete time series (from 1996–1997 to 2007–2008) and one partial time series (corresponding to the 2008–2009 season) only containing four weekly ILI incidence rates.
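The season layout can be checked with a short sketch that lists the week numbers of one season and the labels of the seasons analyzed. It assumes years numbered with 52 weeks, which is a simplification (some calendar years have a 53rd week); the function names are hypothetical.

```python
def season_weeks(first_week=42, last_week=19, weeks_in_year=52):
    """Week numbers of one season, running across the turn of the year."""
    first_part = list(range(first_week, weeks_in_year + 1))
    second_part = list(range(1, last_week + 1))
    return first_part + second_part

def season_labels(first_start_year, last_start_year):
    """Labels such as '1996-1997' for every season in the given range."""
    return [f"{y}-{y + 1}" for y in range(first_start_year, last_start_year + 1)]
```

Under these assumptions, `len(season_weeks())` is 30 and `len(season_labels(1996, 2008))` is 13, matching the thirty-week seasons and the thirteen time series described in the text.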
Our interest in this paper has been to describe an implementation of a prospective methodology for obtaining the posterior probability of being in an epidemic phase. Implementation has been done using a client-server architecture with a web-based client application design, which allows users to enter and edit their own data, and obtain information about the possibility of their system being in an epidemic phase. Data needed are weekly ILI incidence rates (per 100,000 inhabitants) provided by a Sentinel Network, obtained by considering only the population covered by those sentinels that report information on the corresponding week. In order to obtain results, the input dataset must contain at least 3 years of historical rates. Availability and software requirements are listed in the following section.
We now comment on possible extensions to this implementation. First of all, one of the benefits of using a three-tier architecture in which the functions of the client-server are defined separately is that each layer could be upgraded or replaced independently. This modularity allows us to change any part we want, for instance, the algorithm used to detect the onset of the epidemic. We could replace it, for example, with another in which the probability of being in an epidemic phase depends not only on the rate of the previous week but also on the particular moment in the season (maybe at its early stages or at its final ones).
In line with this, at the moment we are developing a different methodology which could be used with other kinds of data (percentages, rates, etc.), for instance, with data coming from Sentinel Networks in which providers report weekly the percentage of patients with ILI from the total number of patients seen and the number of those patients with ILI.
Another extension could be to incorporate other statistical algorithms for automated monitoring of influenza surveillance and the possibility of comparing their behaviour, in a similar way as in the R-package surveillance by Höhle , which contains functionality to visualize surveillance data, provides algorithms for the detection of aberrations and benchmark numbers like sensitivity, specificity and detection delay in order to compare algorithms.
With respect to the limitations of this implementation, we should point out that our prospective influenza surveillance methodology needs the specification of two hyperparameters, a and b. Our web system has been fine-tuned by fixing two specific values for them. Using these values in other situations could result in erroneous conclusions. The second limitation is the need for a complete run of the MCMC method every week. The waiting time for getting the result is not too long (less than 5 minutes), but heavy demand on the system could cause a long delay in returning the results. One way of solving this issue could be to use sequential MCMC. This method basically consists of taking advantage of the results from the previous week in order to obtain a faster estimate of the probability of being in an epidemic phase in the analyzed week.
Finally, we would like to stress that the ease of use of FluDetWeb and its on-line availability can make it a valuable tool for public health practitioners who want to obtain information about the probability that their system is in an epidemic phase and that the architecture described can also be useful for developers of systems based on computationally intensive methods.
Availability and requirements
Project name: FluDetWeb.
Project home page: http://www.geeitema.org/meviepi/fludetweb/
Operating system: Platform independent.
Programming language: R, WinBUGS, JavaServer Pages, Java (tested with Mozilla and Internet Explorer).
Other requirements: Java 1.3.1 or higher, Tomcat 4.0 or higher, Rserve, Java Mail, Rclient, JasperReports and MySQL.
License: GNU GPL.
Any restrictions to use by non-academics: no licence needed.
The authors would like to thank the generous work of the practitioners of the Valencian Sentinel Network. Financial support from the Conselleria de Sanitat of the Generalitat Valenciana (the Valencian Regional Health Authority) is gratefully acknowledged. The authors would also like to acknowledge financial support from the Ministerio de Educación y Ciencia (the Spanish Ministry of Education and Science) via research grants MTM2007-61554 (jointly financed with the European Regional Development Fund) and FUT-C2-0047 (as part of the INGENIO-MATHEMATICA research project) and from the Generalitat Valenciana via research grants GV/2007/079, AP-049/08 and EVES-015/2008.
- 1. LeStrat Y: Overview of temporal surveillance. Spatial and syndromic surveillance for public health. Edited by: Lawson AB, Kleinman K. 2005, John Wiley and Sons, Ltd, 13-29.
- 3. Burkom H: Alerting algorithms for Biosurveillance. Disease surveillance, a public health informatics approach. Edited by: Lombardo JS, Buckeridge DL. 2007, John Wiley and Sons, Ltd, 143-192.
- 6. Fleming DM, Zambon M, Bartelds AI, de Jong JC: The duration and magnitude of influenza epidemics: a study of surveillance data from sentinel general practices in England, Wales and the Netherlands. European Journal of Epidemiology. 1999, 15: 467-473. 10.1023/A:1007525402861.
- 14. Nuño M, Pagano M: A Model for Characterizing Annual Flu Cases. Intelligence and Security Informatics: BioSurveillance, Lecture Notes in Computer Science. Edited by: Zeng D, Gotham I, Komatsu K, Lynch C, Thurmond M, Madigan D, Lober B, Kvach J, Chen H. 2007, Springer Berlin/Heidelberg, 4506: 37-46.
- 17. Hauenstein L, Wojcik R, Loschen W, Ashar R, Sniegoski C, Tabernero N: Putting it together: the Biosurveillance information system. Disease surveillance, a public health informatics approach. Edited by: Lombardo JS, Buckeridge DL. 2007, John Wiley and Sons, Ltd, 193-261.
- 18. Pelat C, Boëlle PY, Cowling BJ, Carrat F, Flahault A, Ansart S, Valleron AJ: Online detection and quantification of epidemics. BMC Medical Informatics and Decision Making. 2007, 7: 29.
- 20. Lewis SH, Hurt-Mullen K, Martin C, Ma H, Tokars JI, Lombardo JS, Babin S: Putting it together: the Biosurveillance information system. Modern disease surveillance systems in public health practice. Edited by: Lombardo JS, Buckeridge DL. 2007, John Wiley and Sons, Ltd, 265-302.
- 22. Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. 2003, Chapman and Hall, 2nd edition.
- 24. JavaServer Pages: JavaServer Pages TM. [http://java.sun.com/products/jsp/]
- 25. The Apache Software Foundation: Apache Tomcat. [http://tomcat.apache.org/]
- 26. JasperForge: JasperReports. [http://jasperforge.org/]
- 27. MySQL: MySQL 5.5 G.A. [http://www.mysql.com/]
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/9/36/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.