Abstract
Epidemiological surveillance is an important issue in public health policy. In this paper, we describe a method based on knowledge extraction from news and on news classification to understand how an epidemic evolves. Descriptive studies are useful for gathering information on the incidence and characteristics of an epidemic. New approaches that exploit the web as a medium of mass publication have been developed, based either on the analysis of user queries or on the echo that an epidemic may have in the media. In this study, we focus on one particular medium: web news. We propose the Epimining approach, which extracts information from web news (based on pattern search) and performs a fine-grained classification of these news items into various classes (new cases, deaths, ...). Experiments conducted on a real corpus (AFP news) showed a precision greater than 94% and an F-measure above 85%. We also investigate the value of taking into account data collected through social networks such as Twitter to trigger alarms.
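The abstract does not give the actual Epimining patterns. The following is therefore only a rough sketch, under stated assumptions, of what pattern-based extraction and classification of a news item into classes such as "new cases" and "deaths" can look like, together with the standard precision/recall/F-measure computation used to evaluate it. The regular expressions and the names `PATTERNS`, `classify_news` and `f_measure` are hypothetical illustrations, not the paper's method.

```python
import re
from typing import Dict, Tuple

# Hypothetical extraction patterns (NOT the Epimining patterns): a number,
# optionally with thousands separators, followed by words signalling the class.
PATTERNS = {
    "deaths": re.compile(
        r"(\d[\d,]*)\s+(?:people\s+|persons\s+)?(?:have\s+)?(?:died|deaths?|dead)",
        re.IGNORECASE,
    ),
    "new_cases": re.compile(
        r"(\d[\d,]*)\s+new\s+(?:cases?|infections?)",
        re.IGNORECASE,
    ),
}


def classify_news(text: str) -> Dict[str, int]:
    """Return {class_label: extracted_count} for each pattern that matches.

    An empty dict means the news item falls into none of the classes.
    Only the first match per class is kept in this simplified sketch.
    """
    found: Dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Strip thousands separators such as "1,200" before converting.
            found[label] = int(match.group(1).replace(",", ""))
    return found


def f_measure(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    print(classify_news("Health officials reported 12 new cases and 3 deaths on Monday."))
    print(f_measure(tp=8, fp=2, fn=2))
```

Hand-written patterns like these tend to give high precision but can miss phrasings they do not anticipate. That trade-off is consistent with the reported figures (precision above 94%, F-measure above 85%), though the sketch makes no claim to reproduce them.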
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Breton, D., Bringay, S., Marques, F., Poncelet, P., Roche, M. (2013). Mining Web Data for Epidemiological Surveillance. In: Washio, T., Luo, J. (eds) Emerging Trends in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36778-6_2
DOI: https://doi.org/10.1007/978-3-642-36778-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36777-9
Online ISBN: 978-3-642-36778-6
eBook Packages: Computer Science, Computer Science (R0)