Added-Value of Automatic Multilingual Text Analysis for Epidemic Surveillance
The early detection of disease outbreaks is an important objective of epidemic surveillance. Web news is one of the information sources for detecting epidemic events as early as possible, but analyzing the tens of thousands of articles published daily is costly. Recently, automatic systems have been developed for epidemiological surveillance. The main challenge for these systems is to process more languages at a limited cost. However, existing systems mainly process major languages (English, French, Russian, Spanish…). Thus, when the first news report of a disease is in a minor language, the timeliness of event detection suffers. In this paper, we test an automatic style-based method designed to fill the gaps of existing automatic systems. It is parsimonious in resources and specifically designed for multilingual issues. The events detected by the human-moderated ProMED mail between November 2011 and January 2012 are used as a reference dataset and compared to events detected in 17 languages by the system DAnIEL2 from web articles published in the same time window. We show how being able to process press articles in less-spoken languages allows quicker detection of epidemic events in some regions of the world.
Keywords: Natural Language Processing, Local Language, Reference Dataset, Main Language, Major Language