DAnIEL: Language Independent Character-Based News Surveillance

Lejeune, Gaël; Brixtel, Romain; Doucet, Antoine; Lucas, Nadine

doi:10.1007/978-3-642-33983-7_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Included in the following conference series:

International Conference on NLP

Abstract

This study develops a news surveillance system able to handle multilingual web corpora. As an example of a domain where multilingual capacity is crucial, we focus on epidemic surveillance. This task requires worldwide coverage of news in order to detect new events as quickly as possible, wherever they occur and whatever language they are first reported in. In this study, text-genre properties are used rather than sentence-level analysis. The properties of the news genre allow us to assess the thematic relevance of news items, which are filtered with the help of a specialised lexicon collected automatically from Wikipedia. A more detailed analysis of text-specific properties is then applied to the relevant documents to better characterize the epidemic event (i.e., which disease spreads where?). Results on 400 documents in each language demonstrate the value of this multilingual approach with light resources. DAnIEL achieves an F₁-measure of around 85%. Two issues are addressed: the first is morphologically rich languages, e.g. Greek, Polish and Russian, as compared to English; the second is event location detection as related to disease detection. The system provides a reliable alternative to the generic IE architecture, which is constrained by the lack of numerous components in many languages.
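To illustrate why character-level matching can help with morphologically rich languages, here is a minimal sketch of lexicon filtering by shared substrings. It is not the algorithm used in DAnIEL, whose details are not given in the abstract. The function names, the `min_ratio` threshold and the sample lexicon are all hypothetical, chosen only for illustration.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Return the length of the longest substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common substring ending at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def match_terms(text: str, lexicon: list[str], min_ratio: float = 0.75) -> list[str]:
    """Return lexicon terms sharing a long enough substring with the text.

    min_ratio is an assumed threshold: the fraction of the term's characters
    that must appear contiguously in the text.
    """
    text = text.lower()
    hits = []
    for term in lexicon:
        if not term:
            continue  # skip empty entries to avoid dividing by zero
        shared = longest_common_substring(term.lower(), text)
        if shared / len(term) >= min_ratio:
            hits.append(term)
    return hits


# Hypothetical example: the Polish lemma "grypa" (flu) is matched through
# its inflected form "grypy" without any morphological analyser.
sample = "W Polsce odnotowano nowe przypadki grypy."
matched = match_terms(sample, ["grypa", "cholera"])
```

In this sketch the term is matched because four of its five characters appear contiguously in the text, so no language-specific lemmatiser is needed. A real system would likely use a more efficient structure than pairwise dynamic programming, and would have to tune the threshold to limit false positives on short terms.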

Author information

Authors and Affiliations

GREYC, University of Caen Lower-Normandy, Boulevard du Maréchal Juin, BP5186, 14032, Caen Cedex, France
Gaël Lejeune, Romain Brixtel, Antoine Doucet & Nadine Lucas

Editor information

Editors and Affiliations

Information and Media Center, Toyohashi University of Technology, 1-1 Hibarigaoka, Tenpakucho, 441-8580, Toyohashi, Japan
Hitoshi Isahara & Kyoko Kanzaki

About this paper

Cite this paper

Lejeune, G., Brixtel, R., Doucet, A., Lucas, N. (2012). DAnIEL: Language Independent Character-Based News Surveillance. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_7

DOI: https://doi.org/10.1007/978-3-642-33983-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33982-0
Online ISBN: 978-3-642-33983-7
eBook Packages: Computer Science, Computer Science (R0)
