DAnIEL: Language Independent Character-Based News Surveillance

  • Conference paper
Advances in Natural Language Processing (JapTAL 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7614)

Abstract

This study aims to develop a news surveillance system able to handle multilingual web corpora. We focus on epidemic surveillance, a domain where multilingual capacity is crucial: the task requires worldwide news coverage so that new events are detected as quickly as possible, wherever they occur and whatever language they are first reported in. Rather than sentence-level analysis, the approach relies on text genre. The properties of the news genre allow us to assess the thematic relevance of news articles, which are filtered with the help of a specialised lexicon collected automatically from Wikipedia. A more detailed analysis of text-specific properties is then applied to the relevant documents to better characterize the epidemic event (i.e., which disease is spreading, and where?). Results on 400 documents in each language demonstrate the value of this multilingual approach with light resources: DAnIEL achieves an F1-measure of around 85%. Two issues are addressed. The first is morphologically rich languages, e.g. Greek, Polish and Russian, as compared to English. The second is event location detection as related to disease detection. The system provides a reliable alternative to the generic IE architecture, which is constrained by the lack of numerous components in many languages.
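To make the idea of genre-based, character-level filtering more concrete, here is a minimal sketch. It is not the authors' implementation: the head/body segmentation (title plus first paragraph versus the rest), the `min_ratio` threshold, the function names and the toy Polish lexicon are all assumptions made for illustration. The sketch treats an article as relevant to a disease when a sufficiently long character substring of the lexicon term occurs both at the start of the article and in the remainder, which tolerates inflected forms without any morphological analyser.

```python
import math


def shares_substring(term: str, text: str, min_ratio: float = 0.8) -> bool:
    """Return True if `text` contains a substring of `term` that covers
    at least `min_ratio` of the term's characters (hypothetical threshold)."""
    term, text = term.lower(), text.lower()
    need = max(1, math.ceil(min_ratio * len(term)))
    # Any shared substring of length >= need implies one of length exactly need.
    return any(term[i:i + need] in text for i in range(len(term) - need + 1))


def relevant_terms(paragraphs, lexicon, min_ratio=0.8):
    """Keep lexicon terms matched in both the head and the body of an article.

    Assumption: the head is the title plus the first paragraph, the body is
    everything after it.
    """
    head = " ".join(paragraphs[:2])
    body = " ".join(paragraphs[2:])
    return [
        term for term in lexicon
        if shares_substring(term, head, min_ratio)
        and shares_substring(term, body, min_ratio)
    ]


# Toy data: "grypa" (flu) appears only in inflected forms ("grypy", "grypę").
lexicon = ["grypa", "cholera"]
on_topic = [
    "Epidemia grypy w Warszawie",
    "Liczba chorych na grypę rośnie.",
    "Szpitale zgłaszają nowe przypadki grypy.",
]
off_topic = [
    "Wyniki meczu",
    "Drużyna wygrała.",
    "Trener wspomniał, że jeden gracz ma grypę.",
]

print(relevant_terms(on_topic, lexicon))
print(relevant_terms(off_topic, lexicon))
```

In this toy setting the first article is retained for "grypa" because the stem recurs in both positions, while the second is discarded because the disease is mentioned only in passing in the body. A real system would need a principled segmentation and threshold, both of which are left open here.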




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lejeune, G., Brixtel, R., Doucet, A., Lucas, N. (2012). DAnIEL: Language Independent Character-Based News Surveillance. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_7

  • DOI: https://doi.org/10.1007/978-3-642-33983-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33982-0

  • Online ISBN: 978-3-642-33983-7

  • eBook Packages: Computer Science, Computer Science (R0)
