Exploring Curvature-Based Topic Development Analysis for Detecting Event Reporting Boundaries

Aspects of Natural Language Processing

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5070)

Abstract

In an era of proliferating electronic news media and ever-growing demand for prompt and concise information, natural language text processing technologies that map free text into structured data formats are becoming paramount. Recently, we have witnessed the emergence of publicly accessible news aggregation systems that facilitate navigation through news. This paper reports on explorations in refining a real-time news event extraction system, which runs on top of the Europe Media Monitor news aggregation system developed at the Joint Research Centre of the European Commission. Our experiments focus on the task of detecting new events in a given news story, i.e. tagging events extracted by the core event extraction system as new. Several methods are explored, ranging from simple similarity computation over the descriptions of adjacent events to more elaborate ones that build on curvature-based topic development analysis and utilize global knowledge. The paper first describes the particularities of the real-time news event extraction processing chain. Next, to give better insight into how news stories evolve over time, some statistics on event dynamics are presented. Finally, the new event detection techniques are introduced and the results of the evaluation are given.
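
The simplest family of methods named in the abstract, similarity computation over the descriptions of adjacent events, might look roughly like the sketch below. It is an illustration only, not the chapter's implementation: the bag-of-words tokenizer, the use of cosine similarity, the 0.5 threshold, and the names bag_of_words, cosine_similarity and tag_new_events are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_new_events(descriptions, threshold=0.5):
    """Tag each event description in a story as new (True) or not (False).

    An event is tagged as new when its description is less similar than
    `threshold` to the description of the event immediately before it.
    The first event in the story is always tagged as new. The threshold
    value is a hypothetical default, not a figure from the chapter.
    """
    tags = []
    previous = None
    for text in descriptions:
        current = bag_of_words(text)
        if previous is None:
            tags.append(True)
        else:
            tags.append(cosine_similarity(previous, current) < threshold)
        previous = current
    return tags


story = [
    "Bomb blast kills five in Baghdad market",
    "Bomb blast in Baghdad market kills five people",
    "Earthquake strikes northern Japan",
]
print(tag_new_events(story))
```

Because this sketch compares only adjacent descriptions, it uses purely local information; the curvature-based approach mentioned in the abstract is described there as drawing on global knowledge instead.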

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

Cite this chapter

Piskorski, J. (2009). Exploring Curvature-Based Topic Development Analysis for Detecting Event Reporting Boundaries. In: Marciniak, M., Mykowiecka, A. (eds) Aspects of Natural Language Processing. Lecture Notes in Computer Science, vol 5070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04735-0_13

  • DOI: https://doi.org/10.1007/978-3-642-04735-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04734-3

  • Online ISBN: 978-3-642-04735-0

  • eBook Packages: Computer Science (R0)
