Abstract
In an era of proliferating electronic news media and ever-growing demand for prompt and concise information, natural language text processing technologies that map free text into structured data formats are becoming paramount. Recently, publicly accessible news aggregation systems have emerged to facilitate navigation through the news. This paper reports on explorations in refining a real-time news event extraction system, which runs on top of the Europe Media Monitor news aggregation system developed at the Joint Research Centre of the European Commission. Our experiments focus on the task of detecting new events in a given news story, i.e. tagging events extracted by the core event extraction system as new. Several methods are explored, ranging from simple similarity computation over the descriptions of adjacent events to more elaborate ones that are based on curvature-based topic development analysis and utilize global knowledge. The paper first describes the particularities of the real-time news event extraction processing chain. Next, to give better insight into how news stories evolve over time, some statistics on event dynamics are presented. Finally, the new event detection techniques are introduced and the results of the evaluation are given.
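The simplest family of methods named in the abstract, comparing the descriptions of adjacent events, might be sketched as follows. This is only an illustration under assumptions, not the chapter's implementation: the bag-of-words representation, the cosine measure, the threshold value of 0.5, and the function names `bag_of_words`, `cosine_similarity` and `tag_new_events` are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_new_events(descriptions, threshold=0.5):
    """Return one boolean per description: True if tagged as new.

    Assumption for this sketch: the first event is always new, and a
    later event is new when its similarity to the immediately
    preceding event falls below the (hypothetical) threshold.
    """
    flags = []
    previous = None
    for text in descriptions:
        current = bag_of_words(text)
        is_new = previous is None or cosine_similarity(previous, current) < threshold
        flags.append(is_new)
        previous = current
    return flags


# Illustrative, made-up event descriptions in chronological order.
descriptions = [
    "bomb blast kills five in baghdad market",
    "bomb blast kills seven in baghdad market",
    "earthquake injures dozens in northern peru",
]
flags = tag_new_events(descriptions)
```

Because each event is compared only with its immediate predecessor, a sketch like this uses purely local information; the curvature-based analysis mentioned in the abstract is described as drawing on global knowledge instead.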
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Piskorski, J. (2009). Exploring Curvature-Based Topic Development Analysis for Detecting Event Reporting Boundaries. In: Marciniak, M., Mykowiecka, A. (eds) Aspects of Natural Language Processing. Lecture Notes in Computer Science, vol 5070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04735-0_13
DOI: https://doi.org/10.1007/978-3-642-04735-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04734-3
Online ISBN: 978-3-642-04735-0
eBook Packages: Computer Science (R0)