In this chapter, we study the problem of selecting documents from a collection so as to extract information about terrorist events. We represent an event by its entity and relation instances, which very often have to be extracted from multiple documents. We therefore define an information extraction (IE) task, called domain-specific event entity and relation extraction, as selecting documents and extracting from them the entity and relation instances relevant to a user-specified event. We adopt domain-specific IE patterns to extract potentially relevant entity and relation instances from documents, and develop a number of document ranking strategies that use the extracted instances to address this task. Each ranking strategy (a pattern-based document ranking strategy) assigns each document a score that estimates its contribution to the gain in event-related instances. We conducted experiments on two document collections constructed around two historical terrorist events. The experiments showed that the proposed pattern-based document ranking strategies performed well on the domain-specific event entity and relation extraction task for document collections of various sizes.
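The abstract does not spell out the individual ranking strategies. As a rough illustration of the general idea (scoring a document by the new event-related instances it would add), here is a minimal greedy sketch. Everything in it is an assumption for illustration: the regexes in `PATTERNS` are toy stand-ins rather than the chapter's domain-specific IE patterns, instances are simplified to hypothetical `(type, value)` pairs, and the helper names `extract_instances` and `rank_by_gain` are made up. Counting "instances not yet seen" is only one plausible reading of "contribution to the gain", not the authors' published scoring function.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy IE patterns (label, regex); NOT the chapter's actual patterns.
PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("PERPETRATOR", re.compile(r"claimed by ([A-Z][\w-]*(?: [A-Z][\w-]*)*)")),
    ("LOCATION", re.compile(r"bomb(?:ing)? in ([A-Z][a-z]+)")),
    ("VICTIM_COUNT", re.compile(r"killed (\d+)")),
]

# An extracted instance is simplified here to a (type, value) pair.
Instance = Tuple[str, str]


def extract_instances(text: str) -> Set[Instance]:
    """Apply every pattern to the text and collect the (type, value) matches."""
    found: Set[Instance] = set()
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.add((label, match.group(1)))
    return found


def rank_by_gain(docs: Dict[str, str]) -> List[Tuple[str, int]]:
    """Greedily order documents by how many not-yet-seen instances each adds.

    Returns (doc_id, gain) pairs; ties are broken by the smaller doc_id.
    """
    remaining = {doc_id: extract_instances(text) for doc_id, text in docs.items()}
    seen: Set[Instance] = set()
    ranking: List[Tuple[str, int]] = []
    while remaining:
        # Score each document by its instances that are new relative to `seen`;
        # max() keeps the first maximum, so iterating sorted ids breaks ties.
        best_id = max(sorted(remaining), key=lambda d: len(remaining[d] - seen))
        gain = len(remaining[best_id] - seen)
        ranking.append((best_id, gain))
        seen |= remaining.pop(best_id)
    return ranking


if __name__ == "__main__":
    # Fictional toy documents; d3 repeats d1's facts and so adds nothing new.
    docs = {
        "d1": "The bombing in Springfield killed 12 people.",
        "d2": "The attack was claimed by Group X and killed 12 people.",
        "d3": "A second report says the bomb in Springfield killed 12 people.",
    }
    print(rank_by_gain(docs))  # [('d1', 2), ('d2', 1), ('d3', 0)]
```

The greedy loop rescores the remaining documents after each pick, so a document that merely repeats already-extracted facts (d3 here) drops to the bottom even though, in isolation, it matches as many patterns as the others.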
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Sun, Z., Lim, EP., Chang, K., Suryanto, M.A., Gunaratna, R.K. (2008). Document Selection for Extracting Entity and Relationship Instances of Terrorist Events. In: Chen, H., Reid, E., Sinai, J., Silke, A., Ganor, B. (eds) Terrorism Informatics. Integrated Series In Information Systems, vol 18. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-71613-8_15
DOI: https://doi.org/10.1007/978-0-387-71613-8_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-71612-1
Online ISBN: 978-0-387-71613-8
eBook Packages: Business and Economics, Business and Management (R0)