Document Selection for Extracting Entity and Relationship Instances of Terrorist Events

  • Zhen Sun
  • Ee-Peng Lim
  • Kuiyu Chang
  • Maggy Anastasia Suryanto
  • Rohan Kumar Gunaratna
Part of the Integrated Series In Information Systems book series (ISIS, volume 18)

In this chapter, we study the problem of selecting documents from a collection in order to extract terrorist event information. We represent an event by its entity and relation instances, which often have to be extracted from multiple documents. We therefore define an information extraction (IE) task as selecting documents and extracting from them the entity and relation instances relevant to a user-specified event (aka domain-specific event entity and relation extraction). We adopt domain-specific IE patterns to extract potentially relevant entity and relation instances from documents, and develop a number of document ranking strategies that use the extracted instances to address this extraction task. Each ranking strategy (aka pattern-based document ranking strategy) assigns a score to each document, which estimates that document's contribution to the gain in event-related instances. We conducted experiments on two datasets of document collections constructed from two historical terrorism events. The experiments showed that our proposed pattern-based document ranking strategies performed well on the domain-specific event entity and relation extraction task for document collections of various sizes.
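
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's actual ranking strategies, which the abstract does not specify. It assumes that instances have already been extracted from each document by some pattern matcher, that an instance is a (class, text) pair, and that a document's score is simply the number of instances it would add to those already collected. The names `gain_score`, `rank_and_select`, `seed`, and `budget` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: an instance is a (class label, surface text) pair,
# e.g. ("Organization", "Group X"). The chapter may represent them differently.
Instance = Tuple[str, str]


def gain_score(doc_instances: Set[Instance], known: Set[Instance]) -> int:
    """Count the instances a document would add beyond those already known."""
    return len(doc_instances - known)


def rank_and_select(candidates: Dict[str, Set[Instance]],
                    seed: Set[Instance],
                    budget: int) -> List[str]:
    """Greedily pick up to `budget` documents, re-scoring after every pick."""
    known = set(seed)
    remaining = dict(candidates)
    selected: List[str] = []
    while remaining and len(selected) < budget:
        # Highest gain first; ties are broken by document id for determinism.
        best = min(remaining, key=lambda d: (-gain_score(remaining[d], known), d))
        if gain_score(remaining[best], known) == 0:
            break  # no remaining document contributes anything new
        known |= remaining.pop(best)
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    docs = {
        "d1": {("Person", "A"), ("Organization", "X")},
        "d2": {("Person", "A")},
        "d3": {("Organization", "X"), ("Location", "Y"), ("Person", "B")},
    }
    print(rank_and_select(docs, seed={("Person", "A")}, budget=2))
```

Re-scoring after each pick matters because a document that looks rich in isolation may contribute nothing once its instances have already been collected from an earlier selection.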


Keywords: Information Extraction; Terrorist Organization; Name Entity Recognition; Extraction Pattern; Entity Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Zhen Sun (1)
  • Ee-Peng Lim (1)
  • Kuiyu Chang (1)
  • Maggy Anastasia Suryanto (1)
  • Rohan Kumar Gunaratna (2)

  1. Centre for Advanced Information Systems, Nanyang Technological University, Singapore
  2. International Center for Political Violence and Terrorism Research Institut, Nanyang Technological University, Singapore
