Document Selection for Extracting Entity and Relationship Instances of Terrorist Events

  • Chapter
Terrorism Informatics

Part of the book series: Integrated Series In Information Systems (ISIS, volume 18)

In this chapter, we study the problem of selecting documents from a collection in order to extract information about terrorist events. We represent an event by its entity and relation instances. Very often, these entity and relation instances have to be extracted from multiple documents. We therefore define an information extraction (IE) task as selecting documents and extracting from them the entity and relation instances relevant to a user-specified event (also called domain-specific event entity and relation extraction). We adopt domain-specific IE patterns to extract potentially relevant entity and relation instances from documents, and we develop a number of document ranking strategies that use the extracted instances to address this extraction task. Each ranking strategy (also called a pattern-based document ranking strategy) assigns a score to each document; the score estimates that document's contribution to the gain in event-related instances. We conducted experiments on two document collections constructed from two historical terrorism events. The experiments showed that our proposed pattern-based document ranking strategies performed well on the domain-specific event entity and relation extraction task for document collections of various sizes.
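The idea of scoring each document by its estimated gain in event-related instances can be illustrated with a minimal sketch. It is not the chapter's actual scoring function, which the abstract does not specify. It assumes that IE patterns have already produced a set of (type, value) instances for each document, and it uses a simple greedy count of not-yet-collected instances as the score. The names `rank_by_gain`, `docs`, and the sample instances are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An extracted instance, e.g. ("PERSON", "A"); the representation is an assumption.
Instance = Tuple[str, str]


def rank_by_gain(extracted: Dict[str, Set[Instance]], budget: int) -> List[str]:
    """Greedily pick up to `budget` documents, each time choosing the one
    that adds the most instances not yet collected (illustrative score only)."""
    known: Set[Instance] = set()
    remaining = dict(extracted)
    order: List[str] = []
    while remaining and len(order) < budget:
        # Score = number of new instances; ties go to the smallest document id.
        best = max(sorted(remaining), key=lambda d: len(remaining[d] - known))
        if not remaining[best] - known:
            break  # no remaining document contributes anything new
        order.append(best)
        known |= remaining.pop(best)
    return order


# Hypothetical pattern output for four documents.
docs = {
    "d1": {("PERSON", "A"), ("ORG", "X")},
    "d2": {("PERSON", "A"), ("ORG", "X"), ("LOCATION", "L")},
    "d3": {("PERSON", "B")},
    "d4": {("ORG", "X")},
}
selected = rank_by_gain(docs, budget=3)
```

In this toy input, `d2` is chosen first because it contributes three instances, `d3` follows with one new instance, and selection stops early because `d1` and `d4` add nothing new.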

Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Sun, Z., Lim, EP., Chang, K., Suryanto, M.A., Gunaratna, R.K. (2008). Document Selection for Extracting Entity and Relationship Instances of Terrorist Events. In: Chen, H., Reid, E., Sinai, J., Silke, A., Ganor, B. (eds) Terrorism Informatics. Integrated Series In Information Systems, vol 18. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-71613-8_15
