Author Extraction: A Test Experience for Flexible Information Extraction

  • Jesús Cardeñosa
  • Luis Iraola
  • Edmundo Tovar
Part of the Advances in Soft Computing book series (AINSC, volume 7)


This paper presents an experience in information extraction carried out within the ongoing ESPRIT IV research project FLEX. The work is part of a larger effort to build the first prototype of a flexible information system, and it consisted of detecting and extracting named entities from a collection of newspaper articles. Name extraction has received attention from the Information Extraction research community from the beginning, and that attention has recently increased with the inclusion of a Named Entity task in both the Seventh Message Understanding Conference [1] and the Information Retrieval and Extraction Exercise [2]. The work presented is a practical application of techniques discussed and developed in recent years by the Information Extraction community. Beyond that practical application, the results obtained serve to validate our overall approach to cost-effective, reusable information extraction in the context of a viable, marketable system.


Newspaper Article; Content Provider; Lexical Resource; Topic Detection; Information Extraction System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Chinchor, N. A. “Overview of MUC-7/MET-2”. In Proceedings of the Seventh Message Understanding Conference (MUC-7), 1998.
  2. Sekine, S., Isahara, H. “IREX Project Overview”. In Proceedings of the IREX Workshop. Tokyo, 1999. Homepage at:
  3. Hearst, M. A. “Untangling Text Data Mining”. In Proceedings of ACL’99: the 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, 1999.
  4. FLEX Esprit IV Project. P29158. Technical Annex. Task 2.2, “Knowledge Extraction”.
  5. Mikheev, A., Grover, C., Moens, M. “Description of the LTG system used for MUC-7”. See [1].
  6. Krupka, G. R., Hausman, K. “Description of the NetOwl™ Extractor System as Used for MUC-7”. See [1].
  7. Miller, S. et al. “Algorithms that Learn to Extract Information. BBN: Description of the Sift System as used for MUC-7”. See [1].
  8. Borthwick, A., Sterling, J., Agichtein, E., Grishman, R. “NYU: Description of the MENE Named Entity System as Used in MUC-7”. See [1].
  9. Black, W. J., Rinaldi, F., Mowatt, D. “FACILE: Description of the NE system used for MUC-7”. See [1].
  10. Consortium For Linguistic Research. http://CIr.CS.nmSU.edU/Cgi-bin/Tools/CLR
  11. USA Census Data. Homepage at:
  12. CELEX Lexicon. Homepage at:
  13. OALDCE Dictionary. Available from the Consortium for Linguistic Research.
  14. WORDNET 1.6. Homepage at:
  15. Shieber, S. M. “An Introduction to Unification-Based Approaches to Grammar”. Chicago University Press, 1986.
  16. Covington, M. “Natural Language Processing for PROLOG Programmers”. Prentice-Hall, 1994.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Jesús Cardeñosa (1)
  • Luis Iraola (1)
  • Edmundo Tovar (1)
  1. Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte (Madrid), Spain
