Named Entity Recognition from Greek Texts: The GIE Project

  • Vangelis Karkaletsis
  • Constantine D. Spyropoulos
  • George Petasis
Part of the International Series on Microprocessor-Based and Intelligent Systems Engineering book series (ISCA, volume 21)

Abstract

Today's information overload, particularly on the World Wide Web, makes it difficult for users to reach the right information. The situation is made harder by the fact that much of this information is written in different languages. It is therefore important to apply an information process that extracts from this volume only the facts that match users' interests, and that gives the user access to facts written in a different language. Information Extraction (IE) technology can meet these requirements: unlike information retrieval and filtering technology, IE focuses on specific facts extracted from the documents rather than on the documents themselves. Some documents may contain the requested keywords and still be irrelevant to the users' interests. Working with specific facts instead of documents gives users information that is more relevant to their domain of interest. The IE systems developed so far extract, in most cases, fixed information from documents in a fixed language. However, for IE technology to be truly applicable in real-life applications and meet the above requirements, IE systems need to be easily adaptable (customisable) to new domains and user interests, as well as to multiple languages. During the last decade, substantial progress has been made in developing reliable IE technology. IE technology is currently exploited in real applications, such as the extraction of information on company acquisitions [1],[2],[3], stock exchanges [4], company profits and losses [5], joint ventures and management succession events [6],[7],[8], as well as the understanding of military messages [9] and police reports [10],[11],[12].

Keywords

Noun Phrase, Information Extraction, User Interest, Text Corpus, Entity Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Cowie J., Wakao T., Jin W., Pustejovsky J. and Waterman S., The Diderot information extraction system. In Proceedings of the First Conference of the Pacific Association for Computational Linguistics (PACLING 93). Vancouver, Canada, 1993.
  2. Jacobs P.S. and Rau L.F., Scisor: Extracting information from on-line news. Communications of the ACM, 33(11):88–97, 1990.
  3. Wilks Y., Diderot: a text extraction system. In DARPA Speech and Natural Language Workshop. Morgan Kaufmann, San Mateo, CA, 1991.
  4. Vichot F., Wolinski F., Tomeh J., Guennou S., Dillet B. and Aydjian S., High Precision Hypertext Navigation Based on NLP Automatic Extractions. Hypertext, Information Retrieval, Multimedia (HTM′97), Dortmund, Germany, (30):161–174, October 1997.
  5. Andersen P.M., Hayes P.J., Huettner A.K., Nirenburg I.B., Schmandt L.M. and Weinstein S.P., Automatic extraction of facts from press releases to generate news stories. In Proceedings of the Third Conference on Applied Natural Language Processing, pages 170–177. ACL, 1992.
  6. ECRAN: Extraction of Content: Research at Near Market, http://www2.echo.lu/langeneg/en/le1/ecran/ecran.html
  7. MUC5, Proceedings of the Fifth Message Understanding Conference. San Francisco, Calif.: Morgan Kaufmann, 1993.
  8. MUC6, Proceedings of the Sixth Message Understanding Conference. San Francisco, Calif.: Morgan Kaufmann, 1995.
  9. DARPA Speech and Natural Language Workshop, Harriman, NY, 1992.
  10. AVENTINUS: Advanced Information System for Multinational Drug Enforcement. http://www2.echo.lu/langeneg/en/lel/aventinus/aventinus.html
  11. Evans R. and Hartley A.F., The traffic information collator. Expert Systems: The International Journal of Knowledge Engineering, 7(4):209–214, 1990.
  12. Gaizauskas R., Evans R., Cahill L.J., Richardson I. and Walker J., Poetic: A system for gathering and disseminating traffic information. In S.G. Ritchie and G.T. Hendrickson, editors, Conference Preprints of the International Conference on Artificial Intelligence Applications in Transportation Engineering, pages 79–98, San Buenaventura, California, 1992.
  13. Gaizauskas R. and Wilks Y., Information Extraction beyond Document Retrieval. University of Sheffield, Dept. of Computer Science, CS-97-10, 1997.
  14. Cunningham H., Wilks Y. and Gaizauskas R., GATE - a General Architecture for Text Engineering. 16th Conference on Computational Linguistics (COLING′96), 274–279, 1996.
  15. Gazdar G. and Mellish C., Natural Language Processing in Prolog. Addison-Wesley, 1989.
  16. Paliouras G., Karkaletsis V. and Spyropoulos C.D., Machine Learning for Domain-Adaptive Word Sense Disambiguation. Proceedings of the LREC Workshop on "Adapting Lexical and Corpus Resources to Sublanguages and Applications", Granada, Spain, May 26, 1998.

Copyright information

© Springer Science+Business Media Dordrecht 1999

Authors and Affiliations

  • Vangelis Karkaletsis (1)
  • Constantine D. Spyropoulos (1)
  • George Petasis (1)

  1. Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, N.C.S.R. «Demokritos», Greece