Multilingual Knowledge-Based Concept Recognition in Textual Data

  • Conference paper
Advances in Data Analysis, Data Handling and Business Intelligence

Abstract

Given the growing volume of textual data available through digital resources, identifying the main concepts in these texts is increasingly important and can be seen as a vital step in the analysis of unstructured information.

Research in this area has focused on the detection of named entities such as person or organization names, which cover only a very small part of the concepts in texts. In particular, uniquely mapping concepts between different languages requires parallel corpora, which are rarely available in industrial settings.

We therefore propose a new knowledge-based model that recognizes various kinds of concepts even in very short and specialized texts, using linguistic information for synonym handling and word sense disambiguation.

We evaluate the proposed model on texts from the automotive domain.
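The abstract does not spell out how the model works, so the following is only a minimal sketch of what knowledge-based, language-independent concept recognition could look like, not the authors' implementation. It assumes a hypothetical hand-written thesaurus (`THESAURUS`) in which each concept ID carries per-language synonyms and a small set of context words, and it resolves ambiguous terms by counting context-word overlap. The concept IDs, the entries, and the function names `build_index` and `recognize` are all illustrative assumptions.

```python
import re

# Hypothetical miniature thesaurus (illustrative data, not from the paper).
# Each concept ID maps to language-specific synonyms and to context words
# that support the concept when a synonym is ambiguous.
THESAURUS = {
    "C_BATTERY": {
        "synonyms": {"en": ["battery", "accumulator"], "de": ["batterie", "akku"]},
        "context": {"voltage", "charge", "spannung", "laden"},
    },
    "C_SPRING_PART": {
        "synonyms": {"en": ["spring"], "de": ["feder"]},
        "context": {"suspension", "coil", "fahrwerk"},
    },
    "C_SPRING_SEASON": {
        "synonyms": {"en": ["spring"], "de": ["frühling"]},
        "context": {"season", "weather", "winter"},
    },
}


def build_index(thesaurus, lang):
    """Map each synonym of the given language to its candidate concept IDs."""
    index = {}
    for concept_id, entry in thesaurus.items():
        for term in entry["synonyms"].get(lang, []):
            index.setdefault(term, []).append(concept_id)
    return index


def recognize(text, lang, thesaurus=THESAURUS):
    """Return (token, concept_id) pairs for every known term in the text."""
    tokens = re.findall(r"\w+", text.lower())
    token_set = set(tokens)
    index = build_index(thesaurus, lang)
    found = []
    for token in tokens:
        candidates = index.get(token)
        if not candidates:
            continue
        # Simple word sense disambiguation: prefer the candidate whose
        # context words overlap most with the text (ties keep the first).
        best = max(candidates, key=lambda c: len(thesaurus[c]["context"] & token_set))
        found.append((token, best))
    return found


print(recognize("Spring coil broken, battery voltage low", "en"))
print(recognize("Akku laden, Feder im Fahrwerk gebrochen", "de"))
```

Because both languages resolve to the same concept IDs, the English and German notes become comparable without a parallel corpus; the price is that coverage depends entirely on the quality of the thesaurus.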



Author information

Corresponding author

Correspondence to Martin Schierle.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schierle, M., Trabold, D. (2009). Multilingual Knowledge-Based Concept Recognition in Textual Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_30
