Multilingual Knowledge-Based Concept Recognition in Textual Data

Schierle, Martin; Trabold, Daniel

doi:10.1007/978-3-642-01044-6_30

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Given the growing volume of textual data available through digital resources, identifying the main concepts in these texts is increasingly important and can be seen as a vital step in the analysis of unstructured information.

Research in this area has focused on the detection of named entities such as person or organization names, which cover only a very small part of the concepts in texts. In particular, uniquely mapping concepts between different languages requires parallel corpora, which are rarely available in industrial settings.

We therefore propose a powerful new knowledge-based model that recognizes various kinds of concepts, even in very short and specialized texts, using linguistic information for synonym handling and word sense disambiguation.

We evaluate the proposed model on texts from the automotive domain.

Author information

Authors and Affiliations

Daimler AG, Ulm, Germany
Martin Schierle

Corresponding author

Correspondence to Martin Schierle.

Editor information

Editors and Affiliations

Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Andreas Fink
Dept. Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom
Berthold Lausen
Universität der Bundeswehr, Fak. Wirtschafts-/Sozialwissenschaften, Helmut-Schmidt-Universität, Holstenhofweg 85, Hamburg, 22043, Germany
Wilfried Seidel
FB 12 Mathematik und Informatik, Datenbionik AG, Universität Marburg, Hans-Meerwein-Straße, Marburg, 35032, Germany
Alfred Ultsch

About this paper

Cite this paper

Schierle, M., Trabold, D. (2009). Multilingual Knowledge-Based Concept Recognition in Textual Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_30

DOI: https://doi.org/10.1007/978-3-642-01044-6_30
Published: 31 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics (R0)
