Abstract
Given the growing volume of textual data available through digital resources, identifying the main concepts in these texts is increasingly important and is a vital step in the analysis of unstructured information.
Research in this area has focused on the detection of named entities such as person or organization names, which cover only a very small part of the concepts in texts. In particular, the unique mapping of concepts between different languages requires parallel corpora, which are rarely available in industrial settings.
We therefore propose a new knowledge-based model that recognizes various kinds of concepts, even in very short and specialized texts, using linguistic information for synonym handling and word sense disambiguation.
We evaluate the proposed model on texts from the automotive domain.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schierle, M., Trabold, D. (2009). Multilingual Knowledge-Based Concept Recognition in Textual Data. In: Fink, A., Lausen, B., Seidel, W., Ultsch, A. (eds) Advances in Data Analysis, Data Handling and Business Intelligence. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01044-6_30
DOI: https://doi.org/10.1007/978-3-642-01044-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01043-9
Online ISBN: 978-3-642-01044-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)