Sequential Supervised Learning for Hypernym Discovery from Wikipedia

Litz, Berenike; Langer, Hagen; Malaka, Rainer

doi:10.1007/978-3-642-19032-2_5

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 128)

Abstract

Hypernym discovery is an essential task for building and extending ontologies automatically. Compared with the whole Web as a source for information extraction, online encyclopedias are far more structured and reliable. In this paper we propose a novel approach that combines syntactic and lexical-semantic information to identify hypernymic relationships. From the first sentences of the German version of Wikipedia we compiled training data, created both semi-automatically and manually, and a gold standard for evaluation. We trained a sequential supervised learner with a semantically enhanced tagset. The experiments showed that the cleanliness of the data matters far more than its quantity. They also showed that bootstrapping is a viable way to improve the results. Our approach outperformed the competitive lexico-syntactic patterns by 7%, leading to an F₁-measure of over 0.91.
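As a rough illustration of the setup the abstract describes, hypernym discovery can be framed as token-level sequence tagging and scored with the F₁-measure. The sketch below is not the authors' implementation: the tag name `NN-HYPER`, the example sentence, the part-of-speech tags, and the helper names `extract_hypernyms` and `f1_score` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical "semantically enhanced" tag: a noun tag extended with a
# hypernym marker. The actual tagset of the paper is not given in the abstract.
HYPER_TAG = "NN-HYPER"


def extract_hypernyms(tokens: List[str], tags: List[str]) -> List[str]:
    """Return the tokens whose tag marks them as a hypernym."""
    return [tok for tok, tag in zip(tokens, tags) if tag == HYPER_TAG]


def f1_score(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for the hypernym tag."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == HYPER_TAG and p == HYPER_TAG)
    fp = sum(1 for g, p in pairs if g != HYPER_TAG and p == HYPER_TAG)
    fn = sum(1 for g, p in pairs if g == HYPER_TAG and p != HYPER_TAG)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative first sentence of an encyclopedia article (made up here):
# "Bremen ist eine Stadt in Deutschland." ("Bremen is a city in Germany.")
tokens = ["Bremen", "ist", "eine", "Stadt", "in", "Deutschland", "."]
gold = ["NE", "VAFIN", "ART", "NN-HYPER", "APPR", "NE", "$."]
# A hypothetical tagger output with one false positive on "Deutschland".
pred = ["NE", "VAFIN", "ART", "NN-HYPER", "APPR", "NN-HYPER", "$."]

print(extract_hypernyms(tokens, pred))
print(f1_score(gold, pred))
```

In this framing, any sequence tagger trained on such tag sequences yields hypernym candidates by reading off the marked tokens, and a pattern-based baseline can be compared on the same gold standard by converting its matches to the same tag format.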

Author information

Authors and Affiliations

TZI, University of Bremen, Bremen, Germany
Berenike Litz, Hagen Langer & Rainer Malaka

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av.Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Litz, B., Langer, H., Malaka, R. (2011). Sequential Supervised Learning for Hypernym Discovery from Wikipedia. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowlege Engineering and Knowledge Management. IC3K 2009. Communications in Computer and Information Science, vol 128. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19032-2_5

DOI: https://doi.org/10.1007/978-3-642-19032-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19031-5
Online ISBN: 978-3-642-19032-2
eBook Packages: Computer Science, Computer Science (R0)
