Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6997)

Abstract

In this paper, we propose the Automatic Taxonomy Construction from Text (ATCT) framework for building taxonomies from text-based Web corpora. The framework is composed of multiple processing steps. Firstly, domain terms are extracted using a filtering method. Subsequently, Word Sense Disambiguation (WSD) is optionally applied in order to determine the senses of these terms. Then, by means of a subsumption technique, the resulting concepts are arranged in a hierarchy. We construct taxonomies with and without WSD and we investigate the effect of WSD on the quality of concept type-of relations using an evaluation framework that uses a golden taxonomy. We find that WSD improves the quality of the built taxonomy in terms of the taxonomic F-Measure.
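The hierarchy-building step can be illustrated with a small sketch. The abstract does not say which subsumption variant the framework uses, so the code below assumes one common document co-occurrence formulation: a term x is taken to subsume a term y when P(x|y) meets a threshold and P(y|x) is below 1. The function name `subsumption_pairs`, the threshold of 0.8, and the toy documents are all illustrative assumptions, not details taken from the paper.

```python
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    Assumed rule (not confirmed by the abstract): parent subsumes child
    when P(parent | child) >= threshold and P(child | parent) < 1,
    with probabilities estimated from document co-occurrence.
    """
    # Map each term to the set of document indices in which it occurs.
    occurrences = {}
    for idx, terms in enumerate(docs):
        for term in set(terms):
            occurrences.setdefault(term, set()).add(idx)

    pairs = []
    # Check every ordered pair of distinct terms.
    for parent, child in permutations(sorted(occurrences), 2):
        both = len(occurrences[parent] & occurrences[child])
        p_parent_given_child = both / len(occurrences[child])
        p_child_given_parent = both / len(occurrences[parent])
        if p_parent_given_child >= threshold and p_child_given_parent < 1.0:
            pairs.append((parent, child))
    return pairs


# Toy corpus: each document is represented by its set of extracted terms.
docs = [
    {"finance", "bank", "loan"},
    {"finance", "bank"},
    {"finance", "stock"},
    {"finance"},
]

pairs = subsumption_pairs(docs)
```

In this sketch a broad term that appears wherever a narrower term appears becomes its parent. When WSD is applied, the same procedure would run over disambiguated senses rather than surface terms, so that two senses of one word are counted separately.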




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

de Knijff, J., Meijer, K., Frasincar, F., Hogenboom, F. (2011). Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds) Web Information System Engineering – WISE 2011. WISE 2011. Lecture Notes in Computer Science, vol 6997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24434-6_18

  • DOI: https://doi.org/10.1007/978-3-642-24434-6_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24433-9

  • Online ISBN: 978-3-642-24434-6

  • eBook Packages: Computer Science, Computer Science (R0)
