Unsupervised Ontology Acquisition from Plain Texts: The OntoGain System

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2010)

Abstract

We propose OntoGain, a system for unsupervised ontology acquisition from unstructured text that relies on multi-word term extraction. To acquire taxonomic relations, we exploit the lexical information inherent in multi-word terms and compare two methods: agglomerative hierarchical clustering and formal concept analysis. To detect non-taxonomic relations, we compare an association-rule-based algorithm with a probabilistic algorithm. OntoGain can transform the derived ontology into standard OWL statements. We evaluate OntoGain results against hand-crafted ontologies and against a state-of-the-art system in two domains: medicine and computer science.
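
To make the taxonomic step more concrete, here is a minimal sketch of agglomerative clustering over multi-word terms using only their lexical content. It is not the OntoGain implementation: the abstract does not state the similarity measure, linkage criterion, or stopping rule, so the word-overlap (Jaccard) similarity, average linkage, and the `threshold` parameter below are all assumptions chosen for illustration, as are the sample terms.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two multi-word terms (assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def average_link(c1, c2) -> float:
    """Average pairwise similarity between two clusters (assumed linkage)."""
    return sum(jaccard(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(terms, threshold=0.2):
    """Repeatedly merge the two most similar clusters.

    Stops when no pair of clusters reaches `threshold` (a hypothetical
    stopping rule). Returns the final clusters and the merge history.
    """
    clusters = [(t,) for t in terms]  # start with one cluster per term
    merges = []
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        merges.append((clusters[i], clusters[j], best))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Illustrative terms only; not taken from the paper's corpora.
terms = [
    "hierarchical clustering",
    "agglomerative hierarchical clustering",
    "association rule",
    "generalized association rule",
]
clusters, merges = agglomerate(terms)
for cluster in clusters:
    print(cluster)
```

In this sketch, terms that share words end up in the same cluster, and the merge history can be read as a candidate hierarchy. A real system would likely weight the head noun of each term more heavily than its modifiers, which plain Jaccard overlap does not do.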

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Drymonas, E., Zervanou, K., Petrakis, E.G.M. (2010). Unsupervised Ontology Acquisition from Plain Texts: The OntoGain System. In: Hopfe, C.J., Rezgui, Y., Métais, E., Preece, A., Li, H. (eds) Natural Language Processing and Information Systems. NLDB 2010. Lecture Notes in Computer Science, vol 6177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13881-2_29

  • DOI: https://doi.org/10.1007/978-3-642-13881-2_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13880-5

  • Online ISBN: 978-3-642-13881-2

  • eBook Packages: Computer Science, Computer Science (R0)
