Ontology-Based Feature Transformations: A Data-Driven Approach

Ginter, Filip; Pyysalo, Sampo; Boberg, Jorma; Järvinen, Jouni; Salakoski, Tapio

doi:10.1007/978-3-540-30228-5_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3230)

Included in the following conference series:

International Conference on Natural Language Processing (in Spain)

Abstract

We present a novel approach to incorporating semantic information into natural language processing problems, in particular the document classification task. The approach builds on the intuition that the semantic relatedness of words is not a static property but depends on the particular task at hand. The semantic relatedness information is incorporated through feature transformations, which are based on a feature ontology and on the particular classification task and data. We demonstrate the approach on the problem of classifying MEDLINE-indexed documents using the MeSH ontology. The results suggest that the method can improve classification performance on most of the datasets.
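The abstract does not spell out the transformations, so the following Python fragment is only a minimal sketch of one possible reading: features that are ontology terms are replaced by an ancestor a fixed number of levels up, so that related terms collapse into a shared feature. The toy hierarchy, the names `PARENT`, `lift`, and `transform`, and the fixed `levels` parameter are all assumptions made for illustration; they are not taken from the paper and do not use real MeSH data.

```python
from collections import Counter

# Hypothetical toy ontology (child -> parent); NOT actual MeSH content.
PARENT = {
    "aspirin": "analgesics",
    "ibuprofen": "analgesics",
    "analgesics": "drugs",
    "penicillin": "antibiotics",
    "antibiotics": "drugs",
}

def lift(term, levels, parent=PARENT):
    """Replace a term by its ancestor `levels` steps up, stopping at the root."""
    for _ in range(levels):
        if term not in parent:
            break
        term = parent[term]
    return term

def transform(doc_terms, levels, parent=PARENT):
    """Map a bag of ontology terms to a bag of generalized terms."""
    return Counter(lift(t, levels, parent) for t in doc_terms)

# Example document represented by its (hypothetical) ontology terms.
doc = ["aspirin", "ibuprofen", "penicillin"]
print(transform(doc, 1))
```

In a data-driven setting of the kind the abstract describes, the amount of generalization would presumably be chosen per classification task, for example by comparing held-out performance across candidate transformations; that selection step is not shown here.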

Author information

Authors and Affiliations

Turku Centre for Computer Science and Department of Information Technology, University of Turku, Lemminkäisenkatu 14, 20520, Turku, Finland
Filip Ginter, Sampo Pyysalo, Jorma Boberg, Jouni Järvinen & Tapio Salakoski

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
José Luis Vicedo
Natural Language Processing and Information Systems Group, Department of Software and Computing Systems, University of Alicante, Spain
Patricio Martínez-Barco
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Departamento de Lenguajes y Sistemas Informáticos, Carretera de San Vicente del Raspeig, Universidad de Alicante, 03690 San Vicente del Raspeig, Alicante, Spain
Maximiliano Saiz Noeda

Cite this paper

Ginter, F., Pyysalo, S., Boberg, J., Järvinen, J., Salakoski, T. (2004). Ontology-Based Feature Transformations: A Data-Driven Approach. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_25

DOI: https://doi.org/10.1007/978-3-540-30228-5_25
Published: 20 October 2004
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive