A Support Vector Machine Approach to Dutch Part-of-Speech Tagging

Poel, Mannes; Stegeman, Luite; op den Akker, Rieks

doi:10.1007/978-3-540-74825-0_25

Mannes Poel¹,
Luite Stegeman¹ &
Rieks op den Akker¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural language. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to the decomposition of the large data set into parts for common, uncommon and unknown words. This not only solves the space problems caused by the amount of data, but also improves the tagging time. The resulting tagger reaches an accuracy of 97.54 %, which is quite good, while its tagging speed is reasonable.
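The decomposition idea can be illustrated with a minimal Python sketch. Each token is routed to a model trained only on its frequency class. This is not the authors' implementation. The frequency cut-off `COMMON_MIN_COUNT` and the helper names `build_lexicon`, `word_class` and `tag_sentence` are hypothetical. The SVM committee is replaced by placeholder callables. The CGN-style tags (`LID`, `N`, `WW`) in the toy data are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical cut-off (not from the paper): word forms seen at least this
# many times in the training data are treated as "common".
COMMON_MIN_COUNT = 3

# A classifier maps (sentence, token position) to a tag. In the paper's
# setting this would be an SVM-based model; here it is any callable.
Classifier = Callable[[List[str], int], str]


def build_lexicon(tagged_sentences: Iterable[List[Tuple[str, str]]]) -> Counter:
    """Count how often each word form occurs in the annotated training data."""
    return Counter(word for sentence in tagged_sentences for word, _tag in sentence)


def word_class(word: str, lexicon: Counter, common_min: int = COMMON_MIN_COUNT) -> str:
    """Assign a token to the 'common', 'uncommon' or 'unknown' subset."""
    count = lexicon[word]  # Counter returns 0 for unseen words without inserting them
    if count == 0:
        return "unknown"
    return "common" if count >= common_min else "uncommon"


def tag_sentence(sentence: List[str], lexicon: Counter,
                 committee: Dict[str, Classifier]) -> List[str]:
    """Route every token to the classifier responsible for its word class."""
    return [committee[word_class(word, lexicon)](sentence, i)
            for i, word in enumerate(sentence)]


if __name__ == "__main__":
    training = [
        [("de", "LID"), ("kat", "N"), ("slaapt", "WW")],
        [("de", "LID"), ("hond", "N")],
        [("de", "LID"), ("kat", "N")],
    ]
    lexicon = build_lexicon(training)
    # Stub classifiers that only report which subset handled each token.
    committee = {
        "common": lambda sent, i: "COMMON",
        "uncommon": lambda sent, i: "UNCOMMON",
        "unknown": lambda sent, i: "UNKNOWN",
    }
    print(tag_sentence(["de", "kat", "loopt"], lexicon, committee))
    # -> ['COMMON', 'UNCOMMON', 'UNKNOWN']
```

Each classifier is trained on, and has to hold, only its own subset of the data. That separation is the source of the space and speed savings the abstract describes. The cut-offs and features the paper actually uses are not reproduced by this sketch.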

Author information

Authors and Affiliations

Human Media Interaction, Dept. Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
Mannes Poel, Luite Stegeman & Rieks op den Akker

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

About this paper

Cite this paper

Poel, M., Stegeman, L., op den Akker, R. (2007). A Support Vector Machine Approach to Dutch Part-of-Speech Tagging. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_25

DOI: https://doi.org/10.1007/978-3-540-74825-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74824-3
Online ISBN: 978-3-540-74825-0
eBook Packages: Computer Science, Computer Science (R0)