Abstract
We developed algorithmic stemmers for Hungarian and used them in the ad hoc monolingual task at CLEF 2005. Our goal was to determine which degree of stemming is most effective. Although on average the stemmers did not perform as well as the best n-gram, we found that stemming over a broad range of suffixes, especially on nouns, is highly useful.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tordai, A., de Rijke, M. (2006). Four Stemmers and a Funeral: Stemming in Hungarian at CLEF 2005. In: Peters, C., et al. Accessing Multilingual Information Repositories. CLEF 2005. Lecture Notes in Computer Science, vol 4022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11878773_20
DOI: https://doi.org/10.1007/11878773_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45697-1
Online ISBN: 978-3-540-45700-8
eBook Packages: Computer Science (R0)