Abstract
In this paper, we present a language-independent morphological annotation tool based on the Moses SMT toolkit. Taking Hungarian as an example, we demonstrate that the algorithm performs very well for morphologically rich languages. To reach an annotation accuracy of more than 98%, the system uses a trie-based suffix guesser, which lets it handle words unseen in the training data effectively. The system achieves state-of-the-art performance among language-independent tools for the morphological annotation of Hungarian. For PoS tagging, it even outperforms the best hybrid tagger, which includes a language-specific morphological analyzer.
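The abstract credits the trie-based suffix guesser with handling words unseen in training, but does not describe its internals. The following is therefore only a minimal Python sketch of the general technique, not the HuLaPos implementation. Word endings are stored reversed in a trie whose nodes count the tags seen with each suffix, and an unseen word receives the tag distribution of its longest suffix found in the trie. The class and method names, the `max_suffix_len` parameter, the simple longest-match backoff, and the toy tag labels are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Tag counts for every training word ending in the suffix this node spells.
    tags: Counter = field(default_factory=Counter)
    # Child nodes keyed by the next character further left in the word.
    children: dict = field(default_factory=dict)


class SuffixGuesser:
    """Hypothetical trie-based suffix guesser with longest-suffix backoff."""

    def __init__(self, max_suffix_len: int = 10):
        # max_suffix_len is an illustrative parameter, not taken from the paper.
        self.max_suffix_len = max_suffix_len
        self.root = TrieNode()

    def train(self, word: str, tag: str) -> None:
        node = self.root
        node.tags[tag] += 1  # the root holds the overall tag distribution
        # Insert suffix characters starting from the end of the word.
        for ch in reversed(word[-self.max_suffix_len:]):
            node = node.children.setdefault(ch, TrieNode())
            node.tags[tag] += 1

    def guess(self, word: str, n_best: int = 3) -> list[tuple[str, float]]:
        # Follow the word's suffix as deep into the trie as it matches.
        node = self.root
        for ch in reversed(word[-self.max_suffix_len:]):
            child = node.children.get(ch)
            if child is None:
                break
            node = child
        total = sum(node.tags.values())
        if total == 0:
            return []  # the guesser has not been trained yet
        # Relative tag frequencies at the longest matching suffix.
        return [(tag, count / total) for tag, count in node.tags.most_common(n_best)]


if __name__ == "__main__":
    # Toy training data; the tag labels are made up for illustration.
    guesser = SuffixGuesser()
    for word, tag in [("házban", "NOUN+Ine"), ("városban", "NOUN+Ine"),
                      ("kertben", "NOUN+Ine"), ("van", "VERB"),
                      ("olvasott", "VERB+Past"), ("látott", "VERB+Past")]:
        guesser.train(word, tag)
    print(guesser.guess("autóban"))  # → [('NOUN+Ine', 1.0)], matched via "-ban"
    print(guesser.guess("hallott"))  # → [('VERB+Past', 1.0)], matched via "-ott"
```

Because Hungarian inflectional endings such as the inessive -ban/-ben or the past-tense -ott are strongly predictive, even this crude backoff tags the unseen forms autóban ('in the car') and hallott ('heard') plausibly. A production guesser would likely smooth across suffix lengths rather than trust only the deepest match; TnT, for instance, interpolates over successively shorter suffixes.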
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laki, L.J., Orosz, G., Novák, A. (2013). HuLaPos 2.0 – Decoding Morphology. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_24
DOI: https://doi.org/10.1007/978-3-642-45114-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science, Computer Science (R0)