HuLaPos 2.0 – Decoding Morphology

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8265)

Abstract

This paper presents a language-independent morphological annotation tool based on the Moses SMT toolkit. Taking Hungarian as an example, we demonstrate that the algorithm performs very well for morphologically rich languages. To reach an annotation accuracy above 98%, the system uses a trie-based suffix guesser, which enables the tool to handle words unseen in the training data effectively. The system yields state-of-the-art performance among language-independent tools for morphological annotation of Hungarian. For PoS tagging, it even outperforms the best hybrid tagger, which includes a language-specific morphological analyzer.
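To illustrate the general idea behind a trie-based suffix guesser, here is a minimal Python sketch. It is not the HuLaPos implementation, whose details are not given in the abstract. The class name `SuffixTrie`, the `max_suffix_len` parameter, the toy training pairs, and the tag labels are all assumptions made for illustration. The sketch stores word endings in reversed order together with tag counts, and guesses the tag of an unseen word from the longest ending it shares with the training data.

```python
from collections import Counter


def _new_node():
    # Each trie node keeps its children and the tag counts of all
    # training words whose ending passes through this node.
    return {"children": {}, "tags": Counter()}


class SuffixTrie:
    """Toy suffix guesser (hypothetical, for illustration only)."""

    def __init__(self, max_suffix_len=5):
        self.max_suffix_len = max_suffix_len
        self.root = _new_node()

    def add(self, word, tag):
        # Walk the last max_suffix_len characters from the end of the
        # word, so that words sharing an ending share a trie path.
        node = self.root
        node["tags"][tag] += 1
        for ch in reversed(word[-self.max_suffix_len:]):
            node = node["children"].setdefault(ch, _new_node())
            node["tags"][tag] += 1

    def guess(self, word):
        # Follow the longest matching ending; fall back to shorter
        # endings (ultimately the root) when the path runs out.
        node = self.root
        best = node["tags"]
        for ch in reversed(word[-self.max_suffix_len:]):
            node = node["children"].get(ch)
            if node is None:
                break
            best = node["tags"]
        total = sum(best.values())
        if total == 0:
            return []
        return [(tag, count / total) for tag, count in best.most_common()]


# Hypothetical training pairs; the tag labels are invented for this sketch.
trie = SuffixTrie()
for word, tag in [
    ("házban", "N.INE"),
    ("kertben", "N.INE"),
    ("városban", "N.INE"),
    ("olvasott", "V.PST"),
    ("futott", "V.PST"),
]:
    trie.add(word, tag)

# "autóban" never occurs in the training pairs, but it shares the
# ending "ban" with two of them, so the guesser proposes their tag.
print(trie.guess("autóban"))
```

This sketch uses only the longest matching ending. A practical guesser would usually also combine evidence from shorter endings and from the tagging context, which is beyond what this example shows.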

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Laki, L.J., Orosz, G., Novák, A. (2013). HuLaPos 2.0 – Decoding Morphology. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_24

  • DOI: https://doi.org/10.1007/978-3-642-45114-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45113-3

  • Online ISBN: 978-3-642-45114-0

  • eBook Packages: Computer Science, Computer Science (R0)
