
The AnIta-Lemmatiser: A Tool for Accurate Lemmatisation of Italian Texts

  • Conference paper
Evaluation of Natural Language and Speech Tools for Italian (EVALITA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7689)

Abstract

This paper presents the AnIta-Lemmatiser, an automatic tool for lemmatising Italian texts. It is based on a powerful morphological analyser enriched with a large lexicon, together with heuristic techniques that select the most appropriate lemma among those that can be morphologically associated with an ambiguous wordform. The heuristics rely essentially on the frequency-of-use tags provided by the De Mauro/Paravia electronic dictionary. The AnIta-Lemmatiser ranked second in the Lemmatisation Task of the EVALITA 2011 evaluation campaign. In addition to the official lemmatiser used for EVALITA, some further improvements are presented.
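The core disambiguation idea, choosing among the candidate lemmas an analyser returns for an ambiguous wordform by preferring the one with the highest frequency-of-use tag, can be sketched as follows. This is a minimal illustration under stated assumptions, not the AnIta-Lemmatiser's actual algorithm: the `Candidate` type, the `choose_lemma` function, the optional part-of-speech filter and the ordering in `USAGE_RANK` are all hypothetical. The mark abbreviations follow the De Mauro dictionary's usage labels, but their ranking here is assumed, and the toy entries for "pesca" carry invented marks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical ranking of De Mauro usage marks, most to least frequent.
# The ordering is an illustrative assumption, not the one used by AnIta.
USAGE_RANK = {"FO": 0, "AU": 1, "AD": 2, "CO": 3, "TS": 4, "LE": 5,
              "RE": 6, "DI": 7, "ES": 8, "BU": 9, "OB": 10}
UNKNOWN_RANK = len(USAGE_RANK)  # candidates without a usage mark sort last


@dataclass(frozen=True)
class Candidate:
    lemma: str
    pos: str              # coarse part of speech, e.g. "NOUN" or "VERB"
    usage: Optional[str]  # usage mark, or None if the lexicon lacks one


def choose_lemma(candidates: Sequence[Candidate],
                 pos: Optional[str] = None) -> Optional[str]:
    """Return the lemma whose usage mark ranks highest.

    If `pos` is given (hypothetical extra input), candidates with a
    matching part of speech are preferred; when none match, all
    candidates are considered again.
    """
    if not candidates:
        return None
    pool = [c for c in candidates if pos is None or c.pos == pos]
    if not pool:
        pool = list(candidates)
    # min() returns the first minimal element, so ties keep the
    # analyser's original candidate order.
    best = min(pool, key=lambda c: USAGE_RANK.get(c.usage, UNKNOWN_RANK))
    return best.lemma


# Toy candidates for the ambiguous wordform "pesca" (a noun, or a form of
# the verb "pescare"). The usage marks are invented for illustration.
pesca = [Candidate("pesca", "NOUN", "AU"), Candidate("pescare", "VERB", "CO")]
print(choose_lemma(pesca))          # pesca
print(choose_lemma(pesca, "VERB"))  # pescare
```

Ranking by a fixed table keeps the choice deterministic and cheap; a real system would likely combine such a prior with contextual information, which this sketch does not attempt.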

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tamburini, F. (2013). The AnIta-Lemmatiser: A Tool for Accurate Lemmatisation of Italian Texts. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_29

  • DOI: https://doi.org/10.1007/978-3-642-35828-9_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35827-2

  • Online ISBN: 978-3-642-35828-9

  • eBook Packages: Computer Science (R0)