Skip to main content

Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 3878))

Abstract

Two of the problems that should arise when developing a stemming scheme for diachronic corpora are: (1) morphological systems of natural languages may vary throughout time, and these changes are normally not documented sufficiently; and (2) they exhibit very diverse orthographic characteristics. In this short paper, a stemming strategy for a diachronic corpus of Mexican Spanish is briefly described, which partially faces up to these problems. Success rates of the method are contrasted to those of a Porter stemmer.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Porter, M.F.: An Algorithm for Suffix Stripping. Program 14(3), 130–137 (1980)

    Google Scholar 

  2. Medina-Urrea, A., Hlaváčová, J.: Automatic Recognition of Czech Derivational Prefixes. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 189–197. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  3. Medina-Urrea, A., Buenrostro Díaz, E.C.: Características cuantitativas de la flexión verbal del chuj. Estudios de Lingüística Aplicada 38, 15–31 (2003)

    Google Scholar 

  4. Medina-Urrea, A., Alvarado García, M.: Análisis cuantitativo y cualitativo de la derivación léxica en ralámuli. Primer Coloquio Leonardo Manrique, Mexico, Conaculta-INAH (2004)

    Google Scholar 

  5. Medina-Urrea, A.: Automatic Discovery of Affixes by Means of a Corpus: A Catalog of Spanish Affixes. Journal of Quantitative Linguistics 7(2), 97–114 (2000)

    Article  Google Scholar 

  6. Harris, J.: Historical Excursus: Reflexes of the Medieval Stridents. In: Spanish Phonology, pp. 189–206. MIT Press, Cambridge (1969)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Medina-Urrea, A. (2006). Towards the Automatic Lemmatization of 16th Century Mexican Spanish: A Stemming Scheme for the CHEM. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_12

Download citation

  • DOI: https://doi.org/10.1007/11671299_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32205-4

  • Online ISBN: 978-3-540-32206-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics