Sepe: A POS Tagger for Spanish

  • Héctor Jiménez
  • Guillermo Morales
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2276)


We describe a part-of-speech tagging system designed specifically to tag Spanish texts using only small linguistic resources. Despite these limited resources, the tagger obtains encouraging results. We have found and exploited useful contextual parameters for tagging ambiguous and unknown words. The tagger relies mainly on word lists and a single corpus of around 10^4 words. The system has been tested on texts of the so-called “news” genre and is still under continuous development.


Keywords: Spanish language; part-of-speech tagging





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Héctor Jiménez (1)
  • Guillermo Morales (2)
  1. Faculty of Computer Science, Autonomous University of Puebla, Puebla, Mexico
  2. Computer Science Section, CINVESTAV; Molecular Engineering Program, Mexican Institute of Petroleum, Mexico
