XML Rules for Enclitic Segmentation

  • Fco. Mario Barcala
  • Miguel A. Molinero
  • Eva Domínguez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4739)


Segmenting sentences into words is an important task in robust part-of-speech (POS) tagging systems. In some cases it is relatively simple, since each textual word (or token) corresponds to one linguistic component. In many others, however, segmentation can be very hard: in contractions, verbal forms with enclitic pronouns, and similar cases, a single token contains information about two or more linguistic components.
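As a rough illustration of a token that carries several linguistic components, the sketch below splits a Spanish verbal form into its verb and enclitic pronouns by stripping known clitics from the end of the token until a known verb form remains. It is only a toy under stated assumptions: the clitic list, the tiny verb lexicon, and the function names are hypothetical, and this is not the XML rule formalism the paper proposes.

```python
import unicodedata

# Hypothetical toy data: a few Spanish enclitic pronouns and verb forms.
CLITICS = ("me", "te", "se", "lo", "la", "le", "nos", "los", "las", "les")
VERB_FORMS = {"da", "di", "dar", "dando"}


def strip_accents(word):
    """Remove combining marks, e.g. 'dámelo' -> 'damelo'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def segment_enclitics(token, max_clitics=3):
    """Return [verb, clitic, ...] or [token] if no segmentation is found."""
    base = strip_accents(token.lower())

    def search(stem, found):
        # Success: what remains is a known verb form.
        if stem in VERB_FORMS:
            return [stem] + found
        if len(found) == max_clitics:
            return None
        # Try peeling one more clitic off the end of the stem.
        for clitic in CLITICS:
            if stem.endswith(clitic) and len(stem) > len(clitic):
                result = search(stem[: -len(clitic)], [clitic] + found)
                if result is not None:
                    return result
        return None

    return search(base, []) or [token]


print(segment_enclitics("dámelo"))     # ['da', 'me', 'lo']
print(segment_enclitics("dándoselo"))  # ['dando', 'se', 'lo']
print(segment_enclitics("casa"))       # ['casa']
```

The sketch discards the written accent and ignores ambiguity (a token that is both a plain word and a verb plus clitic), which is exactly the kind of case that makes real segmentation hard.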


Keywords: Verbal Form; Main System; Romance Language; Evaluation Element; Verbal Rule
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Fco. Mario Barcala (1)
  • Miguel A. Molinero (1)
  • Eva Domínguez (1)
  1. Centro Ramón Piñeiro, Ctra. Santiago-Noia km. 3, A Barcia, 15896 Santiago de Compostela, Spain
