Abstract
We address the problem of morpho-syntactic disambiguation of arbitrary texts in a highly inflectional natural language. We use a large tagset (615 tags) that is compliant with EAGLES and MULTEXT [5]. This large tagset is internally mapped onto a reduced one (82 tags) used for statistical disambiguation. A text disambiguated in terms of the reduced tagset then undergoes a recovery process that restores all the information from the large tagset that the reduced tagset left out. This two-step process is called tiered tagging. To further improve tagging accuracy, we use a combined language model classifier, a procedure that interpolates the results of tagging the same text with several register-specific language models.
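The two tiers and the combination step can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the tags (`Nsd`, `Nso`, `Vp3s`, ...), the word forms `wA`/`wB`/`wC`, the `REDUCE` table and the helper names are hypothetical and not taken from the paper's 615-tag or 82-tag sets. The statistical tagger itself is not implemented; the outputs of three register-specific models are given as fixed lists. The combiner is a plain weighted majority vote, swapped in as a simple stand-in for the interpolation procedure described above.

```python
# Toy sketch of tiered tagging plus a combined-model vote.
# All tags, word forms and tables are hypothetical illustrations.

# Large ("full") tagset -> reduced tagset. Here the reduction drops
# noun case and verb number, assuming the lexicon can usually restore them.
REDUCE = {
    "Nsd": "Ns", "Nso": "Ns",    # noun, singular, direct / oblique case
    "Npd": "Np", "Npo": "Np",    # noun, plural, direct / oblique case
    "Vp3s": "V3", "Vp3p": "V3",  # verb, present, 3rd person, sg / pl
}

# Hypothetical lexicon: word form -> every full tag it may carry.
LEXICON = {
    "wA": {"Nsd", "Vp3s"},   # noun/verb ambiguity; case is unique
    "wB": {"Nsd", "Nso"},    # direct and oblique forms coincide
    "wC": {"Npd", "Vp3p"},
}


def reduced_ambiguity_class(word):
    """Reduced tags the statistical tagger must choose between for `word`."""
    return sorted({REDUCE[tag] for tag in LEXICON[word]})


def recover(word, reduced_tag):
    """Second tier: full tags of `word` that project onto `reduced_tag`.

    A single result means the dropped features were fully restored;
    several results mean residual ambiguity left for further processing.
    """
    return sorted(tag for tag in LEXICON[word] if REDUCE[tag] == reduced_tag)


def combine(outputs, weights=None):
    """Weighted per-token vote over reduced-tag sequences produced by
    several register-specific taggers for the same text.

    Ties go to the tag proposed by the earliest model in `outputs`.
    """
    if weights is None:
        weights = [1.0] * len(outputs)
    combined = []
    for position_tags in zip(*outputs):
        scores = {}
        for tag, weight in zip(position_tags, weights):
            scores[tag] = scores.get(tag, 0.0) + weight
        # max() returns the first maximal key in insertion order.
        combined.append(max(scores, key=scores.get))
    return combined


if __name__ == "__main__":
    text = ["wA", "wB", "wC"]
    # Stand-ins for the outputs of three register-specific models on `text`.
    outputs = [
        ["Ns", "Ns", "V3"],
        ["V3", "Ns", "Np"],
        ["Ns", "Ns", "Np"],
    ]
    reduced = combine(outputs)
    for word, tag in zip(text, reduced):
        print(word, reduced_ambiguity_class(word), tag, recover(word, tag))
```

In this sketch the vote yields `Ns`, `Ns`, `Np`; recovery then restores a single full tag for `wA` and `wC`, while `wB` keeps two candidates because the feature dropped by the reduction (case) is not recoverable from the lexicon alone for that form. Such residue would need some further step, for example contextual rules, which the sketch does not model.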
References
Brill, E., Wu, J. (1998): Classifier Combination for Improved Lexical Disambiguation. In Proceedings of COLING-ACL98, Montreal, Canada, 191–195
Dietterich, T. (1998): Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. http://www.cs.orst.edu/~tgd/cv/pubs.html
Dietterich, T. (1997): Machine Learning Research: Four Current Directions. AI Magazine, Winter, 97–136
Elworthy, D. (1995): Tagset Design and Inflected Languages. In Proceedings of the ACL SIGDAT Workshop, Dublin, Ireland (also available as cmp-lg archive 9504002)
Erjavec, T., Monachini, M. (eds.) (1997): Specifications and Notation for Lexicon Encoding of Eastern Languages. Deliverable 1.1F, Multext-East, http://nl.ijs.si/ME
van Halteren, H., Zavrel, J., Daelemans, W. (1998): Improving Data Driven Wordclass Tagging by System Combination. In Proceedings of COLING-ACL98, Montreal, Canada, 491–497
Tufiş, D., Chiţu, A. (1999): Automatic Insertion of Diacritics in Romanian Texts. In Proceedings of COMPLEX 99, Pécs, Hungary
Tufiş, D., Mason, O. (1998): Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger. In Proceedings of the First International Conference on Language Resources and Evaluation, Granada, Spain, 589–596
Tufiş, D., Barbu, A. M., Pătraşcu, V., Rotariu, G., Popescu, C. (1997): Corpora and Corpus-Based Morpho-Lexical Processing. In Dan Tufiş, Poul Andersen (eds.) Recent Advances in Romanian Language Technology, Editura Academiei, 35–56 (also available at http://www.racai.ro/books)
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tufiş, D. (1999). Tiered Tagging and Combined Language Models Classifiers. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_5
DOI: https://doi.org/10.1007/3-540-48239-3_5
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive