Tiered Tagging and Combined Language Models Classifiers

Tufiş, Dan

doi:10.1007/3-540-48239-3_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

We address the problem of morpho-syntactic disambiguation of arbitrary texts in a highly inflectional natural language. We use a large tagset (615 tags) that is EAGLES and MULTEXT compliant [5]. The large tagset is internally mapped onto a reduced one (82 tags), which serves statistical disambiguation; a text disambiguated in terms of the reduced tagset then undergoes a recovery process that restores the information left out from the large tagset. This two-step process is called tiered tagging. To further improve tagging accuracy, we use a combined language models classifier, a procedure that interpolates the results of tagging the same text with several register-specific language models.
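
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tag strings, the lexicon entries, and the function names (recover, combine) are invented for illustration and are not linguistically vetted. Recovery is assumed here to be a lexicon lookup that keeps the large tags compatible with the chosen reduced tag, and the combination step is approximated by a weighted vote per token, which stands in for whatever interpolation the paper actually uses.

```python
from collections import Counter

# Hypothetical many-to-one mapping from large-tagset tags to reduced tags.
# All tag strings are illustrative assumptions, not taken from the paper.
LARGE_TO_REDUCED = {
    "Ncfsrn": "NSRN",
    "Ncfsry": "NSRY",
    "Vmip3s": "V3",
    "Vmip3p": "V3",
    "Afpfsrn": "ASN",
}

# Hypothetical lexicon: word form -> large tags the form may carry.
LEXICON = {
    "casa": ["Ncfsry"],
    "mare": ["Afpfsrn", "Ncfsrn"],
    "merge": ["Vmip3s", "Vmip3p"],
}


def recover(word, reduced_tag):
    """Second tier: list the large tags of `word` that map onto `reduced_tag`.

    More than one survivor means the reduced tag alone did not settle the
    choice and some further rule would be needed.
    """
    return [t for t in LEXICON.get(word, [])
            if LARGE_TO_REDUCED[t] == reduced_tag]


def combine(predictions, weights):
    """Weighted vote per token over the outputs of several taggers.

    `predictions` holds one tag sequence per language model, all of the
    same length; `weights` holds one weight per model.
    """
    combined = []
    for i in range(len(predictions[0])):
        scores = Counter()
        for tags, weight in zip(predictions, weights):
            scores[tags[i]] += weight
        combined.append(scores.most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    # Three hypothetical register-specific models tag the same two tokens.
    voted = combine(
        [["NSRY", "V3"], ["NSRY", "ASN"], ["NSRN", "V3"]],
        [1.0, 1.0, 1.0],
    )
    for word, reduced in zip(["casa", "merge"], voted):
        print(word, reduced, recover(word, reduced))
```

In this sketch the first tier works only with the reduced tags, which keeps the statistical model small, while the lexicon restores the detail afterwards. The second token shows the case where two large tags share one reduced tag, so recovery returns both candidates.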

References

[1] Brill, E., Wu, J. (1998): Classifier Combination for Improved Lexical Disambiguation, In Proceedings of COLING-ACL98, Montreal, Canada, 191–195
[2] Dietterich, T. (1998): Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, http://www.cs.orst.edu/~tgd/cv/pubs.html
[3] Dietterich, T. (1997): Machine Learning Research: Four Current Directions, In AI Magazine, Winter, 97–136
[4] Elworthy, D. (1995): Tagset Design and Inflected Languages, In Proceedings of the ACL SIGDAT Workshop, Dublin, Ireland (also available as cmp-lg archive 9504002)
[5] Erjavec, T., Monachini, M. (eds.) (1997): Specifications and Notation for Lexicon Encoding of Eastern Languages, Deliverable 1.1F Multext-East, http://nl.ijs.si/ME
[6] van Halteren, H., Zavrel, J., Daelemans, W. (1998): Improving Data Driven Wordclass Tagging by System Combination, In Proceedings of COLING-ACL98, Montreal, Canada, 491–497
[7] Tufiş, D., Chiţu, A. (1999): Automatic Insertion of Diacritics in Romanian Texts, In Proceedings of COMPLEX 99, Pecs, Hungary
[8] Tufiş, D., Mason, O. (1998): Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger, In Proceedings of First International Conference on Language Resources and Evaluation, Granada, Spain, 589–596
[9] Tufiş, D., Barbu, A. M., Pătraşcu, V., Rotariu, G., Popescu, C. (1997): Corpora and Corpus-Based Morpho-Lexical Processing, In Dan Tufiş, Poul Andersen (eds.) Recent Advances in Romanian Language Technology, Editura Academiei, 35–56 (also available at http://www.racai.ro/books)

Author information

Authors and Affiliations

RACAI-Romanian Academy, 13, '13 Septembrie', Ro-74311, Bucharest
Dan Tufiş

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Tufiş, D. (1999). Tiered Tagging and Combined Language Models Classifiers. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_5

DOI: https://doi.org/10.1007/3-540-48239-3_5
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
