Entropy Guided Transformation Learning

dos Santos, Cícero Nogueira; Milidiú, Ruy Luiz

doi:10.1007/978-3-642-01082-8_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 201)

Abstract

This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. ETL generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. ETL uses information gain to select the feature combinations that provide good template sets.
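The abstract states only that ETL uses information gain to choose the feature combinations that become templates. The following minimal sketch assumes discrete features. It also assumes that templates are read off an ID3-style decision tree, so that the features along each root-to-node path form one template. The function names `entropy`, `information_gain` and `extract_templates`, the `max_depth` cutoff and the toy data are illustrative assumptions, not the chapter's actual procedure or corpus.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by partitioning the examples on one feature."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def extract_templates(rows, labels, features, max_depth=2, path=(), templates=None):
    """Hypothetical ID3-style template extraction (an assumption, not ETL verbatim).

    Recursively splits on the highest-gain feature; the set of features along
    every root-to-node path is recorded as one candidate template.
    """
    if templates is None:
        templates = set()
    if path:
        templates.add(tuple(sorted(path)))
    if len(path) >= max_depth or not features or entropy(labels) == 0:
        return templates
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 0:
        return templates  # no remaining feature reduces uncertainty
    remaining = [f for f in features if f != best]
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    for sub_rows, sub_labels in groups.values():
        extract_templates(sub_rows, sub_labels, remaining, max_depth, path + (best,), templates)
    return templates


# Hypothetical POS-style toy data: the correct tag given the word and the previous tag.
rows = [
    {"word": "can", "prev_tag": "DT"},
    {"word": "can", "prev_tag": "PRP"},
    {"word": "run", "prev_tag": "DT"},
    {"word": "run", "prev_tag": "PRP"},
]
labels = ["NN", "MD", "NN", "VB"]

print(sorted(extract_templates(rows, labels, ["word", "prev_tag"])))
# [('prev_tag',), ('prev_tag', 'word')]
```

On this toy data, `prev_tag` has an information gain of 1.0 bit against 0.5 bit for `word`, so it is chosen first. `word` is only needed to separate `MD` from `VB` after a `PRP`, which yields the two-feature template.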

We describe the application of ETL to two language-independent Text Mining preprocessing tasks: part-of-speech tagging and phrase chunking. We also report our findings on one language-independent Information Extraction task: named entity recognition. Overall, we successfully apply ETL to six different languages: Dutch, English, German, Hindi, Portuguese and Spanish.

For each task, the ETL modeling phase is quick and simple: ETL requires only the training set and no handcrafted templates. Furthermore, our extensive experimental results demonstrate that ETL is an effective way to learn accurate transformation rules. We believe that, by avoiding handcrafted templates, ETL extends the use of transformation rules to a wider range of Text Mining applications.
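Given a template set, TBL learns an ordered list of transformation rules by greedy, error-driven search. It instantiates candidate rules from the templates at positions that are currently wrong, scores each candidate by corrections minus newly introduced errors, keeps and applies the best one, and repeats. The sketch below is a simplified, hypothetical version of that loop. The rule layout `(template, values, from_tag, to_tag)`, the helper names `matches`, `score` and `learn_rules`, and the toy data are assumptions. Unlike real TBL, it treats each token's features as fixed instead of recomputing contextual features (such as the previous tag) after every rule application.

```python
def matches(rule, features, current_tag):
    """A rule fires when the current tag and the template's feature values match."""
    template, values, from_tag, _ = rule
    return current_tag == from_tag and tuple(features[f] for f in template) == values


def score(rule, tokens, current, gold):
    """Corrections minus new errors the rule would cause (Brill-style TBL score)."""
    to_tag = rule[3]
    good = bad = 0
    for feats, cur, ref in zip(tokens, current, gold):
        if matches(rule, feats, cur):
            if to_tag == ref:
                good += 1
            elif cur == ref:
                bad += 1
    return good - bad


def learn_rules(tokens, initial, gold, templates, min_score=1):
    """Greedy error-driven loop; min_score must be >= 1 to guarantee termination."""
    current = list(initial)
    rules = []
    while True:
        # Candidate rules are instantiated from templates only at current errors.
        candidates = {
            (t, tuple(feats[f] for f in t), cur, ref)
            for feats, cur, ref in zip(tokens, current, gold)
            if cur != ref
            for t in templates
        }
        if not candidates:
            break
        # Sorting first makes tie-breaking deterministic; max keeps the first best.
        best = max(sorted(candidates), key=lambda r: score(r, tokens, current, gold))
        if score(best, tokens, current, gold) < min_score:
            break
        rules.append(best)
        current = [best[3] if matches(best, f, c) else c for f, c in zip(tokens, current)]
    return rules, current


# Hypothetical toy data: a most-frequent-tag baseline mislabels "can" and "run" after "DT".
tokens = [
    {"word": "can", "prev_tag": "DT"},
    {"word": "can", "prev_tag": "PRP"},
    {"word": "run", "prev_tag": "DT"},
    {"word": "run", "prev_tag": "PRP"},
]
initial = ["MD", "MD", "VB", "VB"]
gold = ["NN", "MD", "NN", "VB"]
templates = [("word",), ("prev_tag",), ("prev_tag", "word")]

rules, final = learn_rules(tokens, initial, gold, templates)
for rule in rules:
    print(rule)
# (('prev_tag',), ('DT',), 'MD', 'NN')
# (('prev_tag',), ('DT',), 'VB', 'NN')
print(final == gold)  # True
```

Because every accepted rule must score at least `min_score` (1 by default), each iteration removes at least one error net, so the loop always terminates. The word-only rule `can: MD -> NN` scores 0 here because it would fix one token and break another, which is why the search prefers the context-sensitive rules.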

Author information

Authors and Affiliations

Departamento de Informática, PUC-Rio, Rio de Janeiro, Brazil
Cícero Nogueira dos Santos & Ruy Luiz Milidiú

Editor information

Editors and Affiliations

College of Business Administration, Quantitative and Information System Department, Kuwait University, P.O. Box 5486, 13055, Safat, Kuwait
Aboul-Ella Hassanien
Center of Excellence for Quantifiable Quality of Service, Norwegian University of Science & Technology, O.S. Bragstads plass 2E, 7491, Trondheim, Norway
Ajith Abraham
Department of Computer and Telecommunications Engineering, University of Western Macedonia, Agios Dimitrios Park, 50 100, Kozani, Greece
Athanasios V. Vasilakos
Dept. Electrical and Computer Engineering, University of Alberta, T6J 2V4, Edmonton, Alberta, Canada
Witold Pedrycz

About this chapter

Cite this chapter

dos Santos, C.N., Milidiú, R.L. (2009). Entropy Guided Transformation Learning. In: Hassanien, AE., Abraham, A., Vasilakos, A.V., Pedrycz, W. (eds) Foundations of Computational Intelligence Volume 1. Studies in Computational Intelligence, vol 201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01082-8_7

DOI: https://doi.org/10.1007/978-3-642-01082-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01081-1
Online ISBN: 978-3-642-01082-8
eBook Packages: Engineering, Engineering (R0)
