Inductive Improvement of Part-of-Speech Tagging and Its Effect on a Terminology of Molecular Biology

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3501)

Abstract

In the context of part-of-speech (PoS) tagging of specialized corpora, we propose an inductive approach that focuses on the most ‘important’ PoS tags, those whose mistagging can lead to a complete misunderstanding of the text. After standard tagging of a biological corpus with Brill’s tagger, we observed persistent errors that are very hard to correct. As an application, we studied two cases of different kinds: first, confusion among past participle, adjective, and preterit for verbs ending in ‘ed’; second, confusion between plural nouns and third-person singular present verbs. Using a user-friendly interface, an expert corrected the examples. From these well-annotated examples, we then induced rules with a propositional rule induction algorithm. Experimental validation showed an improvement in tagging precision. The relevance of the terminology of the field considered, here molecular biology, improves greatly as the number of these tagging errors decreases.
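The step from expert-corrected examples to induced rules can be illustrated with a minimal sketch. It is not the algorithm used in the paper, which the abstract does not name: it is a simplified sequential-covering learner that produces single-condition rules. The attributes (prev_tag, next_tag), the Penn Treebank style tag names, and the example data are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical expert-corrected examples for words ending in "ed".
# Attributes are the tags of the neighbouring words; the label is the
# corrected tag: VBN (past participle), JJ (adjective), VBD (preterit).
EXAMPLES = [
    ({"prev_tag": "VBZ", "next_tag": "IN"}, "VBN"),   # "is expressed in"
    ({"prev_tag": "VBZ", "next_tag": "DT"}, "VBN"),   # "has activated the"
    ({"prev_tag": "VBD", "next_tag": "IN"}, "VBN"),   # "was detected in"
    ({"prev_tag": "DT", "next_tag": "NN"}, "JJ"),     # "the purified protein"
    ({"prev_tag": "DT", "next_tag": "NNS"}, "JJ"),    # "the infected cells"
    ({"prev_tag": "IN", "next_tag": "NN"}, "JJ"),     # "of activated kinase"
    ({"prev_tag": "NN", "next_tag": "DT"}, "VBD"),    # "kinase phosphorylated the"
    ({"prev_tag": "NNS", "next_tag": "DT"}, "VBD"),   # "cells expressed the"
]


def best_rule(examples):
    """Return the rule (attr, value, label) with the highest precision on
    `examples`; ties are broken by coverage, then by sorted order."""
    covered = Counter()  # (attr, value) -> examples matching the condition
    correct = Counter()  # (attr, value, label) -> matches carrying that label
    for feats, label in examples:
        for attr, value in feats.items():
            covered[(attr, value)] += 1
            correct[(attr, value, label)] += 1

    def score(rule):
        attr, value, _ = rule
        n = correct[rule]
        return (n / covered[(attr, value)], n)

    # Iterating in sorted order makes tie-breaking deterministic.
    return max(sorted(correct), key=score)


def induce_rules(examples):
    """Sequential covering: learn one rule, drop what it covers, repeat."""
    remaining = list(examples)
    rules = []
    while remaining:
        attr, value, label = best_rule(remaining)
        rules.append((attr, value, label))
        remaining = [(f, l) for f, l in remaining if f.get(attr) != value]
    return rules


def apply_rules(rules, feats, default):
    """Return the label of the first matching rule, else `default`."""
    for attr, value, label in rules:
        if feats.get(attr) == value:
            return label
    return default


if __name__ == "__main__":
    for attr, value, label in induce_rules(EXAMPLES):
        print(f"IF {attr} = {value} THEN tag = {label}")
```

The result is an ordered rule list, so a word is tagged by the first rule whose condition it satisfies. A practical system would use richer attributes and a pruning step to avoid overfitting the corrected examples.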

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amrani, A., Roche, M., Kodratoff, Y., Matte-Tailliez, O. (2005). Inductive Improvement of Part-of-Speech Tagging and Its Effect on a Terminology of Molecular Biology. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_38

  • DOI: https://doi.org/10.1007/11424918_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25864-3

  • Online ISBN: 978-3-540-31952-8

  • eBook Packages: Computer Science (R0)
