Terminology Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2700)

Abstract

Terminology mining is a major step forward in terminology extraction: it covers both the acquisition and the structuring of candidate terms. We present a terminology mining method based on linguistic criteria combined with computational methods. By terminology mining, we refer to the acquisition of complex terms and the discovery of new terms, but also to the structuring of the acquired candidate terms. First, we give the linguistic specifications of terms for French and define a typology of base terms and their variations. We stress the crucial role of handling term variations in order to build a linguistic structuring, to detect advanced lexicalisation and to obtain an optimised representativeness of the candidate term occurrences. Second, we present the computational methods implemented: shallow parsing, morphological analysis, morphological rule learning and lexical statistics. Third, we introduce ACABIT (Automatic Corpus-Based Acquisition of Binary Terms), the system that identifies base terms and their variations: its architecture, the languages it applies to and its functions. To conclude, we review evaluation methods for terminology extraction and discuss the results obtained by ACABIT in evaluation campaigns.
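The abstract names shallow parsing and lexical statistics among the computational methods. The following is a minimal sketch of how those two steps can fit together, not ACABIT's implementation: it assumes French text that has already been POS-tagged and lemmatised with a hypothetical tag set (`N`, `ADJ`, `P`, `DET`), extracts candidate binary terms with two illustrative base-term patterns (Noun-Adjective and Noun-Preposition-(Determiner)-Noun), and ranks them with Dunning's log-likelihood ratio, one widely used association score. The function names `candidate_pairs`, `log_likelihood` and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter

# Assumed (hypothetical) tag set: "N" noun, "ADJ" adjective, "P" preposition,
# "DET" determiner. Input is a list of (lemma, tag) pairs from a tagger/lemmatiser.


def candidate_pairs(tagged):
    """Yield (head, modifier) lemma pairs for two illustrative base-term patterns:
    N ADJ (e.g. "réseau local") and N P (DET) N (e.g. "temps de réponse")."""
    n = len(tagged)
    for i, (lemma, tag) in enumerate(tagged):
        if tag != "N":
            continue
        # Pattern N ADJ: a noun immediately followed by an adjective.
        if i + 1 < n and tagged[i + 1][1] == "ADJ":
            yield (lemma, tagged[i + 1][0])
        # Pattern N P (DET) N: noun, preposition, optional determiner, noun.
        if i + 2 < n and tagged[i + 1][1] == "P":
            j = i + 2
            if tagged[j][1] == "DET" and j + 1 < n:
                j += 1
            if tagged[j][1] == "N":
                yield (lemma, tagged[j][0])


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood ratio (G^2) for the 2x2 contingency table
    a = f(w1, w2), b = f(w1, not w2), c = f(not w1, w2), d = f(not w1, not w2)."""
    n = a + b + c + d
    cells = [
        (a, (a + b) * (a + c) / n),
        (b, (a + b) * (b + d) / n),
        (c, (c + d) * (a + c) / n),
        (d, (c + d) * (b + d) / n),
    ]
    # Empty cells contribute nothing (limit of x * ln x as x -> 0).
    return 2.0 * sum(obs * math.log(obs / expected)
                     for obs, expected in cells if obs > 0)


def rank_candidates(pairs):
    """Return (pair, frequency, score) triples sorted by decreasing score.
    All counts are taken over the list of candidate pairs itself."""
    pair_counts = Counter(pairs)
    first_counts = Counter(w1 for w1, _ in pairs)
    second_counts = Counter(w2 for _, w2 in pairs)
    total = len(pairs)
    scored = []
    for (w1, w2), a in pair_counts.items():
        b = first_counts[w1] - a    # w1 as head with another modifier
        c = second_counts[w2] - a   # w2 as modifier with another head
        d = total - a - b - c       # neither w1 as head nor w2 as modifier
        scored.append(((w1, w2), a, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Toy tagged text; lemmas and tags are invented for illustration.
    tagged = [
        ("réseau", "N"), ("local", "ADJ"), ("et", "C"),
        ("temps", "N"), ("de", "P"), ("le", "DET"), ("réponse", "N"),
        ("du", "P"), ("réseau", "N"), ("local", "ADJ"),
    ]
    pairs = list(candidate_pairs(tagged))
    for (w1, w2), freq, score in rank_candidates(pairs):
        print(f"{w1} {w2}\tfreq={freq}\tLLR={score:.2f}")
```

Computing the contingency table only over pattern-matched pairs, rather than over raw word windows, is a design choice of this sketch: it keeps the statistic focused on syntactically plausible candidates. A real system would also need to group term variants (inflection, insertion, coordination, derivation) before counting, which is the structuring step the abstract emphasises.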



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Daille, B. (2003). Terminology Mining. In: Pazienza, M.T. (ed.) Information Extraction in the Web Era. SCIE 2002. Lecture Notes in Computer Science, vol 2700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45092-4_2


  • DOI: https://doi.org/10.1007/978-3-540-45092-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40579-5

  • Online ISBN: 978-3-540-45092-4

  • eBook Packages: Springer Book Archive
