Reducing Information Variation in Text

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2705)

Abstract

We discuss the nature and scope of linguistic (morphological, syntactic, and semantic) variation of terms and its impact on two information retrieval tasks: term acquisition and automatic indexing. We review the natural language processing techniques used in these two areas and present in depth FASTR, a corpus processor for the recognition, normalization, and acquisition of multi-word terms.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Savary, A., Jacquemin, C. (2003). Reducing Information Variation in Text. In: Renals, S., Grefenstette, G. (eds) Text- and Speech-Triggered Information Access. Lecture Notes in Computer Science, vol 2705. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45115-0_6

  • DOI: https://doi.org/10.1007/978-3-540-45115-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40635-8

  • Online ISBN: 978-3-540-45115-0

  • eBook Packages: Springer Book Archive
