Abstract
Recognizing index terms is a common way of answering a query: the documents that contain those terms are identified as fitting the query. However, these terms can vary in form (singular or plural, verbal or nominal form, etc.), which makes them difficult to identify. We present the flexible-equality of terms, an operator that determines whether two terms can be considered variants of one another. The operator is based on the minimum edit distance between two strings and has been extended to complex terms (terms composed of several words). It does not require extensive linguistic resources, only a list of function words. We evaluate the performance of the algorithm on an English corpus by comparing its results with those of FASTR, a system dedicated to the recognition of complex terms that relies on linguistic processing and linguistic resources. On complex terms, we obtain a recall of 76.6% and a precision of 96.3%.
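The abstract does not give the operator's exact definition, so the following is only a minimal sketch of the general idea under stated assumptions: two words are treated as variants when their edit distance, normalized by the length of the longer word, stays under a threshold, and two complex terms are treated as variants when their content words can be paired off in this way. The names `FUNCTION_WORDS`, `words_flex_equal`, `terms_flex_equal`, the threshold value 0.4, and the greedy, order-insensitive pairing are all hypothetical choices made for illustration, not taken from the paper.

```python
from typing import List

# Hypothetical, deliberately tiny list of function words (assumption).
FUNCTION_WORDS = {"of", "the", "a", "an", "in", "on", "for", "to", "and", "by", "with"}


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    turning a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def words_flex_equal(w1: str, w2: str, threshold: float = 0.4) -> bool:
    """Assumed criterion: normalized edit distance at or below the threshold."""
    w1, w2 = w1.lower(), w2.lower()
    longest = max(len(w1), len(w2))
    if longest == 0:
        return True
    return edit_distance(w1, w2) / longest <= threshold


def content_words(term: str) -> List[str]:
    """Lowercase the term, split on whitespace, drop function words."""
    return [w for w in term.lower().split() if w not in FUNCTION_WORDS]


def terms_flex_equal(t1: str, t2: str, threshold: float = 0.4) -> bool:
    """Assumed extension to complex terms: every content word of t1 must be
    paired with a distinct, flexibly equal content word of t2 (greedy pairing,
    word order ignored)."""
    c1, c2 = content_words(t1), content_words(t2)
    if len(c1) != len(c2):
        return False
    remaining = list(c2)
    for w in c1:
        match = next((c for c in remaining if words_flex_equal(w, c, threshold)), None)
        if match is None:
            return False
        remaining.remove(match)
    return True


if __name__ == "__main__":
    print(terms_flex_equal("recognition of complex terms", "complex term recognition"))
    print(terms_flex_equal("index term", "query answering"))
```

A fixed threshold of this kind is crude: it accepts unrelated short words that differ by one letter, and greedy pairing can miss a valid pairing that a full assignment search would find. Both are simplifications of the sketch rather than properties of the operator described above.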
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Enguehard, C. (2001). Flexible-Equality of Terms: Definition and Evaluation. In: Larsen, H.L., Andreasen, T., Christiansen, H., Kacprzyk, J., Zadrożny, S. (eds) Flexible Query Answering Systems. Advances in Soft Computing, vol 7. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1834-5_27
DOI: https://doi.org/10.1007/978-3-7908-1834-5_27
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-1347-0
Online ISBN: 978-3-7908-1834-5
eBook Packages: Springer Book Archive