Spelling Correction of Queries: The Contribution of the Levenshtein and Stoilos Distances

  • Chapter
Systèmes d’information pour l’amélioration de la qualité en santé

Part of the book series: Informatique et Santé ((INFORMATIQUE,volume 1))

Abstract

Background: Medical text repositories not only hold a significant amount of data but also offer a useful scientific test bed for applying natural language processing to information retrieval. To improve the retrieval performance of the Catalogue and Index of Health Resources in French (CISMeF) and its search tool Doc’CISMeF, we tested a new method for correcting misspellings in user queries. Methods: In addition to exact phonetic term matching, we tested two approximate string comparators: the string distance metric of Stoilos and the Levenshtein edit distance. We also evaluated the combination of the two algorithms to examine whether it improves the correction of misspelled queries. Results: At a comparator score threshold of 0.2, the normalized Levenshtein algorithm achieved the highest recall (76%), whereas the highest precision (94%) was achieved by combining the Levenshtein and Stoilos distances. Conclusion: Despite the well-known good performance of the normalized Levenshtein edit distance, we have shown in this paper that combining it with the Stoilos algorithm improves the results of misspelling correction.
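To make the thresholded comparison concrete, here is a minimal Python sketch of a normalized Levenshtein comparator used to propose corrections for a misspelled query term. Several points are assumptions and are not taken from the chapter: the distance is normalized by the length of the longer string (the authors may have used a different normalization), a candidate is accepted when its score is at most the threshold of 0.2, and the function names and the small vocabulary are purely illustrative. The Stoilos metric and the combined comparator are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: edit distance divided by the longer length,
    giving a score in [0, 1] where 0 means identical strings."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def suggest(query: str, vocabulary: list[str], threshold: float = 0.2) -> list[str]:
    """Return vocabulary terms whose normalized distance to the query is
    at most the threshold, closest first (hypothetical helper)."""
    scored = [(normalized_levenshtein(query.lower(), term.lower()), term)
              for term in vocabulary]
    return [term for score, term in sorted(scored) if score <= threshold]


# Illustrative vocabulary, not CISMeF data.
print(suggest("asthmme", ["asthme", "anémie", "arthrose"]))
```

With this normalization, lowering the threshold keeps only very close candidates, which tends to favor precision, while raising it admits more candidates and tends to favor recall.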

Author information

Corresponding author

Correspondence to Stéfan J. Darmoni.

Copyright information

© 2011 Springer-Verlag France

About this chapter

Cite this chapter

Moalla, Z., Soualmia, L.F., Prieur-Gaston, É., Darmoni, S.J. (2011). Correction orthographique de requêtes: L’apport des distances de Levenshtein et Stoilos. In: Staccini, P.M., Harmel, A., Darmoni, S.J., Gouider, R. (eds) Systèmes d’information pour l’amélioration de la qualité en santé. Informatique et Santé, vol 1. Springer, Paris. https://doi.org/10.1007/978-2-8178-0285-5_1

  • DOI: https://doi.org/10.1007/978-2-8178-0285-5_1

  • Publisher Name: Springer, Paris

  • Print ISBN: 978-2-8178-0284-8

  • Online ISBN: 978-2-8178-0285-5
