Using Part-of-Speech and Word-Sense Disambiguation for Boosting String-Edit Distance Spelling Correction

  • Patrick Ruch
  • Robert Baud
  • Antoine Geissbühler
  • Christian Lovis
  • Anne-Marie Rassinoux
  • Alain Rivière
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2101)

Abstract

We report on the design of a system for correcting spelling errors that result in non-existent words. The system aims to improve the editing of medical reports. Unlike traditional systems, it considers both semantic and syntactic contexts. The system is organized in three steps. The first module is based on a context-independent string-to-string edit distance calculation. The second module, based on the morpho-syntactic context, attempts to rank the set of candidates produced by the first module more relevantly. Finally, a third contextual module processes words with the same part of speech by applying contextual word-sense disambiguation. Modules 2 and 3 use both hand-written rules and data-driven Markovian matrices. A final evaluation shows a significant improvement compared to context-free spelling correction.
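The first two steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes unit-cost Levenshtein distance for the context-independent step, and a hypothetical tag-bigram table (`tag_bigram`) standing in for the morpho-syntactic ranking. The function names, the toy lexicon, the tag set, and the probabilities are all invented for illustration, and the third (word-sense) step is not covered.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumption, not the paper's cost model)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca from a
                            curr[j - 1] + 1,       # insert cb into a
                            prev[j - 1] + cost))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def candidates(word: str, lexicon: Dict[str, str], max_dist: int = 2) -> List[Tuple[int, str]]:
    """Step 1 (context-free): lexicon entries within max_dist, closest first."""
    scored = [(edit_distance(word, entry), entry) for entry in lexicon]
    return sorted((d, w) for d, w in scored if d <= max_dist)


def rerank(cands: List[Tuple[int, str]],
           lexicon: Dict[str, str],
           prev_tag: str,
           tag_bigram: Dict[Tuple[str, str], float]) -> List[Tuple[int, str]]:
    """Step 2 (hypothetical): prefer candidates whose tag is likely after prev_tag."""
    def key(item: Tuple[int, str]) -> Tuple[float, int, str]:
        dist, word = item
        prob = tag_bigram.get((prev_tag, lexicon[word]), 0.0)
        return (-prob, dist, word)
    return sorted(cands, key=key)


# Toy data, invented for illustration only.
lexicon = {"dose": "NOUN", "does": "VERB", "disease": "NOUN"}
tag_bigram = {("DET", "NOUN"): 0.6, ("DET", "VERB"): 0.01}

step1 = candidates("dos", lexicon)                # "does" and "dose" tie at distance 1
step2 = rerank(step1, lexicon, "DET", tag_bigram)
print(step1)
print(step2)
```

In this toy case the misspelling "dos" is equally close to "does" and "dose", so edit distance alone cannot choose between them. Once the preceding word is known to be a determiner, the bigram table moves the noun "dose" to the top, which is the kind of gain the abstract attributes to syntactic context.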

Keywords

Edit Distance, Spelling Error, Lexical Ambiguity, Spelling Correction, Syntactic Context
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Patrick Ruch (1)
  • Robert Baud (1)
  • Antoine Geissbühler (1)
  • Christian Lovis (1)
  • Anne-Marie Rassinoux (1)
  • Alain Rivière (1)

  1. Medical Informatics Division, University Hospital of Geneva, Geneva
