Spelling Correction for Search Engine Queries

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3230)

Abstract

Search engines have become the primary means of accessing information on the Web. However, recent studies show that misspelled words are very common in queries to these systems. When users misspell a query, the results are incorrect or inconclusive. In this work, we discuss the integration of a spelling correction component into tumba!, our community Web search engine. We present an algorithm that attempts to select the best choice among all possible corrections for a misspelled term, and discuss its implementation based on a ternary search tree data structure.
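The abstract names the data structure but not the details of the selection algorithm. The following is a minimal sketch, not the tumba! implementation, of how a ternary search tree can enumerate correction candidates and pick one. It assumes plain Levenshtein distance (insertions, deletions and substitutions; transpositions are not counted), ranks candidates by distance and then by a corpus frequency, and uses hypothetical names (`TernarySearchTree`, `near`, `best_correction`) and made-up sample frequencies.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One ternary-search-tree node: a character and three child links."""
    ch: str
    lo: Optional["Node"] = None  # characters smaller than ch at this position
    eq: Optional["Node"] = None  # next character of words sharing this prefix
    hi: Optional["Node"] = None  # characters larger than ch at this position
    freq: int = 0                # > 0 marks the end of a dictionary word


class TernarySearchTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, word: str, freq: int = 1) -> None:
        """Add a non-empty word with a (hypothetical) corpus frequency."""
        if not word:
            raise ValueError("cannot insert an empty word")
        self.root = self._insert(self.root, word, 0, freq)

    def _insert(self, node: Optional[Node], word: str, i: int, freq: int) -> Node:
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, freq)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, freq)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, freq)
        else:
            node.freq += freq
        return node

    def near(self, query: str, max_dist: int) -> List[Tuple[str, int, int]]:
        """Return (word, distance, freq) for every word within max_dist edits."""
        out: List[Tuple[str, int, int]] = []
        # Row 0 of the Levenshtein table: distance from "" to each query prefix.
        self._near(self.root, query, max_dist, list(range(len(query) + 1)), "", out)
        return out

    def _near(self, node, query, max_dist, prev_row, prefix, out) -> None:
        if node is None:
            return
        # lo/hi siblings are alternatives at the same depth, so they reuse prev_row.
        self._near(node.lo, query, max_dist, prev_row, prefix, out)
        self._near(node.hi, query, max_dist, prev_row, prefix, out)
        # Extend the candidate prefix by node.ch and compute the next row.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == node.ch else 1
            row.append(min(row[j - 1] + 1,           # extra character in the query
                           prev_row[j] + 1,          # extra character in the candidate
                           prev_row[j - 1] + cost))  # match or substitution
        word = prefix + node.ch
        if node.freq and row[-1] <= max_dist:
            out.append((word, row[-1], node.freq))
        # Row minima never decrease, so a row entirely above max_dist prunes the subtree.
        if min(row) <= max_dist:
            self._near(node.eq, query, max_dist, row, word, out)


def best_correction(tst: TernarySearchTree, query: str, max_dist: int = 2) -> Optional[str]:
    """Choose the closest candidate; ties go to the more frequent word."""
    candidates = tst.near(query, max_dist)
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[1], -c[2], c[0]))[0]


if __name__ == "__main__":
    tst = TernarySearchTree()
    for w, f in [("lisboa", 50), ("lisbon", 30), ("porto", 40), ("pesquisa", 20)]:
        tst.insert(w, f)
    print(best_correction(tst, "lisboo"))   # lisboa (ties lisbon at 1 edit, more frequent)
    print(best_correction(tst, "pesqisa"))  # pesquisa
```

Walking the `lo` and `hi` branches with the same distance row is what makes the tree useful here: words that share a prefix are scored once along the shared `eq` path, and any branch whose whole row already exceeds `max_dist` is skipped instead of being compared word by word.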

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martins, B., Silva, M.J. (2004). Spelling Correction for Search Engine Queries. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_33

  • DOI: https://doi.org/10.1007/978-3-540-30228-5_33

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23498-2

  • Online ISBN: 978-3-540-30228-5

  • eBook Packages: Springer Book Archive