State-of-the-Art in Weighted Finite-State Spell-Checking

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

The following claims can be made about finite-state methods for spell-checking: 1) Finite-state language models provide support for morphologically complex languages that word lists, affix stripping and similar approaches do not provide; 2) Weighted finite-state models have expressive power equal to other, state-of-the-art string algorithms used by contemporary spell-checkers; and 3) Finite-state models are at least as fast as other string algorithms for lookup and error correction. In this article, we use some contemporary non-finite-state spell-checking methods as a baseline and perform tests in light of the claims, to evaluate state-of-the-art finite-state spell-checking methods. We verify that finite-state spell-checking systems outperform the traditional approaches for English. We also show that the models for morphologically complex languages can be made to perform on par with English systems.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pirinen, T.A., Lindén, K. (2014). State-of-the-Art in Weighted Finite-State Spell-Checking. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_43

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)
