Abstract
The following claims can be made about finite-state methods for spell-checking: 1) Finite-state language models provide support for morphologically complex languages that word lists, affix stripping and similar approaches do not provide; 2) Weighted finite-state models have expressive power equal to other, state-of-the-art string algorithms used by contemporary spell-checkers; and 3) Finite-state models are at least as fast as other string algorithms for lookup and error correction. In this article, we use some contemporary non-finite-state spell-checking methods as a baseline and perform tests in light of the claims, to evaluate state-of-the-art finite-state spell-checking methods. We verify that finite-state spell-checking systems outperform the traditional approaches for English. We also show that the models for morphologically complex languages can be made to perform on par with English systems.
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pirinen, T.A., Lindén, K. (2014). State-of-the-Art in Weighted Finite-State Spell-Checking. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_43
DOI: https://doi.org/10.1007/978-3-642-54903-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8