Real-Word Typo Detection

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5723)

Abstract

Context-sensitive spelling correction (CSSC) is a widely accepted and long-studied formalization of the problem of finding and fixing contextually incorrect words. We argue that CSSC has its limitations as a model, and propose a weakened CSSC model, real-word typo detection (RWTD), to partially counter these limitations. We weaken the CSSC model by removing its word-correction role. Thus, RWTD focuses solely on finding words that require correction. Once this is done, the actual correction is performed by a human or by a CSSC solution.

We propose a preliminary solution for the RWTD model that differs from related CSSC work in several ways. The solution does not rely on a set of confusion lists, and it detects not only a limited set of confusion typos but almost any class of typos. The solution offers a flexible trade-off between the time a human is willing to spend on the task and the quality of the proofreading. It does not require POS tagging and may be applied seamlessly to different languages. Running times in our experiments prove to be acceptable for real-world applications.
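The abstract does not say how words are scored, so the following is only an illustrative sketch of the detection-only idea, not the paper's method. It assumes bigram counts from a reference text as the suspiciousness signal, and the names `bigram_counts`, `rank_suspects`, and `budget` are hypothetical. The sketch uses no confusion lists and no POS tags, and `budget` stands in for the number of words a human is willing to review.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs in a reference text (assumed scoring signal)."""
    return Counter(zip(tokens, tokens[1:]))


def rank_suspects(tokens, counts, budget):
    """Return up to `budget` token positions, most suspicious first.

    A position's score is the mean reference count of the bigrams it forms
    with its available neighbours; lower scores are more suspicious.
    """
    scored = []
    for i, word in enumerate(tokens):
        neighbours = []
        if i > 0:
            neighbours.append(counts[(tokens[i - 1], word)])
        if i + 1 < len(tokens):
            neighbours.append(counts[(word, tokens[i + 1])])
        score = sum(neighbours) / len(neighbours) if neighbours else 0.0
        scored.append((score, i))
    scored.sort()  # ascending score, ties broken by position
    return [i for _, i in scored[:budget]]


reference = "i saw three trees in the park . we saw three birds in the park .".split()
text = "i saw tree trees in the park".split()
counts = bigram_counts(reference)
suspects = rank_suspects(text, counts, budget=1)
print([text[i] for i in suspects])
```

Raising `budget` surfaces more candidate positions at the cost of more human review time, which is one simple way to picture the trade-off the abstract describes. Correction itself is left to the reviewer or to a separate CSSC tool.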

We report real-word typos in the Brown corpus that were exposed by our implementation of the solution. We also discuss experiments in applying the solution to other real-world test texts and demonstrate improved false positive and hit rates.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Asonov, D. (2010). Real-Word Typo Detection. In: Horacek, H., Métais, E., Muñoz, R., Wolska, M. (eds) Natural Language Processing and Information Systems. NLDB 2009. Lecture Notes in Computer Science, vol 5723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12550-8_10

  • DOI: https://doi.org/10.1007/978-3-642-12550-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12549-2

  • Online ISBN: 978-3-642-12550-8

  • eBook Packages: Computer Science (R0)