An Automatic Spell Checker Framework for Malay Language Blogs

  • Surayaini Binti Basri
  • Rayner Alfred
  • Chin Kim On
  • Mohd Norhisham Razali
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 387)

Abstract

A spell checker is a system that detects and corrects misspelled words. Here, a misspelled word is a form of a word in the lexicon that is either spelled incorrectly or written in a shortened form. Misspelled words often degrade the results of Information Retrieval (IR) applications such as document retrieval, because a robust IR application should be able to recognize all the words of a particular language. The current spell checker for the Malay language detects and corrects misspelled words by using a dictionary that pairs each commonly misspelled word with its correct spelling. However, this type of spell checker can only correct misspelled words that already appear in its dictionary; any other misspelled word must be corrected manually by the user. This approach works well when the spell checker is a standalone system, but it is not effective when the spell checker is part of another IR application, such as document retrieval for weblogs. New misspelled words are continually created as the number of weblog pages increases, so the number of misspelled words also grows rapidly. In this paper, we propose a new spell checker that detects and automatically corrects misspelled words in Malay without any interaction from the user. The proposed approach is evaluated using texts selected randomly from popular Malay blogs. Based on the experimental results obtained, the proposed approach is found to be effective in detecting and correcting misspelled Malay words automatically.
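
The limitation of the pair-dictionary approach described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PAIR_DICTIONARY`, `LEXICON`, and `correct_with_pairs` are hypothetical, and the few Malay entries are illustrative assumptions rather than data from the paper.

```python
# Hypothetical pair dictionary: commonly misspelled or shortened form -> correct word.
# The entries are illustrative assumptions, not taken from the paper.
PAIR_DICTIONARY = {"yg": "yang", "sbb": "sebab", "dgn": "dengan"}

# Hypothetical lexicon of correctly spelled words.
LEXICON = {"yang", "sebab", "dengan", "saya", "makan"}


def correct_with_pairs(tokens):
    """Correct tokens using only the pair dictionary.

    Returns the corrected token list and the list of tokens that are
    neither in the lexicon nor in the pair dictionary. The second list
    is what would have to be handed to a user for manual correction.
    """
    corrected = []
    unresolved = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            corrected.append(word)                    # already correct
        elif word in PAIR_DICTIONARY:
            corrected.append(PAIR_DICTIONARY[word])   # known misspelling
        else:
            corrected.append(word)                    # unknown: left unchanged
            unresolved.append(word)
    return corrected, unresolved


corrected, unresolved = correct_with_pairs(["saya", "mkn", "dgn"])
print(corrected)
print(unresolved)
```

In this sketch the shortened form "dgn" is corrected because it has an entry, while "mkn" has none and is returned as unresolved. That is the case that grows as new weblog pages introduce new misspellings.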

Keywords

Malay weblog; Information retrieval; Misspelled word; Spell checker

Notes

Acknowledgments

This work has been partly supported by the LRGS and RAGS projects funded by the Ministry of Higher Education (MoHE), Malaysia under Grants No. LRGS/TD/2011/UiTM/ICT/04 and RAG0008-TK-2012.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Surayaini Binti Basri (1)
  • Rayner Alfred (1), corresponding author
  • Chin Kim On (1)
  • Mohd Norhisham Razali (1)
  1. School of Engineering and Information Technology, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia