Design Consideration of Malay Text Stemmer Using Structured Approach

  • Mohamad Nizam Kassim (corresponding author)
  • Shaiful Hisham Mat Jali
  • Mohd Aizaini Maarof
  • Anazida Zainal
  • Amirudin Abdul Wahab
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 165)

Abstract

A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to a common base form. It is commonly used as a preprocessing tool in text classification, text mining, and information retrieval tasks. The design of an effective text stemmer is therefore crucial for ensuring that the stemming process maps morphological variants to the correct base forms. This paper investigates the design considerations for an effective text stemmer from the perspective of the Malay language. These design considerations are based on the challenges that previous researchers faced when stemming Malay texts. A text stemmer that adopts these considerations is expected to address common stemming errors and to produce promising stemming accuracy.
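To make the idea of mapping morphological variants to a common base form concrete, the following is a minimal sketch of dictionary-checked affix removal in Python. It is not the stemmer proposed in this paper: the affix lists, the root-word dictionary, and the function name `stem` are all hypothetical and chosen only for illustration, and the sketch ignores prefix spelling variations (such as those of meN-), reduplication, and compounding.

```python
# Illustrative sketch only. The affix lists and root-word dictionary below
# are small hypothetical samples, not the rules or data used by the authors.
PREFIXES = ("di", "ber", "ter", "per", "ke", "se")
SUFFIXES = ("kan", "an", "i")
ROOT_WORDS = {"makan", "main", "minum", "ajar"}


def stem(word: str) -> str:
    """Return the root word if some prefix/suffix removal yields a known root.

    Words that are already roots, or that cannot be reduced to a known root,
    are returned unchanged (lowercased).
    """
    word = word.lower()
    # Checking the dictionary first avoids overstemming, e.g. "makan" -> "mak".
    if word in ROOT_WORDS:
        return word
    # Try every combination of (optional prefix, optional suffix).
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)]
            # Accept a candidate only if the dictionary confirms it is a root.
            if candidate in ROOT_WORDS:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["makanan", "dimakan", "bermain", "permainan", "makan"]:
        print(w, "->", stem(w))
```

In this sketch, the variants "makanan" and "dimakan" both map to "makan", and "bermain" and "permainan" both map to "main". The dictionary lookup is one simple way to guard against removing letters that only look like affixes; a word that is not covered by the sample dictionary is left as it is.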

Keywords

Word stemming; Text stemming; Word stemming algorithm; Word stemmer; Text stemmer; Affixes removal method

Notes

Acknowledgements

The authors would like to thank the Editor in Chief and the anonymous reviewers of the manuscript for their valuable comments and suggestions. This research was funded by Universiti Teknologi Malaysia’s Research University Grant (VUP) PY/2017/01736.

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Mohamad Nizam Kassim (1), corresponding author
  • Shaiful Hisham Mat Jali (2)
  • Mohd Aizaini Maarof (2)
  • Anazida Zainal (2)
  • Amirudin Abdul Wahab (1)

  1. CyberSecurity Malaysia, Seri Kembangan, Malaysia
  2. Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Malaysia