Abstract
Stemming is a natural language processing technique that reduces a word to its root. Improving the stemming process can in turn improve information retrieval and knowledge management. Four stemming strategies are widely used: table lookup, rule-based affix elimination, successor variety and n-gram. However, not all of these strategies have been applied in Malay stemming algorithms. The best-known strategy for stemming Malay text documents is rule-based affix elimination. This paper discusses several Malay stemming algorithms, namely Othman’s algorithm, Sembok’s algorithm, Idris’s algorithm, the Rule Frequency Order Stemmer and Mangalam’s algorithm. It also discusses the improvements that researchers have made to earlier Malay stemming algorithms, which indicates the current trend in Malay stemming. Different Malay stemming algorithms also apply different morphological rules. The review concludes that much of the existing work concerns the arrangement of the morphological rules. However, the stemming process can still be improved by applying background knowledge, such as root-word dictionaries against which a word can be checked while its affixes are being eliminated.
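To make the closing suggestion concrete, the following is a minimal Python sketch of rule-based affix elimination guarded by a root-word dictionary. It is not a reproduction of any of the algorithms reviewed in the paper. The function name `stem`, the affix lists and the six-word dictionary are all hypothetical and chosen only for illustration; a real stemmer would need a far larger dictionary and rules for spelling changes at affix boundaries.

```python
# Illustrative only: a tiny, hypothetical subset of Malay affixes and roots.
PREFIXES = ["mem", "men", "me", "ber", "di", "ter", "pe"]
SUFFIXES = ["kan", "an", "i", "nya"]
ROOT_WORDS = {"makan", "baca", "main", "ajar", "minum", "jalan"}


def stem(word, roots=ROOT_WORDS):
    """Strip affixes from `word`, accepting a result only if it is a known root."""
    word = word.lower()

    # Dictionary check first: a word that is already a root is never stripped.
    if word in roots:
        return word

    candidates = []

    # Rule 1: remove a prefix only.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            candidates.append(word[len(prefix):])

    # Rule 2: remove a suffix only.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            candidates.append(word[:-len(suffix)])

    # Rule 3: remove a prefix and a suffix together.
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if (word.startswith(prefix) and word.endswith(suffix)
                    and len(word) > len(prefix) + len(suffix)):
                candidates.append(word[len(prefix):-len(suffix)])

    # Accept the first candidate that the dictionary confirms as a root.
    for candidate in candidates:
        if candidate in roots:
            return candidate

    # No confirmed root: leave the word unchanged rather than guess.
    return word


for w in ["makanan", "membaca", "diajarkan", "makan", "komputer"]:
    print(w, "->", stem(w))
```

In this sketch the dictionary check is what stops over-stemming: "makan" ends in the suffix "an", but because it is already listed as a root it is returned untouched instead of being cut to "mak".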
Acknowledgments
This work has been supported by the Long Term Research Grant Scheme (LRGS) project funded by the Ministry of Higher Education (MoHE), Malaysia, under grant No. LRGS/TD/2011/UiTM/ICT/04.
Copyright information
© 2014 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Alfred, R., Leong, L.C., On, C.K., Anthony, P. (2014). A Literature Review and Discussion of Malay Rule - Based Affix Elimination Algorithms. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_23
DOI: https://doi.org/10.1007/978-94-007-7287-8_23
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-7286-1
Online ISBN: 978-94-007-7287-8
eBook Packages: Computer Science, Computer Science (R0)