A Literature Review and Discussion of Malay Rule - Based Affix Elimination Algorithms

  • Conference paper

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

Stemming is a natural language processing technique that reduces a word to its root. Improving the stemming process can in turn improve information retrieval and knowledge management. Four strategies are widely used in stemming: table lookup, rule-based affix elimination, successor variety and n-gram. However, not all of these strategies have been applied in Malay stemming algorithms. The best-known strategy for stemming Malay text documents is rule-based affix elimination. This paper discusses several Malay stemming algorithms, namely Othman’s algorithm, Sembok’s algorithm, Idris’s algorithm, the Rule Frequency Order Stemmer and Mangalam’s algorithm. It also discusses some of the improvements that researchers have made to earlier Malay stemming algorithms, which indicates the current trend in Malay stemming. Different morphological rules are applied in different Malay stemming algorithms. From this review, it can be concluded that much work has been done on the arrangement of the morphological rules. However, the stemming process can still be improved by applying background knowledge, such as a root-word dictionary against which a word can be checked while its affixes are being eliminated.
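To make the closing suggestion of the abstract concrete, the following is a minimal Python sketch of rule-based affix elimination combined with a root-word dictionary check. It is not any of the reviewed algorithms: the function name `stem`, the short prefix and suffix lists, and the tiny `ROOTS` dictionary are all assumptions chosen for illustration, and real Malay stemmers use far larger rule sets and handle spelling changes at affix boundaries.

```python
# Illustrative affix lists and root dictionary (assumptions, not from the paper).
PREFIXES = ["meng", "mem", "men", "ber", "ter", "me", "di", "ke", "se"]
SUFFIXES = ["kan", "an", "i"]
ROOTS = {"makan", "main", "baca", "lari", "ajar"}


def stem(word, roots=ROOTS):
    """Return the dictionary root of `word`, or `word` unchanged if none is found."""
    word = word.lower()
    if word in roots:
        return word  # already a root: strip nothing

    candidates = []

    # Rule group 1: remove a prefix only.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            candidates.append(word[len(prefix):])

    # Rule group 2: remove a suffix only.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.append(word[:-len(suffix)])

    # Rule group 3: remove a prefix and a suffix together.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            for suffix in SUFFIXES:
                if rest.endswith(suffix) and len(rest) > len(suffix):
                    candidates.append(rest[:-len(suffix)])

    # Dictionary check: accept the first candidate that is a known root.
    for candidate in candidates:
        if candidate in roots:
            return candidate
    return word  # no rule produced a known root


if __name__ == "__main__":
    for w in ["makanan", "bermain", "membaca", "berlari", "mengajarkan"]:
        print(w, "->", stem(w))
```

The dictionary check is what keeps a word such as "berlari" from being over-stemmed: stripping the suffix "i" alone would give "berlar", which is rejected because it is not a known root, while stripping the prefix "ber" gives the valid root "lari". The order in which the rule groups are tried is itself a design choice in this sketch.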


Acknowledgments

This work was supported by the Long Term Research Grant Scheme (LRGS) project funded by the Ministry of Higher Education (MoHE), Malaysia, under grant No. LRGS/TD/2011/UiTM/ICT/04.

Corresponding author

Correspondence to Rayner Alfred.

Copyright information

© 2014 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Alfred, R., Leong, L.C., On, C.K., Anthony, P. (2014). A Literature Review and Discussion of Malay Rule - Based Affix Elimination Algorithms. In: Uden, L., Wang, L., Corchado Rodríguez, J., Yang, HC., Ting, IH. (eds) The 8th International Conference on Knowledge Management in Organizations. Springer Proceedings in Complexity. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-7287-8_23

  • DOI: https://doi.org/10.1007/978-94-007-7287-8_23

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-7286-1

  • Online ISBN: 978-94-007-7287-8

  • eBook Packages: Computer Science, Computer Science (R0)
