A Rule-Based Named-Entity Recognition for Malay Articles

  • Rayner Alfred
  • Leow Ching Leong
  • Chin Kim On
  • Patricia Anthony
  • Tan Soo Fun
  • Mohd Norhisham Bin Razali
  • Mohd Hanafi Ahmad Hijazi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)


Named-Entity Recognition (NER) is a step in text mining used for information extraction. An NER tool helps users identify entities such as persons, locations, and organizations. Different languages have different morphologies and thus require different NER processes; for instance, an English NER process cannot be applied to Malay articles because of these morphological differences. This paper proposes a rule-based NER algorithm for Malay articles. The proposed Malay NER is designed around Malay part-of-speech (POS) tagging features and contextual features that were implemented to handle Malay articles. Based on the POS results, proper names are identified as candidates for annotation. Certain symbols and conjunctions are also considered when identifying named entities in Malay articles. Several manually constructed dictionaries are used to handle three named-entity types: Person, Location, and Organization. The experimental results show an F-measure of 89.47%. The proposed Malay NER algorithm could be further improved with more complete dictionaries and more refined rules for identifying the correct Malay entities.
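The pipeline described in the abstract (POS tagging, proper-name candidates, then dictionary and context lookups) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The tag name `NNP`, the tiny dictionaries, the rule order, and the function names `find_candidates`, `classify`, and `annotate` are all assumptions made for illustration, and the input is assumed to be already POS-tagged.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (token, POS tag)

# Hypothetical, tiny dictionaries for illustration only; the paper's
# manually constructed dictionaries are not reproduced here.
PERSON_TITLES = {"Encik", "Puan", "Datuk"}
ORG_KEYWORDS = {"Universiti", "Syarikat", "Bank", "Kementerian"}
LOCATION_NAMES = {"Kota Kinabalu", "Sabah", "Kuala Lumpur", "Malaysia"}
LOCATION_PREPOSITIONS = {"di", "ke", "dari"}  # contextual cue words

PROPER_NOUN_TAG = "NNP"  # assumed tag name for proper nouns


def find_candidates(tagged: List[TaggedToken]) -> List[Tuple[int, int]]:
    """Group consecutive proper-noun tokens into (start, end) spans."""
    spans = []
    start = None
    for i, (_, tag) in enumerate(tagged):
        if tag == PROPER_NOUN_TAG:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tagged)))
    return spans


def classify(tagged: List[TaggedToken], start: int, end: int) -> str:
    """Label one candidate span using dictionary and context rules."""
    words = [tok for tok, _ in tagged[start:end]]
    text = " ".join(words)
    prev = tagged[start - 1][0] if start > 0 else ""
    if words[0] in PERSON_TITLES:
        return "PERSON"
    if any(w in ORG_KEYWORDS for w in words):
        return "ORGANIZATION"
    if text in LOCATION_NAMES or prev.lower() in LOCATION_PREPOSITIONS:
        return "LOCATION"
    return "UNKNOWN"


def annotate(tagged: List[TaggedToken]) -> List[Tuple[str, str]]:
    """Return (entity text, label) for every proper-name candidate."""
    return [
        (" ".join(tok for tok, _ in tagged[s:e]), classify(tagged, s, e))
        for s, e in find_candidates(tagged)
    ]


# "Encik Ahmad bekerja di Universiti Malaysia Sabah di Kota Kinabalu."
tagged = [
    ("Encik", "NNP"), ("Ahmad", "NNP"), ("bekerja", "VB"), ("di", "IN"),
    ("Universiti", "NNP"), ("Malaysia", "NNP"), ("Sabah", "NNP"),
    ("di", "IN"), ("Kota", "NNP"), ("Kinabalu", "NNP"), (".", "."),
]
print(annotate(tagged))
```

In this sketch the organization rule is checked before the location rule, so a name such as "Universiti Malaysia Sabah" is not mislabelled as a location merely because it follows the preposition "di". That ordering is a design choice of the example and is not taken from the paper.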


Named Entity Recognition, Malay Named Entity Recognition, Rule-based, Information Extraction





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Rayner Alfred (1)
  • Leow Ching Leong (1)
  • Chin Kim On (1)
  • Patricia Anthony (2)
  • Tan Soo Fun (1)
  • Mohd Norhisham Bin Razali (1)
  • Mohd Hanafi Ahmad Hijazi (1)

  1. Center of Excellence in Semantic Agents, School of Engineering and Information Technology, Universiti Malaysia Sabah, Jalan UMS, Kota Kinabalu, Malaysia
  2. Department of Applied Computing, Faculty of Environment, Society and Design, Lincoln University, Christchurch, New Zealand
