Abstract
Named-Entity Recognition (NER) is a text-mining task used for information extraction. An NER tool helps users identify and detect entities such as persons, locations, and organizations. Languages differ in morphology and therefore require different NER processes; for instance, an English NER process cannot be applied directly to Malay articles. This paper proposes a rule-based NER algorithm for Malay articles. The proposed Malay NER is designed around Malay part-of-speech (POS) tagging features and contextual features that have been implemented to handle Malay articles. Based on the POS results, proper names are identified as candidates for annotation. In addition, certain symbols and conjunctions are considered when identifying named entities in Malay articles. Several manually constructed dictionaries are used to handle three named-entity types: Person, Location, and Organization. The experimental results show an F-measure of 89.47%. The proposed Malay NER algorithm can be further improved with more complete dictionaries and refined rules for identifying the correct Malay entities.
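The pipeline outlined in the abstract (POS tagging, proper-name candidates, dictionary lookup, F-measure evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The gazetteer entries, the connector set, the capitalisation-based stand-in for the POS tagger, and all function names are hypothetical, and the F-measure shown is the standard balanced form, which the abstract does not confirm as the exact variant used.

```python
# Hypothetical, tiny dictionaries; the paper's manually built dictionaries
# are not reproduced here.
PERSON_TITLES = {"Encik", "Puan", "Datuk"}
LOCATIONS = {"Kota Kinabalu", "Kuala Lumpur", "Sabah"}
ORG_KEYWORDS = {"Universiti", "Syarikat", "Kementerian"}
# Assumed examples of tokens that may join the parts of one name.
CONNECTORS = {"bin", "binti", "&"}


def tag_pos(tokens):
    """Stand-in for a Malay POS tagger: capitalised tokens become 'NNP'."""
    return [(tok, "NNP" if tok[:1].isupper() else "O") for tok in tokens]


def candidates(tagged):
    """Group consecutive proper nouns (plus inner connectors) into spans."""
    spans, current = [], []
    for i, (tok, tag) in enumerate(tagged):
        next_is_nnp = i + 1 < len(tagged) and tagged[i + 1][1] == "NNP"
        if tag == "NNP":
            current.append(tok)
        elif tok in CONNECTORS and current and next_is_nnp:
            current.append(tok)  # e.g. "Ahmad bin Ali" stays one span
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


def classify(span):
    """Assign a label from the dictionaries, or None if nothing matches."""
    if span[0] in PERSON_TITLES:
        return "PERSON"
    if any(tok in ORG_KEYWORDS for tok in span):
        return "ORGANIZATION"
    if " ".join(span) in LOCATIONS:
        return "LOCATION"
    return None


def recognise(sentence):
    """Return (entity text, label) pairs for a whitespace-tokenised sentence."""
    labelled = []
    for span in candidates(tag_pos(sentence.split())):
        label = classify(span)
        if label is not None:
            labelled.append((" ".join(span), label))
    return labelled


def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Calling `recognise("Encik Ahmad bin Ali bekerja di Universiti Malaysia Sabah di Kota Kinabalu .")` labels one span of each type. A real system would replace `tag_pos` with a proper Malay POS tagger, since capitalisation alone also flags ordinary sentence-initial words.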
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alfred, R. et al. (2013). A Rule-Based Named-Entity Recognition for Malay Articles. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_25
DOI: https://doi.org/10.1007/978-3-642-53914-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)