A Rule-Based Named-Entity Recognition for Malay Articles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Abstract

Named-Entity Recognition (NER) is part of the text-mining process used for information extraction. An NER tool can help users identify and detect entities such as persons, locations or organizations. Different languages have different morphologies and therefore require different NER processes. For instance, an English NER process cannot be applied to Malay articles because the two languages differ morphologically. This paper proposes a rule-based named-entity recognition algorithm for Malay articles. The proposed Malay NER is built on Malay part-of-speech (POS) tagging features and on contextual features implemented to handle Malay articles. Based on the POS results, proper names are identified as possible candidates for annotation. In addition, certain symbols and conjunctions are considered when identifying named entities in Malay articles. Several manually constructed dictionaries are used to handle three named-entity types: Person, Location and Organization. The experimental results show a reasonable F-measure of 89.47%. The proposed Malay NER algorithm could be further improved with more complete dictionaries and refined rules for identifying Malay entities correctly.
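
The abstract outlines a pipeline: POS tags select proper-name candidates, then dictionaries and contextual cues (titles, prepositions, conjunctions) assign a Person, Location or Organization label, and the output is scored with an F-measure. The Python sketch that follows illustrates that general idea under stated assumptions; it is not the authors' algorithm. The tag name `NNP`, the tiny dictionaries, the rule order and the "X dan Y" label-sharing rule are all hypothetical choices made for illustration, and the scorer uses exact (text, label) matching with F = (1 + β²)PR / (β²P + R), which may differ from the paper's evaluation procedure.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]   # (word, POS tag) from some upstream POS tagger
Entity = Tuple[str, str]  # (entity text, label)

# Hypothetical toy dictionaries; a real system would use far larger,
# manually constructed lists.
PERSON_TITLES = {"encik", "puan", "datuk", "dr"}
PERSON_NAMES = {"Ahmad", "Siti Aminah"}
ORG_KEYWORDS = {"universiti", "syarikat", "kementerian", "bank"}
LOCATIONS = {"Sabah", "Sarawak", "Kota Kinabalu", "Kuala Lumpur"}
LOC_PREPOSITIONS = {"di", "ke", "dari"}  # "at/in", "to", "from"
CONJUNCTIONS = {"dan", "serta"}          # "and", "as well as"

PROPER_NOUN_TAG = "NNP"  # assumed tag name; real Malay tag sets may differ


def candidate_spans(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Return [start, end) spans of consecutive proper-noun tokens."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (_, tag) in enumerate(tokens):
        if tag == PROPER_NOUN_TAG:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


def classify(tokens: List[Token], start: int, end: int,
             neighbour: Optional[str]) -> Optional[str]:
    """Apply ordered rules to one candidate span; None means 'not an entity'."""
    text = " ".join(word for word, _ in tokens[start:end])
    first = tokens[start][0].lower()
    before = tokens[start - 1][0].lower() if start > 0 else ""
    # Organization keywords are checked first so that "dari Universiti ..."
    # is not labelled LOCATION by the preposition rule further down.
    if first in ORG_KEYWORDS:
        return "ORGANIZATION"
    if before in PERSON_TITLES or first in PERSON_TITLES or text in PERSON_NAMES:
        return "PERSON"
    if text in LOCATIONS or before in LOC_PREPOSITIONS:
        return "LOCATION"
    # "X dan Y": an unknown Y takes the label of the entity X just before it.
    if before in CONJUNCTIONS and neighbour is not None:
        return neighbour
    return None


def recognize(tokens: List[Token]) -> List[Entity]:
    """Label every candidate span that some rule accepts."""
    entities: List[Entity] = []
    prev_end, prev_label = -1, None
    for start, end in candidate_spans(tokens):
        # Pass the previous label along only if exactly one token
        # (the potential conjunction) separates the two spans.
        neighbour = prev_label if prev_end == start - 1 else None
        label = classify(tokens, start, end, neighbour)
        if label is not None:
            entities.append((" ".join(w for w, _ in tokens[start:end]), label))
        prev_end, prev_label = end, label
    return entities


def f_measure(predicted: List[Entity], gold: List[Entity],
              beta: float = 1.0) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-measure over (text, label) pairs."""
    pred, ref = set(predicted), set(gold)
    correct = len(pred & ref)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    # "Mr Ahmad from Universiti Malaysia Sabah visited Kota Kinabalu and Labuan."
    # The POS tags here are hand-assigned for illustration.
    sentence = [("Encik", "NN"), ("Ahmad", "NNP"), ("dari", "IN"),
                ("Universiti", "NNP"), ("Malaysia", "NNP"), ("Sabah", "NNP"),
                ("melawat", "VB"), ("Kota", "NNP"), ("Kinabalu", "NNP"),
                ("dan", "CC"), ("Labuan", "NNP"), (".", "PUNC")]
    found = recognize(sentence)
    print(found)
    # [('Ahmad', 'PERSON'), ('Universiti Malaysia Sabah', 'ORGANIZATION'),
    #  ('Kota Kinabalu', 'LOCATION'), ('Labuan', 'LOCATION')]
    print(f_measure(found, found))  # (1.0, 1.0, 1.0)
```

In this sketch, "Labuan" is absent from the toy location list and is labelled only because it is coordinated with "Kota Kinabalu" through "dan", which shows one way conjunctions can act as contextual evidence. Rule order is a design choice: placing the organization rule first trades some location recall for fewer false locations on institution names.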

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alfred, R. et al. (2013). A Rule-Based Named-Entity Recognition for Malay Articles. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_25

  • DOI: https://doi.org/10.1007/978-3-642-53914-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53913-8

  • Online ISBN: 978-3-642-53914-5

  • eBook Packages: Computer Science, Computer Science (R0)