A Novel Hybrid Approach to Arabic Named Entity Recognition

  • Mohamed A. Meselhi
  • Hitham M. Abo Bakr
  • Ibrahim Ziedan
  • Khaled Shaalan
Part of the Communications in Computer and Information Science book series (CCIS, volume 493)

Abstract

Named Entity Recognition (NER) is an essential preprocessing task for many Natural Language Processing (NLP) applications, such as text summarization, document categorization, and information retrieval. NER systems typically follow either a rule-based approach or a machine learning approach. In this paper, we introduce a novel NER system for Arabic that uses a hybrid approach, combining rule-based and machine learning components to improve the performance of Arabic NER. The system recognizes three types of named entities: Person, Location, and Organization. Experimental results on the ANERcorp dataset show that our hybrid approach outperforms both the rule-based approach and the machine learning approach used separately. It also outperforms state-of-the-art hybrid Arabic NER systems.
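
The abstract does not say how the rule-based and machine learning components are combined. As an illustration only, one common hybrid design passes the rule-based decision to the learner as an extra feature. The sketch below assumes that design; the gazetteers, the English stand-in tokens, and the function names `rule_based_tags` and `hybrid_features` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical toy lexicons (English stand-ins for Arabic entries); not from the paper.
PERSON_GAZETTEER = {"Khaled", "Mohamed"}
LOCATION_GAZETTEER = {"Cairo", "Dubai"}
ORG_TRIGGERS = {"University", "Company"}


def rule_based_tags(tokens: List[str]) -> List[str]:
    """Assign a coarse entity tag to each token by simple lexicon lookup."""
    tags = []
    for tok in tokens:
        if tok in PERSON_GAZETTEER:
            tags.append("PER")
        elif tok in LOCATION_GAZETTEER:
            tags.append("LOC")
        elif tok in ORG_TRIGGERS:
            tags.append("ORG")
        else:
            tags.append("O")
    return tags


def hybrid_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Build per-token feature dicts that include the rule-based tag.

    A statistical sequence labeller could then be trained on these dicts,
    so the learner sees the rule-based decision as one more signal.
    """
    rule_tags = rule_based_tags(tokens)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "word": tok,
            "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
            "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
            "rule_tag": rule_tags[i],  # output of the rule-based stage
        })
    return features


tokens = ["Khaled", "visited", "Cairo", "University"]
features = hybrid_features(tokens)
```

In this arrangement the learner can override a rule when the surrounding context disagrees with it, which is one plausible reason a hybrid could outperform either component alone.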

Keywords

Support Vector Machine; Hybrid Approach; Natural Language Processing; Conditional Random Field; Machine Learning Approach
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Mohamed A. Meselhi (1)
  • Hitham M. Abo Bakr (1)
  • Ibrahim Ziedan (1)
  • Khaled Shaalan (2)
  1. Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Egypt
  2. The British University, Dubai, UAE