A Hybrid Approach for Extracting Arabic Persons’ Names and Resolving Their Ambiguity from Twitter

  • Omnia H. Zayed
  • Samhaa R. El-Beltagy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9103)


Tweets offer a novel way of communication that enables users all over the world to share real-time news and ideas. The massive number of tweets generated regularly by Arabic speakers has resulted in a growing interest in building Arabic named entity recognition (NER) systems that deal with informal colloquial Arabic. The unique characteristics of the Arabic language make Arabic NER a challenging task, which the informal nature of tweets further complicates. The majority of previous work addressing Arabic NER was concerned with formal Modern Standard Arabic (MSA). Moreover, taggers and parsers were often used to resolve the ambiguity of Arabic persons’ names. Although previously developed approaches perform well on MSA text, they are not suited to colloquial Arabic. This paper introduces a hybrid approach to extracting Arabic persons’ names from tweets, together with a way to resolve their ambiguity using context bigram patterns. The introduced approach attempts to avoid any language-dependent resources. Evaluation of the presented approach shows a 7% improvement in F-score over the best reported result in the literature.


Keywords: Training Dataset; Conditional Random Field; Name Entity Recognition; Baseline System; Arabic Language



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Center of Informatics Science, Nile University, Giza, Egypt
