CasANER: Arabic Named Entity Recognition Tool

Intelligent Natural Language Processing: Trends and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

Named Entity Recognition (NER) is currently an active research line. It involves processing unstructured or semi-structured textual resources to identify the relevant named entities (NEs) and classify them into predefined categories. In general, the NER task relies on a classification process that refers to previously established categorizations. In this context, we propose CasANER, a system that recognizes and annotates Arabic named entities (ANEs). The design of CasANER is based on a detailed categorization derived from a representative Arabic Wikipedia corpus. The system is composed of two kinds of transducer cascades: an analysis cascade and a synthesis cascade. The analysis cascade, which is dedicated to ANE recognition, includes the analysis, filtering and generic transducers. The synthesis cascade transforms the annotation of the recognized ANEs into an annotation that respects the TEI recommendation, in order to provide a structured output. CasANER is implemented with the linguistic platform Unitex. Its evaluation, based on evaluation measures, shows that the system's outcomes are satisfactory. We also compare CasANER with a statistical system that recognizes ANEs. The comparison shows that the results obtained by our system are as good as those of the statistical system in recognizing and annotating person names and organization names.
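
To make the two-cascade idea concrete, the following is a minimal sketch in Python and not the CasANER implementation, which is built from Unitex transducers. It assumes toy English trigger-word rules, a brace notation for the intermediate annotation, and a mapping to the TEI elements persName and orgName. The names ANALYSIS_RULES, TEI_TAGS, analysis_cascade, synthesis_cascade and run_cascades are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy rules: each pass recognizes one kind of entity.
# The named group "ne" marks the entity itself, outside any trigger word.
ANALYSIS_RULES = [
    (re.compile(r"\b(?:President|Dr\.) (?P<ne>[A-Z][a-z]+)"), "person"),
    (re.compile(r"\b(?P<ne>University of [A-Z][a-z]+)"), "organization"),
]

# Assumed mapping from the internal tags to TEI element names.
TEI_TAGS = {"person": "persName", "organization": "orgName"}

# Intermediate annotation used in this sketch: {surface form,.tag}
INTERNAL = re.compile(r"\{([^{},]+),\.(\w+)\}")


def _annotate(match, tag):
    """Wrap only the 'ne' group of a match in the intermediate annotation."""
    whole, start = match.group(0), match.start()
    s, e = match.span("ne")
    return whole[:s - start] + "{" + match.group("ne") + ",." + tag + "}" + whole[e - start:]


def analysis_cascade(text):
    """Apply the rules in order; each pass reads the previous pass's output."""
    for pattern, tag in ANALYSIS_RULES:
        text = pattern.sub(lambda m, t=tag: _annotate(m, t), text)
    return text


def synthesis_cascade(text):
    """Rewrite every intermediate annotation as a TEI-style element."""
    def to_tei(m):
        element = TEI_TAGS[m.group(2)]
        return "<%s>%s</%s>" % (element, m.group(1), element)
    return INTERNAL.sub(to_tei, text)


def run_cascades(text):
    """Analysis first, then synthesis, mirroring the two-stage design."""
    return synthesis_cascade(analysis_cascade(text))


if __name__ == "__main__":
    print(run_cascades("President Ahmed visited the University of Sfax."))
```

Keeping recognition and output formatting in separate passes means that the target annotation scheme can change without touching the recognition rules, which is the motivation the abstract gives for a distinct synthesis cascade.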

Author information

Corresponding author

Correspondence to Fatma Ben Mesmia.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Mesmia, F.B., Haddar, K., Friburger, N., Maurel, D. (2018). CasANER: Arabic Named Entity Recognition Tool. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_10

  • DOI: https://doi.org/10.1007/978-3-319-67056-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67055-3

  • Online ISBN: 978-3-319-67056-0

  • eBook Packages: Engineering, Engineering (R0)
