Abstract
Named Entity Recognition (NER) is currently an active line of research. It involves processing unstructured or semi-structured textual resources to identify the relevant named entities (NEs) and classify them into predefined categories. In general, NER relies on a classification step that refers to a previously established categorization. In this context, we propose CasANER, a system that recognizes and annotates Arabic named entities (ANEs). CasANER builds on a fine-grained categorization derived from a representative Arabic Wikipedia corpus. The system is composed of two kinds of transducer cascades: an analysis cascade and a synthesis cascade. The analysis cascade, which is dedicated to ANE recognition, includes analysis, filtering and generic transducers. The synthesis cascade then transforms the annotation of the recognized ANEs into an annotation that follows the TEI recommendation, in order to provide a structured output. CasANER is implemented with the linguistic platform Unitex. Its evaluation, based on standard measure values, shows that the system's results are satisfactory. We also compare CasANER with a statistical system that recognizes ANEs. The comparison shows that the results obtained by our system are as good as those of the statistical system for the recognition and annotation of person names and organization names.
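The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the CasANER implementation, which relies on Unitex graphs and dictionaries. The rules, the trigger words, the internal tags `PERS` and `ORG`, and the brace notation are all assumptions made for illustration, and the sample sentence is in English for readability. Only the TEI element names `persName` and `orgName` come from the TEI guidelines.

```python
import re

# Hypothetical toy rules: each one plays the role of one transducer
# in the analysis cascade and is applied to the output of the previous one.
ANALYSIS_RULES = [
    (re.compile(r"\b(?:Dr|Mr)\. [A-Z][a-z]+(?: [A-Z][a-z]+)*"), "PERS"),
    (re.compile(r"\b(?:University|Bank) of [A-Z][a-z]+"), "ORG"),
]

# Assumed mapping from the internal tags to TEI element names.
TEI_TAGS = {"PERS": "persName", "ORG": "orgName"}

# Internal annotation of the form {text,.TAG} (an assumed notation).
ANNOTATION = re.compile(r"\{([^{}]*),\.([A-Z]+)\}")


def analysis_cascade(text):
    """Apply each rule in order, wrapping every match as {match,.TAG}."""
    for pattern, tag in ANALYSIS_RULES:
        text = pattern.sub(
            lambda m, t=tag: "{" + m.group(0) + ",." + t + "}", text
        )
    return text


def synthesis_cascade(annotated):
    """Rewrite each {text,.TAG} annotation as a TEI element."""
    def to_tei(match):
        element = TEI_TAGS[match.group(2)]
        return f"<{element}>{match.group(1)}</{element}>"
    return ANNOTATION.sub(to_tei, annotated)


sentence = "Dr. Sami Trabelsi joined the University of Tunis."
annotated = analysis_cascade(sentence)
tei_output = synthesis_cascade(annotated)
print(annotated)
print(tei_output)
```

Keeping recognition and output formatting in separate passes means the target markup can be changed by editing only the synthesis step, which is the motivation the abstract gives for the synthesis cascade.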
Notes
- 4. The tagset of Arabic Unitex package dictionaries.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Mesmia, F.B., Haddar, K., Friburger, N., Maurel, D. (2018). CasANER: Arabic Named Entity Recognition Tool. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_10
DOI: https://doi.org/10.1007/978-3-319-67056-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)