Abstract
Amazigh is a morphologically rich language, which makes part-of-speech (POS) tagging challenging. POS tagging is an important component of almost all Natural Language Processing (NLP) applications.
Applying machine-learning techniques to less-computerized languages requires the development of an appropriately tagged corpus. In this paper, we develop POS taggers for Amazigh, a less-privileged language, using Conditional Random Fields (CRF), Support Vector Machines (SVM) and the TreeTagger system. We manually annotated approximately 75,000 tokens, collected from written texts, with a POS tagset of 28 tags defined for the Amazigh language. The POS taggers use a range of contextual and orthographic word-level features. These features are language independent and therefore applicable to other languages. The POS taggers were trained and tested on the same corpora. Evaluation yielded accuracies of 89.18%, 88.02% and 90.86% for the CRF, SVM and TreeTagger systems, respectively.
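As a rough illustration of what contextual and orthographic word-level features can look like, the following Python sketch builds one feature dictionary per token and computes token-level accuracy. The abstract does not list the actual feature set, so every feature name here (`prefix2`, `suffix3`, `has_hyphen`, and so on), the helper names, and the placeholder tokens are assumptions made for illustration, not the authors' implementation.

```python
def word_features(tokens, i):
    """Illustrative features for tokens[i]; NOT the feature set used in the paper."""
    word = tokens[i]
    return {
        # Orthographic cues computed from the word form alone (assumed examples)
        "word.lower": word.lower(),
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "has_digit": any(ch.isdigit() for ch in word),
        "has_hyphen": "-" in word,
        "is_punct": all(not ch.isalnum() for ch in word),
        "length": len(word),
        # Contextual cues: neighbouring tokens, with sentence-boundary markers
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Return one feature dictionary per token of a sentence."""
    return [word_features(tokens, i) for i in range(len(tokens))]


def accuracy(gold_tags, predicted_tags):
    """Token-level accuracy: the fraction of tokens whose predicted tag matches gold."""
    if not gold_tags or len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


if __name__ == "__main__":
    # Placeholder tokens, not real Amazigh text
    for features in sentence_features(["ab", "cd-ef", "."]):
        print(features)
```

None of these features depends on a language-specific lexicon, which is the sense in which such features carry over to other languages. Lists of feature dictionaries like these are a common input format for CRF toolkits, while an SVM would typically need them converted to numeric vectors first.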
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Samir, A., Lahbib, Z., Mohamed, O. (2018). Amazigh PoS Tagging Using Machine Learning Techniques. In: Ben Ahmed, M., Boudhir, A. (eds) Innovations in Smart Cities and Applications. SCAMS 2017. Lecture Notes in Networks and Systems, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-74500-8_51
DOI: https://doi.org/10.1007/978-3-319-74500-8_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74499-5
Online ISBN: 978-3-319-74500-8
eBook Packages: Engineering, Engineering (R0)