Amazigh PoS Tagging Using Machine Learning Techniques

  • Conference paper
Innovations in Smart Cities and Applications (SCAMS 2017)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 37)

Abstract

Amazigh is a morphologically rich language, which makes part-of-speech (POS) tagging challenging. POS tagging is an important component of almost all Natural Language Processing (NLP) applications.

Applying machine-learning techniques to less computerized languages requires the development of an appropriately tagged corpus. In this paper, we develop POS taggers for Amazigh, a less privileged language, using Conditional Random Fields (CRF), Support Vector Machines (SVM) and the TreeTagger system. We manually annotated approximately 75,000 tokens, collected from written texts, with a POS tagset of 28 tags defined for the Amazigh language. The POS taggers use a range of contextual and orthographic word-level features. These features are language independent and can be applied to other languages. The POS taggers were trained and tested on the same corpora. Evaluation shows accuracies of 89.18%, 88.02% and 90.86% for CRF, SVM and TreeTagger, respectively.
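To make the notion of contextual and orthographic word-level features more concrete, the following is a minimal sketch in Python of how such features might be extracted per token before being passed to a sequence labeller. The abstract does not list the actual features, so the feature names, the window size of 2, the affix length of 3, and the placeholder tokens are all assumptions made for illustration. They are not the feature set used in the paper.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Build illustrative features for tokens[i].

    Assumption: feature names, window size, and affix length are
    hypothetical choices, not those reported by the authors.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        # Orthographic, word-level features (no language-specific lexicon).
        "word": word.lower(),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "length": len(word),
        "has_digit": any(ch.isdigit() for ch in word),
        "has_hyphen": "-" in word,
        "is_punct": all(not ch.isalnum() for ch in word),
    }
    # Contextual features: neighbouring tokens inside the window,
    # with boundary markers when the window leaves the sentence.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}]"
        if j < 0:
            feats[key] = "<BOS>"
        elif j >= len(tokens):
            feats[key] = "<EOS>"
        else:
            feats[key] = tokens[j].lower()
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Extract one feature dictionary per token of a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Placeholder tokens, not real Amazigh data.
    sentence = ["tok1", "tok-2", "tok3", "."]
    for tok, feats in zip(sentence, sentence_features(sentence)):
        print(tok, feats)
```

Because none of these features depends on a language-specific resource, the same extractor could in principle be reused for another language, which is the sense in which the abstract calls the features language independent.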

Notes

  1. http://crfpp.sourceforge.net/

  2. http://chasen.org/~taku/software/yamcha/

  3. http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Author information

Corresponding author

Correspondence to Amri Samir.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Samir, A., Lahbib, Z., Mohamed, O. (2018). Amazigh PoS Tagging Using Machine Learning Techniques. In: Ben Ahmed, M., Boudhir, A. (eds) Innovations in Smart Cities and Applications. SCAMS 2017. Lecture Notes in Networks and Systems, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-74500-8_51

  • DOI: https://doi.org/10.1007/978-3-319-74500-8_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74499-5

  • Online ISBN: 978-3-319-74500-8

  • eBook Packages: Engineering, Engineering (R0)
