A Statistical Approach for the Best Deep Neural Network Configuration for Arabic Language Processing

  • Conference paper
Modelling and Implementation of Complex Systems (MISC 2018)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 64)

Abstract

The widespread adoption of computer technology and the Internet has made a massive amount of textual information available in written Arabic. As more text becomes available, it becomes more difficult to extract the relevant information. To meet this challenge, many researchers have turned to developing information retrieval systems based on syntactic and semantic parsing. In Arabic, this field is restricted by the lack of labeled datasets. It is therefore important to build systems for part-of-speech tagging and language modeling and to use their results for further syntactic and semantic parsing in tasks such as chunking, semantic role labeling, information extraction, named entity recognition, and statistical machine translation. Deep neural networks have proved efficient in fields such as imaging and acoustics and, more recently, in natural language processing. In this study, we use the Taguchi method to find the optimal parameter combination for a deep neural network architecture, so that the network obtains the most accurate results. The main use of the Taguchi method in our work is to help choose the best context, that is, the number of words before and after the word on which training is performed.
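To make the idea of a Taguchi search concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard L9(3^4) orthogonal array and the "larger-the-better" signal-to-noise ratio. The factor names other than the context window, all level values, and the `toy_accuracy` function that stands in for training a network are hypothetical and chosen only for illustration.

```python
import math

# Standard L9(3^4) orthogonal array: 9 runs, up to 4 factors at 3 levels each.
# Each entry is a level index (0, 1 or 2) for the factor in that column.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]


def sn_larger_is_better(values):
    """Taguchi signal-to-noise ratio (dB) when a larger response is better."""
    return -10.0 * math.log10(sum(1.0 / (v * v) for v in values) / len(values))


def best_levels(factors, results):
    """For each factor, pick the level with the highest mean S/N ratio.

    factors: dict mapping a factor name to its 3 candidate values.
    results: 9 lists of repeated measurements, one list per L9 run.
    """
    sn = [sn_larger_is_better(r) for r in results]
    chosen = {}
    for col, (name, levels) in enumerate(factors.items()):
        means = []
        for lvl in range(3):
            runs = [sn[i] for i, row in enumerate(L9) if row[col] == lvl]
            means.append(sum(runs) / len(runs))
        chosen[name] = levels[means.index(max(means))]
    return chosen


# Hypothetical factors and levels (assumptions, not taken from the paper).
factors = {
    "context_window": [1, 2, 3],      # words on each side of the target word
    "hidden_units": [100, 200, 300],
    "learning_rate": [0.001, 0.01, 0.1],
    "embedding_dim": [50, 100, 200],
}

# Hypothetical additive bonuses per factor level, used only by the toy model.
BONUS = [
    [0.00, 0.06, 0.03],
    [0.00, 0.02, 0.04],
    [0.00, 0.03, 0.01],
    [0.03, 0.00, 0.01],
]


def toy_accuracy(row):
    """Stand-in for training and evaluating a network; returns a fake score."""
    return 0.80 + sum(BONUS[col][lvl] for col, lvl in enumerate(row))


# One measurement per run here; real experiments would usually repeat each run.
results = [[toy_accuracy(row)] for row in L9]
chosen = best_levels(factors, results)
print(chosen)
```

The point of the orthogonal array is that 9 runs cover 4 three-level factors in a balanced way, instead of the 81 runs a full grid would need. In a real experiment, `toy_accuracy` would be replaced by an actual training and evaluation run.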

Notes

  1. The root is the original form of the word before any transformation process, and it plays an important role in language studies.

  2. A stem is a morpheme or a set of morphemes that expresses some central idea or meaning.

  3. MADA+TOKAN version 3.2 using Aramorph version 1.2.1.

  4. Minitab 18: Statistical Software Free Trial. http://www.minitab.com.

Author information

Corresponding author

Correspondence to Abdelhalim Saadi.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Saadi, A., Belhadef, H. (2019). A Statistical Approach for the Best Deep Neural Network Configuration for Arabic Language Processing. In: Chikhi, S., Amine, A., Chaoui, A., Saidouni, D.E. (eds) Modelling and Implementation of Complex Systems. MISC 2018. Lecture Notes in Networks and Systems, vol 64. Springer, Cham. https://doi.org/10.1007/978-3-030-05481-6_16
