A Statistical Approach for the Best Deep Neural Network Configuration for Arabic Language Processing

Saadi, Abdelhalim; Belhadef, Hacene

doi:10.1007/978-3-030-05481-6_16

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 64)

Included in the following conference series:

International Symposium on Modelling and Implementation of Complex Systems

Abstract

The spread of computer technology and the Internet has made a massive amount of textual information available in written Arabic. The more text that is available, the more difficult it becomes to extract the relevant information. To meet this challenge, many researchers have turned to developing information retrieval systems based on syntactic and semantic parsing. In Arabic, this field is restricted by the lack of labeled datasets. It is therefore important to build systems for part-of-speech tagging and language modeling and to use their results for further syntactic and semantic parsing in tasks such as chunking, semantic role labeling, information extraction, named entity recognition and statistical machine translation. Deep neural networks have proved efficient in fields such as imaging and acoustics and, more recently, in natural language processing. In this study, we used the Taguchi method to find the optimal parameter combination for a deep neural network architecture, so that the network obtains its most accurate results. The main use of the Taguchi method in our work is to help us choose the best context, that is, the number of words before and after the word on which the training is performed.
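To make the idea concrete, the sketch below shows one common way a Taguchi analysis can be set up. It is not the authors' implementation. It uses the standard L9 orthogonal array (nine runs, four factors at three levels each) and the larger-the-better signal-to-noise ratio. The helper names (`sn_larger_is_better`, `best_levels`, `context_window`) and the choice of four factors are hypothetical; the abstract only confirms that the context size is one of the tuned parameters.

```python
import math
from collections import defaultdict

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, level indices 0..2.
# Every pair of columns contains each combination of levels exactly once.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]


def sn_larger_is_better(values):
    """Taguchi signal-to-noise ratio (dB) for a response to be maximised."""
    mean_inverse_square = sum(1.0 / (y * y) for y in values) / len(values)
    return -10.0 * math.log10(mean_inverse_square)


def best_levels(array, factor_levels, responses):
    """Pick, per factor, the level with the highest mean S/N ratio.

    factor_levels: dict mapping factor name -> tuple of three level values,
                   in the same order as the array columns (hypothetical names).
    responses:     one list of repeated measurements (e.g. accuracy) per run.
    """
    ratios = [sn_larger_is_better(r) for r in responses]
    chosen = {}
    for column, name in enumerate(factor_levels):
        by_level = defaultdict(list)
        for run, ratio in zip(array, ratios):
            by_level[run[column]].append(ratio)
        mean_sn = {lvl: sum(v) / len(v) for lvl, v in by_level.items()}
        chosen[name] = factor_levels[name][max(mean_sn, key=mean_sn.get)]
    return chosen


def context_window(tokens, index, k, pad="<PAD>"):
    """Return the k words before and after tokens[index], padded at the edges."""
    padded = [pad] * k + list(tokens) + [pad] * k
    return padded[index:index + 2 * k + 1]
```

In use, each of the nine rows would correspond to one training run of the network with the indicated levels, and the measured tagging accuracy of each run would be passed to `best_levels`. The benefit of the orthogonal array is that nine runs stand in for the 81 combinations a full grid over four three-level factors would require.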

Notes

1. The root is the original form of the word before any transformation process, and it plays an important role in language studies.
2. A stem is a morpheme or a set of morphemes which expresses some central idea or meaning.
3. MADA+TOKAN version 3.2 using Aramorph version 1.2.1.
4. Minitab 18: Statistical Software Free Trial. http://www.minitab.com.

Author information

Authors and Affiliations

Faculty of Technology, Department of Basic Education in Technology, University Ferhat ABBAS Setif 1, 19000, Setif, Algeria
Abdelhalim Saadi
MISC Laboratory, NTIC Faculty, Abdelhamid Mehri, Constantine 2 University, Constantine, Algeria
Hacene Belhadef

Corresponding author

Correspondence to Abdelhalim Saadi.

Editor information

Editors and Affiliations

Faculty of New Information and Communication Technologies, University of Constantine 2, Constantine, Algeria
Salim Chikhi
Faculty of Technology, GeCoDe Laboratory, University of Saida, Saida, Algeria
Abdelmalek Amine
Faculty of New Information and Communication Technologies, University of Constantine 2, Constantine, Algeria
Allaoua Chaoui
Faculty of New Information and Communication Technologies, University of Constantine 2, Constantine, Algeria
Djamel Eddine Saidouni

About this paper

Cite this paper

Saadi, A., Belhadef, H. (2019). A Statistical Approach for the Best Deep Neural Network Configuration for Arabic Language Processing. In: Chikhi, S., Amine, A., Chaoui, A., Saidouni, D.E. (eds) Modelling and Implementation of Complex Systems. MISC 2018. Lecture Notes in Networks and Systems, vol 64. Springer, Cham. https://doi.org/10.1007/978-3-030-05481-6_16

DOI: https://doi.org/10.1007/978-3-030-05481-6_16
Published: 30 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05480-9
Online ISBN: 978-3-030-05481-6
eBook Packages: Intelligent Technologies and Robotics (R0)
