Abstract
The spread of computer technology and the Internet has made a massive amount of textual information available in written Arabic. As more text becomes available, it becomes more difficult to extract the relevant information. To meet this challenge, many researchers have turned to developing information retrieval systems based on syntactic and semantic parsing. In Arabic, this field is constrained by the lack of labeled datasets. It is therefore important to build systems for part-of-speech tagging and language modeling, and to use their results for further syntactic and semantic processing in tasks such as chunking, semantic role labeling, information extraction, named entity recognition and statistical machine translation. Deep neural networks have proved effective in fields such as imaging and acoustics and, more recently, in natural language processing. In this study, we used the Taguchi method to find the optimal parameter combination for a deep neural network architecture. With this combination, the neural network obtained its most accurate results. The main use of the Taguchi method in our work is to help choose the best context, that is, the number of words before and after the word on which training is performed.
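To make the two ideas in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard L9 orthogonal array (up to four factors at three levels each) and the usual larger-the-better signal-to-noise ratio, treating tagging accuracy as the response. The abstract does not say which array, factors, or levels were used, so these choices, the function names, and the `<PAD>` padding token are illustrative assumptions.

```python
import math

# Standard L9(3^4) orthogonal array: 9 runs, up to 4 factors, levels 0..2.
# Assumption: the paper does not state which array was used.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]


def sn_larger_is_better(values):
    """Taguchi S/N ratio (dB) for a larger-the-better response such as accuracy.

    values: repeated measurements for one run; all must be > 0.
    """
    mean_inv_sq = sum(1.0 / (v * v) for v in values) / len(values)
    return -10.0 * math.log10(mean_inv_sq)


def best_levels(array, sn_ratios):
    """For each factor, return the level with the highest mean S/N ratio."""
    best = []
    for factor in range(len(array[0])):
        means = {}
        for level in sorted({run[factor] for run in array}):
            vals = [sn for run, sn in zip(array, sn_ratios) if run[factor] == level]
            means[level] = sum(vals) / len(vals)
        best.append(max(means, key=means.get))
    return best


def context_window(tokens, index, before, after, pad="<PAD>"):
    """Return the target token with `before` left and `after` right neighbours.

    Positions outside the sentence are filled with a hypothetical pad token.
    """
    left = [tokens[i] if i >= 0 else pad for i in range(index - before, index)]
    right = [
        tokens[i] if i < len(tokens) else pad
        for i in range(index + 1, index + 1 + after)
    ]
    return left + [tokens[index]] + right
```

In use, each row of `L9` would be mapped to concrete settings (for example, one factor could be the number of context words before the target and another the number after), the network would be trained once or several times per row, and the measured accuracies would be passed to `sn_larger_is_better` before calling `best_levels`. Nine runs then stand in for the 81 that a full grid over four three-level factors would need.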
Notes
1. The root is the original form of the word before any transformation process, and it plays an important role in language studies.
2. A stem is a morpheme or a set of morphemes which expresses some central idea or meaning.
3. MADA+TOKAN version 3.2 using Aramorph version 1.2.1.
4. Minitab 18: Statistical Software Free Trial. http://www.minitab.com.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Saadi, A., Belhadef, H. (2019). A Statistical Approach for the Best Deep Neural Network Configuration for Arabic Language Processing. In: Chikhi, S., Amine, A., Chaoui, A., Saidouni, D.E. (eds) Modelling and Implementation of Complex Systems. MISC 2018. Lecture Notes in Networks and Systems, vol 64. Springer, Cham. https://doi.org/10.1007/978-3-030-05481-6_16
DOI: https://doi.org/10.1007/978-3-030-05481-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05480-9
Online ISBN: 978-3-030-05481-6
eBook Packages: Intelligent Technologies and Robotics (R0)