Lexicon Free Arabic Speech Recognition Recipe

Ahmed, Abdelrahman; Hifny, Yasser; Shaalan, Khaled; Toral, Sergio

doi:10.1007/978-3-319-48308-5_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Included in the following conference series:

International Conference on Advanced Intelligent Systems and Informatics

Abstract

We present the first end-to-end recipe for Arabic speech recognition using lexicon-free Connectionist Temporal Classification (CTC) and Recurrent Neural Networks (RNN). The study describes in detail, step by step, the decisions made in building the Arabic system, including the transcription method, feature extraction, training process, and decoding optimization. The results are compared with Hidden Markov Model (HMM), Gaussian mixture model (GMM), and tandem baselines for Arabic on the same data set. The corpus is Aljazeera broadcast speech, and the language models are built from the Aljazeera corpus and from web and Twitter crawls, using different n-gram orders. We measure both word error rate (WER) and character error rate (CER) for each n-gram order. The results achieved are very close to the baseline, and we close with some recommendations.
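To make the two evaluation metrics concrete, here is a minimal Python sketch of how WER and CER are commonly computed from the Levenshtein (edit) distance between a reference transcript and a recognizer hypothesis. It is an illustration, not the scoring tool used in the paper: the function names are hypothetical, and the choice to ignore spaces when computing CER is an assumption, since conventions differ between toolkits.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Both functions accept plain Unicode strings, so Arabic transcripts can be passed directly. Any text normalization, such as stripping diacritics, would have to be applied to the reference and the hypothesis before scoring.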

Acknowledgment

Many thanks to Luminous Technology Center (info@luminous-technologies.com) for full access to their NVIDIA-based server setup. Special thanks to QCRI for providing the Aljazeera corpus. We also thank Ziang Xie, Stanford University.

Author information

Authors and Affiliations

Electronic Engineering Department, University of Seville, Seville, Spain
Abdelrahman Ahmed & Sergio Toral
Department of Information Technology, University of Helwan, Cairo, Egypt
Yasser Hifny
School of Informatics, Edinburgh, UK
Khaled Shaalan
The British University in Dubai, Dubai, UAE
Khaled Shaalan

Corresponding author

Correspondence to Abdelrahman Ahmed.

Editor information

Editors and Affiliations

Faculty of Computers & Information, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Dubai International Academic City, The British University, Dubai, United Arab Emirates
Khaled Shaalan
CS Dept. Faculty of Computers and Inform, Suez Canal University, Ismailia, Egypt
Tarek Gaber
Ahmed Orabi Square, Menouf, Egypt
Ahmad Taher Azar
Faculty of Computer & Information Scienc, Ain Shams University, Cairo, Egypt
M. F. Tolba

About this paper

Cite this paper

Ahmed, A., Hifny, Y., Shaalan, K., Toral, S. (2017). Lexicon Free Arabic Speech Recognition Recipe. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_15

DOI: https://doi.org/10.1007/978-3-319-48308-5_15
Published: 18 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48307-8
Online ISBN: 978-3-319-48308-5
eBook Packages: Engineering, Engineering (R0)
