Automatic Preparation of Standard Arabic Phonetically Rich Written Corpora with Different Linguistic Units

Sindran, Fadi; Mualla, Firas; Haderlein, Tino; Daqrouq, Khaled; Nöth, Elmar

doi:10.1007/978-3-319-64206-2_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Phonetically rich and balanced speech corpora are essential components of state-of-the-art automatic speech recognition (ASR) and text-to-speech (TTS) systems. The written form of a speech corpus must be prepared carefully so that it represents the richness and balance of the linguistic content. Spoken and written corpora of this type are scarce for Standard Arabic (SA), and the only one available was prepared manually by expert linguists and phoneticians. In this work, we address the task of automatically preparing written corpora with rich linguistic units. Our work builds on a comprehensive statistical linguistic study of SA based on the automatic phonetic transcription of texts with more than 5 million words. We prepared two written corpora. The first corpus contains all allophones in SA with at least 3 occurrences of each allophone and 17 occurrences of each phoneme. The second corpus contains, in addition to all allophones, 90.72% of the diphones in SA.
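The abstract does not state how sentences are chosen for the corpora. As an illustration only, the sketch below assumes a greedy, set-cover-style heuristic: it repeatedly picks the sentence that contributes most toward a required minimum count per unit, then reports the share of a given diphone inventory that the selection covers. The function names `greedy_select` and `diphone_coverage`, the toy symbols, and the toy inventory are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def greedy_select(sentences, min_count):
    """Greedy selection sketch (an assumed heuristic, not the paper's method).

    sentences: list of sentences, each a list of unit symbols
               (for example allophones from a phonetic transcription).
    min_count: dict mapping each unit to its required number of occurrences.
    Returns (selected sentence indices, requirements still unmet).
    """
    remaining = {u: n for u, n in min_count.items() if n > 0}
    counts = [Counter(s) for s in sentences]
    available = set(range(len(sentences)))
    selected = []

    def gain(i):
        # How many still-needed occurrences sentence i would supply.
        return sum(min(counts[i][u], need) for u, need in remaining.items())

    while remaining and available:
        # Ties are broken in favour of the lowest sentence index.
        best = max(sorted(available), key=gain)
        if gain(best) == 0:
            break  # no remaining sentence helps; requirements cannot be met
        selected.append(best)
        available.remove(best)
        for u in list(remaining):
            remaining[u] -= counts[best][u]
            if remaining[u] <= 0:
                del remaining[u]
    return selected, remaining


def diphone_coverage(sentences, selected, inventory):
    """Fraction of the diphone inventory seen in the selected sentences."""
    seen = set()
    for i in selected:
        s = sentences[i]
        seen.update(zip(s, s[1:]))  # adjacent unit pairs within a sentence
    return len(seen & inventory) / len(inventory)


# Toy data: single letters stand in for phonetic units.
toy_sentences = [list("abca"), list("bcd"), list("aab"), list("dd")]
toy_min_count = {"a": 2, "b": 1, "c": 1, "d": 1}
toy_inventory = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("c", "a")}

toy_selected, toy_unmet = greedy_select(toy_sentences, toy_min_count)
toy_coverage = diphone_coverage(toy_sentences, toy_selected, toy_inventory)
print(toy_selected, toy_unmet, toy_coverage)
```

With real data, `min_count` would hold thresholds such as the per-allophone and per-phoneme minimums quoted in the abstract, and the inventory would come from a statistical study of the language. A greedy choice does not guarantee the smallest possible corpus, but it is simple and keeps the number of selected sentences low in practice.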

Author information

Authors and Affiliations

Lehrstuhl für Informatik 5 (Mustererkennung), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 3, 91058, Erlangen, Germany
Fadi Sindran, Firas Mualla, Tino Haderlein & Elmar Nöth
Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, 22254, Saudi Arabia
Khaled Daqrouq

Corresponding author

Correspondence to Fadi Sindran.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Sindran, F., Mualla, F., Haderlein, T., Daqrouq, K., Nöth, E. (2017). Automatic Preparation of Standard Arabic Phonetically Rich Written Corpora with Different Linguistic Units. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_23

DOI: https://doi.org/10.1007/978-3-319-64206-2_23
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science, Computer Science (R0)
