
Impact of a Newly Developed Modern Standard Arabic Speech Corpus on Implementing and Evaluating Automatic Continuous Speech Recognition Systems

  • Conference paper
Spoken Dialogue Systems for Ambient Environments (IWSDS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6392)


Abstract

As the current formal linguistic standard and the only form of Arabic acceptable to all native speakers, Modern Standard Arabic (MSA) still lacks sufficient spoken corpora compared with other forms such as Dialectal Arabic. This paper describes our work towards developing a new speech corpus for MSA that can be used to implement and evaluate any Arabic automatic continuous speech recognition system. The corpus contains 415 sentences (367 for training and 48 for testing) recorded by 42 Arabic native speakers (21 male and 21 female) from 11 countries representing three major regions (Levant, Gulf, and Africa). We examined the impact of this speech corpus on the overall performance of Arabic automatic continuous speech recognition systems. Two development phases were conducted, varying the size of the training data, the number of Gaussian mixture distributions, and the number of tied states (senones). Overall results indicate that a larger training data size results in higher word recognition rates and lower Word Error Rates (WER).
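
The abstract reports results as word recognition rates and Word Error Rates. The paper's exact scoring tool is not stated here, so the following is only a minimal sketch of the standard definitions, assuming words are whitespace-separated tokens. It aligns a hypothesis against a reference by minimum edit distance, counts substitutions (S), deletions (D) and insertions (I), and computes WER = (S + D + I) / N, where N is the number of reference words. Treating "word recognition rate" as word correctness, (N - S - D) / N, is an assumption; the functions `align_counts`, `wer` and `word_correct` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Count (substitutions, deletions, insertions) on a minimum edit-distance
    alignment of a hypothesis word sequence against a reference."""
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j].
    # Python compares tuples left to right, so min() picks the lowest cost and
    # breaks ties toward fewer substitutions (real scoring tools may differ).
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, k)          # match: no cost
            else:
                diag = (c + 1, s + 1, d, k)  # substitution
            c, s, d, k = dp[i - 1][j]
            delete = (c + 1, s, d + 1, k)    # reference word missing from hypothesis
            c, s, d, k = dp[i][j - 1]
            insert = (c + 1, s, d, k + 1)    # extra word in hypothesis
            dp[i][j] = min(diag, delete, insert)
    _, s, d, k = dp[n][m]
    return s, d, k


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N, with N the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    s, d, i = align_counts(ref, hyp)
    return (s + d + i) / len(ref)


def word_correct(reference: str, hypothesis: str) -> float:
    """Word correctness: (N - S - D) / N; insertions are not penalised."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    s, d, _ = align_counts(ref, hyp)
    return (len(ref) - s - d) / len(ref)


if __name__ == "__main__":
    ref = "a b c d"
    hyp = "a x c d e"  # one substitution (b -> x) and one insertion (e)
    print(wer(ref, hyp))           # 0.5
    print(word_correct(ref, hyp))  # 0.75
```

Because WER counts insertions and word correctness does not, the two measures can move by different amounts, although in general fewer recognition errors raise the rate and lower the WER together, which is the pattern the abstract reports for larger training sets.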




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abushariah, M.A.M., Ainon, R.N., Zainuddin, R., Al-Qatab, B.A., Alqudah, A.A.M. (2010). Impact of a Newly Developed Modern Standard Arabic Speech Corpus on Implementing and Evaluating Automatic Continuous Speech Recognition Systems. In: Lee, G.G., Mariani, J., Minker, W., Nakamura, S. (eds) Spoken Dialogue Systems for Ambient Environments. IWSDS 2010. Lecture Notes in Computer Science, vol 6392. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16202-2_1


  • DOI: https://doi.org/10.1007/978-3-642-16202-2_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16201-5

  • Online ISBN: 978-3-642-16202-2

  • eBook Packages: Computer Science (R0)
