Language Resources and Evaluation, Volume 52, Issue 1, pp 249–267

Automatic speech recognition system for Tunisian dialect

  • Abir Masmoudi
  • Fethi Bougares
  • Mariem Ellouze
  • Yannick Estève
  • Lamia Belguith
Original Paper

Abstract

Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system is challenging because annotated resources and tools are lacking, the dialect is not standardized at any linguistic level (phonological, morphological, syntactic and lexical), and the pronunciation dictionary needed for ASR development is missing. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we detail the creation of a Tunisian dialect corpus. Finally, this corpus is validated and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate of 22.6%.

Keywords

Under-resourced language; Rule-based; Grapheme-to-phoneme conversion; Automatic speech recognition; Tunisian dialect

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  • Abir Masmoudi (1, 2)
  • Fethi Bougares (1)
  • Mariem Ellouze (2)
  • Yannick Estève (1)
  • Lamia Belguith (2)

  1. LIUM, Le Mans University, Le Mans, France
  2. ANLP Research group, MIRACL Lab., University of Sfax, Sfax, Tunisia
