
The South African directory enquiries (SADE) name corpus

  • Jan W. F. Thirion
  • Charl van Heerden
  • Oluwapelumi Giwa
  • Marelie H. Davel
Original Paper

Abstract

We present the design and development of a South African directory enquiries corpus. It contains audio and orthographic transcriptions of a wide range of South African names produced by first-language speakers of four languages, namely Afrikaans, English, isiZulu and Sesotho. Useful as a resource to understand the effect of name language and speaker language on pronunciation, this is the first corpus to also aim to identify the “intended language”: an implicit assumption with regard to word origin made by the speaker of the name. We describe the design, collection, annotation, and verification of the corpus. This includes an analysis of the algorithms used to tag the corpus with meta information that may be beneficial to pronunciation modelling tasks.

Keywords

Speech corpus collection, Pronunciation modelling, Speech recognition, Proper names

Notes

Acknowledgements

This work is based on research supported by the Department of Arts and Culture (DAC) of the government of South Africa, through their Human Language Technologies (HLT) unit, and the National Research Foundation (NRF). Any opinion, finding and conclusion or recommendation expressed in this material is that of the authors and the NRF does not accept any liability in this regard. The support by both institutions is gratefully acknowledged.

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Multilingual Speech Technologies (MuST), Faculty of Engineering, North-West University, Potchefstroom, South Africa
