Abstract
Automatic Speech Recognition (ASR) systems have become ubiquitous: they appear in a variety of form factors and are increasingly important in our daily lives. Ensuring that these systems are equitable across different subgroups of the population is therefore crucial. In this paper, we introduce AequeVox, an automated testing framework for evaluating the fairness of ASR systems. AequeVox simulates different environments to assess the effectiveness of ASR systems for different populations. In addition, we investigate whether the chosen simulations are comprehensible to humans. We further propose a fault localization technique capable of identifying words that are not robust to these varying environments. Both components of AequeVox operate in the absence of ground truth data.
We evaluate AequeVox on speech from four different datasets using three different commercial ASRs. Our experiments reveal that non-native English, female, and Nigerian English speakers generate 109%, 528.5%, and 156.9% more errors, on average, than native English, male, and UK Midlands speakers, respectively. Our user study also reveals that 82.9% of the simulations (implemented through speech transformations) had a comprehensibility rating above seven (out of ten), with the lowest rating being 6.78. This further validates the fairness violations discovered by AequeVox. Finally, we show that the non-robust words, as predicted by the fault localization technique embodied in AequeVox, exhibit 223.8% more errors than the predicted robust words across all ASRs.
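To illustrate the ground-truth-free comparison described above, the sketch below counts word-level edit-distance errors between an ASR transcript of the original audio and a transcript of the transformed (simulated-environment) audio. The transcript strings here are illustrative stand-ins, not output from any specific ASR; a real run would obtain both transcripts from a commercial ASR service.

```python
def word_edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two transcripts."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i words of ref
    # and the first j words of hyp.
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)]

# The transcript of the unmodified audio serves as the reference,
# so no human-labelled ground truth is needed.
clean = "the quick brown fox jumps over the lazy dog"
noisy = "the quick brown fox jumps over a hazy dog"  # after a simulated transform
print(word_edit_distance(clean, noisy))  # 2 word-level errors
```

Comparing per-group error counts computed this way is what allows fairness gaps to surface even when no reference transcriptions exist for the audio.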
This work is partially supported by Singapore Ministry of Education (MOE) grant number MOE2018-T2-1-098 and OneConnect Financial grant number RGOCFT2001.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this paper
Cite this paper
Rajan, S.S., Udeshi, S., Chattopadhyay, S. (2022). AequeVox: Automated Fairness Testing of Speech Recognition Systems. In: Johnsen, E.B., Wimmer, M. (eds) Fundamental Approaches to Software Engineering. FASE 2022. Lecture Notes in Computer Science, vol 13241. Springer, Cham. https://doi.org/10.1007/978-3-030-99429-7_14
DOI: https://doi.org/10.1007/978-3-030-99429-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99428-0
Online ISBN: 978-3-030-99429-7
eBook Packages: Computer Science, Computer Science (R0)