
Enhancing Speech Recorded from a Wearable Sensor Using a Collection of Autoencoders

  • Astryd González-Salazar
  • Michelle Gutiérrez-Muñoz
  • Marvin Coto-Jiménez
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1087)

Abstract

Assistive Technology (AT) encompasses the use of technological devices to improve the learning process or the general capabilities of people with disabilities. One of the major tasks of AT is the development of devices that provide alternative or augmentative communication.

In this work, we implemented a simple AT device with a low-cost sensor for recording speech signals. The captured sound is of low perceived quality and is corrupted, so it is not suitable for integration into speech recognition systems, automatic transcription, or general recognition of vocal-tract sounds for people with disabilities.

We propose the use of a group of artificial neural networks, each improving a different aspect of the signal. Speech enhancement studies usually focus on improving signals degraded under specific, known conditions, such as background noise, reverberation, or other natural noises. In our case, the conditions that degrade the sound are unknown, and this uncertainty makes enhancing the speech in a real-life application considerably more challenging.
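As a rough illustration of this idea, the sketch below shows one LSTM-based denoising autoencoder trained to map corrupted speech feature frames to clean ones. The feature set, layer sizes, and training configuration are assumptions made only for illustration and are not specified in this abstract; TensorFlow/Keras is used here merely as an example framework. In the proposed collection, several such networks would be trained, each targeting a different aspect of the signal.

```python
# Minimal sketch of one LSTM denoising autoencoder (assumed architecture;
# the paper's exact features, layer sizes, and training setup are not given here).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATS = 40   # assumed number of spectral features per frame
SEQ_LEN = 100  # assumed number of frames per training sequence

# Sequence-to-sequence regression: corrupted frames in, enhanced frames out.
model = models.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, N_FEATS)),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(N_FEATS)),  # enhanced feature frames
])
model.compile(optimizer="adam", loss="mse")

# Illustrative training call with placeholder arrays of corrupted/clean pairs.
noisy = np.random.rand(32, SEQ_LEN, N_FEATS).astype("float32")
clean = np.random.rand(32, SEQ_LEN, N_FEATS).astype("float32")
model.fit(noisy, clean, epochs=1, batch_size=8)
```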

The results show the capacity of the artificial neural networks to enhance the quality of the sound according to several objective evaluation measures. This proposal can therefore become a way of processing these kinds of signals to improve robust speech recognition systems and increase the real possibilities of implementing low-cost AT devices.
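Objective evaluation of enhancement is typically carried out by comparing the enhanced signal against a clean reference. The abstract does not list the specific measures used, so the snippet below only illustrates one common measure, segmental SNR, as a hypothetical example.

```python
# Hypothetical example of one common objective measure (segmental SNR);
# not necessarily one of the measures used in the paper.
import numpy as np

def segmental_snr(clean, enhanced, frame_len=512, eps=1e-10):
    """Mean per-frame SNR (dB) between a clean reference and an enhanced signal."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for i in range(n_frames):
        c = clean[i * frame_len:(i + 1) * frame_len]
        e = enhanced[i * frame_len:(i + 1) * frame_len]
        noise_energy = np.sum((c - e) ** 2) + eps
        snrs.append(10.0 * np.log10(np.sum(c ** 2) / noise_energy + eps))
    # Clamp each frame to the customary [-10, 35] dB range before averaging.
    return float(np.mean(np.clip(snrs, -10.0, 35.0)))
```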

Keywords

Artificial neural networks · Assistive Technology · LSTM · Speech enhancement

Notes

Acknowledgements

This work was supported by the University of Costa Rica (UCR), Project No. 322-B9-105 and ED-3416.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. PRIS-Lab, Escuela de Ingeniería Eléctrica, Universidad de Costa Rica, San Pedro, Costa Rica
