On Distant Speech Recognition for Home Automation

Vacher, Michel; Lecouteux, Benjamin; Portet, François

doi:10.1007/978-3-319-16226-3_7


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8700)


Abstract

In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project, which aims to develop a new home automation system based on voice commands to improve the support and well-being of people experiencing a loss of autonomy. The goal of the study is voice command recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a four-room flat equipped with microphones set in the ceiling. This French distant-speech corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and the other approaches. This solution, which uses the two channels with the best SNR and a priori knowledge (voice commands and distress sentences), demonstrated an increase in recognition rate without introducing false alarms. Finally, a short overview outlines the research challenges that speech technologies must take up for Ambient Assisted Living and Augmentative and Alternative Communication, and the current research avenues in this domain.
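The two ingredients named in the abstract, keeping the two channels with the best SNR and exploiting a list of known sentences, can be illustrated with a minimal sketch. This is not the Driven Decoding Algorithm, which acts inside the decoder; it is a simplified stand-in that ranks channels by SNR and then matches a recognizer hypothesis against known sentences by word-level edit distance. The channel names, power values, example sentences, the `max_ratio` threshold, and all function names are hypothetical and chosen only for illustration.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two positive power estimates."""
    return 10.0 * math.log10(signal_power / noise_power)


def two_best_channels(powers):
    """Return the names of the two channels with the highest SNR.

    `powers` maps a channel name to a (signal_power, noise_power) pair.
    """
    ranked = sorted(powers, key=lambda ch: snr_db(*powers[ch]), reverse=True)
    return ranked[:2]


def word_edit_distance(a, b):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def spot_sentence(hypothesis, known_sentences, max_ratio=0.3):
    """Return the closest known sentence, or None if nothing is close enough.

    Closeness is the word edit distance divided by the reference length;
    the 0.3 default is an arbitrary illustrative threshold.
    """
    hyp = hypothesis.lower().split()
    best, best_ratio = None, None
    for sentence in known_sentences:
        ref = sentence.lower().split()
        ratio = word_edit_distance(hyp, ref) / max(len(ref), 1)
        if best_ratio is None or ratio < best_ratio:
            best, best_ratio = sentence, ratio
    if best_ratio is not None and best_ratio <= max_ratio:
        return best
    return None


if __name__ == "__main__":
    # Hypothetical per-channel (signal power, noise power) estimates.
    powers = {
        "kitchen": (4.0, 1.0),
        "bedroom": (100.0, 1.0),
        "study": (10.0, 1.0),
        "bathroom": (1.0, 1.0),
    }
    print(two_best_channels(powers))

    # Hypothetical a priori sentences (a voice command and a distress call).
    known = ["turn on the light", "please call for help"]
    print(spot_sentence("turn on a light", known))
    print(spot_sentence("the weather is nice today", known))
```

Rejecting hypotheses that are far from every known sentence is what keeps ordinary conversation from being mistaken for a command in this sketch; a real system would tune the threshold on held-out data.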



Acknowledgments

This work is part of the Sweet-Home project supported by the French National Research Agency (Agence Nationale de la Recherche / ANR-09-VERS-011).

Author information

Authors and Affiliations

LIG, CNRS, 38000, Grenoble, France
Michel Vacher
LIG, University Grenoble Alpes, 38000, Grenoble, France
Benjamin Lecouteux & François Portet


Corresponding author

Correspondence to Michel Vacher.

Editor information

Editors and Affiliations

Research Unit HCI-KDD, Medical University of Graz, Graz, Austria
Andreas Holzinger
Industrial HCI Research Lab, Fraunhofer Institute, Lemgo, Germany
Carsten Röcker
Human-Computer Interaction Center, RWTH Aachen University, Aachen, Germany
Martina Ziefle


About this chapter

Cite this chapter

Vacher, M., Lecouteux, B., Portet, F. (2015). On Distant Speech Recognition for Home Automation. In: Holzinger, A., Röcker, C., Ziefle, M. (eds) Smart Health. Lecture Notes in Computer Science, vol 8700. Springer, Cham. https://doi.org/10.1007/978-3-319-16226-3_7


DOI: https://doi.org/10.1007/978-3-319-16226-3_7
Published: 25 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16225-6
Online ISBN: 978-3-319-16226-3
eBook Packages: Computer Science, Computer Science (R0)
