Automatic Analysis of Speech and Acoustic Events for Ambient Assisted Living
We present a prototype of an ambient assisted living (AAL) environment with multimodal user interaction. In our research, the AAL environment is a studio room of more than 60 square meters that contains several tables, chairs and a sink, and is equipped with four stationary microphones and two omni-directional video cameras. In this paper, we focus mainly on audio signal processing techniques for monitoring the assistive smart space and on the recognition of speech and non-speech acoustic events, with the aim of automatically analyzing human activities and detecting possible emergency situations in which the user needs urgent help. Acoustic modeling in our audio recognition system is based on first-order Hidden Markov Models with Gaussian Mixture Models. The recognition vocabulary includes 12 non-speech acoustic events for different types of human activities plus 5 useful spoken commands (keywords), including a subset of alarm audio events. We have collected an audio-visual corpus containing about 1.3 hours of audio data from 5 testers, who performed the proposed test scenarios, and we have conducted practical experiments with the system, the results of which are reported in this paper.
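As a rough illustration of how recognition with Hidden Markov Models and Gaussian Mixture Models can work in general, the sketch below scores a sequence of feature frames against one small HMM per class and picks the class with the highest likelihood. It is not the authors' implementation: the class labels `event_a` and `event_b`, the two-state topology, the one-dimensional features and all parameter values are assumptions made up for this example.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def log_gmm(x, mixture):
    """Log density of a GMM given as a list of (weight, mean, var) tuples."""
    return logsumexp([safe_log(w) + log_gauss(x, m, v) for w, m, v in mixture])

def log_likelihood(frames, model):
    """Forward algorithm in the log domain for one first-order HMM."""
    n = len(model["start"])
    alpha = [safe_log(model["start"][s]) + log_gmm(frames[0], model["emit"][s])
             for s in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[r] + safe_log(model["trans"][r][s])
                            for r in range(n)])
                 + log_gmm(x, model["emit"][s])
                 for s in range(n)]
    return logsumexp(alpha)

def classify(frames, models):
    """Return the label of the model that best explains the frames."""
    return max(models, key=lambda name: log_likelihood(frames, models[name]))

def toy_model(mean):
    """Hypothetical two-state left-to-right HMM with GMM emissions."""
    return {
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit": [
            [(1.0, [mean], [1.0])],
            [(0.5, [mean], [1.0]), (0.5, [mean + 1.0], [2.0])],
        ],
    }

# Hypothetical class models; a real system would train one per vocabulary item.
MODELS = {"event_a": toy_model(0.0), "event_b": toy_model(5.0)}
```

In a real system each frame would be a vector of acoustic features and the model parameters would be estimated from training recordings; here `classify([[0.1], [-0.2], [0.3]], MODELS)` simply selects the toy model whose emission means lie closest to the input frames.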
Keywords: Ambient assisted living; Assistive technology; Multimodal user interfaces; Universal access; Human-computer interaction; Automatic speech recognition; Acoustic event detection
This research is partially supported by the Council for Grants of the President of Russia (Projects No. MD-3035.2015.8 and MK-5209.2015.8), by the Russian Foundation for Basic Research (Projects No. 15-07-04415 and 15-07-04322), and by the Government of the Russian Federation (Grant No. 074-U01).