Abstract
Systems for keyword and non-linguistic vocalization detection in conversational agent applications need to be robust with respect to background noise and different speaking styles. Focussing on the Sensitive Artificial Listener (SAL) scenario which involves spontaneous, emotionally colored speech, this paper proposes a multi-stream model that applies the principle of Long Short-Term Memory to generate context-sensitive phoneme predictions which can be used for keyword detection. Further, we investigate the incorporation of noisy training material in order to create noise robust acoustic models. We show that both strategies can improve recognition performance when evaluated on spontaneous human-machine conversations as contained in the SEMAINE database.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
McTear, M.F.: Spoken dialogue technology: enabling the conversational user interface. ACM Computing Surverys 34(1), 90–169 (2002)
Droppo, J., Acero, A.: Environmental robustness. In: Handbook of Speech Processing, pp. 658–659. Springer, Heidelberg (2007)
Schuller, B., Wöllmer, M., Moosmayr, T., Rigoll, G.: Recognition of noisy speech: A comparative survey of robust model architecture and feature enhancement. Journal on Audio, Speech, and Music Processing (2009), ID 942617
Zhu, Q., Chen, B., Morgan, N., Stolcke, A.: Tandem connectionist feature extraction for conversational speech recognition. In: Bengio, S., Bourlard, H. (eds.) MLMI 2004. LNCS, vol. 3361, pp. 223–231. Springer, Heidelberg (2005)
Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J.F. (eds.) A Field Guide to Dynamical Recurrent Neural Networks, pp. 1–15. IEEE Press, Los Alamitos (2001)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
Wöllmer, M., Eyben, F., Graves, A., Schuller, B., Rigoll, G.: Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework. Cognitive Computation 2(3), 180–190 (2010)
Wöllmer, M., Eyben, F., Schuller, B., Rigoll, G.: Recognition of spontaneous conversational speech using long short-term memory phoneme predictions. In: Proc. of Interspeech, Makuhari, Japan, pp. 1946–1949 (2010)
Schröder, M., Cowie, R., Heylen, D., Pantic, M., Pelachaud, C., Schuller, B.: Towards responsive sensitive artificial listeners. In: Proc. of 4th Intern. Workshop on Human-Computer Conversation, Bellagio, Italy, pp. 1–6 (2008)
Wöllmer, M., Schuller, B., Eyben, F., Rigoll, G.: Combining long short-term memory and dynamic bayesian networks for incremental emotion-sensitive artificial listening. IEEE Journal of Selected Topics in Signal Processing 4(5), 867–881 (2010)
Stupakov, A., Hanusa, E., Bilmes, J., Fox, D.: COSINE - a corpus of multi-party conversational speech in noisy environments. In: Proc. of ICASSP, Taipei, Taiwan (2009)
Eyben, F., Wöllmer, M., Schuller, B.: openSMILE - the Munich versatile and fast open-source audio feature extractor. In: Proc. of ACM Multimedia, Firenze, Italy, pp. 1459–1462 (2010)
Principi, E., Cifani, S., Rocchi, C., Squartini, S., Piazza, F.: Keyword spotting based system for conversation fostering in tabletop scenarios: preliminary evaluation. In: Proc. of HSI, Catania, Italy, pp. 216–219 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wöllmer, M., Marchi, E., Squartini, S., Schuller, B. (2011). Robust Multi-stream Keyword and Non-linguistic Vocalization Detection for Computationally Intelligent Virtual Agents. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds) Advances in Neural Networks – ISNN 2011. ISNN 2011. Lecture Notes in Computer Science, vol 6676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21090-7_58
Download citation
DOI: https://doi.org/10.1007/978-3-642-21090-7_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21089-1
Online ISBN: 978-3-642-21090-7
eBook Packages: Computer ScienceComputer Science (R0)