Abstract
Speech is widely considered the most natural form of human communication. Manifold applications open up for technical and computer systems once they can recognise speech as well as humans do, be it for interaction with humans, mediation between humans, or speech retrieval. Here, state-of-the-art methodology is presented for highly robust speech recognition, recognition of nonlinguistic vocalisations, and paralinguistic speaker states and traits, as exemplified by sentiment, emotion, interest, age, gender, intoxication, and sleepiness. All examples stem from the author's recent work; the latter in particular are drawn from a series of Challenges co-organised by the author at Interspeech from 2009 onwards.
Speech is an arrangement of notes that will never be played again.
—Francis Scott Fitzgerald.
Notes
- 8. http://www.metacritic.com, accessed January 2009.
- 12. Per mill BAC by volume (the standard in most central and eastern European countries). Other scales exist, e.g., per cent BAC by volume, under which the same range corresponds to 0.028 to 0.175 per cent (Australia, Canada, USA), points by volume (GB), per mill BAC by mass (Scandinavia), or parts per million.
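The note above describes several blood alcohol concentration scales that differ only by a constant factor (per mill by volume vs. per cent by volume differ by a factor of 10; parts per million by a factor of 1000). A minimal sketch of these conversions, with illustrative helper names not taken from the chapter:

```python
def per_mill_to_per_cent(bac_per_mill: float) -> float:
    """Convert BAC from per mill by volume to per cent by volume (factor 10)."""
    return bac_per_mill / 10.0

def per_mill_to_ppm(bac_per_mill: float) -> float:
    """Convert BAC from per mill by volume to parts per million (factor 1000)."""
    return bac_per_mill * 1000.0
```

For instance, the per mill range 0.28 to 1.75 quoted for central and eastern Europe maps to the 0.028 to 0.175 per cent range used in Australia, Canada, and the USA.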
References
Shriberg, E.: Spontaneous speech: how peoply really talk and why engineers should care. In: Proceedings of Eurospeech, pp. 1781–1784. Lisbon (2005)
Schuller, B., Ablameier, M., Müller, R., Reifinger, S., Poitschke, T., Rigoll, G.: Speech communication and multimodal interfaces. In: Kraiss, K.-F. (ed.) Advanced Man Machine Interaction. Signals and Communication Technology. Chapter 4, pp. 141–190. Springer, Berlin (2006)
Lee, C.-C., Black, M., Katsamanis, A., Lammert, A., Baucom, B., Christensen, A., Georgiou, P., Narayanan, S.: Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples. In: Proceedings of Interspeech, pp. 793–796, Makuhari (2010)
Schuller, B., Wöllmer, M., Eyben, F., Rigoll, G.: Retrieval of paralinguistic information in broadcasts. In: Maybury, M.T. (ed.) Multimedia Information Extraction: Advances in Video, Audio, and Imagery Extraction for Search, Data Mining, Surveillance, and Authoring. Chapter 17, pp. 273–288. Wiley, IEEE Computer Society Press (2012)
Moreno, P.: Speech recognition in noisy environments. PhD thesis, Carnegie Mellon University, Pittsburgh (1996)
Kim, D., Lee, S., Kil, R.: Auditory processing of speech signals for robust speech recognition in real-world noisy environments. IEEE Trans. Speech Audio Process. 7, 55–69 (1999)
Rose, R.: Environmental robustness in automatic speech recognition. In: COST278 and ISCA Tutorial and Research Workshop on Robustness Issues in Conversational, Interaction (2004)
Schuller, B., Wöllmer, M., Moosmayr, T., Rigoll, G.: Robust spelling and digit recognition in the car: switching models and their like. In: Proceedings 34. Jahrestagung für Akustik, DAGA. DEGA, pp. 847–848. Dresden, March 2008
Schuller, B., Wöllmer, M., Moosmayr, T., Ruske, G., Rigoll, G.: Switching linear dynamic models for noise robust in-car speech recognition. In: Rigoll, G. (ed.) Pattern Recognition: 30th DAGM Symposium Munich, Germany. Proceedings of Lecture Notes on Computer Science (LNCS), vol. 5096, pp. 244–253. Springer, Berlin 10–13 June 2008
Schuller, B., Wöllmer, M., Moosmayr, T., Rigoll, G.: Recognition of noisy speech: a comparative survey of robust model architecture and feature enhancement. EURASIP J. Audio Speech Music Process. 2009(Article ID 942617), 17 (2009)
Wöllmer, M., Eyben, F., Schuller, B., Sun, Y., Moosmayr, T., Nguyen-Thien, N.: Robust in-car spelling recognition: a tandem blstm-hmm approach. In: Proceedings INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, pp. 1990–9772. ISCA, Brighton, Sept 2009
Schuller, B., Weninger, F., Wöllmer, M., Sun, Y. Rigoll, G.: Non-negative matrix factorization as noise-robust feature extractor for speech recognition. In Proceedings 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, pp. 4562–4565. IEEE, Dallas, March 2010
Weninger, F., Geiger, J., Wöllmer, M., Schuller, B., Rigoll, G.: The munich 2011 chime challenge contribution: Nmf-blstm speech enhancement and recognition for reverberated multisource environments. In: Proceedings Machine Listening in Multisource Environments, CHiME 2011, Satellite Workshop of Interspeech, pp. 24–29. ISCA, Florence, Sept 2011
Weninger, F., Wöllmer, M., Geiger, J. Schuller, B., Gemmeke, J., Hurmalainen, A., Virtanen, T., Rigoll, G.: Non-negative matrix factorization for highly noise-robust asr: to enhance or to recognize? In: Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012, pp. 4681–4684. IEEE, Kyoto, March 2012
de la Torre, A., Fohr, D., Haton, J.: Compensation of noise effects for robust speech recognition in car environments. In: Proceedings of International Conference on Spoken Language Processing (2000)
Langmann, D., Fischer, A., Wuppermann, F., Haeb-Umbach, R., Eisele, T.: Acoustic front ends for speaker-independent digit recognition in car environments. In: Proceedings of Eurospeech, pp. 2571–2574 (1997)
Doddington, G., Schalk, T.: Speech recognition: turning theory to practice. In: IEEE Spectrum, pp. 26–32 (1981)
Hirsch, H.G., Pierce, D.: The AURORA experimental framework for the performance evaluation of speech recognition systems under noise conditions. Challenges for the Next Millenium, Automatic Speech Recognition (2000)
Mesot, B., Barber, D.: Switching linear dynamical systems for noise robust speech recognition. IEEE Trans. Audio Speech Lang. Process. 15, 1850–1858 (2007)
Schuller, B., Rigoll, G., Grimm, M., Kroschel, K., Moosmayr, T., Ruske, G.: Effects of in-car noise-conditions on the recognition of emotion within speech. In: Proceedings 33. Jahrestagung für Akustik, DAGA 2007, pp. 305–306. DEGA, Stuttgart, March 2007
Grimm, M., Kroschel, K., Harris, H., Nass, C., Schuller, B., Rigoll, G., Moosmayr, T.: On the necessity and feasibility of detecting a driver’s emotional state while driving. In: Paiva, A., Picard, R.W., Prada, R. (eds.) Affective Computing and Intelligent Interaction: Second International Conference, pp. 126–138. ACII 2007, Lisbon, Portugal, September 12–14, 2007. Proceedings of Lecture Notes on Computer Science (LNCS)Springer, vol. 4738/2007. Berlin/Heidelberg (2007)
Schuller, B.: Speaker, noise, and acoustic space adaptation for emotion recognition in the automotive environment. In: Proceedings 8th ITG Conference on Speech Communication, vol. 211, p. 4. ITG-Fachbericht, Aachen, Germany, ITG, VDE-Verlag (2008)
Cooke, M., Scharenborg, O.: The interspeech 2008 consonant challenge. In: Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and, Signal Processing (2008)
Borgström, B., Alwan, A.: HMM-based estimation of unreliable spectral components for noise robust speech recognition. In: Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and, Signal Processing (2008)
Jancovic, P., Münevver, K.: On the mask modeling and feature representation in the missing-feature ASR: evaluation on the consonant challenge. In: Proceedings of Interspeech (2008)
Gemmeke, J., Cranen, B.: Noise reduction through compressed sensing. In: Proceedings of Interspeech (2008)
Schuller, B., Wöllmer, M., Moosmayr, T., Rigoll, G.: Speech recognition in noisy environments using a switching linear dynamic model for feature enhancement. In: Proceedings INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, incorporating 12th Australasian International Conference on Speech Science and Technology, SST 2008, pp. 1789–1792, Brisbane, Australia, ISCA/ASSTA, ISCA (2008)
Wöllmer, M., Eyben, F., Schuller, B., Rigoll, G.: A multi-stream asr framework for blstm modeling of conversational speech. In: Proceedings 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, pp. 4860–4863. Prague, Czech Republic, IEEE, IEEE (2011)
Wöllmer, M., Eyben, F., Graves, A., Schuller, B., Rigoll, G.: A tandem blstm-dbn architecture for keyword spotting with enhanced context modeling. In: Proceedings ISCA Tutorial and Research Workshop on Non-Linear Speech Processing, p. 9. NOLISP 2009, Vic, Spain. ISCA, ISCA (2009)
Wöllmer, M., Eyben, F., Keshet, J., Graves, A., Schuller, B., Rigoll, G.: Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional lstm networks. In: Proceedings 34th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009, pp. 3949–3952. Taipei, Taiwan, IEEE, IEEE (2009)
Wöllmer, M., Eyben, F., Schuller, B., Rigoll, G.: Robust vocabulary independent keyword spotting with graphical models. In: Proceedings 11th Biannual IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2009, pp. 349–353. Merano, Italy, IEEE, IEEE (2009)
Wöllmer, M., Sun, Y., Eyben, F., Schuller, B.: Long short-term memory networks for noise robust speech recognition. In: Proceedings INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pp. 2966–2969. Makuhari, Japan, ISCA, ISCA (2010)
Wöllmer, M., Eyben, F., Schuller, B., Rigoll, G.: Spoken term detection with connectionist temporal classification: a novel hybrid ctc-dbn decoder. In: Proceedings 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, pp. 5274–5277. Dallas, TX, IEEE, IEEE (2010)
Wöllmer, M., Eyben, F., Graves, A., Schuller, B., Rigoll, G.: Improving keyword spotting with a tandem blstm-dbn architecture. In: Sole-Casals, J., Zaiats, V. (eds.) Advances in Non-Linear Speech Processing: International Conference on Nonlinear Speech Processing, NOLISP 2009, Vic, Spain, 25–27 June 2009. Revised Selected Papers of Lecture Notes on Computer Science (LNCS), vol. 5933/2010, pp. 68–75. Springer (2010)
Wöllmer, M., Eyben, F., Schuller, B., Rigoll, G.: Recognition of spontaneous conversational speech using long short-term memory phoneme predictions. In: Proceedings INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pp. 1946–1949. Makuhari, Japan, ISCA, ISCA (2010)
Wöllmer, M., Eyben, F., Graves, A., Schuller, B., Rigoll, G.: Bidirectional lstm networks for context-sensitive keyword detection in a cognitive virtual agent framework. Cogn. Comput. Spec. Issue Non-Linear Non-Conv. Speech Proces. 2(3), 180–190 (2010)
Wöllmer, M., Schuller, B.: Enhancing spontaneous speech recognition with blstm features. In: Travieso-González, C.M., Alonso-Hernández, J. (eds.) Advances in Nonlinear Speech Processing, 5th International Conference on Nonlinear Speech Processing, NoLISP 2011, Las Palmas de Gran Canaria, Spain, 7–9 November 2011. Proceedings of Lecture Notes in Computer Science (LNCS), vol. 7015/2011, pp. 17–24. Springer (2011)
Wöllmer, M., Marchi, E., Squartini, S., Schuller, B.: Multi-stream lstm-hmm decoding and histogram equalization for noise robust keyword spotting. Cogn. Neurodyn. 5(3), 253–264 (2011)
Wöllmer, M., Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Tandem decoding of children’s speech for keyword detection in a child-robot interaction scenario. In: ACM Transactions on Speech and Language Processing. Special Issue on Speech and Language Processing of Children’s Speech for Child-machine Interaction Applications, vol. 7, Issue 4, p. 22 (2011)
Wöllmer, M., Schuller, B., Rigoll, G.: A novel bottleneck-blstm front-end for feature-level context modeling in conversational speech recognition. In: Proceedings 12th Biannual IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2011, pp. 36–41. Big Island, HY, IEEE, IEEE (2011)
Wöllmer, M., Schuller, B., Rigoll, G.. Feature frame stacking in rnn-based tandem asr systems—learned vs. predefined context. In: Proceedings INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, pp. 1233–1236. Florence, Italy, ISCA, ISCA (2011)
Wöllmer, M., Marchi, E., Squartini, S., Schuller, B.: Robust multi-stream keyword and non-linguistic vocalization detection for computationally intelligent virtual agents. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds.) Proceedings 8th International Conference on Advances in Neural Networks, ISNN 2011, Guilin, China, 29.05.–01.06.2011. Part II of Lecture Notes in Computer Science (LNCS), vol. 6676, pp. 496–505. Springer, Berlin/Heidelberg (2011)
Schröder, M., Bevacqua, E., Cowie, R., Eyben, F., Gunes, H., Heylen, D., ter Maat, M., McKeown, G., Pammi, S., Pantic, M., Pelachaud, C., Schuller, B., de Sevin, E., Valstar, M., Wöllmer, M.: Building autonomous sensitive artificial listeners. IEEE Trans. Affect. Comput. 3(2):165–183 (2012)
Aradilla, G., Vepa, J., Bourlard, H.: An acoustic model based on Kullback-Leibler divergence for posterior features. In: Proceedings of the ICASSP, pp. 657–660. Honolulu, HI (2007)
Grezl, F., Fousek, P.: Optimizing bottle-neck features for LVCSR. In: Proceedings of the ICASSP, pp. 4729–4732. Las Vegas, NV (2008)
Hermansky, H., Fousek, P.: Multi-resolution RASTA filtering for TANDEM-based ASR. In: Proceedings of the European Conference on Speech Communication and Technology, pp. 361–364. Lisbon, Portugal (2008)
Graves, A., Fernandez, S., Schmidhuber, J.: Bidirectional LSTM networks for improved phoneme classification and recognition. In: Proceedings of ICANN, pp. 602–610. Warsaw, Poland (2005)
Fernandez, S., Graves, A., Schmidhuber, J.: An application of recurrent neural networks to discriminative keyword spotting. In: Proceedings of Internet Corporation for Assigned Names and Numbers 2007, vol. 4669, pp. 220–229. Porto, Portugal (2007)
Stupakov, A., Hanusa, E., Bilmes, J., Fox, D.: COSINE—a corpus of multi-party conversational speech in noisy environments. In: Proceedings of the ICASSP, Taipei, Taiwan (2009)
Eyben, F., Wöllmer, M., Schuller, B.: Opensmile—the munich versatile and fast open-source audio feature extractor. In: Proceedings of the 9th ACM International Conference on Multimedia, MM 2010, pp. 1459–1462. Florence, Italy, ACM, ACM (2010)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5—-6), 602–610 (2005)
Campbell, N.: On the use of nonverbal speech sounds in human communication. In: Proceedings of the COST 2102 Workshop, pp. 117–128. Vietri sul Mare, Italy (2007)
Schuller, B., Batliner, A., Seppi, D., Steidl, S., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals. In: Proceedings INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 2253–2256. Antwerp, Belgium. ISCA, ISCA (2007)
Schuller, B., Eyben, F., Rigoll, G.: Static and dynamic modelling for the recognition of non-verbal vocalisations in conversational speech. In: André, E., Dybkjaer, L., Neumann, H., Pieraccini, R., Weber, M. (eds.) Perception in Multimodal Dialogue Systems: 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, pp. 99–110. PIT 2008, Kloster Irsee, Germany, 16–18 June 2008. Proceedings of Lecture Notes on Computer Science (LNCS), vol. 5078/2008. Springer, Berlin/Heidelberg (2008)
Batliner, A., Steidl, S., Eyben, F., Schuller, B., Laughter in child-robot interaction. In: Proceedings Interdisciplinary Workshop on Laughter and other Interactional Vocalisations in Speech, Laughter, Berlin. February, Germany (2009)
Eyben, F., Petridis, S., Schuller, B., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: Audiovisual classification of vocal outbursts in human conversation using long-short-term memory networks. In: Proceedings 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, pp. 5844–5847. Prague, Czech Republic, IEEE, IEEE (2011)
Batliner, A., Steidl, S., Eyben, F., Schuller, B.: On laughter and speech laugh, based on observations of child-robot interaction. In: Trouvain, J., Campbell, N. (eds.) The Phonetics of Laughing, p. 23. Saarland University Press, Saarbrücken (2012)
Prylipko, D., Schuller, B., Wendemuth, A.: Fine-tuning hmms for nonverbal vocalizations in spontaneous speech: a multicorpus perspective. In: Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012, pp. 4625–4628, Kyoto, Japan, IEEE, IEEE (2012)
Eyben, F., Petridis, S., Schuller, B., Pantic, M.: Audiovisual vocal outburst classification in noisy acoustic conditions. In: Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012, pp. 5097–5100. Kyoto, Japan, IEEE, IEEE (2012)
M. Goto, K. Itou, and S. Hayamizu. A real-time filled pause detection system for spontaneous speech recognition. In: Proceedings of the Eurospeech, pp. 227–230. Budapest, Hungary (1999)
Truong, K.P., van Leeuwen, D.A.: Automatic detection of laughter. In: Proceedings of the Interspeech, pp. 485–488. Lisbon, Portugal (2005)
Campbell, N., Kashioka, H., Ohara, R.: No laughing matter. In: Proceedings of the Interspeech, pp. 465–468. Lisbon, Portugal (2005)
Knox, M.T., Mirghafori, N.: Automatic laughter detection using neural networks. In: Proceedings INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 2973–2976. Antwerp, Belgium, ISCA, ISCA (2007)
Cho, Y.-C., Choi, S., Bang, S.-Y.: Non-negative component parts of sound for classification. In: Proceedings of the ISSPIT, pp. 633–636. Darmstadt, Germany (2003)
Schuller, B., Müller, R., Eyben, F., Gast, J., Hörnler, B., Wöllmer, M., Rigoll, G., Höthker, A., Konosu, H.: Being bored? recognising natural interest by extensive audiovisual integration for real-life application. Image Vis. Comput. Special Issue on Visual and Multimodal Analysis of Human Spontaneous Behavior 27(12), 1760–1774 (2009)
Schuller, B., Weninger, F.: Discrimination of speech and non-linguistic vocalizations by non-negative matrix factorization. In: Proceedings 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, pp. 5054–5057. Dallas, TX, IEEE, IEEE (2010)
Schmidt, M.N., Olsson, R.K.: Single-channel speech separation using sparse non-negative matrix factorization. In: Proceedings of the Interspeech, pp. 2–5. Pittsburgh, Pennsylvania (2006)
Smaragdis, P.: Discovering auditory objects through non-negativity constraints. In: Proceedings of the SAPA, Jeju, Korea (2004)
Schuller, B.: Automatisches verstehen gesprochener mathematischer formeln. Technische Universität München, Munich, Germany, October, Diploma thesis (1999)
Schuller, B., Schenk, J., Rigoll, G., Knaup, T.: “the godfather” vs. “chaos”: comparing linguistic analysis based on online knowledge sources and bags-of-n-grams for movie review valence estimation. In: Proceedings of 10th International Conference on Document Analysis and Recognition, ICDAR 2009, pp. 858–862. Barcelona, Spain, IAPR, IEEE (2009)
Schuller, B., Knaup, T.: Learning and knowledge-based sentiment analysis in movie review key excerpts. In: Esposito, A., Esposito, A.M., Martone, R., Müller, V., Scarpetta, G. (eds.) Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues: Third COST 2102 International Training School, Caserta, Italy, 15–19 March 2010, Revised Selected Papers, Lecture Notes on Computer Science (LNCS), vol. 6456/2010, 1st edn, pp. 448–472. Springer, Heidelberg (2011)
Schuller, B., Steidl, S., Batliner, A.: The interspeech 2009 emotion challenge. In: Proceedings INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, pp. 312–315. Brighton, UK, ISCA, ISCA (2009)
Schuller, B., Steidl, S., Batliner, A., Jurcicek, F.: The interspeech 2009 emotion challenge—results and lessons learnt. Speech and Language Processing Technical Committee (SLTC) Newsletter (2009)
Schuller, B., Batliner, A., Steidl, S., Seppi, D.: Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Commun. Special Issue on Sensing Emotion and Affect—Facing Realism in Speech Processing. 53(9/10), 1062–1087 (2011)
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C, Narayanan, S.: The interspeech 2010 paralinguistic challenge. In: Proceedings INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, pp. 2794–2797. Makuhari, Japan, ISCA, ISCA (2010)
Schuller, B., Wöllmer, M., Eyben, F., Rigoll, G., Arsić, D.: Semantic speech tagging: towards combined analysis of speaker traits. In: Brandenburg, K., Sandler, M. (eds.) Proceedings AES 42nd International Conference, pp. 89–97. AES, Audio Engineering Society, Ilmenau (2011)
Schuller, B., Batliner, A., Steidl, S., Schiel, F., Krajewski, J.: The interspeech 2011 speaker state challenge. In: Proceedings INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, pp. 3201–3204. Florence, Italy, ISCA, ISCA (2011)
Chen, A.: Perception of paralinguistic intonational meaning in a second language. Lang. Learn. 59(2), 367–409 (2009)
Bello, R.: Causes and paralinguistic correlates of interpersonal equivocation. J. Pragmat. 38(9), 1430–1441 (2006)
Fernandez, R., Picard, R.W.: Modeling drivers’ speech under stress. Speech Commun. 40, 145–159 (2003)
Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E., Cox, C.: ASR for emotional speech: Clarifying the issues and enhancing performance. Neural Netw. 18, 437–444 (2005)
Steidl, S., Batliner, A., Seppi, D., Schuller, B.: On the impact of children’s emotional speech on acoustic and language models. EURASIP J. Audio, Speech, Music Process. Special Issue on Atypical Speech 2010(Article ID 783954), 14 (2010)
Wöllmer, M., Schuller, B., Eyben, F., Rigoll, G.: Combining long short-term memory and dynamic bayesian networks for incremental emotion-sensitive artificial listening. IEEE J. Sel. Topics Signal Process. Special Issue on Speech Processing for Natural Interaction with Intelligent Environments 4(5), 867–881 (2010)
Wöllmer, M., Klebert, N., Schuller, B.: Switching linear dynamic models for recognition of emotionally colored and noisy speech. In: Proceedings 9th ITG Conference on Speech Communication, ITG-Fachbericht, vol. 225. Bochum, Germany, ITG, VDE-Verlag (2010)
Romanyshyn, N.: Paralinguistic maintenance of verbal communicative interaction in literary discourse (on the material of W. S. Maugham’s novel "Theatre"). In: Experience of Designing and Application of CAD Systems in Microelectronics—Proceedings of the 10th International Conference, CADSM 2009, pp. 550–552. Polyana-Svalyava, Ukraine (2009)
Kennedy, L., Ellis, D.: Pitch-based emphasis detection for characterization of meeting recordings. In: Proceedings of the ASRU, pp. 243–248. Virgin Islands (2003)
Laskowski, K.: Contrasting emotion-bearing laughter types in multiparticipant vocal activity detection for meetings. In: Proceedings of the ICASSP, pp. 4765–4768. Taipei, Taiwan, IEEE (2009)
Massida, Z., Belin, P., James, C., Rouger, J., Fraysse, B., Barone, P., Deguine, O.: Voice discrimination in cochlear-implanted deaf subjects. Hear. Res. 275(1–2), 120–129 (2011)
Demouy, J., Plaza, M., Xavier, J., Ringeval, F., Chetouani, M. Prisse, D., Chauvin, D., Viaux, S., Golse, B., Cohen, D., Robel, L.: Differential language markers of pathology in autism, pervasive developmental disorder not otherwise specified and specific language impairment. Res. Autism Spectr. Disord. 5(4), 1402–1412 (2011)
Mower, E., Black, M., Flores, E., Williams, M., Narayanan, S.: Design of an emotionally targeted interactive agent for children with autism. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2011), pp. 1–6. Barcelona, Spain (2011)
de Sevin, E., Bevacqua, E., Pammi, S., Pelachaud, C., Schröder, M., Schuller, B.: A multimodal listener behaviour driven by audio input. In: Proceedings International Workshop on Interacting with ECAs as Virtual Characters, satellite of AAMAS 2010, p. 4. Toronto, Canada, ACM, ACM (2010)
Biever, C.: You have three happy messages. New Sci. 185(2481), 21 (2005)
Martinez, C.A., Cruz, A.: Emotion recognition in non-structured utterances for human-robot interaction. In: IEEE International Workshop on Robot and Human Interactive, Communication, pp. 19–23 (2005)
Batliner, A., Steidl, S., Nöth, E.: Associating children’s non-verbal and verbal behaviour: body movements, emotions, and laughter in a human-robot interaction. In: Proceedings of ICASSP, pp. 5828–5831. Prague (2011)
Delaborde, A., Devillers, L.: Use of non-verbal speech cues in social interaction between human and robot: emotional and interactional markers. In: AFFINE’10—Proceedings of the 3rd ACM Workshop on Affective Interaction in Natural Environments, Co-located with ACM Multimedia 2010, pp. 75–80. Florence, Italy (2010)
Schröder, M., Cowie, R., Heylen, D., Pantic, M., Pelachaud, C., Schuller, B.: Towards responsive sensitive artificial listeners. In: Proceedings 4th International Workshop on Human-Computer Conversation, p. 6. Bellagio, Italy (2008)
Burkhardt, F., van Ballegooy, M., Englert, R., Huber, R.: An emotion-aware voice portal. In: Proceedings of the Electronic Speech Signal Processing ESSP, pp. 123–131 (2005)
Mishne, G., Carmel, D., Hoory, R., Roytman, A., Soffer, A.: Automatic analysis of call-center conversations. In: Proceedings of the CIKM’05, pp. 453–459. Bremen, Germany (2005)
Belin, P., Fillion-Bilodeau, S., Gosselin, F.: The montreal affective voices: a validated set of nonverbal affect bursts for research on auditory affective processing. Behav. Res. Meth. 40(2), 531–539 (2008)
Schoentgen, J.: Vocal cues of disordered voices: an overview. Acta Acustica United Acustica 92(5), 667–680 (2006)
Rektorova, I., Barrett, J., Mikl, M., Rektor, I., Paus, T.: Functional abnormalities in the primary orofacial sensorimotor cortex during speech in parkinson’s disease. Mov. Disord 22(14), 2043–2051 (2007)
Sapir, S., Ramig, L.O., Spielman, J.L., Fox, C.: Formant centralization ratio: a proposal for a new acoustic measure of dysarthric speech. J. Speech Lang. Hear. Res. 53 (2009)
Oller, D.K., Niyogic, P., Grayd, S., Richards, J.A., Gilkerson, J., Xu, D., Yapanel, U., Warrene, S.F.: Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. In: Proceedings of the National Academy of Sciences of the United States of America (PNAS), vol. 107. (2010)
Maier, A., Haderlein, T., Eysholdt, U., Rosanowski, F., Batliner, A., Schuster, M., Nöth, E.: PEAKS—a system for the automatic evaluation of voice and speech disorders. Speech Commun. 51, 425–437 (2009)
Malyska, N., Quatieri, T., Sturim, D.: Automatic dysphonia recognition using bilogically inspired amplitude-modulation features. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. I, pp. 873–876. Prague (2005)
Dibazar, A., Narayanan, S.: A system for automatic detection of pathological speech. In: Proceedings of Conference Signals, Systems, and Computers, Asilomar, CA (2002)
Litman, D., Rotaru, M., Nicholas, G.: Classifying turn-level uncertainty using word-level prosody. In: Proceedings of the Interspeech, pp. 2003–2006. Brighton, UK (2009)
Boril, H., Sadjadi, S., Kleinschmidt, T., Hansen, J.: Analysis and detection of cognitive load and frustration in drivers’ speech. In: Proceedings of the Interspeech 2010, pp. 502–505. Makuhari, Japan (2010)
Litman, D., Forbes, K.: Recognizing emotions from student speech in tutoring dialogues. In: Proceedings of ASRU, pp. 25–30. Virgin Island (2003)
Ai, H., Litman, D., Forbes-Riley, K., Rotaru, M., Tetreault, J., Purandare, A.: Using system and user performance features to improve emotion detection in spoken tutoring dialogs. In: Proceedings of the Interspeech, pp. 797–800. Pittsburgh (2006)
Price, L., Richardson, J.T.E., Jelfs, A.: Face-to-face versus online tutoring support in distance education. Stud. High. Edu. 32(1), 1–20 (2007)
Pfister, T., Robinson, P.: Speech emotion classification and public speaking skill assessment. In: Proceedings of the International Workshop on Human Behaviour Understanding, pp. 151–162. Istanbul, Turkey (2010)
Schuller, B., Eyben, F., Can, S., Feussner, H.: Speech in minimal invasive surgery—towards an affective language resource of real-life medical operations. In: Devillers, L., Schuller, B., Cowie, R., Douglas-Cowie, E., Batliner, A. (eds.) Proceedings 3rd International Workshop on EMOTION: Corpora for Research on Emotion and Affect, satellite of LREC 2010, pp. 5–9. Valletta, Malta. ELRA, European Language Resources Association (2010)
Ronzhin, A.L.: Estimating psycho-physiological state of a human by speech analysis. Proc. SPIE Int. Soc. Opt. Eng. 5797, 170–181 (2005)
Schuller, B., Wimmer, M, Arsić, D., Moosmayr, T., Rigoll, G.: Detection of security related affect and behaviour in passenger transport. In: Proceedings INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, incorporating 12th Australasian International Conference on Speech Science and Technology, SST 2008, pp. 265–268. Brisbane, Australia. ISCA/ASSTA, ISCA (2008)
Kwon, H., Berisha, V., Spanias, A.: Real-time sensing and acoustic scene characterization for security applications. In: 3rd International Symposium on Wireless Pervasive Computing, ISWPC 2008, Proceedings, pp. 755–758 (2008)
Clavel, C., Vasilescu, I., Devillers, L., Richard, G., Ehrette, T.: Fear-type emotion recognition for future audio-based surveillance systems. Speech Commun. 50(6), 487–503 (2008)
Boril, H., Sangwan, A., Hasan, T., Hansen, J.: Automatic excitement-level detection for sports highlights generation. In: Proceedings of the Interspeech 2010, pp. 2202–2205. Makuhari, Japan (2011)
Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 417–424. Philadelphia (2002)
Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th international conference on World Wide Web, pp. 519–528. Budapest, Hungary, ACM (2003)
Yi, J., Nasukawa, T., Bunescu, R., Niblack, W.: Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques. In: Proceedings of the Third IEEE International Conference on Data Mining, pp. 427–434 (2003)
Popescu, A., Etzioni, O.: Extracting product features and opinions from reviews. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339–346. Association for Computational Linguistics Morristown, NJ, USA (2005)
B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In: WWW ’05: Proceedings of the 14th international conference on World Wide Web, pp. 342–351. New York, NY, ACM (2005)
Ding, X., Liu, B., Yu, P.S.: A holistic lexicon-based approach to opinion mining. In: WSDM ’08: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 231–240, New York, NY, USA, ACM (2008)
Das, S.R., Chen, M.Y.: Yahoo! for amazon: sentiment parsing from small talk on the web. In: Proceedings of the 8th Asia Pacific Finance Association Annual Conference (2001)
Pang., B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86. Philadelphia, PA (2002)
Zhuang, L., Jing, F., Zhu, X.-Y.: Movie review mining and summarization. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM ’06), pp. 43–50, New York, NY, USA, ACM (2006)
Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Marcus, M., Marcinkiewicz, M., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)
Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: NAACL ’03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 134–141. Morristown, NJ, USA. Association for Computational Linguistics (2003)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Wiebe, J., Wilson, T., Bell, M.: Identifying collocations for recognizing opinions. In: Proceedings of the ACL-01 Workshop on Collocation: Computational Extraction, Analysis, and Exploitation, pp. 24–31 (2001)
Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: HLT ’05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354. Morristown, NJ, USA, Association for Computational Linguistics (2005)
Turney, P.D., Littman, M.L.: Measuring praise and criticism: inference of semantic orientation from association. ACM Trans. Inf. Syst. 21(4), 315–346 (2003)
Esuli, A., Sebastiani, F.: Determining term subjectivity and term orientation for opinion mining. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL ’06), Trento, Italy (2006)
Wu, L., Oviatt, S., Cohen, P.R.: Multimodal integration—a statistical view. IEEE Trans. Multimed. 1, 334–341 (1999)
Wöllmer, M., Al-Hames, M., Eyben, F., Schuller, B., Rigoll, G.: A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams. Neurocomputing 73(1–3), 366–380 (2009)
Liu, D.: Automatic mood detection from acoustic music data. In: Proceedings of the International Conference on Music Information Retrieval, pp. 13–17 (2003)
Nose, T., Kato, Y., Kobayashi, T.: Style estimation of speech based on multiple regression hidden semi-Markov model. In: Proceedings of INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 2285–2288. Antwerp, Belgium, ISCA (2007)
Zhang, C., Hansen, J.H.L.: Analysis and classification of speech mode: whispered through shouted. In: Proceedings of INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, vol. 4, pp. 2396–2399 (2007)
Scherer, K.R.: Vocal communication of emotion: a review of research paradigms. Speech Commun. 40, 227–256 (2003)
Batliner, A., Schuller, B., Seppi, D., Steidl, S., Devillers, L., Vidrascu, L., Vogt, T., Aharonson, V., Amir, N.: The automatic recognition of emotions in speech. In: Cowie, R., Petta, P., Pelachaud, C. (eds.) Emotion-Oriented Systems: The HUMAINE Handbook, Cognitive Technologies, 1st edn, pp. 71–99. Springer, New York (2010)
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Aharonson, V., Kessous, L., Amir, N.: Whodunnit—searching for the most important feature types signalling emotion-related user states in speech. Comput. Speech Lang. Special Issue on Affective Speech in real-life interactions 25(1), 4–28 (2011)
Batliner, A., Steidl, S., Hacker, C., Nöth, E.: Private emotions vs. social interaction—a data-driven approach towards analysing emotions in speech. User Model. User-Adapt. Interact. (J. Personal. Res.) 18(1–2), 175–206 (2008)
Hansen, J., Bou-Ghazale, S.: Getting started with SUSAS: a speech under simulated and actual stress database. In: Proceedings of EUROSPEECH-97, vol. 4, pp. 1743–1746. Rhodes, Greece (1997)
Batliner, A., Schuller, B., Schaeffler, S., Steidl, S.: Mothers, adults, children, pets—towards the acoustics of intimacy. In: Proceedings of the 33rd IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2008, pp. 4497–4500. Las Vegas, NV, IEEE (2008)
Pon-Barry, H.: Prosodic manifestations of confidence and uncertainty in spoken language. In: INTERSPEECH 2008—9th Annual Conference of the International Speech Communication Association, pp. 74–77. Brisbane, Australia (2008)
Black, M., Chang, J., Narayanan, S.: An empirical analysis of user uncertainty in problem-solving child-machine interactions. In: Proceedings of the 1st Workshop on Child, Computer and Interaction, Chania, Greece (2008)
Enos, F., Shriberg, E., Graciarena, M., Hirschberg, J., Stolcke, A.: Detecting deception using critical segments. In: Proceedings of INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 2281–2284. Antwerp, Belgium, ISCA (2007)
Bénézech, M.: Vérité et mensonge : l’évaluation de la crédibilité en psychiatrie légale et en pratique judiciaire. Annales Medico-Psychologiques 165(5), 351–364 (2007)
Nadeu, M., Prieto, P.: Pitch range, gestural information, and perceived politeness in Catalan. J. Pragmat. 43(3), 841–854 (2011)
Yildirim, S., Lee, C., Lee, S., Potamianos, A., Narayanan, S.: Detecting politeness and frustration state of a child in a Conversational Computer Game. In: Proceedings of the Interspeech 2005, pp. 2209–2212. Lisbon, Portugal, ISCA (2005)
Yildirim, S., Narayanan, S., Potamianos, A.: Detecting emotional state of a child in a conversational computer game. Comput. Speech Lang. 25, 29–44 (2011)
Ang, J., Dhillon, R., Shriberg, E., Stolcke, A.: Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 2037–2040. Denver, CO (2002)
Arunachalam, S., Gould, D., Anderson, E., Byrd, D., Narayanan, S.S.: Politeness and frustration language in child-machine interactions. In: Proceedings of EUROSPEECH, pp. 2675–2678. Aalborg, Denmark (2001)
Lee, C., Narayanan, S., Pieraccini, R.: Recognition of negative emotions from the speech signal. In: Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU’01) (2001)
Rankin, K.P., Salazar, A., Gorno-Tempini, M.L., Sollberger, M., Wilson, S.M., Pavlic, D., Stanley, C.M., Glenn, S., Weiner, M.W., Miller, B.L.: Detecting sarcasm from paralinguistic cues: anatomic and cognitive correlates in neurodegenerative disease. NeuroImage 47(4), 2005–2015 (2009)
Tepperman, J., Traum, D., Narayanan, S.: “Yeah Right”: sarcasm recognition for spoken dialogue systems. In: Proceedings of the Interspeech, pp. 1838–1841. Pittsburgh, Pennsylvania (2006)
Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: Combining efforts for improving automatic classification of emotional user states. In: Proceedings of the 5th Slovenian and 1st International Language Technologies Conference, ISLTC 2006, pp. 240–245. Ljubljana, Slovenia, Slovenian Language Technologies Society (2006)
Schuller, B., Vlasenko, B., Eyben, F., Wöllmer, M., Stuhlsatz, A., Wendemuth, A., Rigoll, G.: Cross-corpus acoustic emotion recognition: variances and strategies. IEEE Trans. Affect. Comput. 1(2), 119–131 (2010)
Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., Wendemuth, A.: Acoustic emotion recognition: a benchmark comparison of performances. In: Proceedings of the 11th Biannual IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2009, pp. 552–557. Merano, Italy, IEEE (2009)
Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G., Schuller, B.: Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: Proceedings of the 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, pp. 5688–5691. Prague, Czech Republic, IEEE (2011)
Ververidis, D., Kotropoulos, C.: A state of the art review on emotional speech databases. In: 1st Richmedia Conference, pp. 109–119. Lausanne, Switzerland (2003)
Grimm, M., Kroschel, K., Narayanan, S.: The Vera am Mittag German audio-visual emotional speech database. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp. 865–868. Hannover, Germany (2008)
Steidl, S.: Automatic Classification of Emotion-Related User States in Spontaneous Speech. Logos, Berlin (2009)
Batliner, A., Seppi, D., Steidl, S., Schuller, B.: Segmenting into adequate units for automatic recognition of emotion-related episodes: a speech-based approach. Adv. Hum. Comput. Interact., Special Issue on Emotion-Aware Natural Interaction, vol. 2010, Article ID 782802, 15 pages (2010)
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.: Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18(1), 32–80 (2001)
Eyben, F., Wöllmer, M., Schuller, B.: openEAR—introducing the Munich open-source emotion and affect recognition toolkit. In: Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, vol. I, pp. 576–581. Amsterdam, The Netherlands, HUMAINE Association, IEEE (2009)
Ishi, C., Ishiguro, H., Hagita, N.: Using prosodic and voice quality features for paralinguistic information extraction. In: Proceedings of Speech Prosody 2006, pp. 883–886. Dresden (2006)
Müller, C.: Classifying speakers according to age and gender. In: Müller, C. (ed.) Speaker Classification II, vol. 4343. Lecture Notes in Computer Science/Artificial Intelligence. Springer, Heidelberg (2007)
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: The HTK Book (v3.4). Cambridge University Engineering Department, Cambridge (2006)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Steidl, S., Schuller, B., Seppi, D., Batliner, A.: The hinterland of emotions: facing the open-microphone challenge. In: Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, vol. I, pp. 690–697, Amsterdam, The Netherlands, HUMAINE Association, IEEE (2009)
Schuller, B., Metze, F., Steidl, S., Batliner, A., Eyben, F., Polzehl, T.: Late fusion of individual engines for improved recognition of negative emotions in speech—learning vs. democratic vote. In: Proceedings of the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, pp. 5230–5233. Dallas, TX, IEEE (2010)
Wöllmer, M., Weninger, F., Eyben, F., Schuller, B.: Computational assessment of interest in speech—facing the real-life challenge. Künstliche Intelligenz (German J. Artif. Intell.), Special Issue on Emotion and Computing 25(3), 227–236 (2011)
Wöllmer, M., Weninger, F., Eyben, F., Schuller, B.: Acoustic-linguistic recognition of interest in speech with bottleneck-BLSTM nets. In: Proceedings of INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, pp. 3201–3204. Florence, Italy, ISCA (2011)
Mporas, I., Ganchev, T.: Estimation of unknown speaker’s height from speech. Int. J. Speech Tech. 12(4), 149–160 (2009)
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C., Narayanan, S.: Paralinguistics in speech and language—state-of-the-art and the challenge. Comput. Speech Lang. Special Issue on Paralinguistics in Naturalistic Speech and Language 27(1), 4–39 (2013)
Omar, M.K., Pelecanos, J.: A novel approach to detecting non-native speakers and their native language. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4398–4401. Dallas, Texas (2010)
Weiss, B., Burkhardt, F.: Voice attributes affecting likability perception. In: Proceedings of the INTERSPEECH, pp. 2014–2017. Makuhari, Japan (2010)
Bruckert, L., Lienard, J., Lacroix, A., Kreutzer, M., Leboucher, G.: Women use voice parameters to assess men’s characteristics. Proc. R. Soc. B 273(1582), 83–89 (2006)
Gocsál, A.: Female listeners’ personality attributions to male speakers: the role of acoustic parameters of speech. Pollack Period. 4(3), 155–165 (2009)
Mohammadi, G., Vinciarelli, A., Mortillaro, M.: The voice of personality: mapping nonverbal vocal behavior into trait attributions. In: Proceedings of the SSPW 2010, pp. 17–20, Firenze, Italy (2010)
Polzehl, T., Möller, S., Metze, F.: Automatically assessing personality from speech. In: Proceedings—2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010, pp. 134–140. Pittsburgh, PA (2010)
Wallhoff, F., Schuller, B., Rigoll, G.: Speaker identification—comparing linear regression based adaptation and acoustic high-level features. In: Proceedings of the 31. Jahrestagung für Akustik, DAGA 2005, pp. 221–222. Munich, Germany, DEGA (2005)
Müller, C., Burkhardt, F.: Combining short-term cepstral and long-term prosodic features for automatic recognition of speaker age. In: Proceedings of Interspeech, pp. 1–4. Antwerp, Belgium (2007)
van Dommelen, W., Moxness, B.: Acoustic parameters in speaker height and weight identification: sex-specific behaviour. Lang. Speech 38(3), 267–287 (1995)
Krauss, R.M., Freyberg, R., Morsella, E.: Inferring speakers’ physical attributes from their voices. J. Exp. Soc. Psychol. 38(6), 618–625 (2002)
Gonzalez, J.: Formant frequencies and body size of speaker: a weak relationship in adult humans. J. Phonetics 32(2), 277–287 (2004)
Evans, S., Neave, N., Wakelin, D.: Relationships between vocal characteristics and body size and shape in human males: an evolutionary explanation for a deep male voice. Biol. Psychol. 72(2), 160–163 (2006)
Grimm, M., Kroschel, K., Narayanan, S.: Support vector regression for automatic recognition of spontaneous emotions in speech. In: International Conference on Acoustics, Speech and Signal Processing, vol. IV, pp. 1085–1088. IEEE (2007)
Hassan, A., Damper, R.I.: Multi-class and hierarchical SVMs for emotion recognition. In: Proceedings of the Interspeech, pp. 2354–2357, Makuhari, Japan (2010)
Burkhardt, F., Eckert, M., Johannsen, W., Stegmann, J.: A database of age and gender annotated telephone speech. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), pp. 1562–1565, Valletta, Malta (2010)
Fisher, M., Doddington, G., Goudie-Marshall, K.: The DARPA speech recognition research database: specifications and status. In: Proceedings of the DARPA Workshop on Speech Recognition, pp. 93–99 (1986)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Krajewski, J., Batliner, A., Golz, M.: Acoustic sleepiness detection—framework and validation of a speech adapted pattern recognition approach. Behav. Res. Meth. 41, 795–804 (2009)
Levit, M., Huber, R., Batliner, A., Nöth, E.: Use of prosodic speech characteristics for automated detection of alcohol intoxication. In: Bacchiani, M., Hirschberg, J., Litman, D., Ostendorf, M. (eds.) Proceedings of the Workshop on Prosody and Speech Recognition 2001, Red Bank, NJ, pp. 103–106 (2001)
Schiel, F., Heinrich, C.: Laying the foundation for in-car alcohol detection by speech. In: Proceedings of INTERSPEECH 2009, pp. 983–986, Brighton, UK (2009)
Ellgring, H., Scherer, K.R.: Vocal indicators of mood change in depression. J. Nonverbal Behav. 20, 83–110 (1996)
Laskowski, K., Ostendorf, M., Schultz, T.: Modeling vocal interaction for text-independent participant characterization in multi-party conversation. In: Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, pp. 148–155, Columbus (2008)
Ipgrave, J.: The language of friendship and identity: children’s communication choices in an interfaith exchange. Br. J. Relig. Edu. 31(3), 213–225 (2009)
Fujie, S., Ejiri, Y., Kikuchi, H., Kobayashi, T.: Recognition of positive/negative attitude and its application to a spoken dialogue system. Syst. Comput. Jpn. 37(12), 45–55 (2006)
Vinciarelli, A., Pantic, M., Bourlard, H.: Social signal processing: survey of an emerging domain. Image Vis. Comput. 27, 1743–1759 (2009)
Lee, C.-C., Katsamanis, A., Black, M., Baucom, B., Georgiou, P., Narayanan, S.: An analysis of PCA-based vocal entrainment measures in married couples’ affective spoken interactions. In: Proceedings of Interspeech, pp. 3101–3104. Florence, Italy (2011)
Brenner, M., Cash, J.: Speech analysis as an index of alcohol intoxication—the Exxon Valdez accident. Aviat. Space Environ. Med. 62, 893–898 (1991)
Harrison, Y., Horne, J.: The impact of sleep deprivation on decision making: a review. J. Exp. Psychol. Appl. 6, 236–249 (2000)
Bard, E.G., Sotillo, C., Anderson, A.H., Thompson, H.S., Taylor, M.M.: The DCIEM map task corpus: spontaneous dialogue under SD and drug treatment. Speech Commun. 20, 71–84 (1996)
Caraty, M., Montacie, C.: Multivariate analysis of vocal fatigue in continuous reading. In: Proceedings of Interspeech 2010, pp. 470–473, Makuhari, Japan (2010)
Schiel, F., Heinrich, C., Barfüßer, S.: Alcohol language corpus—the first public corpus of alcoholized German speech. Lang. Res. Eval. 46(3), 503–521 (2012)
Akerstedt, T., Gillberg, M.: Subjective and objective sleepiness in the active individual. Int. J. Neurosci. 52(1–2), 29–37 (1990)
Krajewski, J., Schnieder, S., Sommer, D., Batliner, A., Schuller, B.: Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech. Neurocomputing, Special Issue “From neuron to behavior: evidence from behavioral measurements”, 84, 65–75 (2012)
Krajewski, J., Kröger, B.: Using prosodic and spectral characteristics for sleepiness detection. In: Proceedings of INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 1841–1844. Antwerp, Belgium, ISCA (2007)
Chin, S.B., Pisoni, D.B.: Alcohol and Speech. Academic Press Inc, New York (1997)
Dhupati, L., Kar, S., Rajaguru, A., Routray, A.: A novel drowsiness detection scheme based on speech analysis with validation using simultaneous EEG recordings. In: Proceedings of IEEE Conference on Automation Science and Engineering (CASE), pp. 917–921, Toronto, ON (2010)
Weninger, F., Schuller, B.: Fusing utterance-level classifiers for robust intoxication recognition from speech. In: Proceedings of the MMCogEmS Workshop (Inferring Cognitive and Emotional States from Multimodal Measures), held in conjunction with the 13th International Conference on Multimodal Interaction, ICMI 2011. Alicante, Spain, ACM (2011)
Schuller, B., Weninger, F.: Ten recent trends in computational paralinguistics. In: Esposito, A., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds.) 4th COST 2102 International Training School on Cognitive Behavioural Systems. Lecture Notes in Computer Science (LNCS), p. 15. Springer, New York (2012)
© 2013 Springer-Verlag Berlin Heidelberg
Schuller, B. (2013). Applications in Intelligent Speech Analysis. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_10
Print ISBN: 978-3-642-36805-9
Online ISBN: 978-3-642-36806-6