
Applications in Intelligent Speech Analysis

  • Chapter
Intelligent Audio Analysis

Part of the book series: Signals and Communication Technology ((SCT))

Abstract

Speech is broadly considered the most natural form of communication for humans. Manifold applications thus open up for technical and computer systems once they are able to recognise speech as well as humans do, be it for interaction with humans, mediation between humans, or speech retrieval. Here, state-of-the-art methodology is presented for highly robust speech recognition, recognition of non-linguistic vocalisations, and paralinguistic speaker states and traits, exemplified by sentiment, emotion, interest, age, gender, intoxication, and sleepiness. All examples stem from the author's recent work; in particular, the latter are chosen from a series of Challenges co-organised by the author at Interspeech from 2009 onwards.

Speech is an arrangement of notes that will never be played again.

—Francis Scott Fitzgerald.


Notes

  1. http://semaine-project.eu/
  2. http://www.nemesysco.com/
  3. http://www.thq.com/
  4. http://www.metacritic.com
  5. http://www.epinions.com
  6. http://www.imdb.com
  7. http://www.cs.cornell.edu/people/pabo/movie-review-data/
  8. http://www.metacritic.com, accessed January 2009.
  9. http://opennlp.sourceforge.net/
  10. http://htk.eng.cam.ac.uk/docs/docs.shtml
  11. http://www.cs.waikato.ac.nz/ml/weka/
  12. Per mill BAC by volume (standard in most central and eastern European countries). Other measures exist, e.g., per cent BAC by volume, for which the corresponding range is 0.028 to 0.175 per cent (Australia, Canada, USA); points by volume (GB); per mill BAC by mass (Scandinavia); or parts per million.
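The unit relation behind the last note can be stated compactly: per mill (‰) BAC by volume is ten times per cent (%) BAC by volume. A minimal sketch of that conversion (the helper name is hypothetical, not from the chapter):

```python
import math

def per_mill_to_percent(bac_per_mill: float) -> float:
    """Convert a BAC value from per mill (by volume) to per cent (by volume)."""
    return bac_per_mill / 10.0

# The per-cent range 0.028 to 0.175 quoted in the note corresponds to
# 0.28 to 1.75 per mill.
assert math.isclose(per_mill_to_percent(0.28), 0.028)
assert math.isclose(per_mill_to_percent(1.75), 0.175)
```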


    Google Scholar 

  116. Kwon, H., Berisha, V., Spanias, A.: Real-time sensing and acoustic scene characterization for security applications. In: 3rd International Symposium on Wireless Pervasive Computing, ISWPC 2008, Proceedings, pp. 755–758 (2008)

    Google Scholar 

  117. Clavel, C., Vasilescu, I., Devillers, L., Richard, G., Ehrette, T.: Fear-type emotion recognition for future audio-based surveillance systems. Speech Commun. 50(6), 487–503 (2008)

    Article  Google Scholar 

  118. Boril, H., Sangwan, A., Hasan, T., Hansen, J.: Automatic excitement-level detection for sports highlights generation. In: Proceedings of the Interspeech 2010, pp. 2202–2205. Makuhari, Japan (2011)

    Google Scholar 

  119. Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 417–424. Philadelphia (2002)

    Google Scholar 

  120. Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th international conference on World Wide Web, pp. 519–528. Budapest, Hungary, ACM (2003)

    Google Scholar 

  121. Yi, J., Nasukawa, T., Bunescu, R., Niblack, W.: Sentiment analyzer: extracting sentiments about a given topic using natural language processing techniques. In: Proceedings of the Third IEEE International Conference on Data Mining, pp. 427–434 (2003)

    Google Scholar 

  122. Popescu, A., Etzioni, O.: Extracting product features and opinions from reviews. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339–346. Association for Computational Linguistics Morristown, NJ, USA (2005)

    Google Scholar 

  123. B. Liu, M. Hu, and J. Cheng. Opinion observer: analyzing and comparing opinions on the web. In: WWW ’05: Proceedings of the 14th international conference on World Wide Web, pp. 342–351. New York, NY, ACM (2005)

    Google Scholar 

  124. Ding, X., Liu, B., Yu, P.S.: A holistic lexicon-based approach to opinion mining. In: WSDM ’08: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 231–240, New York, NY, USA, ACM (2008)

    Google Scholar 

  125. Das, S.R., Chen, M.Y.: Yahoo! for amazon: sentiment parsing from small talk on the web. In: Proceedings of the 8th Asia Pacific Finance Association Annual Conference (2001)

    Google Scholar 

  126. Pang., B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86. Philadelphia, PA (2002)

    Google Scholar 

  127. Zhuang, L., Jing, F., Zhu, X.-Y.: Movie review mining and summarization. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM ’06), pp. 43–50, New York, NY, USA, ACM (2006)

    Google Scholar 

  128. Porter, M.F.: An algorithm for suffix stripping. Program 3(14), 130–137 (October 1980)

    Google Scholar 

  129. Marcus, M., Marcinkiewicz, M., Santorini, B.: Building a large annotated corpus of english: the Penn Treebank. Comput. Linguist. 19(2), 313–330 (1993)

    Google Scholar 

  130. Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: NAACL ’03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 134–141. Morristown, NJ, USA. Association for Computational Linguistics (2003)

    Google Scholar 

  131. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)

    MATH  Google Scholar 

  132. Wiebe, J., Wilson, T., Bell, M.: Identifying collocations for recognizing opinions. In: Proceedings of the ACL-01 Workshop on Collocation: Computational Extraction, Analysis, and Exploitation, pp. 24–31 (2001)

    Google Scholar 

  133. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: HLT ’05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354. Morristown, NJ, USA, Association for Computational Linguistics (2005)

    Google Scholar 

  134. Turney, P.D., Littman, M.L.: Measuring praise and criticism: Inference of semantic orientation from association. ACM Trans. Inf. Syst. 21(4), 315–346 (October 2003)

    Article  Google Scholar 

  135. Esuli, A., Sebastiani, F.: Determining term subjectivity and term orientation for opinion mining. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL ’06), Trento, Italy (2006)

    Google Scholar 

  136. Lizhong, W., Oviatt, S., Cohen, P.R.: Multimodal integration—a statistical view. IEEE Trans. Multimed. 1, 334–341 (1999)

    Article  Google Scholar 

  137. Wöllmer, M., Al-Hames, M., Eyben, F., Schuller, B., Rigoll, G.: A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams. Neurocomputing 73(1–3), 366–380 (2009)

    Article  Google Scholar 

  138. Liu, D.: Automatic mood detection from acoustic music data, pp. 13–17. In: Proceedings International Conference on Music, Information Retrieval (2003)

    Google Scholar 

  139. Nose, T., Kato, Y., Kobayashi, T.: Style estimation of speech based on multiple regression hidden semi-markov model. In: Proceedings INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 2285–2288. Antwerp, Belgium, ISCA, ISCA (2007)

    Google Scholar 

  140. Zhang, C., Hansen, J.H.L.: Analysis and classification of speech mode: whispered through shouted. In: International Speech Communication Association—8th Annual Conference of the International Speech Communication Association, Interspeech 2007, vol. 4, pp. 2396–2399 (2007)

    Google Scholar 

  141. Scherer, K.R.: Vocal communication of emotion: a review of research paradigms. Speech Commun. 40, 227–256 (2003)

    Article  MATH  Google Scholar 

  142. Batliner, A., Schuller, B., Seppi, D., Steidl, S., Devillers, L., Vidrascu, L., Vogt, T., Aharonson, V., Amir, N.: The automatic recognition of emotions in speech. In: Cowie, R., Petta, P., Pelachaud, C. (eds.) Emotion-Oriented Systems: The HUMAINE Handbook, Cognitive Technologies, 1st edn, pp. 71–99. Springer, New York (2010)

    Google Scholar 

  143. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Aharonson, V., Kessous, L., Amir, N.: Whodunnit—searching for the most important feature types signalling emotion-related user states in speech. Comput. Speech Lang. Special Issue on Affective Speech in real-life interactions 25(1), 4–28 (2011)

    Google Scholar 

  144. Batliner, A., Steidl, S., Hacker, C., Nöth, E.: Private emotions vs. social interaction—a data-driven approach towards analysing emotions in speech. User Modeling and User-Adapted Interaction. J. Personal. Res. 18(1–2), 175–206 (2008)

    Google Scholar 

  145. Hansen, J., Bou-Ghazale, S.: Getting started with susas: a speech under simulated and actual stress database. In: Proceedings of the EUROSPEECH-97, vol. 4, pp. 1743–1746. Rhodes, Greece (1997)

    Google Scholar 

  146. Batliner, A., Schuller, B., Schaeffler, S., Steidl, S.: Mothers, adults, children, pets—towards the acoustics of intimacy. In: Proceedings 33rd IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2008, pp. 4497–4500. Las Vegas, NV, IEEE, IEEE (2008)

    Google Scholar 

  147. Pon-Barry, H.: Prosodic manifestations of confidence and uncertainty in spoken language. In: INTERSPEECH 2008—9th Annual Conference of the International Speech Communication Association, pp. 74–77. Brisbane, Australia (2008)

    Google Scholar 

  148. Black, M., Chang, J., Narayanan, S.: An empirical analysis of user uncertainty in problem-solving child-machine interactions. In: Proceedings of the 1st Workshop on Child, Computer and Interaction, Chania, Greece (2008)

    Google Scholar 

  149. Enos, F., Shriberg, E., Graciarena, M., Hirschberg, J., Stolcke, A.: Detecting deception using critical segments. In: Proceedings INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 2281–2284. Antwerp, Belgium, ISCA, ISCA (2007)

    Google Scholar 

  150. Bénézech, M.: Vérité et mensonge : l’évaluation de la crédibilité en psychiatrie lgale et en pratique judiciaire. Annales Medico-Psychologiques 165(5), 351–364 (2007)

    Google Scholar 

  151. Nadeu, M., Prieto, P.: Pitch range, gestural information, and perceived politeness in catalan. J. Pragmat. 43(3), 841–854 (2011)

    Article  Google Scholar 

  152. Yildirim, S., Lee, C., Lee, S., Potamianos, A., Narayanan, S.: Detecting politeness and frustration state of a child in a Conversational Computer Game. In: Proceedings of the Interspeech 2005, pp. 2209–2212. Lisbon, Portugal, ISCA (2005)

    Google Scholar 

  153. Yildirim, S., Narayanan, S., Potamianos, A.: Detecting emotional state of a child in a conversational computer game. Comput. Speech Lang. 25, 29–44 (2011)

    Article  Google Scholar 

  154. Ang, J., Dhillon, R., Shriberg, E., Stolcke, A.: Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In: Proceedings International Conference on Spoken Language Processing (ICSLP), pp. 2037–2040. Denver, CO, (2002)

    Google Scholar 

  155. Arunachalam, S., Gould, D., Anderson, E., Byrd, D., Narayanan, S.S.: Politeness and frustration language in child-machine interactions. In: Proceedings EUROSPEECH, pp. 2675–2678, Aalborg, Denmark, (2001)

    Google Scholar 

  156. Lee, C., Narayanan, S., Pieraccini, R.: Recognition of negative emotions from the speech signal. In: Proceedings of the Automatic Speech Recognition and Understanding Workshop (ASRU’01) (2001)

    Google Scholar 

  157. Rankin, K.P., Salazar, A., Gorno-Tempini, M.L., Sollberger, M., Wilson, S.M., Pavlic, D., Stanley, C.M., Glenn, S., Weiner, M.W., Miller, B.L.: Detecting sarcasm from paralinguistic cues: anatomic and cognitive correlates in neurodegenerative disease. NeuroImage 47(4), 2005–2015 (2009)

    Article  Google Scholar 

  158. Tepperman, J., Traum, D., Narayanan, S.: “Yeah Right”: sarcasm recognition for spoken dialogue systems. In: Proceedings of the Interspeech, pp. 1838–1841. Pittsburgh, Pennsylvania (2006)

    Google Scholar 

  159. Zeng, Z., Pantic, M., Roisman, G.I., Huang, T.S.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal Mach. Intell. 31(1), 39–58 (2009)

    Article  Google Scholar 

  160. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., Devillers, L., Vidrascu, L., Amir, N., Kessous, L., Aharonson, V.: Combining efforts for improving automatic classification of emotional user states. In: Proceedings 5th Slovenian and 1st International Language Technologies Conference, ISLTC 2006, pp. 240–245. Ljubljana, Slovenia, October 2006. Slovenian Language Technologies Society (2006)

    Google Scholar 

  161. Schuller, B., Vlasenko, B., Eyben, F., Wöllmer, M., Stuhlsatz, A., Wendemuth, A., Rigoll, G.: Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Trans. Affect. Comput. 1(2), 119–131 (2010)

    Google Scholar 

  162. Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., Wendemuth, A.: Acoustic emotion recognition: a benchmark comparison of performances. In: Proceedings 11th Biannual IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2009, pp. 552–557. Merano, Italy, IEEE, IEEE (2009)

    Google Scholar 

  163. Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G., Schuller, B.: Deep neural networks for acoustic emotion recognition: raising the benchmarks. In: Proceedings 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, pp. 5688–5691, Prague, Czech Republic, IEEE, IEEE (2011)

    Google Scholar 

  164. Ververidis, D., Kotropoulos, C.: A state of the art review on emotional speech databases. In: 1st Richmedia Conference, pp. 109–119. Lausanne, Switzerland (2003)

    Google Scholar 

  165. Grimm, M., Kroschel, K., Narayanan, S.: The Vera am Mittag German audio-visual emotional speech database. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pp. 865–868. Hannover, Germany (2008)

    Google Scholar 

  166. Steidl, S.: Automatic Classification of Emotion-Related User States in Spontaneous Speech. Logos, Berlin (2009)

    Google Scholar 

  167. Batliner, A., Seppi, D., Steidl, S., Schuller, B.: Segmenting into adequate units for automatic recognition of emotion-related episodes: a speech-based approach. Adv. Human Comput. Interact. Special Issue on Emotion-Aware Natural Interaction 2010(Article ID 782802), 15 (2010)

    Google Scholar 

  168. Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.: Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18(1), 32–80 (2001)

    Article  Google Scholar 

  169. Eyben, F., Wöllmer, M., Schuller, B.: Openear—introducing the munich open-source emotion and affect recognition toolkit. In: Proceedings 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, vol. I, pp. 576–581, Amsterdam, The Netherlands, HUMAINE Association, IEEE (2009)

    Google Scholar 

  170. Ishi, C., Ishiguro. H., Hagita, N.. Using prosodic and voice quality features for paralinguistic information extraction. In: Proceedings of Speech Prosody 2006, pp. 883–886, Dresden (2006)

    Google Scholar 

  171. Müller, C.: Classifying speakers according to age and gender. In: Müller, C. (ed.) Speaker Classification II, vol. 4343. Lecture Notes in Computer Science/Artificial Intelligence. Springer, Heidelberg (2007)

    Google Scholar 

  172. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: The HTK Book (v3.4). Cambridge University Press, Cambridge (2006)

    Google Scholar 

  173. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

    MATH  Google Scholar 

  174. Steidl, S., Schuller, B., Seppi, D., Batliner, A.: The hinterland of emotions: facing the open-microphone challenge. In: Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, vol. I, pp. 690–697, Amsterdam, The Netherlands, HUMAINE Association, IEEE (2009)

    Google Scholar 

  175. Schuller, B., Metze, F., Steidl, S., Batliner, A., Eyben, F., Polzehl, T.: Late fusion of individual engines for improved recognition of negative emotions in speech—learning vs. democratic vote. In: Proceedings of the 35th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010, pp. 5230–5233, Dallas, TX, IEEE, IEEE (2010)

    Google Scholar 

  176. Wöllmer, M., Weninger, F., Eyben, F., Schuller, B.: Computational assessment of interest in speech - facing the real-life challenge. Künstliche Intelligenz (German J. Artif. Intell.), Special Issue on Emotion and Computing 25(3), 227–236 (2011)

    Google Scholar 

  177. Wöllmer, M., Weninger, F., Eyben, F., Schuller, B.: Acoustic-linguistic recognition of interest in speech with bottleneck-blstm nets. In: Proceedings of INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, pp. 3201–3204. Florence, Italy, ISCA, ISCA (2011)

    Google Scholar 

  178. Mporas, I., Ganchev, T.: Estimation of unknown speaker’s height from speech. Int. J. Speech Tech. 12(4), 149–160 (2009)

    Google Scholar 

  179. Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C., Narayanan, S.: Paralinguistics in speech and language—state-of-the-art and the challenge. Comput. Speech Lang. Special Issue on Paralinguistics in Naturalistic Speech and Language 27(1), 4–39 (2013)

    Google Scholar 

  180. Omar, M.K., Pelecanos, J.: A novel approach to detecting non-native speakers and their native language. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing—Proceedings, pp. 4398–4401. Dallas, Texas (2010)

    Google Scholar 

  181. Weiss, B., Burkhardt, F.: Voice attributes affecting likability perception. In: Proceedings of the INTERSPEECH, pp. 2014–2017. Makuhari, Japan (2010)

    Google Scholar 

  182. Bruckert, L., Lienard, J., Lacroix, A., Kreutzer, M., Leboucher, G.: Women use voice parameter to assess men’s characteristics. Proc. R. Soc. B. 237(1582), 83–89 (2006)

    Article  Google Scholar 

  183. Gocsál, A.: Female listeners’ personality attributions to male speakers: the role of acoustic parameters of speech. Pollack Period. 4(3), 155–165 (2009)

    Article  Google Scholar 

  184. Mohammadi, G., Vinciarelli, A., Mortillaro, M.: The voice of personality: mapping nonverbal vocal behavior into trait attributions. In: Proceedings of the SSPW 2010, pp. 17–20, Firenze, Italy (2010)

    Google Scholar 

  185. Polzehl, T., Möller, S., Metze, F.: Automatically assessing personality from speech. In: Proceedings—2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010, pp. 134–140. Pittsburgh, PA (2010)

    Google Scholar 

  186. Wallhoff, F., Schuller, B., Rigoll, G.: Speaker identification—comparing linear regression based adaptation and acoustic high-level features. In: Proceedings 31. Jahrestagung für Akustik, DAGA 2005, pp. 221–222. Munich, Germany, DEGA, DEGA (2005)

    Google Scholar 

  187. Müller, C., Burkhardt, F.: Combining short-term cepstral and long-term prosodic features for automatic recognition of speaker age. In: Interspeech, pp. 1–4,.Antwerp, Belgium (2007)

    Google Scholar 

  188. van Dommelen, W., Moxness, B.: Acoustic parameters in speaker height and weight identification: sex-specific behaviour. Lang. Speech 38(3), 267–287 (1995)

    Google Scholar 

  189. Krauss, R.M., Freyberg, R., Morsella, E.: Inferring speakers physical attributes from their voices. J. Exp. Soc. Psychol. 38(6), 618–625 (2002)

    Google Scholar 

  190. Gonzalez, J.: Formant frequencies and body size of speaker: a weak relationship in adult humans. J. Phonetics 32(2), 277–287 (2004)

    Article  Google Scholar 

  191. Evans, S., Neave, N., Wakelin, D.: Relationships between vocal characteristics and body size and shape in human males: an evolutionary explanation for a deep male voice. Biol. Psychol. 72(2), 160–163 (2006)

    Article  Google Scholar 

  192. Grimm, M., Kroschel, K., Narayanan, S.: Support vector regression for automatic recognition of spontaneous emotions in speech. In: International Conference on Acoustics, Speech and Signal Processing, vol. IV, pp. 1085–1088. IEEE (2007)

    Google Scholar 

  193. Hassan, A., Damper, R.I.: Multi-class and hierarchical SVMs for emotion recognition. In: Proceedings of the Interspeech, pp. 2354–2357, Makuhari, Japan (2010)

    Google Scholar 

  194. Burkhardt, F., Eckert, M., Johannsen, W., Stegmann, J.: A database of age and gender annotated telephone speech. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), pp. 1562–1565, Valletta, Malta (2010)

    Google Scholar 

  195. Fisher, M., Doddington, G., Goudie-Marshall, K.: The DARPA speech recognition research database: specifications and status. In: Proceedings of the DARPA Workshop on Speech Recognition, pp. 93–99 (1986)

    Google Scholar 

  196. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1) (2009)

    Google Scholar 

  197. Krajewski, J., Batliner, A., Golz, M.: Acoustic sleepiness detection—framework and validation of a speech adapted pattern recognition approach. Behav. Res. Meth. 41, 795–804 (2009)

    Article  Google Scholar 

  198. Levit, M., Huber, R., Batliner, A., Nöth, E.: Use of prosodic speech characteristics for automated detection of alcohol intoxination. In: Bacchiani, M., Hirschberg, J., Litman, D., Ostendorf, M. (eds.) Proceedings of the Workshop on Prosody and Speech Recognition 2001Red Bank, NJ, pp. 103–106 (2001)

    Google Scholar 

  199. Schiel, F., Heinrich, C.: Laying the foundation for in-car alcohol detection by speech. In: Proceedings of INTERSPEECH 2009, pp. 983–986, Brighton, UK (2009)

    Google Scholar 

  200. Ellgring, H., Scherer, K.R.: Vocal indicators of mood change in depression. J. Nonverbal Behav. 20, 83–110 (1996)

    Article  Google Scholar 

  201. Laskowski, K., Ostendorf, M., Schultz, T.: Modeling vocal interaction for text-independent participant characterization in multi-party conversation. In: Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, pp. 148–155, Columbus (2008)

    Google Scholar 

  202. Ipgrave, J.: The language of friendship and identity: children’s communication choices in an interfaith exchange. Br. J. Relig. Edu. 31(3), 213–225 (2009)

    Article  Google Scholar 

  203. Fujie, S., Ejiri, Y., Kikuchi, H., Kobayashi, T.: Recognition of positive/negative attitude and its application to a spoken dialogue system. Syst. Comput. Jpn. 37(12), 45–55 (2006)

    Article  Google Scholar 

  204. Vinciarelli, A., Pantic, M., Bourlard, H.: Social signal processing: survey of an emerging domain. Image Vis. Comput. 27, 1743-1759 (2009)

    Google Scholar 

  205. Lee, C.-C., Katsamanis, A., Black, M., Baucom, B., Georgiou, P., Narayanan, S.: An analysis of pca-based vocal entrainment measures in married couples’ affective spoken interactions. In: Proceedings of Interspeech, pp. 3101–3104, Florence, Italy (2011)

    Google Scholar 

  206. Brenner, M., Cash, J.: Speech analysis as an index of alcohol intoxication—the Exxon Valdez accident. Aviat. Space Environ. Med. 62, 893–898 (1991)

    Google Scholar 

  207. Harrison, Y., Horne, J.: The impact of sleep deprivation on decision making: a review. J. Exp. Psychol. Appl. 6, 236–249 (2000)

    Article  Google Scholar 

  208. Bard, E.G., Sotillo, C., Anderson, A.H., Thompson, H.S., Taylor, M.M.: The DCIEM map task corpus: spontaneous dialogue under SD and drug treatment. Speech Commun. 20, 71–84 (1996)

    Article  Google Scholar 

  209. Caraty, M., Montacie, C.: Multivariate analysis of vocal fatigue in continuous reading. In: Proceedings of Interspeech 2010, pp. 470–473, Makuhari, Japan (2010)

    Google Scholar 

  210. Schiel, F., Heinrich, C., Barfüßer, S.: Alcohol language corpus—the first public corpus of alcoholized German speech. Lang. Res. Eval. 46(3), 503–521 (2012)

    Article  Google Scholar 

  211. Akerstedt, T., Gillberg, M.: Subjective and objective sleepiness in the active individual. Int. J. Neurosci. 52(1–2), 29–37 (May 1990)

    Article  Google Scholar 

  212. Krajewski, J., Schnieder, S., Sommer, D., Batliner, A., Schuller, B.: Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech. Neurocomputing. Special Issue From neuron to behavior: evidence from behavioral measurements 84, 65–75 (2012)

    Google Scholar 

  213. Krajewski, J., Kröger, B.: Using prosodic and spectral characteristics for sleepiness detection. In: Proceedings of INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, pp. 1841–1844, Antwerp, Belgium, ISCA, ISCA (2007)

    Google Scholar 

  214. Chin, S.B., Pisoni, D.B.: Alcohol and Speech. Academic Press Inc, New York (1997)

    Google Scholar 

  215. Dhupati, L., Kar, S., Rajaguru, A., Routray, A.: A novel drowsiness detection scheme based on speech analysis with validation using simultaneous EEG recordings. In: Proceedings of IEEE Conference on Automation Science and Engineering (CASE), pp. 917–921, Toronto, ON (2010)

    Google Scholar 

  216. Weninger, F., Schuller, B., Fusing utterance-level classifiers for robust intoxication recognition from speech. In: Proceedings MMCogEmS, : Workshop (Inferring Cognitive and Emotional States from Multimodal Measures), held in conjunction with the 13th International Conference on Multimodal Interaction, ICMI 2011, Alicante, Spain, ACM, ACM (2011)

    Google Scholar 

  217. Schuller, B., Weninger, F.: Ten recent trends in computational paralinguistics. In: Esposito, A., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds.) 4th COST 2102 International Training School on Cognitive Behavioural Systems. Lecture Notes on Computer Science (LNCS), p. 15. Springer, New York (2012)

    Google Scholar 

Author information

Correspondence to Björn Schuller.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schuller, B. (2013). Applications in Intelligent Speech Analysis. In: Intelligent Audio Analysis. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36806-6_10

  • DOI: https://doi.org/10.1007/978-3-642-36806-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36805-9

  • Online ISBN: 978-3-642-36806-6

  • eBook Packages: Engineering (R0)
