Speaker Characteristics

Schultz, Tanja

doi:10.1007/978-3-540-74200-5_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

In this chapter, we give a brief introduction to speech-driven applications in order to motivate why it is desirable to automatically recognize particular speaker characteristics from speech. Starting from these applications, we derive what kind of characteristics might be useful. After categorizing relevant speaker characteristics, we describe in more detail language, accent, dialect, idiolect, and sociolect. Next, we briefly summarize classification approaches to illustrate how these characteristics can be recognized automatically, and conclude with a practical example of a system implementation that performs well on the classification of various speaker characteristics.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, USA
Tanja Schultz

Editor information

Christian Müller

About this chapter

Cite this chapter

Schultz, T. (2007). Speaker Characteristics. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_3

DOI: https://doi.org/10.1007/978-3-540-74200-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science, Computer Science (R0)
