Historical and Procedural Overview of Forensic Speaker Recognition as a Science

Amino, Kanae; Osanai, Takashi; Kamada, Toshiaki; Makinae, Hisanori; Arai, Takayuki

doi:10.1007/978-1-4614-0263-3_1

Abstract

Forensic phonetics and acoustics are now widely applied when police and courts work with acoustic samples. Of the many tasks in this area, forensic speaker recognition is considered one of the most complex. Forensic speaker recognition, sometimes called forensic speaker comparison, is the process of judging whether two speech samples come from the same speaker. This chapter introduces the historical background of forensic speaker recognition, including the “voiceprint” controversy, human-based visual and auditory forensic speaker recognition, and automatic forensic speaker recognition. It also presents procedural considerations in forensic speaker recognition and the factors that affect recognition performance. Finally, we summarize the progress and developments made in forensic automatic speaker recognition.

Author information

Authors and Affiliations

National Research Institute of Police Science, 6-3-1 Kashiwanoha, Kashiwa-shi, 277-0882, Chiba, Japan
Kanae Amino Ph.D., Takashi Osanai Ph.D., Toshiaki Kamada B.E. & Hisanori Makinae Ph.D.
Department of Electrical and Electronics Engineering, Sophia University, 7-1 Kioi-cho, Chiyoda-ku, 102-8554, Tokyo, Japan
Takayuki Arai Ph.D.

Corresponding author

Correspondence to Kanae Amino Ph.D.

Editor information

Editors and Affiliations

Linguistic Technology Systems, 800 Palisade Ave Apt 1809, Fort Lee, 07024-4121, New Jersey, USA
Amy Neustein
Near Indroda Circle, DA-IICT, Room 4103, Faculty Block 4, Gandhinagar, 382 007, Gujarat, India
Hemant A. Patil

About this chapter

Cite this chapter

Amino, K., Osanai, T., Kamada, T., Makinae, H., Arai, T. (2012). Historical and Procedural Overview of Forensic Speaker Recognition as a Science. In: Neustein, A., Patil, H. (eds) Forensic Speaker Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0263-3_1

DOI: https://doi.org/10.1007/978-1-4614-0263-3_1
Published: 04 October 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0262-6
Online ISBN: 978-1-4614-0263-3
eBook Packages: Engineering, Engineering (R0)