Abstract
Forensic phonetics and acoustics are now widely applied to the police and legal use of acoustic samples. Among the many tasks in this area, forensic speaker recognition is considered one of the most complex. Forensic speaker recognition, sometimes called forensic speaker comparison, is the process of judging whether two speech samples come from the same speaker. This chapter introduces the historical background of forensic speaker recognition, including the “voiceprint” controversy, human-based visual and auditory forensic speaker recognition, and automatic forensic speaker recognition. It also presents procedural considerations in the forensic speaker recognition process and the factors that affect recognition performance. Finally, it summarizes the progress and developments made in forensic automatic speaker recognition.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Amino, K., Osanai, T., Kamada, T., Makinae, H., Arai, T. (2012). Historical and Procedural Overview of Forensic Speaker Recognition as a Science. In: Neustein, A., Patil, H. (eds) Forensic Speaker Recognition. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0263-3_1
DOI: https://doi.org/10.1007/978-1-4614-0263-3_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0262-6
Online ISBN: 978-1-4614-0263-3
eBook Packages: Engineering, Engineering (R0)