Similarity Visualization for the Grouping of Forensic Speech Recordings

  • Conference paper
Computational Forensics (IWCF 2008)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5158)

Abstract

In a forensic phone wiretapping investigation, a major problem is to get the full picture of the speakers involved. Typically, the wiretapped speech recordings are grouped using a clustering tool. The main disadvantage of such an approach is that, in a bootstrapped scenario, grouping errors accumulate. In this paper, we propose a visual approach to find similar speech recordings that probably stem from the same speaker. We first model the speech recordings and define suitable similarity measures between recordings. Then, through an approximate 2-D visualization of the inter-speech similarities, the investigator can identify clear groups of recordings as well as recordings that are harder to differentiate. We ran extensive experiments on phone data of 50 speakers with 2 recordings per speaker, and tested the quality of the 2-D visualization against the original high-dimensional similarities. For the original high-dimensional similarity measure, the nearest recording is almost always the one from the same speaker. In the 2-D visualization, a recording of the same speaker is, on average over all speech recordings, among the 10 nearest recordings.
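
As a rough illustration of the evaluation described in the abstract, the sketch below projects pairwise dissimilarities to 2-D and checks how highly the same-speaker recording ranks among each recording's neighbours, before and after projection. It rests on several assumptions that the abstract does not confirm: each recording is represented by a fixed-length vector (synthetic data here), Euclidean distance stands in for the paper's similarity measure, and classical multidimensional scaling stands in for the 2-D visualization method. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (assumed stand-in measure)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=2):
    """Embed a distance matrix D in k dimensions using classical scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale

def same_speaker_ranks(D, speakers):
    """Rank (1 = nearest) of the closest same-speaker recording, per recording.

    Assumes every speaker has at least two recordings.
    """
    speakers = np.asarray(speakers)
    ranks = []
    for i in range(len(speakers)):
        order = np.argsort(D[i], kind="stable")
        order = order[order != i]             # exclude the recording itself
        hits = np.nonzero(speakers[order] == speakers[i])[0]
        ranks.append(hits[0] + 1)
    return np.array(ranks)

def demo(seed=0, n_speakers=50, dim=20, noise=0.1):
    """Synthetic stand-in for 50 speakers with 2 recordings each."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_speakers, dim))
    speakers = np.repeat(np.arange(n_speakers), 2)
    X = centers[speakers] + noise * rng.normal(size=(2 * n_speakers, dim))
    D_high = pairwise_distances(X)
    D_2d = pairwise_distances(classical_mds(D_high, k=2))
    return (same_speaker_ranks(D_high, speakers).mean(),
            same_speaker_ranks(D_2d, speakers).mean())

if __name__ == "__main__":
    high, low = demo()
    print("mean same-speaker rank, original space:", high)
    print("mean same-speaker rank, 2-D embedding: ", low)
```

Comparing the two mean ranks gives a simple measure of how much neighbourhood structure is lost in the projection, which is the kind of comparison the abstract reports. The numbers produced by this synthetic data say nothing about the paper's actual results.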

Editor information

Sargur N. Srihari, Katrin Franke

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Weiand, K.A., Bouten, J.S., Veenman, C.J. (2008). Similarity Visualization for the Grouping of Forensic Speech Recordings. In: Srihari, S.N., Franke, K. (eds) Computational Forensics. IWCF 2008. Lecture Notes in Computer Science, vol 5158. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85303-9_16

  • DOI: https://doi.org/10.1007/978-3-540-85303-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85302-2

  • Online ISBN: 978-3-540-85303-9

  • eBook Packages: Computer Science, Computer Science (R0)
