An Interrogation Speech Manipulation Detection Method Using Speech Fingerprinting and Watermarking

  • Shinnya Takahashi
  • Kazuhiro Kondo
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 110)


We propose a manipulation detection method for interrogation speech. Because the intended target is speech recorded during police interrogations, we use a robust fingerprinting method optimized for speech. The fingerprint uses line spectral pairs (LSP) to capture the spectral envelope of the speech and is coarsely quantized, so that it is not altered by small degradations of the signal but does change when the speech content is maliciously modified. This fingerprint is embedded in the speech signal using a conventional spread-spectrum watermark. To detect manipulation, the embedded fingerprint is extracted from the watermark and compared with the fingerprint computed from the speech itself. If the two fingerprints match within a predetermined tolerance, the speech is authenticated as unaltered. Otherwise, manipulation should be suspected.
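
To make the fingerprinting step concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a 10th-order LPC analysis with a Hamming window, 8 uniform quantization levels, and a tolerance expressed as a count of mismatching fingerprint elements; none of these values are stated in the abstract, and the function names (`lpc`, `lsp_frequencies`, `fingerprint`, `fingerprints_match`) are hypothetical.

```python
import numpy as np

def lpc(frame, order=10):
    """Autocorrelation-method LPC (Levinson-Durbin); returns [1, a1, ..., ap]."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-9  # small bias avoids division by zero on silent frames
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lsp_frequencies(a):
    """LSP frequencies in (0, pi): angles of the roots of the sum and
    difference polynomials P(z) and Q(z) built from A(z)."""
    ext = np.concatenate([a, [0.0]])
    p_poly = ext + ext[::-1]  # symmetric polynomial P(z)
    q_poly = ext - ext[::-1]  # antisymmetric polynomial Q(z)
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    eps = 1e-6  # drops the trivial roots at z = 1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def fingerprint(frame, order=10, levels=8):
    """Coarsely quantized LSP vector (assumed uniform quantizer)."""
    lsp = lsp_frequencies(lpc(frame, order))
    return np.minimum((lsp / np.pi * levels).astype(int), levels - 1)

def fingerprints_match(fp_a, fp_b, tolerance=1):
    """True if at most `tolerance` fingerprint elements differ."""
    fp_a, fp_b = np.asarray(fp_a), np.asarray(fp_b)
    if fp_a.shape != fp_b.shape:
        return False
    return int(np.sum(fp_a != fp_b)) <= tolerance
```

The coarse quantizer is the design choice that matters here: small changes in the LSP values stay inside the same quantization cell, while a substituted frame with a different spectral envelope tends to move several elements to other cells.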

We conducted manipulation detection on a frame-by-frame basis and confirmed that, even for noisy and reverberant speech, manipulation was correctly detected in almost all of the substituted frames.
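
The embedding and frame-by-frame comparison can be illustrated with a simplified Python sketch. It is an assumption-laden toy, not the watermarking scheme used in the paper: it adds a keyed pseudo-random ±1 sequence directly in the time domain with a fixed gain `alpha`, without any perceptual shaping, and all function names and parameter values are hypothetical.

```python
import numpy as np

def pn_sequence(key, n_bits, chips_per_bit):
    """Keyed pseudo-random +/-1 spreading sequence, one row per payload bit."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=(n_bits, chips_per_bit))

def embed_bits(frame, bits, key, alpha=0.05):
    """Add each bit, spread over its own chip segment, to the frame."""
    bits = np.asarray(bits)
    chips = len(frame) // len(bits)
    pn = pn_sequence(key, len(bits), chips)
    symbols = 2.0 * bits - 1.0  # map {0, 1} to {-1, +1}
    marked = np.array(frame, dtype=float)
    marked[: len(bits) * chips] += alpha * (symbols[:, None] * pn).ravel()
    return marked

def detect_bits(frame, n_bits, key):
    """Correlate each segment with its spreading sequence; the sign gives the bit."""
    chips = len(frame) // n_bits
    pn = pn_sequence(key, n_bits, chips)
    segments = np.asarray(frame, dtype=float)[: n_bits * chips].reshape(n_bits, chips)
    return (np.sum(segments * pn, axis=1) > 0).astype(int)

def flag_manipulated(embedded_fps, extracted_fps, tolerance=1):
    """Per-frame decision: True where the fingerprint carried by the watermark
    and the fingerprint recomputed from the speech differ in more than
    `tolerance` positions."""
    embedded_fps = np.asarray(embedded_fps)
    extracted_fps = np.asarray(extracted_fps)
    return np.sum(embedded_fps != extracted_fps, axis=1) > tolerance
```

In use, each row passed to `flag_manipulated` corresponds to one frame: the first argument holds the payloads recovered with `detect_bits`, the second the fingerprints recomputed from the received speech, and the returned Boolean array marks the frames to be treated as suspect.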


Keywords: Interrogation speech · Manipulation detection · Audio watermark · Speech fingerprinting · Line spectral pairs



This work was supported in part by the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University (H29/A18).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Graduate School of Science and Engineering, Yamagata University, Yonezawa, Japan
