Muting Machine Speech Using Audio Watermarking
Spoken dialog systems such as smart speakers have become popular in home environments. A problem arises when two or more smart speakers share the same environment: one dialog system may mistake another dialog system's voice for a user's voice. This paper proposes a method for muting synthesized speech so that a speech recognizer does not recognize speech uttered by a machine. An audio watermark indicates that the speech was uttered by a machine, and the speech recognizer attenuates the observed speech if it contains the watermark. The watermark is embedded in the high-frequency band so that humans cannot perceive it and so that it can be extracted robustly. Experimental results show that the proposed method robustly determines whether the watermark is present when the SNR is no less than 0 dB.
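The embed, detect, and attenuate idea can be illustrated with a minimal sketch. The abstract does not specify the watermark signal or the detector, so everything below is an assumption made for illustration: a single low-level tone at 18 kHz stands in for the watermark, detection compares the spectral peak at that frequency with the surrounding noise floor, and the names `embed_watermark`, `has_watermark`, and `mute_if_machine`, together with all constants, are hypothetical. The detector also assumes the recording contains some background noise, so that the noise-floor estimate is not zero.

```python
import numpy as np

FS = 44100          # sampling rate in Hz (assumed)
WM_FREQ = 18000.0   # hypothetical watermark carrier frequency in Hz
WM_LEVEL = 0.01     # hypothetical watermark amplitude (full scale = 1.0)


def embed_watermark(speech, fs=FS, freq=WM_FREQ, level=WM_LEVEL):
    """Add a low-level high-frequency tone to synthesized speech."""
    t = np.arange(len(speech)) / fs
    return speech + level * np.sin(2.0 * np.pi * freq * t)


def has_watermark(signal, fs=FS, freq=WM_FREQ, ratio_threshold=10.0):
    """Compare the spectral peak near the carrier with the nearby noise floor."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    near = np.abs(freqs - freq) <= 50.0                 # bins around the carrier
    around = (np.abs(freqs - freq) <= 1000.0) & ~near   # neighboring band
    noise_floor = np.median(spectrum[around])
    return bool(spectrum[near].max() > ratio_threshold * noise_floor)


def mute_if_machine(signal, attenuation=0.0, **kwargs):
    """Attenuate the observed signal when the watermark is detected."""
    return signal * attenuation if has_watermark(signal, **kwargs) else signal


# Usage: a crude stand-in for speech (a 200 Hz tone plus background noise).
rng = np.random.default_rng(0)
t = np.arange(FS) / FS
human = 0.3 * np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(FS)
machine = embed_watermark(human)
front_end_out_machine = mute_if_machine(machine)  # attenuated before recognition
front_end_out_human = mute_if_machine(human)      # passed through unchanged
```

A single tone keeps the sketch short, but it is only a placeholder; a practical watermark would need to survive loudspeaker playback, room acoustics, and microphone pickup.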
Keywords: Spoken dialog systems, Watermarking, Muting
Part of this work was supported by JSPS Kakenhi JP17H00823.