Study on Speech Representation Based on Spikegram for Speech Fingerprints
This paper investigates how well spikegrams represent the content and speaker identity of speech signals. Current speech representation models employ block-based coding techniques to transform speech signals into spectrograms, from which features are extracted for further analysis. One issue with this approach is that a speaker produces different speech signals for the same speech content; processing these signals in a piecewise manner therefore yields different spectrograms and, consequently, different fingerprints for the same words spoken by the same speaker. For this reason, a speech representation model must remain consistent under variations in speech to produce accurate and reliable speech fingerprints. Sparse coding has been reported to surpass block-based coding in representing speech signals because it can capture their underlying structures. An over-complete representation, known as a spikegram, can be created by using a matching pursuit algorithm with a Gammatone dictionary to provide a better alternative to the spectrogram. This paper reports the ability of spikegrams to represent the speech content and voice identities of speakers, which can be used to improve the robustness of speech fingerprints.
Keywords: Speech fingerprint, Spikegram, Matching pursuit algorithm, Gammatone filterbank, Non-negative matrix factorization
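To make the idea of a spikegram concrete, the following is a minimal sketch of greedy matching pursuit over a small Gammatone dictionary. It is not the authors' implementation. The function names (`gammatone_kernel`, `matching_pursuit`), the kernel duration, the filter order, the number of spikes, and the choice of center frequencies are all assumptions made for illustration. The bandwidth uses the commonly cited Glasberg and Moore ERB approximation. Each extracted "spike" is a triple of kernel index, sample position, and amplitude, and the collection of spikes plays the role of the spikegram.

```python
import numpy as np

def gammatone_kernel(fc, fs, duration=0.025, order=4):
    """Unit-norm Gammatone kernel at center frequency fc (Hz).

    duration and order are illustrative assumptions, not values from the paper.
    """
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # Glasberg-Moore ERB approximation
    b = 1.019 * erb                            # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, kernels, n_spikes=50):
    """Greedy matching pursuit: returns (spikes, residual).

    Each spike is (kernel_index, sample_position, amplitude).
    Brute-force search over all kernels and shifts; fine for short signals.
    """
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, g in enumerate(kernels):
            # Inner product of the residual with the kernel at every valid shift.
            corr = np.correlate(residual, g, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (k, pos, corr[pos])
        k, pos, amp = best
        # Subtract the selected atom's projection from the residual.
        residual[pos:pos + len(kernels[k])] -= amp * kernels[k]
        spikes.append((k, pos, float(amp)))
    return spikes, residual

if __name__ == "__main__":
    fs = 16000
    # Hypothetical dictionary: 8 center frequencies, log-spaced from 200 Hz to 4 kHz.
    kernels = [gammatone_kernel(fc, fs) for fc in np.geomspace(200, 4000, 8)]
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1600)  # stand-in for a short speech frame
    spikes, residual = matching_pursuit(x, kernels, n_spikes=20)
    print(len(spikes), np.linalg.norm(residual) <= np.linalg.norm(x))
```

Because each spike carries its own time position rather than being tied to a fixed analysis block, the representation does not depend on where block boundaries fall, which is the property the abstract contrasts with spectrogram-based coding.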
This work was supported by a Grant-in-Aid for Scientific Research (B) (No. 17H01761).