Abstract
This paper investigates how well spikegrams represent the content and voice identity of speech signals. Current speech representation models use block-based coding to transform speech signals into spectrograms, from which features are extracted for further analysis. One issue with this approach is that a speaker produces different speech signals for the same speech content; processing these signals block by block therefore yields different spectrograms and, consequently, different fingerprints for the same words spoken by the same speaker. For this reason, a speech representation model must remain consistent under variations in speech to yield accurate and reliable speech fingerprints. Sparse coding has been reported to surpass block-based coding in representing speech signals because it can capture their underlying structures. An over-complete representation known as a spikegram can be created by using a matching pursuit algorithm with a Gammatone dictionary, providing a better alternative to a spectrogram. This paper reports the ability of spikegrams to represent the speech content and voice identities of speakers, which can be used to improve the robustness of speech fingerprints.
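To make the idea of a spikegram concrete, the following is a minimal sketch of matching pursuit over a small Gammatone dictionary. It is not the authors' implementation: the function names (`gammatone_kernel`, `matching_pursuit`), the kernel duration, the filter order, the bandwidth factor, the centre frequencies, and the sampling rate are all assumptions chosen for illustration. Each selected atom is stored as a spike, a triple of (time shift, channel, amplitude), and the list of spikes is the spikegram.

```python
import numpy as np

def gammatone_kernel(fc, fs, duration=0.02, order=4, b=1.019):
    """Unit-norm Gammatone kernel centred at fc Hz (all defaults are assumed values)."""
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth in Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, kernels, n_spikes):
    """Greedy matching pursuit; returns (spikes, residual).

    Each spike is (sample offset, kernel index, amplitude).
    Assumes the signal is at least as long as every kernel.
    """
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, g in enumerate(kernels):
            # Inner product of the residual with the kernel at every time shift.
            corr = np.correlate(residual, g, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (pos, k, corr[pos])
        pos, k, amp = best
        # Subtract the best-matching atom from the residual.
        residual[pos:pos + len(kernels[k])] -= amp * kernels[k]
        spikes.append((pos, k, amp))
    return spikes, residual

# Hypothetical demo: plant one scaled kernel in silence and recover it.
fs = 16000
kernels = [gammatone_kernel(fc, fs) for fc in (500.0, 1000.0, 2000.0)]
x = np.zeros(1000)
x[300:300 + len(kernels[1])] += 2.0 * kernels[1]
spikes, residual = matching_pursuit(x, kernels, n_spikes=1)
```

Because the kernels are unit-norm, the correlation value at the chosen shift is also the amplitude to subtract, so each iteration can only reduce the energy of the residual. A practical system would use a much larger dictionary (many centre frequencies) and FFT-based correlation; this sketch favours readability over speed.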
Acknowledgments
This work was supported by a Grant-in-Aid for Scientific Research (B) (No. 17H01761).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Tran, D.K., Unoki, M. (2018). Study on Speech Representation Based on Spikegram for Speech Fingerprints. In: Pan, JS., Tsai, PW., Watada, J., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2017. Smart Innovation, Systems and Technologies, vol 82. Springer, Cham. https://doi.org/10.1007/978-3-319-63859-1_20
DOI: https://doi.org/10.1007/978-3-319-63859-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63858-4
Online ISBN: 978-3-319-63859-1
eBook Packages: Engineering, Engineering (R0)