Study on Speech Representation Based on Spikegram for Speech Fingerprints

Tran, Dung Kim; Unoki, Masashi

doi:10.1007/978-3-319-63859-1_20

Conference paper
First Online: 18 July 2017

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 82)

Abstract

This paper investigates how well spikegrams represent the content and the voice identities of speech signals. Current speech representation models use block-based coding to transform speech signals into spectrograms, from which features are extracted for further analysis. One issue with this approach is that a speaker produces different speech signals for the same speech content; processing these signals block by block therefore yields different spectrograms and, consequently, different fingerprints for the same words spoken by the same speaker. For this reason, a speech representation model must remain consistent under variations in speech in order to yield accurate and reliable speech fingerprints. Sparse coding has been reported to surpass block-based coding in representing speech signals because it can capture their underlying structures. An over-complete representation model, known as a spikegram, can be created by using a matching pursuit algorithm and a Gammatone dictionary to provide a better alternative to a spectrogram. This paper reports on the ability of spikegrams to represent the speech content and the voice identities of speakers, which can be used to improve the robustness of speech fingerprints.
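As a rough illustration of how a spikegram can be obtained with matching pursuit over a Gammatone dictionary, the following is a minimal sketch and not the authors' implementation. The function names, the kernel duration, the filter order of 4, the Glasberg and Moore ERB bandwidth formula, and the centre frequencies are all assumptions chosen for the example; the abstract does not specify them.

```python
import numpy as np

def gammatone_kernel(fc, fs, duration=0.025, order=4):
    """Unit-norm gammatone kernel centred at fc Hz (assumed parameters)."""
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # equivalent rectangular bandwidth
    b = 1.019 * erb                           # bandwidth scaling of the envelope
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, kernels, n_spikes):
    """Greedy matching pursuit; returns (sample, channel, amplitude) spikes and the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for ch, g in enumerate(kernels):
            # Inner product of the residual with the kernel at every time shift.
            corr = np.correlate(residual, g, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (pos, ch, corr[pos])
        pos, ch, amp = best
        # Subtract the selected atom; kernels are unit norm, so amp is the projection.
        residual[pos:pos + len(kernels[ch])] -= amp * kernels[ch]
        spikes.append((pos, ch, float(amp)))
    return spikes, residual

# Toy usage: a synthetic signal built from two shifted kernels (hypothetical data).
fs = 16000
centre_freqs = (200.0, 400.0, 800.0, 1600.0, 3200.0)
kernels = [gammatone_kernel(fc, fs) for fc in centre_freqs]
toy = np.zeros(2000)
toy[100:100 + len(kernels[3])] += 2.0 * kernels[3]
toy[900:900 + len(kernels[1])] += 1.0 * kernels[1]
spikes, residual = matching_pursuit(toy, kernels, n_spikes=2)
```

Each spike is a (time, channel, amplitude) triple, so plotting time against channel with amplitude as marker size gives the spikegram view. A practical dictionary would use many more channels and a stopping rule based on residual energy rather than a fixed spike count.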

Acknowledgments

This work was supported by a Grant-in-Aid for Scientific Research (B) (No. 17H01761).

Author information

Authors and Affiliations

School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan
Dung Kim Tran & Masashi Unoki

Corresponding authors

Correspondence to Dung Kim Tran or Masashi Unoki.

Editor information

Editors and Affiliations

Fujian Provincial Key Lab of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan
Swinburne University of Technology, Hawthorn, Victoria, Australia
Pei-Wei Tsai
Universiti Teknologi Petronas, Tronoh, Malaysia
Junzo Watada
University of Canberra, Bruce, Australian Capital Territory, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Tran, D.K., Unoki, M. (2018). Study on Speech Representation Based on Spikegram for Speech Fingerprints. In: Pan, JS., Tsai, PW., Watada, J., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2017. Smart Innovation, Systems and Technologies, vol 82. Springer, Cham. https://doi.org/10.1007/978-3-319-63859-1_20

DOI: https://doi.org/10.1007/978-3-319-63859-1_20
Published: 18 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63858-4
Online ISBN: 978-3-319-63859-1
eBook Packages: Engineering, Engineering (R0)
