
Study on Speech Representation Based on Spikegram for Speech Fingerprints

  • Conference paper

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 82)

Abstract

This paper investigates the ability of spikegrams to represent both the content and the voice identity of speech signals. Current speech representation models use block-based coding to transform speech signals into spectrograms, from which suitable features are extracted for further analysis. One problem with this approach is that a speaker produces different speech signals each time the same content is spoken; processing speech in a piecewise (frame-by-frame) manner therefore yields different spectrograms and, consequently, different fingerprints for the same words spoken by the same speaker. For this reason, a speech representation that remains consistent across such variations is essential for obtaining accurate and reliable speech fingerprints. Sparse coding has been reported to outperform block-based coding in representing speech signals because it can capture their underlying structure. An over-complete representation known as a spikegram can be created using a matching pursuit algorithm with a Gammatone dictionary, providing a better alternative to the spectrogram. This paper reports on the ability of spikegrams to represent the speech content and voice identities of speakers, which can be used to improve the robustness of speech fingerprints.
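To make the spikegram idea concrete, here is a minimal pure-Python sketch of matching pursuit over a Gammatone dictionary. It is an illustration under stated assumptions, not the authors' implementation: the fourth-order kernel with bandwidth 1.019 × ERB, the 8 kHz sample rate, the four centre frequencies, the 128-sample kernel length and the stopping rule are all assumed, and `Spike`, `gammatone_kernel` and `matching_pursuit` are hypothetical names. Each selected atom becomes one spike (channel, time, amplitude); plotting the spikes on a time versus centre-frequency plane gives the spikegram.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Spike(NamedTuple):
    """One selected atom: which Gammatone channel fired, where, and how strongly."""
    channel: int      # index into the list of centre frequencies
    position: int     # sample offset of the kernel's first sample
    amplitude: float  # inner product of the residual with the unit-energy kernel


def gammatone_kernel(fc: float, fs: float, length: int, order: int = 4) -> List[float]:
    """Sampled t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), scaled to unit energy.

    Assumption: bandwidth b = 1.019 * ERB(fc), using the Glasberg-Moore ERB formula.
    """
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = 1.019 * erb
    g = []
    for i in range(length):
        t = i / fs
        g.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
                 * math.cos(2.0 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def matching_pursuit(signal: Sequence[float], kernels: List[List[float]],
                     max_spikes: int = 50,
                     tol: float = 1e-6) -> Tuple[List[Spike], List[float]]:
    """Greedy matching pursuit over every kernel placed at every sample offset.

    The dictionary (all channels x all shifts) is over-complete. Stops after
    max_spikes atoms or once residual energy drops below tol * input energy.
    """
    residual = list(signal)
    energy0 = sum(v * v for v in residual)
    spikes: List[Spike] = []
    for _ in range(max_spikes):
        best_ip, best_ch, best_pos = 0.0, 0, 0
        for ch, k in enumerate(kernels):
            for pos in range(len(residual) - len(k) + 1):
                ip = sum(k[j] * residual[pos + j] for j in range(len(k)))
                if abs(ip) > abs(best_ip):
                    best_ip, best_ch, best_pos = ip, ch, pos
        if best_ip == 0.0:
            break  # nothing left to explain (or signal shorter than every kernel)
        k = kernels[best_ch]
        for j in range(len(k)):
            residual[best_pos + j] -= best_ip * k[j]  # subtract the chosen atom
        spikes.append(Spike(best_ch, best_pos, best_ip))
        if sum(v * v for v in residual) <= tol * energy0:
            break
    return spikes, residual


# Usage on a toy signal built from two known, non-overlapping atoms.
fs = 8000.0
centre_freqs = [300.0, 700.0, 1500.0, 3000.0]  # illustrative channels only
kernels = [gammatone_kernel(fc, fs, 128) for fc in centre_freqs]

x = [0.0] * 600
for j, v in enumerate(kernels[1]):
    x[40 + j] += 2.0 * v
for j, v in enumerate(kernels[3]):
    x[300 + j] -= 1.5 * v

spikes, residual = matching_pursuit(x, kernels, max_spikes=10)
for s in spikes:
    print(f"channel={s.channel} ({centre_freqs[s.channel]:.0f} Hz) "
          f"t={s.position / fs * 1000:.2f} ms amp={s.amplitude:+.3f}")
```

Because every kernel is tried at every sample offset, delaying the input in this sketch generally just delays the recovered spikes (away from the signal edges), which is the kind of consistency that frame-based spectrograms lack. A practical implementation would replace the brute-force inner products with FFT-based cross-correlation.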

Acknowledgments

This work was supported by a Grant-in-Aid for Scientific Research (B) (No. 17H01761).

Author information

Correspondence to Dung Kim Tran or Masashi Unoki.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Tran, D.K., Unoki, M. (2018). Study on Speech Representation Based on Spikegram for Speech Fingerprints. In: Pan, JS., Tsai, PW., Watada, J., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2017. Smart Innovation, Systems and Technologies, vol 82. Springer, Cham. https://doi.org/10.1007/978-3-319-63859-1_20

  • DOI: https://doi.org/10.1007/978-3-319-63859-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63858-4

  • Online ISBN: 978-3-319-63859-1

  • eBook Packages: Engineering, Engineering (R0)
