International Journal of Speech Technology, Volume 17, Issue 1, pp 27–36

Wavelet basis selection for enhanced speech parametrization in speaker verification

  • Todor Ganchev
  • Mihalis Siafarikas
  • Iosif Mporas
  • Tsenka Stoyanova

Abstract

We study the inherent properties of nine wavelet functions and evaluate their applicability as basis functions in a speech parametrization scheme that is advantageous for speaker verification. First, the properties of the nine candidate basis functions are analysed and their advantages and disadvantages are discussed. Next, each candidate is employed in a well-proven speech parametrization scheme and the resulting speech features are computed. Finally, these speech features are evaluated in a common experimental set-up on the speaker verification task. The experimental results, obtained on two well-known speaker recognition databases, show that the Battle-Lemarié wavelet function is the most advantageous of the functions evaluated here, since it leads to the most beneficial speech descriptors. Compared with the baseline Mel-frequency cepstral coefficients (MFCC), a relative reduction of the equal error rate of 4.2 % was observed on the 2001 NIST speaker recognition evaluation database, and of 2.3 % on the Polycost speaker recognition database.
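The reported gains are relative reductions of the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how such a figure could be computed, not the authors' evaluation code. The function names `equal_error_rate` and `relative_reduction`, the threshold-sweep approximation, and all scores and EER values are illustrative assumptions; none of them come from the article.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper: a trial is accepted when its score >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a target (true speaker) trial falls below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial reaches the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores, invented purely for illustration.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.1, 0.2, 0.3, 0.6]
toy_eer = equal_error_rate(targets, impostors)

# Made-up EER values (in percent) for a baseline and an alternative front-end.
toy_gain = relative_reduction(5.0, 4.0)
```

Note that a relative reduction is measured against the baseline EER, so it is not the same as the absolute difference in percentage points between the two systems.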

Keywords

  • Time-frequency signal processing
  • Wavelets
  • Speech analysis
  • Parametric representation of speech
  • Speaker recognition

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Todor Ganchev (1)
  • Mihalis Siafarikas (2)
  • Iosif Mporas (3, 4)
  • Tsenka Stoyanova (2)

  1. Department of Electronics, Technical University–Varna, Varna, Bulgaria
  2. Department of Electrical & Computer Engineering, University of Patras, Rion-Patras, Greece
  3. Dept. of ECE, University of Patras, Patras, Greece
  4. Dept. of Mechanical Engineering, Technological Educational Institute of Patras, Patras, Greece
