Acoustic Features for Estimation of Perceptional Similarity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4810)

Abstract

This paper examines acoustic features for estimating the perceptional similarity between speech samples. First, we extract acoustic features, including those that carry speaker individuality, from the speech of 36 persons. Second, we calculate the distance between the extracted features using a Gaussian Mixture Model (GMM) or Dynamic Time Warping (DTW), and sort the speech samples by this physical similarity. Separately, subjects sort the same samples by perceptional similarity, which gives a second permutation. We evaluate each physical feature by Spearman’s rank correlation coefficient between the two permutations. The results show that the DTW distance with high STRAIGHT Cepstrum is the best of the examined features for estimating perceptional similarity.
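
The evaluation idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame distance (Euclidean), the length normalisation of the DTW cost, the toy one-dimensional "features", and the listener ranking are all hypothetical choices made here, since the abstract does not specify them. The function names `dtw_distance`, `ranks`, and `spearman` are likewise invented for the example.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors.

    Assumptions (not from the paper): Euclidean frame distance,
    unit step weights, and normalisation by the summed lengths.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m] / (n + m)


def ranks(values):
    """1-based ranks in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy)


# Toy data: one reference "utterance" and three candidates (hypothetical).
reference = [[0.0], [1.0], [2.0]]
candidates = {
    "A": [[0.0], [1.0], [1.0], [2.0]],  # time-stretched copy of the reference
    "B": [[0.0], [2.0], [4.0]],
    "C": [[5.0], [5.0], [5.0]],
}
physical = [dtw_distance(reference, seq) for seq in candidates.values()]
perceptual = [1, 2, 3]  # hypothetical listener ranking, 1 = most similar
rho = spearman(physical, perceptual)
print(physical, rho)
```

In this toy case the physical ordering matches the hypothetical listener ordering, so the coefficient comes out very close to 1. In the setting the abstract describes, one such coefficient would be computed per candidate feature and distance measure, and the feature whose ranking agrees best with the listeners would be preferred.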

Editor information

Horace H.-S. Ip, Oscar C. Au, Howard Leung, Ming-Ting Sun, Wei-Ying Ma, Shi-Min Hu

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Adachi, Y., Kawamoto, S., Morishima, S., Nakamura, S. (2007). Acoustic Features for Estimation of Perceptional Similarity. In: Ip, H.HS., Au, O.C., Leung, H., Sun, MT., Ma, WY., Hu, SM. (eds) Advances in Multimedia Information Processing – PCM 2007. PCM 2007. Lecture Notes in Computer Science, vol 4810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77255-2_33

  • DOI: https://doi.org/10.1007/978-3-540-77255-2_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77254-5

  • Online ISBN: 978-3-540-77255-2

  • eBook Packages: Computer Science, Computer Science (R0)
