Multidimensional Mapping of Voice Attractiveness and Listener’s Preference: Optimization and Estimation from Audio Signal

Voice Attractiveness

Part of the book series: Prosody, Phonology and Phonetics ((PRPHPH))

Abstract

This chapter introduces a new framework for listener-dependent quantification of voice attractiveness. The probabilistic model of paired comparison results is extended to a multidimensional merit space, in which the intrinsic attractiveness of each voice and the preference of each listener are both expressed as vectors. The attractiveness of a voice for a specific listener is then obtained as the inner product of the two vectors. The mapping from paired comparison results to the multidimensional merit space is formulated as a maximization problem for the likelihood function. After the optimal mapping is obtained, the possibility of predicting attractiveness from acoustic features is also discussed. A machine learning approach is applied to real data for the Japanese greeting phrase “irasshaimase,” and its effectiveness is confirmed by higher prediction accuracy.
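The idea of a multidimensional merit space can be illustrated with a minimal sketch. The exact probability model and optimizer used in the chapter are not given in this abstract, so the code below makes several assumptions: a logistic (Bradley-Terry style) link between the merit difference and the win probability, plain gradient ascent on the log-likelihood, and a small L2 penalty to keep the vectors bounded. The function names (`fit_merit_space`, `merit`, `log_likelihood`) and the toy comparison data are hypothetical and serve only as illustration.

```python
import math
import random


def merit(U, V, listener, voice):
    """Listener-specific attractiveness: inner product of the two vectors."""
    return sum(u * v for u, v in zip(U[listener], V[voice]))


def log_likelihood(U, V, comparisons):
    """Log-likelihood under an ASSUMED logistic link on the merit difference."""
    total = 0.0
    for k, w, l in comparisons:
        margin = merit(U, V, k, w) - merit(U, V, k, l)
        total -= math.log1p(math.exp(-margin))  # log sigmoid(margin)
    return total


def fit_merit_space(comparisons, n_voices, n_listeners, dim=2,
                    lr=0.05, n_iter=2000, l2=1e-3, seed=0):
    """Gradient ascent on the (L2-penalized) log-likelihood.

    comparisons: list of (listener, winner_voice, loser_voice) index triples.
    Returns listener preference vectors U and voice attractiveness vectors V.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_listeners)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_voices)]
    for _ in range(n_iter):
        # Start each gradient from the L2 penalty term.
        grad_U = [[-l2 * x for x in row] for row in U]
        grad_V = [[-l2 * x for x in row] for row in V]
        for k, w, l in comparisons:
            margin = merit(U, V, k, w) - merit(U, V, k, l)
            g = 1.0 / (1.0 + math.exp(margin))  # d log sigmoid / d margin
            for d in range(dim):
                grad_U[k][d] += g * (V[w][d] - V[l][d])
                grad_V[w][d] += g * U[k][d]
                grad_V[l][d] -= g * U[k][d]
        for row, grad in zip(U, grad_U):
            for d in range(dim):
                row[d] += lr * grad[d]
        for row, grad in zip(V, grad_V):
            for d in range(dim):
                row[d] += lr * grad[d]
    return U, V


# Toy data (invented): listener 0 ranks voices 0 > 1 > 2, listener 1 the reverse.
comparisons = [(0, 0, 1), (0, 1, 2), (0, 0, 2),
               (1, 2, 1), (1, 1, 0), (1, 2, 0)]
U, V = fit_merit_space(comparisons, n_voices=3, n_listeners=2, dim=2)
for k in range(2):
    print(k, [round(merit(U, V, k, i), 2) for i in range(3)])
```

In this toy case the two listeners have opposite preferences, which a single scalar merit per voice cannot represent; giving each listener a preference vector lets the same voice vectors yield different listener-specific orderings.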

Notes

  1. This is similar to ranking athletes. If the higher-ranked player always wins, the ranking is efficient. If there are many upsets in which the lower-ranked player wins, the ranking is not efficient.
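The note does not define efficiency quantitatively in the text available here. One simple way to make the intuition concrete, offered only as an assumed illustration, is the fraction of paired comparisons won by the higher-ranked item. The function name `concordance_rate` and the scores and results below are hypothetical.

```python
def concordance_rate(scores, results):
    """Fraction of comparisons in which the higher-scored item won.

    scores:  dict mapping item -> merit score (higher means higher-ranked).
    results: list of (winner, loser) pairs.
    This measure is an ASSUMED stand-in for the note's notion of efficiency.
    """
    if not results:
        raise ValueError("at least one comparison is required")
    agree = sum(1 for winner, loser in results if scores[winner] > scores[loser])
    return agree / len(results)


# Invented example: three items, four comparisons, one upset (C beats A).
scores = {"A": 3.0, "B": 2.0, "C": 1.0}
results = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
rate = concordance_rate(scores, results)
print(rate)  # 0.75
```

A rate of 1.0 corresponds to the case where the higher-ranked player always wins; each upset lowers the rate.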

Corresponding author

Correspondence to Yasunari Obuchi.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Obuchi, Y. (2021). Multidimensional Mapping of Voice Attractiveness and Listener’s Preference: Optimization and Estimation from Audio Signal. In: Weiss, B., Trouvain, J., Barkat-Defradas, M., Ohala, J.J. (eds) Voice Attractiveness. Prosody, Phonology and Phonetics. Springer, Singapore. https://doi.org/10.1007/978-981-15-6627-1_15

  • DOI: https://doi.org/10.1007/978-981-15-6627-1_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6626-4

  • Online ISBN: 978-981-15-6627-1

  • eBook Packages: Social Sciences (R0)
