Multidimensional Mapping of Voice Attractiveness and Listener’s Preference: Optimization and Estimation from Audio Signal

Obuchi, Yasunari

doi:10.1007/978-981-15-6627-1_15

Part of the book series: Prosody, Phonology and Phonetics (PRPHPH)

Abstract

This chapter introduces a new framework for listener-dependent quantification of voice attractiveness. The probabilistic model of paired comparison results is extended to a multidimensional merit space, in which the intrinsic attractiveness of each voice and the preference of each listener are both expressed as vectors. The attractiveness of a voice for a specific listener is then obtained as the inner product of the two vectors. The mapping from the paired comparison results to the multidimensional merit space is formulated as a likelihood maximization problem. After the optimal mapping is obtained, we also discuss the possibility of predicting attractiveness from acoustic features. A machine learning approach is applied to real data of the Japanese greeting phrase “irasshaimase,” and its effectiveness is confirmed by higher prediction accuracy.
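
The idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not state the link function or the optimizer, so the logistic (Bradley-Terry-style) link, the stochastic gradient ascent, and all names (win_prob, log_likelihood, fit, and their parameters) are assumptions made for illustration only.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def win_prob(voice_a, voice_b, listener):
    """P(listener prefers voice_a over voice_b).

    Assumption: a logistic link on the difference of inner products.
    """
    diff = dot(listener, voice_a) - dot(listener, voice_b)
    return 1.0 / (1.0 + math.exp(-diff))


def log_likelihood(voices, listeners, comparisons):
    """Log-likelihood of comparisons given as (listener, winner, loser) ids."""
    return sum(
        math.log(win_prob(voices[w], voices[l], listeners[k]))
        for k, w, l in comparisons
    )


def fit(comparisons, n_voices, n_listeners, dim=2, lr=0.05, epochs=300, seed=0):
    """Maximize the log-likelihood by plain stochastic gradient ascent."""
    rng = random.Random(seed)
    voices = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_voices)]
    listeners = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_listeners)]
    for _ in range(epochs):
        for k, w, l in comparisons:
            p = win_prob(voices[w], voices[l], listeners[k])
            g = 1.0 - p  # derivative of log p w.r.t. the score difference
            for d in range(dim):
                # Read the old values first so all three updates use them.
                diff_d = voices[w][d] - voices[l][d]
                pref_d = listeners[k][d]
                voices[w][d] += lr * g * pref_d
                voices[l][d] -= lr * g * pref_d
                listeners[k][d] += lr * g * diff_d
    return voices, listeners
```

Because each listener has an individual preference vector, two listeners who consistently disagree about the same pair of voices can both be fitted, which a single shared attractiveness score per voice cannot do. The fitted vectors are determined only up to transformations that preserve the inner products, so they should be read as relative positions rather than absolute coordinates.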

Notes

1. It is similar to the athletes’ ranking. If the high-ranked player always wins, the ranking is efficient. If there are many upsets in which the low-ranked player wins, the ranking is not efficient.
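
One plausible way to put a number on this notion is the fraction of comparisons won by the higher-ranked item. The note does not define a formula, so the measure below and the name ranking_consistency are assumptions used only to make the analogy concrete.

```python
def ranking_consistency(scores, outcomes):
    """Fraction of decided comparisons won by the higher-scored item.

    scores:   dict mapping item -> ranking score (higher means better ranked)
    outcomes: list of (winner, loser) pairs
    Pairs whose items have equal scores are ignored.
    """
    decided = [(w, l) for w, l in outcomes if scores[w] != scores[l]]
    if not decided:
        return 0.0
    agree = sum(1 for w, l in decided if scores[w] > scores[l])
    return agree / len(decided)
```

A value of 1.0 corresponds to the case in which the high-ranked player always wins, and every upset lowers the value.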

References

Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3–4), 324–345.
Cattelan, M., Varin, C., & Firth, D. (2013). Dynamic Bradley-Terry modelling of sports tournaments. Journal of the Royal Statistical Society: Series C (Applied Statistics), 62(1), 135–150.
Causeur, D., & Husson, F. (2005). A 2-dimensional extension of the Bradley-Terry model for paired comparisons. Journal of Statistical Planning and Inference, 135, 245–259.
Eyben, F., Weninger, F., Groß, F., & Schuller, B. (2013). Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of ACM Multimedia (MM), Barcelona, Spain (pp. 835–838).
Fujimoto, Y., Hino, H., & Murata, N. (2009). Item-user preference mapping with mixture models—Data visualization for item preference. In Proceedings of International Conference on Knowledge Discovery and Information Retrieval (pp. 105–111).
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. SIGKDD Explorations, 11(1), 10–18.
Yamagishi, J., Onishi, K., Masuko, T., & Kobayashi, T. (2005). Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis. IEICE Transactions on Information and Systems, 88(3), 502–509.
Lee, A., & Kawahara, T. (2009). Recent development of open-source speech recognition engine Julius. In Proceedings of APSIPA Annual Summit and Conference, Sapporo, Japan (pp. 1–7).
Mosteller, F. (1951). Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16(1), 3–9.
Ribeiro, F., Florêncio, D., Zhang, C., & Seltzer, M. (2011). CROWDMOS: An approach for crowdsourcing mean opinion score studies. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Prague, Czech Republic (pp. 1–7).
Ringeval, F. et al. (2017). AVEC2017 Real-life depression, and affect recognition workshop and challenge. In Proceedings of 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA (pp. 3–9).
Schuller, B. et al. (2017). The Interspeech 2017 computational paralinguistics challenge: addressee, cold & snoring. In Proceedings of INTERSPEECH 2017, Stockholm, Sweden (pp. 3442–3446).
Sato, N., & Obuchi, Y. (2007). Emotion recognition using mel-frequency cepstral coefficients. Journal of Natural Language Processing, 14(4), 83–96.
Shah, N. B., Balakrishnan, S., Bradley, J., Parekh, A., Ramchandran, K., & Wainwright, M. (2014). When is it better to compare than to score? CoRR, abs/1406.6618.
Shevade, S. K., Keerthi, S. S., Bhattacharyya, C., & Murthy, K. (2000). Improvements to the SMO algorithm for SVM regression. IEEE Transactions on Neural Networks, 11(5), 1188–1193.
Tato, R., Santos, R., Kompe, R., & Pardo, J. (2002). Emotional space improves emotion recognition. In Proceedings of 7th International Conference on Spoken Language Processing (ICSLP2002), Denver, USA (pp. 2029–2032).
Zen, H., Tokuda, K., Masuko, T., Kobayashi, T., & Kitamura, T. (2004). Hidden semi-Markov model based speech synthesis. In Proceedings of Interspeech 2004, Jeju Island, Korea (pp. 1393–1396).

Author information

Authors and Affiliations

School of Media Science, Tokyo University of Technology, 1404-1 Katakura, Hachioji, Tokyo, 192-0982, Japan
Yasunari Obuchi

Corresponding author

Correspondence to Yasunari Obuchi.

Editor information

Editors and Affiliations

Technische Universität Berlin, Berlin, Germany
Benjamin Weiss
Saarland University, Saarbrücken, Germany
Jürgen Trouvain
ISEM, Montpellier, France
Melissa Barkat-Defradas
International Computer Science Institute, Berkeley, CA, USA
John J. Ohala

About this chapter

Cite this chapter

Obuchi, Y. (2021). Multidimensional Mapping of Voice Attractiveness and Listener’s Preference: Optimization and Estimation from Audio Signal. In: Weiss, B., Trouvain, J., Barkat-Defradas, M., Ohala, J.J. (eds) Voice Attractiveness. Prosody, Phonology and Phonetics. Springer, Singapore. https://doi.org/10.1007/978-981-15-6627-1_15

DOI: https://doi.org/10.1007/978-981-15-6627-1_15
Published: 11 October 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6626-4
Online ISBN: 978-981-15-6627-1
eBook Packages: Social Sciences, Social Sciences (R0)
