Abstract
This chapter introduces a new framework for listener-dependent quantification of voice attractiveness. The probabilistic model of paired comparison results is extended to a multidimensional merit space, in which the intrinsic attractiveness of each voice and the preference of each listener are both expressed as vectors. The attractiveness of a voice for a specific listener is then obtained as the inner product of these two vectors. The mapping from the paired comparison results to the multidimensional merit space is formulated as a likelihood maximization problem. Once the optimal mapping is obtained, we discuss the possibility of predicting attractiveness from acoustic features. A machine learning approach is applied to real recordings of the Japanese greeting phrase “irasshaimase,” and its effectiveness is confirmed by higher prediction accuracy.
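The idea of a listener-specific merit computed as an inner product, fitted by likelihood maximization, can be illustrated with a minimal sketch. It is not the chapter's actual formulation: the logistic (Bradley-Terry-style) link, the plain gradient ascent, the absence of regularization, and all names (`win_probability`, `log_likelihood`, `fit`, the `(listener, winner, loser)` triple format) are assumptions made for illustration only.

```python
import numpy as np

def win_probability(a_i, a_j, w):
    """Assumed model: probability that a listener with preference vector w
    prefers voice i (vector a_i) over voice j (vector a_j)."""
    # The listener-specific merit of a voice is the inner product w . a.
    diff = w @ a_i - w @ a_j
    return 1.0 / (1.0 + np.exp(-diff))

def log_likelihood(A, W, comparisons):
    """Log-likelihood of observed paired comparisons.
    comparisons: iterable of (listener, winner, loser) index triples."""
    return sum(np.log(win_probability(A[i], A[j], W[l]))
               for l, i, j in comparisons)

def fit(comparisons, n_voices, n_listeners, dim=2, lr=0.05, steps=500, seed=0):
    """Maximize the log-likelihood by plain gradient ascent (an assumed,
    deliberately simple optimizer)."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((n_voices, dim))     # voice vectors
    W = 0.1 * rng.standard_normal((n_listeners, dim))  # listener vectors
    for _ in range(steps):
        grad_A = np.zeros_like(A)
        grad_W = np.zeros_like(W)
        for l, i, j in comparisons:
            # d/d(diff) of log sigmoid(diff) equals 1 - sigmoid(diff).
            r = 1.0 - win_probability(A[i], A[j], W[l])
            grad_A[i] += r * W[l]
            grad_A[j] -= r * W[l]
            grad_W[l] += r * (A[i] - A[j])
        A += lr * grad_A
        W += lr * grad_W
    return A, W

# Toy data: listener 0 always prefers voice 0, listener 1 always prefers voice 1.
comparisons = [(0, 0, 1)] * 5 + [(1, 1, 0)] * 5
A, W = fit(comparisons, n_voices=2, n_listeners=2)
merit = W @ A.T  # merit[l, i]: attractiveness of voice i for listener l
```

In this sketch a single shared ranking could not explain the toy data, whereas the two-dimensional vectors can assign each listener a different ordering. The scale and rotation of the vectors are not identifiable from comparisons alone, so a practical fit would likely need normalization or regularization.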
Notes
1. It is similar to the ranking of athletes. If the higher-ranked player always wins, the ranking is efficient. If there are many upsets in which the lower-ranked player wins, the ranking is not efficient.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Obuchi, Y. (2021). Multidimensional Mapping of Voice Attractiveness and Listener’s Preference: Optimization and Estimation from Audio Signal. In: Weiss, B., Trouvain, J., Barkat-Defradas, M., Ohala, J.J. (eds) Voice Attractiveness. Prosody, Phonology and Phonetics. Springer, Singapore. https://doi.org/10.1007/978-981-15-6627-1_15
DOI: https://doi.org/10.1007/978-981-15-6627-1_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6626-4
Online ISBN: 978-981-15-6627-1
eBook Packages: Social Sciences, Social Sciences (R0)