Abstract
When humans sing, pitch and volume vary far more than when they speak, and the shape and size of the mouth change dramatically with the strength of the voice. In this study, we propose a model that generates these singing-induced changes in mouth shape from the voice strength estimated by analyzing the audio spectrum. For each audio frame, we estimate the voice strength from the spectrum computed with a Fast Fourier Transform-based numerical technique, and we apply the estimated strength as the blendshape weight of the morph targets associated with the mouth shape. Experimental results show a visually convincing lip-sync animation in which the mouth shape changes significantly with the pitch and volume of the voice.
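The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame size, hop length, window choice, and the spectral-energy proxy for voice strength are all assumptions, and the min/max normalization used to map strength to a blendshape weight is one plausible choice among many.

```python
import numpy as np

def frame_voice_strength(samples, frame_size=1024, hop=512):
    """Estimate a per-frame voice strength from the FFT magnitude spectrum.

    Each frame is Hann-windowed, transformed with a real FFT, and summarized
    by its mean spectral magnitude (a simple proxy for voice strength).
    """
    strengths = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size] * np.hanning(frame_size)
        spectrum = np.abs(np.fft.rfft(frame))
        strengths.append(spectrum.sum() / frame_size)
    return np.array(strengths)

def blendshape_weight(strength, s_min, s_max):
    """Normalize a voice strength into a [0, 1] morph-target weight."""
    return float(np.clip((strength - s_min) / (s_max - s_min + 1e-9), 0.0, 1.0))
```

In use, `frame_voice_strength` would run over the singing audio, and each frame's normalized weight would drive the mouth-shape morph targets of the character for that frame.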
Acknowledgments
This research is supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) under the Culture Technology (CT) Research & Development Program 2018, and by the Mid-Career Researcher Program through an NRF grant funded by the MEST (No. NRF-2016R1A2B4016239).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Kim, N., Park, K. (2020). Singing Lip Sync Animation System Using Audio Spectrum. In: Park, J., Park, D.S., Jeong, Y.S., Pan, Y. (eds.) Advances in Computer Science and Ubiquitous Computing. CUTE 2018, CSA 2018. Lecture Notes in Electrical Engineering, vol. 536. Springer, Singapore. https://doi.org/10.1007/978-981-13-9341-9_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9340-2
Online ISBN: 978-981-13-9341-9