Abstract
A mimetic word verbally and intuitively expresses the manner of a phenomenon. Japanese is known to have a larger vocabulary of mimetic words than most other languages. In particular, since human gaits are among the behaviors most commonly represented by mimetic words in Japanese, we consider them well suited as labels for fine-grained gait recognition. In addition, Japanese mimetic words have a more decomposable structure than those in other languages such as English; they are said to exhibit sound-symbolism, with phonemes strongly related to the impressions of various phenomena. Thanks to this, native Japanese speakers can express their impressions briefly and intuitively using various mimetic words. Our previous work proposed a framework that converts body-part movements into an arbitrary mimetic word via a regression model. The framework introduced a “phonetic space” based on sound-symbolism, which enabled fine-grained gait description with generated mimetic words consisting of arbitrary combinations of phonemes. However, that method did not consider the “naturalness” of the description. In this paper, we therefore propose an improved mimetic word generation module that accounts for naturalness, and update the description framework accordingly. We define naturalness as the co-occurrence frequency of the phonemes composing a mimetic word. To estimate these co-occurrence frequencies, we collected many mimetic words through a subjective experiment. Evaluation experiments confirmed that the proposed module describes gaits with more natural mimetic words while maintaining description accuracy.
Notes
- 1.
- 2. In Japanese, a special phoneme /n/, called the syllabic nasal, can appear in any position other than the first phoneme. Although, strictly speaking, it is not a vowel, we treat it as a vowel in this paper for convenience.
Acknowledgements
Parts of this work were supported by MEXT, Grant-in-Aid for Scientific Research and the Kayamori Foundation of Information Science Advancement.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kato, H. et al. (2020). More-Natural Mimetic Words Generation for Fine-Grained Gait Description. In: Ro, Y., et al. (eds.) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol. 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_18
DOI: https://doi.org/10.1007/978-3-030-37734-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37733-5
Online ISBN: 978-3-030-37734-2
eBook Packages: Computer Science (R0)