Abstract
Many event sequences in everyday human movement exhibit temporal structure: footsteps in walking, the striking of balls in a tennis match, the movements of a dancer set to rhythmic music, and the gestures of an orchestra conductor. Such events generate prior expectancies regarding the occurrence of future events. These expectancies play a critical role in conveying expressive qualities and communicative intent through movement, and are therefore of considerable interest in musical control contexts. To this end, we introduce a novel Bayesian framework, which we call the temporal expectancy model, and use it to develop an analysis tool that captures expressive and indicative qualities of conducting gesture based on temporal expectancies. The temporal expectancy model is a general dynamic Bayesian network (DBN) that encodes prior knowledge of temporal structure to improve event segmentation. The conducting analysis tool infers beat and tempo (which are indicative) and articulation (which is expressive), as well as temporal expectancies regarding the beat (ictus and preparation instances), from conducting gesture. Experimental results obtained with our analysis framework reveal a very strong correlation between articulation type (staccato vs. legato) and how significantly the preparation expectancy builds up, which bolsters the case for temporal expectancy both as a cognitive model for event anticipation and as a key factor in the communication of expressive qualities of conducting gesture. Our system operates on data from a marker-based motion capture system, but can easily be adapted to more affordable technologies such as video camera arrays.
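The paper's full DBN is not reproduced here, but the core idea of Bayesian inference over temporal structure can be illustrated with a toy sketch. Under the assumption of Gaussian observation noise on inter-onset intervals (the function name `tempo_posterior`, the grid discretization, and the noise level `sigma` are illustrative choices, not the paper's model), a minimal grid-based tempo estimate from event onset times might look like:

```python
import math

def tempo_posterior(onsets, tempo_grid, sigma=0.05):
    """Toy grid-based Bayesian inference of a constant beat period
    from noisy onset times (a stand-in for a full DBN filter)."""
    # Uniform prior over candidate beat periods (seconds per beat),
    # kept in log space for numerical stability.
    log_post = [0.0] * len(tempo_grid)
    # Observed inter-onset intervals between consecutive events.
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    for ioi in iois:
        for i, period in enumerate(tempo_grid):
            # Gaussian likelihood: observed IOI ~ N(period, sigma^2).
            log_post[i] += -((ioi - period) ** 2) / (2 * sigma ** 2)
    # Normalize the posterior to sum to one.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]

# Onsets of events near a 0.5 s beat period, with small timing noise.
onsets = [0.0, 0.51, 1.0, 1.51, 2.0]
grid = [0.4 + 0.01 * k for k in range(31)]  # candidate periods 0.40..0.70 s
post = tempo_posterior(onsets, grid)
best = grid[post.index(max(post))]  # posterior-mode beat period
```

As more onsets arrive, the posterior sharpens around the true period; in the same spirit, the paper's DBN lets an expectancy about the next event (e.g., the next ictus) build up as evidence accumulates.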
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Swaminathan, D. et al. (2008). Capturing Expressive and Indicative Qualities of Conducting Gesture: An Application of Temporal Expectancy Models. In: Kronland-Martinet, R., Ystad, S., Jensen, K. (eds) Computer Music Modeling and Retrieval. Sense of Sounds. CMMR 2007. Lecture Notes in Computer Science, vol 4969. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85035-9_3
Print ISBN: 978-3-540-85034-2
Online ISBN: 978-3-540-85035-9