Comparing the Rhythmical Characteristics of Speech and Music – Theoretical and Practical Issues

  • Stephan Hübler
  • Rüdiger Hoffmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6456)


By comparing the features of music and speech in intelligent audio signal processing, both research fields can benefit from each other. Music and speech both serve as means of human expression. The aim of this study is to show similarities and differences between music and speech by comparing their hierarchical structures, with an emphasis on rhythm. Examining the temporal structure of music and speech in particular could yield new features that improve existing technology: utilizing rhythm in synthetic speech remains an open issue, and rhythmic features for music still need improvement in the fields of semantic search and music similarity retrieval. Theoretical aspects of rhythm in speech and music are discussed alongside practical issues in speech and music research. To show that common approaches are inherently feasible, an onset detection algorithm is applied to both speech and musical signals.
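As a minimal sketch of the kind of common approach the abstract describes — not the authors' actual algorithm — a positive-spectral-flux onset detector can be applied unchanged to either a speech or a music signal. The function below is a self-contained illustration; the frame length, hop size, and thresholding rule are assumptions chosen for clarity:

```python
import numpy as np

def spectral_flux_onsets(signal, sr, frame_len=1024, hop=512, threshold=None):
    """Detect onsets (note attacks or syllable starts) in a mono signal
    via positive spectral flux, a domain-agnostic detection function
    usable for both music and speech.

    Returns a list of onset times in seconds.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Windowed magnitude spectrogram: one row per frame
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # Positive spectral flux: sum only the magnitude *increases*
    # between consecutive frames, so energy decays are ignored
    flux = np.sum(np.maximum(mags[1:] - mags[:-1], 0.0), axis=1)
    flux = flux / (flux.max() + 1e-12)  # normalize to [0, 1]
    if threshold is None:
        # Simple global threshold; real systems use adaptive ones
        threshold = flux.mean() + flux.std()
    # Peak picking: keep local maxima of the detection function
    # that exceed the threshold
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > threshold
             and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    # flux[i] compares frames i and i+1, so report the later frame
    return [(i + 1) * hop / sr for i in peaks]
```

Applied to a signal with two tone bursts, the detector reports one onset per burst start; the same call works on a recorded utterance, which is the point the paper's shared-approach argument rests on.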


Keywords: Speech Signal, Detection Function, Onset Detection, Prosodic Feature, Synthetic Speech





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stephan Hübler¹
  • Rüdiger Hoffmann¹

  1. Laboratory of Acoustics and Speech Communication, Technische Universität Dresden, Dresden, Germany
