Separation and Classification of Harmonic Sounds for Singing Voice Detection

  • Martín Rocamora
  • Alvaro Pardo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7441)


This paper presents a novel method for the automatic detection of singing voice in polyphonic music recordings that involves extracting harmonic sounds from the audio mixture and classifying them. Once separated, sounds can be better characterized by computing features that are otherwise obscured in the mixture. A set of descriptors of typical pitch fluctuations of the singing voice is proposed and combined with classical spectral timbre features. The evaluation shows the usefulness of the proposed pitch features and indicates that the approach is a promising alternative for tackling the problem, in particular for polyphonies that are not very dense, where the singing voice can be correctly tracked. As an outcome of this work, an automatic singing voice separation system is obtained, with encouraging results.
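
To make the idea of pitch fluctuation descriptors concrete, the following is a minimal sketch and not the descriptors proposed in the paper, which the abstract does not specify. It assumes a pitch contour (fundamental frequency per frame, in Hz) has already been extracted for one separated sound, and it computes three illustrative quantities: the overall pitch deviation in cents, the fraction of contour energy in an assumed vibrato band of 3 to 9 Hz, and the dominant modulation rate in that band. The function name, the band limits, and the choice of descriptors are all assumptions made for illustration.

```python
import numpy as np


def pitch_fluctuation_descriptors(f0_hz, frame_rate_hz, band=(3.0, 9.0)):
    """Illustrative pitch-fluctuation descriptors for one pitch contour.

    f0_hz         : fundamental frequency per frame, in Hz (all values > 0)
    frame_rate_hz : number of contour frames per second
    band          : modulation band in Hz treated as "vibrato-like"
                    (an assumption, not a value taken from the paper)
    """
    f0_hz = np.asarray(f0_hz, dtype=float)

    # Express the contour in cents relative to its mean, then centre it.
    cents = 1200.0 * np.log2(f0_hz / np.mean(f0_hz))
    cents = cents - np.mean(cents)

    # Power spectrum of the windowed contour (modulation spectrum).
    window = np.hanning(len(cents))
    spectrum = np.abs(np.fft.rfft(cents * window)) ** 2
    freqs = np.fft.rfftfreq(len(cents), d=1.0 / frame_rate_hz)

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    if not in_band.any():
        raise ValueError("contour too short to resolve the requested band")

    # Ignore the DC bin when normalising the in-band energy.
    total = spectrum[1:].sum()
    band_ratio = spectrum[in_band].sum() / total if total > 0 else 0.0
    peak_rate = freqs[in_band][np.argmax(spectrum[in_band])]

    return {
        "extent_cents": float(np.std(cents)),
        "band_energy_ratio": float(band_ratio),
        "peak_rate_hz": float(peak_rate),
    }
```

Under these assumptions, a contour with a regular vibrato would concentrate its modulation energy inside the band, while a slow glide would not. Descriptors of this kind could then be concatenated with spectral timbre features before classification.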


Keywords (machine-generated, not supplied by the authors): Musical Instrument, Pitch Contour, Audio Feature, Signal Frame, Pitch Tracking



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Martín Rocamora (1)
  • Alvaro Pardo (2)
  1. Institute of Electrical Engineering - School of Engineering, Universidad de la República, Uruguay
  2. Department of Electrical Engineering - School of Engineering and Technologies, Universidad Católica del Uruguay, Uruguay
