Journal of Intelligent Information Systems, Volume 40, Issue 1, pp 141–158

Instrument identification and pitch estimation in multi-timbre polyphonic musical signals based on probabilistic mixture model decomposition



In this paper, we propose a method based on probabilistic mixture model decomposition that simultaneously identifies musical instrument types, estimates pitches, and assigns each pitch to its source instrument in monaural polyphonic audio containing multiple sources. In the proposed system, the probability density function (PDF) of an observed mixture note is approximated as a weighted sum of all possible note models. These note models, which cover 14 instruments and all their possible pitches, describe the dynamic frequency envelopes of the notes in probabilistic terms. The weight coefficients, each indicating the probability that a given pitch of a given instrument is present, are estimated with the Expectation-Maximization (EM) algorithm and then used to detect the source instruments and their pitches. Experiments involving 14 instruments within the designated pitch range F3–F6 (37 pitches) demonstrate good discrimination capability, especially in instrument identification and instrument-pitch identification. For the entire system, including the note onset detection tool, evaluated on quartet polyphonic recordings, the average F-measures for instrument-pitch identification, instrument identification and pitch estimation were 55.4 %, 62.5 % and 86 %, respectively.
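The weight estimation step can be illustrated with a minimal sketch. It assumes that each note model is a fixed discrete PDF over a handful of frequency bins and that only the mixture weights are unknown; the paper's actual note models describe dynamic frequency envelopes and are not reproduced here. The function name `em_mixture_weights`, the three toy models, and the detection threshold of 0.05 are all hypothetical choices made for illustration, not values taken from the paper.

```python
def em_mixture_weights(observed, note_models, n_iter=500, tol=1e-12):
    """Estimate the weights of fixed note models by EM (illustrative sketch).

    observed    : list of non-negative values over frequency bins, summing to 1
    note_models : list of K lists, each a PDF over the same frequency bins
    """
    n_models = len(note_models)
    n_bins = len(observed)
    weights = [1.0 / n_models] * n_models  # uniform starting point
    for _ in range(n_iter):
        # Current weighted-sum approximation of the observed distribution.
        mixture = [sum(w * m[f] for w, m in zip(weights, note_models))
                   for f in range(n_bins)]
        # E-step and M-step folded together: the new weight of a model is the
        # observed mass claimed by its posterior responsibility in each bin.
        new_weights = [
            sum(observed[f] * w * m[f] / mixture[f]
                for f in range(n_bins) if mixture[f] > 0.0)
            for w, m in zip(weights, note_models)
        ]
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    return weights


# Hypothetical toy note models over five frequency bins (not from the paper).
NOTE_MODELS = [
    [0.6, 0.4, 0.0, 0.0, 0.0],  # model 0: present in the mixture
    [0.0, 0.3, 0.4, 0.3, 0.0],  # model 1: absent from the mixture
    [0.0, 0.0, 0.0, 0.4, 0.6],  # model 2: present in the mixture
]
TRUE_WEIGHTS = [0.7, 0.0, 0.3]

# Build the observed mixture as an exact weighted sum of the note models.
observed = [sum(w * m[f] for w, m in zip(TRUE_WEIGHTS, NOTE_MODELS))
            for f in range(5)]

estimated = em_mixture_weights(observed, NOTE_MODELS)

# Assumed decision rule: a note model is reported when its weight exceeds
# a fixed threshold. The threshold value is an illustrative choice.
THRESHOLD = 0.05
detected = [k for k, w in enumerate(estimated) if w > THRESHOLD]

print("estimated weights:", [round(w, 4) for w in estimated])
print("detected note models:", detected)
```

In this toy setting the weight of the absent model decays toward zero while the other two weights recover the proportions used to build the mixture, which is the behaviour the weight coefficients are meant to expose. A real system would index the models by instrument and pitch, so that a detected model yields both labels at once.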


Keywords: Instrument identification; Instrument-pitch identification; Pitch estimation; EM algorithm; Probabilistic model



The authors would like to thank E. Vincent for sharing the ERB code and J.C. Brown for sharing the CQT code. This work was supported by the National Natural Science Foundation of China (Project No. 61173110) and the Key Projects in the National Science & Technology Pillar Program (2011BAK08B02).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Xi’an Jiaotong University, Xi’an, China
