Music instrument recognition using deep convolutional neural networks

  • Arun Solanki
  • Sachin Pandey
Original Research


Identifying musical instruments in polyphonic music is a challenge in music information retrieval. This work proposes a deep convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. The network is trained on fixed-length music excerpts, each labeled with a predominant instrument, and estimates an arbitrary number of instruments from an audio signal of variable length. A Mel spectrogram representation is used to map the audio data into matrix form. An eight-layer convolutional neural network performs the instrument recognition. The ReLU activation function is used for scaling the training data and introduces non-linearity into the network. At each layer, max pooling is used for dimension reduction. Dropout is used for regularization, which prevents the network from overfitting. The softmax function gives the probability of each instrument. The proposed approach achieves an accuracy of 92.8%.
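The building blocks named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the 2x2 pooling size, and the rule for combining windows (averaging per-window probabilities and thresholding the mean) are assumptions made for illustration, since the abstract does not state these details.

```python
import math


def relu(matrix):
    """Element-wise ReLU: negative values become 0, positives pass through."""
    return [[max(0.0, v) for v in row] for row in matrix]


def max_pool_2x2(matrix):
    """Non-overlapping 2x2 max pooling (pool size is an assumption).

    A trailing odd row or column is dropped.
    """
    rows = len(matrix) // 2 * 2
    cols = len(matrix[0]) // 2 * 2
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_windows(window_probs, threshold=0.5):
    """Hypothetical aggregation over fixed-length windows of a longer clip.

    Averages the per-window class probabilities and returns the indices of
    classes whose mean probability reaches the threshold.
    """
    n_windows = len(window_probs)
    n_classes = len(window_probs[0])
    means = [
        sum(window[k] for window in window_probs) / n_windows
        for k in range(n_classes)
    ]
    return [k for k, m in enumerate(means) if m >= threshold]


if __name__ == "__main__":
    # Toy 2x4 "feature map" standing in for a patch of a Mel spectrogram.
    feature_map = [[-1.0, 2.0, 0.5, -3.0],
                   [4.0, -2.0, 1.0, 0.0]]
    print(max_pool_2x2(relu(feature_map)))
    print(softmax([1.0, 2.0, 3.0]))
    print(aggregate_windows([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]], threshold=0.2))
```

Thresholding the averaged probabilities, rather than taking a single arg-max, is one simple way for a classifier trained on single-label excerpts to report more than one instrument for a longer recording.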


Keywords: Music instrument recognition; Deep convolution neural network


Copyright information

© Bharati Vidyapeeth's Institute of Computer Applications and Management 2019

Authors and Affiliations

  1. School of Information and Communication Technology, Gautam Buddha University, Greater Noida, India
