Automatic music transcription: challenges and future directions

Journal of Intelligent Information Systems

Abstract

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse the limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.
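The integration of information from multiple algorithms mentioned above can be illustrated with a minimal late-fusion sketch (not a method from the paper): two hypothetical transcription systems each output frame-level pitch-activation probabilities, which are averaged and thresholded into a binary piano-roll.

```python
# Illustrative late fusion of frame-level pitch activations from two
# hypothetical transcription algorithms. All names and values here are
# assumptions for the sketch, not part of the article.

def fuse_activations(act_a, act_b, threshold=0.5):
    """Average two activation matrices (lists of rows of probabilities
    in [0, 1], one row per frame, one column per pitch) and threshold
    the mean into a binary piano-roll of the same shape."""
    fused = []
    for row_a, row_b in zip(act_a, act_b):
        fused.append([1 if (pa + pb) / 2.0 >= threshold else 0
                      for pa, pb in zip(row_a, row_b)])
    return fused

# Two detectors disagree on some (frame, pitch) cells; fusion keeps a
# note only when their combined evidence is strong enough.
alg_a = [[0.9, 0.2], [0.6, 0.7]]
alg_b = [[0.8, 0.1], [0.3, 0.9]]
print(fuse_activations(alg_a, alg_b))  # -> [[1, 0], [0, 1]]
```

Averaging posteriors is only the simplest fusion rule; weighted combinations or classifier-level voting would follow the same frame-by-pitch structure.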



Notes

  1. http://staff.aist.go.jp/m.goto/RWC-MDB/AIST-Annotation/SyncRWC/

  2. http://lilypond.org/



Acknowledgements

E. Benetos is funded by a City University London Research Fellowship. D. Giannoulis and H. Kirchhoff are funded by a Queen Mary University of London CDTA Studentship. We acknowledge the support of the MIReS project, supported by the European Commission, FP7, ICT-2011.1.5 Networked Media and Search Systems, grant agreement No 287711.

Author information

Correspondence to Emmanouil Benetos.

Additional information

All authors contributed equally to this work.


Cite this article

Benetos, E., Dixon, S., Giannoulis, D. et al. Automatic music transcription: challenges and future directions. J Intell Inf Syst 41, 407–434 (2013). https://doi.org/10.1007/s10844-013-0258-3
