
REPET for Background/Foreground Separation in Audio

Chapter in: Blind Source Separation

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Repetition is a fundamental element in generating and perceiving structure. In audio, mixtures are often composed of structures where a repeating background signal is overlaid with a varying foreground signal (e.g., a singer overlaying varying vocals on a repeating accompaniment, or a varying speech signal mixed with a repeating background noise). On this basis, we present the REpeating Pattern Extraction Technique (REPET), a simple approach for separating the repeating background from the non-repeating foreground in an audio mixture. The basic idea is to identify the repeating elements in the mixture, derive the underlying repeating models, and extract the repeating background by comparing the models to the mixture. Unlike other separation approaches, REPET does not depend on special parameterizations, does not rely on complex frameworks, and does not require external information. Because it is based only on repetition, it has the advantage of being simple, fast, blind, and therefore completely and easily automatable.
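
To make these three steps concrete, here is a minimal sketch of the core idea in Python with NumPy and SciPy. It is an illustration under simplifying assumptions, not the authors' reference implementation: the function name repet, the STFT parameters, and the crude row-wise autocorrelation used to find the repeating period are placeholders for the published method, which estimates the period from a beat spectrum.

    import numpy as np
    from scipy.signal import stft, istft

    def repet(x, sr, n_fft=2048, hop=512):
        """Split a mono signal x into (background, foreground)."""
        # Magnitude spectrogram of the mixture.
        _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
        V = np.abs(X)

        # Step 1 -- find the repeating period: average the autocorrelation
        # of each frequency row of the power spectrogram (a crude stand-in
        # for the beat spectrum) and pick the strongest non-zero lag,
        # searching only lags short enough for at least three repetitions.
        P = V ** 2
        acf = np.sum([np.correlate(row, row, mode='full') for row in P], axis=0)
        acf = acf[acf.size // 2:]                          # keep lags >= 0
        period = int(np.argmax(acf[1:acf.size // 3])) + 1  # period in frames

        # Step 2 -- derive the repeating model: element-wise median across
        # the period-length segments of the spectrogram.
        n = V.shape[1] // period
        segments = V[:, :n * period].reshape(V.shape[0], n, period)
        model = np.median(segments, axis=1)

        # Step 3 -- compare model and mixture: the repeating background
        # cannot exceed the mixture at any time-frequency bin, so clip the
        # tiled model by the mixture and form a soft mask.
        W = np.tile(model, n + 1)[:, :V.shape[1]]
        mask = np.minimum(W, V) / (V + 1e-12)

        # Mask the complex STFT and invert to recover the background;
        # the foreground is the residual.
        _, background = istft(mask * X, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
        m = min(len(x), len(background))
        return background[:m], x[:m] - background[:m]

On a mono pop excerpt, background, foreground = repet(x, sr) should roughly recover the repeating accompaniment and the varying vocals; taking the median across segments (rather than the mean) is what makes the model robust to the non-repeating foreground.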


Notes

  1. http://music.eecs.northwestern.edu/research.php?project=repet

  2. http://sisec.wiki.irisa.fr/tiki-index.php?page=Professionally+produced+music+recordings

  3. http://sisec.wiki.irisa.fr/tiki-index.php?page=Two-channel+mixtures+of+speech+and+real-world+background+noise


Author information

Correspondence to Zafar Rafii.


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rafii, Z., Liutkus, A., Pardo, B. (2014). REPET for Background/Foreground Separation in Audio. In: Naik, G., Wang, W. (eds) Blind Source Separation. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55016-4_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-55016-4_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55015-7

  • Online ISBN: 978-3-642-55016-4

  • eBook Packages: Engineering, Engineering (R0)
