Cross-Modal Approach for Karaoke Artifacts Correction

Yan, Wei-Qi; Kankanhalli, Mohan S.

doi:10.1007/978-0-387-89024-1_9

Chapter
First Online: 01 January 2009

Abstract

In this chapter, we combine adaptive sampling with video analogies (VA) to correct the audio stream in the karaoke environment \(\kappa = \{\kappa(t) : \kappa(t) = (U(t), K(t)),\ t \in (t_s, t_e)\}\), where \(t_s\) and \(t_e\) are the start and end times, respectively, and \(U(t)\) is the user multimedia data. We employ multiple streams from the karaoke data \(K(t) = (K_V(t), K_M(t), K_S(t))\), where \(K_V(t)\), \(K_M(t)\) and \(K_S(t)\) are the video, the musical accompaniment and the original singer's rendition, respectively. These are used together with the user multimedia data \(U(t) = (U_A(t), U_V(t))\), where \(U_V(t)\) is the user video captured with a camera and \(U_A(t)\) is the user's rendition of the song. We analyze the audio and video streaming features \(\Psi(\kappa) = \{\Psi(U(t), K(t))\} = \{\Psi(U(t)), \Psi(K(t))\} = \{\Psi_U(t), \Psi_K(t)\}\) to produce the corrected singing, namely the output \(U'(t)\), which is made as close as possible to the original singer's rendition. Note that \(\Psi\) denotes any kind of feature processing.
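
The stream decomposition and the feature mapping above can be sketched as plain Python types. This is a minimal illustration under stated assumptions, not the chapter's implementation: every frame is a flattened list of floats (a hypothetical simplification, including for video), root-mean-square energy is a hypothetical stand-in for the generic operator Ψ, and the names `KaraokeSample`, `apply_psi` and `rms` are invented here. The adaptive sampling and video-analogies steps are not modeled. The sketch only shows that Ψ is applied to the user streams U(t) and the karaoke streams K(t) separately, for samples with t in the open interval (t_s, t_e).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: one time slice of any stream as a flat list of floats.
Frame = List[float]


@dataclass
class KaraokeStreams:
    """K(t) = (K_V(t), K_M(t), K_S(t)) at one instant t."""
    video: Frame   # K_V(t): karaoke video
    music: Frame   # K_M(t): musical accompaniment
    singer: Frame  # K_S(t): original singer's rendition


@dataclass
class UserStreams:
    """U(t) = (U_A(t), U_V(t)) at one instant t."""
    audio: Frame   # U_A(t): user's rendition of the song
    video: Frame   # U_V(t): user video captured with a camera


@dataclass
class KaraokeSample:
    """kappa(t) = (U(t), K(t))."""
    t: float
    user: UserStreams
    karaoke: KaraokeStreams


def rms(frame: Sequence[float]) -> float:
    """Hypothetical stand-in for Psi: root-mean-square energy of a frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def apply_psi(
    kappa: List[KaraokeSample],
    t_s: float,
    t_e: float,
    psi: Callable[[Sequence[float]], float] = rms,
) -> List[Tuple[float, Dict[str, float], Dict[str, float]]]:
    """Return (t, Psi_U(t), Psi_K(t)) for each sample with t_s < t < t_e,
    applying psi to each user stream and each karaoke stream separately."""
    out = []
    for s in kappa:
        if not (t_s < s.t < t_e):  # open interval (t_s, t_e)
            continue
        psi_u = {"audio": psi(s.user.audio), "video": psi(s.user.video)}
        psi_k = {
            "video": psi(s.karaoke.video),
            "music": psi(s.karaoke.music),
            "singer": psi(s.karaoke.singer),
        }
        out.append((s.t, psi_u, psi_k))
    return out
```

Keeping Ψ_U and Ψ_K as separate dictionaries mirrors the split \(\{\Psi_U(t), \Psi_K(t)\}\); a correction step would then compare features derived from \(U_A(t)\) against those of \(K_S(t)\), which this sketch deliberately leaves out.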

Author information

Authors and Affiliations

Department of Computer Science, Queen’s University of Belfast, Belfast, UK
Wei-Qi Yan
Department of Computer Science, National University of Singapore, Singapore
Mohan S. Kankanhalli

Corresponding author

Correspondence to Mohan S. Kankanhalli.

Editor information

Editors and Affiliations

Dept. Computer Science &, Florida Atlantic University, Glades Road 777, Boca Raton, 33431, U.S.A.
Borko Furht

About this chapter

Cite this chapter

Yan, WQ., Kankanhalli, M.S. (2009). Cross-Modal Approach for Karaoke Artifacts Correction. In: Furht, B. (ed.) Handbook of Multimedia for Digital Entertainment and Arts. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-89024-1_9

DOI: https://doi.org/10.1007/978-0-387-89024-1_9
Published: 09 June 2009
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-89023-4
Online ISBN: 978-0-387-89024-1
eBook Packages: Computer Science, Computer Science (R0)