Cross-Modal Approach for Karaoke Artifacts Correction

  • Wei-Qi Yan
  • Mohan S. Kankanhalli


In this chapter, we combine adaptive sampling with video analogies (VA) to correct the audio stream in the karaoke environment \(\kappa = \left\{\kappa(t) :\ \kappa(t) = \left(U(t),\ K(t)\right),\ t \in \left(t_{s},\ t_{e}\right)\right\}\), where \(t_{s}\) and \(t_{e}\) are the start time and end time respectively, \(U(t)\) is the user multimedia data, and \(K(t)\) is the karaoke data. We employ multiple streams from the karaoke data \(K(t) = \left(K_{V}(t),\ K_{M}(t),\ K_{S}(t)\right)\), where \(K_{V}(t)\), \(K_{M}(t)\) and \(K_{S}(t)\) are the video, the musical accompaniment and the original singer’s rendition respectively, along with the user multimedia data \(U(t) = \left(U_{A}(t),\ U_{V}(t)\right)\), where \(U_{V}(t)\) is the user video captured with a camera and \(U_{A}(t)\) is the user’s rendition of the song. We analyze the audio and video streaming features \(\Psi(\kappa) = \left\{\Psi(U(t),\ K(t))\right\} = \left\{\Psi(U(t)),\ \Psi(K(t))\right\} = \left\{\Psi_{U}(t),\ \Psi_{K}(t)\right\}\) to produce the corrected singing, namely the output U (t), which is made as close as possible to the original singer’s rendition. Note that \(\Psi\) represents any kind of feature processing.
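To make the notation concrete, the following is a minimal Python sketch of how one time instant of the karaoke environment could be represented and how a feature map could be applied to the user and karaoke sides separately. It is an illustration under stated assumptions, not the authors' implementation: the class names, the `Samples` type, and the `mean_energy` feature are all hypothetical stand-ins, since the abstract only says that Ψ may be any kind of feature processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical stand-in for the samples of one stream at one time instant t.
Samples = Sequence[float]


@dataclass(frozen=True)
class KaraokeData:
    """K(t) = (K_V(t), K_M(t), K_S(t))."""
    video: Samples          # K_V(t): karaoke video
    accompaniment: Samples  # K_M(t): musical accompaniment
    singer: Samples         # K_S(t): original singer's rendition


@dataclass(frozen=True)
class UserData:
    """U(t) = (U_A(t), U_V(t))."""
    audio: Samples  # U_A(t): user's rendition of the song
    video: Samples  # U_V(t): user video captured with a camera


@dataclass(frozen=True)
class KaraokeFrame:
    """kappa(t) = (U(t), K(t)) for one t in (t_s, t_e)."""
    user: UserData
    karaoke: KaraokeData


def mean_energy(x: Samples) -> float:
    """Placeholder feature (an assumption): mean squared sample value."""
    return sum(v * v for v in x) / len(x) if len(x) else 0.0


def extract_features(
    frame: KaraokeFrame,
    psi: Callable[[Samples], float] = mean_energy,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Apply psi stream by stream, returning (Psi_U(t), Psi_K(t))."""
    psi_u = {
        "audio": psi(frame.user.audio),
        "video": psi(frame.user.video),
    }
    psi_k = {
        "video": psi(frame.karaoke.video),
        "accompaniment": psi(frame.karaoke.accompaniment),
        "singer": psi(frame.karaoke.singer),
    }
    return psi_u, psi_k
```

Keeping the user features and the karaoke features in separate mappings mirrors the split \(\Psi(\kappa) = \{\Psi_{U}(t),\ \Psi_{K}(t)\}\); any comparison between the user's rendition and the original singer's rendition would then operate on these two groups.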


Keywords: Peak Signal Noise Ratio; Adaptive Sampling; Audio Stream; Video Caption; Artifact Removal



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Computer Science, Queen’s University of Belfast, Belfast, UK
  2. Department of Computer Science, National University of Singapore, Singapore
