Cross-Modal Approach for Karaoke Artifacts Correction

Abstract

In this chapter, we combine adaptive sampling with video analogies (VA) to correct the audio stream in the karaoke environment \(\kappa = \{\kappa(t) :\ \kappa(t) = (U(t),\ K(t)),\ t \in (t_s,\ t_e)\}\), where \(t_s\) and \(t_e\) are the start time and end time respectively, and \(U(t)\) is the user multimedia data. We employ multiple streams from the karaoke data \(K(t) = (K_V(t),\ K_M(t),\ K_S(t))\), where \(K_V(t)\), \(K_M(t)\) and \(K_S(t)\) are the video, the musical accompaniment and the original singer’s rendition respectively, along with the user multimedia data \(U(t) = (U_A(t),\ U_V(t))\), where \(U_V(t)\) is the user video captured with a camera and \(U_A(t)\) is the user’s rendition of the song. We analyze the audio and video streaming features \(\Psi(\kappa) = \{\Psi(U(t),\ K(t))\} = \{\Psi(U(t)),\ \Psi(K(t))\} = \{\Psi_U(t),\ \Psi_K(t)\}\) to produce the corrected singing, namely the output \(U'(t)\), which is made as close as possible to the original singer’s rendition. Note that \(\Psi\) represents any kind of feature processing.
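
The notation above can be read as a small data model. The following Python sketch is only an illustration of that reading, not the chapter's implementation: the class names, the `Signal` alias (every stream is reduced to a plain list of numbers, which is a strong simplification for video), the helper `extract_features` and the placeholder feature `mean_level` are all assumptions introduced here. It shows how a feature function standing in for Ψ is applied separately to the user data U(t) and the karaoke data K(t).

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in: each stream is just a sequence of numbers.
Signal = Sequence[float]


@dataclass(frozen=True)
class UserData:
    """U(t) = (U_A(t), U_V(t))."""
    audio: Signal  # U_A: the user's rendition of the song
    video: Signal  # U_V: the user video captured with a camera


@dataclass(frozen=True)
class KaraokeData:
    """K(t) = (K_V(t), K_M(t), K_S(t))."""
    video: Signal   # K_V: karaoke video
    music: Signal   # K_M: musical accompaniment
    singer: Signal  # K_S: original singer's rendition


@dataclass(frozen=True)
class KaraokeEnvironment:
    """kappa(t) = (U(t), K(t)) for t between t_start and t_end."""
    user: UserData
    karaoke: KaraokeData
    t_start: float
    t_end: float


def extract_features(
    env: KaraokeEnvironment, psi: Callable[[Signal], float]
) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Return (Psi_U, Psi_K): psi applied stream by stream to U and to K."""
    psi_u = tuple(psi(s) for s in (env.user.audio, env.user.video))
    psi_k = tuple(
        psi(s) for s in (env.karaoke.video, env.karaoke.music, env.karaoke.singer)
    )
    return psi_u, psi_k


def mean_level(signal: Signal) -> float:
    """Placeholder feature (assumption): the average value of a stream."""
    return sum(signal) / len(signal)


env = KaraokeEnvironment(
    user=UserData(audio=[0.0, 1.0], video=[2.0, 4.0]),
    karaoke=KaraokeData(video=[1.0, 1.0], music=[0.0, 2.0], singer=[3.0, 5.0]),
    t_start=0.0,
    t_end=1.0,
)
psi_u, psi_k = extract_features(env, mean_level)
```

Passing the feature function as a parameter mirrors the remark that Ψ may be any kind of feature processing; a real system would substitute audio and video features suited to the correction task.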

Author information

Corresponding author

Correspondence to Mohan S. Kankanhalli.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Yan, WQ., Kankanhalli, M.S. (2009). Cross-Modal Approach for Karaoke Artifacts Correction. In: Furht, B. (ed.) Handbook of Multimedia for Digital Entertainment and Arts. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-89024-1_9

  • DOI: https://doi.org/10.1007/978-0-387-89024-1_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-89023-4

  • Online ISBN: 978-0-387-89024-1

  • eBook Packages: Computer Science, Computer Science (R0)
