Multimedia Tools and Applications, Volume 54, Issue 1, pp 7–25

Scene extraction system for video clips using attached comment interval and pointing region

  • Shoko Wakamiya
  • Daisuke Kitayama
  • Kazutoshi Sumiya


We developed a method that enables users of video sharing websites to easily retrieve video scenes relevant to their interests. The system analyzes both the text and non-text aspects of a user's comment and then retrieves and displays relevant scenes along with their attached comments. The text analysis works in tandem with two non-text features: the selected area and the temporal duration associated with each user comment. In this way, our system retrieves commented scenes in a better-organized manner and with higher relevance than conventional methods such as keyword matching. We describe our method and the relations between scenes, and discuss a prototype system.
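As a rough illustration of how text and non-text features of comments might be combined, the sketch below scores the relatedness of two comments from keyword overlap, temporal interval overlap, and pointing-region overlap. It is not the method of this paper: the `Comment` structure, the three similarity functions, and the weights are all hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Comment:
    """Hypothetical comment record: text plus its non-text attributes."""
    text: str
    start: float  # start of the attached interval, in seconds
    end: float    # end of the attached interval, in seconds
    region: Tuple[float, float, float, float]  # pointing region (x1, y1, x2, y2)


def keyword_overlap(a: Comment, b: Comment) -> float:
    """Jaccard similarity of the lowercase word sets of two comments."""
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def interval_overlap(a: Comment, b: Comment) -> float:
    """Shared duration divided by the total span of the two intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    span = max(a.end, b.end) - min(a.start, b.start)
    return inter / span if span > 0 else 0.0


def region_overlap(a: Comment, b: Comment) -> float:
    """Intersection over union of the two rectangular pointing regions."""
    ax1, ay1, ax2, ay2 = a.region
    bx1, by1, bx2, by2 = b.region
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def relevance(a: Comment, b: Comment,
              w_text: float = 0.4, w_time: float = 0.3,
              w_region: float = 0.3) -> float:
    """Weighted sum of the three similarities (weights are assumptions)."""
    return (w_text * keyword_overlap(a, b)
            + w_time * interval_overlap(a, b)
            + w_region * region_overlap(a, b))


if __name__ == "__main__":
    c1 = Comment("nice goal", 10.0, 20.0, (0.0, 0.0, 10.0, 10.0))
    c2 = Comment("great goal", 15.0, 25.0, (5.0, 5.0, 15.0, 15.0))
    print(round(relevance(c1, c2), 3))
```

Two comments that share a keyword, overlap in time, and point at overlapping screen areas score higher than comments that match on keywords alone, which is the intuition behind using non-text features alongside text.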


Keywords: Multimedia, Video sharing, Scene extraction, User comments, Selected area, Temporal duration



This research was supported in part by a Grant-in-Aid for Scientific Research (B)(2) 20300039 and Grant-in-Aid for JSPS Fellows 21.197 from the Ministry of Education, Culture, Sports, Science, and Technology of Japan.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Shoko Wakamiya (1)
  • Daisuke Kitayama (1)
  • Kazutoshi Sumiya (1)

  1. University of Hyogo, Himeji, Japan
