Film clips retrieval using image queries

  • Ling Zou
  • Xin Jin
  • Bo Wei


The rise of the entertainment industry has driven explosive growth in automatically generated film trailers. Manually finding desired clips in a large collection of films is time-consuming and tedious, so locating the moments that match a user's main or special preferences has become an urgent problem. Moreover, because each viewer's response to a film is subjective, no fixed trailer caters to all tastes. This paper addresses these problems by proposing a query-related film clip extraction framework that optimizes the selected frames so that they not only match the semantic meaning of the query but are also visually similar in appearance to it. The experimental results show that our query-related film clip retrieval method is particularly useful for film editing, e.g. automatically finding movie clips that arouse audiences' interest in the film.
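The idea of selecting frames that agree with an image query both semantically and visually can be illustrated with a minimal sketch. It is not the paper's method: it assumes each frame and the query already have a semantic feature vector and a visual feature vector (for example from a pretrained network), and it combines two cosine similarities with a weight. The names `rank_frames`, `alpha` and `top_k` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_frames(query_sem, query_vis, frames, alpha=0.5, top_k=3):
    """Return indices of the top_k frames by a weighted similarity score.

    frames: list of (semantic_vector, visual_vector) pairs, one per frame.
    alpha:  assumed weight on the semantic term; (1 - alpha) weights the
            visual term. This weighting is an illustrative assumption.
    """
    scored = []
    for idx, (sem, vis) in enumerate(frames):
        score = alpha * cosine(query_sem, sem) + (1 - alpha) * cosine(query_vis, vis)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier frame index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:top_k]]


# Toy data: four frames with 2-dimensional features.
frames = [
    ([1.0, 0.0], [1.0, 0.0]),  # matches the query semantically and visually
    ([0.0, 1.0], [1.0, 0.0]),  # visual match only
    ([1.0, 0.0], [0.0, 1.0]),  # semantic match only
    ([0.0, 1.0], [0.0, 1.0]),  # matches neither
]
top = rank_frames([1.0, 0.0], [1.0, 0.0], frames, alpha=0.6, top_k=2)
print(top)
```

With `alpha` above 0.5 the semantic-only frame outranks the visual-only frame; lowering `alpha` reverses that order, which shows how such a weight would trade off the two criteria.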


Deep learning, Transfer learning, Film editing



The research was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61703046 and by the open projects of the State Key Laboratory of Virtual Reality Technology and Systems (No. BUAA-VR-17KF-05).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Digital Media School, Beijing Film Academy, Beijing, China
  2. Beijing Electronic Science and Technology Institute, Beijing, China
  3. Hangzhou Dianzi University, Zhejiang, China
