Multimedia Tools and Applications, Volume 62, Issue 1, pp 51–73

Classifying cinematographic shot types

  • Luca Canini
  • Sergio Benini
  • Riccardo Leonardi


In film-making, the distance from the camera to the subject greatly affects the narrative power of a shot. By alternating Long shots, Medium shots and Close-ups, the director can emphasise key passages of the filmed scene. In this work we investigate five inherent characteristics of single shots that carry indirect information about camera distance, without the need to recover the 3D structure of the scene. Specifically, the 2D geometric composition of the scene, frame colour intensity properties, motion distribution, spectral amplitude and shot content are used to classify shots into three main categories. In the experimental phase, we demonstrate the validity of the framework and the effectiveness of the proposed descriptors by classifying a significant dataset of movie shots with C4.5 Decision Trees and Support Vector Machines. After comparing the performance of the two statistical classifiers on the combined descriptor set, we test the ability of each single feature to distinguish shot types.
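
The abstract names C4.5 Decision Trees as one of the classifiers and mentions testing single features on their own. As an illustration of how such a tree might choose a threshold on one continuous descriptor, the following minimal Python sketch scores every candidate binary split with C4.5's gain ratio. It is a simplification under stated assumptions: it omits C4.5's requirement that the chosen test have at least average information gain, as well as later refinements for continuous attributes, and it is not the authors' implementation or that of the Orange software. The function names, the labels "LS", "MS" and "CU", and the face-area feature in the usage lines are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold_split(values, labels):
    """Return (threshold, gain_ratio) of the best binary split on one
    continuous feature, scored with the C4.5 gain ratio.

    Simplified sketch: candidate thresholds are midpoints between
    consecutive distinct sorted values; C4.5's average-gain filter and
    its later continuous-attribute refinements are omitted.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        lo_v, hi_v = pairs[i - 1][0], pairs[i][0]
        if lo_v == hi_v:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Information gain: reduction in class entropy achieved by the split.
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        # Split information: entropy of the partition itself (both sides non-empty).
        p = i / n
        split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((lo_v + hi_v) / 2, ratio)
    return best


# Hypothetical per-shot feature: fraction of frame area covered by faces.
face_area = [0.0, 0.01, 0.02, 0.04, 0.05, 0.08, 0.30, 0.35, 0.40]
shot_type = ["LS", "MS", "LS", "MS", "LS", "MS", "CU", "CU", "CU"]
threshold, ratio = best_threshold_split(face_area, shot_type)
```

Because the gain ratio never exceeds 1, and reaches 1 only for a split that keeps every class entirely on one side, this sketch selects the CU versus LS/MS boundary (threshold 0.19, ratio 1.0) on the toy data; the interleaved LS and MS values cannot be separated by this single feature, which is the kind of limitation a per-feature test would expose.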


Keywords: Shot type · Movie content · Feature extraction



Figures 7 and 8 are modified versions of decision tree graphs obtained using the Orange software, available at

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Information Engineering, University of Brescia, Brescia, Italy