Multimedia Tools and Applications, Volume 41, Issue 1, pp 125–159

Parallel neural networks for multimodal video genre classification

  • Maurizio Montagnuolo
  • Alberto Messina


Improvements in digital technology have made possible the production and distribution of huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable for efficiently accessing and retrieving desired content from such data. In this context, automatic genre classification provides a simple and effective way to describe multimedia content in a structured and easily understandable form. In this article we propose a methodology for classifying the genre of television programmes. Features are extracted from four informative sources: visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot cluster duration and saturation), cognitive information (face properties, such as number, positions and dimensions) and aural information (transcribed text, sound characteristics). These features are used to train a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 h of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy of 95%.


Keywords: Video annotation · Genre recognition · Neural network · Feature extraction · Multimedia semantics
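The abstract above describes a parallel neural network system fed by four feature sources (visual-perceptual, structural, cognitive and aural) and trained to recognise seven genres, but it does not state the network topology or how the parallel outputs are combined. The following is therefore only an illustrative sketch under stated assumptions, not the authors' method: it assumes one single-layer softmax network per modality (the hypothetical `ModalityNet`), each trained independently with plain stochastic gradient descent, and a late-fusion rule that averages the per-modality class probabilities. The feature dimensions in `DIMS` and the synthetic `toy_sample` data are hypothetical placeholders, not the paper's features.

```python
import math
import random

GENRES = ["football", "cartoons", "music", "weather forecast",
          "newscast", "talk show", "commercials"]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ModalityNet:
    """Hypothetical single-layer softmax network for one feature source."""

    def __init__(self, n_in, n_out, seed):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def predict(self, x):
        # Class probabilities for one feature vector.
        z = [sum(w * xi for w, xi in zip(row, x)) + bk
             for row, bk in zip(self.W, self.b)]
        return softmax(z)

    def train(self, X, y, epochs=100, lr=0.1):
        # Plain stochastic gradient descent on the cross-entropy loss.
        for _ in range(epochs):
            for x, label in zip(X, y):
                p = self.predict(x)
                for k, row in enumerate(self.W):
                    g = p[k] - (1.0 if k == label else 0.0)  # dLoss/dlogit_k
                    for j, xj in enumerate(x):
                        row[j] -= lr * g * xj
                    self.b[k] -= lr * g


class ParallelGenreClassifier:
    """One network per modality, trained independently; outputs averaged."""

    def __init__(self, dims, n_classes=len(GENRES)):
        self.nets = [ModalityNet(d, n_classes, seed=i) for i, d in enumerate(dims)]

    def train(self, X_by_modality, y):
        for net, X in zip(self.nets, X_by_modality):
            net.train(X, y)

    def predict(self, sample_by_modality):
        # Assumed late fusion: mean of per-modality probabilities, then argmax.
        probs = [net.predict(x) for net, x in zip(self.nets, sample_by_modality)]
        avg = [sum(col) / len(probs) for col in zip(*probs)]
        return max(range(len(avg)), key=avg.__getitem__)


def toy_sample(label, dim, rng):
    """Synthetic feature vector: a bump at index `label` plus Gaussian noise."""
    x = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    x[label] += 1.0
    return x


DIMS = [10, 8, 7, 9]  # hypothetical sizes: visual, structural, cognitive, aural
rng = random.Random(42)
labels = [k for k in range(len(GENRES)) for _ in range(5)]
X_by_modality = [[toy_sample(k, d, rng) for k in labels] for d in DIMS]

clf = ParallelGenreClassifier(DIMS)
clf.train(X_by_modality, labels)

test_label = GENRES.index("newscast")
test_sample = [toy_sample(test_label, d, rng) for d in DIMS]
print(GENRES[clf.predict(test_sample)])
```

Training each modality's network separately keeps the four feature sources decoupled, so a source can be retrained or dropped without touching the others; averaging is only one of several possible fusion rules. The toy data are deliberately easy to separate, so the example exercises the mechanics only and says nothing about accuracy on real broadcast material.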

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Computer Science, University of Turin, Turin, Italy
  2. Centre for Research and Technological Innovation, RAI Radiotelevisione Italiana, Turin, Italy