Video Summarization with Visual and Semantic Features

  • Pei Dong
  • Zhiyong Wang
  • Li Zhuo
  • Dagan Feng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6297)


Video summarization aims to provide a condensed yet informative version of the original footage so as to facilitate content comprehension, browsing, and delivery; multi-modal features play an important role in differentiating the individual segments of a video. In this paper, we present a method that combines visual and semantic features. Rather than using domain-specific or heuristic textual features as semantic features, we assign semantic concepts to video segments through automatic video annotation. The semantic coherence between the accompanying text and the high-level concepts of each video segment is then exploited to characterize the importance of that segment. Visual features (e.g., motion and face), which have been widely used in summarization based on the user attention model, are integrated with the proposed semantic coherence to obtain the final summary. Experiments on a half-hour sample video from the TRECVID 2006 dataset demonstrate that semantic coherence is very helpful for video summarization when fused with different visual features.
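As a rough illustration of what fusing a semantic coherence score with visual attention scores could look like, the following minimal Python sketch combines two per-segment score lists and selects segments under a duration budget. It is not the paper's method: the abstract does not specify the fusion rule, so the linear weighting, the min-max normalization, the greedy selection, and all names (`min_max`, `fuse_scores`, `select_segments`, `semantic_weight`) are assumptions made purely for illustration.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(visual: Sequence[float],
                semantic: Sequence[float],
                semantic_weight: float = 0.5) -> List[float]:
    """Linearly fuse per-segment visual and semantic scores.

    Assumption: a simple weighted sum after min-max normalization.
    The paper's actual fusion scheme is not given in the abstract.
    """
    if len(visual) != len(semantic):
        raise ValueError("score lists must have the same length")
    v, s = min_max(visual), min_max(semantic)
    w = semantic_weight
    return [(1.0 - w) * a + w * b for a, b in zip(v, s)]


def select_segments(scores: Sequence[float],
                    durations: Sequence[float],
                    budget: float) -> List[int]:
    """Greedily keep the highest-scoring segments that fit the budget.

    Returns the chosen segment indices in temporal order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + durations[i] <= budget:
            chosen.append(i)
            used += durations[i]
    return sorted(chosen)


# Hypothetical scores for four segments (illustrative values only).
visual_scores = [0.2, 0.9, 0.4, 0.1]      # e.g. motion/face attention
semantic_scores = [0.8, 0.3, 0.9, 0.1]    # e.g. text-concept coherence
segment_durations = [10.0, 12.0, 8.0, 15.0]  # seconds

fused = fuse_scores(visual_scores, semantic_scores, semantic_weight=0.5)
summary = select_segments(fused, segment_durations, budget=20.0)
```

Setting `semantic_weight` to 0 reduces the sketch to a purely visual ranking, which gives a simple way to compare summaries with and without the semantic term.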


Keywords: semantic coherence, video summarization, multi-modal features, user attention model





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Pei Dong (1, 2)
  • Zhiyong Wang (1)
  • Li Zhuo (2)
  • Dagan Feng (1, 3)

  1. School of Information Technologies, University of Sydney, Australia
  2. Signal and Information Processing Laboratory, Beijing University of Technology, Beijing, China
  3. Dept. of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong
