Improving Video Concept Detection Using Spatio-Temporal Correlation

  • Songhao Zhu
  • Zhiwei Liang
  • Yuncai Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6297)

Abstract

Graph-based semi-supervised learning approaches have proven effective and efficient at addressing the scarcity of labeled training data in many real-world applications, such as video concept detection. However, a key component of these algorithms, the pair-wise similarity measure between samples, has not been fully investigated. Specifically, existing approaches estimate the pair-wise similarity between two samples using only the spatial properties of video data, while the temporal property, an essential characteristic of video, is not embedded into the similarity measure. Accordingly, this paper proposes a novel framework for video concept detection, called Joint Spatio-Temporal Correlation Learning (JSTCL), which simultaneously takes both the spatial and temporal properties of video data into account to improve the computation of pair-wise similarity. We apply the proposed framework to video concept detection and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
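
To make the approach described in the abstract concrete, the following is a minimal sketch (in Python with NumPy) of graph-based label propagation over a pair-wise affinity that mixes a spatial term (a Gaussian kernel over visual features) with a temporal term (a Gaussian over shot timestamps). The kernel widths sigma_s and sigma_t, the mixing weight beta, and the use of the local-and-global-consistency iteration of Zhou et al. [6] are illustrative assumptions; the exact JSTCL formulation of the spatio-temporal similarity is defined in the paper itself.

    # Minimal sketch: spatio-temporal pair-wise affinity + graph-based
    # label propagation. Kernel widths, the mixing weight `beta`, and the
    # propagation scheme are illustrative assumptions, not the exact
    # JSTCL formulation.
    import numpy as np


    def spatio_temporal_affinity(features, timestamps,
                                 sigma_s=1.0, sigma_t=5.0, beta=0.5):
        """Pair-wise similarity combining spatial (visual) and temporal cues."""
        # Spatial term: Gaussian kernel over low-level visual features.
        d_s = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
        w_spatial = np.exp(-(d_s ** 2) / (2.0 * sigma_s ** 2))

        # Temporal term: shots close in time within a video tend to share concepts.
        d_t = np.abs(timestamps[:, None] - timestamps[None, :])
        w_temporal = np.exp(-(d_t ** 2) / (2.0 * sigma_t ** 2))

        # Convex combination of the two cues (one simple choice among many).
        W = beta * w_spatial + (1.0 - beta) * w_temporal
        np.fill_diagonal(W, 0.0)
        return W


    def propagate_labels(W, y, alpha=0.99, n_iter=100):
        """Local-and-global consistency iteration: F <- alpha*S*F + (1-alpha)*Y."""
        d = W.sum(axis=1)
        d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        S = d_inv_sqrt @ W @ d_inv_sqrt  # symmetrically normalized affinity

        # y holds the initial labels: +1/-1 for labeled shots, 0 for unlabeled.
        F = y.astype(float).copy()
        for _ in range(n_iter):
            F = alpha * S @ F + (1.0 - alpha) * y
        return F


    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        features = rng.normal(size=(6, 4))                   # toy visual features for 6 shots
        timestamps = np.array([0., 1., 2., 10., 11., 12.])   # shot times (seconds)
        y = np.array([1., 0., 0., -1., 0., 0.])              # two labeled shots, rest unlabeled

        W = spatio_temporal_affinity(features, timestamps)
        print(propagate_labels(W, y))                        # concept relevance score per shot

In this toy setup, the temporal term pulls together shots from the same scene even when their visual features differ, which is the intuition behind embedding temporal correlation into the pair-wise similarity.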

Keywords

Graph-based semi-supervised learning, pair-wise similarity measure, spatio-temporal correlation

References

  1. Seeger, M.: Learning with labeled and unlabeled data. Technical Report, Edinburgh University (2001)
  2. Chapelle, O., Zien, A., Schölkopf, B.: Semi-Supervised Learning. MIT Press, Cambridge (2006)
  3. Song, Y., Hua, X., Wang, M.: Semi-automatic video annotation based on active learning with multiple complementary predictors. In: ACM International Conference on Multimedia Information Retrieval, pp. 97–104. ACM Press, Singapore (2005)
  4. Yan, R., Naphade, M.: Semi-supervised cross feature learning for semantic concept detection in videos. In: Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 657–663. IEEE Press, San Diego (2005)
  5. Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. In: Proc. International Conference on Machine Learning, pp. 912–919. Washington (2003)
  6. Zhou, D., Bousquet, O., Schölkopf, B.: Learning with local and global consistency. In: Advances in Neural Information Processing Systems, pp. 321–328. Vancouver (2003)
  7. Belkin, M., Matveeva, I., Niyogi, P.: Regularization and semi-supervised learning on large graphs. In: Annual Conference on Computational Learning Theory, pp. 624–638. Wisconsin (2004)
  8. He, J., Li, M., Zhang, C.: Generalized manifold-ranking based image retrieval. IEEE Transactions on Image Processing 15, 3170–3177 (2006)
  9. Wang, C., Jing, F., Zhang, L., Zhang, H.: Image annotation refinement using random walk with restarts. In: ACM International Conference on Multimedia, pp. 647–650. ACM Press, Augsburg (2007)
  10. Yuan, X., Hua, X., Wang, M., Wu, X.: Manifold-ranking based video concept detection on large database and feature pool. In: ACM International Conference on Multimedia, pp. 623–626. ACM Press, Augsburg (2007)
  11. Wang, M., Hua, X., Zhang, H.: Automatic video annotation by semi-supervised learning with kernel density estimation. In: ACM International Conference on Multimedia, pp. 967–976. ACM Press, Vancouver (2008)
  12. Wang, M., Mei, T., Dai, L.: Video annotation by graph-based learning with neighborhood similarity. In: ACM International Conference on Multimedia, pp. 325–328. ACM Press, Vancouver (2008)
  13. Tang, J., Hua, X., Wu, X.: Anisotropic manifold ranking for video annotation. In: IEEE International Conference on Multimedia and Expo, pp. 492–495. IEEE Press, New York (2009)
  14. Stricker, M., Orengo, M.: Similarity of color images. In: Storage and Retrieval for Image and Video Databases, pp. 381–392. San Diego (1995)
  15. Pass, G.: Comparing images using color coherence vectors. In: ACM International Conference on Multimedia, pp. 65–73. ACM Press, Seattle (1997)
  16. Kokare, M., Chatterji, B., Biswas, P.: Comparison of similarity metrics for texture image retrieval. In: IEEE International Conference on Multimedia and Expo, pp. 571–575. IEEE Press, New York (2003)
  17. Zhu, X.: Semi-supervised learning literature survey. Technical Report, University of Wisconsin-Madison (2007)
  18. TRECVID: TRECVID retrieval evaluations, http://wwwnlpir.nist.gov/projects/trecvid
  19. Chang, C., Lin, C.: LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  20. Wang, J., Zhao, Y., Wu, X., Hua, X.: Transductive multi-label learning for video concept detection. In: ACM International Conference on Multimedia, pp. 298–304. ACM Press, Vancouver (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Songhao Zhu 1
  • Zhiwei Liang 1
  • Yuncai Liu 2
  1. Nanjing University of Posts and Telecommunications, Nanjing, P.R. China
  2. Shanghai Jiao Tong University, Shanghai, P.R. China
