Multiple hypergraph ranking for video concept detection

Journal of Zhejiang University SCIENCE C

Abstract

This paper tackles the problem of video concept detection with a multi-modality fusion method. Motivated by multi-view learning algorithms, we represent the multi-modality features of videos by multiple graphs, so that graph-based semi-supervised learning methods can be extended to multiple graphs to predict semantic labels for unlabeled video data. However, traditional graphs represent only homogeneous pairwise linking relations, so the high-order correlations inherent in videos, such as high-order visual similarities, are ignored. In this paper we represent heterogeneous features by multiple hypergraphs, in which high-order correlated samples can be associated with hyperedges. Furthermore, we propose the multi-hypergraph ranking (MHR) algorithm, which defines a Markov random walk on each hypergraph and then forms a mixture of Markov chains to perform transductive learning over multiple hypergraphs. In experiments on the TRECVID dataset, we construct a triple-hypergraph from visual features, textual features, and multiple labeled tags, and use the MHR framework to predict concept labels for unlabeled video shots. Experimental results show that our approach is effective.
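The abstract names the ingredients of MHR but not its formulas, so the Python sketch below shows one plausible reading under stated assumptions. Each hypergraph gets the standard hypergraph random walk (from a shot, choose an incident hyperedge with probability proportional to its weight, then a member shot uniformly); the per-hypergraph chains are combined with fixed mixture weights `alphas`; and labels are propagated by a random walk with restart to the labeled shots, controlled by a restart parameter `gamma`. The function names, the fixed weights, `gamma`, and the toy hyperedges are hypothetical illustrations, not the paper's exact MHR formulation.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Hyperedge = List[int]  # distinct vertex (video shot) indices joined by one hyperedge


def transition_matrix(n: int, edges: Sequence[Hyperedge], weights: Sequence[float]) -> Matrix:
    """Row-stochastic random walk on one hypergraph: from shot u, pick an incident
    hyperedge e with probability w(e) / d(u), then a member of e uniformly at random."""
    degree = [0.0] * n  # d(u) = total weight of the hyperedges containing u
    for e, w in zip(edges, weights):
        for u in e:
            degree[u] += w
    P = [[0.0] * n for _ in range(n)]
    for e, w in zip(edges, weights):
        for u in e:
            for v in e:
                P[u][v] += w / (degree[u] * len(e))
    for u in range(n):
        if degree[u] == 0.0:  # isolated shot: a self-loop keeps the row stochastic
            P[u][u] = 1.0
    return P


def mix_chains(chains: Sequence[Matrix], alphas: Sequence[float]) -> Matrix:
    """Convex combination of the per-hypergraph chains (fixed weights are an assumption)."""
    total = sum(alphas)
    n = len(chains[0])
    return [[sum(a * P[u][v] for a, P in zip(alphas, chains)) / total for v in range(n)]
            for u in range(n)]


def rank_shots(P: Matrix, labeled: Sequence[int], gamma: float = 0.85, iters: int = 200) -> List[float]:
    """Random walk with restart to the labeled shots: f <- gamma * P^T f + (1 - gamma) * y."""
    n = len(P)
    y = [1.0 / len(labeled) if i in labeled else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [gamma * sum(P[u][v] * f[u] for u in range(n)) + (1 - gamma) * y[v]
             for v in range(n)]
    return f


# Hypothetical toy data: 5 video shots; shot 0 is a labeled positive for one concept.
n_shots = 5
visual = transition_matrix(n_shots, [[0, 1, 2], [3, 4]], [1.0, 1.0])
textual = transition_matrix(n_shots, [[0, 1], [2, 3, 4]], [1.0, 1.0])
tags = transition_matrix(n_shots, [[0, 2], [1, 2], [3, 4]], [1.0, 1.0, 1.0])
mixed = mix_chains([visual, textual, tags], [1.0, 1.0, 1.0])
scores = rank_shots(mixed, labeled=[0])
ranking = sorted(range(n_shots), key=lambda i: -scores[i])
print(ranking, [round(s, 3) for s in scores])
```

With these toy hypergraphs, shots 1 and 2, which share hyperedges with the labeled shot 0, rank above shots 3 and 4, which are reachable only through shot 2's textual hyperedge. Because every chain is row-stochastic and the restart vector sums to one, the scores remain a probability distribution over shots. A vertex-dependent mixture weight, rather than the fixed `alphas` used here, would be a natural alternative design.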

Author information

Correspondence to Jian Shao.

Additional information

Project supported by the National Natural Science Foundation of China (Nos. 60603096 and 60673088), the National High-Tech Research and Development Program (863) of China (No. 2006AA010107), and the Program for Changjiang Scholars and Innovative Research Team in University of China (No. IRT0652).

About this article

Cite this article

Han, Yh., Shao, J., Wu, F. et al. Multiple hypergraph ranking for video concept detection. J. Zhejiang Univ. - Sci. C 11, 525–537 (2010). https://doi.org/10.1631/jzus.C0910453

  • DOI: https://doi.org/10.1631/jzus.C0910453
