Journal of Zhejiang University SCIENCE C, Volume 11, Issue 7, pp 525–537

Multiple hypergraph ranking for video concept detection

Abstract

This paper tackles video concept detection through multi-modality fusion. Motivated by multi-view learning algorithms, the multi-modality features of videos can be represented by multiple graphs, and graph-based semi-supervised learning methods can then be extended to multiple graphs to predict semantic labels for unlabeled video data. However, traditional graphs represent only homogeneous pairwise relations, so the high-order correlations inherent in videos, such as high-order visual similarities, are ignored. In this paper we represent heterogeneous features by multiple hypergraphs, in which high-order correlated samples are associated through hyperedges. We further propose the multi-hypergraph ranking (MHR) algorithm, which defines a Markov random walk on each hypergraph and then forms a mixture of the resulting Markov chains to perform transductive learning across multiple hypergraphs. In experiments on the TRECVID dataset, a triple-hypergraph built from visual features, textual features, and multiple labeled tags is used to predict concept labels for unlabeled video shots within the MHR framework. Experimental results show that our approach is effective.
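The pipeline outlined in the abstract (a random walk per hypergraph, a mixture of the chains, then transductive ranking) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's actual formulation: the function names, the toy incidence matrices, the unit hyperedge weights, the stationary-distribution-weighted mixing rule, and the damped propagation update are all hypothetical choices drawn from common practice in hypergraph learning.

```python
# Illustrative sketch only; not the authors' implementation.

def hypergraph_walk(H, w):
    """Random walk on one hypergraph.

    H[v][e] is 1 if vertex v belongs to hyperedge e, else 0; w[e] > 0.
    Assumes every vertex lies in at least one hyperedge.
    Returns (P, pi): the transition matrix and its stationary distribution.
    """
    n, m = len(H), len(w)
    d_v = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]  # vertex degrees
    d_e = [sum(H[v][e] for v in range(n)) for e in range(m)]         # hyperedge sizes
    # Step: pick a hyperedge containing u (proportional to weight),
    # then pick a vertex uniformly inside that hyperedge.
    P = [[sum(w[e] * H[u][e] * H[v][e] / d_e[e] for e in range(m)) / d_v[u]
          for v in range(n)] for u in range(n)]
    vol = sum(d_v)
    pi = [d / vol for d in d_v]
    return P, pi


def mix_chains(walks, alphas):
    """Combine several chains; each chain's say at vertex u is
    proportional to alpha_i * pi_i(u) (an assumed mixing rule)."""
    n = len(walks[0][1])
    P = [[0.0] * n for _ in range(n)]
    for u in range(n):
        norm = sum(a * pi[u] for a, (_, pi) in zip(alphas, walks))
        for a, (P_i, pi) in zip(alphas, walks):
            beta = a * pi[u] / norm
            for v in range(n):
                P[u][v] += beta * P_i[u][v]
    return P


def rank(P, y, damping=0.9, iters=200):
    """Damped score propagation: f <- damping * P f + (1 - damping) * y."""
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [damping * sum(P[u][v] * f[v] for v in range(n))
             + (1 - damping) * y[u] for u in range(n)]
    return f


# Toy data: 4 video shots, two feature views, unit hyperedge weights.
H_visual = [[1, 0], [1, 0], [1, 1], [0, 1]]   # hyperedges {0,1,2} and {2,3}
H_text = [[1, 0], [1, 1], [0, 1], [0, 1]]     # hyperedges {0,1} and {1,2,3}
walks = [hypergraph_walk(H_visual, [1.0, 1.0]),
         hypergraph_walk(H_text, [1.0, 1.0])]
P_mix = mix_chains(walks, alphas=[0.5, 0.5])

y = [1.0, 0.0, 0.0, 0.0]                      # only shot 0 is labeled positive
scores = rank(P_mix, y)
print(sorted(range(len(scores)), key=lambda i: -scores[i]))
```

In this sketch the unlabeled shots receive scores according to how strongly they are connected to the labeled shot through shared hyperedges in either view; a third hypergraph (for example, one built from labeled tags) would simply be appended to `walks` with its own mixing weight.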

Key words

Multiple hypergraph ranking, Video concept detection, Multi-view learning, Multiple labeled tags, Clustering

CLC number

TP391 

Copyright information

© "Journal of Zhejiang University Science" Editorial Office and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Department of Computer Science and Technology, Zhejiang University, Hangzhou, China
