Multimedia Tools and Applications, Volume 74, Issue 2, pp 577–593

Nonnegative cross-media recoding of visual-auditory content for social media analysis

  • Hong Zhang
  • Xin Xu


Cross-media semantic understanding, which deals with multimedia data of different modalities, is an increasingly active topic in social media analysis. One of its most challenging problems is how to represent multimedia data of different modalities. Most traditional work on multimedia semantic analysis relies on single-modality data sources, such as Flickr images or YouTube videos, which leaves efficient cross-media data representation an open problem. In this paper, we propose a novel nonnegative cross-media recoding approach that learns co-occurrences across cross-media feature spaces by explicitly learning a common subset of basis vectors. Moreover, we impose a nonnegativity constraint on the decomposed matrices so that the basis vectors represent important and locally meaningful features of the cross-media data. We use two typical kinds of multimedia data, images and audio, as experimental data, although the approach can be applied to a wide range of multimedia applications. Experiments on an image-audio dataset cover two applications, cross-media retrieval and data clustering. The results are encouraging and indicate that our approach is effective.
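To make the idea of a nonnegative factorization with shared structure concrete, the following is a minimal sketch and not the authors' algorithm. It assumes paired image and audio feature matrices with one column per sample, stacks them along the feature axis, and applies the standard multiplicative updates of Lee and Seung for the Frobenius objective, so that both modalities are reconstructed from one shared nonnegative code. The function name `joint_nmf`, its parameters, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def joint_nmf(X, Y, k, n_iter=200, eps=1e-9, seed=0):
    """Sketch: factorize paired nonnegative features X (d1 x n) and
    Y (d2 x n) as X ~ Wx @ H and Y ~ Wy @ H with a shared code H (k x n).
    This is an assumed simplification, not the method of the paper."""
    rng = np.random.default_rng(seed)
    V = np.vstack([X, Y])                    # stack modalities by feature
    W = rng.random((V.shape[0], k)) + eps    # nonnegative basis vectors
    H = rng.random((k, V.shape[1])) + eps    # nonnegative shared code
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    d1 = X.shape[0]
    return W[:d1], W[d1:], H

def relative_error(X, Y, Wx, Wy, H):
    """Relative Frobenius reconstruction error over both modalities."""
    V = np.vstack([X, Y])
    return np.linalg.norm(V - np.vstack([Wx, Wy]) @ H) / np.linalg.norm(V)

# Synthetic paired data: both modalities are generated from the same
# hidden nonnegative code (purely illustrative, not real features).
rng = np.random.default_rng(1)
code = rng.random((3, 20))
X_img = rng.random((6, 3)) @ code   # stand-in for image features
Y_aud = rng.random((4, 3)) @ code   # stand-in for audio features

Wx, Wy, H = joint_nmf(X_img, Y_aud, k=3)
print("relative error:", relative_error(X_img, Y_aud, Wx, Wy, H))
```

In this sketch each column of `H` recodes one image-audio pair in a common nonnegative space, so distances between columns could serve as a starting point for retrieval or clustering. How the paper itself separates shared from modality-specific basis vectors is not specified in the abstract and is not modeled here.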


Cross-media, Subspace learning, Distance metric, Data clustering



This research is supported by the National Natural Science Foundation of China (No. 61003127, No. 61373109) and the State Key Laboratory of Software Engineering (SKLSE2012-09-31).



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
  3. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
