Modelling multilevel data in multimedia: A hierarchical factor analysis approach

Multimedia Tools and Applications

Abstract

Multimedia content understanding research requires a rigorous approach to deal with the complexity of the data. At the crux of this problem is how to handle multilevel data, whose structure exists at multiple scales and across data sources. A common example is modeling tags jointly with images to improve retrieval, classification and tag recommendation. Associated contextual observations, such as metadata, are rich and can be exploited for content analysis. A major challenge is the need for a principled approach to systematically incorporate associated media with the primary data source of interest. Taking a factor modeling approach, we propose a framework that can discover low-dimensional structures for a primary data source together with other associated information. We cast this task as a subspace learning problem under the framework of Bayesian nonparametrics, so the subspace dimensionality and the number of clusters are learnt automatically from data instead of being set a priori. Using Beta processes as the building block, we construct random measures in a hierarchical structure to generate multiple data sources and capture their shared statistical structure at the same time. The model parameters are inferred efficiently using a novel combination of Gibbs and slice sampling. We demonstrate the applicability of the proposed model in three applications: image retrieval, automatic tag recommendation and image classification. Experiments using two real-world datasets show that our approach outperforms various state-of-the-art related methods.

Notes

  1. e.g. a discrete space of count data, \(\{0, 1, 2, \ldots\}^{M\times 1}\), etc.

  2. The \(k\)-th factor is considered an active factor if \(\mathbf {Z}_{j}^{i^{\prime },k}\) is 1 for some \(i^{\prime }\) and \(j\).

  3. Cosine similarity is preferred as it is invariant to scaling.

  4. A similar dataset has been used before in [8].

  5. The F1@N values for the baseline methods are taken from [6] for reference.
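The scale invariance mentioned in note 3 can be checked numerically. The following is a minimal sketch; the vectors are made-up values, not subspace representations from the paper, and the function name is only illustrative.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical representations of a query and a database item.
query = [1.0, 2.0, 0.5]
item = [0.5, 1.5, 1.0]

# Multiplying the item by any positive constant leaves the similarity unchanged.
scaled_item = [10.0 * v for v in item]
sim_original = cosine_similarity(query, item)
sim_scaled = cosine_similarity(query, scaled_item)
```

Here `sim_original` and `sim_scaled` agree up to floating-point rounding, whereas a Euclidean distance between `query` and the two items would differ.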

References

  1. Barnard K, Duygulu P, Forsyth D, De Freitas N, Blei DM, Jordan MI (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135

  2. Blei DM, Jordan MI (2003) Modeling annotated data. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 127–134

  3. Blei DM, McAuliffe JD (2007) Supervised topic models. In: Advances in neural information processing systems

  4. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022

  5. Cao B, Pan S, Zhang Y, Yeung D, Yang Q (2010) Adaptive transfer learning. In: Proceedings of the 24th AAAI conference on artificial intelligence

  6. Chen N, Zhu J, Sun F, Xing E (2012) Large-margin predictive latent subspace learning for multi-view data analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence

  7. Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: A real-world web image database from national university of Singapore. CIVR:48:1–48:9

  8. Dunson D, Park J (2008) Kernel stick-breaking processes. Biometrika 95(2):307–323

  9. Ferguson T (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1(2):209–230

  10. Ferrari V, Tuytelaars T, Van Gool L (2004) Integrating multiple model views for object recognition. In: Computer vision and pattern recognition. IEEE, pp 105–112

  11. Gilks W, Richardson S, Spiegelhalter D (1995) Markov chain Monte Carlo in practice: Interdisciplinary statistics, vol 2. Chapman & Hall/CRC, London, UK

  12. Gupta S, Phung D, Venkatesh S (2012) A Bayesian nonparametric joint factor model for learning shared and individual subspaces from multiple data sources. In: Proceedings of 12th SIAM international conference on data mining, pp 200–211

  13. Gupta S, Phung D, Venkatesh S (2012) A slice sampler for restricted hierarchical beta process with applications to shared subspace learning. In: Uncertainty in artificial intelligence, pp 316–325

  14. Hjort N (1990) Nonparametric Bayes estimators based on beta processes in models for life history data. Ann Stat 18(3):1259–1294

  15. Nikolopoulos S, Zafeiriou S, Patras I, Kompatsiaris I (2012) High order plsa for indexing tagged images. Signal Process

  16. Oliva A, Torralba A (2001) Modeling the shape of the scene: A holistic representation of the spatial envelope. Int’l J Comput Vis 42(3):145–175

  17. Rasiwasia N, Costa Pereira J, Coviello E, Doyle G, Lanckriet GR, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: Proceedings of the international conference on multimedia. ACM, pp 251–260

  18. Russell BC, Torralba A, Murphy KP, Freeman WT (2008) Labelme: A database and web-based tool for image annotation. Int J Comput Vis 77(1):157–173

  19. Shen Y, Fan J (2010) Leveraging loosely-tagged images and inter-object correlations for tag recommendation. In: Proceedings of the international conference on multimedia. ACM, pp 5–14

  20. Teh Y, Görür D, Ghahramani Z (2007) Stick-breaking construction for the Indian buffet process. J Mach Learn Res- Proc Track 2:556–563

  21. Thibaux R, Jordan M (2007) Hierarchical beta processes and the Indian buffet process. J Mach Learn Res- Proc Track 2:564–571

  22. Vidal R (2011) Subspace clustering. IEEE Signal Proc Mag 28(2):52–68

  23. Wang C, Blei D, Li F (2009) Simultaneous image classification and annotation. In: Computer vision and pattern recognition. IEEE, pp 1903–1910

  24. Wu X, Zhang L, Yu Y (2006) Exploring social annotations for the semantic web. In: Proceedings of the international conference on world wide web. ACM, pp 417–426

  25. Xing E, Yan R, Hauptmann A (2005) Mining associated text and images with dual-wing harmoniums. Uncertainty in artificial intelligence, pp 633–641

  26. Yang J, Liu Y, Ping E, Hauptmann A (2007) Harmonium models for semantic video representation and classification. SIAM Conference on Data Mining, pp 1–12

Author information

Corresponding author

Correspondence to Dinh Phung.

Appendix

Sampling \(u_{i}\)

This section provides details for sampling \(u_{i}\) when the primary medium has real-valued features and the associated media have categorical features. The primary and associated media features are modeled using Gaussian and categorical distributions, respectively. This is the model assumed throughout our experiments on real datasets.

Because a Dirichlet process prior is used, the number of groups \(J\) can grow or shrink with the data. For the existing groups, i.e. for \(u_{i}=1,\ldots,J\), the conditional Gibbs posterior of \(u_{i}\) is given as

$$\begin{array}{@{}rcl@{}} &&p\left(u_{i}\mid\ldots\right)\propto\left\{ \frac{{\Pi}_{\{l\mid\mathbf{C}^{l,i}>0\}}{\Pi}_{t=0}^{d_{li}-1}\left(a_{\psi}+n_{u_{i}}^{-i,l}+t\right)}{{\Pi}_{t=0}^{n_{c_{i}}-1}\left[{\sum}_{\{l\mid\mathbf{C}^{l,i}>0\}}\left(a_{\psi}+n_{u_{i}}^{-i,l}\right)+t\right]}\right\}\\ &&{\kern3.4pc} \times\left\{ \frac{n_{u_{i}}^{-i}}{\xi_{0}+N-1}\delta_{\psi_{u_{i}}}\right\} p\left(\mathbf{Z}^{i,:}\mid\mathbf{Z}^{-i,u_{i}},\beta\right)p\left(\mathbf{W}^{i,:}\mid\mathbf{W}^{-i,u_{i}}\right) \end{array} $$
(28)

where \(a_{\psi}\) comes from \(\psi_{j}\sim\text{Dirichlet}\left(a_{\psi}\right)\). The number \(d_{li}\) is the frequency of the \(l\)-th feature and \(n_{c_{i}}\) is the total number of features in the \(i\)-th item of the associated medium.

For a new group, i.e. when \(u_{i}=J+1\), the conditional Gibbs posterior of \(u_{i}\) is given as

$$ p\left(u_{i}\mid\ldots\right)\propto\left\{ \left(a_{\psi}\right)^{n_{c_{i}}}\frac{\Gamma\left(La_{\psi}\right)}{\Gamma\left(La_{\psi}+n_{c_{i}}\right)}\right\} \times\left\{ \frac{\xi_{0}}{\xi_{0}+N-1}\right\} \times p\left(\mathbf{Z}^{i,:}\mid\mathbf{Z}^{-i,u_{i}},\beta\right)p\left(\mathbf{W}^{i,:}\mid\mathbf{W}^{-i,u_{i}}\right) $$
(29)
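As a rough illustration of how the Dirichlet process prior factors in (28) and (29) combine with the likelihood terms, the sketch below computes the prior weights \(n_{u_{i}}^{-i}/(\xi_{0}+N-1)\) for existing groups and \(\xi_{0}/(\xi_{0}+N-1)\) for a new group, then normalizes them against per-group log-likelihoods. The function names, the example group sizes and the log-likelihood values are hypothetical; the sketch also assumes every item other than \(i\) is already assigned, so that \(N-1\) equals the sum of the group sizes.

```python
import math


def group_prior_weights(group_sizes_excl_i, xi0):
    """Prior weights for u_i: one per existing group, plus one for a new group.

    group_sizes_excl_i[j] plays the role of n_{u_i}^{-i} for group j,
    i.e. the group size with item i removed (assumed >= 1, since empty
    groups are dropped). xi0 is the concentration parameter.
    """
    n_minus_one = sum(group_sizes_excl_i)   # N - 1 under the stated assumption
    denom = xi0 + n_minus_one
    weights = [n / denom for n in group_sizes_excl_i]
    weights.append(xi0 / denom)             # weight of opening group J + 1
    return weights


def normalise_posterior(prior_weights, log_likelihoods):
    """Combine prior weights with log-likelihood terms and normalise stably."""
    log_post = [math.log(w) + ll for w, ll in zip(prior_weights, log_likelihoods)]
    shift = max(log_post)                   # log-sum-exp shift to avoid underflow
    unnorm = [math.exp(v - shift) for v in log_post]
    total = sum(unnorm)
    return [v / total for v in unnorm]


# Two existing groups of sizes 3 and 1 (item i excluded), concentration 1.0.
weights = group_prior_weights([3, 1], xi0=1.0)

# Made-up log-likelihood values standing in for the remaining factors.
probs = normalise_posterior(weights, [-120.5, -118.0, -125.0])
```

A group index for \(u_{i}\) can then be drawn from `probs`, for example with `random.choices(range(len(probs)), weights=probs)`.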

In the above expressions, the predictive distribution of \(\mathbf{Z}^{i,:}\) is given as

$$ p\left(\mathbf{Z}^{i,:}\mid\mathbf{Z}^{-i,u_{i}},\beta\right)={\Pi}_{k=1}^{K^{\dagger}}\frac{\Gamma\left(\alpha_{u_{i}}\right){\Gamma}\left(\alpha_{u_{i}}\beta_{(k)}+f_{u_{i}}^{k}\right){\Gamma}\left(\alpha_{u_{i}}\bar{\beta}_{(k)}+N_{u_{i}}-f_{u_{i}}^{k}\right)}{\Gamma\left(\alpha_{u_{i}}+N_{u_{i}}\right){\Gamma}\left(\alpha_{u_{i}}\beta_{(k)}\right){\Gamma}\left(\alpha_{u_{i}}\bar{\beta}_{(k)}\right)} $$
(30)

where \(f_{u_{i}}^{k}\triangleq \sum \limits _{i^{\prime }}\mathbf {Z}_{u_{i}}^{i^{\prime },k}\). The predictive distribution of \(\mathbf{W}^{i,:}\) is given as

$$ p\left(\mathbf{W}^{i,:}\mid\mathbf{W}^{-i,u_{i}}\right)=\left(2\pi\right)^{-\frac{K^{\dagger}}{2}}\frac{D\left(s_{u_{i}}^{-i},m_{u_{i}}^{-i},\nu_{u_{i}}^{-i},{\Delta}_{u_{i}}^{-i}\right)}{D\left(s_{0},m_{0},\nu_{0},{\Delta}_{0}\right)} $$
(31)

where \(D\) denotes the normalization constant of the Normal-Wishart distribution, while \(s\), \(m\), \(\nu\) and \(\Delta\) are the parameters characterizing the distribution. Using the parameter set \(\left(s_{0},m_{0},\nu_{0},{\Delta}_{0}\right)\) for the prior distribution, the posterior set \(\left (s_{u_{i}}^{-i},m_{u_{i}}^{-i},\nu _{u_{i}}^{-i},{\Delta }_{u_{i}}^{-i}\right )\) is given as \(s_{u_{i}}^{-i}=s_{0}+N_{u_{i}}^{-i}\), \(m_{u_{i}}^{-i}=\frac {s_{0}m_{0}+{\sum }_{i^{\prime }\in S_{-i}}\mathbf {W}^{i^{\prime },:}}{s_{0}+N_{u_{i}}^{-i}}\), \(\nu _{u_{i}}^{-i}=\nu _{0}+N_{u_{i}}^{-i}\) and \({\Delta }_{u_{i}}^{-i}={\Delta }_{0}+\mathbf {W}^{-i,u_{i}}\left (\mathbf {W}^{-i,u_{i}}\right )^{\mathsf {T}}+s_{0}m_{0}\left (m_{0}\right )^{\mathsf {T}}-s_{u_{i}}^{-i}m_{u_{i}}^{-i}\left (m_{u_{i}}^{-i}\right )^{\mathsf {T}}\), where \(S_{-i}\triangleq \left \{ i^{\prime }|u_{i^{\prime }}=u_{i},i^{\prime }\neq i\right \} \).
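The two predictive terms above can be sketched numerically as follows. This is an illustrative sketch rather than the authors' implementation: the first function evaluates the logarithm of the Gamma-function product in (30), assuming \(\bar{\beta}_{(k)}=1-\beta_{(k)}\) (not stated in this appendix); the second computes the posterior Normal-Wishart parameters listed above, assuming the rows of the input array are the vectors \(\mathbf{W}^{i^{\prime},:}\) for \(i^{\prime}\in S_{-i}\). The normalization constant \(D\) in (31) is not implemented, and all function and argument names are hypothetical.

```python
import math

import numpy as np


def log_predictive_z(alpha, beta, f, n):
    """Log of the product in (30) for one group.

    alpha: alpha_{u_i}; beta[k]: beta_(k); f[k]: f_{u_i}^k; n: N_{u_i}.
    """
    total = 0.0
    for b, fk in zip(beta, f):
        b_bar = 1.0 - b  # assumption: beta-bar is the complement of beta
        total += (math.lgamma(alpha)
                  + math.lgamma(alpha * b + fk)
                  + math.lgamma(alpha * b_bar + n - fk)
                  - math.lgamma(alpha + n)
                  - math.lgamma(alpha * b)
                  - math.lgamma(alpha * b_bar))
    return total


def normal_wishart_posterior(W_group, s0, m0, nu0, Delta0):
    """Posterior (s, m, nu, Delta) from the rows of W_group, shape (n, K)."""
    W_group = np.asarray(W_group, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    n = W_group.shape[0]                      # N_{u_i}^{-i}
    s = s0 + n
    m = (s0 * m0 + W_group.sum(axis=0)) / s
    nu = nu0 + n
    # With rows as items, W^T W is the K x K scatter term W W^T of the text.
    Delta = (np.asarray(Delta0, dtype=float)
             + W_group.T @ W_group
             + s0 * np.outer(m0, m0)
             - s * np.outer(m, m))
    return s, m, nu, Delta


# One factor, one item with the factor active: the product reduces to beta.
p_single = math.exp(log_predictive_z(2.0, [0.3], [1], 1))

# One observed row in the group, with a unit prior centred at the origin.
s, m, nu, Delta = normal_wishart_posterior(
    [[2.0, 3.0]], s0=1.0, m0=[0.0, 0.0], nu0=3.0, Delta0=np.eye(2))
```

Working in log space with `math.lgamma` avoids overflow in the Gamma functions when the group size is large.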

About this article

Cite this article

Gupta, S., Phung, D. & Venkatesh, S. Modelling multilevel data in multimedia: A hierarchical factor analysis approach. Multimed Tools Appl 75, 4933–4955 (2016). https://doi.org/10.1007/s11042-014-2394-3

  • DOI: https://doi.org/10.1007/s11042-014-2394-3
