Multimedia Tools and Applications, Volume 73, Issue 3, pp 1507–1543

A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion

  • Xavier Sevillano
  • Francesc Alías


The existence of multiple modalities poses a challenge to the design of multimedia data clustering systems, as the unsupervised nature of the problem makes it very difficult to determine a priori whether a single modality should dominate the clustering process or whether modalities should be combined in some way. To address these indeterminacies, which come on top of those concerning the selection of the optimal clustering algorithm and data representation for the problem at hand, this work introduces robust multimedia clustering, a one-shot methodology for domain-independent multimedia data clustering based on hybrid multimodal fusion. We first justify the motivation for the proposed methodology by demonstrating experimentally the relevance of multimedia clustering indeterminacies. Subsequently, as a proof of concept, a specific multimedia clustering system based on the requirements of the methodology is implemented and evaluated on three multimedia clustering applications (music genres, photographic topics and audio-visual objects classification), analyzing the quality of the obtained partitions and the time complexity of the proposal. The experimental results reveal that the implemented system, which includes a self-refining consensus clustering procedure for attaining high levels of robustness, obtains, in a fully unsupervised manner, partitions of better quality than 93 % of the clusterers available in our experiments; it even improves on the quality of the best ones and outperforms state-of-the-art alternatives.
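
As a rough illustration of the consensus idea mentioned above, the sketch below combines partitions obtained from different modalities (and from a fused representation) through a simple co-association, or evidence accumulation, consensus. This is not the system evaluated in the paper, and the self-refining procedure is not reproduced here. The function names, the 0.5 threshold and the toy label vectors are all assumptions made for the example.

```python
from itertools import combinations


def co_association(partitions):
    """Return the matrix whose (i, j) entry is the fraction of
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Link every pair of objects co-clustered in more than `threshold`
    of the partitions, then label the connected components.
    The threshold value is an illustrative assumption."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical partitions of six objects (toy data, not from the paper).
audio_only = [0, 0, 0, 1, 1, 1]
visual_only = [0, 0, 1, 1, 1, 1]    # disagrees on object 2
early_fusion = [1, 1, 1, 0, 0, 0]   # same grouping as audio, labels permuted

labels = consensus([audio_only, visual_only, early_fusion])
print(labels)
```

Because the consensus works on pairwise co-membership rather than on raw label values, the permuted labels of the fused partition do not matter, and the single disagreement in the visual partition is outvoted by the other two.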


Keywords: Robust multimedia clustering; Hybrid multimodal fusion; Cluster ensembles; Self-refining consensus; Clustering indeterminacies



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Grup de Recerca en Tecnologies Mèdia, La Salle - Universitat Ramon Llull, Barcelona, Spain