A Matrix Factorization Approach for Integrating Multiple Data Views

  • Derek Greene
  • Pádraig Cunningham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

In many domains there will exist different representations or “views” describing the same set of objects. Taken alone, these views will often be deficient or incomplete. Therefore a key problem for exploratory data analysis is the integration of multiple views to discover the underlying structures in a domain. This problem is made more difficult when disagreement exists between views. We introduce a new unsupervised algorithm for combining information from related views, using a late integration strategy. Combination is performed by applying an approach based on matrix factorization to group related clusters produced on individual views. This yields a projection of the original clusters in the form of a new set of “meta-clusters” covering the entire domain. We also provide a novel model selection strategy for identifying the correct number of meta-clusters. Evaluations performed on a number of multi-view text clustering problems demonstrate the effectiveness of the algorithm.
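The late integration idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes each view has already been clustered, stacks the binary cluster memberships from all views into one non-negative matrix (rows are base clusters, columns are objects), and factorizes that matrix with the standard Lee and Seung multiplicative updates. The helper names (`transpose`, `matmul`, `factorize`), the toy data, the fixed number of meta-clusters, and the choice to initialise from the first view's clusters are all assumptions made for illustration; the model selection strategy mentioned above is not reproduced.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def factorize(x, h, iters=200, eps=1e-9):
    """Approximate non-negative x (l x n) by p (l x k) times h (k x n)
    using multiplicative updates that reduce squared error.
    h is the initial k x n factor; p starts uniform (an assumption)."""
    k, n = len(h), len(h[0])
    p = [[0.5] * k for _ in x]
    for _ in range(iters):
        # Update h with p held fixed.
        pt = transpose(p)
        num, den = matmul(pt, x), matmul(matmul(pt, p), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # Update p with h held fixed.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(p, matmul(h, ht))
        p = [[p[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(x))]
    return p, h

# Toy data (hypothetical): six objects, two incomplete views.
# View A has no information about object 5; view B has none about object 0.
view_a = [[1, 1, 1, 0, 0, 0],   # base cluster A1
          [0, 0, 0, 1, 1, 0]]   # base cluster A2
view_b = [[0, 1, 1, 0, 0, 0],   # base cluster B1
          [0, 0, 0, 1, 1, 1]]   # base cluster B2

x = view_a + view_b                                  # 4 base clusters x 6 objects
h0 = [[v + 0.1 for v in row] for row in view_a]      # illustrative initialisation
p, h = factorize(x, h0)

# Each object goes to the meta-cluster with the largest weight in its column of h;
# each base cluster goes to the meta-cluster with the largest weight in its row of p.
labels = [max(range(len(h)), key=lambda f: h[f][j]) for j in range(len(x[0]))]
cluster_meta = [max(range(len(h)), key=lambda f: row[f]) for row in p]
print(labels, cluster_meta)
```

In this toy case neither view covers all six objects, yet every object receives a meta-cluster label, because the factorization groups base clusters from both views that overlap on shared objects.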

Keywords

Base Clusterings, Individual View, View Versus, Cumulative Vote, 3sources Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Derek Greene (1)
  • Pádraig Cunningham (1)
  1. School of Computer Science & Informatics, University College Dublin, Republic of Ireland
