Learning and Utilizing a Pool of Features in Non-negative Matrix Factorization
Learning and utilizing a pool of features for given data is important for achieving better performance in data analysis. Since much real-world data can be represented as a non-negative data matrix, Non-negative Matrix Factorization (NMF) has recently become a popular way to handle data under the non-negativity constraint. However, when the number of features is increased, the constraint imposed on the features can hinder effective utilization of the learned representation. We conduct extensive experiments over document datasets to investigate how effectively several state-of-the-art NMF algorithms learn and utilize a pool of features. The experimental results reveal that coping with the non-orthogonality of features is crucial for achieving stable performance when exploiting a large number of features in NMF.
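As a rough illustration of the setting the abstract describes, the sketch below factors a small non-negative term-document matrix as V ≈ WH using the standard Lee-Seung multiplicative updates for the Frobenius-norm objective. The columns of W are the learned features. It is a minimal sketch under stated assumptions, not the authors' experimental code. The toy matrix, the helper names (`nmf`, `mean_feature_overlap`, `assign_clusters`), and the choice of mean pairwise cosine similarity between feature vectors as a gauge of non-orthogonality are all hypothetical and illustrative. Assigning each document to its largest-weight feature is a common NMF clustering convention, assumed here.

```python
import random


def matmul(A, B):
    """Plain-Python product of A (p x q) and B (q x r)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Factor non-negative V (m x n) as W (m x k) times H (k x n) with
    Lee-Seung multiplicative updates; entries stay non-negative because
    every update multiplies by a non-negative ratio."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def reconstruction_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


def mean_feature_overlap(W):
    """Hypothetical non-orthogonality gauge: mean pairwise cosine similarity
    between the feature (column) vectors of W; 0 means mutually orthogonal."""
    cols = transpose(W)
    norms = [sum(x * x for x in c) ** 0.5 for c in cols]
    total, pairs = 0.0, 0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            total += dot / (norms[i] * norms[j] + 1e-12)
            pairs += 1
    return total / pairs if pairs else 0.0


def assign_clusters(H):
    """Assign each document (column of H) to its largest-weight feature."""
    return [max(range(len(col)), key=col.__getitem__) for col in transpose(H)]


if __name__ == "__main__":
    # Toy data (made up): rows are terms, columns are documents.
    V = [[3, 2, 0, 0],
         [2, 3, 0, 0],
         [0, 0, 4, 1],
         [0, 0, 1, 4]]
    W, H = nmf(V, k=2, iters=500)
    print("clusters:", assign_clusters(H))
    print("error:", round(reconstruction_error(V, W, H), 3))
    print("feature overlap:", round(mean_feature_overlap(W), 3))
```

On this block-structured toy matrix the two learned features should separate the two groups of documents with little overlap. Raising `k` well beyond the number of underlying topics makes the extra columns of W share support, so the overlap value rises. That is one way to see, informally, why non-orthogonal features can get in the way when a large pool of features is learned.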
Keywords: Normalized Mutual Information; Document Cluster; Cluster Assignment; Imbalanced Data; Imbalanced Dataset