Abstract
Traditional classification assumes that each data sample belongs to exactly one of a set of predefined classes. In a multi-labeled problem such as text categorization, by contrast, a data sample can belong to multiple classes, and the task is to output the set of class labels associated with a new, unseen data sample. As is common in text categorization, learning a classifier in a high-dimensional space can be difficult, a problem known as the curse of dimensionality. It has been shown that performing dimension reduction as a preprocessing step can improve classification performance greatly. In particular, linear discriminant analysis (LDA) is one of the most popular dimension reduction methods and is optimized for classification tasks. However, applying LDA to a multi-labeled problem raises some ambiguities and difficulties. In this paper, we study the application of LDA to multi-labeled problems and analyze how the objective function of LDA can be interpreted in a multi-labeled setting. We also propose an LDA algorithm that is effective for multi-labeled problems. Experimental results demonstrate that, by taking the multi-labeled structure into account, LDA can achieve computational efficiency and also improve classification performance greatly.
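To make the ambiguity concrete, the following is a minimal sketch of one possible way to compute LDA scatter matrices when samples carry several labels. It is not the algorithm proposed in this paper, which the abstract does not specify. It assumes that a multi-labeled sample is counted once in every class it belongs to, and the function name `multilabel_lda`, the indicator matrix `Y`, and the regularization parameter `reg` are hypothetical choices made only for illustration.

```python
import numpy as np

def multilabel_lda(X, Y, n_components, reg=1e-6):
    """Illustrative LDA projection for multi-labeled data.

    X: (n_samples, n_features) data matrix.
    Y: (n_samples, n_classes) binary indicator matrix; Y[i, k] == 1
       means sample i carries label k.
    Assumption (not taken from the paper): a sample with several labels
    contributes to every class it belongs to. Every class is assumed to
    have at least one member.
    """
    n_features = X.shape[1]
    counts = Y.sum(axis=0)                      # members per class
    class_means = (Y.T @ X) / counts[:, None]   # one mean per class
    # Global mean over all (sample, label) pairs, so a sample with two
    # labels is weighted twice, consistently with the class means.
    global_mean = (counts @ class_means) / counts.sum()

    Sw = np.zeros((n_features, n_features))     # within-class scatter
    Sb = np.zeros((n_features, n_features))     # between-class scatter
    for k in range(Y.shape[1]):
        centered = X[Y[:, k] == 1] - class_means[k]
        Sw += centered.T @ centered
        diff = (class_means[k] - global_mean)[:, None]
        Sb += counts[k] * (diff @ diff.T)

    # A small ridge term keeps Sw positive definite.
    Sw += reg * np.eye(n_features)

    # Solve Sb w = lambda Sw w through a Cholesky factor of Sw, which
    # turns it into an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(Sw)
    A = np.linalg.solve(L, Sb)                  # L^{-1} Sb
    M = np.linalg.solve(L, A.T)                 # L^{-1} Sb L^{-T}
    M = (M + M.T) / 2.0                         # remove round-off asymmetry
    eigvals, V = np.linalg.eigh(M)              # ascending eigenvalues
    top = V[:, ::-1][:, :n_components]          # largest eigenvalues first
    return np.linalg.solve(L.T, top)            # map back: w = L^{-T} v

# Usage on synthetic data: 12 samples, 4 features, 3 classes,
# with several samples carrying two labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
Y = np.array([
    [1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 1],
    [1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1],
])
W = multilabel_lda(X, Y, n_components=2)
Z = X @ W   # reduced representation, one row per sample
```

Counting a sample in each of its classes is only one reading of the LDA objective in a multi-labeled setting. Treating each distinct label combination as its own class is another, and the two generally give different scatter matrices.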
This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-331-D00510).
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Lee, M., Park, C.H. (2007). On Applying Dimension Reduction for Multi-labeled Problems. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_11
DOI: https://doi.org/10.1007/978-3-540-73499-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4