On Applying Dimension Reduction for Multi-labeled Problems

Lee, Moonhwi; Park, Cheong Hee

doi:10.1007/978-3-540-73499-4_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition


Abstract

Traditional classification assumes that each data sample belongs to exactly one of a set of predefined classes. In a multi-labeled problem such as text categorization, by contrast, a data sample can belong to multiple classes, and the task is to output the set of class labels associated with a new, unseen data sample. As is common in text categorization, learning a classifier in a high-dimensional space can be difficult, a problem known as the curse of dimensionality. It has been shown that performing dimension reduction as a preprocessing step can greatly improve classification performance. In particular, linear discriminant analysis (LDA) is one of the most popular dimension reduction methods and is optimized for classification tasks. However, applying LDA to a multi-labeled problem raises some ambiguities and difficulties. In this paper, we study the application of LDA to multi-labeled problems and analyze how the objective function of LDA can be interpreted in a multi-labeled setting. We also propose an LDA algorithm that is effective for multi-labeled problems. Experimental results demonstrate that, by taking the multi-labeled structure into account, LDA can achieve computational efficiency and also greatly improve classification performance.
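To make the ambiguity concrete, the following is a minimal sketch of one possible way to form the LDA scatter matrices when samples carry several labels. It is not the algorithm proposed in the chapter, which the abstract does not describe. The sketch assumes that a sample counts once in every class it is labeled with (as if it were copied into each class), that every class has at least one sample, and that a small ridge term `reg` keeps the within-class scatter invertible. The function names `multilabel_scatter` and `lda_projection` are hypothetical.

```python
import numpy as np

def multilabel_scatter(X, Y):
    """Between- and within-class scatter for multi-labeled data.

    X: (n, d) data matrix; Y: (n, k) binary label indicator matrix.
    Assumption: a sample with several labels counts once in each of
    its classes. Every class must contain at least one sample.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    counts = Y.sum(axis=0)                       # class sizes, with repetition
    centroids = (Y.T @ X) / counts[:, None]      # (k, d) class centroids
    global_centroid = (counts @ centroids) / counts.sum()
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for i in range(Y.shape[1]):
        diff = (centroids[i] - global_centroid)[:, None]
        Sb += counts[i] * (diff @ diff.T)        # between-class scatter
        Xi = X[Y[:, i] > 0] - centroids[i]
        Sw += Xi.T @ Xi                          # within-class scatter
    return Sb, Sw

def lda_projection(X, Y, n_components, reg=1e-6):
    """Columns of the result span the reduced space (regularized LDA)."""
    Sb, Sw = multilabel_scatter(X, Y)
    Sw_reg = Sw + reg * np.eye(Sw.shape[0])      # ridge term (assumption)
    # Solve Sb g = lambda * Sw_reg g by whitening with a Cholesky factor.
    L = np.linalg.cholesky(Sw_reg)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ evecs[:, order]
```

Given a data matrix `X` and an indicator matrix `Y`, the reduced representation would be `X @ lda_projection(X, Y, n_components)`. Under this counting convention the between-class scatter has rank at most one less than the number of classes, so `n_components` would normally not exceed that bound.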

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-331-D00510).

Author information

Authors and Affiliations

Dept. of Computer Science and Engineering, Chungnam National University, 220 Gung-dong, Yuseong-gu, Daejeon, 305-763, Korea
Moonhwi Lee & Cheong Hee Park

Editor information

Petra Perner

About this paper

Cite this paper

Lee, M., Park, C.H. (2007). On Applying Dimension Reduction for Multi-labeled Problems. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science (LNAI), vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_11

DOI: https://doi.org/10.1007/978-3-540-73499-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)