
On Reducing Dimensionality of Labeled Data Efficiently

  • Conference paper

In: Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

We address the problem of reducing the dimensionality of labeled data, with the objective of achieving better class separation in the latent space. Existing nonlinear algorithms rely on pairwise distances between data samples, which are generally infeasible to compute or store for large datasets. In this paper, we propose a parametric nonlinear algorithm that employs a spherical mixture model in the latent space. The proposed algorithm reduces dimensionality highly efficiently, because it only requires distances between data points and cluster centers. In our experiments, the proposed algorithm achieves up to 44 times better efficiency while maintaining comparable efficacy. In practice, it can be used to speed up k-NN classification or to visualize data points together with their class structure.



Author information


Corresponding author: Guoxi Zhang.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Zhang, G., Iwata, T., Kashima, H. (2018). On Reducing Dimensionality of Labeled Data Efficiently. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science (LNAI), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_7


  • DOI: https://doi.org/10.1007/978-3-319-93040-4_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer Science (R0)
