Abstract
We address the problem of reducing the dimensionality of labeled data, with the objective of achieving better class separation in the latent space. Existing nonlinear algorithms rely on pairwise distances between data samples, which are generally infeasible to compute or store for large datasets. In this paper, we propose a parametric nonlinear algorithm that employs a spherical mixture model in the latent space. The proposed algorithm is highly efficient because it requires only the distances between data points and cluster centers. In our experiments, it achieves up to a 44-fold speedup while maintaining comparable accuracy. In practice, it can be used to speed up k-NN classification or to visualize data points together with their class structure.
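The efficiency argument in the abstract rests on replacing the O(n²) pairwise-distance matrix with an O(nK) matrix of distances to K cluster centers. The sketch below illustrates that idea only; it is not the paper's algorithm. All names (`Z`, `C`) and the unit-variance Gaussian responsibilities are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch: with n points and K cluster centers, a
# cluster-based objective needs an (n, K) distance matrix instead
# of the (n, n) pairwise matrix used by neighbor-based methods.

rng = np.random.default_rng(0)
n, d, k = 1000, 10, 5          # samples, latent dimension, centers

Z = rng.normal(size=(n, d))    # latent embeddings (e.g. network outputs)
C = rng.normal(size=(k, d))    # cluster centers in the latent space

# Squared distances to centers: shape (n, k), not (n, n).
d2 = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)

# Soft assignments to spherical (unit-variance) mixture components,
# computed with the usual log-sum-exp stabilization.
logp = -0.5 * d2
resp = np.exp(logp - logp.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)

print(d2.shape, resp.shape)    # (1000, 5) (1000, 5)
```

For n = 1000 and K = 5 this is 5,000 distance evaluations versus roughly 500,000 pairwise ones, which is the source of the reported speedup over pairwise methods.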
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Zhang, G., Iwata, T., Kashima, H. (2018). On Reducing Dimensionality of Labeled Data Efficiently. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_7
Print ISBN: 978-3-319-93039-8
Online ISBN: 978-3-319-93040-4