
On Reducing Dimensionality of Labeled Data Efficiently

  • Conference paper

In: Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10939)

Abstract

We address the problem of reducing the dimensionality of labeled data, with the objective of achieving better class separation in the latent space. Existing nonlinear algorithms rely on pairwise distances between data samples, which are generally infeasible to compute or store for large datasets. In this paper, we propose a parametric nonlinear algorithm that employs a spherical mixture model in the latent space. The proposed algorithm reduces dimensionality highly efficiently, because it only requires distances between data points and cluster centers. In our experiments, the proposed algorithm achieves up to 44 times better efficiency while maintaining comparable efficacy. In practice, it can be used to speed up k-NN classification or to visualize data points together with their class structure.



Author information


Corresponding author: Guoxi Zhang.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Zhang, G., Iwata, T., Kashima, H. (2018). On Reducing Dimensionality of Labeled Data Efficiently. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science (LNAI), vol 10939. Springer, Cham. https://doi.org/10.1007/978-3-319-93040-4_7


  • DOI: https://doi.org/10.1007/978-3-319-93040-4_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93039-8

  • Online ISBN: 978-3-319-93040-4

  • eBook Packages: Computer Science (R0)
