Global Metric Learning by Gradient Descent

  • Jens Hocke
  • Thomas Martinetz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8681)


The k-NN classifier can be very competitive if an appropriate distance measure is used. It is often used in applications because its classification decisions are easy to interpret. Here, we demonstrate how to find a good Mahalanobis distance for k-NN classification by simple gradient descent without any constraints. The cost term uses global distances and, unlike in other methods, provides a soft transition in the influence of data points. The method is evaluated and compared to other metric learning and feature weighting methods on datasets from the UCI repository, where it also proves highly robust. The comparison demonstrates the advantages of global approaches.
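The exact cost term of the paper is not shown on this page. As a minimal sketch of the kind of approach the abstract describes, one might parametrize the Mahalanobis matrix as M = AᵀA and run plain, unconstrained gradient descent on A, so positive semi-definiteness holds automatically; a smooth softplus penalty on different-class pairs yields a soft transition in each point's influence. The data, margin, learning rate, and the specific loss below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not from the paper): two Gaussian classes in 2-D.
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
               rng.normal(2.5, 1.0, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

def loss_and_grad(A, X, y, margin=1.0):
    """Global cost over ALL pairs: same-class distances are pulled down,
    different-class pairs are pushed past `margin` by a smooth softplus,
    whose sigmoid derivative gives a soft transition in influence."""
    n = len(X)
    Z = X @ A.T                                        # points mapped by A
    D = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1) # d_ij = ||A(x_i - x_j)||^2
    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    diff = y[:, None] != y[None, :]

    sp = np.logaddexp(0.0, margin - D)        # softplus(margin - d_ij)
    sig = 1.0 / (1.0 + np.exp(D - margin))    # sigmoid(margin - d_ij)

    loss = (D[same].sum() + sp[diff].sum()) / n ** 2

    # dL/dd_ij: +1 on same-class pairs, -sigmoid(margin - d_ij) otherwise
    W = (same.astype(float) - sig * diff) / n ** 2
    # For symmetric W: sum_ij w_ij (x_i-x_j)(x_i-x_j)^T = 2 X^T L X (graph Laplacian),
    # and d d_ij / dA = 2 A (x_i-x_j)(x_i-x_j)^T, hence the factor 4.
    L = np.diag(W.sum(axis=1)) - W
    grad = 4.0 * A @ (X.T @ L @ X)
    return loss, grad

# Plain gradient descent on A, no PSD constraint needed: M = A^T A is PSD by construction.
A = np.eye(2)
losses = []
for _ in range(300):
    loss, grad = loss_and_grad(A, X, y)
    losses.append(loss)
    A -= 0.05 * grad

M = A.T @ A   # the learned Mahalanobis matrix
```

After such a fit, k-NN would simply be run on the transformed points `X @ A.T`; the margin and step size here are placeholders one would tune in practice.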


Keywords: Metric Learning · Feature Weighting · k-Nearest-Neighbors · Neighborhood Component Analysis · Large Margin Nearest Neighbor Classification · Relief





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jens Hocke (1)
  • Thomas Martinetz (1)
  1. University of Lübeck, Institute for Neuro- and Bioinformatics, Lübeck, Germany
