The Mathematics of Divergence Based Online Learning in Vector Quantization

  • Thomas Villmann
  • Sven Haase
  • Frank-Michael Schleif
  • Barbara Hammer
  • Michael Biehl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5998)


We propose the use of divergences in gradient descent learning of supervised and unsupervised vector quantization as an alternative to the squared Euclidean distance. The approach is based on determining the Fréchet derivatives of the divergences, which can be plugged directly into the online learning rules. We provide the mathematical foundation of this framework. The framework covers the usual gradient descent learning of prototypes as well as parameter optimization and relevance learning to improve performance.
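The central idea of the abstract is to replace the squared Euclidean distance by a divergence and plug its derivative into the online update. It can be sketched for plain winner-take-all vector quantization as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. We pick the generalized Kullback-Leibler divergence for strictly positive vectors, D(v||w) = sum_i [v_i log(v_i / w_i) - v_i + w_i], whose derivative with respect to the prototype w is -v/w + 1 element-wise. For finite-dimensional vectors the Fréchet derivative reduces to this ordinary gradient. The function names (generalized_kl, online_vq_step, train), the learning rate, the toy data, and the positivity clip are hypothetical choices made only for this sketch.

```python
import numpy as np


def generalized_kl(v, w):
    """Generalized Kullback-Leibler divergence D(v||w) for strictly positive vectors."""
    return float(np.sum(v * np.log(v / w) - v + w))


def generalized_kl_grad_w(v, w):
    """Derivative of D(v||w) with respect to the prototype w, element-wise: -v/w + 1."""
    return -v / w + 1.0


def online_vq_step(v, prototypes, lr=0.02, floor=1e-8):
    """One winner-take-all online step using D(v||w) in place of ||v - w||^2.

    v          : strictly positive data vector, shape (d,)
    prototypes : strictly positive float prototypes, shape (k, d); updated in place
    returns    : index of the winning (best-matching) prototype
    """
    s = int(np.argmin([generalized_kl(v, w) for w in prototypes]))
    # Stochastic gradient descent on D(v||w_s): the divergence derivative
    # takes the place of the Euclidean term -2 (v - w_s).
    prototypes[s] -= lr * generalized_kl_grad_w(v, prototypes[s])
    # Assumption for this sketch: clip so the prototype stays strictly positive.
    np.maximum(prototypes[s], floor, out=prototypes[s])
    return s


def train(data, prototypes, epochs=20, lr=0.02, seed=0):
    """Run several shuffled passes of online updates over the data."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            online_vq_step(data[i], prototypes, lr)
    return prototypes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: two clusters of strictly positive 3-d vectors.
    a = rng.uniform(0.8, 1.0, size=(200, 3)) * np.array([1.0, 0.1, 0.1])
    b = rng.uniform(0.8, 1.0, size=(200, 3)) * np.array([0.1, 0.1, 1.0])
    data = np.vstack([a, b])
    # Deterministic start so that each prototype wins exactly one cluster.
    prototypes = np.array([[0.6, 0.3, 0.3], [0.3, 0.3, 0.6]])
    train(data, prototypes)
    print(np.round(prototypes, 2))
```

The generalized Kullback-Leibler divergence is a Bregman divergence. As a result, the expected update vanishes when a prototype equals the mean of the data it wins, so in this toy run each prototype should settle near its cluster mean (roughly [0.9, 0.09, 0.09] and [0.09, 0.09, 0.9]). Trying another divergence only requires swapping the two functions for D and its derivative.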


Keywords: vector quantization · divergence based learning · information theory · clustering · classification


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Thomas Villmann (1)
  • Sven Haase (1)
  • Frank-Michael Schleif (2)
  • Barbara Hammer (2)
  • Michael Biehl (3)

  1. Department of Mathematics/Natural Sciences/Informatics, University of Applied Sciences Mittweida, Mittweida, Germany
  2. Institute of Computer Science, Clausthal University of Technology, Clausthal-Zellerfeld, Germany
  3. Johann Bernoulli Inst. for Mathematics and Computer Science, Rijksuniversity Groningen, The Netherlands