Investigation of Activation Functions for Generalized Learning Vector Quantization

  • Conference paper
  • In: Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization (WSOM 2019)
  • Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 976)

Abstract

An appropriate choice of the activation function plays an important role for the performance of (deep) multilayer perceptrons (MLP) in classification and regression learning. Usually, these activations are applied to all perceptron units in the network. A powerful alternative to MLPs is the family of prototype-based classification learning methods such as (generalized) learning vector quantization (GLVQ). These models also involve activation functions, but they are applied to the so-called classifier function instead. In this paper, we investigate whether successful candidates of activation functions for MLPs also perform well for GLVQ. For this purpose, we show that the GLVQ classifier function can itself be interpreted as a generalized perceptron.
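
To make the perceptron view of GLVQ concrete, the following minimal sketch (Python/NumPy, with hypothetical names; not the authors' implementation) computes the standard GLVQ classifier function \(\mu \left( x\right) =\left( d^{+}-d^{-}\right) /\left( d^{+}+d^{-}\right) \) and applies an activation to it, which is exactly the point at which the paper compares different activation candidates.

```python
import numpy as np

def sigmoid(mu, beta=1.0):
    # a common GLVQ activation choice; the paper studies MLP-style
    # candidates (e.g. ReLU- or swish-like functions) in its place
    return 1.0 / (1.0 + np.exp(-beta * mu))

def glvq_classifier(x, prototypes, proto_labels, true_label, activation=sigmoid):
    """Activated GLVQ classifier value for a single sample x.

    mu(x) = (d_plus - d_minus) / (d_plus + d_minus) lies in [-1, 1];
    negative values indicate a correct classification.  The activation
    plays the role of the perceptron transfer function.
    """
    d = np.sum((prototypes - x) ** 2, axis=1)      # squared Euclidean distances
    d_plus = d[proto_labels == true_label].min()   # closest prototype of the correct class
    d_minus = d[proto_labels != true_label].min()  # closest prototype of any other class
    mu = (d_plus - d_minus) / (d_plus + d_minus)
    return activation(mu)
```

Training then minimizes the sum of these activated classifier values over the training data by (stochastic) gradient descent with respect to the prototypes, which is why the shape and derivative of the activation matter.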


Notes

  1. The derivative of the maximum function \(m\left( x,\beta \right) \) could be approximated using the quasi-max function \(\mathcal {Q}_{\alpha }\left( x,\beta \right) =\frac{1}{\alpha }\log \left( e^{\alpha x}+e^{\alpha \cdot \text {sgd}\left( x,\beta \right) }\right) \) proposed by J.D. Cook [23] with \(\alpha \gg 0\). The respective consistent derivative approximation is

    $$\begin{aligned} \frac{d\,m\left( x,\beta \right) }{dx}\approx \frac{\left( \exp \left( \alpha x\right) +\frac{d\text {sgd}\left( x,\beta \right) }{dx}\cdot \exp \left( \alpha \cdot \text {sgd}\left( x,\beta \right) \right) \right) }{\left( \exp \left( \alpha x\right) +\exp \left( \alpha \cdot \text {sgd}\left( x,\beta \right) \right) \right) } \end{aligned}$$
    (11)

    as provided in [24]. Analogously, the quasi-max approximation \(m_{\tau }\left( x,\beta \right) \approx \frac{1}{\alpha }\log \left( \exp \left( \alpha x\right) +\exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) \right) \) is valid with

    $$\begin{aligned} \frac{d\,m_{\tau }\left( x,\beta \right) }{dx}\approx \frac{\left( \exp \left( \alpha x\right) +\frac{d\,\tau \left( x,\beta \right) }{dx}\cdot \exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) \right) }{\left( \exp \left( \alpha x\right) +\exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) \right) } \end{aligned}$$
    (12)

    as the derivative approximation; a numerical sketch of both approximations is given after these notes.

  2. The Tecator data set is available at StatLib: http://lib.stat.cmu.edu/datasets/tecator.

  3. The data set can be found at www.ehu.es/ccwintco/uploads/2/22/Indian_pines.mat.
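
The following is a minimal numerical sketch (Python/NumPy, not the authors' implementation) of the quasi-max approximation and the derivative approximation (11). The sigmoid definition \(\text {sgd}\left( x,\beta \right) =1/\left( 1+e^{-\beta x}\right) \) and the parameter values are assumptions; the derivative is rewritten as a convex combination purely to avoid overflow for large \(\alpha \).

```python
import numpy as np

def sgd(x, beta=1.0):
    # assumed sigmoid sgd(x, beta) = 1 / (1 + exp(-beta * x))
    return 1.0 / (1.0 + np.exp(-beta * x))

def sgd_prime(x, beta=1.0):
    # derivative of the sigmoid above: beta * s * (1 - s)
    s = sgd(x, beta)
    return beta * s * (1.0 - s)

def quasi_max(x, beta=1.0, alpha=20.0):
    # Q_alpha(x, beta) = (1/alpha) * log(exp(alpha*x) + exp(alpha*sgd(x, beta)));
    # np.logaddexp evaluates the log-sum-exp stably for large alpha
    return np.logaddexp(alpha * x, alpha * sgd(x, beta)) / alpha

def quasi_max_derivative(x, beta=1.0, alpha=20.0):
    # Eq. (11) rewritten as w * 1 + (1 - w) * sgd'(x), where
    # w = exp(alpha*x) / (exp(alpha*x) + exp(alpha*sgd(x, beta)))
    s = sgd(x, beta)
    w = 1.0 / (1.0 + np.exp(-alpha * (x - s)))
    return w + (1.0 - w) * sgd_prime(x, beta)
```

The swish-type variant in (12) follows the same pattern, with sgd and its derivative replaced by \(\tau \left( x,\beta \right) \) and \(d\,\tau \left( x,\beta \right) /dx\).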

References

  1. Kohonen T (1988) Learning vector quantization. Neural Netw 1(Suppl 1):303

  2. Villmann T, Saralajew S, Villmann A, Kaden M (2018) Learning vector quantization methods for interpretable classification learning and multilayer networks. In: Sabourin C, Merelo JJ, Barranco AL, Madani K, Warwick K (eds) Proceedings of the 10th international joint conference on computational intelligence (IJCCI), Sevilla. SCITEPRESS - Science and Technology Publications, Lda., Lisbon, pp 15–21. ISBN 978-989-758-327-8

  3. Sato A, Yamada K (1996) Generalized learning vector quantization. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems 8, Proceedings of the 1995 conference. MIT Press, Cambridge, pp 423–429

  4. Crammer K, Gilad-Bachrach R, Navot A, Tishby N (2003) Margin analysis of the LVQ algorithm. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems (Proceedings of NIPS 2002), vol 15. MIT Press, Cambridge, pp 462–469

  5. Schneider P, Hammer B, Biehl M (2009) Adaptive relevance matrices in learning vector quantization. Neural Comput 21:3532–3561

  6. de Vries H, Memisevic R, Courville A (2016) Deep learning vector quantization. In: Verleysen M (ed) Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2016), Louvain-La-Neuve, Belgium, pp 503–508. i6doc.com

  7. Villmann T, Biehl M, Villmann A, Saralajew S (2017) Fusion of deep learning architectures, multilayer feedforward networks and learning vector quantizers for deep classification learning. In: Proceedings of the 12th workshop on self-organizing maps and learning vector quantization (WSOM2017+). IEEE Press, pp 248–255

  8. Kohonen T (1995) Self-organizing maps, vol 30. Springer series in information sciences. Springer, Heidelberg (Second Extended Edition 1997)

  9. Haykin S (1994) Neural networks. A comprehensive foundation. Macmillan, New York

  10. Hertz JA, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation, vol 1. Santa Fe institute studies in the sciences of complexity: lecture notes. Addison-Wesley, Redwood City

  11. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  12. Ramachandran P, Zoph B, Le QV (2018) Swish: a self-gated activation function. Technical report arXiv:1710.05941v2, Google Brain

  13. Ramachandran P, Zoph B, Le QV (2018) Searching for activation functions. Technical report arXiv:1710.05941v1, Google Brain

  14. Eger S, Youssef P, Gurevych I (2018) Is it time to swish? Comparing deep learning activation functions across NLP tasks. In: Proceedings of the 2018 conference on empirical methods in natural language processing (EMNLP), Brussels, Belgium. Association for Computational Linguistics, pp 4415–4424

  15. Chieng HH, Wahid N, Pauline O, Perla SRK (2018) Flatten-T swish: a thresholded ReLU-swish-like activation function for deep learning. Int J Adv Intell Inform 4(2):76–86

  16. Kaden M, Riedel M, Hermann W, Villmann T (2015) Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines. Soft Comput 19(9):2423–2434

  17. Fletcher R (1987) Practical methods of optimization, 2nd edn. Wiley, New York. 2000 edition

  18. LeKander M, Biehl M, de Vries H (2017) Empirical evaluation of gradient methods for matrix learning vector quantization. In: Proceedings of the 12th workshop on self-organizing maps and learning vector quantization (WSOM2017+). IEEE Press, pp 1–8

  19. Hammer B, Villmann T (2002) Generalized relevance learning vector quantization. Neural Netw 15(8–9):1059–1068

  20. Saralajew S, Holdijk L, Rees M, Kaden M, Villmann T (2018) Prototype-based neural network layers: incorporating vector quantization. Mach Learn Rep 12(MLR-03-2018):1–17. ISSN: 1865-3960, http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_03_2018.pdf

  21. Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11

  22. Zhang H, Weng T-W, Chen P-Y, Hsieh C-J, Daniel L (2018) Efficient neural network robustness certification with general activation functions. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., New York, pp 4944–4953

  23. Cook J (2011) Basic properties of the soft maximum. Working paper series 70, UT MD Anderson cancer center department of biostatistics. http://biostats.bepress.com/mdandersonbiostat/paper70

  24. Lange M, Villmann T (2013) Derivatives of \(l_p\)-norms and their approximations. Mach Learn Rep 7(MLR-04-2013):43–59. ISSN: 1865-3960. http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_04_2013.pdf

  25. Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML workshop on deep learning for audio, speech, and language processing, Proceedings of machine learning research, vol 28

  26. Krier C, Rossi F, François D, Verleysen M (2008) A data-driven functional projection approach for the selection of feature ranges in spectra with ICA or cluster analysis. Chemometr Intell Lab Syst 91(1):43–53

  27. Landgrebe DA (2003) Signal theory methods in multispectral remote sensing. Wiley, Hoboken

  28. Asuncion A, Newman DJ. UC Irvine machine learning repository. http://archive.ics.uci.edu/ml/

Author information

Corresponding author

Correspondence to Thomas Villmann.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Villmann, T., Ravichandran, J., Villmann, A., Nebel, D., Kaden, M. (2020). Investigation of Activation Functions for Generalized Learning Vector Quantization. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J. (eds) Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization. WSOM 2019. Advances in Intelligent Systems and Computing, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-19642-4_18
