Abstract
An appropriate choice of the activation function plays an important role in the performance of (deep) multilayer perceptrons (MLP) in classification and regression learning. Usually, these activations are applied to all perceptron units in the network. A powerful alternative to MLPs is the family of prototype-based classification learning methods such as (generalized) learning vector quantization (GLVQ). These models also involve activation functions, but there they are applied to the so-called classifier function instead. In this paper we investigate whether activation functions that are successful for MLPs also perform well for GLVQ. For this purpose, we show that the GLVQ classifier function can itself be interpreted as a generalized perceptron.
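As a brief orientation (recalled here from the standard GLVQ formulation of Sato and Yamada [3], not quoted from this paper), the classifier function to which such activations are applied is
$$\begin{aligned} \mu \left( x\right) =\frac{d^{+}\left( x\right) -d^{-}\left( x\right) }{d^{+}\left( x\right) +d^{-}\left( x\right) }\in \left[ -1,1\right] \end{aligned}$$
where \(d^{+}\left( x\right) \) and \(d^{-}\left( x\right) \) are the distances of the input \(x\) to the closest prototype of its correct class and to the closest prototype of any other class, respectively; the GLVQ cost function sums an activation \(f\) applied to \(\mu \left( x\right) \) over all training samples.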
Notes
- 1.
The derivative of the maximum function \(m\left( x,\beta \right) \) can be approximated using the quasi-max function \(\mathcal {Q}_{\alpha }\left( x,\beta \right) =\frac{1}{\alpha }\log \left( e^{\alpha x}+e^{\alpha \cdot \text {sgd}\left( x,\beta \right) }\right) \) proposed by J. D. Cook [23], with \(\alpha \gg 0\). The respective consistent derivative approximation is
$$\begin{aligned} \frac{d\,m\left( x,\beta \right) }{dx}\approx \frac{\exp \left( \alpha x\right) +\frac{d\,\text {sgd}\left( x,\beta \right) }{dx}\cdot \exp \left( \alpha \cdot \text {sgd}\left( x,\beta \right) \right) }{\exp \left( \alpha x\right) +\exp \left( \alpha \cdot \text {sgd}\left( x,\beta \right) \right) } \end{aligned}$$ (11)
as provided in [24]. Analogously, the quasi-max approximation \(m_{\tau }\left( x,\beta \right) \approx \frac{1}{\alpha }\log \left( \exp \left( \alpha x\right) +\exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) \right) \) is valid, with
$$\begin{aligned} \frac{d\,m_{\tau }\left( x,\beta \right) }{dx}\approx \frac{\exp \left( \alpha x\right) +\frac{d\,\tau \left( x,\beta \right) }{dx}\cdot \exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) }{\exp \left( \alpha x\right) +\exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) } \end{aligned}$$ (12)
as the derivative approximation (a numerical sketch of Eq. (11) follows these notes).
- 2.
The Tecator data set is available at StatLib: http://lib.stat.cmu.edu/datasets/tecator.
- 3.
The data set can be found at www.ehu.es/ccwintco/uploads/2/22/Indian_pines.mat.
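The quasi-max construction in note 1 can be checked numerically. The following is a minimal sketch, assuming (as is standard in GLVQ) that \(\text {sgd}\left( x,\beta \right) \) denotes the sigmoid \(1/\left( 1+e^{-\beta x}\right) \) and that \(m\left( x,\beta \right) =\max \left( x,\text {sgd}\left( x,\beta \right) \right) \); all helper names are hypothetical and not taken from the paper.

```python
import numpy as np

def sgd(x, beta):
    # assumed meaning of sgd(x, beta) in note 1: the sigmoid function
    return 1.0 / (1.0 + np.exp(-beta * x))

def sgd_prime(x, beta):
    # derivative of the sigmoid with respect to x
    s = sgd(x, beta)
    return beta * s * (1.0 - s)

def quasi_max(x, beta, alpha):
    # Q_alpha(x, beta) = (1/alpha) * log(exp(alpha*x) + exp(alpha*sgd(x, beta)))
    return np.log(np.exp(alpha * x) + np.exp(alpha * sgd(x, beta))) / alpha

def quasi_max_derivative(x, beta, alpha):
    # consistent derivative approximation of Eq. (11)
    ex = np.exp(alpha * x)
    es = np.exp(alpha * sgd(x, beta))
    return (ex + sgd_prime(x, beta) * es) / (ex + es)

x = np.linspace(-2.0, 2.0, 9)
beta, alpha, h = 1.0, 20.0, 1e-5

# gap between the quasi-max and the true maximum m(x, beta) = max(x, sgd(x, beta))
print(np.max(np.abs(quasi_max(x, beta, alpha) - np.maximum(x, sgd(x, beta)))))

# Eq. (11) coincides with a finite-difference derivative of the quasi-max
fd = (quasi_max(x + h, beta, alpha) - quasi_max(x - h, beta, alpha)) / (2.0 * h)
print(np.max(np.abs(quasi_max_derivative(x, beta, alpha) - fd)))
```

The first printed value (the gap between the quasi-max and the true maximum) shrinks as \(\alpha \) grows, while the second confirms that Eq. (11) is the derivative of the quasi-max itself, which is what makes it a smooth surrogate for the derivative of the maximum.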
References
Kohonen T (1988) Learning vector quantization. Neural Netw 1(Suppl 1):303
Villmann T, Saralajew S, Villmann A, Kaden M (2018) Learning vector quantization methods for interpretable classification learning and multilayer networks. In: Sabourin C, Merelo JJ, Barranco AL, Madani K, Warwick K (eds) Proceedings of the 10th international joint conference on computational intelligence (IJCCI), Sevilla. SCITEPRESS - Science and Technology Publications, Lda., Lisbon, pp 15–21. ISBN 978-989-758-327-8
Sato A, Yamada K (1996) Generalized learning vector quantization. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems 8, Proceedings of the 1995 conference. MIT Press, Cambridge, pp 423–429
Crammer K, Gilad-Bachrach R, Navot A, Tishby N (2003) Margin analysis of the LVQ algorithm. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems (Proceedings of NIPS 2002), vol 15. MIT Press, Cambridge, pp 462–469
Schneider P, Hammer B, Biehl M (2009) Adaptive relevance matrices in learning vector quantization. Neural Comput 21:3532–3561
de Vries H, Memisevic R, Courville A (2016) Deep learning vector quantization. In: Verleysen M (ed) Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2016), Louvain-La-Neuve, Belgium, pp 503–508. i6doc.com
Villmann T, Biehl M, Villmann A, Saralajew S (2017) Fusion of deep learning architectures, multilayer feedforward networks and learning vector quantizers for deep classification learning. In: Proceedings of the 12th workshop on self-organizing maps and learning vector quantization (WSOM2017+). IEEE Press, pp 248–255
Kohonen T (1995) Self-organizing maps, vol 30. Springer series in information sciences. Springer, Heidelberg (Second Extended Edition 1997)
Haykin S (1994) Neural networks. A comprehensive foundation. Macmillan, New York
Hertz JA, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation, vol 1. Santa Fe institute studies in the sciences of complexity: lecture notes. Addison-Wesley, Redwood City
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Ramachandran P, Zoph B, Le QV (2018) Swish: a self-gated activation function. Technical report arXiv:1710.05941v1, Google Brain
Ramachandran P, Zoph B, Le QV (2018) Searching for activation functions. Technical report arXiv:1710.05941v2, Google Brain
Eger S, Youssef P, Gurevych I (2018) Is it time to swish? Comparing deep learning activation functions across NLP tasks. In: Proceedings of the 2018 conference on empirical methods in natural language processing (EMNLP), Brussels, Belgium. Association for Computational Linguistics, pp 4415–4424
Chieng HH, Wahid N, Pauline O, Perla SRK (2018) Flatten-T swish: a thresholded ReLU-swish-like activation function for deep learning. Int J Adv Intell Inform 4(2):76–86
Kaden M, Riedel M, Hermann W, Villmann T (2015) Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines. Soft Comput 19(9):2423–2434
Fletcher R (1987) Practical methods of optimization, 2nd edn. Wiley, New York (2000 edition)
LeKander M, Biehl M, de Vries H (2017) Empirical evaluation of gradient methods for matrix learning vector quantization. In: Proceedings of the 12th workshop on self-organizing maps and learning vector quantization (WSOM2017+). IEEE Press, pp 1–8
Hammer B, Villmann T (2002) Generalized relevance learning vector quantization. Neural Netw 15(8–9):1059–1068
Saralajew S, Holdijk L, Rees M, Kaden M, Villmann T (2018) Prototype-based neural network layers: incorporating vector quantization. Mach Learn Rep 12(MLR-03-2018):1–17. ISSN: 1865-3960, http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_03_2018.pdf
Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11
Zhang H, Weng T-W, Chen P-Y, Hsieh C-J, Daniel L (2018) Efficient neural network robustness certification with general activation functions. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., New York, pp 4944–4953
Cook J (2011) Basic properties of the soft maximum. Working paper series 70, UT MD Anderson Cancer Center Department of Biostatistics. http://biostats.bepress.com/mdandersonbiostat/paper70
Lange M, Villmann T (2013) Derivatives of \(l_p\)-norms and their approximations. Mach Learn Rep 7(MLR-04-2013):43–59. ISSN: 1865-3960. http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_04_2013.pdf
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML workshop on deep learning for audio, speech, and language processing. Proceedings of Machine Learning Research, vol 28
Krier C, Rossi F, François D, Verleysen M (2008) A data-driven functional projection approach for the selection of feature ranges in spectra with ICA or cluster analysis. Chemometr Intell Lab Syst 91(1):43–53
Landgrebe DA (2003) Signal theory methods in multispectral remote sensing. Wiley, Hoboken
Asuncion A, Newman DJ. UC Irvine machine learning repository. http://archive.ics.uci.edu/ml/
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Villmann, T., Ravichandran, J., Villmann, A., Nebel, D., Kaden, M. (2020). Investigation of Activation Functions for Generalized Learning Vector Quantization. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J. (eds) Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization. WSOM 2019. Advances in Intelligent Systems and Computing, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-19642-4_18
DOI: https://doi.org/10.1007/978-3-030-19642-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19641-7
Online ISBN: 978-3-030-19642-4