Abstract
An appropriate choice of the activation function plays an important role in the performance of (deep) multilayer perceptrons (MLP) in classification and regression learning. Usually, these activations are applied to all perceptron units in the network. A powerful alternative to MLPs is the family of prototype-based classification learning methods such as (generalized) learning vector quantization (GLVQ). These models also involve activation functions, but there they are applied to the so-called classifier function instead. In this paper we investigate whether activation functions that are successful for MLPs also perform well for GLVQ. For this purpose, we show that the GLVQ classifier function can itself be interpreted as a generalized perceptron.
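As a brief orientation (recalled here from the standard GLVQ formulation of Sato and Yamada [3], not quoted from this paper), the classifier function to which such activations are applied is
$$\begin{aligned} \mu \left( x\right) =\frac{d^{+}\left( x\right) -d^{-}\left( x\right) }{d^{+}\left( x\right) +d^{-}\left( x\right) }\in \left[ -1,1\right] \end{aligned}$$
where \(d^{+}\left( x\right) \) and \(d^{-}\left( x\right) \) are the distances of the input \(x\) to the closest prototype of its correct class and to the closest prototype of any other class, respectively; the GLVQ cost function sums an activation \(f\) applied to \(\mu \left( x\right) \) over all training samples.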
Notes
- 1.
The derivative of the maximum function \(m\left( x,\beta \right) \) can be approximated using the quasi-max function \(\mathcal {Q}_{\alpha }\left( x,\beta \right) =\frac{1}{\alpha }\log \left( e^{\alpha x}+e^{\alpha \cdot \text {sgd}\left( x,\beta \right) }\right) \) proposed by J. D. Cook [23], with \(\alpha \gg 0\). The respective consistent derivative approximation is
$$\begin{aligned} \frac{d\,m\left( x,\beta \right) }{dx}\approx \frac{\exp \left( \alpha x\right) +\frac{d\,\text {sgd}\left( x,\beta \right) }{dx}\cdot \exp \left( \alpha \cdot \text {sgd}\left( x,\beta \right) \right) }{\exp \left( \alpha x\right) +\exp \left( \alpha \cdot \text {sgd}\left( x,\beta \right) \right) } \end{aligned}$$ (11)
as provided in [24]. Analogously, the quasi-max approximation \(m_{\tau }\left( x,\beta \right) \approx \frac{1}{\alpha }\log \left( \exp \left( \alpha x\right) +\exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) \right) \) is valid, with
$$\begin{aligned} \frac{d\,m_{\tau }\left( x,\beta \right) }{dx}\approx \frac{\exp \left( \alpha x\right) +\frac{d\,\tau \left( x,\beta \right) }{dx}\cdot \exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) }{\exp \left( \alpha x\right) +\exp \left( \alpha \cdot \tau \left( x,\beta \right) \right) } \end{aligned}$$ (12)
as the derivative approximation (a numerical sketch of Eq. (11) follows these notes).
- 2.
The Tecator data set is available at StatLib: http://lib.stat.cmu.edu/datasets/tecator.
- 3.
The data set can be found at www.ehu.es/ccwintco/uploads/2/22/Indian_pines.mat.
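The quasi-max construction in note 1 can be checked numerically. The following is a minimal sketch, assuming (as is standard in GLVQ) that \(\text {sgd}\left( x,\beta \right) \) denotes the sigmoid \(1/\left( 1+e^{-\beta x}\right) \) and that \(m\left( x,\beta \right) =\max \left( x,\text {sgd}\left( x,\beta \right) \right) \); all helper names are hypothetical and not taken from the paper.

```python
import numpy as np

def sgd(x, beta):
    # assumed meaning of sgd(x, beta) in note 1: the sigmoid function
    return 1.0 / (1.0 + np.exp(-beta * x))

def sgd_prime(x, beta):
    # derivative of the sigmoid with respect to x
    s = sgd(x, beta)
    return beta * s * (1.0 - s)

def quasi_max(x, beta, alpha):
    # Q_alpha(x, beta) = (1/alpha) * log(exp(alpha*x) + exp(alpha*sgd(x, beta)))
    return np.log(np.exp(alpha * x) + np.exp(alpha * sgd(x, beta))) / alpha

def quasi_max_derivative(x, beta, alpha):
    # consistent derivative approximation of Eq. (11)
    ex = np.exp(alpha * x)
    es = np.exp(alpha * sgd(x, beta))
    return (ex + sgd_prime(x, beta) * es) / (ex + es)

x = np.linspace(-2.0, 2.0, 9)
beta, alpha, h = 1.0, 20.0, 1e-5

# gap between the quasi-max and the true maximum m(x, beta) = max(x, sgd(x, beta))
print(np.max(np.abs(quasi_max(x, beta, alpha) - np.maximum(x, sgd(x, beta)))))

# Eq. (11) coincides with a finite-difference derivative of the quasi-max
fd = (quasi_max(x + h, beta, alpha) - quasi_max(x - h, beta, alpha)) / (2.0 * h)
print(np.max(np.abs(quasi_max_derivative(x, beta, alpha) - fd)))
```

The first printed value (the gap between the quasi-max and the true maximum) shrinks as \(\alpha \) grows, while the second confirms that Eq. (11) is the derivative of the quasi-max itself, which is what makes it a smooth surrogate for the derivative of the maximum.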
References
Kohonen T (1988) Learning vector quantization. Neural Netw 1(Suppl 1):303
Villmann T, Saralajew S, Villmann A, Kaden M (2018) Learning vector quantization methods for interpretable classification learning and multilayer networks. In: Sabourin C, Merelo JJ, Barranco AL, Madani K, Warwick K (eds) Proceedings of the 10th international joint conference on computational intelligence (IJCCI), Sevilla. SCITEPRESS - Science and Technology Publications, Lda., Lisbon, pp 15–21. ISBN 978-989-758-327-8
Sato A, Yamada K (1996) Generalized learning vector quantization. In: Touretzky DS, Mozer MC, Hasselmo ME (eds) Advances in neural information processing systems 8, Proceedings of the 1995 conference. MIT Press, Cambridge, pp 423–429
Crammer K, Gilad-Bachrach R, Navot A, Tishby N (2003) Margin analysis of the LVQ algorithm. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems (Proceedings of NIPS 2002), vol 15. MIT Press, Cambridge, pp 462–469
Schneider P, Hammer B, Biehl M (2009) Adaptive relevance matrices in learning vector quantization. Neural Comput 21:3532–3561
de Vries H, Memisevic R, Courville A (2016) Deep learning vector quantization. In: Verleysen M (ed) Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning (ESANN 2016), Louvain-La-Neuve, Belgium, pp 503–508. i6doc.com
Villmann T, Biehl M, Villmann A, Saralajew S (2017) Fusion of deep learning architectures, multilayer feedforward networks and learning vector quantizers for deep classification learning. In: Proceedings of the 12th workshop on self-organizing maps and learning vector quantization (WSOM2017+). IEEE Press, pp 248–255
Kohonen T (1995) Self-organizing maps, vol 30. Springer series in information sciences. Springer, Heidelberg (Second Extended Edition 1997)
Haykin S (1994) Neural networks. A comprehensive foundation. Macmillan, New York
Hertz JA, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation, vol 1. Santa Fe institute studies in the sciences of complexity: lecture notes. Addison-Wesley, Redwood City
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Ramachandran P, Zoph B, Le QV (2018) Swish: a self-gated activation function. Technical report arXiv:1710.05941v1, Google Brain
Ramachandran P, Zoph B, Le QV (2018) Searching for activation functions. Technical report arXiv:1710.05941v2, Google Brain
Eger S, Youssef P, Gurevych I (2018) Is it time to swish? Comparing deep learning activation functions across NLP tasks. In: Proceedings of the 2018 conference on empirical methods in natural language processing (EMNLP), Brussels, Belgium. Association for Computational Linguistics, pp 4415–4424
Chieng HH, Wahid N, Pauline O, Perla SRK (2018) Flatten-T swish: a thresholded ReLU-swish-like activation function for deep learning. Int J Adv Intell Inform 4(2):76–86
Kaden M, Riedel M, Hermann W, Villmann T (2015) Border-sensitive learning in generalized learning vector quantization: an alternative to support vector machines. Soft Comput 19(9):2423–2434
Fletcher R (1987) Practical methods of optimization, 2nd edn. Wiley, New York (2000 edition)
LeKander M, Biehl M, de Vries H (2017) Empirical evaluation of gradient methods for matrix learning vector quantization. In: Proceedings of the 12th workshop on self-organizing maps and learning vector quantization (WSOM2017+). IEEE Press, pp 1–8
Hammer B, Villmann T (2002) Generalized relevance learning vector quantization. Neural Netw 15(8–9):1059–1068
Saralajew S, Holdijk L, Rees M, Kaden M, Villmann T (2018) Prototype-based neural network layers: incorporating vector quantization. Mach Learn Rep 12(MLR-03-2018):1–17. ISSN: 1865-3960, http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_03_2018.pdf
Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11
Zhang H, Weng T-W, Chen P-Y, Hsieh C-J, Daniel L (2018) Efficient neural network robustness certification with general activation functions. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31. Curran Associates, Inc., New York, pp 4944–4953
Cook J (2011) Basic properties of the soft maximum. Working paper series 70, UT MD Anderson Cancer Center Department of Biostatistics. http://biostats.bepress.com/mdandersonbiostat/paper70
Lange M, Villmann T (2013) Derivatives of \(l_p\)-norms and their approximations. Mach Learn Rep 7(MLR-04-2013):43–59. ISSN: 1865-3960. http://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_04_2013.pdf
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML workshop on deep learning for audio, speech, and language processing. Proceedings of Machine Learning Research, vol 28
Krier C, Rossi F, François D, Verleysen M (2008) A data-driven functional projection approach for the selection of feature ranges in spectra with ICA or cluster analysis. Chemometr Intell Lab Syst 91(1):43–53
Landgrebe DA (2003) Signal theory methods in multispectral remote sensing. Wiley, Hoboken
Asuncion A, Newman DJ. UC Irvine machine learning repository. http://archive.ics.uci.edu/ml/
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Villmann, T., Ravichandran, J., Villmann, A., Nebel, D., Kaden, M. (2020). Investigation of Activation Functions for Generalized Learning Vector Quantization. In: Vellido, A., Gibert, K., Angulo, C., Martín Guerrero, J. (eds) Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization. WSOM 2019. Advances in Intelligent Systems and Computing, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-19642-4_18
DOI: https://doi.org/10.1007/978-3-030-19642-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19641-7
Online ISBN: 978-3-030-19642-4