
Activation Functions

Chapter in: Deep Learning: Algorithms and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 865)

Abstract

Activation functions lie at the core of deep neural networks, allowing them to learn arbitrarily complex mappings. Without an activation function, a neural network can learn only a linear relation between its input and the desired output. This chapter explains why activation functions are needed and how central they are to the success of deep learning. It provides a detailed survey of existing activation functions, covering their functional forms, original motivations, merits, and demerits. The chapter also discusses learnable activation functions and proposes a novel activation, ‘SLAF’, whose shape is learned during the training of a neural network. A working model for SLAF is provided, and its performance is demonstrated experimentally on the XOR and MNIST classification tasks.
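The abstract does not give SLAF's exact functional form, so the sketch below is illustrative only: it assumes a learnable activation parameterized as a weighted sum of fixed polynomial basis functions whose coefficients are trained jointly with the network weights (PyTorch). The class name LearnableActivation and the polynomial degree are hypothetical choices, not the chapter's definitions.

    # Minimal sketch of a learnable, shape-adapting activation (illustrative
    # only; not the chapter's exact SLAF formulation). The activation is a
    # weighted sum of fixed polynomial basis functions, and the coefficients
    # are ordinary parameters optimized together with the rest of the network.
    import torch
    import torch.nn as nn

    class LearnableActivation(nn.Module):
        def __init__(self, degree: int = 3):
            super().__init__()
            # One learnable coefficient per basis function x^0, x^1, ..., x^degree.
            self.coeffs = nn.Parameter(torch.zeros(degree + 1))
            with torch.no_grad():
                self.coeffs[1] = 1.0  # initialize near the identity map

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Evaluate each basis function elementwise, then combine with the
            # learned coefficients: f(x) = sum_k coeffs[k] * x**k.
            basis = torch.stack([x ** k for k in range(self.coeffs.numel())], dim=-1)
            return basis @ self.coeffs

    # Drop-in usage: the activation's shape is learned during training.
    # Swapping LearnableActivation() for nn.Identity() would collapse the
    # whole network into a single linear map, which illustrates the abstract's
    # point that a network without a nonlinearity can only learn linear relations.
    net = nn.Sequential(nn.Linear(2, 8), LearnableActivation(), nn.Linear(8, 1))
    out = net(torch.randn(4, 2))  # forward pass; gradients flow into coeffs too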

Mohit Goyal and Rajan Goyal contributed equally.

Corresponding author

Correspondence to Mohit Goyal.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Goyal, M., Goyal, R., Venkatappa Reddy, P., Lall, B. (2020). Activation Functions. In: Pedrycz, W., Chen, SM. (eds) Deep Learning: Algorithms and Applications. Studies in Computational Intelligence, vol 865. Springer, Cham. https://doi.org/10.1007/978-3-030-31760-7_1
