
Activation Functions

Chapter in: Deep Learning: Algorithms and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 865)

Abstract

Activation functions lie at the core of deep neural networks, allowing them to learn arbitrarily complex mappings. Without an activation function, a neural network can learn only a linear relation between its input and the desired output. This chapter explains why activation functions are needed and how central they are to the success of deep learning. It provides a detailed survey of existing activation functions, covering their functional forms, original motivations, merits, and demerits. The chapter also discusses learnable activation functions and proposes a novel activation, ‘SLAF’, whose shape is learned during the training of a neural network. A working model for SLAF is provided, and its performance is demonstrated experimentally on the XOR and MNIST classification tasks.
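The abstract does not give SLAF's exact functional form, so the sketch below is illustrative only: it assumes a learnable activation parameterized as a weighted sum of fixed polynomial basis functions whose coefficients are trained jointly with the network weights (PyTorch). The class name LearnableActivation and the polynomial degree are hypothetical choices, not the chapter's definitions.

    # Minimal sketch of a learnable, shape-adapting activation (illustrative
    # only; not the chapter's exact SLAF formulation). The activation is a
    # weighted sum of fixed polynomial basis functions, and the coefficients
    # are ordinary parameters optimized together with the rest of the network.
    import torch
    import torch.nn as nn

    class LearnableActivation(nn.Module):
        def __init__(self, degree: int = 3):
            super().__init__()
            # One learnable coefficient per basis function x^0, x^1, ..., x^degree.
            self.coeffs = nn.Parameter(torch.zeros(degree + 1))
            with torch.no_grad():
                self.coeffs[1] = 1.0  # initialize near the identity map

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Evaluate each basis function elementwise, then combine with the
            # learned coefficients: f(x) = sum_k coeffs[k] * x**k.
            basis = torch.stack([x ** k for k in range(self.coeffs.numel())], dim=-1)
            return basis @ self.coeffs

    # Drop-in usage: the activation's shape is learned during training.
    # Swapping LearnableActivation() for nn.Identity() would collapse the
    # whole network into a single linear map, which illustrates the abstract's
    # point that a network without a nonlinearity can only learn linear relations.
    net = nn.Sequential(nn.Linear(2, 8), LearnableActivation(), nn.Linear(8, 1))
    out = net(torch.randn(4, 2))  # forward pass; gradients flow into coeffs too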

Mohit Goyal and Rajan Goyal contributed equally.

Corresponding author

Correspondence to Mohit Goyal.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Goyal, M., Goyal, R., Venkatappa Reddy, P., Lall, B. (2020). Activation Functions. In: Pedrycz, W., Chen, SM. (eds) Deep Learning: Algorithms and Applications. Studies in Computational Intelligence, vol 865. Springer, Cham. https://doi.org/10.1007/978-3-030-31760-7_1
