
Multikernel Activation Functions: Formulation and a Case Study

  • Conference paper

Part of the book series: Proceedings of the International Neural Networks Society (INNS, volume 1)

Abstract

The design of activation functions is a growing research area in the field of neural networks. In particular, instead of using fixed point-wise functions (e.g., the rectified linear unit), several authors have proposed ways of learning these functions directly from the data in a non-parametric fashion. In this paper we focus on the kernel activation function (KAF), a recently proposed framework in which each activation function is modeled as a one-dimensional kernel model, whose mixing weights are adapted through standard backpropagation-based optimization. One drawback of KAFs is the need to select a single kernel function and any associated hyper-parameters. To partially overcome this problem, we motivate an extension of the KAF model in which multiple kernels are linearly combined at every neuron, inspired by the literature on multiple kernel learning. We apply the resulting multi-KAF to a realistic use case, handwritten Latin OCR, on a large dataset collected in the context of the 'In Codice Ratio' project. Results show that multi-KAFs can improve the accuracy of the convolutional networks previously developed for the task, with faster convergence, even with fewer overall parameters.
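As a concrete illustration of the construction described in the abstract, the following PyTorch sketch shows one plausible way to implement a multi-KAF layer: each channel gets its own kernel expansion over a fixed one-dimensional dictionary, and the kernel itself is a learnable linear combination of two base kernels. The specific base kernels (a Gaussian and a ReLU-like kernel), the dictionary size, and the initialization are assumptions for illustration, not necessarily the configuration used in the paper.

    import torch
    import torch.nn as nn

    class MultiKAF(nn.Module):
        """Illustrative multi-kernel activation function (multi-KAF).

        Each channel's activation is a kernel expansion over a fixed 1D
        dictionary, where the kernel is a learnable linear combination of
        two base kernels. All choices below are illustrative assumptions.
        """

        def __init__(self, num_channels, dict_size=20, boundary=3.0, gamma=1.0):
            super().__init__()
            # Fixed dictionary: dict_size points spaced uniformly in [-boundary, boundary].
            d = torch.linspace(-boundary, boundary, dict_size)
            self.register_buffer('dictionary', d.view(1, 1, -1))
            self.gamma = gamma  # Gaussian kernel bandwidth (kept fixed here)
            # Learnable mixing coefficients of the kernel expansion, per channel.
            self.alpha = nn.Parameter(0.1 * torch.randn(1, num_channels, dict_size))
            # Learnable weights combining the two base kernels, per channel.
            self.kernel_weights = nn.Parameter(0.5 * torch.ones(1, num_channels, 2))

        def forward(self, s):
            # s: (batch, num_channels) pre-activations.
            s = s.unsqueeze(-1)  # -> (batch, num_channels, 1), broadcast vs dictionary
            gauss = torch.exp(-self.gamma * (s - self.dictionary) ** 2)
            relu = torch.relu(s - self.dictionary)
            # Linearly combine the two base kernels ("multikernel").
            k = (self.kernel_weights[..., 0:1] * gauss
                 + self.kernel_weights[..., 1:2] * relu)
            # Kernel expansion: weighted sum over the dictionary elements.
            return (self.alpha * k).sum(dim=-1)

For instance, act = MultiKAF(num_channels=64); y = act(torch.randn(32, 64)) applies an independent learned activation to each of the 64 channels, adding dict_size + 2 parameters per channel; in a convolutional network the same expansion would be shared across each feature map.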

The work of S. Scardapane was supported in part by Italian MIUR, “Progetti di Ricerca di Rilevante Interesse Nazionale”, GAUChO project, under Grant 2015YPXH4W_004.


Notes

  1. http://www.archiviosegretovaticano.va/.

  2. The dataset is available on the web at http://www.dia.uniroma3.it/db/icr/.


Author information


Corresponding author

Correspondence to Simone Scardapane.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Scardapane, S., Nieddu, E., Firmani, D., Merialdo, P. (2020). Multikernel Activation Functions: Formulation and a Case Study. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Advances in Big Data and Deep Learning. INNSBDDL 2019. Proceedings of the International Neural Networks Society, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-16841-4_33

