Neural Processing Letters, Volume 50, Issue 3, pp 2627–2646

Generalizing the Convolution Operator in Convolutional Neural Networks

  • Kamaledin Ghiasi-Shirazi


Convolutional neural networks (CNNs) have become an essential tool for solving many machine vision and machine learning problems. A major element of these networks is the convolution operator, which essentially computes the inner product between a weight vector and the vectorized image patches extracted by sliding a window over the image planes of the previous layer. In this paper, we propose two classes of surrogate functions for the inner product operation inherent in the convolution operator and thereby obtain two generalizations of the convolution operator. The first is based on the class of positive definite kernel functions, whose application is justified by the kernel trick. The second is based on the class of similarity measures defined according to some distance function. We justify the second class by tracing back to the basic idea behind the neocognitron, the ancestor of CNNs. Both methods are then further generalized by allowing a monotonically increasing function (possibly depending on the weight vector) to be applied subsequently. Like any trainable parameter in a neural network, the template pattern and the parameters of the kernel/distance function are trained with the back-propagation algorithm. As an aside, we use the proposed framework to justify the use of the sine activation function in CNNs. Additionally, we identify a family of generalized convolution operators based on the convex combination of the dot-product and the negative squared Euclidean distance functions. Our experiments on the MNIST dataset show that generalized CNNs based on weighted L1/L2 distances can match the performance of ordinary CNNs, which demonstrates the applicability of the proposed generalization of convolutional neural networks.
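The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`extract_patches`, `generalized_conv2d`, `convex_mix`, and so on), the Gaussian kernel as the example of a positive definite kernel, the exact form of the weighted L1 similarity, and the parameters `gamma`, `scale`, `alpha` and `post` are all assumptions made for illustration. The sketch handles a single-channel image with stride 1 and no padding, and it only shows the forward pass; training by back-propagation is left out.

```python
import numpy as np

def extract_patches(image, k):
    """Slide a k x k window over a 2-D image (stride 1, no padding) and
    return the vectorized patches, shape (H-k+1, W-k+1, k*k)."""
    H, W = image.shape
    rows = [[image[i:i + k, j:j + k].ravel() for j in range(W - k + 1)]
            for i in range(H - k + 1)]
    return np.array(rows)

# Each similarity maps (patches, template vector) to one score per patch.

def dot_product(patches, w):
    # Ordinary convolution (cross-correlation): plain inner product.
    return patches @ w

def gaussian_kernel(patches, w, gamma=0.5):
    # Assumed example of a positive definite kernel surrogate.
    return np.exp(-gamma * np.sum((patches - w) ** 2, axis=-1))

def neg_sq_l2(patches, w):
    # Distance-based similarity: negative squared Euclidean distance.
    return -np.sum((patches - w) ** 2, axis=-1)

def neg_weighted_l1(patches, w, scale):
    # Assumed form of a weighted L1 similarity; `scale` holds one
    # non-negative weight per template entry.
    return -np.sum(scale * np.abs(patches - w), axis=-1)

def convex_mix(patches, w, alpha):
    # Convex combination of dot product and negative squared L2 distance;
    # alpha in [0, 1] is a hypothetical mixing parameter.
    return alpha * dot_product(patches, w) + (1.0 - alpha) * neg_sq_l2(patches, w)

def generalized_conv2d(image, template, similarity, post=None, **params):
    """Apply `similarity` between a square template and every image patch.
    `post` is an optional monotonically increasing function applied afterwards."""
    k = template.shape[0]
    scores = similarity(extract_patches(image, k), template.ravel(), **params)
    return scores if post is None else post(scores)

if __name__ == "__main__":
    image = np.arange(16.0).reshape(4, 4)
    template = image[1:3, 1:3].copy()  # template equal to one image patch
    response = generalized_conv2d(image, template, gaussian_kernel, gamma=0.5)
    print(response.shape)              # one score per window position
    print(np.unravel_index(response.argmax(), response.shape))
```

Passing the similarity as a function keeps the sliding-window logic identical across operators, so replacing `dot_product` with a kernel or distance-based measure changes only the per-patch score, which mirrors how the abstract frames the generalization.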


Generalized convolutional neural networks · Generalized convolution operators · L2 family of generalized convolution operators · Kernel methods · Back-propagation



The author wishes to express appreciation to the Research Deputy of Ferdowsi University of Mashhad for supporting this project under Grant No. 2/43037. The author also thanks the anonymous reviewers and his colleagues Ahad Harati and Ehsan Fazl-Ersi for their valuable comments.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Engineering, Ferdowsi University of Mashhad (FUM), Mashhad, Iran
