Supervised and Unsupervised Co-training of Adaptive Activation Functions in Neural Nets

  • Ilaria Castelli
  • Edmondo Trentin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7081)


In spite of the nice theoretical properties of mixtures of logistic activation functions, standard feedforward neural networks with limited resources and gradient-descent optimization of the connection weights may fail in practice on several difficult learning tasks. Such tasks would be better faced by relying on a more appropriate, problem-specific basis of activation functions. This paper introduces a connectionist model that features adaptive activation functions. Each hidden unit in the network is associated with a specific pair (f(·), p(·)), where f(·) (the activation proper) is modeled via a specialized neural network, and p(·) is a probabilistic measure of the likelihood that the unit itself is relevant to the computation of the output over the current input. While f(·) is optimized in a supervised manner (through a novel backpropagation scheme for the target outputs that does not suffer from the traditional “vanishing gradient” phenomenon of standard backpropagation), p(·) is realized via a statistical parametric model learned through unsupervised estimation. The overall machine is implicitly a co-trained coupled model, in which the topology chosen for learning each f(·) may vary on a unit-by-unit basis, resulting in a highly non-standard neural architecture.
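As a concrete illustration of the (f(·), p(·)) pairing described above, the following sketch models f(·) as a small one-hidden-layer sub-network and p(·) as a diagonal Gaussian over the input. All class names, shapes, and the choice of a Gaussian for p(·) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

class AdaptiveUnit:
    """One hidden unit with an adaptive activation f(.) and a
    parametric relevance model p(.) (hypothetical sketch)."""

    def __init__(self, in_dim, sub_hidden=8):
        # Sub-network implementing the adaptive activation f(.)
        self.W1 = rng.standard_normal(sub_hidden) * 0.1
        self.b1 = np.zeros(sub_hidden)
        self.W2 = rng.standard_normal(sub_hidden) * 0.1
        self.b2 = 0.0
        # Input weights producing the unit's scalar pre-activation
        self.w_in = rng.standard_normal(in_dim) * 0.1
        # Parameters of the diagonal-Gaussian relevance model p(.)
        self.mu = np.zeros(in_dim)
        self.sigma2 = np.ones(in_dim)

    def f(self, a):
        # Adaptive activation: a scalar-in, scalar-out sub-network
        h = np.tanh(self.W1 * a + self.b1)
        return h @ self.W2 + self.b2

    def p(self, x):
        # Likelihood of x under the unit's diagonal Gaussian
        d = x - self.mu
        norm = np.sqrt((2 * np.pi) ** len(x) * np.prod(self.sigma2))
        return np.exp(-0.5 * np.sum(d * d / self.sigma2)) / norm

    def output(self, x):
        # Unit response weighted by its estimated relevance to x
        a = self.w_in @ x
        return self.p(x) * self.f(a)

unit = AdaptiveUnit(in_dim=4)
x = rng.standard_normal(4)
print(unit.output(x))
```

In the full model, f(·) would be trained supervisedly from backpropagated targets, while the parameters of p(·) (here, mu and sigma2) would be fit by unsupervised density estimation over the inputs.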


Keywords: Co-training · Partially unsupervised learning · Adaptive activation functions



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ilaria Castelli¹
  • Edmondo Trentin¹

  1. Dipartimento di Ingegneria dell’Informazione, Università di Siena, Siena, Italy