Supervised and Unsupervised Co-training of Adaptive Activation Functions in Neural Nets
Despite the nice theoretical properties of mixtures of logistic activation functions, standard feedforward neural networks with limited resources and gradient-descent optimization of the connection weights may fail in practice on several difficult learning tasks. Such tasks would be better tackled by relying on a more appropriate, problem-specific basis of activation functions. The paper introduces a connectionist model that features adaptive activation functions. Each hidden unit in the network is associated with a specific pair (f(·), p(·)), where f(·) (the activation function itself) is modeled via a specialized neural network, and p(·) is a probabilistic measure of the likelihood that the unit is relevant to the computation of the output for the current input. While f(·) is optimized in a supervised manner (through a novel scheme for backpropagating the target outputs, which does not suffer from the “vanishing gradient” phenomenon that occurs in standard backpropagation), p(·) is realized via a parametric statistical model learned through unsupervised estimation. The overall machine is implicitly a co-trained coupled model, where the topology chosen for learning each f(·) may vary on a unit-by-unit basis, resulting in a highly non-standard neural architecture.
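To make the (f(·), p(·)) pairing concrete, the following minimal Python sketch shows one possible reading of a single hidden unit. It is not the authors' implementation. The abstract does not state how f(·) and p(·) are combined, what form p(·) takes, or how f(·) is trained, so several choices here are assumptions made purely for illustration: the unit's output is taken as the product p(x)·f(net), p(·) is an unnormalized diagonal Gaussian over the network input fitted by maximum likelihood, and f(·) is a tiny one-input tanh network with random, untrained weights. The names `AdaptiveUnit`, `fit_p`, and `n_inner` are hypothetical.

```python
import math
import random


class AdaptiveUnit:
    """Illustrative hidden unit holding an activation network f and a relevance model p."""

    def __init__(self, n_inputs, n_inner, rng):
        # Incoming connection weights of the unit.
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        # f: a small 1 -> n_inner -> 1 tanh network (assumed topology, left untrained here).
        self.a = [rng.uniform(-1.0, 1.0) for _ in range(n_inner)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_inner)]
        self.c = [rng.uniform(-1.0, 1.0) for _ in range(n_inner)]
        # p: parameters of a diagonal Gaussian over the input (assumed parametric form).
        self.mu = [0.0] * n_inputs
        self.var = [1.0] * n_inputs

    def f(self, net):
        # Adaptive activation: output of the small inner network for a scalar net input.
        return sum(c * math.tanh(a * net + b)
                   for a, b, c in zip(self.a, self.b, self.c))

    def p(self, x):
        # Unnormalized Gaussian relevance score in (0, 1]; equals 1 at the mean.
        d = sum((xi - m) ** 2 / v for xi, m, v in zip(x, self.mu, self.var))
        return math.exp(-0.5 * d)

    def fit_p(self, data):
        # Unsupervised step: maximum-likelihood mean and variance per input dimension.
        n = len(data)
        for j in range(len(self.mu)):
            col = [row[j] for row in data]
            mean = sum(col) / n
            self.mu[j] = mean
            # Floor the variance to avoid division by zero on constant columns.
            self.var[j] = max(sum((v - mean) ** 2 for v in col) / n, 1e-6)

    def forward(self, x):
        # Assumed combination rule: relevance-weighted activation.
        net = sum(wi * xi for wi, xi in zip(self.w, x))
        return self.p(x) * self.f(net)
```

Under these assumptions, an input close to the region where the unit's density model was fitted passes f(·) through almost unchanged, while a distant input is damped toward zero. The supervised training of f(·) through the paper's target-backpropagation scheme is deliberately left out, since the abstract gives no details of it.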
Keywords: Co-training, partially unsupervised learning, adaptive activation function