Abstract
The training of feedforward neural networks for classification problems is considered. The Extended Kalman Filter, which has previously been used mostly for training recurrent neural networks for prediction and control, is proposed as the learning algorithm. An implementation of the cross-entropy error function for mini-batch training is presented. Popular benchmarks are used to compare the method with gradient descent, conjugate gradients, and the BFGS (Broyden–Fletcher–Goldfarb–Shanno) algorithm. The influence of mini-batch size on training time and quality is investigated. The algorithms under consideration, implemented as MATLAB scripts, are available for free download.
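The core idea of EKF training is to treat the network weights as the state of a nonlinear dynamical system and update them with the Kalman gain instead of a fixed learning rate. The sketch below is not the paper's MATLAB implementation; it is a minimal illustrative example in Python (NumPy), assuming the simplest possible "network" (a single sigmoid neuron), a toy linearly separable dataset, and hand-picked noise covariances R and Q.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: class 1 iff x0 + x1 > 0
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
Xb = np.hstack([X, np.ones((200, 1))])  # append a bias input

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# EKF state: the weight vector w; P: weight-error covariance
w = np.zeros(3)
P = np.eye(3) * 100.0   # large initial uncertainty
R = 1.0                 # measurement-noise variance (a tuning parameter)
Q = np.eye(3) * 1e-4    # small process noise keeps the filter adaptive

for xi, yi in zip(Xb, y):
    pred = sigmoid(w @ xi)
    H = (pred * (1.0 - pred) * xi).reshape(1, -1)  # Jacobian d(pred)/dw
    S = H @ P @ H.T + R                            # innovation covariance (1x1)
    K = (P @ H.T) / S                              # Kalman gain, shape (3, 1)
    w = w + (K * (yi - pred)).ravel()              # weight update
    P = P - K @ H @ P + Q                          # covariance update

acc = np.mean((sigmoid(Xb @ w) > 0.5) == (y > 0.5))
print(f"training accuracy: {acc:.2f}")
```

For a multilayer network, H becomes the Jacobian of the network output with respect to all weights (computed by backpropagation), and mini-batch variants, as studied in the paper, stack the Jacobians of several patterns so that one gain computation serves the whole batch.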
Chernodub, A.N. Training Neural Networks for classification using the Extended Kalman Filter: A comparative study. Opt. Mem. Neural Networks 23, 96–103 (2014). https://doi.org/10.3103/S1060992X14020088