Multilayer Perceptrons: Other Learning Techniques

  • Ke-Lin Du
  • M. N. S. Swamy

Abstract

This chapter continues the treatment of the multilayer perceptron, focusing on second-order learning methods that speed up the training process. Complex-valued multilayer perceptrons and spiking neural networks are also introduced.
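
As a concrete illustration of the second-order methods surveyed here, the sketch below implements one damped Gauss-Newton (Levenberg-Marquardt) update for a toy one-hidden-layer regression MLP. It is a minimal sketch, not the chapter's own implementation: the finite-difference Jacobian, the fixed damping constant mu, and all function names are simplifying assumptions (practical LM adapts mu and uses analytic derivatives).

    # Minimal sketch: one Levenberg-Marquardt step
    #   w <- w - (J^T J + mu*I)^{-1} J^T r
    # for a toy one-hidden-layer MLP. Illustrative only.
    import numpy as np

    def mlp(w, X, n_hidden):
        """Unpack the flat weight vector w and evaluate a 1-hidden-layer MLP."""
        n_in = X.shape[1]
        k = n_hidden * n_in
        W1 = w[:k].reshape(n_hidden, n_in)       # input-to-hidden weights
        b1 = w[k:k + n_hidden]                   # hidden biases
        W2 = w[k + n_hidden:k + 2 * n_hidden]    # hidden-to-output weights
        b2 = w[-1]                               # output bias
        return np.tanh(X @ W1.T + b1) @ W2 + b2

    def lm_step(w, X, y, n_hidden, mu=1e-2, eps=1e-6):
        """One damped Gauss-Newton (Levenberg-Marquardt) update."""
        f0 = mlp(w, X, n_hidden)
        r = f0 - y                               # residual vector
        J = np.empty((len(r), len(w)))
        for j in range(len(w)):                  # finite-difference Jacobian
            wp = w.copy()
            wp[j] += eps
            J[:, j] = (mlp(wp, X, n_hidden) - f0) / eps
        A = J.T @ J + mu * np.eye(len(w))        # damped curvature matrix
        return w - np.linalg.solve(A, J.T @ r)

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, (50, 1))
    y = np.sin(3.0 * X[:, 0])                    # toy regression target
    n_hidden = 5
    n_params = n_hidden * X.shape[1] + 2 * n_hidden + 1
    w = rng.normal(0.0, 0.5, n_params)
    for _ in range(30):
        w = lm_step(w, X, y, n_hidden)
    print("final SSE:", np.sum((mlp(w, X, n_hidden) - y) ** 2))

Because the damped matrix A interpolates between the Gauss-Newton Hessian approximation (small mu) and plain gradient descent (large mu), a handful of such steps typically reaches a far lower error than many first-order epochs on small networks, at the cost of solving a linear system per step.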

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
  2. Xonlink Inc., Hangzhou, China