Abstract
The application of artificial neural networks to complex real-world problems usually requires a modularization of the network architecture, with individual modules handling subtasks defined by a decomposition of the problem. Up to now, this modularization has usually been done heuristically; little is known about principled methods for adapting the network structure to the problem at hand. Incrementally constructed cascade architectures are a promising approach to growing networks according to the needs of the problem. This paper discusses the properties of the recently proposed direct cascade architecture DCA (Littmann & Ritter 1992). One important virtue of DCA is that it allows the cascading of entire subnetworks, even if these do not admit error backpropagation. Exploiting this flexibility and using LLM (local linear map) networks as cascaded elements, we show that the performance of the resulting network cascades can be greatly enhanced compared to that of a single network. Our results on the Mackey-Glass time series prediction task indicate that such deeply cascaded network architectures achieve good generalization even on small data sets, where shallow, broad architectures of comparable size suffer from overfitting. We conclude that the DCA approach offers a powerful and flexible alternative to existing schemes, such as the mixtures-of-experts approach, for constructing modular systems from a wide range of subnetwork types.
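The cascading idea summarized above can be sketched in a few lines. The fragment below is not the chapter's implementation: it substitutes simple random-feature regressors trained by least squares for the LLM networks, and the Mackey-Glass parameters, embedding dimensions, and stage count are illustrative choices only. What it does mirror is the direct-cascade wiring: each stage receives the original delay-embedded input plus the outputs of all earlier stages, and no error is ever backpropagated through those earlier stages.

```python
import numpy as np

rng = np.random.default_rng(0)

def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, p=10, dt=1.0, x0=1.2):
    """Euler integration of the Mackey-Glass delay differential equation."""
    x = np.full(n + tau, x0)
    for t in range(tau, n + tau - 1):
        x[t + 1] = x[t] + dt * (beta * x[t - tau] / (1.0 + x[t - tau] ** p)
                                - gamma * x[t])
    return x[tau:]

def embed(series, dim=4, lag=6, horizon=6):
    """Delay embedding: rows [x(t), x(t-lag), ...], targets x(t+horizon)."""
    start = (dim - 1) * lag
    X = np.stack([series[start - k * lag: len(series) - horizon - k * lag]
                  for k in range(dim)], axis=1)
    return X, series[start + horizon:]

def fit_cascade(Xtr, ytr, Xte, yte, stages=3, hidden=20):
    """Each stage: random tanh features plus a linear least-squares readout
    (a stand-in for an LLM network).  Each stage's output is appended to the
    input of all later stages -- the direct-cascade construction."""
    ftr, fte, errs = Xtr, Xte, []
    for _ in range(stages):
        R = rng.normal(size=(ftr.shape[1], hidden))
        b = rng.normal(size=hidden)
        Atr = np.hstack([np.tanh(ftr @ R + b), np.ones((len(ftr), 1))])
        w, *_ = np.linalg.lstsq(Atr, ytr, rcond=None)
        ptr = Atr @ w
        pte = np.hstack([np.tanh(fte @ R + b), np.ones((len(fte), 1))]) @ w
        errs.append(float(np.sqrt(np.mean((pte - yte) ** 2))))
        ftr = np.hstack([ftr, ptr[:, None]])   # feed stage output forward
        fte = np.hstack([fte, pte[:, None]])
    return errs

series = mackey_glass(1000)
X, y = embed(series)
n = len(X) // 2
errs = fit_cascade(X[:n], y[:n], X[n:], y[n:])
print(errs)   # per-stage test RMSE
```

Whether later stages actually improve the error depends on the random features drawn at each stage; the point here is only the wiring, in which later stages consume earlier outputs without requiring any backward pass through them.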
References
Baffes, P., & J. Zelle (1992). Growing layers of perceptrons: Introducing the extentron algorithm. Proceedings of the International Joint Conference on Neural Networks, volume II (pp. 392–397). Baltimore, MD.
Baum, E., & D. Haussler (1989). What size net gives valid generalization? Neural Computation 1:151–160.
Crowder, R. S. (1990). Predicting the Mackey-Glass time series with cascade-correlation learning. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, & G. E. Hinton (eds.), Connectionist Models: Proceedings of the 1990 Summer School (pp. 524–532). San Mateo, CA: Morgan Kaufmann.
Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), Advances in neural information processing systems 3 (pp. 190–196). San Mateo, CA: Morgan Kaufmann.
Frean, M. (1990). The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation 2:198–209.
Hartmann, E., & J. D. Keeler (1991). Predicting the future: Advantages of semilocal units. Neural Computation 3:566–578.
Jacobs, R., & M. Jordan (1991). A competitive modular connectionist architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), Advances in neural information processing systems 3 (pp. 767–773). San Mateo, CA: Morgan Kaufmann.
Jacobs, R., M. Jordan, S. Nowlan, & G. Hinton (1991). Adaptive mixtures of local experts. Neural Computation 3:79–87.
Lapedes, A., & R. Farber (1987). Nonlinear signal processing using neural networks: Prediction and system modeling. Technical Report TR LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM.
LeCun, Y., J. D. Denker, & S. A. Solla (1990). Optimal brain damage. In D. S. Touretzky (ed.), Advances in neural information processing systems 2 (pp. 598–605). San Mateo, CA: Morgan Kaufmann.
Littmann, E., & H. Ritter (1992). Cascade network architectures. Proceedings of the International Joint Conference on Neural Networks, volume II (pp. 398–404). Baltimore, MD.
Littmann, E., & H. Ritter (1993). Generalization abilities of cascade network architectures. In C. L. Giles, S. J. Hanson, & J. D. Cowan (eds.), Advances in neural information processing systems 5 (pp. 188–195). San Mateo, CA: Morgan Kaufmann.
Littmann, E., & H. Ritter (1994a). Analysis and applications of the direct cascade architecture. Technical Report TR 94-2, Department of Computer Science, Bielefeld University, Bielefeld, Germany.
Littmann, E., & H. Ritter (1996). Learning and generalization in cascade network architectures. Neural Computation 8(7):1521–1540.
Mackey, M., & L. Glass (1977). Oscillations and chaos in physiological control systems. Science 197:287–289.
Meyering, A., & H. Ritter (1992). Learning 3D shape perception with local linear maps. Proceedings of the International Joint Conference on Neural Networks, volume IV (pp. 432–436). Baltimore, MD.
Mézard, M., & J.-P. Nadal (1989). Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A 22:2191–2204.
Minsky, M. L., & S. A. Papert (1969). Perceptrons. Cambridge, MA: MIT Press.
Moody, J., & C. Darken (1988). Learning with localized receptive fields. Connectionist Models: Proceedings of the 1988 Summer School (pp. 133–143). San Mateo, CA: Morgan Kaufmann.
Mozer, M. (1989). A focused back-propagation algorithm for temporal pattern recognition. Complex Systems 3:349–381.
Nabhan, T., & A. Zomaya (1994). Toward generating neural network structures for function approximation. Neural Networks 7:89–99.
Nowlan, S. J., & G. E. Hinton (1991). Evaluation of adaptive mixtures of competing experts. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), Advances in neural information processing systems 3 (pp. 774–780). San Mateo, CA: Morgan Kaufmann.
Ritter, H. (1991). Learning with the self-organizing map. In T. Kohonen, K. Mäkisara, O. Simula, & J. Kangas (eds.), Artificial neural networks 1 (pp. 357–364). Amsterdam: Elsevier.
Ritter, H., T. Martinetz, & K. Schulten (1992). Neural computation and self-organizing maps: An introduction (English and German). New York: Addison-Wesley.
Rumelhart, D. E., G. E. Hinton, & R. J. Williams (1986). Learning internal representations by back-propagating errors. In D. E. Rumelhart & J. L. McClelland (eds.), Parallel distributed processing 1. Cambridge, MA: MIT Press.
Stokbro, K., D. Umberger, & J. Hertz (1990). Exploiting neurons with localized receptive fields to learn chaos. Complex Systems 4:603–622.
© 2000 Springer Science+Business Media Dordrecht
Cite this chapter
Littmann, E., Ritter, H. (2000). Modularization by Cascading Neural Networks. In: Cruse, H., Dean, J., Ritter, H. (eds) Prerational Intelligence: Adaptive Behavior and Intelligent Systems Without Symbols and Logic, Volume 1, Volume 2 Prerational Intelligence: Interdisciplinary Perspectives on the Behavior of Natural and Artificial Systems, Volume 3. Studies in Cognitive Systems, vol 26. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0870-9_39
Print ISBN: 978-94-010-3792-1
Online ISBN: 978-94-010-0870-9