*Prerational Intelligence: Adaptive Behavior and Intelligent Systems Without Symbols and Logic* (Interdisciplinary Perspectives on the Behavior of Natural and Artificial Systems, Volumes 1–3), pp. 624–640

# Modularization by Cascading Neural Networks

## Abstract

The application of artificial neural networks to complex real-world problems usually requires a *modularization of the network architecture*: the individual modules handle subtasks defined by a decomposition of the problem. To date, this modularization has usually been done heuristically, and little is known about principled methods for adapting the network structure to the problem at hand. Incrementally constructed cascade architectures are a promising approach to *grow* networks according to the needs of the problem. This paper discusses the properties of the recently proposed direct cascade architecture DCA (Littmann & Ritter 1992). One important virtue of DCA is that it allows the *cascading of entire subnetworks*, even if these admit no error backpropagation. Exploiting this flexibility and using LLM (local linear map) networks as cascaded elements, we show that the performance of the resulting network cascades can be greatly enhanced compared to that of a single network. Our results for the Mackey-Glass time series prediction task indicate that such deeply cascaded network architectures achieve *good generalization even on small data sets*, where shallow, broad architectures of comparable size suffer from overfitting. We conclude that the DCA approach offers a powerful and flexible alternative to existing schemes, such as the mixtures-of-experts approach, for the construction of modular systems from a wide range of subnetwork types.
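The Mackey-Glass benchmark mentioned above is generated from the delay differential equation dx/dt = a·x(t−τ)/(1 + x(t−τ)^10) − b·x(t), conventionally with a = 0.2, b = 0.1, and delay τ = 17 for the chaotic regime. As an illustration only (the abstract does not give integration details), a simple Euler-discretized generator might look like:

```python
import numpy as np

def mackey_glass(n_samples, tau=17, a=0.2, b=0.1, dt=1.0, x0=1.2, discard=500):
    """Euler integration of dx/dt = a*x(t-tau)/(1 + x(t-tau)**10) - b*x(t).

    The step size dt, initial value x0, and number of discarded transient
    steps are illustrative choices, not taken from the paper.
    """
    total = n_samples + discard
    x = np.empty(total + tau)
    x[:tau + 1] = x0                      # constant history on the delay interval
    for t in range(tau, total + tau - 1):
        x_tau = x[t - tau]                # delayed state x(t - tau)
        x[t + 1] = x[t] + dt * (a * x_tau / (1.0 + x_tau ** 10) - b * x[t])
    return x[tau + discard:]              # drop the initial transient

series = mackey_glass(1000)
```

The standard prediction task then maps a short window of past values, e.g. (x(t), x(t−6), x(t−12), x(t−18)), to a future value such as x(t+85).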

## Keywords

Neural Information Processing System · Neural Computation · Normalized Root Mean Square Error · Training Epoch · Neural Module


## References

- Baffes, P., & J. Zelle (1992). Growing layers of perceptrons: Introducing the extentron algorithm. *Proceedings of the International Joint Conference on Neural Networks*, volume II (pp. 392–397). Baltimore, MD.
- Baum, E., & D. Haussler (1989). What size net gives valid generalization? *Neural Computation* **1**:151–160.
- Crowder, R. S. (1990). Predicting the Mackey-Glass time series with cascade-correlation learning. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, & G. E. Hinton (eds.), *Connectionist Models: Proceedings of the 1990 Summer School* (pp. 524–532). San Mateo, CA: Morgan Kaufmann.
- Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), *Advances in neural information processing systems* 3 (pp. 190–196). San Mateo, CA: Morgan Kaufmann.
- Frean, M. (1990). The upstart algorithm: A method for constructing and training feedforward neural networks. *Neural Computation* **2**:198–209.
- Hartman, E., & J. D. Keeler (1991). Predicting the future: Advantages of semilocal units. *Neural Computation* **3**:566–578.
- Jacobs, R., & M. Jordan (1991). A competitive modular connectionist architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (eds.), *Advances in neural information processing systems* 3 (pp. 767–773). San Mateo, CA: Morgan Kaufmann.
- Jacobs, R., M. Jordan, S. Nowlan, & G. Hinton (1991). Adaptive mixtures of local experts. *Neural Computation* **3**:79–87.
- Lapedes, A., & R. Farber (1987). *Nonlinear signal processing using neural networks: Prediction and system modeling*. Technical Report LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM.
- LeCun, Y., J. D. Denker, & S. A. Solla (1990). Optimal brain damage. In D. S. Touretzky (ed.), *Advances in neural information processing systems* 2 (pp. 598–605). San Mateo, CA: Morgan Kaufmann.
- Littmann, E., & H. Ritter (1992). Cascade network architectures. *Proceedings of the International Joint Conference on Neural Networks*, volume II (pp. 398–404). Baltimore, MD.
- Littmann, E., & H. Ritter (1993). Generalization abilities of cascade network architectures. In C. L. Giles, S. J. Hanson, & J. D. Cowan (eds.), *Advances in neural information processing systems* 5 (pp. 188–195). San Mateo, CA: Morgan Kaufmann.
- Littmann, E., & H. Ritter (1994a). *Analysis and applications of the direct cascade architecture*. Technical Report TR 94-2, Department of Computer Science, Bielefeld University, Bielefeld, Germany.
- Littmann, E., & H. Ritter (1996). Learning and generalization in cascade network architectures. *Neural Computation* **8**(7):1521–1540.
- Mackey, M., & L. Glass (1977). Oscillations and chaos in physiological control systems. *Science* **197**:287–289.
- Meyering, A., & H. Ritter (1992). Learning 3D shape perception with local linear maps. *Proceedings of the International Joint Conference on Neural Networks*, volume IV (pp. 432–436). Baltimore, MD.
- Mézard, M., & J. P. Nadal (1989). Learning in feedforward layered networks: The tiling algorithm. *Journal of Physics A* **22**:2191–2204.
- Minsky, M. L., & S. A. Papert (1969). *Perceptrons*. Cambridge, MA: MIT Press.
- Moody, J., & C. Darken (1988). Learning with localized receptive fields. *Connectionist Models: Proceedings of the 1988 Summer School* (pp. 133–143). San Mateo, CA: Morgan Kaufmann.
- Mozer, M. (1989). A focused back-propagation algorithm for temporal pattern recognition. *Complex Systems* **3**:349–381.
- Nabhan, T., & A. Zomaya (1994). Toward generating neural network structures for function approximation. *Neural Networks* **7**:89–99.
- Nowlan, S. J., & G. E. Hinton (1991). Evaluation of adaptive mixtures of competing experts. In D. S. Touretzky (ed.), *Advances in neural information processing systems* 3 (pp. 774–780). San Mateo, CA: Morgan Kaufmann.
- Ritter, H. (1991). Learning with the self-organizing map. In T. Kohonen, K. Mäkisara, O. Simula, & J. Kangas (eds.), *Artificial neural networks* 1 (pp. 357–364). Amsterdam: Elsevier.
- Ritter, H., T. Martinetz, & K. Schulten (1992). *Neural computation and self-organizing maps: An introduction* (English and German). New York: Addison-Wesley.
- Rumelhart, D. E., G. E. Hinton, & R. J. Williams (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (eds.), *Parallel distributed processing* 1. Cambridge, MA: MIT Press.
- Stokbro, K., D. Umberger, & J. Hertz (1990). Exploiting neurons with localized receptive fields to learn chaos. *Complex Systems* **4**:603–622.