Architectural constraints on learning and memory function
Keywords: Information Processing System, External Information, Layered Architecture, Parallel Network, Neuronal Ensemble
Recent experimental and computational studies have identified relationships between architecture and functional performance in information processing systems ranging from natural neuronal ensembles [1, 2] to artificial neural networks [3, 4]. While these systems can vary greatly in their size and complexity, they share certain structural features, such as parallel and layered motifs. Quantifying how these features influence functionality is a first step toward understanding the behavior of both natural and artificial information processing systems. Of particular interest is the impact of structural architecture on the ability of the system to balance stability with flexibility, for example in memory versus learning.
In this study, we use neural networks as model information processing systems to examine tradeoffs in learning and memory processes arising from variations in structural organization. We compare the performance of parallel and layered structures during sequential function approximation, a task that requires networks to produce, retain, and dynamically adapt representations of external information. We measure network performance over a range of learning conditions by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time allowed for learning. By characterizing local error landscape curvature, we can directly relate the functional performance of the system to its underlying architecture.
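The comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the tanh units without biases, the two linear target functions, the finite-difference gradient descent with step halving, and all function names (`parallel_forward`, `layered_forward`, `sequential_task`) are hypothetical.

```python
import math
import random

def parallel_forward(x, w):
    """Single hidden layer: hidden units act side by side on the input.
    The first half of w holds input weights, the second half output weights."""
    n = len(w) // 2
    return sum(w[n + i] * math.tanh(w[i] * x) for i in range(n))

def layered_forward(x, w):
    """Chain of single-unit layers: each unit feeds only the next one."""
    h = x
    for wi in w[:-1]:
        h = math.tanh(wi * h)
    return w[-1] * h

def mse(forward, w, xs, target):
    """Mean squared error of the network output against a target function."""
    return sum((forward(x, w) - target(x)) ** 2 for x in xs) / len(xs)

def gradient(forward, w, xs, target, eps=1e-5):
    """Central finite-difference gradient of the error with respect to w."""
    grad = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + eps] + w[i + 1:]
        down = w[:i] + [w[i] - eps] + w[i + 1:]
        diff = mse(forward, up, xs, target) - mse(forward, down, xs, target)
        grad.append(diff / (2 * eps))
    return grad

def train(forward, w, xs, target, steps=200, rate=0.1):
    """Gradient descent that halves the step until the error does not rise."""
    w = list(w)
    err = mse(forward, w, xs, target)
    for _ in range(steps):
        g = gradient(forward, w, xs, target)
        step = rate
        while step > 1e-8:
            trial = [wi - step * gi for wi, gi in zip(w, g)]
            trial_err = mse(forward, trial, xs, target)
            if trial_err <= err:
                w, err = trial, trial_err
                break
            step /= 2
    return w, err

def sequential_task(forward, n_weights, seed=0):
    """Learn a first target, then a second; report learning and retention errors."""
    rng = random.Random(seed)
    xs = [i / 10 - 1 for i in range(21)]          # inputs on [-1, 1]
    w0 = [rng.uniform(-1, 1) for _ in range(n_weights)]
    first = lambda x: 0.5 * x                     # illustrative targets only
    second = lambda x: -0.5 * x
    w_a, err_a = train(forward, w0, xs, first)
    w_b, err_b = train(forward, w_a, xs, second)
    retained = mse(forward, w_b, xs, first)       # error on the old target
    return {"learn_first": err_a, "learn_second": err_b, "retain_first": retained}
```

Calling `sequential_task(parallel_forward, 6)` and `sequential_task(layered_forward, 3)` with several seeds gives a rough, small-scale view of how learning and retention errors might be compared across the two structures. The parallel sketch expects an even number of weights.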
Across a range of both parallel and layered system architectures, we find that variations in error landscape curvature give rise to tradeoffs between the ability of these networks to learn new versus retain old information, maximize success versus minimize failure, and produce specific versus generalizable representations of information. In particular, parallel networks generate smooth error landscapes with deep, narrow minima. Therefore, given sufficient time and through the adjustment of a large number of connection weights, parallel networks can find highly specific representations of the external information. Although accurate, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations by adjusting fewer weights. Although less accurate, these representations are more easily adaptable.
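One common way to quantify local curvature of an error landscape is the Hessian of the error with respect to the connection weights. The sketch below estimates it by central finite differences. It is an illustration under stated assumptions, not the analysis used in the study: the helper names (`hessian`, `mean_curvature`) and the two-weight toy error function are hypothetical.

```python
import math

def hessian(f, w, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at weights w."""
    n = len(w)

    def shifted(i, si, j, sj):
        # Evaluate f after moving weight i by si*eps and weight j by sj*eps.
        v = list(w)
        v[i] += si * eps
        v[j] += sj * eps
        return f(v)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            val = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                   - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * eps * eps)
            H[i][j] = H[j][i] = val   # the Hessian is symmetric
    return H

def mean_curvature(H):
    """Average of the diagonal second derivatives (trace divided by dimension)."""
    return sum(H[i][i] for i in range(len(H))) / len(H)

def toy_error(w):
    """Squared error of a one-unit tanh network against an assumed linear target."""
    xs = [i / 10 - 1 for i in range(21)]
    return sum((w[1] * math.tanh(w[0] * x) - 0.5 * x) ** 2 for x in xs) / len(xs)

# Curvature of the toy error surface at an arbitrary point in weight space.
H = hessian(toy_error, [0.8, 0.6])
curvature = mean_curvature(H)
```

Large diagonal entries indicate a narrow, steep direction around the chosen point, while small entries indicate a flat one. Comparing such estimates at trained weights is one possible way to contrast narrow and broad minima.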
We have conducted a detailed analysis of network performance over a range of parallel and layered architectures, thereby isolating learning and memory tradeoffs that arise from underlying structural complexity. A thorough understanding of small network systems is crucial for predicting the behavior of larger systems in which statistical studies of performance would not be possible. In particular, these results may provide insight into the behavior of composite systems, such as cortical layers composed of structurally distinct columns or modular divide-and-conquer networks, which share features of both parallel and layered architectures. Additionally, the existence of tradeoffs inherent to a range of network structures may help explain the variability of architectural motifs observed in large-scale biological and technical systems. Identifying the structural mechanisms that impact performance has implications for understanding a wide variety of both natural and artificial learning systems.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.