Abstract
Neural networks are widely used for nonlinear pattern recognition and regression. However, they are considered black boxes because of the lack of transparency of their internal workings and the lack of direct relevance of their structure to the problem being addressed, which makes it difficult to gain insight from them. Furthermore, the structure of a neural network requires optimisation, which remains a challenge. Many existing structure optimisation approaches require either extensive multi-stage pruning or subjective thresholds for pruning parameters. Knowledge of any internal consistency in the behavior of neurons could help develop simpler, more systematic and more efficient approaches to optimising network structure. This chapter addresses in detail the issue of internal consistency in relation to redundancy and robustness of the structure of three-layer feed-forward networks, which are widely used for nonlinear regression. It first investigates whether there is a recognizable consistency in neuron activation patterns under all conditions of network operation, such as noise and different initial weights. If such consistency exists, it points to a recognizable optimum network structure for the given data. The results show that such a pattern does exist, and that it is most clearly evident not in the hidden neuron activations themselves but in the hidden neuron inputs to the output neuron (i.e., the weighted hidden neuron activations). It is shown that when a network has more than the optimum number of hidden neurons, the redundant neurons form clearly distinguishable correlated patterns in their weighted outputs. This correlation structure is exploited to extract the required number of neurons using correlation-distance-based self-organising maps (SOMs) combined with Ward clustering, which optimally clusters the correlated weighted hidden neuron activity patterns without any user-defined criteria or thresholds, thus automatically optimising network structure in one step.
The number of Ward clusters on the SOM is the required optimum number of hidden neurons. The optimum network obtained with the SOM/Ward approach is compared with those obtained using two documented pruning methods, optimal brain damage and variance nullity measure, to show that the correlation approach provides equivalent results. The robustness of the optimum-structure network is also tested against perturbation of its weights, and confidence intervals for the weights are illustrated. Finally, the approach is tested on two practical problems: a breast cancer diagnostic system and river flow forecasting.
References
S. Samarasinghe, Neural Networks for Applied Sciences and Engineering-From Fundamentals to Complex Pattern Recognition (CRC Press, 2006)
C. Bishop, Neural Networks for Pattern Recognition (Clarendon Press, Oxford, UK, 1996)
S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd edn. (Prentice Hall Inc, New Jersey, USA, 1999)
R. Reed, Pruning algorithms-A survey. IEEE Trans. Neural Networks 4, 740–747 (1993)
Y. Le Cun, J.S. Denker, S.A. Solla, Optimal brain damage, in Advances in Neural Information Processing (2), ed. by D.S. Touretzky (1990), pp. 598–605
B. Hassibi, D.G. Stork, G.J. Wolff, Optimal brain surgeon and general network pruning. IEEE International Conference on Neural Networks, vol. 1, (San Francisco, 1992), pp. 293–298
B. Hassibi, D.G. Stork, Second-order derivatives for network pruning: Optimal brain surgeon, in Advances in Neural Information Processing Systems, vol. 5, ed. by C. Lee Giles, S.J. Hanson, J.D. Cowan, (1993), pp. 164–171
A.P. Engelbrecht, A new pruning heuristic based on variance analysis of sensitivity information. IEEE Trans. Neural Networks 12(6), 1386–1399 (2001)
K. Hagiwara, Regularization learning, early stopping and biased estimator. Neurocomputing 48, 937–955 (2002)
M. Hagiwara, Removal of hidden units and weights for backpropagation networks. Proc. Int. Joint Conf. Neural Networks 1, 351–354 (1993)
F. Aires, Neural network uncertainty assessment using Bayesian statistics with application to remote sensing: 1. Network weights. J. Geophys. Res. 109, D10303 (2004). doi:10.1029/2003JD004173
F. Aires, Neural network uncertainty assessment using Bayesian statistics with application to remote sensing: 2. Output Error. J. Geophys. Res. 109, D10304 (2004). doi:10.1029/2003JD004174
F. Aires, Neural network uncertainty assessment using Bayesian statistics with application to remote sensing: 3. Network Jacobians. J. Geophys. Res. 109, D10305 (2004). doi:10.1029/2003JD004175
K. Warne, G. Prasad, S. Rezvani, L. Maguire, Statistical computational intelligence techniques for inferential model development: A comparative evaluation and novel proposition for fusion. Eng. Appl. Artif. Intell. 17, 871–885 (2004)
I. Rivals, L. Personnaz, Construction of Confidence Intervals for neural networks based on least squares estimation. Neural Networks 13, 463–484 (2000)
E.J. Teoh, K.C. Tan, C. Xiang, Estimating the number of hidden neurons in a feed forward network using the singular value decomposition. IEEE Trans. Neural Networks 17(6) (2006)
C. Xiang, S.Q. Ding, T.H. Lee, Geometrical interpretation and architecture selection of MLP. IEEE Trans. Neural Networks 16(1) (2005)
P.A. Castillo, J. Carpio, J.J. Merelo, V. Rivas, G. Romero, A. Prieto, Evolving multilayer perceptrons. Neural Process. Lett. 12(2), 115–127 (2000)
X. Yao, Evolutionary artificial neural networks. Proc. IEEE 87(9), 1423–1447 (1999)
S. Samarasinghe, Optimum structure of feed forward neural networks by SOM clustering of neuron activations. Proceedings of the International Modelling and Simulation Congress (MODSIM) (2007)
Neural Networks for Mathematica (Wolfram Research, Inc., USA, 2002)
J. Sietsma, R.J.F. Dow, Creating artificial neural networks that generalize. Neural Networks 4(1), 67–77 (1991)
Machine learning framework for Mathematica (Uni Software Plus, 2002). www.unisoftwareplus.com
J.H. Ward Jr, Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963)
K. Hornik, M. Stinchcombe, H. White, Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3, 551–560 (1990)
A.R. Gallant, H. White, On learning the derivative of an unknown mapping with multilayer feedforward networks. Neural Networks 5, 129–138 (1992)
A. Al-yousef, S. Samarasinghe, Ultrasound based computer aided diagnosis of breast cancer: evaluation of a new feature of mass central regularity degree. Proceedings of the International Modelling and Simulation Congress (MODSIM) (2011)
S. Samarasinghe, Hydrocomplexity: New Tools for Solving Wicked Water Problems (IAHS Publ. 338, 2010)
Appendix: Algorithm for Optimising Hidden Layer of MLP Based on SOM/Ward Clustering of Correlated Weighted Hidden Neuron Outputs
I. Train an MLP with a relatively large number of hidden neurons
1. For input vector $X$, the weighted input $u_j$ and output $y_j$ of hidden neuron $j$ are:
$$\begin{aligned} u_{j} & = a_{0j} + \sum\limits_{i = 1}^{n} {a_{ij} x_{i} } \\ y_{j} & = f\left( {u_{j} } \right) \\ \end{aligned}$$where $a_{0j}$ is the bias weight, $a_{ij}$ are the input-hidden neuron weights, and $f$ is the transfer function.
2. The net input $v_k$ and output $z_k$ of output neuron $k$ are:
$$\begin{aligned} v_{k} & = b_{0k} + \sum\limits_{j = 1}^{m} {b_{jk} y_{j} } \\ z_{k} & = f\left( {v_{k} } \right) \\ \end{aligned}$$where $b_{0k}$ is the bias weight and $b_{jk}$ are the hidden-output weights.
3. The mean square error (MSE) over the whole data set is:
$$MSE = \frac{1}{2N}\left[ {\sum\limits_{i = 1}^{N} {\left( {t_{i} - z_{i} } \right)^{2} } } \right]$$where $t$ is the target and $N$ is the sample size.
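Steps 1-3 can be sketched in NumPy as follows; the layer sizes, tanh hidden units and linear output neuron are illustrative assumptions, not prescriptions from the chapter:

```python
import numpy as np

def forward(X, A, B):
    """Forward pass of a 3-layer MLP.
    X: (N, n) inputs; A: (n+1, m) input-hidden weights, bias in row 0;
    B: (m+1, 1) hidden-output weights, bias in row 0."""
    U = A[0] + X @ A[1:]          # weighted inputs u_j, shape (N, m)
    Y = np.tanh(U)                # hidden outputs y_j = f(u_j)
    V = B[0] + Y @ B[1:]          # net input v_k to the output neuron
    Z = V                         # linear output neuron (illustrative choice)
    return Y, Z

def mse(T, Z):
    """MSE as defined above: sum of squared errors over 2N."""
    return np.sum((T - Z) ** 2) / (2 * len(T))

# toy data and randomly initialised weights, stand-ins for real training data
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
T = np.sin(X[:, :1])
A = rng.normal(scale=0.5, size=(3, 6))   # 2 inputs -> 6 hidden neurons
B = rng.normal(scale=0.5, size=(7, 1))   # 6 hidden -> 1 output
Y, Z = forward(X, A, B)
err = mse(T, Z)
```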
4. Weights are updated using a chosen least-squares error minimisation method, such as the Levenberg-Marquardt method:
$$w_{m} = w_{m - 1} - \varepsilon \,Rd_{m}$$where $d_m$ is the sum of the error gradients for weight $w$ in epoch $m$, $R$ is the inverse of the curvature, and $\varepsilon$ is the learning rate.
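A minimal sketch of this kind of update, with $R$ taken as the damped inverse Gauss-Newton curvature $(J^{T}J + \lambda I)^{-1}$ and $d_m = J^{T}e$; the one-hidden-neuron model, finite-difference Jacobian and damping value are illustrative assumptions:

```python
import numpy as np

def lm_step(w, x, t, model, lam=1e-2, h=1e-6):
    """One Levenberg-Marquardt-style step: w <- w - (J'J + lam*I)^-1 J'e."""
    e = model(w, x) - t                        # residuals
    J = np.empty((len(x), len(w)))             # finite-difference Jacobian
    for j in range(len(w)):
        wp = w.copy()
        wp[j] += h
        J[:, j] = (model(wp, x) - model(w, x)) / h
    R = np.linalg.inv(J.T @ J + lam * np.eye(len(w)))  # damped inverse curvature
    return w - R @ (J.T @ e)

# hypothetical single-hidden-neuron model y = w1 * tanh(w0 * x)
model = lambda w, x: w[1] * np.tanh(w[0] * x)
x = np.linspace(-1, 1, 50)
t = 0.8 * np.tanh(1.5 * x)                     # exactly representable target
w = np.array([0.5, 0.3])
for _ in range(20):
    w = lm_step(w, x, t, model)
final_mse = np.mean((model(w, x) - t) ** 2)
```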
5. Repeat steps 1 to 4 until the minimum MSE is reached on the training, calibration (testing) and validation data sets.
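The stopping rule in step 5 can be sketched as an early-stopping loop; the plain gradient-descent update and one-neuron toy model below stand in for the Levenberg-Marquardt training described above:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
t = np.sin(2 * x) + rng.normal(scale=0.05, size=200)
# training / calibration split (a validation set would be held out the same way)
xtr, ttr, xca, tca = x[:150], t[:150], x[150:], t[150:]

def pred(w, x):
    """Hypothetical one-hidden-neuron model y = w1 * tanh(w0 * x)."""
    return w[1] * np.tanh(w[0] * x)

def grad(w, x, t):
    """Gradient of mean squared error / 2 w.r.t. (w0, w1)."""
    e = pred(w, x) - t
    h = np.tanh(w[0] * x)
    return np.array([np.mean(e * w[1] * (1 - h ** 2) * x), np.mean(e * h)])

w = np.array([0.5, 0.5])
best, best_w, stalls = np.inf, w.copy(), 0
for epoch in range(500):
    w -= 0.5 * grad(w, xtr, ttr)                     # training update
    cal = np.mean((pred(w, xca) - tca) ** 2) / 2     # calibration MSE
    if cal < best - 1e-6:
        best, best_w, stalls = cal, w.copy(), 0
    else:
        stalls += 1
        if stalls >= 20:      # stop once calibration error stops improving
            break
w = best_w
```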
II. SOM clustering of weighted hidden neuron outputs
Inputs to SOM
An input vector $X_j$ into the SOM is:
$$X_{j} = b_{j} \left[ {y_{j1} ,y_{j2} , \ldots ,y_{jn} } \right]$$where $y_j$ is the output of hidden neuron $j$ and $b_j$ is its weight to the output neuron in the MLP. The length $n$ of the vector $X_j$ is equal to the number of samples in the original dataset.
Normalise $X_j$ to unit length.
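The construction and normalisation of these SOM input vectors can be sketched as follows; the hidden-output matrix and hidden-to-output weights are random stand-ins for those of a trained MLP:

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 200, 8                              # samples, hidden neurons
Y = np.tanh(rng.normal(size=(N, m)))       # hidden outputs y_j (stand-in)
b = rng.normal(size=m)                     # hidden-to-output weights b_j

# one SOM input vector per hidden neuron: X_j = b_j * [y_j1, ..., y_jn]
X = (Y * b).T                              # shape (m, N): m vectors of length N
# normalise each X_j to unit length, so correlation distance matches
# Euclidean distance during SOM training
X = X / np.linalg.norm(X, axis=1, keepdims=True)
```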
SOM training
1. Project the weighted outputs of the hidden neurons onto a self-organising map:
$$u_{j} = \sum\limits_{i = 1}^{n} {w_{ij} x_{i} }$$where $u_j$ is the output of SOM neuron $j$ and $w_{ij}$ is its weight for input component $x_i$.
2. Winner selection: select the winner neuron based on the minimum correlation distance between an input vector and the SOM neuron weight vectors (equivalent to the Euclidean distance for normalised input vectors):
$$d_{j} = \left\| {{\text{x}} - {\text{w}}_{j} } \right\| = \sqrt {\sum\limits_{i = 1}^{n} {\left( {x_{i} - w_{ij} } \right)^{2} } }$$
3. Update the weights of the winner and its neighbours at iteration $t$:
Select a neighbourhood function $NS(d, t)$ (such as a Gaussian) and a learning rate function $\beta(t)$ (such as exponential or linear), where $d$ is the distance from the winner to a neighbour neuron and $t$ is the iteration:
$${\text{w}}_{\text{j}} \left( t \right) = {\text{w}}_{\text{j}} \left( {t - 1} \right) + \beta \left( t \right)NS\left( {d,t} \right)\left[ {{\text{x}}\left( t \right) - {\text{w}}_{\text{j}} \left( {t - 1} \right)} \right]$$
4. Repeat the process until the mean distance $D$ between the weights and the inputs is a minimum:
$$D = \sum\limits_{i = 1}^{k} {\sum\limits_{{n \in c_{i} }} {\left( {{\text{x}}_{n} - {\text{w}}_{i} } \right)^{2} } }$$where $k$ is the number of SOM neurons and $c_i$ is the cluster of inputs represented by neuron $i$.
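The SOM training loop in steps 1-4 can be sketched as follows; the map size, Gaussian neighbourhood width, learning-rate schedule and iteration count are all illustrative assumptions:

```python
import numpy as np

def train_som(X, rows=4, cols=4, iters=2000, beta0=0.5, sigma0=2.0, seed=0):
    """Train a small 2-D SOM on unit-length input rows of X."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(rows * cols, X.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        # winner: minimum Euclidean distance (= correlation distance for
        # unit-length vectors, as noted in step 2)
        win = np.argmin(np.sum((W - x) ** 2, axis=1))
        # Gaussian neighbourhood NS(d, t) with shrinking width, and
        # exponentially decaying learning rate beta(t)
        d2 = np.sum((grid - grid[win]) ** 2, axis=1)
        sigma = sigma0 * np.exp(-t / iters)
        ns = np.exp(-d2 / (2 * sigma ** 2))
        beta = beta0 * np.exp(-t / iters)
        W += beta * ns[:, None] * (x - W)        # step-3 update rule
    return W

# stand-in inputs: 40 unit-length vectors of length 100
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 100))
X /= np.linalg.norm(X, axis=1, keepdims=True)
W = train_som(X)
# quantisation error: mean squared distance from each input to its winner
qe = np.mean([np.min(np.sum((W - x) ** 2, axis=1)) for x in X])
```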
III. Clustering of SOM neurons
The Ward method minimizes the increase in the within-group sum of squares that results from joining two candidate clusters. The within-group sum of squares is the sum of squared distances between all objects in a cluster and its centroid; the two clusters whose merger produces the smallest increase are merged at each step. This distance measure is called the Ward distance ($d_{ward}$) and is expressed as:
$$d_{ward} = \frac{{n_{r} n_{s} }}{{n_{r} + n_{s} }}\left\| {{\text{x}}_{r} - {\text{x}}_{s} } \right\|^{2}$$where $x_r$ and $x_s$ are the centres of gravity of the two clusters, and $n_r$ and $n_s$ are the number of data points in them.
The centre of gravity $x_{r(new)}$ of the two merged clusters is calculated as:
$${\text{x}}_{r(new)} = \frac{{n_{r} {\text{x}}_{r} + n_{s} {\text{x}}_{s} }}{{n_{r} + n_{s} }}$$
The likelihood of various numbers of clusters is determined by a WardIndex computed from $d_t$, the distance between the centres of the two clusters merged at the current step; $d_{t-1}$ and $d_{t-2}$, the corresponding distances in the previous two steps; and $NC$, the number of clusters remaining.
The number of clusters with the highest WardIndex is selected as the optimum.
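The SOM-neuron clustering in step III can be sketched with SciPy's Ward linkage; picking the cluster count just before the largest jump in merge distance is a simple stand-in for the WardIndex criterion described above, and the codebook here is a synthetic stand-in for trained SOM weights:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
# stand-in SOM codebook: 18 neuron weight vectors around 3 separated centres
centres = np.eye(3, 10) * 8.0
W = np.vstack([c + rng.normal(scale=0.3, size=(6, 10)) for c in centres])

# each Ward merge minimises the increase in within-group sum of squares
Z = linkage(W, method="ward")
# number of clusters just before the largest jump in merge distance
gaps = np.diff(Z[:, 2])
n_clusters = len(W) - (np.argmax(gaps) + 1)
labels = fcluster(Z, t=n_clusters, criterion="maxclust")
```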
IV. Optimum number of hidden neurons in MLP
The optimum number of hidden neurons in the original MLP is equal to this optimum number of clusters on the SOM.
Retrain the MLP with this optimum number of hidden neurons.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Samarasinghe, S. (2016). Order in the Black Box: Consistency and Robustness of Hidden Neuron Activation of Feed Forward Neural Networks and Its Use in Efficient Optimization of Network Structure. In: Shanmuganathan, S., Samarasinghe, S. (eds) Artificial Neural Network Modelling. Studies in Computational Intelligence, vol 628. Springer, Cham. https://doi.org/10.1007/978-3-319-28495-8_2
Print ISBN: 978-3-319-28493-4
Online ISBN: 978-3-319-28495-8