# Evaluating generalization through interval-based neural network inversion


## Abstract

Typically, measuring the generalization ability of a neural network relies on the well-known method of cross-validation, which statistically estimates the classification error of a network architecture and thereby assesses its generalization ability. However, for a number of reasons, cross-validation is neither an efficient nor an unbiased estimator of generalization, and it cannot be used to assess the generalization of a neural network after training. In this paper, we introduce a new method for evaluating generalization based on a deterministic approach that reveals and exploits the network's domain of validity. This is the region of the input space containing all points for which a class-specific network output provides values higher than a certainty threshold. The proposed approach is a set membership technique that defines the network's domain of validity by inverting its output activity on the input space. For a trained neural network, the result of this inversion is a set of hyper-boxes which constitutes a reliable and \(\varepsilon\)-accurate computation of the domain of validity. Suitably defined metrics on the volume of the domain of validity provide a deterministic estimate of the generalization ability of the trained network that is not affected by random test set selection, as cross-validation is. The effectiveness of the proposed generalization measures is demonstrated on illustrative examples with artificial and real datasets, using shallow feed-forward neural networks such as multi-layer perceptrons.
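The inversion procedure outlined above can be illustrated with a minimal SIVIA-style sketch: a branch-and-bound bisection that classifies input boxes as provably inside the domain of validity (network output above the threshold over the whole box), provably outside, or undecided down to a box width of \(\varepsilon\). This is a hedged sketch, not the authors' implementation; the tanh network weights, the threshold 0.5, and \(\varepsilon = 0.1\) below are illustrative assumptions, and interval arithmetic is coded inline rather than via a toolbox such as INTLAB.

```python
import math

def itanh(lo, hi):
    # tanh is monotone, so its interval image is [tanh(lo), tanh(hi)]
    return math.tanh(lo), math.tanh(hi)

def imul(lo, hi, w):
    # interval times scalar weight; order the endpoints for negative w
    a, b = w * lo, w * hi
    return (a, b) if a <= b else (b, a)

def mlp_output_range(box, W1, b1, W2, b2):
    """Interval enclosure of a 2-input, hidden-tanh, 1-output MLP over box."""
    hidden = []
    for j in range(len(b1)):
        lo, hi = b1[j], b1[j]
        for i, (xl, xh) in enumerate(box):
            a, b = imul(xl, xh, W1[j][i])
            lo, hi = lo + a, hi + b
        hidden.append(itanh(lo, hi))
    lo, hi = b2, b2
    for j, (hl, hh) in enumerate(hidden):
        a, b = imul(hl, hh, W2[j])
        lo, hi = lo + a, hi + b
    return itanh(lo, hi)

def sivia(box, f, threshold, eps, inside, boundary):
    lo, hi = f(box)
    if lo >= threshold:
        inside.append(box)                 # box provably inside the domain
    elif hi < threshold:
        pass                               # box provably outside: discard
    elif max(h - l for l, h in box) < eps:
        boundary.append(box)               # undecided but smaller than eps
    else:
        # bisect the widest dimension and recurse on both halves
        k = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
        l, h = box[k]
        m = 0.5 * (l + h)
        for half in ((l, m), (m, h)):
            child = list(box)
            child[k] = half
            sivia(child, f, threshold, eps, inside, boundary)

# illustrative placeholder weights for a 2-2-1 tanh network
W1 = [[2.0, -1.0], [-1.5, 2.5]]; b1 = [0.1, -0.2]
W2 = [1.2, 1.0]; b2 = -0.1
f = lambda box: mlp_output_range(box, W1, b1, W2, b2)

inside, boundary = [], []
sivia([(-2.0, 2.0), (-2.0, 2.0)], f, 0.5, 0.1, inside, boundary)

# the total volume of the inner hyper-boxes is the kind of quantity on
# which the paper's deterministic generalization metrics are defined
vol = sum((b[0][1] - b[0][0]) * (b[1][1] - b[1][0]) for b in inside)
```

Because natural interval extensions over-approximate the true output range, the `inside` list is a guaranteed inner approximation of the domain of validity, while the union of `inside` and `boundary` boxes is a guaranteed outer one; their volumes bracket the true volume to within \(\varepsilon\)-sized boundary boxes.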

## Keywords

Neural networks · Generalization · Inversion · Interval analysis · Reliable computing

## Abbreviations

- HPD
Highest posterior density

- INTLAB
INTerval LABoratory

- IA
Interval analysis

- MLP
Multi-layer perceptron

- OTS
Off training set

- PDF
Probability density function

- SCS
Set computations with subpavings

- SIVIA
Set inversion via interval analysis

## Notes

### Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable suggestions and comments on an earlier version of the manuscript, which helped to significantly improve the paper.

### Compliance with ethical standards

### Conflict of interest

The authors declare that they have no conflict of interest.

## References

- 1.Adam SP, Karras DA, Magoulas GD, Vrahatis MN (2015) Reliable estimation of a neural network’s domain of validity through interval analysis based inversion. In: 2015 international joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN.2015.7280794
- 2.Adam SP, Likas AC, Vrahatis MN (2017) Interval analysis based neural network inversion: a means for evaluating generalization. In: Boracchi G, Iliadis L, Jayne C, Likas A (eds) Engineering applications of neural networks. Springer International Publishing, Berlin, pp 314–326CrossRefGoogle Scholar
- 3.Adam SP, Magoulas GD, Karras DA, Vrahatis MN (2016) Bounding the search space for global optimization of neural networks learning error: an interval analysis approach. J Mach Learn Res 17(169):1–40. http://jmlr.org/papers/v17/14-350.html
- 4.Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, OxfordzbMATHGoogle Scholar
- 5.Courrieu P (1994) Three algorithms for estimating the domain of validity of feedforward neural networks. Neural Netw 7(1):169–174CrossRefGoogle Scholar
- 6.Eberhart R, Dobbins R (1991) Designing neural network explanation facilities using genetic algorithms. In: 1991 IEEE international joint conference on neural networks, vol 2, pp 1758–1763Google Scholar
- 7.Hampshire II JB, Pearlmutter BA (1991) Equivalence proofs for multilayer perceptron classifiers and the Bayesian discriminant function. In: Proceedings of the 1990 connectionist models summer school, vol 1, pp 159–172CrossRefGoogle Scholar
- 8.Hassoun MH (1995) Fundamentals of artificial neural networks. MIT Press, CambridgezbMATHGoogle Scholar
- 9.Haykin S (1999) Neural networks a comprehensive foundation, 2nd edn. Prentice-Hall, Upper Saddle River, NJzbMATHGoogle Scholar
- 10.Hernández-Espinosa C, Fernández-Redondo M, Ortiz-Gómez M (2003) Inversion of a Neural Network via Interval Arithmetic for Rule Extraction. In: Kaynak O, Alpaydin E, Oja E, Xu L (eds) Artificial Neural Networks and Neural Information Processing ICANN/ICONIP 2003, vol 2714. Springer, Berlin Heidelberg, pp 670–677 Lecture Notes in Computer SciencezbMATHCrossRefGoogle Scholar
- 11.Jaulin L, Kieffer M, Didrit O, Walter E (2001) Applied interval analysis with examples in parameter and state estimation, robust control and robotics. Springer, LondonzbMATHGoogle Scholar
- 12.Jaulin L, Walter E (1993) Set inversion via interval analysis for nonlinear bounded-error estimation. Automatica 29(4):1053–1064MathSciNetzbMATHCrossRefGoogle Scholar
- 13.Jensen C, Reed R, Marks R, El-Sharkawi M, Jung JB, Miyamoto R, Anderson G, Eggen C (1999) Inversion of feedforward neural networks: algorithms and applications. In: Proceedings of the IEEE 87(9):1536–1549CrossRefGoogle Scholar
- 14.Kamimura R (2017) Mutual information maximization for improving and interpreting multi-layered neural networks. In: 2017 IEEE symposium series on computational intelligence (SSCI), pp 1–7Google Scholar
- 15.Karystinos GN, Pados DA (2000) On overfitting, generalization, and randomly expanded training sets. IEEE Trans Neural Netw 11(5):1050–1057CrossRefGoogle Scholar
- 16.Kearfott RB (1996) Interval computations: introduction, uses, and resources. Euromath Bull 2(1):95–112MathSciNetGoogle Scholar
- 17.Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann Math Stat 23:462–466MathSciNetzbMATHCrossRefGoogle Scholar
- 18.Kindermann J, Linden A (1990) Inversion of neural networks by gradient descent. Parallel Comput 14(3):277–286CrossRefGoogle Scholar
- 19.Likas A (2001) Probability density estimation using artificial neural networks. Comput Phys Commun 135(2):167–175zbMATHCrossRefGoogle Scholar
- 20.Liu Y (1995) Unbiased estimate of generalization error and model selection in neural network. Neural Netw 8(2):215–219MathSciNetCrossRefGoogle Scholar
- 21.Lu BL, Kita H, Nishikawa Y (1999) Inverting feedforward neural networks using linear and nonlinear programming. IEEE Trans Neural Netw 10(6):1271–1290CrossRefGoogle Scholar
- 22.Novak R, Bahri Y, Abolafia DA, Pennington J, Sohl-Dickstein J (2018) Sensitivity and generalization in neural networks: an empirical study. In: International conference on learning representations. https://openreview.net/forum?id=HJC2SzZCW
- 23.Reed R, Marks R (1995) An evolutionary algorithm for function inversion and boundary marking. In: IEEE international conference on evolutionary computation, 1995, vol 2, pp 794–797Google Scholar
- 24.Richard M, Lippmann R (1991) Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Comput 3(4):461–483. https://doi.org/10.1162/neco.1991.3.4.461 CrossRefGoogle Scholar
- 25.Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat 22:400–407MathSciNetzbMATHCrossRefGoogle Scholar
- 26.Rump SM (1999) INTLAB - INTerval LABoratory. In: Csendes T (ed) Developments in reliable computing. Kluwer Academic, Dordrecht, Netherlands, pp 77–104CrossRefGoogle Scholar
- 27.Saad EW, Wunsch DC II (2007) Neural network explanation using inversion. Neural Netw 20(1):78–93zbMATHCrossRefGoogle Scholar
- 28.Theodoridis S, Pikrakis A, Koutroumbas K, Kavouras D (2010) Introduction to pattern recognition: a MATLAB approach. Academic Press, Burlington, MA 01803, USAGoogle Scholar
- 29.Thrun SB (1993) Extracting provably correct rules from artificial neural networks. Technical Report IAI–TR–93–5, Institut fur Informatik III, Bonn, GermanyGoogle Scholar
- 30.Tornil-Sin S, Puig V, Escobet T (2010) Set computations with subpavings in MATLAB: the SCS toolbox. In: 2010 IEEE international symposium on computer-aided control system design (CACSD), pp 1403–1408Google Scholar
- 31.Wolpert DH (1990) A mathematical theory of generalization: part I. Complex Syst 4(2):151–200zbMATHGoogle Scholar
- 32.Wolpert DH (1990) A mathematical theory of generalization: part II. Complex Syst 4(2):201–249zbMATHGoogle Scholar
- 33.Wolpert DH (1992) On the connection between in-sample testing and generalization error. Complex Syst 6(1):47–94MathSciNetzbMATHGoogle Scholar
- 34.Wolpert DH (1996) The existence of a priori distinctions between learning algorithms. Neural Comput 8(7):1391–1420. https://doi.org/10.1162/neco.1996.8.7.1391 CrossRefGoogle Scholar
- 35.Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390. https://doi.org/10.1162/neco.1996.8.7.1341 CrossRefGoogle Scholar