End-to-End Lifelong Learning: a Framework to Achieve Plasticities of both the Feature and Classifier Constructions
Plasticity in the brain gives us a remarkable ability to learn about and understand the world. Although great successes have been achieved in many fields, few bio-inspired machine learning methods have mimicked this ability. Consequently, these bio-inspired methods become infeasible on large-scale or time-varying data, because they lack plasticity and require all training data to be loaded into memory. Furthermore, even the popular deep convolutional neural network (CNN) models have relatively fixed structures and cannot process time-varying data well. Using incremental methods, this paper explores an end-to-end lifelong learning framework that achieves plasticity in both feature and classifier construction. The proposed model mainly comprises three parts: Gabor filters followed by a max pooling layer, which offer shift and scale tolerance to input samples; incremental unsupervised feature extraction; and an incremental SVM. Together, the last two parts aim to achieve plasticity in both feature learning and classifier construction. Unlike a CNN, plasticity in our model involves no backpropagation (BP) and does not require a huge number of parameters. Our incremental models, including IncPCANet and IncKmeansNet, achieve better results than PCANet and KmeansNet on the MNIST and Caltech101 datasets, respectively. Moreover, IncPCANet and IncKmeansNet show promising plasticity of feature extraction and classifier construction when the distribution of the data changes. Extensive experiments validate the performance of our model and verify a physiological hypothesis that plasticity is stronger in high-level layers than in low-level layers.
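To make the idea of incremental unsupervised feature learning concrete, the following is a minimal sketch of sequential (MacQueen-style) k-means, in which each centroid is updated one sample at a time and no training data is kept in memory. It is an illustration only and not the authors' IncKmeansNet; the class name `OnlineKMeans`, the method name `partial_fit`, and the caller-supplied initial centroids are all assumptions made for this example.

```python
import math


def nearest(centroids, x):
    """Return the index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(centroids[k], x))


class OnlineKMeans:
    """Sequential k-means sketch (hypothetical helper, not the paper's code)."""

    def __init__(self, init_centroids):
        # Assumption: the caller supplies the starting centroids.
        self.centroids = [list(c) for c in init_centroids]
        # Each initial centroid counts as one observation.
        self.counts = [1] * len(self.centroids)

    def partial_fit(self, x):
        """Update the model with a single sample and return its cluster index."""
        k = nearest(self.centroids, x)
        self.counts[k] += 1
        step = 1.0 / self.counts[k]
        # Move the winning centroid toward x. With this step size the centroid
        # stays equal to the running mean of its initial value and its samples.
        self.centroids[k] = [c + step * (xi - c)
                             for c, xi in zip(self.centroids[k], x)]
        return k


# Stream samples one at a time; nothing is stored besides centroids and counts.
model = OnlineKMeans([[0.0, 0.0], [10.0, 10.0]])
for sample in [[1.0, 1.0], [9.0, 9.0], [1.0, -1.0]]:
    model.partial_fit(sample)
```

Because each update touches only one centroid and one counter, the memory cost is independent of the number of samples seen, which is the property the abstract relies on for large-scale or time-varying data. A fixed step size instead of `1 / count` would be one way to let centroids keep tracking a drifting distribution, though that variant is likewise an assumption and not something the abstract specifies.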
Keywords: Plasticity, Lifelong learning, End-to-end, Incremental PCANet, Incremental KMeansNet, Incremental SVM
Compliance with Ethical Standards
Conflict of Interests
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.