Predictive EWC: mitigating catastrophic forgetting of neural network through pre-prediction of learning data

  • DaeYong Hong
  • Yan Li
  • Byeong-Seok Shin
Original Research


Each time an artificial neural network learns an unseen dataset, it loses some of its ability to recognize the features that it had learned before. This phenomenon is called the catastrophic forgetting problem (CFP). In image classification, the representative features of each class contribute significantly to determining the class into which a given image is categorized, and thus directly influence performance. Losing them to CFP can therefore be damaging. The proposed algorithm, called Predictive EWC or PEWC, learns only a sample of the data from a new task, consisting of the images that are most challenging for the network to classify. The criterion for extracting a sample is the absolute value of the difference between the network’s predicted value and the annotated value of the given image. This reduces the size of the task to be learned and the likelihood of CFP. In an experiment, the proposed algorithm achieved an average task accuracy 5% higher than that of a prevalent algorithm, elastic weight consolidation (EWC), while consuming fewer resources.
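The sampling criterion described above can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not state how the difference is aggregated across classes or how many samples are kept, so the function names, the one-hot encoding of the annotated value, the summation over classes, and the `n_keep` parameter are all assumptions.

```python
def prediction_error(probs, label):
    """Absolute difference between the predicted class scores and the
    annotated class, summed over classes (the aggregation is an assumption)."""
    return sum(
        abs(p - (1.0 if k == label else 0.0))
        for k, p in enumerate(probs)
    )


def select_hard_samples(predictions, labels, n_keep):
    """Return the indices of the n_keep samples with the largest error,
    i.e. the ones the current network finds hardest to classify."""
    errors = [prediction_error(p, y) for p, y in zip(predictions, labels)]
    # Rank sample indices from largest to smallest error.
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical outputs of the current network on three new-task images.
predictions = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
labels = [0, 0, 1]

# Only the selected subset would then be passed on to training.
hard_indices = select_hard_samples(predictions, labels, n_keep=2)
```

Here the second image is misclassified and the third is classified with less confidence than the first, so those two are the ones selected for training on the new task.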


Transfer learning, Catastrophic forgetting, Sampling of data, Predictive elastic weight consolidation



This work was supported by an INHA University Grant.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Engineering, Inha University, Incheon, South Korea
