Incremental Object Detection Using Ensemble Modeling and Deep Transfer Learning

  • Piyapong Huayhongthong
  • Siriyakorn Rerk-u-suk (corresponding author)
  • Songwit Booddee
  • Praisan Padungweang
  • Kittipong Warasup
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1149)


Object detection is a computer vision task that can be accomplished using machine learning. The core step of object detection with a machine learning model is training the model on images that contain the objects of interest. However, model training needs a large number of training images. In addition, to enable the model to detect an additional class of objects, it needs to be re-trained with both the old and the new image datasets, which is a time-consuming and computationally expensive process. This paper proposes an incremental object detection model that does not require re-training on the old images. An ensemble model and a transfer learning approach are used. The proposed model consists of three parts: two object detection sub-models (a pre-trained model and a transferred model) and a decision model (an ensemble model). To illustrate the proposed model, a YOLO model trained on the COCO image dataset, with eighty object categories and 330,000 images in total, is selected as the pre-trained model. It is also used as the initial model for training the transferred model with the transfer learning technique. Only new images are used for training the transferred model. The ensemble model, which uses the bagging technique, serves as the final classifier that chooses the best decision from the two sub-models. With the proposed model, both the amount of training data and the training time are reduced. Only several hours are needed to train the model on three new object categories with 3,000 images in total. The experimental results show that the proposed model achieves high performance on the test image dataset, with 93.33% accuracy.
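
The decision step described above, choosing one result from the two sub-models, can be sketched as follows. The abstract does not specify the bagging-based decision rule, so this sketch swaps in a simpler stand-in: the detections of both sub-models are pooled, and where two boxes overlap strongly only the higher-confidence one is kept (class-agnostic non-maximum suppression). The names `Detection`, `iou`, `combine_detections`, the label `new_object`, and the 0.5 overlap threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Detection:
    label: str    # predicted class name
    score: float  # confidence in [0, 1]
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def combine_detections(
    pretrained: List[Detection],
    transferred: List[Detection],
    iou_threshold: float = 0.5,  # assumed value, not from the paper
) -> List[Detection]:
    """Pool both sub-models' detections and keep, for each group of
    strongly overlapping boxes, only the most confident detection."""
    candidates = sorted(pretrained + transferred,
                        key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        # Keep a detection only if it does not overlap a better one.
        if all(iou(det.box, k.box) < iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    # Illustrative outputs of the two sub-models on one image.
    old_model = [
        Detection("person", 0.90, (10, 10, 50, 80)),
        Detection("dog", 0.40, (100, 100, 160, 150)),
    ]
    new_model = [Detection("new_object", 0.85, (102, 98, 158, 152))]
    for det in combine_detections(old_model, new_model):
        print(det.label, det.score)
```

In this example the low-confidence "dog" box from the pre-trained sub-model overlaps the transferred sub-model's box almost entirely, so the combined result keeps "person" and "new_object". A voting or learned decision model, as the bagging approach in the paper suggests, would replace the confidence comparison inside `combine_detections`.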


Common Objects in Context (COCO) · Deep transfer learning · Ensemble modeling · Object detection · You Only Look Once (YOLO)



Copyright information

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Piyapong Huayhongthong (1)
  • Siriyakorn Rerk-u-suk (1, corresponding author)
  • Songwit Booddee (1)
  • Praisan Padungweang (1)
  • Kittipong Warasup (1)

  1. School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
