
Robust Pedestrian Detection: Faster Deployments with Fusion of Models

  • Conference paper
Pattern Recognition (ACPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12046)

Abstract

Pedestrian detection has a wide range of critical real-world applications, including security and the management of emergency scenarios. In such applications, both detection recall and precision are essential to ensure that all pedestrians are correctly detected. Developing and deploying vision-based object detection models is a time-consuming task, requiring long training and fine-tuning processes to achieve top performance. We propose an alternative approach based on a fusion of pre-trained, off-the-shelf, state-of-the-art object detection models, exploiting the divergences among base models to quickly deploy robust ensembles with improved performance. Our approach promotes model reuse and requires no additional learning algorithms, making it suitable for rapid deployments of critical systems. Experimental results on the PASCAL VOC07 test dataset reveal mean average precision (mAP) improvements over the base detection models, regardless of the set of models selected. Improvements in mAP were observed starting from a fusion of just two detection models and reached 3.53% for a fusion of four detection models, yielding an absolute fusion mAP of 83.65%. Moreover, the hyperparameters of our ensemble model may be adjusted to set an appropriate tradeoff between precision and recall, fitting the requirements of different applications.
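
To make the fusion idea concrete, the sketch below shows one way an ensemble could be assembled from the outputs of several pre-trained detectors: boxes from different models are greedily clustered by IoU, and a cluster is kept if it is corroborated by enough models or is sufficiently confident. The abstract does not specify the paper's exact fusion rule, so the `Detection` structure and the `iou_thresh`, `min_votes`, and `score_thresh` hyperparameters are illustrative assumptions rather than the authors' method; `min_votes` plays the role of the precision/recall knob mentioned above.

```python
# Minimal sketch of fusing detections from multiple pre-trained detectors.
# The clustering rule and hyperparameters are assumptions for illustration,
# not the method described in the paper.
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    box: tuple      # (x1, y1, x2, y2) in pixels
    score: float    # detector confidence in [0, 1]
    model_id: int   # which base detector produced this box


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def fuse_detections(dets: List[Detection],
                    iou_thresh: float = 0.5,
                    min_votes: int = 2,
                    score_thresh: float = 0.3) -> List[Detection]:
    """Greedily cluster boxes from different models; keep a cluster if it is
    corroborated by at least min_votes models or its top score is high enough.
    Raising min_votes favours precision; lowering it favours recall."""
    dets = sorted(dets, key=lambda d: d.score, reverse=True)
    used = [False] * len(dets)
    fused = []
    for i, d in enumerate(dets):
        if used[i]:
            continue
        cluster = [d]
        used[i] = True
        for j in range(i + 1, len(dets)):
            if not used[j] and iou(d.box, dets[j].box) >= iou_thresh:
                cluster.append(dets[j])
                used[j] = True
        votes = len({c.model_id for c in cluster})
        if votes >= min_votes or d.score >= score_thresh:
            fused.append(Detection(box=d.box, score=d.score, model_id=-1))
    return fused


# Example: two detectors agree on one pedestrian; a third low-score box is
# reported by only one model and is discarded.
dets = [
    Detection((100, 80, 160, 220), 0.91, model_id=0),
    Detection((102, 78, 158, 224), 0.84, model_id=1),
    Detection((400, 50, 430, 90), 0.12, model_id=1),
]
print(fuse_detections(dets))  # keeps only the corroborated pedestrian box
```

In this sketch, tightening `min_votes` suppresses boxes that only a single model reports (higher precision), while loosening it retains any box reported by any model (higher recall), mirroring the precision/recall tradeoff described in the abstract.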

Acknowledgments

This work was funded by the Science and Technology Development Fund of Macau SAR (File no. 138/2016/A3).

Author information

Corresponding author

Correspondence to Chan Tong Lam.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lam, C.T., Gaspar, J., Ke, W., Yang, X., Im, S.K. (2020). Robust Pedestrian Detection: Faster Deployments with Fusion of Models. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_11

  • DOI: https://doi.org/10.1007/978-3-030-41404-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41403-0

  • Online ISBN: 978-3-030-41404-7

  • eBook Packages: Computer Science, Computer Science (R0)
