Abstract
Pedestrian detection has a wide range of safety-critical real-world applications, including security and the management of emergency scenarios. In such applications, both detection recall and precision are essential to ensure that all pedestrians are correctly detected. Developing and deploying vision-based object detection models is time-consuming, requiring long training and fine-tuning processes to reach top performance. We propose an alternative approach based on the fusion of pre-trained, off-the-shelf state-of-the-art object detection models, exploiting the divergences between base models to quickly deploy robust ensembles with improved performance. Our approach promotes model reuse and requires no additional learning algorithms, making it suitable for the rapid deployment of critical systems. Experimental results on the PASCAL VOC07 test set show mean average precision (mAP) improvements over the base detection models, regardless of which models are selected. Improvements in mAP were observed starting from just two detection models and reached 3.53% for a fusion of four detection models, yielding an absolute fusion mAP of 83.65%. Moreover, the hyperparameters of our ensemble model can be adjusted to set an appropriate tradeoff between precision and recall, fitting applications with different precision and recall requirements.
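The fusion idea described above — pooling boxes from several pre-trained detectors and suppressing duplicates, with thresholds acting as the precision/recall knobs — can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the greedy IoU-based suppression, the `iou_thr` and `score_thr` parameters, and the `(boxes, scores)` input format are assumptions for illustration only.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_detections(model_outputs, iou_thr=0.5, score_thr=0.1):
    """Fuse detections from several base models without retraining.

    model_outputs: list of (boxes, scores) pairs, one per base model.
    Pools all boxes above score_thr, then greedily keeps the
    highest-scoring box and suppresses any pooled box overlapping
    a kept box by at least iou_thr.  Returns kept (box, score) pairs.
    """
    pooled = [(box, score)
              for boxes, scores in model_outputs
              for box, score in zip(boxes, scores)
              if score >= score_thr]
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in pooled:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept
```

Raising `score_thr` or lowering `iou_thr` trades recall for precision, mirroring the tunable tradeoff the abstract describes.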
Acknowledgments
This work was funded by the Science and Technology Development Fund of Macau SAR (File no. 138/2016/A3).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Lam, C.T., Gaspar, J., Ke, W., Yang, X., Im, S.K. (2020). Robust Pedestrian Detection: Faster Deployments with Fusion of Models. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science(), vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41403-0
Online ISBN: 978-3-030-41404-7