Abstract
Pedestrian detection has a wide range of safety-critical real-world applications, including security and the management of emergency scenarios. In such applications, both detection recall and precision are essential to ensure that all pedestrians are correctly detected. Developing and deploying vision-based object detection models is time-consuming, requiring long training and fine-tuning processes to reach top performance. We propose an alternative approach based on the fusion of pre-trained, off-the-shelf state-of-the-art object detection models, exploiting the divergences between base models to quickly deploy robust ensembles with improved performance. Our approach promotes model reuse and requires no additional learning algorithms, making it suitable for the rapid deployment of critical systems. Experimental results on the PASCAL VOC07 test set show mean average precision (mAP) improvements over the base detection models, regardless of which models are selected. Improvements in mAP were observed starting from just two detection models and reached 3.53% for a fusion of four detection models, yielding an absolute fusion mAP of 83.65%. Moreover, the hyperparameters of our ensemble model can be adjusted to set an appropriate tradeoff between precision and recall, fitting applications with different precision and recall requirements.
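The fusion idea described above — pooling boxes from several pre-trained detectors and suppressing duplicates, with thresholds acting as the precision/recall knobs — can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the greedy IoU-based suppression, the `iou_thr` and `score_thr` parameters, and the `(boxes, scores)` input format are assumptions for illustration only.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_detections(model_outputs, iou_thr=0.5, score_thr=0.1):
    """Fuse detections from several base models without retraining.

    model_outputs: list of (boxes, scores) pairs, one per base model.
    Pools all boxes above score_thr, then greedily keeps the
    highest-scoring box and suppresses any pooled box overlapping
    a kept box by at least iou_thr.  Returns kept (box, score) pairs.
    """
    pooled = [(box, score)
              for boxes, scores in model_outputs
              for box, score in zip(boxes, scores)
              if score >= score_thr]
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in pooled:
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept
```

Raising `score_thr` or lowering `iou_thr` trades recall for precision, mirroring the tunable tradeoff the abstract describes.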
Acknowledgments
This work was funded by the Science and Technology Development Fund of Macau SAR (File no. 138/2016/A3).
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Lam, C.T., Gaspar, J., Ke, W., Yang, X., Im, S.K. (2020). Robust Pedestrian Detection: Faster Deployments with Fusion of Models. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science(), vol 12046. Springer, Cham. https://doi.org/10.1007/978-3-030-41404-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41403-0
Online ISBN: 978-3-030-41404-7