Enabling More Accurate Bounding Boxes for Deep Learning-Based Real-Time Human Detection
While human detection is well established and widely used in many areas, its importance for behavioral analysis in medical research has rarely been reported. Recently, however, active efforts have been made to recognize behavior-related diseases by measuring gait variability through pattern analysis of human detection results from camera videos. For this purpose, robust human detection algorithms are crucial. In this work, we modified deep learning models by converting them from multi-detection to human detection. We also improved the localization of human detection by adjusting the input image according to the ratio of objects in the image and by refining the results of several bounding boxes through interpolation. Experimental results demonstrated that adopting these proposals significantly increased the accuracy of human detection.
Keywords: Human detection, Deep learning, Bounding box regression, Localization, Real-time analysis
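The abstract mentions refining the results of several bounding boxes by interpolation but does not give the procedure. As a rough illustration only, the sketch below blends all candidate boxes that overlap the highest-scoring box, weighting each by its confidence score. The confidence-weighted average, the function names `iou` and `interpolate_boxes`, and the 0.5 overlap threshold are all assumptions made for this example and should not be read as the authors' method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def interpolate_boxes(boxes: List[Box], scores: List[float],
                      iou_threshold: float = 0.5) -> Box:
    """Blend the boxes that overlap the top-scoring box (assumed scheme).

    Each coordinate of the result is the score-weighted mean of the
    corresponding coordinates of the top box and its overlapping neighbors.
    """
    if not boxes or len(boxes) != len(scores):
        raise ValueError("boxes and scores must be non-empty and equal in length")
    best = max(range(len(boxes)), key=lambda i: scores[i])
    # Keep only candidates that plausibly describe the same person.
    group = [i for i in range(len(boxes))
             if iou(boxes[best], boxes[i]) >= iou_threshold]
    total = sum(scores[i] for i in group)
    if total <= 0:
        return boxes[best]  # no usable weights; fall back to the top box
    x1, y1, x2, y2 = (
        sum(scores[i] * boxes[i][k] for i in group) / total for k in range(4)
    )
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    candidates = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0),
                  (100.0, 100.0, 110.0, 110.0)]
    confidences = [0.6, 0.4, 0.3]
    print(interpolate_boxes(candidates, confidences))
```

In this toy call the third candidate does not overlap the top-scoring box, so it is ignored, and the result lies between the first two boxes, closer to the one with the higher score. Unlike plain non-maximum suppression, which keeps one box and discards the rest, this kind of blending lets lower-scoring but overlapping detections nudge the final box position.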
This work was supported by the Brain Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2016M3C7A1905477, NRF-2014M3C7A1046050) and the Basic Science Research Program through the NRF funded by the Ministry of Education (NRF-2017R1D1A1B03036423). This study was approved by the Institutional Review Board of Gwangju Institute of Science and Technology (IRB no. 20180629-HR-36-07-04). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.