Abstract
Currently, state-of-the-art object detectors are based on Faster R-CNN. We first revisit Faster R-CNN and examine its problems, e.g., feature maps that are too coarse for accurate localization, fixed-window feature extraction in the RPN, and insensitivity to small-scale objects. We then propose a novel object detection network to address these problems. Specifically, we use a two-stage cascade multi-scale proposal generation network to obtain highly accurate proposals: an original RPN first generates coarse proposals, and a second network with multi-layer features and an RoI pooling layer then refines them. The second stage simultaneously generates small-scale proposals. After that, a detection network with multi-layer features further classifies and refines the proposals. A novel 3-step joint training algorithm is introduced to optimize our model. Experiments on PASCAL VOC 2007 and 2012 demonstrate the effectiveness of our network.
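The coarse-then-refine idea can be illustrated with a minimal sketch. It assumes the standard R-CNN box parameterization (center offsets scaled by box size, log-scale width and height changes) and uses hand-written offsets in place of network outputs. The function names `apply_deltas` and `cascade_refine` are hypothetical and are not taken from the paper; the multi-layer features, RoI pooling, and training procedure are not modeled here.

```python
import numpy as np


def apply_deltas(boxes, deltas):
    """Shift and rescale (x1, y1, x2, y2) boxes by (dx, dy, dw, dh) offsets.

    Assumption: the usual R-CNN parameterization, where dx and dy are
    fractions of the box width and height, and dw and dh are log-scale factors.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)

    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    center_x = boxes[:, 0] + 0.5 * widths
    center_y = boxes[:, 1] + 0.5 * heights

    new_center_x = center_x + deltas[:, 0] * widths
    new_center_y = center_y + deltas[:, 1] * heights
    new_widths = widths * np.exp(deltas[:, 2])
    new_heights = heights * np.exp(deltas[:, 3])

    return np.stack(
        [
            new_center_x - 0.5 * new_widths,
            new_center_y - 0.5 * new_heights,
            new_center_x + 0.5 * new_widths,
            new_center_y + 0.5 * new_heights,
        ],
        axis=1,
    )


def cascade_refine(anchors, stage_deltas):
    """Apply one set of offsets per stage, feeding each stage's boxes to the next."""
    boxes = np.asarray(anchors, dtype=np.float64)
    for deltas in stage_deltas:
        boxes = apply_deltas(boxes, deltas)
    return boxes


# Hypothetical offsets standing in for the outputs of the two proposal stages.
anchors = [[0.0, 0.0, 10.0, 10.0]]
stage1 = [[0.1, 0.0, 0.0, 0.0]]  # coarse stage: shift right by 10% of the width
stage2 = [[0.0, 0.2, 0.0, 0.0]]  # refinement stage: shift down by 20% of the height
refined = cascade_refine(anchors, [stage1, stage2])
```

In this sketch the second stage regresses relative to the boxes produced by the first stage rather than to the original anchors, which is what distinguishes a cascade from a single regression step.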
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Han, G., Zhang, X., Li, C. (2017). Revisiting Faster R-CNN: A Deeper Look at Region Proposal Network. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_2
DOI: https://doi.org/10.1007/978-3-319-70090-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8
eBook Packages: Computer Science, Computer Science (R0)