StairsNet: Mixed Multi-scale Network for Object Detection

Gao, Weiyi; Cao, Wenlong; Zhai, Jian; Rui, Jianwu

doi:10.1007/978-3-319-77380-3_29

StairsNet: Mixed Multi-scale Network for Object Detection

Weiyi Gao^19,20,
Wenlong Cao^19,20,
Jian Zhai²⁰ &
…
Jianwu Rui²⁰

Conference paper
First Online: 10 May 2018

2770 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10735))

Abstract

It is common to choose image classification network as backbone in the object detector. The art-of-the-state image classification network exhibits excellent performance on image classification, but that network hurts the detection efficiency, mainly due to the coarseness of features from several convolution and pooling layers. In this paper, we present a single deep neural network with inceptions, called StairsNet, to take advantage of the art-of-the-state image classification network in object detection. In contrast to previous single network SSD [13] which uses VGG-16 as a feature to extract network, our approach applies recently state-of-the-art classification network Residual Network (ResNets [5]). Meanwhile, to avoid coarseness of the last CNN feature, StairsNet not only utilizes various of scale features, but also mixes different scale features to predict. To this end, we insert two stairs-like architectures into the network: top stairway network that mixes multi-scale feature maps as input to predict bounding boxes and bottom stairway network that turns into two different scale feature branches. Our StairsNet significantly increases the PASCAL-style mean Average Precision (mAP) from 75.0% (SSD + ResNet-101) to 77.7%. Code is available at https://github.com/gwyve/caffe/tree/StairsNet.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 119.00; Price excludes VAT (USA)

Softcover Book: USD 155.00; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2189–202 (2012)
Article Google Scholar
Carreira, J., Sminchisescu, C.: CPMC: automatic object segmentation using constrained parametric min-cuts. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1312–1328 (2012)
Article Google Scholar
Erhan, D., Szegedy, C., Toshev, A., Anguelov, D.: Scalable object detection using deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147–2154 (2014)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. science 313(5786), 504–507 (2006)
Article MathSciNet Google Scholar
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015) arXiv preprint arXiv:1502.03167
Kong, T., Yao, A., Chen, Y., Sun, F.: HyperNet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 845–853 (2016)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)
Google Scholar
Li, Y., He, K., Sun, J., et al.: R-FCN: Object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)
Google Scholar
Lin, M., Chen, Q., Yan, S.: Network in network (2013). arXiv preprint arXiv:1312.4400
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature Pyramid Networks for Object Detection (2016)
Google Scholar
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Pont-Tuset, J., Arbelaez, P., Barron, J.T., Marques, F., Malik, J.: Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Trans. Pattern Anal. Mach. Intell. 39(1), 128–140 (2017)
Article Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Redmon, J., Farhadi, A.: Yolo9000: Better, Faster, Stronger (2016). arXiv preprint arXiv:1612.08242
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp. 91–99 (2015)
Google Scholar
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Article MathSciNet Google Scholar
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: Integrated recognition, localization and detection using convolutional networks (2013). arXiv preprint arXiv:1312.6229
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, Inception-Resnet and the impact of residual connections on learning (2016). arXiv preprint arXiv:1602.07261
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Google Scholar
Szegedy, C., Reed, S., Erhan, D., Anguelov, D., Ioffe, S.: Scalable, high-quality object detection. Computer Science (2014)
Google Scholar
Szegedy, C., Toshev, A., Erhan, D.: Deep neural networks for object detection. In: Advances in Neural Information Processing Systems, pp. 2553–2561 (2013)
Google Scholar
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Google Scholar
Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vision 104(2), 154–171 (2013)
Article Google Scholar
Zitnick, C.L., Dollár, P.: Edge Boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26
Chapter Google Scholar

Download references

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61432001, the National Science and Technology Major Project under Grant No. 2014ZX01029101-002, the Youth Innovation Promotion Association CAS under Grant No. 2016105, and the National High-tech R&D Program (863 Program) under Grant No. 2013AA01A603.

Author information

Authors and Affiliations

University of Chinese Academy of Sciences, Beijing, China
Weiyi Gao & Wenlong Cao
Institute of Software, Chinese Academy of Sciences, National Engineering Research Center of Fundamental Software, Beijing, China
Weiyi Gao, Wenlong Cao, Jian Zhai & Jianwu Rui

Authors

Weiyi Gao
View author publications
You can also search for this author in PubMed Google Scholar
Wenlong Cao
View author publications
You can also search for this author in PubMed Google Scholar
Jian Zhai
View author publications
You can also search for this author in PubMed Google Scholar
Jianwu Rui
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Weiyi Gao .

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Bing Zeng
University of Chinese Academy of Sciences, Beijing, China
Qingming Huang
University of Ottawa, Ottawa, Ontario, Canada
Abdulmotaleb El Saddik
University of Electronic Science and Technology of China, Chengdu, China
Hongliang Li
Chinese Academy of Sciences, Beijing, China
Shuqiang Jiang
Harbin Institute of Technology, Harbin, China
Xiaopeng Fan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gao, W., Cao, W., Zhai, J., Rui, J. (2018). StairsNet: Mixed Multi-scale Network for Object Detection. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science(), vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_29

Download citation

DOI: https://doi.org/10.1007/978-3-319-77380-3_29
Published: 10 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77379-7
Online ISBN: 978-3-319-77380-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics