Abstract
We propose a novel approach for class-agnostic object proposal generation that is efficient and especially well-suited to detecting small objects. Efficiency is achieved through scale-specific objectness attention maps, which focus processing on promising parts of the image and strongly reduce the number of sampled windows. The resulting system is \(33\%\) faster than the state-of-the-art while clearly outperforming it in terms of average recall. Additionally, we add a module for detecting small objects, which are often missed by recent models. We show that this module improves the average recall for small objects by about \(53\%\). Our implementation is available at: https://www.inf.uni-hamburg.de/en/inst/ab/cv/people/wilms/attentionmask.
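The efficiency gain described above comes from sampling windows only where an attention map signals likely objects, instead of densely over the whole image. The following is a minimal, purely illustrative sketch of that idea (not the authors' implementation); the grid layout, `stride`, and `threshold` values are assumptions for the example.

```python
import numpy as np

def sample_windows(attention_map, stride, window_size, threshold=0.5):
    """Keep only windows whose grid cell has high objectness attention.

    attention_map: 2D array with values in [0, 1], one score per grid cell
    (in the paper's pipeline, one such map would exist per scale).
    """
    windows = []
    h, w = attention_map.shape
    for y in range(h):
        for x in range(w):
            if attention_map[y, x] >= threshold:
                # Map the grid cell back to image coordinates
                # (hypothetical layout: cell (y, x) -> pixel (y*stride, x*stride)).
                cx, cy = x * stride, y * stride
                windows.append((cx, cy, window_size, window_size))
    return windows

# Toy example: a 4x4 attention map with only two promising cells.
att = np.zeros((4, 4))
att[1, 2] = 0.9
att[3, 0] = 0.7
wins = sample_windows(att, stride=16, window_size=64)
print(wins)  # 2 windows instead of 16 -> far fewer windows to score
```

Dense sampling would score all 16 cells here; the attention threshold cuts that to 2, which is the kind of reduction that makes the overall system faster.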
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wilms, C., Frintrop, S. (2019). AttentionMask: Attentive, Efficient Object Proposal Generation Focusing on Small Objects. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_43
DOI: https://doi.org/10.1007/978-3-030-20890-5_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20889-9
Online ISBN: 978-3-030-20890-5