Category Aggregation Among Region Proposals for Object Detection

  • Conference paper
Advances in Multimedia Information Processing - PCM 2016 (PCM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)

Abstract

Recently, the overwhelming majority of object detection methods have focused on reducing the number of region proposals while keeping object recall high, without considering category information. This can lead to many false positives caused by interference between categories, especially when the number of categories is very large. To eliminate such interference, we propose a novel category aggregation approach based on our observation that categories detected more frequently around an object have a higher probability of being present in the image. By further exploiting the co-occurrence relationships between categories, we can determine the most probable categories for an image in advance. Thus, many false positives can be filtered out before the subsequent classification process. Our extensive experiments on the well-known ILSVRC 2015 detection dataset show that our approach achieves 49.0% mAP on the validation set and 45.36% mAP on the test set, ranking 5th in the ILSVRC 2015 detection task.
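The abstract does not spell out the aggregation rule, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes each region proposal carries a single top-scoring category label, counts how often each category is detected, expands the most frequent ones through a co-occurrence table, and then filters detections to that category set. The function names, thresholds, and the co-occurrence table are all hypothetical.

```python
from collections import Counter


def aggregate_categories(proposals, cooccurrence,
                         score_thresh=0.5, top_k=2, co_thresh=0.3):
    """Pick the likely categories for one image (hypothetical sketch).

    proposals:    list of (category, score) pairs, one per region proposal.
    cooccurrence: dict mapping a category to {other_category: probability},
                  an assumed table estimated from training annotations.
    """
    # Count confident detections per category across all proposals.
    votes = Counter(cat for cat, score in proposals if score >= score_thresh)

    # Keep the most frequently detected categories.
    selected = {cat for cat, _ in votes.most_common(top_k)}

    # Add categories that co-occur strongly with any selected category.
    expanded = set(selected)
    for cat in selected:
        for other, prob in cooccurrence.get(cat, {}).items():
            if prob >= co_thresh:
                expanded.add(other)
    return expanded


def filter_detections(detections, allowed):
    """Drop detections whose category is not in the allowed set."""
    return [det for det in detections if det[0] in allowed]


# Toy data: the values below are made up for illustration only.
proposals = [("dog", 0.9), ("dog", 0.8), ("person", 0.7), ("dog", 0.6),
             ("sofa", 0.55), ("person", 0.9), ("airplane", 0.6), ("cat", 0.2)]
cooccurrence = {"dog": {"person": 0.6, "sofa": 0.4},
                "person": {"bicycle": 0.1}}

allowed = aggregate_categories(proposals, cooccurrence)
kept = filter_detections([("dog", 0.9), ("airplane", 0.6), ("sofa", 0.55)],
                         allowed)
```

In this toy run the isolated "airplane" detection is discarded because it is neither frequent among the proposals nor strongly co-occurring with the frequent categories, which is the kind of false positive the abstract describes filtering out before classification.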

This work was supported by the 863 Project (2014AA015202), the National Natural Science Foundation of China (61572472), the Beijing Natural Science Foundation (4152050) and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016009).

Author information

Corresponding author

Correspondence to Sheng Tang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Li, L., Tang, S., Zhou, J., Wang, B., Tian, Q. (2016). Category Aggregation Among Region Proposals for Object Detection. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_21

  • DOI: https://doi.org/10.1007/978-3-319-48896-7_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48895-0

  • Online ISBN: 978-3-319-48896-7

  • eBook Packages: Computer Science, Computer Science (R0)