Learning to Generate Object Segment Proposals with Multi-modal Cues

  • Haoyang Zhang
  • Xuming He
  • Fatih Porikli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10111)

Abstract

This paper presents a learning-based method for generating object segmentation proposals from stereo images. Unlike existing methods, which mostly rely on low-level appearance cues and handcrafted similarity functions to group segments, our method represents each region with learned deep features and designed geometric features, and uses a learned similarity network to guide the grouping process. Given an initial segmentation hierarchy, we sequentially merge adjacent regions at each level according to their affinity as measured by the similarity network. This merging process generates new segmentation hierarchies, from which we build a pool of region proposals by taking region singletons, pairs, triplets and 4-tuples. In addition, we learn a ranking network that predicts the objectness score of each region proposal, and we diversify the ranking using a Maximum Marginal Relevance measure. Experiments on the Cityscapes dataset show that our approach performs significantly better than the baseline and the current state-of-the-art.
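
The diversification step mentioned in the abstract can be illustrated with a generic Maximum Marginal Relevance (MMR) re-ranking. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the objectness scores are given as plain numbers rather than produced by a ranking network, redundancy between two proposals is measured by mask intersection-over-union, and the names `iou`, `mmr_rank` and the `trade_off` weight are hypothetical choices made for this example.

```python
from typing import FrozenSet, List, Sequence


def iou(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Intersection-over-union of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mmr_rank(masks: Sequence[FrozenSet[int]],
             scores: Sequence[float],
             trade_off: float = 0.5) -> List[int]:
    """Return proposal indices in greedy MMR order.

    trade_off is an assumed parameter: 1.0 ranks purely by score, while
    lower values penalise overlap with already selected proposals more.
    """
    remaining = list(range(len(masks)))
    selected: List[int] = []
    while remaining:
        def marginal(i: int) -> float:
            # Redundancy is the largest overlap with anything already chosen.
            redundancy = max((iou(masks[i], masks[j]) for j in selected),
                             default=0.0)
            return trade_off * scores[i] - (1.0 - trade_off) * redundancy

        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: proposals 0 and 1 overlap heavily, proposal 2 is disjoint.
masks = [frozenset({0, 1, 2, 3}), frozenset({0, 1, 2, 4}), frozenset({10, 11})]
scores = [0.9, 0.85, 0.5]
order = mmr_rank(masks, scores, trade_off=0.5)
print(order)
```

In this toy example the second-highest-scoring proposal is pushed below the disjoint one because it largely duplicates the top-ranked mask; setting `trade_off=1.0` recovers the plain score ordering.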

Notes

Acknowledgment

Data61 is part of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the federal government agency for scientific research in Australia. The Tesla K40 used for this research was donated by the NVIDIA Corporation.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. The Australian National University, Canberra, Australia
  2. Data61, CSIRO, Canberra, Australia
