Semi-convolutional Operators for Instance Segmentation

  • David Novotny
  • Samuel Albanie
  • Diane Larlus
  • Andrea Vedaldi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

Object detection and instance segmentation are dominated by region-based methods such as Mask RCNN. However, there is a growing interest in reducing these problems to pixel labeling tasks, as the latter could be more efficient, could be integrated seamlessly in image-to-image network architectures as used in many other tasks, and could be more accurate for objects that are not well approximated by bounding boxes. In this paper we show theoretically and empirically that constructing dense pixel embeddings that can separate object instances cannot be easily achieved using convolutional operators. At the same time, we show that simple modifications, which we call semi-convolutional, have a much better chance of succeeding at this task. We use the latter to show a connection to Hough voting as well as to a variant of the bilateral kernel that is spatially steered by a convolutional network. We demonstrate that these operators can also be used to improve approaches such as Mask RCNN, demonstrating better segmentation of complex biological shapes and PASCAL VOC categories than achievable by Mask RCNN alone.
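The argument in the abstract can be illustrated with a toy sketch. The abstract does not spell out the operator, so the code below assumes the simplest additive reading: a convolutional output Φ_u is added to the pixel location u to give Ψ_u = Φ_u + u. The offset field is hand-constructed as a stand-in for a network output, and all function names are hypothetical. Two identical squares receive identical convolutional outputs, so those outputs alone cannot tell the instances apart, whereas adding the pixel coordinates sends every pixel to its own instance centroid, much like a Hough vote.

```python
def square_pixels(top, left, size):
    """(row, col) coordinates of a size x size square, in row-major order."""
    return [(r, c) for r in range(top, top + size) for c in range(left, left + size)]


def centroid(pixels):
    """Mean (row, col) of a list of pixel coordinates."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def offsets_to_centroid(pixels):
    """Hand-built stand-in for a convolutional output Phi_u (an assumption):
    the offset from each pixel to the centroid of its own instance. It depends
    only on the position of the pixel within the object, not in the image."""
    cr, cc = centroid(pixels)
    return [(cr - r, cc - c) for r, c in pixels]


def semi_convolutional(pixels, phi):
    """Assumed additive form Psi_u = Phi_u + u: add the pixel location back."""
    return [(r + dr, c + dc) for (r, c), (dr, dc) in zip(pixels, phi)]


# Two identical 5x5 squares at different image locations.
instance_a = square_pixels(top=2, left=2, size=5)
instance_b = square_pixels(top=9, left=8, size=5)

phi_a = offsets_to_centroid(instance_a)
phi_b = offsets_to_centroid(instance_b)

psi_a = semi_convolutional(instance_a, phi_a)
psi_b = semi_convolutional(instance_b, phi_b)

# Translation-equivariant outputs coincide for the two copies...
print(phi_a == phi_b)
# ...while the semi-convolutional embeddings collapse to one point per instance.
print(set(psi_a), set(psi_b))
```

In this sketch, clustering the values of Ψ recovers the two instances, while clustering the values of Φ would merge them, because corresponding pixels of the two copies carry the same Φ.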

Keywords

Instance embedding, Object detection, Instance segmentation, Coloring, Semi-convolutional

Notes

Acknowledgments

We gratefully acknowledge the support of Naver, EPSRC AIMS CDT, AWS ML Research Award, and ERC 677195-IDIU.

Supplementary material

Supplementary material 1: 474172_1_En_6_MOESM1_ESM.pdf (PDF, 212 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, UK
  2. Computer Vision Group, NAVER LABS Europe, Meylan, France
