Simultaneous Semantic Segmentation and Outlier Detection in Presence of Domain Shift
Abstract
Recent success on realistic road driving datasets has increased interest in robust performance in real-world applications. One of the major unsolved problems is to identify image content which cannot be reliably recognized by a given inference engine. We therefore study approaches to recover a dense outlier map alongside the primary task with a single forward pass, by relying on shared convolutional features. We consider semantic segmentation as the primary task and perform extensive validation on WildDash val (inliers), LSUN val (outliers), and pasted objects from Pascal VOC 2007 (outliers). We achieve the best validation performance by training to discriminate inliers from pasted ImageNet-1k content, even though ImageNet-1k contains many road-driving pixels and, at least nominally, fails to account for the full diversity of the visual world. The proposed two-head model performs comparably to the C-way multi-class model trained to predict a uniform distribution in outliers, while outperforming several other validated approaches. We evaluate our best two models on the WildDash test dataset and set a new state of the art on the WildDash benchmark.
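The two training setups contrasted above can be illustrated per pixel with a minimal sketch. It is not the authors' implementation, and every function name is hypothetical. It assumes that the C-way model is penalized in outlier pixels by the cross-entropy between a uniform target and its softmax output, that its outlier score is one minus the maximum softmax probability, and that the two-head model turns a single extra logit into an outlier probability with a sigmoid. None of these details is stated in the abstract.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one pixel's C class logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def uniform_target_loss(logits):
    """Cross-entropy between the uniform distribution and softmax(logits).

    Assumed loss for outlier pixels in the C-way model; it is minimal
    (equal to log C) when the prediction is exactly uniform.
    """
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(logits)


def max_softmax_outlier_score(logits):
    """Assumed outlier score for the C-way model: 1 - max softmax probability."""
    return 1.0 - math.exp(max(log_softmax(logits)))


def two_head_outlier_probability(outlier_logit):
    """Assumed second head: sigmoid of one logit computed from shared features."""
    if outlier_logit >= 0:
        return 1.0 / (1.0 + math.exp(-outlier_logit))
    e = math.exp(outlier_logit)
    return e / (1.0 + e)


if __name__ == "__main__":
    confident = [10.0, 0.0, 0.0]   # peaked prediction, typical of an inlier
    flat = [0.0, 0.0, 0.0]         # uniform prediction, the outlier target
    print(uniform_target_loss(confident), uniform_target_loss(flat))
    print(max_softmax_outlier_score(confident), max_softmax_outlier_score(flat))
    print(two_head_outlier_probability(2.0))
```

In this sketch both variants produce the outlier score from the same per-pixel features as the segmentation logits, which is what allows a dense outlier map to come out of the single forward pass.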