Labeled datasets for semantic segmentation are imperfect, especially in medical imaging, where borders are often subtle or ill-defined. Little work has been done to analyze the effect that label errors have on the performance of segmentation methodologies. Here we present a large-scale study of model performance in the presence of varying types and degrees of error in training data. We trained U-Net, SegNet, and FCN32 several times for liver segmentation with 10 different modes of ground-truth perturbation. Our results show that, for each architecture, performance steadily declines with boundary-localized errors; however, U-Net was significantly more robust to jagged boundary errors than the other architectures. We also found that each architecture was very robust to non-boundary-localized errors, suggesting that boundary-localized errors are a fundamentally different and more challenging problem than random label errors in a classification setting.
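To make the distinction between boundary-localized and non-boundary-localized label errors concrete, the following is a minimal sketch on a toy binary mask. It is not one of the perturbation modes used in the study: the function names (`make_square_mask`, `boundary_pixels`, `perturb_boundary`, `perturb_anywhere`), the 4-connected definition of a boundary pixel, and the square test mask are all assumptions chosen for illustration.

```python
import random


def make_square_mask(size=16, lo=4, hi=12):
    """Toy binary mask: 1 inside the square [lo, hi) x [lo, hi), 0 elsewhere."""
    return [[1 if lo <= r < hi and lo <= c < hi else 0 for c in range(size)]
            for r in range(size)]


def boundary_pixels(mask):
    """Pixels with at least one 4-connected neighbor of the opposite label."""
    h, w = len(mask), len(mask[0])
    found = []
    for r in range(h):
        for c in range(w):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] != mask[r][c]:
                    found.append((r, c))
                    break
    return found


def flip(mask, pixels):
    """Return a copy of the mask with the given pixels' labels inverted."""
    new = [row[:] for row in mask]
    for r, c in pixels:
        new[r][c] = 1 - new[r][c]
    return new


def perturb_boundary(mask, n_flips, rng):
    """Boundary-localized error: flip n_flips pixels drawn from the boundary."""
    return flip(mask, rng.sample(boundary_pixels(mask), n_flips))


def perturb_anywhere(mask, n_flips, rng):
    """Non-localized error: flip n_flips pixels drawn uniformly from the image."""
    everywhere = [(r, c) for r in range(len(mask)) for c in range(len(mask[0]))]
    return flip(mask, rng.sample(everywhere, n_flips))


def dice(a, b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 2 * inter / total if total else 1.0


rng = random.Random(0)
mask = make_square_mask()
near = perturb_boundary(mask, 20, rng)
far = perturb_anywhere(mask, 20, rng)
print("Dice vs. clean mask, boundary flips:", round(dice(mask, near), 3))
print("Dice vs. clean mask, uniform flips: ", round(dice(mask, far), 3))
```

Both perturbations change the same number of pixels, so any difference in how a model trained on them behaves would come from where the errors sit, not from how many there are. That is the contrast the abstract draws; the sketch only generates the two kinds of corrupted labels and does not train anything.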



Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA225435. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Nicholas Heller (1)
  • Joshua Dean (1)
  • Nikolaos Papanikolopoulos (1)

  1. Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, USA
