Ordinal Regression with Neuron Stick-Breaking for Medical Diagnosis

  • Xiaofeng Liu
  • Yang Zou
  • Yuhang Song
  • Chao Yang
  • Jane You
  • B. V. K. Vijaya Kumar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11134)


Classification for medical diagnosis usually involves inherently ordered labels that correspond to levels of health risk. Previous multi-task classifiers for ordinal data often use several binary classification branches to compute a series of cumulative probabilities. However, these cumulative probabilities are not guaranteed to be monotonically decreasing, and the multi-branch design introduces a large number of hyper-parameters that must be fine-tuned manually. This paper aims to eliminate, or at least largely reduce, the effects of those problems. We propose a simple yet efficient way to rephrase the output layer of a conventional deep neural network. We show that our methods achieve state-of-the-art accuracy on the Diabetic Retinopathy dataset and the Ultrasound Breast dataset at very little additional cost.
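To make the monotonicity point concrete, the following is a minimal sketch of a generic stick-breaking output layer. The exact formulation used in the paper is not given in this excerpt, so the function names (`stick_breaking_probs`, `survival_probs`) and the sigmoid parameterization are assumptions chosen for illustration. Each of the K-1 network outputs decides what fraction of the remaining probability mass goes to the current class, so the class probabilities sum to one and the cumulative probabilities P(y > k) cannot increase with k.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stick_breaking_probs(logits):
    """Map K-1 real-valued outputs to K class probabilities.

    Illustrative assumption: sigmoid(logit) is the fraction broken off
    the remaining "stick" and assigned to the current class.
    """
    probs = []
    remaining = 1.0
    for eta in logits:
        frac = sigmoid(eta)
        probs.append(remaining * frac)
        remaining *= 1.0 - frac
    probs.append(remaining)  # the last class receives whatever is left
    return probs


def survival_probs(probs):
    """Return P(y > k) for k = 0..K-2 from the class probabilities."""
    out = []
    tail = 1.0
    for p in probs[:-1]:
        tail -= p
        out.append(tail)
    return out


# Three outputs give four ordered classes.
probs = stick_breaking_probs([0.0, 0.0, 0.0])
tails = survival_probs(probs)
print(probs)  # [0.5, 0.25, 0.125, 0.125]
print(tails)  # [0.5, 0.25, 0.125]
```

Because every class probability is non-negative, each value in `tails` is at most the one before it regardless of the logits, which is the property that independent binary branches do not guarantee.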


Medical diagnosis · Ordinal regression · Deep neural network · Stick-breaking · Unimodal label smoothing



This work was supported in part by the National Natural Science Foundation 61308099, 61304032 and 61675202, Hong Kong Government General Research Fund GRF 152202/14E, PolyU Central Research Grant G-YBJW, Youth Innovation Promotion Association, CAS (2017264), Innovative Foundation of CIOMP, CAS (Y586320150), 11ZDGG001, CXJJ-16S038, CXJJ-17S017.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Fanhan Information Tech, Suzhou, China
  3. University of Southern California, Los Angeles, USA
  4. The Hong Kong Polytechnic University, Kowloon, Hong Kong
  5. Carnegie Mellon University Africa, Kigali, Rwanda
