Temporal Consistency Objectives Regularize the Learning of Disentangled Representations

  • Gabriele ValvanoEmail author
  • Agisilaos Chartsias
  • Andrea Leo
  • Sotirios A. Tsaftaris
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11795)


There has been an increasing focus in learning interpretable feature representations, particularly in applications such as medical image analysis that require explainability, whilst relying less on annotated data (since annotations can be tedious and costly). Here we build on recent innovations in style-content representations to learn anatomy, imaging characteristics (appearance) and temporal correlations. By introducing a self-supervised objective of predicting future cardiac phases we improve disentanglement. We propose a temporal transformer architecture that given an image conditioned on phase difference, it predicts a future frame. This forces the anatomical decomposition to be consistent with the temporal cardiac contraction in cine MRI and to have semantic meaning with less need for annotations. We demonstrate that using this regularization, we achieve competitive results and improve semi-supervised segmentation, especially when very few labelled data are available. Specifically, we show Dice increase of up to 19% and 7% compared to supervised and semi-supervised approaches respectively on the ACDC dataset. Code is available at:


Disentangled representations Semi-supervised learning Cardiac segmentation 



This work was supported by the Erasmus+ programme of the European Union, during an exchange between IMT School for Advanced Studies Lucca and the School of Engineering, University of Edinburgh. S.A. Tsaftaris acknowledges the support of the Royal Academy of Engineering and the Research Chairs and Senior Research Fellowships scheme. We thank NVIDIA Corporation for donating the Titan Xp GPU used for this research.

Supplementary material

490967_1_En_2_MOESM1_ESM.pdf (690 kb)
Supplementary material 1 (pdf 689 KB)


  1. 1.
    Bai, W., et al.: Recurrent neural networks for aortic image sequence segmentation with sparse annotations. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 586–594. Springer, Cham (2018). Scholar
  2. 2.
    Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE PAMI 35(8), 1798–1828 (2013)CrossRefGoogle Scholar
  3. 3.
    Bengio, Y., et al.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)MathSciNetCrossRefGoogle Scholar
  4. 4.
    Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE TMI 37(11), 2514–2525 (2018)Google Scholar
  5. 5.
    Chartsias, A., et al.: Disentangled representation learning in cardiac image analysis. Med. Image Anal. 58, 101535 (2019)CrossRefGoogle Scholar
  6. 6.
    Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In: NeurIPS, pp. 2172–2180 (2016)Google Scholar
  7. 7.
    Hsieh, J.T., Liu, B., Huang, D.A., Fei-Fei, L.F., Niebles, J.C.: Learning to decompose and disentangle representations for video prediction. In: NeurIPS, pp. 517–526 (2018)Google Scholar
  8. 8.
    Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)Google Scholar
  9. 9.
    Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: ICLR (2014)Google Scholar
  10. 10.
    Lee, H.Y., Tseng, H.Y., Huang, J.B., Singh, M., Yang, M.H.: Diverse image-to-image translation via disentangled representations. In: ECCV, pp. 35–51 (2018)Google Scholar
  11. 11.
    Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., Smolley, S.P.: On the effectiveness of least squares generative adversarial networks. IEEE PAMI PP(99), 1–13 (2018)CrossRefGoogle Scholar
  12. 12.
    Qin, C., et al.: Joint Learning of motion estimation and segmentation for cardiac MR image sequences. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp. 472–480. Springer, Cham (2018). Scholar
  13. 13.
    Qin, C., Shi, B., Liao, R., Mansi, T., Rueckert, D., Kamen, A.: Unsupervised deformable registration for multi-modal images via disentangled representations. arXiv preprint arXiv:1903.09331 (2019)
  14. 14.
    Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). Scholar
  15. 15.
    Smith, L.N.: Cyclical learning rates for training neural networks. In: 2017 IEEE WACV, pp. 464–472. IEEE (2017)Google Scholar
  16. 16.
    Van Steenkiste, S., Locatello, F., Schmidhuber, J., Bachem, O.: Are disentangled representations helpful for abstract visual reasoning? arXiv preprint arXiv:1905.12506 (2019)
  17. 17.
    Wood, J.N.: A smoothness constraint on the development of object recognition. Cognition 153, 140–145 (2016)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Gabriele Valvano
    • 1
    • 2
    Email author
  • Agisilaos Chartsias
    • 2
  • Andrea Leo
    • 1
  • Sotirios A. Tsaftaris
    • 2
  1. 1.IMT School for Advanced Studies LuccaLuccaItaly
  2. 2.School of EngineeringUniversity of EdinburghEdinburghUK

Personalised recommendations