Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

  • Jonas Wulff
  • Michael J. Black
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11269)


The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues: a network is first trained on a task for which annotation is easier or which can be learned without supervision, and is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN for temporal interpolation without supervision. Such a network implicitly estimates motion, but cannot handle untextured regions. By fine-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. With this unsupervised pre-training, our network outperforms similar architectures that were trained with supervision on synthetic optical flow.
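
To illustrate why temporal interpolation needs no labels, the following minimal Python sketch (not the authors' implementation) builds training triplets from consecutive frames: the two outer frames are the input and the middle frame is the target. The names `make_triplets`, `l1_loss`, and `average_baseline`, the flattened list-of-floats frame format, and the L1 loss are all assumptions made for this example; the averaging function merely stands in for a CNN.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a flattened grayscale image stored as a list of floats.
# A real pipeline would use image tensors instead.
Frame = List[float]


def make_triplets(frames: Sequence[Frame], gap: int = 1) -> List[Tuple[Frame, Frame, Frame]]:
    """Return (previous, next, target) triplets; the middle frame is the target."""
    triplets = []
    for t in range(gap, len(frames) - gap):
        triplets.append((frames[t - gap], frames[t + gap], frames[t]))
    return triplets


def l1_loss(prediction: Frame, target: Frame) -> float:
    """Mean absolute difference between a predicted and an observed frame."""
    return sum(abs(p - q) for p, q in zip(prediction, target)) / len(target)


def average_baseline(prev_frame: Frame, next_frame: Frame) -> Frame:
    """Hypothetical stand-in for the CNN: blend the two neighbouring frames."""
    return [(a + b) / 2.0 for a, b in zip(prev_frame, next_frame)]


# Toy "video": a bright pixel moving one position to the right per frame.
video = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

triplets = make_triplets(video)
losses = [l1_loss(average_baseline(p, n), target) for p, n, target in triplets]
print(losses)  # [0.5, 0.5]
```

On this toy sequence the blending baseline scores a mean absolute error of 0.5 on both triplets, because averaging the neighbours leaves two half-bright pixels instead of moving the bright one. Reducing that error requires reasoning about motion, which is what makes interpolation a plausible proxy task for optical flow.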


Acknowledgements and Disclosure

JW was supported by the Max Planck ETH Center for Learning Systems. MJB has received research funding from Intel, Nvidia, Adobe, Facebook, and Amazon. While MJB is a part-time employee of Amazon, this research was performed solely at, and funded solely by, MPI.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Max-Planck Institute for Intelligent Systems, Tübingen, Germany
  2. MIT CSAIL, Cambridge, USA
