Machine learning subsurface flow equations from data

  • Haibin Chang
  • Dongxiao Zhang (corresponding author)
Open Access
Original Paper


Governing equations of physical problems are traditionally derived from conservation laws or physical principles. However, some complex problems still exist for which these first-principle derivations cannot be implemented. As data acquisition and storage capabilities have increased, data-driven methods have attracted great attention. In recent years, several works have addressed how to learn dynamical systems and partial differential equations using data-driven methods. Along this line, in this work, we investigate how to discover subsurface flow equations from data via a machine learning technique, the least absolute shrinkage and selection operator (LASSO). Learning of the single-phase groundwater flow equation and the contaminant transport equation is demonstrated. Considering that the parameters of subsurface formations are usually heterogeneous, we propose, for the first time, a procedure for learning partial differential equations with heterogeneous model parameters. Implementing equation learning requires calculating derivatives from discrete data, and we discuss how to calculate derivatives from noisy data. For a series of cases, the proposed data-driven method demonstrates satisfactory results for learning subsurface flow equations.
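To make the idea of LASSO-based equation discovery concrete, the following is a minimal sketch and not the authors' implementation. It assumes a noise-free, homogeneous 1D diffusion-type equation u_t = D u_xx on a periodic domain as a stand-in for a flow equation, a hypothetical candidate library {u, u_x, u_xx, u·u_x}, central finite differences for the derivatives, and a hand-written coordinate-descent LASSO solver. The names `D_TRUE`, `u_exact`, `lasso_cd` and the penalty value `alpha=5e-3` are illustrative choices, not taken from the paper.

```python
import math

# Assumed synthetic problem: u_t = D_TRUE * u_xx on a periodic domain [0, 2*pi).
D_TRUE = 0.5
NX, NT, DT = 128, 22, 0.01
DX = 2.0 * math.pi / NX


def u_exact(x, t):
    """Two decaying Fourier modes that solve u_t = D_TRUE * u_xx exactly."""
    return (math.exp(-D_TRUE * t) * math.sin(x)
            + 0.5 * math.exp(-4.0 * D_TRUE * t) * math.sin(2.0 * x))


# Discrete "observations" u[n][i] at time n*DT and position i*DX.
u = [[u_exact(i * DX, n * DT) for i in range(NX)] for n in range(NT)]

# Candidate library (hypothetical choice) and target u_t, both from
# central finite differences; interior time levels only.
names = ["u", "u_x", "u_xx", "u*u_x"]
X, y = [], []
for n in range(1, NT - 1):
    for i in range(NX):
        left, mid, right = u[n][(i - 1) % NX], u[n][i], u[n][(i + 1) % NX]
        u_x = (right - left) / (2.0 * DX)
        u_xx = (right - 2.0 * mid + left) / DX ** 2
        X.append([mid, u_x, u_xx, mid * u_x])
        y.append((u[n + 1][i] - u[n - 1][i]) / (2.0 * DT))


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_sweeps=100):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(y), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, kept up to date (w starts at zero)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, alpha) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = w_new
    return w


coef = dict(zip(names, lasso_cd(X, y, alpha=5e-3)))
for name in names:
    print(f"{name:6s} {coef[name]: .4f}")
```

In this setting the coefficient of `u_xx` should come out close to `D_TRUE`, slightly shrunk by the L1 penalty, while the other candidate terms are driven to or near zero. With noisy or heterogeneous data, the derivative estimates and the choice of `alpha` would need considerably more care than this sketch takes.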


Machine learning; Data-driven discovery; Governing equations; Noisy data; LASSO



This work is partially funded by the National Natural Science Foundation of China (Grant No. U1663208 and 51520105005) and the National Science and Technology Major Project of China (Grant No. 2017ZX05009-005 and 2016ZX05037-003). The link for the open-source Matlab code is provided in Hesterberg et al. [13]. The other computer codes and data used are available upon request from the corresponding author.

Compliance with Ethical Standards

Conflict of interests

The authors declare that they have no conflict of interest.


  1. Bear, J.: Dynamics of Fluids in Porous Media. New York: Environmental Science Series (1972)
  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
  3. Bongard, J., Lipson, H.: Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 104(24), 9943–9948 (2007)
  4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1–122 (2010)
  5. Bruno, O., Hoch, D.: Numerical differentiation of approximated functions with limited order-of-accuracy deterioration. SIAM J. Numer. Anal. 50(3), 1581–1603 (2012)
  6. Brunton, S.L., Proctor, J.L., Kutz, J.N.: Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 113(15), 3932–3937 (2016)
  7. Chang, H., Zhang, D.: Identification of physical processes via combined data-driven and data-assimilation methods. J. Comput. Phys. 393, 337–350 (2019)
  8. Chartrand, R.: Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics 2011, 1–11 (2011)
  9. Cullum, J.: Numerical differentiation and regularization. SIAM J. Numer. Anal. 8(2), 254–265 (1971)
  10. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.J.: Least angle regression. Ann. Stat. 32(2), 407–451 (2004)
  11. Figueiredo, M.A.T., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Sign. Proces. 1(4), 586–597 (2007)
  12. Hastie, T., Tibshirani, R.J., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer Series in Statistics (2009)
  13. Hesterberg, T., Choi, N.H., Meier, L., Fraley, C.: Least angle and l1 penalized regression: a review. Statistics Surveys 2, 61–93 (2008)
  14. Jauberteau, F., Jauberteau, J.L.: Numerical differentiation with noisy signal. Appl. Math. Comput. 215(6), 2283–2297 (2009)
  15. Knowles, I., Le, T., Yan, A.: On the recovery of multiple flow parameters from transient head data. J. Comput. Appl. Math. 169(1), 1–15 (2004)
  16. Mangan, N.M., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Transactions on Molecular Biological and Multi-Scale Communications 2(1), 52–63 (2016)
  17. Mangan, N.M., Kutz, J.N., Brunton, S.L., Proctor, J.L.: Model selection for dynamical systems via sparse regression and information criteria. Proceedings of the Royal Society A-Mathematical Physical and Engineering Sciences 473(2204), 16 (2017)
  18. Meng, J., Li, H.: An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection. Adv. Water Resour. 105, 13–28 (2017)
  19. Ramos, G., Carrera, J., Gómez, S., Minutti, C., Camacho, R.: A stable computation of log-derivatives from noisy drawdown data. Water Resour. Res. 53(9), 7904–7916 (2017)
  20. Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35(3), 1012–1030 (2007)
  21. Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Data-driven discovery of partial differential equations. Sci. Adv. 3(4), e1602614 (2017)
  22. Schaeffer, H.: Learning partial differential equation via data discovery and sparse optimisation. Proceedings of the Royal Society A-Mathematical Physical and Engineering Sciences 473(2197), 20160446 (2017)
  23. Schmidt, M., Lipson, H.: Distilling free-form natural laws from experimental data. Science 324(5923), 81–85 (2009)
  24. Tibshirani, R.J.: The lasso problem and uniqueness. Electronic Journal of Statistics 7(1), 1456–1490 (2013)
  25. Zou, H.: The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)

Copyright information

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. ERE and BIC-ESAT, College of Engineering, Peking University, Beijing, China
