Highly Adaptive Lasso (HAL)

  • Mark J. van der Laan
  • David Benkeser
Part of the Springer Series in Statistics book series (SSS)


In this chapter, we define a general nonparametric estimator \(\psi_{n}\) of a \(d\)-variate function-valued parameter \(\psi_{0}\). The parameter is defined as the minimizer of the expectation of a loss function \(L(\psi)(O)\), and the estimator is guaranteed to converge to the true \(\psi_{0}\) at a rate faster than \(n^{-1/4}\), for all dimensions \(d\): \(\sqrt{ d_{0}(\psi _{n},\psi _{0})} = O_{P}(n^{-1/4-\alpha (d)/8})\), where \(d_{0}(\psi,\psi_{0}) = P_{0}L(\psi) - P_{0}L(\psi_{0})\) is the loss-based dissimilarity. This is a remarkable result because the guarantee does not depend on the underlying smoothness of \(\psi_{0}\). For example, \(\psi_{0}\) can be a function that is discontinuous at many points or nondifferentiable. The only assumption we need is that \(\psi_{0}\) is right-continuous with left-hand limits and has a finite variation norm, so that \(\psi_{0}\) generates a measure (just as a cumulative distribution function generates a measure on Euclidean space).
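
The abstract does not spell out how such an estimator is computed. The following is a minimal sketch, not the authors' implementation, under several assumptions that are not stated above: the loss is squared error, \(d = 1\), and the estimator is built from indicator basis functions \(1\{x \ge \text{knot}\}\) with knots at the observed points, fitted with an L1 penalty (the formulation commonly associated with HAL). The names `indicator_basis`, `lasso_cd`, `fit_hal_1d` and the penalty level `lam` are hypothetical and chosen for illustration only. The target is deliberately discontinuous, matching the kind of function the abstract says is allowed.

```python
import numpy as np

def indicator_basis(x, knots):
    """Design matrix with columns 1{x >= knot_j} (one-dimensional case)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (x[:, None] >= knots[None, :]).astype(float)

def lasso_objective(X, y, b0, beta, lam):
    """Penalized empirical risk: squared-error loss plus L1 penalty."""
    resid = y - b0 - X @ beta
    return 0.5 * np.mean(resid ** 2) + lam * np.abs(beta).sum()

def lasso_cd(X, y, lam, n_sweeps=500):
    """Plain cyclic coordinate descent; the intercept b0 is not penalized."""
    n, p = X.shape
    beta = np.zeros(p)
    b0 = y.mean()
    resid = y - b0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        shift = resid.mean()          # re-center via the intercept
        b0 += shift
        resid -= shift
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return b0, beta

def fit_hal_1d(x, y, lam):
    """Fit a step function whose jumps sit at the observed x values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    knots = np.unique(x)              # sorted, duplicates removed
    b0, beta = lasso_cd(indicator_basis(x, knots), y, lam)

    def predict(x_new):
        return b0 + indicator_basis(x_new, knots) @ beta

    return predict, b0, beta, knots

# Illustrative data: a target with a jump at 0.5 (assumed, not from the chapter).
rng = np.random.default_rng(0)
x = rng.uniform(size=80)
y = (x >= 0.5).astype(float) + 0.1 * rng.normal(size=80)
predict, b0, beta, knots = fit_hal_1d(x, y, lam=0.01)
```

In this sketch the sum of the absolute coefficients measures the total size of the jumps in the fitted step function, so the L1 penalty plays the role of a bound on the variation norm. In practice `lam` would be chosen by cross-validation, and a dedicated lasso solver would replace the simple coordinate descent loop, which can converge slowly on such strongly correlated columns.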



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Division of Biostatistics and Department of Statistics, University of California, Berkeley, Berkeley, USA
  2. Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, USA
