Statistical Papers, Volume 60, Issue 2, pp 235–249

Optimal subsampling for softmax regression

  • Yaqiong Yao
  • HaiYing Wang
Regular Article


To meet the challenge of massive data, Wang et al. (J Am Stat Assoc 113(522):829–844, 2018b) developed an optimal subsampling method for logistic regression. The purpose of this paper is to extend their method to softmax regression, which is also called multinomial logistic regression and is commonly used to model data with multiple categorical responses. We first derive the asymptotic distribution of the general subsampling estimator, and then derive optimal subsampling probabilities under the A-optimality criterion and the L-optimality criterion with a specific L matrix. Since the optimal subsampling probabilities depend on unknown parameters, we adopt a two-stage adaptive procedure to address this issue and use numerical simulations to demonstrate its performance.
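The two-stage idea described above can be illustrated with a rough Python sketch. It is not the authors' implementation, and the abstract does not state the probability formulas. The sketch assumes second-stage probabilities proportional to the norm of each observation's score contribution, by analogy with the L-optimality probabilities of Wang et al. (2018b) for logistic regression. The function names (`softmax_probs`, `fit_weighted_softmax`, `two_stage_subsample`), the simulated data, and the choice to use only the second-stage subsample in the final fit are all illustrative assumptions.

```python
import numpy as np

def softmax_probs(X, B):
    """Class probabilities; class 0 is the baseline with coefficients fixed at zero."""
    eta = np.hstack([np.zeros((X.shape[0], 1)), X @ B])
    eta -= eta.max(axis=1, keepdims=True)  # stabilise the exponentials
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def fit_weighted_softmax(X, y, w, n_classes, n_iter=50, tol=1e-8):
    """Newton's method for the weighted softmax log-likelihood."""
    n, d = X.shape
    K = n_classes - 1
    Y = np.eye(n_classes)[y][:, 1:]            # one-hot labels, baseline dropped
    B = np.zeros((d, K))
    for _ in range(n_iter):
        P = softmax_probs(X, B)[:, 1:]
        grad = X.T @ (w[:, None] * (Y - P))    # d x K weighted score
        # Per-observation K x K block: diag(p_i) - p_i p_i^T
        M = P[:, :, None] * np.eye(K)[None] - P[:, :, None] * P[:, None, :]
        # Weighted information matrix, parameters ordered class by class
        H = np.einsum("i,ikl,ia,ib->kalb", w, M, X, X).reshape(K * d, K * d)
        step = np.linalg.solve(H, grad.T.reshape(-1))
        B = B + step.reshape(K, d).T
        if np.linalg.norm(step) < tol:
            break
    return B

def two_stage_subsample(X, y, n_classes, r0, r, rng):
    """Pilot fit on a uniform subsample, then a weighted fit on a targeted subsample."""
    n = X.shape[0]
    # Stage 1: uniform pilot subsample gives a rough estimate of the parameters.
    idx0 = rng.choice(n, size=r0, replace=True)
    B0 = fit_weighted_softmax(X[idx0], y[idx0], np.ones(r0), n_classes)
    # ASSUMPTION: probabilities proportional to ||delta_i - p_i|| * ||x_i||,
    # evaluated at the pilot estimate (not taken from the abstract).
    resid = np.eye(n_classes)[y] - softmax_probs(X, B0)
    score = np.linalg.norm(resid[:, 1:], axis=1) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    # Stage 2: sample with the approximate probabilities, weight by 1 / pi.
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return fit_weighted_softmax(X[idx], y[idx], 1.0 / (n * pi[idx]), n_classes)

# Simulated example: 3 classes, intercept plus 2 covariates (hypothetical values).
rng = np.random.default_rng(0)
n, d = 20000, 3
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, d - 1))])
B_true = np.array([[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]])  # d x K with K = 2
P_true = softmax_probs(X, B_true)
y = np.minimum((rng.random(n)[:, None] > P_true.cumsum(axis=1)).sum(axis=1), 2)
B_hat = two_stage_subsample(X, y, 3, r0=500, r=2000, rng=rng)
print(np.round(B_hat, 2))
```

Weighting each sampled observation by the inverse of its sampling probability keeps the subsample estimator targeted at the full-data maximum likelihood estimator, which is why the second-stage fit is weighted while the uniform pilot fit is not.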


Keywords: Massive data · Subsampling · Optimality criterion · Softmax regression



We gratefully acknowledge the comments from two referees that helped improve the paper.


  1. Atkinson A, Donev A, Tobias R (2007) Optimum experimental designs, with SAS, vol 34. Oxford University Press, Oxford
  2. Drineas P, Kannan R, Mahoney MW (2006a) Fast Monte Carlo algorithms for matrices I: approximating matrix multiplication. SIAM J Comput 36(1):132–157
  3. Drineas P, Kannan R, Mahoney MW (2006b) Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM J Comput 36(1):158–183
  4. Drineas P, Kannan R, Mahoney MW (2006c) Fast Monte Carlo algorithms for matrices III: computing a compressed approximate matrix decomposition. SIAM J Comput 36(1):184–206
  5. Drineas P, Mahoney MW, Muthukrishnan S (2006d) Sampling algorithms for \(l_2\) regression and applications. In: Proceedings of the seventeenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, pp 1127–1136
  6. Drineas P, Mahoney M, Muthukrishnan S (2008) Relative-error CUR matrix decomposition. SIAM J Matrix Anal Appl 30:844–881
  7. Drineas P, Mahoney M, Muthukrishnan S, Sarlos T (2011) Faster least squares approximation. Numer Math 117:219–249
  8. Ferguson TS (1996) A course in large sample theory. Chapman and Hall, London
  9. Frieze A, Kannan R, Vempala S (2004) Fast Monte-Carlo algorithms for finding low-rank approximations. J ACM 51:1025–1041
  10. Lane A, Yao P, Flournoy N (2014) Information in a two-stage adaptive optimal design. J Stat Plan Inference 144:173–187
  11. Ma P, Mahoney M, Yu B (2015) A statistical perspective on algorithmic leveraging. J Mach Learn Res 16:861–911
  12. Mahoney MW (2011) Randomized algorithms for matrices and data. Found Trends Mach Learn 3(2):123–224
  13. Mahoney MW, Drineas P (2009) CUR matrix decompositions for improved data analysis. Proc Natl Acad Sci USA 106(3):697–702
  14. Ortega JM, Rheinboldt WC (1970) Iterative solution of nonlinear equations in several variables, vol 30. SIAM, Philadelphia
  15. R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.
  16. Raskutti G, Mahoney M (2016) A statistical perspective on randomized sketching for ordinary least-squares. J Mach Learn Res 17:1–31
  17. van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, London
  18. Wang H (2018) More efficient estimation for logistic regression with optimal subsample. arXiv preprint. arXiv:1802.02698
  19. Wang H, Yang M, Stufken J (2018a) Information-based optimal subdata selection for big data linear regression. J Am Stat Assoc.
  20. Wang H, Zhu R, Ma P (2018b) Optimal subsampling for large sample logistic regression. J Am Stat Assoc 113(522):829–844

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Statistics, University of Connecticut, Storrs, USA
