Spatially-Dimension-Adaptive Sparse Grids for Online Learning

Conference paper in: Sparse Grids and Applications - Stuttgart 2014

Part of the book series: Lecture Notes in Computational Science and Engineering (LNCSE, volume 109)

Abstract

This paper takes a new look at regression with adaptive sparse grids. Considering sparse grid refinement as an optimisation problem, we show that it is in fact an instance of submodular optimisation with a cardinality constraint. Hence, we can directly apply results from combinatorial research on submodular optimisation to the grid refinement problem. Based on these results, we derive an efficient refinement indicator that allows the selection of new grid indices with finer granularity than was previously possible. We then implement the resulting refinement procedure using an averaged stochastic gradient descent method commonly used in online learning. The result is a new method for training adaptive sparse grid models. We show, for both synthetic and real-life data, that the resulting models exhibit lower complexity and higher predictive power than current state-of-the-art methods.
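The refinement procedure itself is developed in the paper; purely as a rough illustration of the online training component, the sketch below shows Polyak-Ruppert averaged stochastic gradient descent for a regularised least-squares model over a fixed set of basis functions. The basis map phi, the step size eta, and the regularisation strength lam are placeholder assumptions, not the authors' actual configuration.

    import numpy as np

    def asgd(data, phi, n_basis, eta=0.05, lam=1e-4, passes=2):
        # Averaged SGD (Polyak-Ruppert): run plain SGD on the regularised
        # squared loss and return the running average of the iterates.
        w = np.zeros(n_basis)        # current SGD iterate
        w_avg = np.zeros(n_basis)    # running average of all iterates
        seen = 0
        for _ in range(passes):
            for x, y in data:
                z = phi(x)                         # basis evaluations at x
                grad = (z @ w - y) * z + lam * w   # gradient of 0.5*(z.w - y)^2 + 0.5*lam*||w||^2
                w -= eta * grad
                seen += 1
                w_avg += (w - w_avg) / seen        # incremental mean of the iterates
        return w_avg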

With the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement no. 291763.


Notes

  1. To normalise a finite dataset we need the minimal and maximal values of the input in every dimension. These can be computed in a single pass through the dataset: initialise two variables \(\mathbf{x}_{\text{min}}\) and \(\mathbf{x}_{\text{max}}\) with the first element and then update their components whenever a new input pattern has smaller or larger values than those stored (see the sketch after these notes).

  2. We noticed that this rule gives better regularisation properties to the model at an acceptable extra cost.

  3. In this experiment we focused on the comparison between OSDA and BSA. Hence, we terminated the training of OSDA with the Rosenblatt transformation prematurely. However, Fig. 10 suggests that further improvement may be possible.

  4. For OSDA we counted two passes through the data in the online optimisation loop and one pass for computing the refinement indicators.
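The single-pass computation of the normalisation bounds described in note 1 could look as follows; this is a minimal sketch assuming NumPy arrays as input patterns, not the authors' code.

    import numpy as np

    def minmax_bounds(dataset):
        # One pass through the data: initialise with the first pattern,
        # then update the componentwise minima/maxima for every further pattern.
        it = iter(dataset)
        first = np.asarray(next(it), dtype=float)
        x_min, x_max = first.copy(), first.copy()
        for x in it:
            x = np.asarray(x, dtype=float)
            np.minimum(x_min, x, out=x_min)   # componentwise minimum update
            np.maximum(x_max, x, out=x_max)   # componentwise maximum update
        return x_min, x_max

    def normalise(x, x_min, x_max):
        # Rescale every component to [0, 1]; assumes x_max > x_min in every dimension.
        return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)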


Author information

Correspondence to Valeriy Khakhutskyy.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Khakhutskyy, V., Hegland, M. (2016). Spatially-Dimension-Adaptive Sparse Grids for Online Learning. In: Garcke, J., Pflüger, D. (eds) Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering, vol 109. Springer, Cham. https://doi.org/10.1007/978-3-319-28262-6_6
