Spatially-Dimension-Adaptive Sparse Grids for Online Learning

Conference paper in: Sparse Grids and Applications - Stuttgart 2014

Part of the book series: Lecture Notes in Computational Science and Engineering (LNCSE, volume 109)

Abstract

This paper takes a new look at regression with adaptive sparse grids. Considering sparse grid refinement as an optimisation problem, we show that it is in fact an instance of submodular optimisation with a cardinality constraint. Hence, we can directly apply results from combinatorial research on submodular optimisation to the grid refinement problem. Based on these results, we derive an efficient refinement indicator that allows the selection of new grid indices with finer granularity than was previously possible. We then implement the resulting refinement procedure using an averaged stochastic gradient descent method commonly used in online learning. The result is a new method for training adaptive sparse grid models. We show, for both synthetic and real-life data, that the resulting models exhibit lower complexity and higher predictive power than current state-of-the-art methods.
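The refinement procedure itself is developed in the paper; purely as a rough illustration of the online training component, the sketch below shows Polyak-Ruppert averaged stochastic gradient descent for a regularised least-squares model over a fixed set of basis functions. The basis map phi, the step size eta, and the regularisation strength lam are placeholder assumptions, not the authors' actual configuration.

    import numpy as np

    def asgd(data, phi, n_basis, eta=0.05, lam=1e-4, passes=2):
        # Averaged SGD (Polyak-Ruppert): run plain SGD on the regularised
        # squared loss and return the running average of the iterates.
        w = np.zeros(n_basis)        # current SGD iterate
        w_avg = np.zeros(n_basis)    # running average of all iterates
        seen = 0
        for _ in range(passes):
            for x, y in data:
                z = phi(x)                         # basis evaluations at x
                grad = (z @ w - y) * z + lam * w   # gradient of 0.5*(z.w - y)^2 + 0.5*lam*||w||^2
                w -= eta * grad
                seen += 1
                w_avg += (w - w_avg) / seen        # incremental mean of the iterates
        return w_avg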

With the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement no. 291763.


Notes

  1. To normalise a finite dataset we need the minimal and maximal values of the input in every dimension. These can be computed in a single pass through the dataset: initialise two variables \(\mathbf{x}_{\text{min}}\) and \(\mathbf{x}_{\text{max}}\) with the first element and then update their components whenever a new input pattern has smaller or larger values than those stored (see the sketch after these notes).

  2. We noticed that this rule gives better regularisation properties to the model at an acceptable extra cost.

  3. In this experiment we focused on the comparison between OSDA and BSA. Hence, we terminated the training of OSDA with the Rosenblatt transformation prematurely. However, Fig. 10 suggests that further improvement may be possible.

  4. For OSDA we counted two passes through the data in the online optimisation loop and one pass for computing the refinement indicators.
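The single-pass computation of the normalisation bounds described in note 1 could look as follows; this is a minimal sketch assuming NumPy arrays as input patterns, not the authors' code.

    import numpy as np

    def minmax_bounds(dataset):
        # One pass through the data: initialise with the first pattern,
        # then update the componentwise minima/maxima for every further pattern.
        it = iter(dataset)
        first = np.asarray(next(it), dtype=float)
        x_min, x_max = first.copy(), first.copy()
        for x in it:
            x = np.asarray(x, dtype=float)
            np.minimum(x_min, x, out=x_min)   # componentwise minimum update
            np.maximum(x_max, x, out=x_max)   # componentwise maximum update
        return x_min, x_max

    def normalise(x, x_min, x_max):
        # Rescale every component to [0, 1]; assumes x_max > x_min in every dimension.
        return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)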


Author information

Correspondence to Valeriy Khakhutskyy.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Khakhutskyy, V., Hegland, M. (2016). Spatially-Dimension-Adaptive Sparse Grids for Online Learning. In: Garcke, J., Pflüger, D. (eds) Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering, vol 109. Springer, Cham. https://doi.org/10.1007/978-3-319-28262-6_6
