Abstract
This paper takes a new look at regression with adaptive sparse grids. Considering sparse grid refinement as an optimisation problem, we show that it is in fact an instance of submodular optimisation with a cardinality constraint. Hence, results from combinatorial optimisation research on submodular optimisation apply directly to the grid refinement problem. Based on these results, we derive an efficient refinement indicator that allows the selection of new grid indices with finer granularity than was previously possible. We then implement the resulting refinement procedure using an averaged stochastic gradient descent method common in online learning. The result is a new method for training adaptive sparse grid models. For both synthetic and real-life data, we show that the resulting models exhibit lower complexity and higher predictive power than current state-of-the-art methods.
With the support of the Technische Universität München – Institute for Advanced Study, funded by the German Excellence Initiative (and the European Union Seventh Framework Programme under grant agreement no. 291763).
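As a rough illustration of the two ingredients named in the abstract (averaged stochastic gradient descent for the model weights, and greedy selection of refinement candidates under a cardinality constraint), the following Python sketch shows how such a training step could look. It is a minimal sketch, not the authors' implementation: the design matrix `phi`, the marginal-gain callback `gain_fn`, and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def asgd_fit(phi, y, epochs=2, eta0=0.05):
    """Averaged SGD (Polyak-Ruppert averaging) for least-squares weights.

    phi : (n, m) matrix of basis-function values at the n inputs
    y   : (n,) regression targets
    Returns the running average of the SGD iterates, which is the
    quantity ASGD uses for prediction.
    """
    n, m = phi.shape
    w = np.zeros(m)        # current SGD iterate
    w_bar = np.zeros(m)    # averaged iterate
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            eta = eta0 / (1.0 + eta0 * t) ** 0.75    # decaying step size, cf. Xu (2011)
            w -= eta * (phi[i] @ w - y[i]) * phi[i]  # stochastic squared-loss gradient
            w_bar += (w - w_bar) / t                 # incremental average of iterates
    return w_bar

def greedy_refine(candidates, gain_fn, k):
    """Greedy maximisation under a cardinality constraint |S| <= k.

    gain_fn(c, chosen) returns the marginal gain of adding candidate c
    to the current selection; for a monotone submodular objective this
    greedy rule attains the classical (1 - 1/e) approximation guarantee
    of Nemhauser, Wolsey and Fisher (1978).
    """
    chosen = []
    remaining = set(candidates)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: gain_fn(c, chosen))
        chosen.append(best)
        remaining.discard(best)
    return chosen
```

In the paper's setting, `gain_fn` would be played by the refinement indicator derived from the submodular view of grid refinement; the sketch only fixes the control flow of greedy selection, not the indicator itself.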
Notes
- 1.
To normalise a finite dataset we need the minimal and maximal values of the inputs in every dimension. This can be done in a single pass through the dataset: we initialise two variables \(\mathbf{x}_{\text{min}}\) and \(\mathbf{x}_{\text{max}}\) with the first element and then update the components of these variables whenever a new input pattern has smaller/larger values than those stored (a one-pass sketch follows these notes).
- 2.
We noticed that this rule gives better regularisation properties to the model at an acceptable extra cost.
- 3.
In this experiment we focused on the comparison between OSDA and BSA. Hence, we terminated the training of OSDA with the Rosenblatt transformation prematurely. However, Fig. 10 suggests that further improvement may be possible.
- 4.
For OSDA we counted two passes through the data in the online optimisation loop and one pass for computing the refinement indicators.
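The single pass described in note 1 can be written down directly. A minimal sketch, assuming the data arrive as an iterable of d-dimensional patterns (a hypothetical interface, not the authors' code):

```python
import numpy as np

def one_pass_min_max(stream):
    """Collect componentwise minima and maxima in a single pass,
    as described in note 1 above."""
    it = iter(stream)
    x_min = np.asarray(next(it), dtype=float).copy()  # initialise with first element
    x_max = x_min.copy()
    for x in it:
        np.minimum(x_min, x, out=x_min)  # componentwise minimum update
        np.maximum(x_max, x, out=x_max)  # componentwise maximum update
    return x_min, x_max

# Afterwards every pattern can be normalised to [0, 1] per dimension:
# x_scaled = (x - x_min) / (x_max - x_min)
```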
References
J.K. Adelman-McCarthy et al., The Fifth Data Release of the Sloan Digital Sky Survey. Astrophys. J. Suppl. Ser. 172(2), 634–644 (2007)
F. Bach, E. Moulines, Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), in Advances in Neural Information Processing Systems 26, ed. by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, K. Weinberger (Curran Associates, Inc., Red Hook, 2013), pp. 773–781
T. Blumensath, M.E. Davies, Stagewise weak gradient pursuits. IEEE Trans. Signal Process. 57(11), 4333–4346 (2009)
L. Bottou, Stochastic learning, in Advanced Lectures on Machine Learning (Springer, Berlin/Heidelberg, 2004), pp. 146–168
L. Bottou, Online algorithms and stochastic approximations, in Online Learning and Neural Networks, ed. by D. Saad (Cambridge University Press, Cambridge, 1998), pp. 9–42. Revised (2012)
L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010), Paris, ed. by Y. Lechevallier, G. Saporta (Physica-Verlag, Heidelberg, 2010), pp. 177–187. ISBN:978-3-7908-2603-6
L. Bottou, Stochastic gradient tricks, in Neural Networks: Tricks of the Trade, Reloaded, ed. by G. Montavon, G.B. Orr, K.-R. Müller. Lecture Notes in Computer Science (LNCS 7700) (Springer, 2012), pp. 430–445
L. Bottou, Y. LeCun, Large scale online learning, in Advances in Neural Information Processing Systems 16, ed. by S. Thrun, L. Saul, B. Schölkopf (MIT, Cambridge MA, 2004), pp. 217–224
H.-J. Bungartz, M. Griebel, Sparse grids. Acta Numer. 13, 147–269 (2004)
H.-J. Bungartz, D. Pflüger, S. Zimmer, Adaptive sparse grid techniques for data mining, in Modeling, Simulation and Optimization of Complex Processes, ed. by H.G. Bock, E. Kostina, H.X. Phu, R. Rannacher (Springer, Berlin/Heidelberg, 2008), pp. 121–130. ISBN:978-3-540-79408-0
G. Buse, Exploiting Many-Core Architectures for Dimensionally Adaptive Sparse Grids. Dissertation, Institut für Informatik, Technische Universität München, München, 2015
U. Feige, A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)
J. Garcke, Maschinelles Lernen durch Funktionsrekonstruktion mit verallgemeinerten dünnen Gittern. Doktorarbeit, Institut für Numerische Simulation, Universität Bonn, 2004
J. Garcke, M. Griebel, M. Thess, Data mining with sparse grids. Computing 67(3), 225–253 (2001)
M. Hegland, Adaptive sparse grids, in Proceedings of the 10th Computational Techniques and Applications Conference CTAC-2001, Brisbane, vol. 44, ed. by K. Burrage, R.B. Sidje (2003), pp. C335–C353
A. Heinecke, D. Pflüger, Multi- and many-core data mining with adaptive sparse grids, in Proceedings of the 8th ACM International Conference on Computing Frontiers, New York, May 2011 (ACM, 2011), pp. 29:1–29:10
P. Kambadur, A.C. Lozano, A parallel, block greedy method for sparse inverse covariance estimation for ultra-high dimensions, in Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, Scottsdale (2013), pp. 351–359
V. Khakhutskyy, D. Pflüger, M. Hegland, Scalability and fault tolerance of the alternating direction method of multipliers for sparse grids, in Parallel Computing: Accelerating Computational Science and Engineering (CSE), Amsterdam, 2014, ed. by M. Bader, H.-J. Bungartz, A. Bode, M. Gerndt, G.R. Joubert. Volume 25 of Advances in Parallel Computing (IOS, 2014), pp. 603–612
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance, Cost-effective outbreak detection in networks, in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, New York (ACM, 2007), pp. 420–429
M. Minoux, Accelerated greedy algorithms for maximizing submodular set functions, in Optimization Techniques. Volume 7 of Lecture Notes in Control and Information Sciences (Springer, Berlin/Heidelberg, 1978), pp. 234–243
G.L. Nemhauser, L.A. Wolsey, M.L. Fisher, An analysis of approximations for maximizing submodular set functions – I. Math. Program. 14, 265–294 (1978)
J. Nocedal, S.J. Wright, Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. (Springer, New York, 2006)
B. Peherstorfer, Model Order Reduction of Parametrized Systems with Sparse Grid Learning Techniques. Dissertation, Department of Informatics, Technische Universität München, Oct. 2013
B. Peherstorfer, D. Pflüger, H.-J. Bungartz, A Sparse-grid-based out-of-sample extension for dimensionality reduction and clustering with laplacian eigenmaps, in AI 2011: Advances in Artificial Intelligence, ed. by D. Wang, M. Reynolds (Springer, Berlin/Heidelberg, 2011), pp. 112–121
B. Peherstorfer, F. Franzelin, D. Pflüger, H.-J. Bungartz, Classification with probability density estimation on sparse grids, in Sparse Grids and Applications – Munich 2012, ed. by J. Garcke, D. Pflüger. Volume 97 of Lecture Notes in Computational Science and Engineering, pp. 255–270 (Springer, Cham/New York, 2014)
D. Pflüger, Spatially Adaptive Sparse Grids for High-Dimensional Problems (Verlag Dr. Hut, München, 2010).
D. Pflüger, Spatially adaptive refinement, in Sparse Grids and Applications, ed. by J. Garcke, M. Griebel. Lecture Notes in Computational Science and Engineering (Springer, Berlin/Heidelberg, 2012), pp. 243–262
B. Polyak, A. Juditsky, Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4), 838–855 (1992)
M. Rosenblatt, Remarks on a multivariate transformation. Ann. Math. Stat. 23(3), 470–472 (1952)
T. Schaul, S. Zhang, Y. LeCun, No more pesky learning rates. J. Mach. Learn. Res. 28, 343–351 (2013)
D. Strätling, Concept drift with adaptive sparse grids. Bachelor Thesis, Technische Universität München, 2015
K. Wei, R. Iyer, J. Bilmes, Fast multi-stage submodular maximization, in Proceedings of the 31st International Conference on Machine Learning, Beijing, 2014
W. Xu, Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490 (2011)
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Khakhutskyy, V., Hegland, M. (2016). Spatially-Dimension-Adaptive Sparse Grids for Online Learning. In: Garcke, J., Pflüger, D. (eds) Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering, vol 109. Springer, Cham. https://doi.org/10.1007/978-3-319-28262-6_6
Print ISBN: 978-3-319-28260-2
Online ISBN: 978-3-319-28262-6