Abstract
Scattered Data and Aggregated Inference (SDAI) refers to a class of problems in which data cannot be stored at a centralized location, yet modeling and inference must still be carried out. Distributed statistical inference is a technique for tackling this type of problem and has recently attracted enormous attention. Much existing work focuses on the averaging estimator, e.g., Zhang et al. (2013) and many others. In this chapter, we propose a one-step approach to enhance a simple-averaging-based distributed estimator. We derive the asymptotic properties of the newly proposed estimator and find that it enjoys the same asymptotic properties as the centralized estimator. The proposed one-step approach requires only one additional round of communication relative to the averaging estimator, so the extra communication burden is insignificant. In finite-sample settings, numerical examples show that the proposed estimator outperforms the simple averaging estimator by a large margin in terms of mean squared error. A potential application of the one-step approach is to use multiple machines to speed up large-scale statistical inference with little compromise in the quality of estimators. The proposed method becomes even more valuable when data are only available at distributed machines with limited communication bandwidth. We discuss other types of SDAI problems at the end.
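The one-step scheme described above can be sketched in a few lines. The following is a minimal illustration, not the chapter's implementation: each machine computes a local maximum-likelihood estimate, the estimates are averaged (the simple averaging estimator), and one extra communication round ships local gradients and Hessians evaluated at the average so that a single aggregated Newton-Raphson step can refine it. The exponential-rate model, the machine count `k`, and the sample size `n` are illustrative choices.

```python
# Hedged sketch of a one-step distributed estimator (illustrative model:
# i.i.d. Exponential(rate) data split across k machines).
import numpy as np

rng = np.random.default_rng(0)
true_rate = 2.0
k, n = 10, 500                            # k machines, n samples per machine
data = rng.exponential(1.0 / true_rate, size=(k, n))

# Round 0 (local): each machine computes its local MLE.
# For Exponential(rate), the MLE is 1 / sample mean.
local_mles = 1.0 / data.mean(axis=1)

# Round 1 (communicate estimates): simple averaging estimator.
theta_avg = local_mles.mean()

# Round 2 (one extra round): each machine sends the gradient and Hessian of
# its local log-likelihood evaluated at theta_avg.
# log-lik = n*log(rate) - rate*sum(x)  =>  grad = n/rate - sum(x),
#                                          hess = -n/rate**2.
grads = n / theta_avg - data.sum(axis=1)
hessians = np.full(k, -n / theta_avg**2)

# Center: one Newton-Raphson step on the aggregated quantities.
theta_one_step = theta_avg - grads.sum() / hessians.sum()

print(theta_avg, theta_one_step)
```

A single Newton step from the averaged estimate lands very close to the centralized MLE (here `1 / data.mean()`), which is why the one-step estimator can match the asymptotics of the centralized estimator at negligible extra communication cost.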
References
Arjevani Y, Shamir O (2015) Communication complexity of distributed convex learning and optimization. Technical report. http://arxiv.org/abs/1506.01900. Accessed 28 Oct 2015
Balcan M-F, Blum A, Fine S, Mansour Y (2012) Distributed learning, communication complexity and privacy. https://arxiv.org/abs/1204.3514. Accessed 25 May 2012
Balcan M-F, Kanchanapally V, Liang Y, Woodruff D (2014) Improved distributed principal component analysis. Technical report. http://arxiv.org/abs/1408.5823. Accessed 23 Dec 2014
Battey H, Fan J, Liu H, Lu J, Zhu Z (2015) Distributed estimation and inference with statistical guarantees. https://arxiv.org/abs/1509.05457. Accessed 17 Sept 2015
Bickel PJ (1975) One-step Huber estimates in the linear model. J Am Stat Assoc 70(350):428–434
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Bradley JK, Kyrola A, Bickson D, Guestrin C (2011) Parallel coordinate descent for L1-regularized loss minimization. In: Proceedings of the 28th international conference on machine learning. https://arxiv.org/abs/1105.5379. Accessed 26 May 2011
Chen X, Xie M-g (2014) A split-and-conquer approach for analysis of extraordinarily large data. Stat Sin 24:1655–1684
Chen S, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20(1):33–61
Cichocki A, Amari S-I, Zdunek R, Phan AH (2009) Non-negative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Wiley-Blackwell, Hoboken
Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P et al. (2012) Spanner: Google's globally distributed database. In: Proceedings of the USENIX symposium on operating systems design and implementation
Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: SIAM international conference on data mining, pp 606–610
Donoho D, Stodden V (2003) When does non-negative matrix factorization give a correct decomposition into parts? In: Advances in neural information processing systems. Stanford University, Stanford
El Gamal M, Lai L (2015) Are Slepian-Wolf rates necessary for distributed parameter estimation? Technical report. http://arxiv.org/abs/1508.02765. Accessed 10 Nov 2015
Fan J, Chen J (1999) One-step local quasi-likelihood estimation. J R Stat Soc Ser B Stat Methodol 61(4):927–943
Fan J, Feng Y, Song R (2011) Nonparametric independence screening in sparse ultra-high-dimensional additive models. J Am Stat Assoc 106:544–557
Fan J, Han F, Liu H (2014) Challenges of big data analysis. Natl Sci Rev 1:293–314
Fevotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comput 21(3):793–830
Forero PA, Cano A, Giannakis GB (2010) Consensus-based distributed support vector machines. J Mach Learn Res 11:1663–1707
Gillis N, Luce R (2014) Robust near-separable nonnegative matrix factorization using linear optimization. J Mach Learn Res 15:1249–1280
Huang C, Huo X (2015) A distributed one-step estimator. Technical report. http://arxiv.org/abs/1511.01443. Accessed 10 Nov 2015
Huang K, Sidiropoulos ND, Swami A (2014) Non-negative matrix factorization revisited: uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224
Jaggi M, Smith V, Takác M, Terhorst J, Krishnan S, Hofmann T, Jordan MI (2014) Communication-efficient distributed dual coordinate ascent. In: Advances in neural information processing systems, pp 3068–3076
Kleiner A, Talwalkar A, Sarkar P, Jordan MI (2014) A scalable bootstrap for massive data. J R Stat Soc Ser B Stat Methodol 76(4):795–816
Lang S (1993) Real and functional analysis, vol 142. Springer Science & Business Media, Berlin
Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401:788–791
Lee JD, Sun Y, Liu Q, Taylor JE (2015) Communication-efficient sparse regression: a one-shot approach. arXiv preprint arXiv:1503.04337
Liu Q, Ihler AT (2014) Distributed estimation, information loss and exponential families. In: Advances in neural information processing systems, pp 1098–1106
McDonald R, Hall K, Mann G (2010) Distributed training strategies for the structured perceptron. In: North American chapter of the Association for Computational Linguistics (NAACL)
Mitra S, Agrawal M, Yadav A, Carlsson N, Eager D, Mahanti A (2011) Characterizing web-based video sharing workloads. ACM Trans Web 5(2):8
Mizutani T (2014) Ellipsoidal rounding for nonnegative matrix factorization under noisy separability. J Mach Learn Res 15:1011–1039
Neiswanger W, Wang C, Xing E (2013) Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780
Nowak RD (2003) Distributed EM algorithms for density estimation and clustering in sensor networks. IEEE Trans Signal Process 51(8):2245–2253
Paatero P, Tapper U (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2):111–126
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 401(1):29–47
Ravikumar P, Lafferty J, Liu H, Wasserman L (2009) Sparse additive models. J R Stat Soc Ser B Stat Methodol 71(5):1009–1030
Rosenblatt J, Nadler B (2014) On the optimality of averaging in distributed statistical learning. arXiv preprint arXiv:1407.2724
Schmidt MN, Larson J, Hsiao FT (2007) Wind noise reduction using non-negative sparse coding. In: Machine learning for signal processing, IEEE workshop, pp 431–436
Shamir O, Srebro N, Zhang T (2014) Communication-efficient distributed optimization using an approximate Newton-type method. In: Proceedings of the 31st international conference on machine learning, pp 1000–1008
Song Q, Liang F (2015) A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. J R Stat Soc B 77(Part 5):947–972
Städler N, Bühlmann P, Van De Geer S (2010) ℓ1-Penalization for mixture regression models. Test 19(2):209–256
Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58(1):267–288
van der Vaart AW (2000) Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
Wainwright M (2014) Constrained forms of statistical minimax: computation, communication, and privacy. In: Proceedings of international congress of mathematicians
Wang X, Peng P, Dunson DB (2014) Median selection subset aggregation for parallel inference. In: Advances in neural information processing systems, pp 2195–2203
Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: The 26th annual international ACM SIGIR conference on research and development in information retrieval, pp 267–273
Yang Y, Barron A (1999) Information-theoretic determination of minimax rates of convergence. Ann Stat 27(5):1564–1599
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol 68(1):49–67
Zhang Y, Duchi JC, Wainwright MJ (2013) Communication-efficient algorithms for statistical optimization. J Mach Learn Res 14:3321–3363
Zhang Y, Duchi JC, Jordan MI, Wainwright MJ (2013) Information-theoretic lower bounds for distributed statistical estimation with communication constraints. Technical report, UC Berkeley. Presented at the NIPS Conference 2013
Zhao T, Cheng G, Liu H (2014) A partially linear framework for massive heterogeneous data. arXiv preprint arXiv:1410.8570
Zinkevich M, Weimer M, Li L, Smola AJ (2010) Parallelized stochastic gradient descent. In: Advances in neural information processing systems, pp 2595–2603
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Huo, X., Huang, C., Ni, X.S. (2018). Scattered Data and Aggregated Inference. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_4
DOI: https://doi.org/10.1007/978-3-319-18284-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and Statistics (R0)