Scattered Data and Aggregated Inference

Huo, Xiaoming; Huang, Cheng; Ni, Xuelei Sherry

doi:10.1007/978-3-319-18284-1_4

Scattered Data and Aggregated Inference

Xiaoming Huo⁷,
Cheng Huang⁷ &
Xuelei Sherry Ni⁸

Chapter
First Online: 18 July 2018

4383 Accesses

Part of the book series: Springer Handbooks of Computational Statistics ((SHCS))

Abstract

Scattered Data and Aggregated Inference (SDAI) represents a class of problems where data cannot be at a centralized location, while modeling and inference is pursued. Distributed statistical inference is a technique to tackle a type of the above problem, and has recently attracted enormous attention. Many existing work focus on the averaging estimator, e.g., Zhang et al. (2013) and many others. In this chapter, we propose a one-step approach to enhance a simple-averaging based distributed estimator. We derive the corresponding asymptotic properties of the newly proposed estimator. We find that the proposed one-step estimator enjoys the same asymptotic properties as the centralized estimator. The proposed one-step approach merely requires one additional round of communication in relative to the averaging estimator; so the extra communication burden is insignificant. In finite-sample cases, numerical examples show that the proposed estimator outperforms the simple averaging estimator with a large margin in terms of the mean squared errors. A potential application of the one-step approach is that one can use multiple machines to speed up large-scale statistical inference with little compromise in the quality of estimators. The proposed method becomes more valuable when data can only be available at distributed machines with limited communication bandwidth. We discuss other types of SDAI problems at the end.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 299.00; Price excludes VAT (USA)

Softcover Book: USD 379.99; Price excludes VAT (USA)

Hardcover Book: USD 379.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Arjevani Y, Shamir O (2015) Communication complexity of distributed convex learning and optimization. Technical report. http://arxiv.org/abs/1506.01900. Accessed 28 Oct 2015
Balcan M-F, Blum A, Fine S, Mansour Y (2012) Distributed learning, communication complexity and privacy. https://arxiv.org/abs/1204.3514. Accessed 25 May 2012
Balcan M-F, Kanchanapally V, Liang Y, Woodruff D (2014) Improved distributed principal component analysis. Technical report. http://arxiv.org/abs/1408.5823. Accessed 23 Dec 2014
Battey H, Fan J, Liu H, Lu J, Zhu Z (2015) Distributed estimation and inference with statistical guarantees. https://arxiv.org/abs/1509.05457. Accessed 17 Sept 2015
Bickel PJ (1975) One-step Huber estimates in the linear model. J Am Stat Assoc 70(350):428–434
Article MathSciNet Google Scholar
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Article Google Scholar
Bradley JK, Kyrola A, Bickson D, Guestrin C (2011) Parallel coordinate descent for L1-regularized loss minimization. In Proceedings of 28th international conference on Machine Learning. https://arxiv.org/abs/1105.5379. Accessed 26 May 2011
Chen X, Xie M-g (2014) A split-and-conquer approach for analysis of extraordinarily large data. Stat Sin 24:1655–1684
Google Scholar
Chen S, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20(1):33–61
Article MathSciNet Google Scholar
Cichocki A, Amari S-I, Zdunek R, Phan AH (2009) Non-negative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Wiley-Blackwell, Hoboken
Google Scholar
Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P et al. (2012) Spanner: Googles globally distributed database. In: Proceedings of the USENIX symposium on operating systems design and implementation
Google Scholar
Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
Google Scholar
Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: SIAM international conference on data mining, pp 606–610
Chapter Google Scholar
Donoho D, Stodden V (2003) When does non-negative matrix factorization give a correct decomposition into parts? In: Advances in neural information processing systems. Stanford University, Stanford
Google Scholar
El Gamal M, Lai L (2015) Are Slepian-Wolf rates necessary for distributed parameter estimation? Technical report. http://arxiv.org/abs/1508.02765. Accessed 10 Nov 2015
Fan J, Chen J (1999) One-step local quasi-likelihood estimation. J R Stat Soc Ser B Stat Methodol 61(4):927–943
Article MathSciNet Google Scholar
Fan J, Feng Y, Song R (2012) Nonparametric independence screening in sparse ultra-high-dimensional additive models. J Am Stat Assoc 106:544–557
Article MathSciNet Google Scholar
Fan J, Han F, Liu H (2014) Challenges of big data analysis. Natl Sci Rev 1:293–314
Article Google Scholar
Fevotte C, Bertin N, Durrieu JL (2009) Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comput 21(3):793–830
Article Google Scholar
Forero PA, Cano A, Giannakis GB (2010) Consensus-based distributed support vector machines. J Mach Learn Res 11:1663–1707
Google Scholar
Gillis N, Luce R (2014) Robust near-separable nonnegative matrix factorization using linear optimization. J Mach Learn Res 15:1249–1280
Google Scholar
Huang C, Huo X (2015) A distributed one-step estimator. Technical report. http://arxiv.org/abs/1511.01443. Accessed 10 Nov 2015
Huang K, Sidiropoulos ND, Swami A (2014) Non-negative matrix factorization revisited: uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224
Article MathSciNet Google Scholar
Jaggi M, Smith V, Takác M, Terhorst J, Krishnan S, Hofmann T, Jordan MI (2014) Communication-efficient distributed dual coordinate ascent. In: Advances in neural information processing systems, pp 3068–3076
Google Scholar
Kleiner A, Talwalkar A, Sarkar P, Jordan MI (2014) A scalable bootstrap for massive data. J R Stat Soc Ser B Stat Methodol 76(4):795–816
Article MathSciNet Google Scholar
Lang S (1993) Real and functional analysis, vol 142. Springer Science & Business Media, Berlin
Book Google Scholar
Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401:788–791
Article Google Scholar
Lee JD, Sun Y, Liu Q, Taylor JE (2015) Communication-efficient sparse regression: a one-shot approach. arXiv preprint arXiv:1503.04337
Google Scholar
Liu Q, Ihler AT (2014) Distributed estimation, information loss and exponential families. In: Advances in neural information processing systems, pp 1098–1106
Google Scholar
McDonald R, Hall K, Mann G (2010) Distributed training strategies for the structured perceptron. In: North American chapter of the Association for Computational Linguistics (NAACL)
Google Scholar
Mitra S, Agrawal M, Yadav A, Carlsson N, Eager D, Mahanti A (2011) Characterizing web-based video sharing workloads. ACM Trans Web 5(2):8
Article Google Scholar
Mizutani T (2014) Ellipsoidal rounding for nonnegative matrix factorization under noisy separability. J Mach Learn Res 15:1011–1039
Google Scholar
Neiswanger W, Wang C, Xing E (2013) Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780
Google Scholar
Nowak RD (2003) Distributed EM algorithms for density estimation and clustering in sensor networks. IEEE Trans Signal Process 51(8):2245–2253
Article Google Scholar
Paatero P, Tapper U (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2):111–126
Article Google Scholar
Pauca VP, Piper J, Plemmons RJ (2006) Nonnegative matrix factorization for spectral data analysis. Linear Algebra Appl 401(1):29–47
Article MathSciNet Google Scholar
Ravikumar P, Lafferty J, Liu H, Wasserman L (2009) Sparse additive models. J R Stat Soc Ser B Stat Methodol 71(5):1009–1030
Article MathSciNet Google Scholar
Rosenblatt J, Nadler B (2014) On the optimality of averaging in distributed statistical learning. arXiv preprint arXiv:1407.2724
Google Scholar
Schmidt MN, Larson J, Hsiao FT (2007) Wind noise reduction using non-negative sparse coding. In: Machine learning for signal processing, IEEE workshop, pp 431–436
Google Scholar
Shamir O, Srebro N, Zhang T (2014) Communication-efficient distributed optimization using an approximate Newton-type method. In: Proceedings of the 31st international conference on machine learning, pp 1000–1008
Google Scholar
Song Q, Liang F (2015) A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression. J R Stat Soc B 77(Part 5):947–972
Article MathSciNet Google Scholar
Städler N, Bühlmann P, Van De Geer S (2010) ℓ ₁-Penalization for mixture regression models. Test 19(2):209–256
Article MathSciNet Google Scholar
Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58(1):267–288
Google Scholar
van der Vaart AW (2000) Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
Google Scholar
Wainwright M (2014) Constrained forms of statistical minimax: computation, communication, and privacy. In: Proceedings of international congress of mathematicians
Google Scholar
Wang X, Peng P, Dunson DB (2014) Median selection subset aggregation for parallel inference. In: Advances in neural information processing systems, pp 2195–2203
Google Scholar
Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: The 26th annual international ACM SIGIR conference on research and development in information retrieval, pp 267–273
Google Scholar
Yang Y, Barron A (1999) Information-theoretic determination of minimax rates of convergence. Ann Stat 27(5):1564–1599
Google Scholar
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol 68(1):49–67
Article MathSciNet Google Scholar
Zhang Y, Duchi JC, Wainwright MJ (2013) Communication-efficient algorithms for statistical optimization. J Mach Learn Res 14:3321–3363
Google Scholar
Zhang Y, Duchi JC, Jordan MI, Wainwright MJ (2013) Information-theoretic lower bounds for distributed statistical estimation with communication constraints. Technical report, UC Berkeley. Presented at the NIPS Conference 2013
Google Scholar
Zhao T, Cheng G, Liu H (2014) A partially linear framework for massive heterogeneous data. arXiv preprint arXiv:1410.8570
Google Scholar
Zinkevich M, Weimer M, Li L, Smola AJ (2010) Parallelized stochastic gradient descent. In: Advances in neural information processing systems, pp 2595–2603
Google Scholar
Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Xiaoming Huo & Cheng Huang
Department of Statistics and Analytical Sciences, Kennesaw State University, Kennesaw, GA, USA
Xuelei Sherry Ni

Authors

Xiaoming Huo
View author publications
You can also search for this author in PubMed Google Scholar
Cheng Huang
View author publications
You can also search for this author in PubMed Google Scholar
Xuelei Sherry Ni
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiaoming Huo .

Editor information

Editors and Affiliations

Ladislaus von Bortkiewicz Chair of Statistics, C.A.S.E. Center for Applied Statistics & Economics, Humboldt-Universität zu Berlin, Berlin, Germany
Wolfgang Karl Härdle
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
Henry Horng-Shing Lu
School of Statistics, University of Minnesota, Minneapolis, USA
Xiaotong Shen

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Huo, X., Huang, C., Ni, X.S. (2018). Scattered Data and Aggregated Inference. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_4

Download citation

DOI: https://doi.org/10.1007/978-3-319-18284-1_4
Published: 18 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics