Abstract
In this paper, we study distributed logistic regression for processing separated large-scale data stored on different linked computers. Based on the Alternating Direction Method of Multipliers (ADMM) algorithm, we transform the logistic regression problem into a multistep iteration process and propose a distributed logistic algorithm with controllable communication cost. Specifically, in each iteration of the distributed algorithm, every computer updates its local estimator and simultaneously exchanges it with its neighbors. We then prove the convergence of the distributed logistic algorithm. Owing to the decentralized structure of the computer network, the proposed distributed logistic algorithm is robust. The classification results of our distributed logistic method are the same as those of the non-distributed approach. Numerical studies show that our approach is both effective and efficient, performing well in distributed massive data analysis.
References
McDonald, R., Mohri, M., Silberman, N., Walker, D., Mann, G.: Efficient large-scale distributed training of conditional maximum entropy models. In: Advances in Neural Information Processing Systems, vol. 1, pp. 1231–1239. NIPS, La Jolla (2009)
McDonald, R., Hall, K., Mann, G.: Distributed training strategies for the structured perceptron. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 456–464. ACL, Los Angeles (2010)
Zhang, Y., Duchi, J., Wainwright, M.: Communication-efficient algorithms for statistical optimization. J. Mach. Learn. Res. 14(1), 3321–3363 (2013)
Zhang, Y., Duchi, J., Wainwright, M.: Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates. J. Mach. Learn. Res. 30(1), 592–617 (2013)
Mateos, G., Bazerque, J., Giannakis, G.: Distributed sparse linear regression. IEEE Trans. Signal Process. 58(10), 5262–5276 (2010)
Wang, P., Zhang, H., Liang, Y.: Model selection with distributed SCAD penalty. J. Appl. Stat. 45(11), 1938–1955 (2017)
Wang J., Kolar M., Srebro N., Zhang T.: Efficient distributed learning with sparsity. In: International Conference on Machine Learning, vol. 70, pp. 3636–3645. PMLR, Sydney (2017)
Menendez, M.L., Pardo, L., Pardo, M.C.: Preliminary \(\phi \)-divergence test estimators for linear restrictions in a logistic regression model. Stat. Pap. 50(2), 277–300 (2009)
Pardo, J.A., Pardo, L., Pardo, M.C.: Minimum \(\phi \)-divergence estimator in logistic regression models. Stat. Pap. 47(1), 91–108 (2006)
Özkale, M.R.: Iterative algorithms of biased estimation methods in binary logistic regression. Stat. Pap. 57(4), 991–1016 (2016)
Lange, T., Mosler, K., Mozharovskyi, P.: Fast nonparametric classification based on data depth. Stat. Pap. 55(1), 49–69 (2015)
Boyd, S., Parikh, N., Chu, E.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Xie, P., Jin, K., Xing, E.: Distributed machine learning via sufficient factor broadcasting. arXiv preprint, http://arxiv.org/abs/1409.5705. Accessed 7 Sep 2015
Gopal, S., Yang, Y.: Distributed training of large-scale logistic models. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28, pp. 289–297. PMLR, Atlanta (2013)
Peng, H., Liang, D., Choi, C.: Evaluating parallel logistic regression models. In: IEEE International Conference on Big Data, pp. 119–126. IEEE, Silicon Valley (2013)
Kang, D., Lim, W., Shin, K.: Data/feature distributed stochastic coordinate descent for logistic regression. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 1269–1278. ACM, Shanghai (2014)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R., Marroco, A.: On the solution of a class of non linear Dirichlet problems by a penalty-duality method and finite elements of order one. In: Marchuk, G.I. (ed.) Optimization Techniques IFIP Technical Conference. LNCS, pp. 327–333. Springer, Berlin (1974). https://doi.org/10.1007/978-3-662-38527-2_45
Bertsekas, D., Tsitsiklis, J.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena Scientific, Belmont (1997)
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)
Alon, U., et al.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. U.S.A. 96(12), 6745–6750 (1999)
Blake, C., Merz, C.: UCI repository of machine learning databases (1998). http://www.ics.uci.edu/~mlearn/MLRepository.html
Appendix
1.1 ADMM algorithm
Consider the following optimization problem
First, we form the quadratically augmented Lagrangian function
where \(V:=\{V_{j}\}_{j\in \mathcal {C}}\) denotes the Lagrange multipliers and \(c>0\) is a preselected penalty coefficient. Then we apply the ADMM algorithm to solve this problem.
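A minimal sketch of the constrained formulation and its augmented Lagrangian, assuming a separable objective \(\sum _{j\in \mathcal {C}}\ell _{j}(R_{j})+f(\theta )\) (local logistic losses \(\ell _{j}\) plus the term \(f(\theta )\) referenced in the proof of Theorem 2) coupled by the linear constraint \(MR+M^{'}\theta =0\) implied by the multiplier update in Step 3, is
\[ \min _{R,\theta }\ \sum _{j\in \mathcal {C}}\ell _{j}(R_{j})+f(\theta )\quad \text {s.t.}\quad MR+M^{'}\theta =0, \]
\[ L(R,\theta ,V)=\sum _{j\in \mathcal {C}}\ell _{j}(R_{j})+f(\theta )+V^{\top }(MR+M^{'}\theta )+\frac{c}{2}\Vert MR+M^{'}\theta \Vert _{2}^{2}. \]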
The ADMM algorithm entails three steps per iteration.
Step 1: R update, \(R^{k+1}=\arg \min _{R}L(R,\theta ^{k},V^{k})\);
Step 2: \(\theta \) update, \(\theta ^{k+1}=\arg \min _{\theta }L(R^{k+1},\theta ,V^{k})\);
Step 3: V update, \(V^{k+1}=V^{k}+c(MR^{k+1}+M^{'}\theta ^{k+1})\).
That is, in iteration \(k+1\) we update \(R^{k+1}\) by minimizing \(L(R,\theta ^{k},V^{k})\) with respect to R, update \(\theta ^{k+1}\) by minimizing \(L(R^{k+1},\theta ,V^{k})\) with respect to \(\theta \), and update the Lagrange multiplier via \(V^{k+1}=V^{k}+c(MR^{k+1}+M^{'}\theta ^{k+1})\), until the algorithm converges.
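To make the iteration concrete, the following Python sketch implements consensus ADMM for logistic regression across several machines. It is illustrative rather than the implementation used in the paper: it assumes the simple global-consensus constraint \(R_{j}=\theta \) (a special case of \(MR+M^{'}\theta =0\)), and the function names and the use of SciPy's L-BFGS-B solver for the local \(R_{j}\)-update are illustrative choices.

# Illustrative sketch (not the authors' code) of consensus-ADMM logistic
# regression: machine j holds (X_j, y_j) with y in {0, 1}, keeps a local
# estimator R_j and multiplier V_j, and the average plays the role of theta.
import numpy as np
from scipy.optimize import minimize

def local_update(X, y, V, theta, c, R0):
    """R_j-update: local negative log-likelihood plus augmented Lagrangian terms."""
    def obj(R):
        z = X @ R
        loss = np.sum(np.logaddexp(0.0, z) - y * z)   # logistic negative log-likelihood
        return loss + V @ (R - theta) + 0.5 * c * np.sum((R - theta) ** 2)
    return minimize(obj, R0, method="L-BFGS-B").x

def admm_logistic(data, c=1.0, iters=100):
    """data: list of (X_j, y_j) blocks, one block per machine."""
    p = data[0][0].shape[1]
    R = [np.zeros(p) for _ in data]          # local estimators
    V = [np.zeros(p) for _ in data]          # Lagrange multipliers
    theta = np.zeros(p)                      # shared (consensus) parameter
    for _ in range(iters):
        # Step 1: each machine updates its local estimator in parallel.
        R = [local_update(X, y, V_j, theta, c, R_j)
             for (X, y), V_j, R_j in zip(data, V, R)]
        # Step 2: theta-update reduces to averaging under the consensus constraint.
        theta = np.mean([R_j + V_j / c for R_j, V_j in zip(R, V)], axis=0)
        # Step 3: dual (multiplier) update.
        V = [V_j + c * (R_j - theta) for V_j, R_j in zip(V, R)]
    return theta

Here data would be a list such as [(X1, y1), (X2, y2)], with each X_j an \(n_{j}\times p\) design matrix and y_j a 0/1 response vector held on machine j; only \(\theta \) and the local summaries need to be communicated in each iteration.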
1.2 Proof of Theorem 2
Proof
Theorem 2 can be considered as a special case of the result in [20], which analyzes the convergence of ADMM for minimizing possibly nonconvex objectives. It suffices to show that our distributed optimization problem satisfies the convergence conditions A1-A5 in [20]. Condition A1 obviously holds. Since both M and M\(^{'}\) have full rank, A2 and A5 are established. Because \(f(\theta )\) is Lipschitz differentiable with constant \(L_f\), A4 holds. For any \(j\in J\) and any \(R_j\), \(R_j^{'}\), (2.6) holds, so A3 is established. This proves Theorem 2.