
Distributed Logistic Regression for Separated Massive Data

  • Conference paper

Big Data (BigData 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1120)

Abstract

In this paper, we study distributed logistic regression for processing separated large-scale data stored on different linked computers. Based on the Alternating Direction Method of Multipliers (ADMM), we transform the logistic regression problem into a multistep iterative process and propose a distributed logistic algorithm with controllable communication cost. Specifically, in each iteration of the distributed algorithm, every computer updates its local estimator and simultaneously exchanges it with its neighbors. We then prove the convergence of the distributed logistic algorithm. Owing to the decentralized structure of the computer network, the proposed algorithm is robust. The classification results of our distributed logistic method are the same as those of the non-distributed approach. Numerical studies show that our approach is both effective and efficient, performing well in distributed massive data analysis.


References

  1. McDonald, R., Mohri, M., Silberman, N., Walker, D., Mann, G.: Efficient large-scale distributed training of conditional maximum entropy models. In: Advances in Neural Information Processing Systems, vol. 1, pp. 1231–1239. NIPS, La Jolla (2009)

  2. McDonald, R., Hall, K., Mann, G.: Distributed training strategies for the structured perceptron. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 456–464. ACL, Los Angeles (2010)

  3. Zhang, Y., Duchi, J., Wainwright, M.: Communication-efficient algorithms for statistical optimization. J. Mach. Learn. Res. 14(1), 3321–3363 (2013)

  4. Zhang, Y., Duchi, J., Wainwright, M.: Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates. J. Mach. Learn. Res. 30(1), 592–617 (2013)

  5. Mateos, G., Bazerque, J., Giannakis, G.: Distributed sparse linear regression. IEEE Trans. Signal Process. 58(10), 5262–5276 (2010)

  6. Wang, P., Zhang, H., Liang, Y.: Model selection with distributed SCAD penalty. J. Appl. Stat. 45(11), 1938–1955 (2017)

  7. Wang, J., Kolar, M., Srebro, N., Zhang, T.: Efficient distributed learning with sparsity. In: International Conference on Machine Learning, vol. 70, pp. 3636–3645. PMLR, Sydney (2017)

  8. Menendez, M.L., Pardo, L., Pardo, M.C.: Preliminary \(\phi\)-divergence test estimators for linear restrictions in a logistic regression model. Stat. Pap. 50(2), 277–300 (2009)

  9. Pardo, J.A., Pardo, L., Pardo, M.C.: Minimum \(\phi\)-divergence estimator in logistic regression models. Stat. Pap. 47(1), 91–108 (2006)

  10. Revan, O.M.: Iterative algorithms of biased estimation methods in binary logistic regression. Stat. Pap. 57(4), 991–1016 (2016)

  11. Lange, T., Mosler, K., Mozharovskyi, P.: Fast nonparametric classification based on data depth. Stat. Pap. 55(1), 49–69 (2015)

  12. Boyd, S., Parikh, N., Chu, E.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  13. Xie, P., Jin, K., Xing, E.: Distributed machine learning via sufficient factor broadcasting. arXiv preprint, http://arxiv.org/abs/1409.5705. Accessed 7 Sep 2015

  14. Gopal, S., Yang, Y.: Distributed training of large-scale logistic models. In: Proceedings of the 30th International Conference on Machine Learning, vol. 28, pp. 289–297. PMLR, Atlanta (2013)

  15. Peng, H., Liang, D., Choi, C.: Evaluating parallel logistic regression models. In: IEEE International Conference on Big Data, pp. 119–126. IEEE, Silicon Valley (2013)

  16. Kang, D., Lim, W., Shin, K.: Data/feature distributed stochastic coordinate descent for logistic regression. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 1269–1278. ACM, Shanghai (2014)

  17. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  18. Glowinski, R., Marroco, A.: On the solution of a class of nonlinear Dirichlet problems by a penalty-duality method and finite elements of order one. In: Marchuk, G.I. (ed.) Optimization Techniques IFIP Technical Conference. LNCS, pp. 327–333. Springer, Berlin (1974). https://doi.org/10.1007/978-3-662-38527-2_45

  19. Bertsekas, D., Tsitsiklis, J.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena Scientific, Belmont (1997)

  20. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78(1), 29–63 (2019)

  21. Alon, U., et al.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. U.S.A. 96(12), 6745–6750 (1999)

  22. Blake, C., Merz, C.: UCI repository of machine learning databases (1998). http://www.ics.uci.edu/~mlearn/MLRepository.html


Author information


Corresponding author

Correspondence to Hai Zhang.


Appendix

1.1 ADMM algorithm

Consider the following optimization problem:

$$\begin{aligned} \underset{R, \theta }{\min }\ f(\theta )+h(R) \end{aligned}$$
(2.4)
$$s.t.\ MR+M^{'}\theta =0.$$

First, we form the quadratically augmented Lagrangian function

$$\begin{aligned} L(R,\theta ,V)=f(\theta )+h(R)+\langle V, MR+M^{'}\theta \rangle +\frac{c}{2}\Vert MR+M^{'}\theta \Vert _2^2, \end{aligned}$$
(2.5)

where \(V:=\{V_{j}\}_{j\in \mathcal {C}}\) denotes the Lagrange multipliers and \(c>0\) is a preselected penalty coefficient. We then use the ADMM algorithm to solve this problem.

The ADMM algorithm entails three steps per iteration:

Step 1: R updates,

$$\begin{aligned} R^{k+1}=\underset{R}{\arg \min }\ L(R,\theta ^{k},V^{k}). \end{aligned}$$
(A.1)

Step 2: \(\theta \) updates,

$$\begin{aligned} \theta ^{k+1}=\underset{\theta }{\arg \min }\ L(R^{k+1},\theta ,V^{k}). \end{aligned}$$
(A.2)

Step 3: V updates,

$$\begin{aligned} V^{k+1}= V^{k}+c(MR^{k+1}+M^{'}\theta ^{k+1}). \end{aligned}$$
(A.3)

In detail, at iteration \(k+1\), we update \(R^{k+1}\) by minimizing \(L(R,\theta ^{k},V^{k})\) with respect to R, update \(\theta ^{k+1}\) by minimizing \(L(R^{k+1},\theta ,V^{k})\) with respect to \(\theta \), and update the Lagrange multiplier via \(V^{k+1}=V^{k}+c(MR^{k+1}+M^{'}\theta ^{k+1})\); these steps are repeated until the algorithm converges.
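To make the three updates concrete, the following is a minimal illustrative sketch (rather than the authors' implementation) of ADMM for logistic regression with data separated across several computers, written in the standard global-consensus form of [12]: each \(R_j\) is computer j's local copy of the coefficients, \(\theta \) acts as the consensus variable, and \(V_j\) is the corresponding multiplier. The function names (local_update, admm_logistic), the \(\ell _2\) penalty on \(\theta \), and the use of scipy.optimize.minimize for the local subproblem are assumptions made purely for illustration.

# A minimal sketch of the R / theta / V updates (A.1)-(A.3) for logistic
# regression with data separated across computers, in the global-consensus
# form of ADMM [12].  Labels are assumed to lie in {0, 1}.
import numpy as np
from scipy.optimize import minimize


def local_logistic_loss(r, X, y):
    # Negative log-likelihood of logistic regression on one local data shard.
    z = X @ r
    return np.sum(np.logaddexp(0.0, z) - y * z)


def local_update(X, y, theta, v, c):
    # Step 1 (A.1): minimize local loss plus the augmented-Lagrangian terms.
    def obj(r):
        d = r - theta
        return local_logistic_loss(r, X, y) + v @ d + 0.5 * c * d @ d
    return minimize(obj, x0=theta.copy(), method="L-BFGS-B").x


def admm_logistic(shards, dim, c=1.0, lam=0.1, n_iter=50):
    # shards: list of (X_j, y_j) blocks, one per computer.
    J = len(shards)
    theta = np.zeros(dim)
    R = [np.zeros(dim) for _ in range(J)]
    V = [np.zeros(dim) for _ in range(J)]
    for _ in range(n_iter):
        # Step 1: each computer updates its local estimator (in parallel in practice).
        R = [local_update(X, y, theta, V[j], c) for j, (X, y) in enumerate(shards)]
        # Step 2 (A.2): with an l2 penalty (lam/2)||theta||^2 on the consensus
        # variable, the theta-update has a closed form (a weighted average).
        theta = (c * sum(R) + sum(V)) / (lam + c * J)
        # Step 3 (A.3): dual ascent on the multipliers.
        V = [V[j] + c * (R[j] - theta) for j in range(J)]
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    X = rng.normal(size=(600, 5))
    y = (X @ w_true + 0.1 * rng.normal(size=600) > 0).astype(float)
    shards = [(X[i::3], y[i::3]) for i in range(3)]   # three "computers"
    theta_hat = admm_logistic(shards, dim=5)
    print("training accuracy:", np.mean((X @ theta_hat > 0) == (y > 0.5)))

In a real deployment, Step 1 runs in parallel on the separate computers, and the averaging in Step 2 is carried out by exchanging local estimators with neighbors rather than through a central node, as described in the paper.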

1.2 Proof of Theorem 2

Proof

Theorem 2 can be regarded as a special case of the results in [20], which analyzes the convergence of ADMM for possibly nonconvex objective functions. It therefore suffices to show that our distributed optimization problem satisfies the convergence conditions A1–A5 in [20]. Condition A1 obviously holds. Since both M and \(M^{'}\) have full rank, A2 and A5 are satisfied. Because \(f(\theta )\) is Lipschitz differentiable with constant \(L_f\), A4 holds. Finally, for any \(j\in J\) and any \(R_j\), \(R_j^{'}\), (2.6) holds, so A3 is satisfied. This completes the proof of Theorem 2.
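As a side remark, the Lipschitz constant \(L_f\) invoked in A4 can be made explicit when \(f(\theta )\) is taken to be the logistic log-likelihood term, since the Hessian of the logistic loss is uniformly bounded:

$$\begin{aligned} \nabla ^2 f(\theta )=\sum _{i=1}^{n}\sigma (x_i^{\top }\theta )\bigl (1-\sigma (x_i^{\top }\theta )\bigr )x_ix_i^{\top }\preceq \frac{1}{4}X^{\top }X, \end{aligned}$$

where \(\sigma \) denotes the logistic function and X stacks the rows \(x_i^{\top }\); hence any \(L_f\ge \frac{1}{4}\lambda _{\max }(X^{\top }X)\) serves as a Lipschitz constant for \(\nabla f\).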


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Shi, P., Wang, P., Zhang, H. (2019). Distributed Logistic Regression for Separated Massive Data. In: Jin, H., Lin, X., Cheng, X., Shi, X., Xiao, N., Huang, Y. (eds) Big Data. BigData 2019. Communications in Computer and Information Science, vol 1120. Springer, Singapore. https://doi.org/10.1007/978-981-15-1899-7_20


  • DOI: https://doi.org/10.1007/978-981-15-1899-7_20


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1898-0

  • Online ISBN: 978-981-15-1899-7

  • eBook Packages: Computer Science, Computer Science (R0)
