Machine Learning, Volume 107, Issue 4, pp 727–747

Distributed multi-task classification: a decentralized online learning approach

  • Chi Zhang
  • Peilin Zhao
  • Shuji Hao
  • Yeng Chai Soh
  • Bu Sung Lee
  • Chunyan Miao
  • Steven C. H. Hoi


Although dispersing a single task across distributed learning nodes has been intensively studied in previous research, multi-task learning on distributed networks remains underexplored, especially in decentralized settings. The challenge lies in the fact that different tasks may have different optimal learning weights, while communication through the distributed network forces all tasks to converge to a unique classifier. In this paper, we present a novel algorithm that overcomes this challenge and enables learning multiple tasks simultaneously on a decentralized distributed network. Specifically, the learning framework can be separated into two phases: (i) multi-task information is shared within each node in the first phase; (ii) communication between nodes then leads the whole network to converge to a common minimizer. Theoretical analysis shows that our algorithm achieves an \(\mathcal {O}(\sqrt{T})\) regret bound compared with the best classifier in hindsight, which is further validated by experiments on both synthetic and real-world datasets.
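The two-phase round described above can be illustrated with a minimal sketch. This is not the authors' exact method; it assumes hinge-loss online updates, a simple mean-regularized multi-task coupling within each node, and a doubly stochastic mixing matrix for the inter-node consensus step — all function and parameter names (`decentralized_multitask_round`, `eta`, `lam`, `A`) are hypothetical.

```python
import numpy as np

def decentralized_multitask_round(W, X, y, A, eta=0.1, lam=0.5):
    """One illustrative round of a two-phase decentralized multi-task update.

    W : (n_nodes, n_tasks, d) per-node, per-task weight vectors
    X : (n_nodes, n_tasks, d) one instance per (node, task) this round
    y : (n_nodes, n_tasks)    labels in {-1, +1}
    A : (n_nodes, n_nodes)    doubly stochastic mixing matrix of the network
    """
    n_nodes, n_tasks, d = W.shape
    W_new = W.copy()

    # Phase (i): within each node, take an online (sub)gradient step on the
    # hinge loss, pulling each task's weights toward the node's task mean so
    # that multi-task information is shared locally.
    for i in range(n_nodes):
        mean_w = W[i].mean(axis=0)
        for k in range(n_tasks):
            margin = y[i, k] * (W[i, k] @ X[i, k])
            grad = -y[i, k] * X[i, k] if margin < 1 else np.zeros(d)
            grad = grad + lam * (W[i, k] - mean_w)  # multi-task coupling term
            W_new[i, k] = W[i, k] - eta * grad

    # Phase (ii): consensus step -- each node averages its weights with its
    # neighbours' according to the mixing matrix, driving the whole network
    # toward a common minimizer.
    return np.einsum('ij,jkd->ikd', A, W_new)
```

With a fully connected network (uniform mixing matrix), a single consensus step already makes all nodes identical; sparser topologies converge to agreement only gradually, which is where the regret analysis of such schemes comes in.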


Keywords: Decentralized distributed learning, Multi-task learning, Online learning



This research is supported by the National Research Foundation, Prime Minister's Office, Singapore, under its IDM Futures Funding Initiative. We also received support from the "Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY)" and the "Interdisciplinary Graduate School (IGS)".

Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Nanyang Technological University, Singapore
  2. School of Software Engineering, South China University of Technology, Guangzhou, China
  3. Institute of High Performance Computing, A*STAR, Singapore
  4. Singapore Management University, Singapore