Abstract
We present the first protocol for distributed online prediction that aims to minimize online prediction loss and network communication at the same time. This protocol can be applied wherever a prediction-based service must be provided in a timely manner for each data point of a multitude of high-frequency data streams, each of which is observed at a local node of some distributed system. Example applications include social content recommendation and algorithmic trading. The challenge is to balance the joint predictive performance of the nodes by exchanging information between them, while not letting the communication overhead deteriorate the responsiveness of the service. Technically, the proposed protocol is based on controlling the variance of the local models in a decentralized way. This approach retains the asymptotically optimal regret of previous algorithms. At the same time, it allows network communication to be reduced substantially and, in contrast to previous approaches, it remains applicable when the data is non-stationary and exhibits rapid concept drift. We demonstrate empirically that the protocol maintains high predictive performance using only a fraction of the communication required by benchmark methods.
A preliminary extended abstract of this paper was presented at the BD3 workshop at VLDB’13. This research has been supported by the EU FP7-ICT-2013-11 under grant 619491 (FERARI).
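To make the synchronization idea described in the abstract concrete, the following is a minimal sketch of a divergence-triggered model synchronization scheme. It is not the authors' protocol as published: the hinge-loss SGD learner, the squared-distance drift test, and all parameters (`k`, `dim`, `lr`, `delta`) are illustrative assumptions, and the drift check is simulated centrally here, whereas a real deployment would evaluate it locally at each node.

```python
import numpy as np

# Minimal sketch of divergence-triggered model synchronization for distributed
# online prediction. This is NOT the protocol from the paper: the hinge-loss
# SGD learner, the squared-distance drift test, and every parameter below
# (k, dim, lr, delta) are illustrative assumptions.

k, dim = 4, 10           # number of local nodes, feature dimension
lr, delta = 0.1, 0.5     # SGD learning rate, drift threshold

rng = np.random.default_rng(0)
models = np.zeros((k, dim))      # one local linear model per node
reference = np.zeros(dim)        # last synchronized (global) model
messages = 0                     # crude count of transmitted model vectors


def local_update(w, x, y):
    """One online SGD step on the hinge loss of a linear classifier."""
    if y * np.dot(w, x) < 1.0:           # margin violated -> gradient step
        w = w + lr * y * x
    return w


def maybe_synchronize():
    """Average all local models if any node drifted too far from the reference.

    In a real deployment each node would test its own drift locally and only
    contact the coordinator on a violation; here the test is simulated centrally.
    """
    global reference, messages
    drift = np.sum((models - reference) ** 2, axis=1)
    if np.any(drift > delta):
        messages += 2 * k                # every node sends, coordinator broadcasts
        reference = models.mean(axis=0)
        models[:] = reference


# Toy streams: in every round each node observes one labeled example drawn
# from a common (hidden) linear concept.
for t in range(1000):
    for i in range(k):
        x = rng.normal(size=dim)
        y = 1.0 if x[0] + 0.5 * x[1] > 0 else -1.0
        models[i] = local_update(models[i], x, y)
    maybe_synchronize()

print(f"model transmissions: {messages} "
      f"(continuous synchronization would need {2 * k * 1000})")
```

The point of the threshold `delta` is that quiet periods, in which all local models stay close to the last synchronized model, cost no communication at all, while a burst of drift triggers a synchronization round.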
Keywords
- Batch Size
- Neural Information Processing System
- Stochastic Gradient Descent
- Machine Learning Research
- Hinge Loss
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamp, M., Boley, M., Keren, D., Schuster, A., Sharfman, I. (2014). Communication-Efficient Distributed Online Prediction by Dynamic Model Synchronization. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_40
DOI: https://doi.org/10.1007/978-3-662-44848-9_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9
eBook Packages: Computer Science, Computer Science (R0)