Consistency of the Fittest: Towards Dynamic Staleness Control for Edge Data Analytics

  • Atakan Aral
  • Ivona Brandic
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)

Abstract

A critical challenge for data stream processing at the edge of the network is the consistency of the machine learning models held by distributed worker nodes. Especially for non-stationary streams, which exhibit a high degree of data set shift, mismanagement of models risks suboptimal accuracy due to stale models or discarded data. In this work, we analyze the model consistency challenges of a distributed online machine learning scenario and present preliminary solutions for synchronizing model updates. Additionally, we propose metrics for measuring the level and speed of data set shift.
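The staleness control the abstract refers to can be illustrated with a bounded-staleness coordination scheme in the spirit of stale synchronous parallel (SSP) barriers. The sketch below is not from the paper; the class name and interface are illustrative assumptions. Each worker keeps an iteration counter, and a worker may advance only while it is at most `bound` iterations ahead of the slowest worker, trading some model freshness for reduced synchronization cost.

```python
# Illustrative sketch (not the paper's method): a bounded-staleness barrier.
# Worker i may advance to its next iteration only if it is no more than
# `bound` iterations ahead of the slowest worker.

class StalenessBarrier:
    def __init__(self, num_workers, bound):
        self.clock = [0] * num_workers  # per-worker iteration counter
        self.bound = bound              # maximum allowed staleness

    def can_advance(self, worker):
        # True while this worker is at most `bound` iterations ahead
        # of the slowest worker.
        return self.clock[worker] - min(self.clock) <= self.bound

    def tick(self, worker):
        # Advance the worker's clock if the staleness bound permits;
        # otherwise the caller must wait for stragglers to catch up.
        if not self.can_advance(worker):
            return False
        self.clock[worker] += 1
        return True
```

With `bound=0` this degenerates to fully synchronous training (every worker sees an up-to-date model), while a large bound approaches fully asynchronous updates; dynamic staleness control, as proposed here, would adjust that bound at runtime in response to the measured data set shift.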

Keywords

Edge computing · Data analytics · Consistency · Staleness

Acknowledgements

The work described in this paper has been funded through the Haley project (Holistic Energy Efficient Hybrid Clouds) as part of the TU Vienna Distinguished Young Scientist Award 2011 and Rucon project (Runtime Control in Multi Clouds), FWF Y 904 START-Programm 2015.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Information Systems Engineering, Vienna University of Technology, Vienna, Austria
