A Survey on Global Management View: Toward Combining System Monitoring, Resource Management, and Load Prediction

  • Rodrigo da Rosa Righi (Email author)
  • Matheus Lehmann
  • Marcio Miguel Gomes
  • Jeferson Campos Nobre
  • Cristiano André da Costa
  • Sandro José Rigo
  • Marcio Lena
  • Rodrigo Fraga Mohr
  • Luiz Ricardo Bertoldi de Oliveira
Article

Abstract

Today, enterprise applications impose ever greater resource requirements to support a growing number of clients and to deliver an acceptable Quality of Service (QoS) to them. To ensure such requirements are met, it is essential to apply appropriate resource and application monitoring techniques. Such techniques collect data that enable predictions and actions which can improve system performance. Typically, system administrators must consult several different data sources and work out the relationships among them on their own. To address this gap, and considering the context of general network-based systems, we present a survey that combines a discussion of system monitoring, data prediction, and resource management procedures in a unified view. The article discusses resource and application monitoring, resource management, and data forecasting from both the performance and architectural perspectives of enterprise systems. Our idea is to describe consolidated subjects such as monitoring metrics and resource scheduling, together with novel trends, including cloud elasticity and artificial intelligence-based load prediction algorithms. This survey links these three pillars, emphasizing the relationships among them and pointing out opportunities and research challenges in the area.

Keywords

Performance; Monitoring metrics; Computing infrastructure; Resource monitoring; Resource management; Load prediction; Time series

Notes

Acknowledgements

This article was partially supported by the following Brazilian agencies: CAPES, CNPq and FAPERGS. In addition, we would like to thank DELL for also supporting this research.

Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  • Rodrigo da Rosa Righi (1, Email author)
  • Matheus Lehmann (1)
  • Marcio Miguel Gomes (1)
  • Jeferson Campos Nobre (1)
  • Cristiano André da Costa (1)
  • Sandro José Rigo (1)
  • Marcio Lena (2)
  • Rodrigo Fraga Mohr (2)
  • Luiz Ricardo Bertoldi de Oliveira (1)

  1. Software Innovation Laboratory (SoftwareLab), Applied Computing Graduate Program, University of Vale do Rio dos Sinos (UNISINOS), São Leopoldo, Brazil
  2. DELL, Eldorado do Sul, Brazil
