Improving Scalability of Cloud Monitoring Through PCA-Based Clustering of Virtual Machines

  • Original Paper
Journal of Computer Science and Technology

Abstract

Cloud computing has recently emerged as a leading paradigm that allows customers to run their applications in virtualized large-scale data centers. Existing solutions for monitoring and managing these infrastructures treat virtual machines (VMs) as independent entities with their own characteristics. However, these approaches suffer from scalability issues due to the increasing number of VMs in modern cloud data centers. We claim that these scalability issues can be addressed by leveraging the similarity in VM behavior in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster VMs based on their usage of multiple resources, assuming no knowledge of the services executed on them. The innovative contribution of the proposed methodology is the use of the statistical technique known as principal component analysis (PCA) to automatically select the most relevant information for clustering similar VMs. We apply the methodology to two case studies: a virtualized testbed and a real enterprise data center. In both case studies, the automatic data selection based on PCA achieves high clustering accuracy, with the percentage of correctly clustered VMs ranging from 80% to 100% even for short time series (1 day) of monitored data. Furthermore, we estimate the potential reduction in the amount of collected data to demonstrate how our proposal may address the scalability issues of monitoring and management in cloud computing data centers.
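The abstract describes the pipeline only at a high level: describe each VM by its resource usage, let PCA select the most informative directions, then group similar VMs. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' exact procedure. Assumed for illustration: each VM is summarized by a hypothetical fixed-length feature vector (for example, per-metric averages); PCA runs on standardized features via power iteration; components are kept until a hypothetical 90% explained-variance `threshold` is reached; k-means (a swapped-in choice) groups the projected VMs; and `purity` is one common way to score "correctly clustered VMs", which may differ from the paper's metric. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def standardize(rows):
    """Z-score each feature column so metrics on different scales weigh equally."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column has zero spread; divide by 1.0 instead to avoid 0/0.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance of centred rows (the correlation matrix after standardize)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def top_components(cov, threshold=0.9, iters=500):
    """Power iteration with deflation until `threshold` of the trace is explained."""
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = sum of all eigenvalues
    a = [row[:] for row in cov]
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    comps, explained = [], 0.0
    while explained < threshold * total and len(comps) < d:
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the converged unit vector = its eigenvalue.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        if lam <= 1e-12:
            break  # remaining variance is negligible
        comps.append(v)
        explained += lam
        # Deflate so the next power iteration converges to the next eigenvector.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(rows, comps):
    """Coordinates of each row in the reduced PCA space."""
    return [[sum(x * c for x, c in zip(r, comp)) for comp in comps] for r in rows]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist2(p, centres[c])) for p in points]
        if new == labels:
            break  # assignments are stable
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, truth):
    """Fraction of VMs that belong to their cluster's majority class."""
    groups = {}
    for lab, t in zip(labels, truth):
        groups.setdefault(lab, []).append(t)
    return sum(Counter(ts).most_common(1)[0][1] for ts in groups.values()) / len(labels)


def cluster_vms(features, k, threshold=0.9):
    """Map VM name -> cluster id from hypothetical per-VM feature vectors."""
    names = sorted(features)
    rows = standardize([features[name] for name in names])
    comps = top_components(covariance(rows), threshold)
    return dict(zip(names, kmeans(project(rows, comps), k)))


if __name__ == "__main__":
    # Hypothetical features: [mean CPU %, mean network KB/s, mean disk IOPS].
    vms = {"web1": [60.0, 900.0, 10.0], "web2": [62.0, 950.0, 12.0],
           "web3": [58.0, 880.0, 11.0], "db1": [30.0, 100.0, 400.0],
           "db2": [32.0, 110.0, 420.0], "db3": [29.0, 95.0, 390.0]}
    print(cluster_vms(vms, k=2))
```

With this synthetic data, the three `web*` VMs share one cluster id and the three `db*` VMs share the other. Power iteration is used only to keep the sketch dependency-free; with NumPy, `numpy.linalg.eigh` on the correlation matrix would be the more usual way to obtain the components, and scikit-learn's `PCA` accepts a fraction between 0 and 1 as `n_components` to keep components up to an explained-variance target.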

Author information

Corresponding author

Correspondence to Claudia Canali.

Electronic supplementary material

Supplementary file (DOC, 28 kb).

About this article

Cite this article

Canali, C., Lancellotti, R. Improving Scalability of Cloud Monitoring Through PCA-Based Clustering of Virtual Machines. J. Comput. Sci. Technol. 29, 38–52 (2014). https://doi.org/10.1007/s11390-013-1410-9


  • DOI: https://doi.org/10.1007/s11390-013-1410-9
