CloudPT: Performance Testing for Identifying and Detecting Bottlenecks in IaaS

  • Ameen Alkasem
  • Hongwei Liu
  • Decheng Zuo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11336)


This work addresses performance testing for monitoring large volumes of measurement data in Infrastructure-as-a-Service (IaaS). In shared dynamic clouds, physical resources are not virtualized; thus, the workloads sharing them compete for access to system resources. This competition introduces significant new challenges when assessing the performance of IaaS. A bottleneck may occur when a single system resource is critical to IaaS; it can shut down the system and its services, which greatly reduces workflow performance. To protect against bottlenecks, we propose CloudPT, a performance test management framework for IaaS. CloudPT offers three advantages: (I) high-efficiency detection; (II) a unified end-to-end feedback loop that collaborates with cloud-ecosystem management; and (III) a troubleshooting performance test. This paper shows that CloudPT efficiently identifies and detects bottlenecks with a low false-positive rate (<13%). It also achieves high accuracy in a scenario where a host virtual machine (host VM) fails to start up, under both illustrative cloud batch workloads and transactional workloads, such as the Spark and Kafka frameworks used for data partitioning and for collecting events on each server. In a trace-based case study, CloudPT diagnosed performance bottlenecks in 20 s with a precision of 86%, confirming its real-time efficiency.
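The two figures quoted above, the false-positive rate and the precision, are standard confusion-matrix quantities. The following is a minimal sketch of how they are computed, not the evaluation code of CloudPT: the class name `ConfusionCounts`, the helper functions, and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a bottleneck detector (hypothetical structure)."""
    true_positives: int   # real bottlenecks that were flagged
    false_positives: int  # healthy intervals that were wrongly flagged
    true_negatives: int   # healthy intervals that were correctly not flagged
    false_negatives: int  # real bottlenecks that were missed


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged intervals that were real bottlenecks."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of healthy intervals that were wrongly flagged."""
    healthy = c.false_positives + c.true_negatives
    return c.false_positives / healthy if healthy else 0.0


# Illustrative counts only; they are NOT taken from the paper.
counts = ConfusionCounts(true_positives=8, false_positives=2,
                         true_negatives=18, false_negatives=2)
print(f"precision           = {precision(counts):.2f}")
print(f"false-positive rate = {false_positive_rate(counts):.2f}")
```

Note that the two metrics use different denominators: precision is taken over everything the detector flagged, whereas the false-positive rate is taken over everything that was actually healthy, so a detector can score well on one and poorly on the other.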


Keywords: IaaS, Bottlenecks, Performance testing, VMs, Apache Spark



We also thank the anonymous reviewers for their valuable feedback and comments, which improved the quality of the manuscript.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
