
Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining

  • Jieun Choi
  • Geunchul Park
  • Dukyun Nam

Abstract

Computational scientists and engineers who want the best performance from scientific applications need efficient application characterization methods to exploit high-performance hardware resources. However, modern processors increasingly integrate high-bandwidth on-chip memory and large numbers of cores, and characterization research that accounts for these newly introduced hardware features in next-generation high-performance computing environments remains insufficient and complex. In this paper, we propose a simple and fast method to classify application characteristics on systems with state-of-the-art processors. The proposed method uses hardware performance counters to monitor hardware events related to system performance, and adopts a clustering approach that requires only limited understanding of the correlation between hardware events and application characteristics. The characterization technique is applied to the NAS Parallel Benchmarks on two systems based on Intel Knights Landing and Skylake Xeon processors. We demonstrate that the proposed technique can capture system and application characteristics and provide users with useful insights into application execution.
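
The clustering idea summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of k-means, the three features (instructions per cycle, last-level cache misses per kilo-instruction, memory bandwidth), the application names, and all numeric values are hypothetical. The sketch only shows how per-application counter-derived metrics might be normalized and grouped without a hand-built model of what each event means.

```python
from math import dist

# Hypothetical counter-derived features per application (illustrative values only):
# (instructions per cycle, LLC misses per kilo-instruction, memory bandwidth in GB/s)
samples = {
    "app_a": (1.9, 0.4, 3.0),
    "app_b": (2.1, 0.6, 4.5),
    "app_c": (0.5, 18.0, 62.0),
    "app_d": (0.6, 22.0, 70.0),
    "app_e": (1.8, 0.5, 2.5),
    "app_f": (0.4, 20.0, 66.0),
}

def normalize(rows):
    """Min-max scale each feature column to [0, 1] so no single counter dominates."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

names = list(samples)
labels = kmeans(normalize(list(samples.values())), k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

With these made-up inputs, the applications with high cache-miss rates and bandwidth fall into one cluster and the compute-heavy ones into the other. A co-scheduler could then prefer pairing applications from different clusters, although the actual scheduling policy is not described in this section.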

Keywords

Application characteristics classification, Performance counter event, Data mining, Hardware performance counter, Resource interference, Interference-aware co-scheduling

Notes

Acknowledgements

This work was partly supported by the Institute for Information & communications Technology Promotion (IITP) Grant funded by the Korea government (MSIT) (No. R0190-18-2012) and the Korea Institute of Science and Technology Information (KISTI) Grant (No. K-19-L02-C06-S01).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. National Institute of Supercomputing and Networking, KISTI, Daejeon, Republic of Korea
