Taxonomist: Application Detection Through Rich Monitoring Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11014)

Abstract

Modern supercomputers are shared among thousands of users running a variety of applications. Knowing which applications are running in the system can bring substantial benefits: knowledge of applications that intensively use shared resources can aid scheduling; unwanted applications such as cryptocurrency mining or password cracking can be blocked; system architects can make design decisions based on system usage. However, identifying applications on supercomputers is challenging because applications are executed using esoteric scripts along with binaries that are compiled and named by users.

This paper introduces a novel technique to identify applications running on supercomputers. Our technique, Taxonomist, is based on the empirical evidence that applications have different and characteristic resource utilization patterns. Taxonomist uses machine learning to classify known applications and also to detect unknown applications. We test our technique with a variety of benchmarks and cryptocurrency miners, and also with applications that users of a production supercomputer ran during a six-month period. We show that our technique achieves nearly perfect classification for this challenging data set.
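The idea of classifying known applications while flagging unknown ones can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the learning algorithm or the features, so the sketch assumes a nearest-centroid classifier over simple per-metric statistics, with a hypothetical distance threshold for rejecting unfamiliar runs. The metric names (`cpu`, `mem`), the labels, the sample values, and the threshold are all invented for illustration.

```python
import math
from statistics import mean

def summarize(series):
    """Collapse one metric's time series into simple statistical features."""
    return [min(series), max(series), mean(series)]

def features(run):
    """Concatenate per-metric summaries for one run (dict: metric -> samples)."""
    vec = []
    for name in sorted(run):  # sorted so every run yields the same feature order
        vec.extend(summarize(run[name]))
    return vec

def fit_centroids(labeled_runs):
    """Average the feature vectors of each known application."""
    grouped = {}
    for label, run in labeled_runs:
        grouped.setdefault(label, []).append(features(run))
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def classify(run, centroids, threshold):
    """Return the closest known label, or 'unknown' if no centroid is near."""
    vec = features(run)
    label, distance = min(
        ((name, math.dist(vec, centroid)) for name, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= threshold else "unknown"

# Hypothetical monitoring data: utilization samples per metric, per run.
training = [
    ("solver", {"cpu": [0.90, 0.95, 0.92], "mem": [0.20, 0.25, 0.22]}),
    ("solver", {"cpu": [0.88, 0.93, 0.90], "mem": [0.21, 0.24, 0.20]}),
    ("io_bench", {"cpu": [0.10, 0.15, 0.12], "mem": [0.70, 0.75, 0.72]}),
    ("io_bench", {"cpu": [0.12, 0.14, 0.10], "mem": [0.68, 0.74, 0.70]}),
]
centroids = fit_centroids(training)

familiar = {"cpu": [0.90, 0.94, 0.91], "mem": [0.20, 0.24, 0.21]}
unfamiliar = {"cpu": [0.50, 0.55, 0.50], "mem": [0.50, 0.50, 0.45]}
print(classify(familiar, centroids, threshold=0.2))
print(classify(unfamiliar, centroids, threshold=0.2))
```

The threshold is the design choice that matters here: a run whose resource-usage features are far from every known application is reported as unknown rather than forced into the nearest class. In practice the threshold would have to be tuned on held-out data.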

Keywords

Supercomputing, HPC, Application detection, Monitoring, Security, Cryptocurrency

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Boston University, Boston, USA
  2. Sandia National Laboratories, Albuquerque, USA
