Abstract
In this paper we consider the efficiency of hybrid systems-on-a-chip for high-performance computing. First, we build Roofline performance models for the systems under study using the Empirical Roofline Toolkit and compare the results with theoretical estimates. Second, we use LAMMPS as an example of a molecular dynamics (MD) package and evaluate its performance and efficiency in various configurations on the Nvidia Tegra K1 and X1. Following the Roofline approach, we attempt to distinguish compute-bound and memory-bound conditions for the MD algorithm using the Lennard-Jones liquid model. The results are discussed in the context of LAMMPS performance on Intel Xeon CPUs and the Nvidia Tesla K80 GPU.
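As a rough illustration of the Roofline idea mentioned in the abstract, the sketch below computes the attainable performance as the minimum of the peak compute rate and the product of memory bandwidth and arithmetic intensity, then labels a kernel as compute-bound or memory-bound. The function names and all numbers are hypothetical placeholders chosen for illustration; they are not measurements of the Tegra K1, Tegra X1, or any other system discussed in the paper, and this is not the authors' code.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity).

    intensity is in FLOP per byte, bandwidth in GB/s, peak in GFLOP/s.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the two roofs meet (FLOP/byte)."""
    return peak_gflops / bandwidth_gbs


def classify(peak_gflops, bandwidth_gbs, intensity):
    """Label a kernel by which roof limits it at the given intensity."""
    if intensity >= ridge_point(peak_gflops, bandwidth_gbs):
        return "compute-bound"
    return "memory-bound"


if __name__ == "__main__":
    # Hypothetical machine parameters (assumptions, not measured values).
    PEAK_GFLOPS = 100.0
    BANDWIDTH_GBS = 20.0
    for ai in (0.5, 5.0, 50.0):
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        label = classify(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        print(f"AI={ai:5.1f} FLOP/byte -> {bound:6.1f} GFLOP/s ({label})")
```

In practice the peak and bandwidth values would come from a measurement tool such as the Empirical Roofline Toolkit rather than from datasheet figures, which is the comparison the abstract describes.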
Acknowledgments
HSE and MIPT provided funds for purchasing the hardware used in this study. The authors are grateful to the Forsite company for access to the server with the Nvidia Tesla K80. The authors acknowledge the Joint Supercomputer Centre of RAS for access to the MVS-100K and MVS-10P supercomputers. The work was supported by grant No. 14-50-00124 of the Russian Science Foundation.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Nikolskii, V., Vecher, V., Stegailov, V. (2016). Performance of MD-Algorithms on Hybrid Systems-on-Chip Nvidia Tegra K1 & X1. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2016. Communications in Computer and Information Science, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-55669-7_16
DOI: https://doi.org/10.1007/978-3-319-55669-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55668-0
Online ISBN: 978-3-319-55669-7
eBook Packages: Computer Science, Computer Science (R0)