Accelerating LBM and LQCD Application Kernels by In-Memory Processing

  • Conference paper
High Performance Computing (ISC High Performance 2015)

Abstract

Processing-in-memory architectures promise increased computing performance at decreased costs in energy, as the physical proximity of the compute pipelines to the data store eliminates overheads for data transport. We assess the overall performance impact using a recently introduced architecture of that type, called the Active Memory Cube, for two representative scientific applications. Precise performance results for performance-critical kernels are obtained using cycle-accurate simulations. We provide an overall performance estimate using performance models.

References

  1. Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan, C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S., Kelly, S.M., Le, H., Leung, V.J., Resnick, D.R., Rodrigues, A.F., Shalf, J., Stark, D., Unat, D., Wright, N.J.: Abstract machine models and proxy architectures for exascale computing. In: Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing (Co-HPC 2014), pp. 25–32. IEEE Press, Piscataway (2014). http://dx.doi.org/10.1109/Co-HPC.2014.4

  2. Balasubramonian, R., Chang, J., Manning, T., Moreno, J.H., Murphy, R., Nair, R., Swanson, S.: Near-data processing: insights from a MICRO-46 workshop. IEEE Micro 34(4), 36–42 (2014)

  3. Biferale, L., Mantovani, F., Pivanti, M., Sbragaglia, A., Schifano, S., Toschi, F., Tripiccione, R.: Lattice Boltzmann fluid-dynamics on the QPACE supercomputer. Procedia Comput. Sci. 1(1), 1075–1082 (2010). http://www.sciencedirect.com/science/article/pii/S1877050910001201, ICCS 2010

  4. Biferale, L., Mantovani, F., Pivanti, M., Pozzati, F., Sbragaglia, M., Scagliarini, A., Schifano, S.F., Toschi, F., Tripiccione, R.: Optimization of multi-phase compressible lattice Boltzmann codes on massively parallel multi-core systems. Procedia Comput. Sci. 4, 994–1003 (2011). http://www.sciencedirect.com/science/article/pii/S1877050911001633, Proceedings of the International Conference on Computational Science, ICCS 2011

  5. Boyle, P.A., Christ, N.H., Kim, C.: Co-design of the IBM BlueGene/Q level 1 prefetch engine with QCD. IBM J. Res. Dev. 57(1/2), 13:1–13:10 (2013)

  6. Calore, E., Schifano, S.F., Tripiccione, R.: A portable OpenCL lattice Boltzmann code for multi- and many-core processor architectures. Procedia Comput. Sci. 29, 40–49 (2014). http://www.sciencedirect.com/science/article/pii/S1877050914001811, 2014 International Conference on Computational Science

  7. Elliott, D., Snelgrove, W., Stumm, M.: Computational RAM: a memory-SIMD hybrid and its application to DSP. In: Proceedings of the IEEE 1992 Custom Integrated Circuits Conference, pp. 30.6.1–30.6.4, May 1992

  8. Frommer, A., Kahl, K., Krieg, S., Leder, B., Rottmann, M.: Adaptive aggregation based domain decomposition multigrid for the lattice Wilson Dirac operator. SIAM J. Sci. Comput. 36, A1581–A1608 (2014)

  9. Hall, M., Kogge, P., Koller, J., Diniz, P., Chame, J., Draper, J., LaCoss, J., Granacki, J., Brockman, J., Srivastava, A., Athas, W., Freeh, V., Shin, J., Park, J.: Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In: ACM/IEEE 1999 Conference on Supercomputing, pp. 57–57, November 1999

  10. Heybrock, S., Joó, B., Kalamkar, D.D., Smelyanskiy, M., Vaidyanathan, K., Wettig, T., Dubey, P.: Lattice QCD with domain decomposition on Intel Xeon Phi co-processors. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2014), pp. 69–80. IEEE Press, Piscataway (2014). http://dx.doi.org/10.1109/SC.2014.11

  11. Hybrid Memory Cube Consortium: Hybrid Memory Cube Specification (2013)

  12. Kang, Y., Huang, W., Yoo, S.M., Keen, D., Ge, Z., Lam, V., Pattnaik, P., Torrellas, J.: FlexRAM: toward an advanced intelligent memory system. In: International Conference on Computer Design (ICCD 1999), pp. 192–201 (1999)

  13. Koutsou, G., Krieg, S., Pleiter, D., Simma, H.: EIC co-design questionnaire: lattice QCD (unpublished, 2013)

  14. Nair, R., Antao, S.F., Bertolli, C., Bose, P., Brunheroto, J.R., Chen, T., Cher, C.-Y., Costa, C.H.A., Evangelinos, C., Fleischer, B.M., Fox, T.W., Gallo, D.S., Grinberg, L., Gunnels, J.A., Jacob, A.C., Jacob, P., Jacobson, H.M., Karkhanis, T., Kim, C., Moreno, J.H., O’Brien, J.K., Ohmacht, M., Park, Y., Prener, D.A., Rosenburg, B.S., Ryu, K.D., Sallenave, O., Serrano, M.J., Siegl, P.D.M., Sugavanam, K., Sura, Z.: Active memory cube: a processing-in-memory architecture for exascale systems. IBM J. Res. Dev. 59(2/3), 17:1–17:14 (2015)

  15. Nguyen, A., Satish, N., Chhugani, J., Kim, C., Dubey, P.: 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2010), pp. 1–13, November 2010

  16. Patterson, D., Anderson, T., Cardwell, N., Fromm, R., Keeton, K., Kozyrakis, C., Thomas, R., Yelick, K.: A case for intelligent RAM. IEEE Micro 17(2), 34–44 (1997)

  17. Scagliarini, A., Biferale, L., Sbragaglia, M., Sugiyama, K., Toschi, F.: Lattice Boltzmann methods for thermal flows: continuum limit and applications to compressible Rayleigh-Taylor systems. Phys. Fluids 22(5), 055101 (2010)

  18. Schifano, S.F., Tripiccione, R.: EIC co-design questionnaire: LBM (unpublished, 2013)

  19. Torrellas, J.: FlexRAM: toward an advanced intelligent memory system: a retrospective paper. In: IEEE 30th International Conference on Computer Design (ICCD 2012), pp. 3–4, September 2012

  20. Williams, S., Oliker, L., Carter, J., Shalf, J.: Extracting ultra-scale lattice Boltzmann performance via hierarchical and distributed auto-tuning. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011), pp. 55:1–55:12. ACM, New York (2011). http://doi.acm.org/10.1145/2063384.2063458

  21. Winter, F., Clark, M., Edwards, R., Joo, B.: A framework for lattice QCD calculations on GPUs. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 1073–1082, May 2014

Acknowledgements

We thank the AMC team at IBM Research, in particular J. Moreno, for sharing their knowledge on the AMC and continued help on this project including many fruitful discussions. Furthermore, we gratefully acknowledge F.S. Schifano and R. Tripiccione (INFN/University of Ferrara) for making a mini-application version of their D2Q37 code available and for discussing their future roadmaps [18]. We also thank G. Koutsou, S. Krieg, and H. Simma from the Simulation Lab LQCD at Cyprus Institute/DESY/JSC for discussing the future requirements of LQCD [13]. Finally, we thank A. Frommer and S. Krieg for making their implementation of their AMG solver [8] available.

Author information

Corresponding author

Correspondence to Thorsten Hater.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Baumeister, P.F. et al. (2015). Accelerating LBM and LQCD Application Kernels by In-Memory Processing. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_8

  • DOI: https://doi.org/10.1007/978-3-319-20119-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20118-4

  • Online ISBN: 978-3-319-20119-1

  • eBook Packages: Computer Science, Computer Science (R0)