Shallow Water Waves on a Deep Technology Stack: Accelerating a Finite Volume Tsunami Model Using Reconfigurable Hardware in Invasive Computing

  • Alexander Pöppl
  • Marvin Damschen
  • Florian Schmaus
  • Andreas Fried
  • Manuel Mohr
  • Matthias Blankertz
  • Lars Bauer
  • Jörg Henkel
  • Wolfgang Schröder-Preikschat
  • Michael Bader
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)

Abstract

Reconfigurable architectures are commonly used in the embedded systems domain to speed up compute-intensive tasks. They combine a reconfigurable fabric with a general-purpose microprocessor: the fabric accelerates the compute-intensive kernels, while the CPU handles the rest of the workload. Using invasive computing, we aim to show that this technology is also feasible for HPC scenarios. We demonstrate this by accelerating a proxy application for the simulation of shallow water waves on the i-Core, a reconfigurable processor that is part of the invasive computing multiprocessor system-on-chip. A floating-point custom instruction lets hardware accelerators perform the entire computation of the numerical fluxes in the application’s finite volume scheme.

Keywords

Invasive computing; High Performance Computing; Tsunami simulation; Reconfigurable processor; Resource-aware computing

Notes

Acknowledgments

This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Alexander Pöppl (1)
  • Marvin Damschen (2)
  • Florian Schmaus (3)
  • Andreas Fried (2)
  • Manuel Mohr (2)
  • Matthias Blankertz (2)
  • Lars Bauer (2)
  • Jörg Henkel (2)
  • Wolfgang Schröder-Preikschat (3)
  • Michael Bader (1)
  1. Department of Informatics, Technical University of Munich, Garching bei München, Germany
  2. Department of Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
  3. Department of Computer Science 4, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany