Accelerating a Classic 3D Video Game on Heterogeneous Reconfigurable MPSoCs

  • Leonardo Suriano
  • David Lima
  • Eduardo de la Torre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12083)

Abstract

Heterogeneous Reconfigurable MPSoCs, which couple microprocessors with Programmable Logic, are becoming extremely important in the High-Performance Embedded Computing domain, where energy consumption is a key factor for every designer. However, efficient hardware/software co-design still requires experience and considerable effort: finding an optimal solution and an acceptable trade-off between performance and energy may require several tests and is strongly platform-dependent. In this respect, a Dataflow-based method is used in this work to explore different hardware/software configurations (number of hardware accelerators and FPGA frequency). As a use case, the acceleration of a well-known 3D video game (DOOM) is presented. The method offers rapid trade-off analysis in terms of non-functional parameters such as computing performance or power/energy measurements.

Extensive experimental results show that it is possible to speed up the game and, at the same time, reduce the energy consumption of the whole platform. A custom Linux-based Operating System for Zynq Ultrascale+ was created, including a GPU driver to support a graphical interface on an HDMI screen and drivers to manage custom hardware accelerators on the FPGA side.

The most energy-efficient solution, which saves up to 63% of energy, uses four parallel hardware accelerators and obtains a function speed-up of 3.6x and an application speed-up of 2x (in line with Amdahl’s law).
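
As a rough consistency check on the two speed-ups quoted above, Amdahl's law can be inverted to estimate what share of the original runtime the accelerated function would have to occupy. The sketch below is not taken from the paper: the function names are hypothetical, and it assumes the simplest form of the law, with a single accelerated portion and no extra communication overhead. Under that assumption, a function speed-up of 3.6x and an application speed-up of 2x imply that roughly 69% of the runtime was spent in the accelerated function.

```python
def amdahl_speedup(fraction: float, function_speedup: float) -> float:
    """Overall speed-up when `fraction` of the runtime is sped up by `function_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / function_speedup)


def implied_fraction(app_speedup: float, function_speedup: float) -> float:
    """Invert Amdahl's law: runtime share that must have been accelerated."""
    return (1.0 - 1.0 / app_speedup) / (1.0 - 1.0 / function_speedup)


if __name__ == "__main__":
    # Speed-up figures quoted in the abstract; everything else is an assumption.
    p = implied_fraction(app_speedup=2.0, function_speedup=3.6)
    print(f"implied accelerated fraction: {p:.3f}")
    print(f"overall speed-up recovered from it: {amdahl_speedup(p, 3.6):.2f}")
```

The same formula also bounds what further acceleration could achieve: with that fraction fixed, even an arbitrarily fast accelerator could not push the application speed-up beyond 1 / (1 - p).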

Additionally, a set of Pareto-optimal solutions is reported in the results section.
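
To illustrate what a Pareto-optimal set means in this kind of exploration, the sketch below filters a handful of configurations by execution time and energy, keeping only those that no other configuration beats on both objectives. It is a generic illustration, not the authors' tooling: the `Config` type, the choice of exactly two objectives, and every number in `SAMPLES` are made-up placeholders rather than measurements from the paper.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    accelerators: int   # number of hardware accelerators (0 = software only)
    fpga_mhz: int       # FPGA clock frequency in MHz
    time_s: float       # execution time, lower is better
    energy_j: float     # energy consumed, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.time_s <= b.time_s and a.energy_j <= b.energy_j
    strictly_better = a.time_s < b.time_s or a.energy_j < b.energy_j
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Placeholder values for illustration only; NOT results from the paper.
SAMPLES = [
    Config(0, 0, 10.0, 50.0),
    Config(1, 100, 8.0, 42.0),
    Config(2, 100, 6.5, 30.0),
    Config(2, 200, 6.0, 33.0),
    Config(4, 100, 5.5, 24.0),
    Config(4, 200, 5.0, 26.0),
]

if __name__ == "__main__":
    for cfg in pareto_front(SAMPLES):
        print(cfg)
```

With these placeholder values, only the two four-accelerator configurations survive: one is the fastest and the other the most frugal, so neither dominates the other.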

Keywords

Hardware acceleration · FPGA · Performance measurement · Power measurement · Energy measurement · Design space exploration · Pareto Front · MPSoC · Zynq Ultrascale+ · Linux Driver · 3D video game · DOOM

Notes

Acknowledgments

This work was supported by the Spanish Ministry (Ministerio de Economía y Competitividad) under the project PLATINO, Grant TEC2012-31145.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Universidad Politécnica de Madrid, Madrid, Spain
