Advertisement

Evaluating Auto-adaptation Methods for Fine-Grained Adaptable Processors

  • Joost Hoozemans
  • Jeroen van Straten
  • Zaid Al-Ars
  • Stephan Wong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10793)

Abstract

To achieve energy savings while maintaining adequate performance, system designers and programmers wish to create the best possible match between program behavior and the underlying hardware. Well-known current approaches include DVFS and task migrations in heterogeneous platforms such as big.LITTLE processors. Additionally, processors have been proposed in literature that are able to adapt (parts of) their organization to the workload. These reconfigurations can be managed using hardware monitors, profiling and other compile-time information or a combination of both. Many current solutions are suitable for heterogeneous systems, as migration penalties pose a practical limit to the maximum adaptation frequency, but not for dynamic processors that can adapt much more fine-grained.

In this paper, we present two novel concepts to aid these low-penalty reconfigurable processors - one requiring an ISA extension and one without. Our experimental results show that our approaches enable a dynamic processor to reduce the energy-delay product by up to 25% and on average 10% to 18% compared to the best performing static setups.

Notes

Acknowledgements

This work has been supported by the ALMARVI European Artemis project nr. 621439.

References

  1. 1.
    Khubaib, Suleman, M.A., Hashemi, M., Wilkerson, C., Patt, Y.N.: MorphCore: an energy-efficient microarchitecture for high performance ILP and high throughput TLP. In: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 305–316, December 2012Google Scholar
  2. 2.
    Brown, J.A., Porter, L., Tullsen, D.M.: Fast thread migration via cache working set prediction. In: 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA), pp. 193–204. IEEE (2011)Google Scholar
  3. 3.
    Rangan, K.K., Wei, G.-Y., Brooks, D.: Thread motion: fine-grained power management for multi-core systems. In: Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA 2009, pp. 302–313. ACM, New York (2009). http://doi.acm.org/10.1145/1555754.1555793
  4. 4.
    Rodrigues, M., Roma, N., Tomás, P.: Fast and scalable thread migration for multi-core architectures. In: 2015 IEEE 13th International Conference on Embedded and Ubiquitous Computing, pp. 9–16, October 2015Google Scholar
  5. 5.
    Brandon, A., Hoozemans, J., van Straten, J., Wong, S.: Exploring ILP and TLP on a polymorphic VLIW processor. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds.) ARCS 2017. LNCS, vol. 10172, pp. 177–189. Springer, Cham (2017).  https://doi.org/10.1007/978-3-319-54999-6_14 CrossRefGoogle Scholar
  6. 6.
    Wong, S., van As, T., Brown, G.: \(\rho \)-VEX: a reconfigurable and extensible softcore VLIW processor. In: International Conference on Field-Programmable Technology (ICFPT), December 2008Google Scholar
  7. 7.
    Brandon, A., Wong, S.: Support for dynamic issue width in VLIW processors using generic binaries. In: Design, Automation Test in Europe Conference Exhibition (DATE), pp. 827–832, March 2013Google Scholar
  8. 8.
    Codrescu, L., Anderson, W., Venkumanhanti, S., Zeng, M., Plondke, E., Koob, C., Ingle, A., Tabony, C., Maule, R.: Hexagon DSP: an architecture optimized for mobile multimedia and communications. IEEE Micro 34(2), 34–43 (2014)CrossRefGoogle Scholar
  9. 9.
    Becchi, M., Crowley, P.: Dynamic thread assignment on heterogeneous multiprocessor architectures. In: Proceedings of the 3rd Conference on Computing Frontiers, ser. CF 2006, pp. 29–40. ACM, New York (2006)Google Scholar
  10. 10.
    Guo, Q., Sartor, A., Brandon, A., Beck, A.C., Zhou, X., Wong, S.: Run-time phase prediction for a reconfigurable VLIW processor. In: 2016 Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 1634–1639. IEEE (2016)Google Scholar
  11. 11.
    Hoogerbrugge, J.: Dynamic branch prediction for a VLIW processor. In: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, (PACT), pp. 207–214. IEEE (2000)Google Scholar
  12. 12.
    Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B.: MiBench: a free, commercially representative embedded benchmark suite. In: 2001 IEEE International Workshop on Workload Characterization: WWC-4, pp. 3–14. IEEE (2001)Google Scholar
  13. 13.
    Sankaralingam, K., Nagarajan, R., Liu, H., Kim, C., Huh, J., Burger, D., Keckler, S.W., Moore, C.R.: Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In: Proceedings of the 30th Annual International Symposium on Computer Architecture, pp. 422–433. IEEE (2003)Google Scholar
  14. 14.
    Ipek, E., Kirman, M., Kirman, N., Martinez, J.F.: Core fusion: accommodating software diversity in chip multiprocessors. In: Proceedings of the 34th Annual International Symposium on Computer Architecture, ser. ISCA 2007, pp. 186–197. ACM, New York (2007). http://doi.acm.org/10.1145/1250662.1250686
  15. 15.
    Rodrigues, R., Annamalai, A., Koren, I., Kundu, S.: Improving performance per watt of asymmetric multi-core processors via online program phase classification and adaptive core morphing. ACM Trans. Des. Autom. Electron. Syst. 18(1), 5:1–5:23 (2013). http://doi.acm.org/10.1145/2390191.2390196 Google Scholar
  16. 16.
    Duesterwald, E., Cascaval, C., Dwarkadas, S.: Characterizing and predicting program behavior and its variability. In: Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, PACT 2003, pp. 220–231, September 2003Google Scholar
  17. 17.
    Chi, E., Salem, A.M., Bahar, R.I., Weiss, R.: Combining software and hardware monitoring for improved power and performance tuning. In: Proceedings of the Seventh Workshop on Interaction Between Compilers and Computer Architectures: INTERACT-7, pp. 57–64. IEEE (2003)Google Scholar
  18. 18.
    Kumar, R., Farkas, K.I., Jouppi, N.P., Ranganathan, P., Tullsen, D.M.: Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction. In: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture: MICRO-36, pp. 81–92. IEEE (2003)Google Scholar
  19. 19.
    Greenhalgh, P.: big.LITTLE processing with ARM cortex-A15 & Cortex-A7. ARM White Paper, pp. 1–8 (2011)Google Scholar
  20. 20.
    Van Craeynest, K., Jaleel, A., Eeckhout, L., Narvaez, P., Emer, J.: Scheduling heterogeneous multi-cores through performance impact estimation (PIE). In: Proceedings of the 39th Annual International Symposium on Computer Architecture, ser. ISCA 2012, pp. 213–224. IEEE Computer Society, Washington, DC (2012). http://dl.acm.org/citation.cfm?id=2337159.2337184
  21. 21.
    Otero, A., Morales-Cas, A., Portilla, J., de la Torre, E., Riesgo, T.: A modular peripheral to support self-reconfiguration in SoCs. In: 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, pp. 88–95 (2010)Google Scholar
  22. 22.
    Aldham, M., Anderson, J., Brown, S., Canis, A.: Low-cost hardware profiling of run-time and energy in FPGA embedded processors. In: ASAP 2011–22nd IEEE International Conference on Application-specific Systems, Architectures and Processors, pp. 61–68, September 2011Google Scholar
  23. 23.
    Sherwood, T., Sair, S., Calder, B.: Phase tracking and prediction. In: ACM SIGARCH Computer Architecture News, vol. 31, no. 2, pp. 336–349. ACM (2003)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Joost Hoozemans
    • 1
  • Jeroen van Straten
    • 1
  • Zaid Al-Ars
    • 1
  • Stephan Wong
    • 1
  1. 1.Delft University of TechnologyDelftThe Netherlands

Personalised recommendations