Rapid, High-Level Performance Estimation for DSE Using Calibrated Weight Tables

  • Kasra Moazzemi
  • Smit Patel
  • Shen Feng
  • Gunar Schirner
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 523)

Abstract

Automated Design Space Exploration (DSE) is a critical part of system-level design. It relies on performance estimation to evaluate design alternatives. However, because a plethora of design alternatives must be compared, the run-time of performance estimation itself can become a bottleneck. In DSE, fast performance estimation is essential, even if some accuracy is sacrificed. Fast estimation can be realised by capturing the application demand and the Processing Element (PE) supply (hereafter called the weight table) each in a matrix. Performance estimation (retargeting) then reduces to a matrix multiplication. However, defining the weight table from a data sheet is impractical due to the multitude of (micro-)architectural aspects.
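The retargeting step described above can be sketched in Python as a plain matrix product. This is a minimal illustration under stated assumptions, not WeiCal itself: the operation classes, the operation counts, and the cycles-per-operation values are all hypothetical, since the abstract does not list the actual demand features. Each demand row holds per-class operation counts for one application, each weight-table row holds one cycles-per-operation value per PE, and the product yields an estimated cycle count for every application/PE pair.

```python
"""Minimal sketch of retargeting as a matrix multiplication (hypothetical data)."""

from typing import List

Matrix = List[List[float]]

# Hypothetical operation classes; the feature set used by WeiCal may differ.
OP_CLASSES = ["int_alu", "int_mul", "load_store", "branch"]


def estimate_cycles(demand: Matrix, weights: Matrix) -> Matrix:
    """Return the product demand x weights.

    demand:  one row per application, one operation count per class.
    weights: the weight table, one row per operation class and one
             cycles-per-operation value per PE.
    Result:  one row per application, one estimated cycle count per PE.
    """
    n_classes = len(weights)
    if any(len(row) != n_classes for row in demand):
        raise ValueError("each demand row needs one count per operation class")
    n_pes = len(weights[0])
    return [
        [sum(row[k] * weights[k][p] for k in range(n_classes)) for p in range(n_pes)]
        for row in demand
    ]


if __name__ == "__main__":
    # Two hypothetical applications, profiled once (columns follow OP_CLASSES).
    demand = [
        [1000, 200, 500, 300],
        [400, 0, 800, 100],
    ]
    # Hypothetical weight table for two PEs (rows follow OP_CLASSES).
    weights = [
        [1, 1],
        [3, 1],
        [2, 4],
        [2, 3],
    ]
    print(estimate_cycles(demand, weights))  # [[3200, 4100], [2200, 3900]]
```

Because the matrix dimensions in this sketch depend only on the number of operation classes and PEs, not on the application's code size, each estimate costs the same amount of work, which is consistent with the size-independent throughput reported below.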

This paper introduces WeiCal, a novel methodology for automatically generating weight tables in the context of C source-level estimation using application profiling and Linear Programming (LP). The LP is solved based on the measured performance of training benchmarks on an actual PE. We validate WeiCal using a synthetic processor and benchmark model, and also analyse the impact of non-observable features on estimation accuracy. We evaluate its efficiency using 49 benchmarks on 2 different processors with varying configurations (multiple memory configurations and software optimizations). On a 3.1 GHz Intel i5-3450 host, 25 million estimations per second can be obtained regardless of application size and PE complexity. With an average error of 24%, the accuracy is sufficient for early DSE.
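The abstract does not spell out the LP, so the following is only one plausible formulation of the calibration step, assuming the objective is to minimise the total relative error over the training benchmarks. The notation is hypothetical: $d_{jk}$ is the profiled count of operation class $k$ in training benchmark $j$, $t_j$ is that benchmark's measured cycle count on the actual PE, $w_k$ is the weight-table entry for class $k$, and $e_j$ is an auxiliary error variable. The pair of inequalities linearises the absolute value, and since every $t_j$ is a measured constant, the problem remains a linear program.

```latex
\begin{aligned}
\min_{\mathbf{w},\,\mathbf{e}} \quad & \sum_{j=1}^{m} e_j \\
\text{s.t.} \quad & -e_j \le \frac{\sum_{k=1}^{n} d_{jk} w_k - t_j}{t_j} \le e_j, && j = 1, \dots, m \\
& w_k \ge 0, && k = 1, \dots, n
\end{aligned}
```

Under this formulation, the optimal vector $\mathbf{w}$ supplies the weight-table entries for the calibrated PE, and repeating the calibration per PE fills in the entries for the remaining PEs.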

Copyright information

© IFIP International Federation for Information Processing 2017

Authors and Affiliations

  • Kasra Moazzemi (1)
  • Smit Patel (1)
  • Shen Feng (1)
  • Gunar Schirner (1)
  1. Department of Electrical and Computer Engineering, Northeastern University, Boston, USA