Part of the book series: Lecture Notes in Computer Science ((THIPEAC,volume 5470))

Abstract

Computer architects and designers rely heavily on simulation. The downside of simulation is that it is very time-consuming: simulating an industry-standard benchmark on today’s fastest machines and simulators takes several weeks. A practical solution to the simulation problem is sampling. Sampled simulation selects a number of sampling units out of a complete program execution and simulates only those sampling units in detail. An important problem with sampling, however, is the unknown microarchitecture state at the beginning of each sampling unit. Large hardware structures such as caches and branch predictors suffer most from unknown hardware state. Although a great body of work exists on cache state warmup, very little work has been done on branch predictor warmup.

This paper proposes Branch History Matching (BHM) for accurate branch predictor warmup during sampled simulation. The idea is to build, for each sampling unit, a distribution of how far one needs to go back in the pre-sampling unit (the execution preceding the sampling unit) in order to find the same static branch with a similar global and local history as the branch instance appearing in the sampling unit. Those distributions are then used to determine where to start the warmup phase for each sampling unit, given a total warmup length budget. Using SPEC CPU2000 integer benchmarks, we show that BHM is substantially more efficient than fixed-length warmup in terms of warmup length for the same accuracy. Conversely, BHM is substantially more accurate than fixed-length warmup for the same warmup budget.
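
The two steps named above (building a per-sampling-unit distance distribution, then splitting a total warmup budget) can be illustrated with a small Python sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: a trace is a list of hypothetical `(pc, taken)` pairs, "similar" history is taken to mean an exact match on the last `bits` outcomes, branches with no match in the pre-sampling unit are recorded at distance 0, and the budget is split by a simple greedy rule. The names `annotate`, `bhm_histogram` and `allocate` are likewise hypothetical.

```python
from collections import Counter, defaultdict


def annotate(trace, bits):
    """Attach the global and local history seen by each branch instance.

    Histories are the last `bits` outcomes (bits >= 1) before the branch.
    """
    ghist, lhist, out = (), defaultdict(tuple), []
    for pc, taken in trace:
        out.append((pc, ghist, lhist[pc]))
        ghist = (ghist + (taken,))[-bits:]
        lhist[pc] = (lhist[pc] + (taken,))[-bits:]
    return out


def bhm_histogram(trace, start, end, bits=2):
    """Histogram of look-back distances for the sampling unit trace[start:end].

    Assumption: a match is the same pc with identical global and local
    history; distance 0 means no match exists in the pre-sampling unit.
    """
    ann = annotate(trace[:end], bits)
    hist = Counter()
    for i in range(start, end):
        d = 0
        for j in range(start - 1, -1, -1):  # walk back from the unit start
            if ann[j] == ann[i]:
                d = start - j
                break
        hist[d] += 1
    return hist


def allocate(hists, budget):
    """Greedy split of a total warmup budget (assumed policy, in branches).

    Repeatedly extends the unit whose next distance covers the most branch
    instances per added warmup branch, until nothing affordable remains.
    """
    warm = [0] * len(hists)
    while True:
        best = None
        for u, h in enumerate(hists):
            nxt = [d for d in h if d > warm[u]]
            if not nxt:
                continue
            d = min(nxt)
            cost = d - warm[u]
            if cost <= budget:
                gain = h[d] / cost
                if best is None or gain > best[0]:
                    best = (gain, u, d, cost)
        if best is None:
            return warm
        _, u, d, cost = best
        warm[u] = d
        budget -= cost


# Toy usage: two alternating branches, sampling unit = last two instances.
trace = [(0x10, 1), (0x20, 0)] * 3
hist = bhm_histogram(trace, start=4, end=6, bits=1)
print(sorted(hist.items()))                        # [(1, 1), (2, 1)]
print(allocate([hist, Counter({5: 10})], budget=3))  # [2, 0]
```

In this toy case the first sampling unit needs at most two branches of warmup to cover every branch instance, so a budget of 3 is spent there, while the second (hypothetical) unit, whose matches all lie five branches back, receives none.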

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kluyskens, S., Eeckhout, L. (2009). Branch Predictor Warmup for Sampled Simulation through Branch History Matching. In: Stenström, P. (ed.) Transactions on High-Performance Embedded Architectures and Compilers II. Lecture Notes in Computer Science, vol 5470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00904-4_4

  • DOI: https://doi.org/10.1007/978-3-642-00904-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00903-7

  • Online ISBN: 978-3-642-00904-4

  • eBook Packages: Computer Science (R0)
