Abstract
Dynamic optimization relies on runtime profile information to improve program performance. Traditional profiling techniques incur significant overhead, which makes them unsuitable for dynamic optimization. In this paper, we propose a new profiling technique that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests to the hardware as a few bits of information in branch instructions, and the hardware uses the free execution slots available in the user program to execute the profiling operations. We have implemented the compiler instrumentation for this technique in an Itanium research compiler. Our results show that accurate block profiling adds very little overhead to the user program, measured in program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling technique is practical. We believe this technique will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
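The mechanism summarized in the abstract can be pictured with a toy simulation. The sketch below is not the paper's implementation: the issue width, the `profile_id` field, the pending queue, and every name in the code are assumptions made for illustration only. Each cycle may issue a branch that carries a small profile ID. The simulated hardware queues a counter update for that ID and retires queued updates only in issue slots that the user program left empty, so the schedule length of the program does not change.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Optional

ISSUE_WIDTH = 3  # assumed number of issue slots per cycle (hypothetical)


@dataclass
class Cycle:
    used_slots: int                   # slots occupied by user instructions
    profile_id: Optional[int] = None  # ID carried by a branch issued this cycle


def run(schedule):
    """Simulate a schedule; return (block counters, cycles, unretired requests)."""
    counters = Counter()
    pending = deque()  # profiling requests waiting for a free slot
    for cycle in schedule:
        # Retire earlier requests in whatever slots this cycle leaves free.
        free = ISSUE_WIDTH - cycle.used_slots
        while free > 0 and pending:
            counters[pending.popleft()] += 1
            free -= 1
        # A branch issued in this cycle hands its profile ID to the hardware.
        if cycle.profile_id is not None:
            pending.append(cycle.profile_id)
    # The cycle count equals the schedule length: profiling added no cycles.
    return counters, len(schedule), len(pending)


schedule = [Cycle(3, 1), Cycle(2), Cycle(3, 2), Cycle(3, 1), Cycle(1)]
counters, cycles, unretired = run(schedule)
print(dict(counters), cycles, unretired)
```

In this toy model, requests that never find a free slot are simply reported as unretired. How real hardware would handle that case is outside the scope of the sketch.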
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, Y., Lee, YF. (2004). Exploiting Free Execution Slots on EPIC Processors for Efficient and Accurate Runtime Profiling. In: Yew, PC., Xue, J. (eds) Advances in Computer Systems Architecture. ACSAC 2004. Lecture Notes in Computer Science, vol 3189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30102-8_19
DOI: https://doi.org/10.1007/978-3-540-30102-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23003-8
Online ISBN: 978-3-540-30102-8
eBook Packages: Springer Book Archive