Abstract
As has been emphasized throughout this book, a high level of adaptability is necessary to cope with the highly heterogeneous behavior of recent applications. At the same time, binary code compatibility is mandatory, so that the large amount of existing software can be reused without any modification. In this scenario, this chapter discusses dynamic optimization techniques: how they can be used to improve performance, how they maintain binary compatibility, and some case studies. The chapter starts by presenting binary translation. Its main concepts are clarified, as well as the main challenges that a binary translation mechanism must handle to work properly. The section ends with a detailed view of some examples of binary translation machines. Then, reuse is discussed, and several types of it are covered: instruction reuse, value prediction, basic block reuse, trace reuse, and dynamic trace memoization. Furthermore, as discussed in Chap. 3, even though reconfigurable systems offer huge potential in terms of performance and energy, they alone can neither deal with the highly heterogeneous behavior of recent applications nor maintain binary compatibility. Therefore, this chapter ends by presenting approaches that use reconfigurable architectures together with mechanisms that resemble the behavior of the dynamic optimization techniques.
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Beck, A.C.S. (2013). Dynamic Optimization Techniques. In: Beck, A., Lang Lisbôa, C., Carro, L. (eds) Adaptable Embedded Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1746-0_6
DOI: https://doi.org/10.1007/978-1-4614-1746-0_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1745-3
Online ISBN: 978-1-4614-1746-0
eBook Packages: Engineering, Engineering (R0)