Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures

Farooq, Muhammad Umar; John, Lizy K.

doi:10.1007/978-3-642-00722-4_14

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5501)

Abstract

Increasing on-chip wire delay, together with the distributed nature of processing elements, makes instruction scheduling crucial for tiled dataflow architectures. Our analysis reveals that careful placement of frequently executed sections of applications and dynamic tracking of resource contention can significantly improve application performance. The former reduces operand network latency, while the latter reduces stalls due to contention for processing elements. We augment one of the most recent instruction scheduling algorithms, hierarchical instruction scheduling, to better exploit spatial locality between instructions within a loop, thereby reducing expensive communication overhead by 6.5% and increasing average IPC by 5.13%. Second, in the presence of conditional branches and variable-latency memory instructions, estimating resource contention at compile time is not only complex but also imperfect. We therefore suggest tracking contending instructions dynamically and relocating them once a contention threshold is exceeded. Our results show that dynamic contention tracking reduces average ALU conflicts by 23%, thereby improving average IPC by 14.22%. Combined, these augmentations improve average IPC by 19.39%, and by over 30% for some benchmarks.
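
The idea of tracking contending instructions and relocating them past a threshold can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the mechanism described in the chapter: the class `ContentionTracker`, its methods, the "first ready instruction issues, the rest stall" rule, and the "move to the least-loaded other processing element" policy are all hypothetical choices made for illustration. It also assumes at least two processing elements.

```python
from collections import Counter, defaultdict


class ContentionTracker:
    """Toy model of threshold-based relocation (hypothetical, for illustration only)."""

    def __init__(self, num_pes, threshold):
        self.num_pes = num_pes              # number of processing elements (assumed >= 2)
        self.threshold = threshold          # conflicts tolerated before relocating
        self.placement = {}                 # instruction id -> processing element id
        self.conflicts = defaultdict(int)   # instruction id -> observed conflict count

    def place(self, instr, pe):
        """Record the initial (compile-time) placement of an instruction."""
        self.placement[instr] = pe

    def _least_loaded_pe(self, exclude):
        """Pick the PE with the fewest placed instructions, skipping `exclude`."""
        load = Counter(self.placement.values())
        candidates = [pe for pe in range(self.num_pes) if pe != exclude]
        return min(candidates, key=lambda pe: (load[pe], pe))

    def record_cycle(self, ready):
        """Process one cycle given the list of ready instruction ids.

        Assumption: each PE issues one instruction per cycle, so every ready
        instruction after the first on the same PE counts as a conflict.
        Returns a list of (instruction, old_pe, new_pe) relocations.
        """
        by_pe = defaultdict(list)
        for instr in ready:
            by_pe[self.placement[instr]].append(instr)

        moved = []
        for pe, instrs in by_pe.items():
            for stalled in instrs[1:]:
                self.conflicts[stalled] += 1
                if self.conflicts[stalled] > self.threshold:
                    target = self._least_loaded_pe(exclude=pe)
                    self.placement[stalled] = target
                    self.conflicts[stalled] = 0  # restart counting at the new PE
                    moved.append((stalled, pe, target))
        return moved


# Usage: two instructions share PE 0 and become ready together every cycle.
tracker = ContentionTracker(num_pes=2, threshold=2)
tracker.place("a", 0)
tracker.place("b", 0)
for cycle in range(3):
    relocations = tracker.record_cycle(["a", "b"])
print(relocations)
```

In this toy run, instruction `b` stalls behind `a` on three consecutive cycles, exceeds the threshold of 2 on the third, and is moved to the other processing element. A real implementation would also have to weigh the operand network latency that the relocation adds, which this sketch ignores.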

Author information

Authors and Affiliations

Department of ECE, The University of Texas at Austin, USA
Muhammad Umar Farooq & Lizy K. John

Editor information

Editors and Affiliations

Computing Laboratory, Oxford University, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Oege de Moor
Department of Computer Science, Aarhus University, Aabogade 34, 8200, Aarhus N., Denmark
Michael I. Schwartzbach

Cite this paper

Farooq, M.U., John, L.K. (2009). Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures. In: de Moor, O., Schwartzbach, M.I. (eds) Compiler Construction. CC 2009. Lecture Notes in Computer Science, vol 5501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00722-4_14

DOI: https://doi.org/10.1007/978-3-642-00722-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00721-7
Online ISBN: 978-3-642-00722-4
eBook Packages: Computer Science, Computer Science (R0)