Abstract
Networks-on-chip (NoCs) provide a scalable and power-efficient communication infrastructure for a range of computing chips, from fully customized multi/many-processor systems-on-chip (MPSoCs) to general-purpose chip multiprocessors (CMPs). A common aspect of almost all NoC workloads is the varying size of the data transmitted by each transaction: while large data blocks are transferred as multiple-flit packets, part of the traffic consists of short data segments (control data) that do not even fill a single flit. In conventional NoCs, the switch allocator grants a switch output (and the link connected to it) to a single flit in each cycle, even if the flit is shorter than the link bit-width. In this paper, we propose a novel NoC architecture that enables routers to send two short flits simultaneously on the same link, making use of link bandwidth that would otherwise be wasted. To this end, new crossbar, virtual channel (VC), and switch allocator architectures are presented to support parallel short-packet forwarding on NoC links. Simulation results using synthetic and realistic workloads show that the proposed architecture improves NoC performance by up to 24%.
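The bandwidth argument above can be illustrated with a toy cycle count for a single link. This is only a sketch under stated assumptions, not the paper's crossbar or allocator design: the function names, the 128-bit link width, and the pairing rule (two adjacent queued flits share a cycle when their combined payload fits in the link width) are all hypothetical choices made for illustration.

```python
from collections import deque


def cycles_conventional(flit_bits):
    """Conventional link: one flit per cycle, however few bits it carries."""
    return len(flit_bits)


def cycles_parallel(flit_bits, link_width):
    """Toy model of parallel forwarding on one link.

    Assumption (not from the paper): the flit at the head of the queue and
    the one right behind it are sent in the same cycle when their payloads
    together fit within link_width bits.
    """
    if any(bits > link_width for bits in flit_bits):
        raise ValueError("a flit cannot be wider than the link")
    queue = deque(flit_bits)
    cycles = 0
    while queue:
        first = queue.popleft()
        # Pair with the next flit only if both fit on the link together.
        if queue and first + queue[0] <= link_width:
            queue.popleft()
        cycles += 1
    return cycles


# Hypothetical traffic mix: 128-bit data flits and 16-bit control flits.
LINK_WIDTH = 128
flits = [128, 16, 16, 128, 16, 128, 16, 16]

baseline = cycles_conventional(flits)
paired = cycles_parallel(flits, LINK_WIDTH)
print(baseline, paired)
```

In this made-up mix, the saving comes only from the places where two short flits happen to be adjacent in the queue; a real router would also have to handle VC state, crossbar ports, and arbitration, which this model ignores.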
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Momenzadeh, E., Modarressi, M., Mazloumi, A., Daneshtalab, M. (2017). Parallel Forwarding for Efficient Bandwidth Utilization in Networks-on-Chip. In: Knoop, J., Karl, W., Schulz, M., Inoue, K., Pionteck, T. (eds) Architecture of Computing Systems - ARCS 2017. ARCS 2017. Lecture Notes in Computer Science, vol 10172. Springer, Cham. https://doi.org/10.1007/978-3-319-54999-6_12
DOI: https://doi.org/10.1007/978-3-319-54999-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54998-9
Online ISBN: 978-3-319-54999-6
eBook Packages: Computer Science (R0)