Lightweight Hardware Synchronization for Avoiding Buffer Overflows in Network-on-Chips

  • Martin Frieb
  • Alexander Stegmeier
  • Jörg Mische
  • Theo Ungerer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10793)


Buffer overflows are a serious problem when running message-passing programs on network-on-chip (NoC) based many-core processors. A simple synchronization mechanism ensures that data is transferred only when the receiving node needs it, thereby avoiding full buffers and interruptions at any other time. However, software synchronization cannot fully achieve these objectives, because its own synchronization flits may still interrupt nodes or fill buffers. Therefore, we propose a lightweight hardware synchronization. It requires only small architectural changes, since it consists of very small components, and it scales well. To control the hardware-supported synchronization, we add two new assembly instructions. Furthermore, we show how the software development process differs and evaluate the impact on the execution time of global communication operations and on the number of required receive buffer slots.
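To illustrate why synchronizing before a transfer bounds the receive buffer, the following Python sketch models a gather operation in which several senders transfer fixed-size messages to one root node. It is a simplified software model under stated assumptions, not the authors' hardware design: the `ReceiveBuffer` class, the functions `gather_unsynchronized` and `gather_synchronized`, the one-sender-at-a-time "ready" signal, and the flit counts are all hypothetical. Without synchronization the root must be able to hold every incoming flit at once; with a ready signal it only ever holds one message.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReceiveBuffer:
    """Hypothetical bounded receive buffer of a NoC node (capacity in flits)."""
    capacity: int
    slots: deque = field(default_factory=deque)
    peak: int = 0  # largest number of flits held at the same time

    def push(self, flit):
        # A flit arriving at a full buffer models a buffer overflow.
        if len(self.slots) >= self.capacity:
            raise OverflowError("receive buffer overflow")
        self.slots.append(flit)
        self.peak = max(self.peak, len(self.slots))

    def drain(self):
        """The receiving node consumes everything currently buffered."""
        self.slots.clear()


def gather_unsynchronized(senders, flits_per_msg, capacity):
    """All senders inject their flits immediately; the root drains afterwards."""
    buf = ReceiveBuffer(capacity)
    for s in range(senders):
        for i in range(flits_per_msg):
            buf.push((s, i))
    buf.drain()
    return buf.peak


def gather_synchronized(senders, flits_per_msg, capacity):
    """Hypothetical scheme: the root signals 'ready' to one sender at a time,
    and each sender waits for that signal before injecting its flits."""
    buf = ReceiveBuffer(capacity)
    for s in range(senders):
        # (ready signal from root to sender s is assumed to happen here)
        for i in range(flits_per_msg):
            buf.push((s, i))
        buf.drain()  # root consumes message s before signalling the next sender
    return buf.peak


if __name__ == "__main__":
    print(gather_synchronized(15, 4, 8))     # peak slots with synchronization
    print(gather_unsynchronized(15, 4, 64))  # peak slots without synchronization
```

With 15 senders and 4-flit messages, the synchronized variant needs at most 4 buffer slots, whereas the unsynchronized variant peaks at 60 slots and raises `OverflowError` when the buffer has only 8 slots. Serializing the senders trades some latency for bounded buffering, which is why both execution time and required receive buffer slots are worth measuring.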



The authors thank Ingo Sewing for his efforts in implementing our lightweight hardware synchronization in the RC/MC architecture.
Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Martin Frieb (1)
  • Alexander Stegmeier (1)
  • Jörg Mische (1)
  • Theo Ungerer (1)

  1. Institute of Computer Science, University of Augsburg, Augsburg, Germany