Network Interface with Task Spawning Support for NoC-Based DSM Architectures
Distributed Shared Memory (DSM) architectures are becoming popular because they exploit hardware parallelism while giving application developers the flexibility to use both shared-memory and distributed-memory paradigms. At the same time, Networks on Chip (NoCs) have become a reality, addressing communication bottlenecks in massively parallel tile-based processor architectures. In NoC-based DSM architectures, the synchronization overhead of spawning a task on a remote network node can incur high performance penalties. Reducing the synchronization delay during remote task spawning therefore makes the design of the Network Interface (NI) important. In this paper, we present an NI architecture that supports task spawning between network nodes through efficient synchronization mechanisms. The hardware support inside the proposed NI offloads synchronization handling during remote task spawning from software, which improves overall performance. Simulation results show that the proposed hardware architecture improves performance by up to 42% compared with existing state-of-the-art approaches. An FPGA prototype is also used to demonstrate the benefits of the proposed approach for real-world applications. Implementation results show the low area footprint of the proposed hardware.
Keywords: Shared Memory; Network Interface; Distributed Shared Memory; Message Sequence Chart; Remote Direct Memory Access