Abstract
The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. Distributed operating systems and applications make it possible to exploit the increased scalability of such architectures, but they still face the data-to-task locality challenge. Because inter-tile communication, thread synchronization, and data transport often impose significant software overhead on such architectures, many applications would benefit from a more efficient and powerful communication primitive with minimal software involvement.
We propose SHARQ, software-defined hardware-managed queues for distributed computing architectures. SHARQ enables efficient inter-tile communication through application-specific queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces an optional handler task, which hardware schedules on demand. Queue and memory management, intra- and inter-tile data transfer, and handler task invocation are handled entirely in hardware; only the dynamic creation of queues at runtime is performed in software.
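To make the division of labor between software and hardware more concrete, the following is a minimal, single-threaded C sketch that models a SHARQ-style queue purely in software. All names (`queue_create`, `queue_enqueue`, `queue_dequeue`, `hw_dispatch`, `sum_lengths`), the length-prefixed ring-buffer layout, and the use of a pending flag to model on-demand handler scheduling are hypothetical illustrations, not the SHARQ hardware interface. In SHARQ itself, everything except queue creation would be carried out by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical software model of a SHARQ-style queue. Only queue_create()
 * corresponds to the software part; the other functions stand in for work
 * that SHARQ performs in hardware. */
typedef struct queue queue_t;
typedef void (*handler_fn)(const void *elem, uint32_t len, void *ctx);

struct queue {
    uint8_t *buf;           /* ring buffer of length-prefixed elements */
    size_t cap, head, used; /* capacity, read offset, occupied bytes */
    handler_fn handler;     /* optional handler task; NULL if none */
    void *ctx;              /* argument passed to the handler */
    bool handler_pending;   /* set on enqueue when the handler must run */
};

/* Software: dynamic queue creation at runtime. */
queue_t *queue_create(size_t cap, handler_fn handler, void *ctx) {
    queue_t *q = calloc(1, sizeof *q);
    if (!q) return NULL;
    q->buf = malloc(cap);
    if (!q->buf) { free(q); return NULL; }
    q->cap = cap;
    q->handler = handler;
    q->ctx = ctx;
    return q;
}

void queue_destroy(queue_t *q) { free(q->buf); free(q); }

/* Copy n bytes into / out of the ring at logical offset off from head. */
static void ring_put(queue_t *q, size_t off, const void *src, size_t n) {
    const uint8_t *s = src;
    for (size_t i = 0; i < n; i++) q->buf[(q->head + off + i) % q->cap] = s[i];
}
static void ring_get(const queue_t *q, size_t off, void *dst, size_t n) {
    uint8_t *d = dst;
    for (size_t i = 0; i < n; i++) d[i] = q->buf[(q->head + off + i) % q->cap];
}

/* Modeled hardware: append an element of arbitrary size and mark the handler
 * as pending if one is registered. Returns false if the queue is full. */
bool queue_enqueue(queue_t *q, const void *elem, uint32_t len) {
    size_t need = sizeof len + len;
    if (need > q->cap - q->used) return false;
    ring_put(q, q->used, &len, sizeof len);        /* length header */
    ring_put(q, q->used + sizeof len, elem, len);  /* payload */
    q->used += need;
    if (q->handler) q->handler_pending = true;
    return true;
}

/* Modeled hardware: remove the oldest element into dst. Returns its length,
 * or -1 if the queue is empty or dst_cap is too small. */
long queue_dequeue(queue_t *q, void *dst, size_t dst_cap) {
    uint32_t len;
    if (q->used == 0) return -1;
    ring_get(q, 0, &len, sizeof len);
    assert(q->used >= sizeof len + len); /* ring invariant */
    if (len > dst_cap) return -1;
    ring_get(q, sizeof len, dst, len);
    q->head = (q->head + sizeof len + len) % q->cap;
    q->used -= sizeof len + len;
    return (long)len;
}

/* Modeled hardware scheduler: run the handler only if it is pending, and
 * drain all queued elements. Returns the number of elements handled. */
size_t hw_dispatch(queue_t *q) {
    size_t handled = 0;
    if (!q->handler_pending) return 0;
    uint8_t *tmp = malloc(q->cap);
    if (!tmp) return 0;
    long len;
    while ((len = queue_dequeue(q, tmp, q->cap)) >= 0) {
        q->handler(tmp, (uint32_t)len, q->ctx);
        handled++;
    }
    q->handler_pending = false;
    free(tmp);
    return handled;
}

/* Example handler: accumulates the total size of processed elements. */
void sum_lengths(const void *elem, uint32_t len, void *ctx) {
    (void)elem;
    *(size_t *)ctx += len;
}
```

In this sketch, a queue created with `sum_lengths` as its handler gets the handler marked pending by the first enqueue, and a single `hw_dispatch` call then processes every queued element. A queue created with a `NULL` handler is drained only by explicit `queue_dequeue` calls, which mirrors the optional nature of the handler task.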
As an example use case, we integrated SHARQ into the MPI library. An evaluation with the MPI-based NAS benchmarks shows a reduction in execution time of up to 48% for the communication-intensive IS kernel in a \(4 \times 4\) tile design on an FPGA platform with a total of 80 LEON3 cores.
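The reported numbers can be restated with simple arithmetic (derived here, not additional results): a best-case reduction of 48% in execution time corresponds to a speedup of about 1.92 for the IS kernel, and 80 cores across a \(4 \times 4\) tile design amounts to 5 cores per tile, assuming the cores are distributed evenly across the tiles.

```latex
\[
  S_{\max} = \frac{T_{\text{base}}}{T_{\text{SHARQ}}} = \frac{1}{1 - 0.48} \approx 1.92,
  \qquad
  \frac{80\ \text{cores}}{4 \times 4\ \text{tiles}} = 5\ \text{cores per tile}.
\]
```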
Acknowledgements
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 146371743 – TRR 89: Invasive Computing. We also thank G. Drescher, J. Rabenstein and T. Langer from FAU, as well as A. Preißner, O. Lenke and L. Nolte from TUM for their excellent help.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Rheindt, S., Maier, S., Schmaus, F., Wild, T., Schröder-Preikschat, W., Herkersdorf, A. (2019). SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol. 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_15
DOI: https://doi.org/10.1007/978-3-030-27562-4_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27561-7
Online ISBN: 978-3-030-27562-4
eBook Packages: Computer Science, Computer Science (R0)