SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures

Rheindt, Sven; Maier, Sebastian; Schmaus, Florian; Wild, Thomas; Schröder-Preikschat, Wolfgang; Herkersdorf, Andreas

doi:10.1007/978-3-030-27562-4_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Included in the following conference series:

International Conference on Embedded Computer Systems


Abstract

The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. Distributed operating systems and applications make it possible to exploit the increased scalability of such architectures, but they still face the data-to-task locality challenge. Because inter-tile communication, thread synchronization and data transport often impose significant software overhead on such architectures, many applications would benefit from a more efficient and powerful communication primitive that requires minimal software involvement.

We propose software-defined hardware-managed queues for distributed computing architectures that enable efficient inter-tile communication by leveraging application-specific queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces the concept of an optional handler task, which is scheduled by hardware on demand. Queue and memory management, intra- and inter-tile data transfer, and handler task invocation are entirely handled by hardware. Only the dynamic queue creation at runtime is performed in software.

As an example use case, we integrated SHARQ into the MPI library. An evaluation with the MPI-based NAS benchmarks shows a reduction in execution time of up to 48% for the communication-intensive IS kernel in a 4 × 4 tile design on an FPGA platform with a total of 80 LEON3 cores.
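To make the division of labour described in the abstract more concrete, the following is a minimal Python model of the queue semantics, written purely for illustration. It is not the authors' hardware design or software interface: all names (`Tile`, `SharqQueue`, `create_queue`, `enqueue`, `run_ready_tasks`, `handler_pending`) are hypothetical. The "on demand" policy is also an assumption: a handler task is scheduled when an element arrives and no handler task for that queue is already pending. Work that SHARQ performs in hardware (copying elements, queue memory management, handler scheduling) is simply emulated in software here; only queue creation corresponds to the software part.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class SharqQueue:
    """One application-specific queue holding variable-sized elements (hypothetical model)."""
    qid: int
    handler: Optional[Callable[["SharqQueue"], None]] = None
    elements: Deque[bytes] = field(default_factory=deque)
    handler_pending: bool = False  # True while a handler task for this queue is scheduled

    def dequeue(self) -> Optional[bytes]:
        """Pop the oldest element, or return None if the queue is empty."""
        return self.elements.popleft() if self.elements else None


class Tile:
    """Hypothetical model of one tile: a queue table plus a ready list of handler tasks."""

    def __init__(self) -> None:
        self.queues: Dict[int, SharqQueue] = {}
        self.ready: Deque[SharqQueue] = deque()
        self._next_qid = 0

    def create_queue(self, handler: Optional[Callable[[SharqQueue], None]] = None) -> int:
        """Software side: create a queue at runtime, optionally with a handler task."""
        qid = self._next_qid
        self._next_qid += 1
        self.queues[qid] = SharqQueue(qid, handler)
        return qid

    def enqueue(self, qid: int, payload: bytes) -> None:
        """Emulated hardware side: copy the element in and schedule the handler on demand."""
        q = self.queues[qid]
        q.elements.append(bytes(payload))  # copy; element size is arbitrary
        # Assumption: "on demand" = at most one pending handler task per queue,
        # scheduled when an element arrives and none is pending yet.
        if q.handler is not None and not q.handler_pending:
            q.handler_pending = True
            self.ready.append(q)

    def run_ready_tasks(self) -> int:
        """Run every scheduled handler task and return how many ran."""
        ran = 0
        while self.ready:
            q = self.ready.popleft()
            # Cleared before running, so elements enqueued during the handler reschedule it.
            q.handler_pending = False
            assert q.handler is not None
            q.handler(q)
            ran += 1
        return ran


# Usage: one handler task drains two differently sized elements.
received: List[bytes] = []


def drain(q: SharqQueue) -> None:
    """Example handler task: consume everything currently in the queue."""
    item = q.dequeue()
    while item is not None:
        received.append(item)
        item = q.dequeue()


tile = Tile()
qid = tile.create_queue(handler=drain)
tile.enqueue(qid, b"header")
tile.enqueue(qid, bytes(4096))     # a much larger element in the same queue
print(len(tile.ready))             # 1: a single handler task for both elements
print(tile.run_ready_tasks())      # 1
print([len(e) for e in received])  # [6, 4096]
```

In this model, a queue created without a handler simply accumulates elements until software polls it with `dequeue()`, which mirrors the handler task being optional.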



Acknowledgements

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 146371743 – TRR 89: Invasive Computing. We also thank G. Drescher, J. Rabenstein and T. Langer from FAU, as well as A. Preißner, O. Lenke and L. Nolte from TUM for their excellent help.

Author information

Authors and Affiliations

Technical University of Munich (TUM), Arcisstr. 21, 80333, Munich, Germany
Sven Rheindt, Thomas Wild & Andreas Herkersdorf
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Schlossplatz 4, 91054, Erlangen, Germany
Sebastian Maier, Florian Schmaus & Wolfgang Schröder-Preikschat


Corresponding author

Correspondence to Sven Rheindt.

Editor information

Editors and Affiliations

Technical University of Crete and ICS - FORTH, Chania, Greece
Dionisios N. Pnevmatikatos
INSA Rennes, Rennes Cedex 7, France
Maxime Pelcat
Fraunhofer IESE, Kaiserslautern, Germany
Matthias Jung


About this paper

Cite this paper

Rheindt, S., Maier, S., Schmaus, F., Wild, T., Schröder-Preikschat, W., Herkersdorf, A. (2019). SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_15


DOI: https://doi.org/10.1007/978-3-030-27562-4_15
Published: 08 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27561-7
Online ISBN: 978-3-030-27562-4
eBook Packages: Computer Science; Computer Science (R0)
