SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures

  • Conference paper
Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11733)

Abstract

The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. Distributed operating systems and applications can exploit the increased scalability of such architectures, but still face the data-to-task locality challenge. Because inter-tile communication, thread synchronization and data transport often impose significant software overhead on such architectures, many applications would benefit from a more efficient and powerful communication primitive with minimal software involvement.

We propose software-defined hardware-managed queues for distributed computing architectures that enable efficient inter-tile communication by leveraging application-specific queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces the concept of an optional handler task, which is scheduled by hardware on demand. Queue and memory management, intra- and inter-tile data transfer, and handler task invocation are entirely handled by hardware. Only the dynamic queue creation at runtime is performed in software.
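The division of labor described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not show SHARQ's actual interface, so the names `SketchQueue`, `enqueue`, `dequeue` and `capacity_bytes` are hypothetical, and everything after queue creation, which the paper assigns to hardware, is emulated here in plain Python.

```python
from collections import deque
from typing import Callable, Deque, Optional


class SketchQueue:
    """Hypothetical software model of one queue.

    Only construction corresponds to the software-defined step.
    In SHARQ the remaining work is done by hardware; here it is emulated.
    """

    def __init__(self, capacity_bytes: int,
                 handler: Optional[Callable[[bytes], None]] = None) -> None:
        self.capacity_bytes = capacity_bytes   # memory reserved at creation
        self.used_bytes = 0                    # bytes currently queued
        self.handler = handler                 # optional handler task
        self.elements: Deque[bytes] = deque()

    def enqueue(self, payload: bytes) -> bool:
        """Store one arbitrarily sized element; return False if it does not fit."""
        if self.used_bytes + len(payload) > self.capacity_bytes:
            return False
        self.elements.append(payload)
        self.used_bytes += len(payload)
        if self.handler is not None:
            # Emulates on-demand handler invocation: drain the queue.
            self._run_handler()
        return True

    def dequeue(self) -> Optional[bytes]:
        """Remove and return the oldest element, or None if the queue is empty."""
        if not self.elements:
            return None
        payload = self.elements.popleft()
        self.used_bytes -= len(payload)
        return payload

    def _run_handler(self) -> None:
        payload = self.dequeue()
        while payload is not None:
            if self.handler is not None:
                self.handler(payload)
            payload = self.dequeue()


# Queue with a handler: elements are processed as soon as they arrive.
received = []
remote = SketchQueue(capacity_bytes=64, handler=received.append)
remote.enqueue(b"halo row 0")
remote.enqueue(b"x" * 32)

# Queue without a handler: the consumer dequeues explicitly.
plain = SketchQueue(capacity_bytes=8)
accepted = plain.enqueue(b"abcd")
rejected = plain.enqueue(b"too long!")  # 9 bytes would exceed the capacity
```

In this model the handler is optional, mirroring the abstract: a queue without one simply buffers elements of differing sizes until a consumer dequeues them.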

As an example use case, we integrated SHARQ into the MPI library. The evaluation with the MPI-based NAS benchmarks shows a reduction in execution time of up to 48% for the communication-intensive IS kernel in a \(4 \times 4\) tile design on an FPGA platform with a total of 80 LEON3 cores.
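As a rough reading of these figures, a 48% reduction in execution time corresponds to a speedup of about 1.9 in the best case, and 80 cores spread over 16 tiles average five cores per tile. The symbols \(T_{\text{base}}\) and \(T_{\text{SHARQ}}\) are introduced here for illustration only, and an even distribution of cores across tiles is an assumption that the abstract does not state.

```latex
\[
\frac{T_{\text{base}}}{T_{\text{SHARQ}}} = \frac{1}{1 - 0.48} \approx 1.92,
\qquad
\frac{80\ \text{cores}}{4 \times 4\ \text{tiles}} = 5\ \text{cores per tile (on average)}
\]
```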

Acknowledgements

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 146371743 – TRR 89: Invasive Computing. We also thank G. Drescher, J. Rabenstein and T. Langer from FAU, as well as A. Preißner, O. Lenke and L. Nolte from TUM for their excellent help.

Author information

Corresponding author

Correspondence to Sven Rheindt.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Rheindt, S., Maier, S., Schmaus, F., Wild, T., Schröder-Preikschat, W., Herkersdorf, A. (2019). SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_15

  • DOI: https://doi.org/10.1007/978-3-030-27562-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27561-7

  • Online ISBN: 978-3-030-27562-4

  • eBook Packages: Computer Science, Computer Science (R0)
