qCUDA-ARM: Virtualization for Embedded GPU Architectures

  • Conference paper

Internet of Vehicles. Technologies and Services Toward Smart Cities (IOV 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11894)

Abstract

The emergence of the Internet of Things (IoT) is changing how computing resources are acquired, shifting from centralized cloud data centers to distributed, pervasive edge nodes. To cope with the small-volume but highly diverse nature of IoT devices and applications, two research trends are being investigated for the system design of edge nodes: heterogeneity and virtualization. In this paper, we consider the integration of these two important trends and present qCUDA-ARM, a virtualization system for embedded GPU architectures. The design of qCUDA-ARM is based on the framework of qCUDA, a GPU virtualization system for x86 servers. Because of the architectural differences between x86 servers and ARM-based embedded systems, several subsystems of qCUDA-ARM, such as memory management, had to be redesigned. We evaluated the performance of qCUDA-ARM with three CUDA benchmarks and two real-world applications. For computation-intensive jobs, qCUDA-ARM reaches performance similar to that of the native system; for memory-bound programs, it achieves up to 90% of native performance.
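
The full text describes the design in detail; as a rough, hypothetical illustration of the API-remoting idea behind qCUDA-style virtualization (guest-side CUDA runtime calls are packed into commands and forwarded to the host GPU through a virtio-backed channel), the sketch below shows what a guest shim for cudaMalloc might look like. All names here (/dev/qcuda, qcu_cmd, QCU_CUDA_MALLOC) are invented for illustration and are not taken from the actual qCUDA or qCUDA-ARM implementation.

```c
/* Hypothetical sketch of guest-side API remoting: a shim library
 * intercepts a CUDA runtime call, packs it into a command block,
 * and forwards it to the host through a character device backed
 * by a virtio transport. Names and layout are illustrative only. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Command identifiers and the argument block exchanged with the host. */
enum { QCU_CUDA_MALLOC = 1, QCU_CUDA_MEMCPY = 2, QCU_CUDA_FREE = 3 };

struct qcu_cmd {
    uint32_t op;        /* which CUDA runtime call to perform on the host */
    uint64_t size;      /* allocation or copy size in bytes */
    uint64_t dev_ptr;   /* device pointer returned by / passed to the host */
    int32_t  result;    /* cudaError_t value filled in by the host */
};

/* Guest-side replacement for cudaMalloc(): pack the request, hand it
 * to the front-end driver, and return the host's answer. */
static int shim_cudaMalloc(void **devPtr, size_t size)
{
    struct qcu_cmd cmd = { .op = QCU_CUDA_MALLOC, .size = size };
    int fd = open("/dev/qcuda", O_RDWR);        /* hypothetical device node */
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, QCU_CUDA_MALLOC, &cmd);  /* forwarded over virtio */
    close(fd);
    if (rc < 0 || cmd.result != 0)
        return -1;
    *devPtr = (void *)(uintptr_t)cmd.dev_ptr;   /* opaque host-side pointer */
    return 0;
}

int main(void)
{
    void *d = NULL;
    if (shim_cudaMalloc(&d, 1 << 20) == 0)
        printf("allocated 1 MiB on the virtualized GPU at %p\n", d);
    return 0;
}
```

On ARM-based boards with unified memory, the memory-management path of such a shim (how guest buffers are pinned and exposed to the host GPU) is exactly the part the paper reports having to redesign relative to the x86 version.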

Author information

Corresponding author

Correspondence to Che-Rung Lee.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, BY., Lee, CR. (2020). qCUDA-ARM: Virtualization for Embedded GPU Architectures. In: Hsu, CH., Kallel, S., Lan, KC., Zheng, Z. (eds) Internet of Vehicles. Technologies and Services Toward Smart Cities. IOV 2019. Lecture Notes in Computer Science, vol 11894. Springer, Cham. https://doi.org/10.1007/978-3-030-38651-1_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-38651-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38650-4

  • Online ISBN: 978-3-030-38651-1

  • eBook Packages: Computer Science, Computer Science (R0)
