Towards a GPU Cloud: Benefits and Security Issues

  • Flavio Lombardi
  • Roberto Di Pietro
Part of the Computer Communications and Networks book series (CCN)


Graphics processing unit (GPU)-based clouds are gaining momentum, and GPU computing resources are starting to be offered as a cloud service, either as parallel computing power or as part of a leased virtual machine (VM). For this reason, the GPU cloud is one of the most promising cloud evolutions. However, present cloud offerings do not effectively exploit GPU computing resources, which could improve both the performance and the security of distributed computing systems. Heterogeneous many-core hardware, and GPUs in particular, offers a potentially massive increase in computing power. GPUs are also very power efficient, enabling significant price/performance improvements over traditional central processing units (CPUs). More importantly, GPU clouds do not guarantee an adequate level of security with respect to access control and isolation: there is no effective control over how parallel code (a.k.a. kernels) is actually executed on a GPU. Indeed, present GPU device drivers are entirely based on proprietary code and are optimized for performance rather than security. As a result, GPU architectures and hardware (HW)/software (SW) implementations are not yet considered mature enough for a GPU cloud. In particular, the level of security offered by this novel approach has yet to be fully investigated, as only limited security-related research specifically targets GPU architectures. This chapter describes how GPU-as-a-Service can be exposed to misuse, potential denial of service (DoS), and information leakage. It also shows how the cloud can use GPUs as a security and integrity monitoring tool, for instance, to provide timely integrity checking of VM code and data, allowing scalable management of the security of complex cloud computing infrastructures. Further relevant security concerns are also discussed, including GPU service availability, access transparency, and control.


Graphics processing unit, GPU, Isolation, Many-core architecture, Multithreading, Privacy, Security



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. Springer Research Group, Maths and Physics Department, University of Roma Tre, Rome, Italy
