GPGPU Computing for Cloud Auditing



As the computational complexity of cloud auditing and other data-intensive analysis applications grows, so does the need for computing platforms that can handle massive data sets and perform rapid analysis. These needs are met by systems equipped with accelerators, such as Graphics Processing Units (GPUs), which perform data analysis with a high degree of parallelism, often in combination with frameworks like Hadoop MapReduce that distribute massively parallel computing jobs. Applying GPUs to general-purpose computation is known as GPGPU computing. This chapter takes an introductory approach to the basics of GPUs and GPGPU computing and their application to cloud computing and the handling of large data sets. The main aim is to give the reader a broad background on how GPGPUs are used and how they contribute to advances in cloud auditing.
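
To make this data-parallel execution model concrete, the following minimal CUDA sketch shows many GPU threads applying the same instruction stream to different elements of an array, the Single Instruction Multiple Data pattern that also underlies the map stage of GPU MapReduce frameworks. It is an illustrative example written for this overview, not code from the chapter; the scale kernel, array size, and block size are arbitrary assumptions, and a CUDA-capable device is assumed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Each thread scales one array element: one instruction stream,
   many data elements (Single Instruction, Multiple Data). */
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)                                      /* guard the final partial block */
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;                 /* 1M elements (illustrative size) */
    size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    /* Stage the data in device (GPU) memory. */
    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    /* Launch enough 256-thread blocks to cover all n elements. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, 2.0f, n);

    /* Copy the result back and check one element. */
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);           /* expect 2.000000 */

    cudaFree(d);
    free(h);
    return 0;
}
```

The explicit copies between host and device memory highlight a recurring theme in GPGPU computing: data must be staged into GPU memory before kernels can run, and minimizing these transfers is often as important for performance as the kernel itself.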


Keywords: Cloud Computing · Graphics Processing Unit · Intrusion Detection · Graphics Hardware · Single Instruction Multiple Data



This material is based upon work partially supported by the National Science Foundation (NSF) Engineering Research Centers (ERC) Innovations Program grant EEC-0946463, MathWorks, and the Air Force Office of Scientific Research (AFOSR)/Air Force Research Laboratory (AFRL) LRIR 11RI01COR. The authors would like to thank James Brock for developing the OpenCL code examples, and Devon Yablonski for writing the conjugate gradient GPU code and analyzing the performance of GPU computing. This work was completed while James and Devon were graduate students working under the supervision of Miriam Leeser at Northeastern University.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Air Force Research Laboratory, WPAFB, USA
  2. Northeastern University, Boston, USA
