
GPGPU Computing for Cloud Auditing

Chapter in: High Performance Cloud Auditing and Applications

Abstract

With the increasing computational complexity of cloud auditing and other data-intensive analysis applications, there is a growing need for computing platforms that can handle massive data sets and perform rapid analysis. These needs are met by systems with accelerators such as Graphics Processing Units (GPUs), which perform data analysis with a high degree of parallelism, often in combination with tools like Hadoop MapReduce that distribute massively parallel computing jobs. Applying GPUs to general-purpose processing is known as GPGPU computing. This chapter takes an introductory approach to the basics of GPUs and GPGPU computing and their application to cloud computing and the handling of large data sets. The main aim is to give the reader a broad background on how GPGPUs are used and how they contribute to advances in cloud auditing.

“Approved for Public Release; Distribution Unlimited: 88ABW-2013-0081, 09-Jan-2013”
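The MapReduce pattern the abstract refers to can be sketched in a few lines. This is a minimal single-process illustration of the map/shuffle/reduce stages, not Hadoop's actual API: in Hadoop the map and reduce tasks run in parallel across a cluster, and GPU MapReduce frameworks such as Mars execute the map and reduce stages as GPU kernels.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs: here, (word, 1) for a word count."""
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's list of values: here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}

records = ["gpu cloud audit", "gpu audit"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'gpu': 2, 'cloud': 1, 'audit': 2}
```

Because each map call and each per-key reduce is independent, the pattern parallelizes naturally, which is why it maps well onto both clusters and the thousands of threads a GPU provides.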



Acknowledgements

This material is based upon work partially supported by the National Science Foundation (NSF) Engineering Research Centers (ERC) Innovations Program grant EEC-0946463, by MathWorks, and by the Air Force Office of Scientific Research (AFOSR)/Air Force Research Laboratory (AFRL) LRIR 11RI01COR. The authors thank James Brock for developing the OpenCL code examples and Devon Yablonski for writing the conjugate gradient GPU code and analyzing the performance of GPU computing. Both were graduate students working under the supervision of Miriam Leeser at Northeastern University when this work was completed.

Correspondence to Virginia W. Ross.


Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Ross, V.W., Leeser, M.E. (2014). GPGPU Computing for Cloud Auditing. In: Han, K., Choi, B.Y., Song, S. (eds.) High Performance Cloud Auditing and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3296-8_10

  • DOI: https://doi.org/10.1007/978-1-4614-3296-8_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3295-1

  • Online ISBN: 978-1-4614-3296-8

  • eBook Packages: Engineering (R0)
