Simulation and Application Performance Evaluation Using GPU Through CUDA C & Deep Learning in TensorFlow

  • Ajeet Kumar
  • Abhishek Khanna
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 799)


GPUs have recently attracted the attention of many application developers as commodity data-parallel coprocessors. The newest generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of traditional GPUs. This opportunity should redirect efforts in GPU research toward establishing principles and strategies that allow efficient mapping of computation to graphics hardware. This paper presents the organization and features of the GeForce GTX 560 Ti processor and generalized optimization strategies for it. The key to performance on this platform is using massive multithreading to exploit the large number of cores and hide global memory latency. To achieve this, developers face the challenge of striking the right balance between each thread's resource usage and the number of simultaneously active threads. The resources to manage include the number of registers and the amount of on-chip memory used per thread, the number of threads per multiprocessor, and global memory bandwidth. Increased performance is also obtained by reordering accesses to off-chip memory so that requests for the same or adjacent memory locations are combined, and by applying classical optimizations that reduce the number of executed operations. These strategies are applied across a variety of applications and domains and achieve between a 10.5x and 14x application speedup. A similar result was achieved with the single-core GPU using a deep-learning technique in the TensorFlow framework.
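The two strategies summarized above, massive multithreading to hide global-memory latency and combining accesses to adjacent memory locations, can be sketched in CUDA C roughly as follows. This is an illustrative sketch, not the authors' code; the kernel name `scale_kernel` and the problem size `N` are assumptions for the example.

```cuda
// Minimal sketch: coalesced global-memory access with massive multithreading.
#include <cuda_runtime.h>
#include <stdio.h>

#define N (1 << 20)  // illustrative problem size

// Each thread touches one consecutive element, so the 32 accesses of a
// warp fall on adjacent addresses and are coalesced by the hardware
// into as few global-memory transactions as possible.
__global__ void scale_kernel(float *data, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        data[i] *= factor;
}

int main(void) {
    float *d_data;
    cudaMalloc(&d_data, N * sizeof(float));

    // Launch far more threads than physical cores: the scheduler
    // overlaps the memory stalls of some warps with the arithmetic of
    // others, hiding global-memory latency.
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_data, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The block size of 256 is a common starting point; as the abstract notes, the right value depends on per-thread register and on-chip memory usage, since higher per-thread resource consumption reduces the number of simultaneously active threads per multiprocessor.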

Keywords

Design · Performance · Languages · GPU · Parallel processing · Deep learning · Neural nets · TensorFlow



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Birla Institute of Technology and Science, Pilani, India
  2. Maharaja Surajmal Institute of Technology, GGSIPU, Delhi, India
