Performance Issues of Parallel, Scalable Convolutional Neural Networks in Deep Learning

  • Umesh Chavan
  • Dinesh Kulkarni
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 810)


In this work, we investigate performance issues in parallel, scalable Convolutional Neural Networks (CNNs), with the aim of accelerating CNN training. We propose a parallel recognition approach using Compute Unified Device Architecture (CUDA) technology and the Message Passing Interface (MPI). We demonstrate the scalability and performance that can be achieved on the GPU with the CUDA framework when computation-intensive tasks are shifted to the GPU, and we compare the results on GPU hardware with those of the serial algorithm on the CPU. The main novelty of our method is a new scalable CNN architecture that integrates a category hierarchy with a deep CNN.
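As a rough illustration of why the convolution step lends itself to this kind of offloading, the sketch below computes a small "valid" 2D convolution (cross-correlation, as is usual in CNNs) once serially and once with the output rows split across workers. It is not the authors' CUDA/MPI implementation: the function names, the row-wise split, and the use of Python threads are assumptions chosen only to show that each output element depends on the input alone and can be computed independently. Python threads will not give a real speedup here; on a GPU each output element would typically map to its own thread.

```python
from concurrent.futures import ThreadPoolExecutor


def conv2d_row(image, kernel, row):
    """Compute one output row of a 'valid' 2D cross-correlation."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_w = len(image[0]) - k_w + 1
    out_row = []
    for col in range(out_w):
        acc = 0.0
        # Each output element reads only the input window under the kernel.
        for i in range(k_h):
            for j in range(k_w):
                acc += image[row + i][col + j] * kernel[i][j]
        out_row.append(acc)
    return out_row


def conv2d_serial(image, kernel):
    """Baseline: compute the output rows one after another."""
    out_h = len(image) - len(kernel) + 1
    return [conv2d_row(image, kernel, r) for r in range(out_h)]


def conv2d_parallel(image, kernel, workers=4):
    """Hypothetical row-wise decomposition: one task per output row."""
    out_h = len(image) - len(kernel) + 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the rows in the result.
        return list(pool.map(lambda r: conv2d_row(image, kernel, r), range(out_h)))


# Toy 5x5 input and 2x2 kernel (illustrative values, not from the paper).
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0], [0.0, -1.0]]

serial = conv2d_serial(image, kernel)
parallel = conv2d_parallel(image, kernel)
```

Both versions produce the same 4x4 output, because the row tasks share no intermediate state. That independence is the property a GPU or multi-process implementation would rely on.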


Convolutional neural network, Deep learning, MPI, CUDA



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of IT, Walchand College of Engineering, Sangli, India
