Modeling and Optimizing Data Transfer in GPU-Accelerated Optical Coherence Tomography
Signal processing has become a bottleneck for the use of optical coherence tomography (OCT) in medical and industrial applications. Recently, GPUs have gained importance as compute devices for reaching a video frame rate of 25 frames/s. We therefore develop a CUDA implementation of an OCT signal processing chain, focusing on reformulating the signal processing algorithms in terms of high-performance libraries such as CUBLAS and CUFFT. Additionally, we use NVIDIA's stream concept to overlap computations and data transfers. Performance results are presented for two Pascal GPUs and validated against a performance model that we derive. The model estimates the overall execution time of the OCT signal processing chain, including compute and transfer times.
Keywords: GPU, OCT, Performance model, CUDA
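To illustrate why overlapping transfers and computation matters for the total execution time, the following is a minimal sketch of an idealized pipeline estimate. It is not the performance model derived in the paper. It assumes the data is split into equally sized chunks, that host-to-device copy, kernel execution, and device-to-host copy can each run concurrently with the other two stages, and that per-chunk stage times are constant. The function names and the example timings are hypothetical.

```python
def serial_time(n_chunks, t_h2d, t_kernel, t_d2h):
    """Total time when every chunk is copied in, processed, and copied out
    strictly one stage after another (no overlap)."""
    return n_chunks * (t_h2d + t_kernel + t_d2h)


def overlapped_time(n_chunks, t_h2d, t_kernel, t_d2h):
    """Idealized three-stage pipeline estimate (assumption, not the paper's model).

    The first chunk passes through all three stages; every further chunk
    adds only the duration of the slowest stage, because the other two
    stages are hidden behind it.
    """
    if n_chunks <= 0:
        return 0.0
    fill = t_h2d + t_kernel + t_d2h
    bottleneck = max(t_h2d, t_kernel, t_d2h)
    return fill + (n_chunks - 1) * bottleneck


if __name__ == "__main__":
    # Hypothetical per-chunk timings in milliseconds.
    n, h2d, kernel, d2h = 4, 2.0, 3.0, 1.0
    print("serial    :", serial_time(n, h2d, kernel, d2h), "ms")
    print("overlapped:", overlapped_time(n, h2d, kernel, d2h), "ms")
```

Under these assumptions the benefit of overlap grows with the number of chunks and is limited by the slowest of the three stages, which is why a model of the full chain has to account for transfer times as well as compute times.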