Modeling and Optimizing Data Transfer in GPU-Accelerated Optical Coherence Tomography

  • Tobias Schrödter
  • David Pallasch
  • Sandra Wienke
  • Robert Schmitt
  • Matthias S. Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)


Signal processing for optical coherence tomography (OCT) has become a bottleneck for using OCT in medical and industrial applications. Recently, GPUs have gained importance as compute devices for achieving a video frame rate of 25 frames/s. We therefore develop a CUDA implementation of an OCT signal processing chain, focusing on reformulating the signal processing algorithms in terms of high-performance libraries such as CUBLAS and CUFFT. Additionally, we use NVIDIA’s stream concept to overlap computations and data transfers. Performance results are presented for two Pascal GPUs and validated against a derived performance model. The model estimates the overall execution time of the OCT signal processing chain, including compute and transfer times.
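
To illustrate why overlapping transfers and computation can shorten the overall execution time, the following is a minimal sketch of a generic pipeline estimate. It is not the performance model derived in the paper. It assumes the data is split into equally sized chunks, that host-to-device copy, compute, and device-to-host copy each run on their own engine, and that launch overheads are negligible. The function names and the example timings are hypothetical.

```python
def serial_time(h2d, compute, d2h, n_chunks):
    """Total time when every chunk is copied in, processed, and copied out
    strictly one step after another (no overlap)."""
    return n_chunks * (h2d + compute + d2h)


def overlapped_time(h2d, compute, d2h, n_chunks):
    """Total time for an idealized three-stage pipeline.

    Assumption: each stage (copy in, compute, copy out) has its own engine,
    so different chunks can occupy different stages at the same time, but a
    single engine handles only one chunk at a time.
    """
    stage_times = (h2d, compute, d2h)
    engine_free = [0.0, 0.0, 0.0]  # time at which each engine becomes idle
    for _ in range(n_chunks):
        ready = 0.0  # time at which this chunk finished its previous stage
        for stage, duration in enumerate(stage_times):
            start = max(ready, engine_free[stage])
            engine_free[stage] = start + duration
            ready = engine_free[stage]
    return engine_free[-1]


if __name__ == "__main__":
    # Made-up per-chunk timings in milliseconds, for illustration only.
    h2d, compute, d2h, n_chunks = 2.0, 3.0, 1.0, 4
    print("serial    :", serial_time(h2d, compute, d2h, n_chunks))
    print("overlapped:", overlapped_time(h2d, compute, d2h, n_chunks))
```

Under these assumptions the overlapped estimate equals the time for one chunk to pass through all three stages plus one repetition of the slowest stage for each additional chunk, so the slowest stage determines how much overlap can help.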


Keywords: GPU, OCT, Performance model, CUDA



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. RWTH Aachen University, Aachen, Germany
  2. Fraunhofer-Institute for Production Technology IPT, Aachen, Germany
  3. IT Center, RWTH Aachen University, Aachen, Germany
  4. Laboratory for Machine Tools and Production Engineering (WZL), RWTH Aachen University, Aachen, Germany
