Fast, Sub-pixel Accurate Digital Image Correlation Algorithm Powered by Heterogeneous (CPU-GPU) Framework
Digital Image Correlation (DIC) is a popular non-contact, image-based, full-field deformation measurement tool widely used in mechanics. Despite its significant advantages, it is still used primarily as a post-processing tool because of its computational cost. In recent years, parallel computing platforms such as multi-core processors and Graphics Processing Units (GPUs) have been used to speed up the DIC algorithm, and GPUs are particularly well suited to data-parallel operations. Previous GPU-based DIC implementations allocate each sub-image (i.e., a small collection of pixels in the local neighborhood of a point of interest) to a single GPU thread, thus achieving parallelism across sub-images. However, this is not the only possible type of parallelism: one can also exploit parallelism within a sub-image, as well as across whole images. The aim of this work is to implement 2D-DIC efficiently, so that parallelism both within and across sub-images considerably reduces computation time. We use a heterogeneous framework consisting of an Intel Xeon octa-core CPU and an Nvidia Tesla K20C GPU card. The CPU handles image pre-processing, whereas the GPU processes four compute-intensive tasks: affine shape function computation, B-Spline interpolation, residual vector calculation and deformation vector update. Parallelization within and across sub-images is achieved through efficient thread handling and the use of pre-compiled BLAS libraries. To estimate the speedup provided by the GPU, the same four tasks were also evaluated on the octa-core CPU. For a single sub-image, the speedup fell from approximately 7 times to approximately 5 times as the sub-image size increased from 21×21 to 61×61 pixels.
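The per-pixel work that can be parallelized within a sub-image is illustrated by the minimal, serial Python sketch below. It is not the authors' implementation and rests on several assumptions. It uses the standard first-order (affine) shape function with parameters p = (u, u_x, u_y, v, v_x, v_y) in that assumed order, and it defines the residual as a plain intensity difference G - F between the deformed and reference images, although the paper's correlation criterion may differ. It also substitutes bilinear interpolation for the B-Spline interpolation used in the paper. The helper names `affine_warp`, `bilinear` and `residual_vector` are hypothetical.

```python
import math


def affine_warp(x0, y0, dx, dy, p):
    """First-order (affine) shape function.

    (x0, y0) is the sub-image centre, (dx, dy) a pixel offset from it, and
    p = (u, u_x, u_y, v, v_x, v_y) the deformation parameters (assumed order).
    Returns the warped position (x, y) in the deformed image.
    """
    u, ux, uy, v, vx, vy = p
    xw = x0 + dx + u + ux * dx + uy * dy
    yw = y0 + dy + v + vx * dx + vy * dy
    return xw, yw


def bilinear(img, x, y):
    """Bilinear interpolation of a row-major 2D list at column x, row y.

    Stand-in for the paper's B-Spline interpolation; assumes
    0 <= x < width - 1 and 0 <= y < height - 1.
    """
    xi, yi = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - xi, y - yi
    return ((1 - fx) * (1 - fy) * img[yi][xi]
            + fx * (1 - fy) * img[yi][xi + 1]
            + (1 - fx) * fy * img[yi + 1][xi]
            + fx * fy * img[yi + 1][xi + 1])


def residual_vector(ref, cur, x0, y0, half, p):
    """Residuals G(warped pixel) - F(pixel) for a (2*half+1) x (2*half+1) sub-image.

    Each entry depends only on its own pixel, so on a GPU every iteration of
    the inner loop could run as a separate thread (parallelism within a
    sub-image). Here the loops simply run serially.
    """
    r = []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            xw, yw = affine_warp(x0, y0, dx, dy, p)
            r.append(bilinear(cur, xw, yw) - ref[y0 + dy][x0 + dx])
    return r
```

Because every residual entry depends only on its own pixel, the body of the inner loop is the natural unit of work to assign to one GPU thread. For example, if the reference image is a linear intensity ramp and the deformed image is the same ramp shifted by 0.5 pixel in x, calling `residual_vector` with p = (0.5, 0, 0, 0, 0, 0) returns residuals that are zero up to rounding, since bilinear interpolation reproduces linear functions exactly.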
For a larger number of sub-images, the GPU speedup was expected to be higher, and this is indeed the case: when the affine shape function computation and B-Spline interpolation steps were evaluated on 1869 sub-images of 21×21 pixels, the speedup was approximately 453 times. Further GPU optimization, as well as parallelization across image pairs, is currently underway, and even faster GPU-assisted DIC appears achievable.
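A rough count of independent work items helps explain why the speedup might grow with the number of sub-images. The Python sketch below uses a hypothetical indexing scheme, not the authors' kernel layout. It flattens the pixels of all sub-images into one index space, as a CUDA launch with one thread per (sub-image, pixel) pair might, and it computes the number of thread blocks needed for an assumed block size of 256 threads. The function names are illustrative only.

```python
def work_items(num_subsets, half):
    """Total independent per-pixel work items for square (2*half+1)^2 sub-images."""
    side = 2 * half + 1
    return num_subsets * side * side


def decompose(global_id, half):
    """Map a flat thread index to (sub-image index, dx, dy).

    Hypothetical layout: the pixels of one sub-image are contiguous and stored
    row-major, with offsets dx, dy in [-half, half] from the sub-image centre.
    """
    side = 2 * half + 1
    subset, k = divmod(global_id, side * side)
    row, col = divmod(k, side)
    return subset, col - half, row - half


def grid_size(n_items, threads_per_block=256):
    """Thread blocks needed to cover n_items, i.e. ceil(n_items / threads_per_block)."""
    return (n_items + threads_per_block - 1) // threads_per_block


if __name__ == "__main__":
    n = work_items(1869, 10)    # 1869 sub-images of 21x21 pixels
    print(n, grid_size(n))      # prints: 824229 3220
```

Under this layout, a single 21×21 sub-image provides only 441 work items (two blocks of 256 threads), whereas 1869 such sub-images provide 824,229 work items (3220 blocks). The second case gives the GPU far more independent work to spread across its cores. This is one plausible contributor to the larger speedup, not an analysis reported by the authors.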
Keywords: Full-field displacement; Sub-image; Parallel computing; Heterogeneous framework; Compute Unified Device Architecture (CUDA); Thread; Kernel