CRiSPy-CUDA: Computing Species Richness in 16S rRNA Pyrosequencing Datasets with CUDA
Pyrosequencing technologies are frequently used for sequencing the 16S rRNA marker gene for metagenomic studies of microbial communities. Computing a pairwise genetic distance matrix from the produced reads is an important but highly time-consuming task. In this paper, we present a parallelized tool (called CRiSPy) for scalable pairwise genetic distance matrix computation and clustering that is based on the processing pipeline of the popular ESPRIT software package. To achieve high computational efficiency, we have designed massively parallel CUDA algorithms for pairwise k-mer distance and pairwise genetic distance computation. We have also implemented a memory-efficient sparse matrix clustering program to process the distance matrix. On a single GPU, CRiSPy achieves speedups of around two orders of magnitude compared to the sequential ESPRIT program for both the time-consuming pairwise genetic distance module and the whole processing pipeline, thus making CRiSPy particularly suitable for high-throughput microbial studies.
Keywords: Metagenomics, Pyrosequencing, Alignment, CUDA, MPI
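To make the pairwise k-mer distance step mentioned in the abstract concrete, here is a minimal sequential sketch in Python. It assumes the common word-count definition of k-mer distance (one minus the number of shared k-mers divided by the number of k-mers in the shorter read). The function names, the value of `k`, and the `threshold` are illustrative assumptions, not taken from CRiSPy. The CUDA kernels described in the paper are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=6):
    """Assumed definition: 1 - shared k-mers / k-mers in the shorter read."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both reads must be at least k characters long")
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A k-mer occurring n times in a and m times in b is shared min(n, m) times.
    shared = sum(min(n, cb[kmer]) for kmer, n in ca.items())
    return 1.0 - shared / denom


def candidate_pairs(reads, k=6, threshold=0.5):
    """Yield (i, j, d) for read pairs whose k-mer distance is at most threshold.

    The threshold is a hypothetical parameter used here to illustrate keeping
    only close pairs, which keeps the resulting distance matrix sparse.
    """
    for i, j in combinations(range(len(reads)), 2):
        d = kmer_distance(reads[i], reads[j], k)
        if d <= threshold:
            yield i, j, d
```

Every pair of reads is scored independently of every other pair, which is the property that makes this kind of computation a natural fit for a massively parallel GPU implementation.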