
Porting and Benchmarking of BWAKIT Pipeline on OpenPOWER Architecture

  • Nagarajan Kathiresan
  • Rashid Al-Ali
  • Puthen Jithesh
  • Ganesan Narayanasamy
  • Zaid Al-Ars
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)

Abstract

Next Generation Sequencing (NGS) technology produces large volumes of genome data, which are processed using various open source bioinformatics tools. Configuring and compiling some bioinformatics tools (e.g. BWAKIT, root) is a challenging activity in its own right, not to mention the more elaborate porting effort these applications require on some architectures (e.g. IBM Power). Best practices for application porting require that (i) the semantics of the program or algorithm are not changed, (ii) the original and the modified (i.e., ported) source code generate the same output, even when the code is ported to a different architecture, and (iii) the output remains similar across architectures after porting. Burrows-Wheeler Aligner (BWA) is the most popular genome mapping application in the BWAKIT toolset. BWAKIT provides pre-compiled binaries for the x86_64 architecture and an end-to-end solution for genome mapping. In this paper, we show how to port the applications that BWAKIT ships as pre-built binaries to the OpenPOWER architecture and execute the BWAKIT pipeline successfully. Additionally, we demonstrate the validity of the output produced on OpenPOWER and present benchmarking results of BWAKIT applications that indicate the suitability of the highly multithreaded OpenPOWER architecture for executing these applications.
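Porting criteria (ii) and (iii) in the abstract amount to checking that the ported binaries reproduce the x86_64 results. The paper's own validation procedure is not described on this page, so the following is only a minimal Python sketch of one way such a check could be done, assuming both runs write uncompressed SAM text. It compares the two outputs as multisets of lines and skips `@PG` header lines, which record the invoking command line (binary path, options) and can therefore legitimately differ between platforms. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def comparable_lines(sam_lines: Iterable[str]) -> Counter:
    """Multiset of SAM lines expected to match across platforms.

    @PG header lines are skipped because they record the command line
    used to produce the file, which may differ between an x86_64 run
    and an OpenPOWER run even when the alignments are identical.
    """
    kept = Counter()
    for line in sam_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("@PG"):
            continue
        kept[line] += 1
    return kept


def same_output(sam_a: Iterable[str], sam_b: Iterable[str]) -> bool:
    """True if both SAM streams hold the same header and alignment lines,
    ignoring @PG lines and the order of records."""
    return comparable_lines(sam_a) == comparable_lines(sam_b)


def differing_lines(sam_a: Iterable[str], sam_b: Iterable[str]) -> Counter:
    """Lines present in one output but not in the other (symmetric difference)."""
    a, b = comparable_lines(sam_a), comparable_lines(sam_b)
    return (a - b) + (b - a)
```

The comparison is deliberately order-insensitive so that it tests the alignment content rather than the record order. A usage example, with hypothetical file names: `with open("x86_64.sam") as fa, open("ppc64le.sam") as fb: print(same_output(fa, fb))`. BAM output would first need converting to SAM text, e.g. with `samtools view -h`.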

Keywords

BWAKIT, Genome mapping, Burrows-Wheeler Aligner, Parallelization, Scalability, Efficiency, POWER architecture

Notes

Acknowledgement

The authors gratefully acknowledge the access to OpenPOWER hardware provided at the Forschungszentrum Jülich Supercomputing Center. Special thanks go to Dr. Dirk Pleiter and Dr. Marcus Richter, Jülich Supercomputing Center, Germany. The authors would also like to thank Mr. Jaideep Bajwa, Mr. Michael Dawson, and Dr. Yinhe Cheng for their help with the V8, K8, and trimadap source code modifications for the POWER architecture.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Nagarajan Kathiresan (1), corresponding author
  • Rashid Al-Ali (1)
  • Puthen Jithesh (1)
  • Ganesan Narayanasamy (2)
  • Zaid Al-Ars (3)
  1. Biomedical Informatics, Research Division, Sidra Medicine, Doha, Qatar
  2. OpenPOWER Leader in Education and Research, IBM India Ltd., Bangalore, India
  3. Quantum and Computer Engineering, Delft University of Technology, Delft, Netherlands
