Abstract
Emerging multi- and many-core computer architectures pose new challenges for the efficient exploitation of parallelism. It is also currently unclear which parallel programming paradigm is most appropriate for such architectures, from both the efficiency and the software engineering point of view. Moreover, high performance computing techniques and supercomputers will be essential for coping with the explosive accumulation of sequence data. We address these issues through a thorough performance study using RAxML, a widely used Bioinformatics application for large-scale phylogenetic inference under the Maximum Likelihood (ML) criterion, as an example. We provide an overview of the respective parallelization strategies with MPI, Pthreads, and OpenMP and assess the performance of these approaches on a large variety of parallel architectures. The results indicate that no single paradigm is universally best suited for parallelizing the ML function with respect to efficiency and portability. We therefore suggest that the ML function be parallelized with MPI and Pthreads, based on software engineering criteria and in order to enforce data locality.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stamatakis, A., Ott, M. (2008). Exploiting Fine-Grained Parallelism in the Phylogenetic Likelihood Function with MPI, Pthreads, and OpenMP: A Performance Study. In: Chetty, M., Ngom, A., Ahmad, S. (eds) Pattern Recognition in Bioinformatics. PRIB 2008. Lecture Notes in Computer Science, vol 5265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88436-1_36
DOI: https://doi.org/10.1007/978-3-540-88436-1_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88434-7
Online ISBN: 978-3-540-88436-1
eBook Packages: Computer Science, Computer Science (R0)