High-Performance Haplotype Assembly

Aldinucci, Marco; Bracciali, Andrea; Marschall, Tobias; Patterson, Murray; Pisanti, Nadia; Torquati, Massimo

doi:10.1007/978-3-319-24462-4_21

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8623)

Included in the following conference series:

International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Abstract

Haplotype assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem, which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractability (FPT), including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically raises the computational cost of such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT approach that solves MEC optimally. WhatsHap moves the complexity from fragment length to fragment overlap and is hence of particular interest given current trends in sequencing technology. pWhatsHap further improves the efficiency of solving the MEC problem, as shown by experiments performed on datasets with high coverage.
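
To make the MEC objective concrete, here is a minimal brute-force sketch. It is not the WhatsHap or pWhatsHap algorithm, which the abstract does not detail. It only illustrates the quantity being minimised: the number of fragment entries that must be flipped so that the fragments split into two conflict-free groups, one per haplotype. The function names, the string encoding of fragments ("0", "1", and "-" for an uncovered position), and the toy data are assumptions made for illustration. The search enumerates all 2^n bipartitions, so it is usable only on tiny inputs.

```python
from itertools import product


def mec_cost(fragments, assignment):
    """Count the flips needed to make both groups of a bipartition conflict-free.

    fragments  -- equal-length strings over "0", "1", "-" (hypothetical encoding)
    assignment -- tuple of 0/1 group labels, one per fragment
    """
    n_cols = len(fragments[0])
    cost = 0
    for side in (0, 1):
        rows = [f for f, a in zip(fragments, assignment) if a == side]
        for col in range(n_cols):
            zeros = sum(1 for r in rows if r[col] == "0")
            ones = sum(1 for r in rows if r[col] == "1")
            # Within one group, the minority allele in a column must be flipped.
            cost += min(zeros, ones)
    return cost


def brute_force_mec(fragments):
    """Return (minimum cost, one optimal assignment) by exhaustive search."""
    best = None
    for assignment in product((0, 1), repeat=len(fragments)):
        cost = mec_cost(fragments, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Toy input: five fragments over four SNP positions (made-up data).
fragments = ["010-", "0101", "101-", "-010", "1110"]
cost, _ = brute_force_mec(fragments)
print(cost)  # prints 1
```

In this toy input, the first, third, and fifth fragments conflict pairwise, so no split into two groups is error-free and at least one correction is required. The exhaustive search is exponential in the number of fragments; fixed-parameter approaches such as the one described in the abstract instead confine the exponential factor to a parameter of the input.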

Author information

Authors and Affiliations

Computer Science and Mathematics, Stirling University, Stirling, UK
Andrea Bracciali
ERABLE team, INRIA, Computer Science Department, University of Pisa, Pisa, Italy
Nadia Pisanti & Massimo Torquati
Computer Science Department, University of Torino, Torino, Italy
Marco Aldinucci
Center for Bioinformatics, Saarland University, Saarbrücken, Germany
Tobias Marschall
Computational Biology and Applied Algorithmics, Max Planck Inst. for Informatics, Saarbrücken, Germany
Tobias Marschall
Lab. Biométrie et Biologie Evolutive, University Lyon, Lyon, France
Murray Patterson

Editor information

Editors and Affiliations

CUSSB, University "Vita-Salute" San Raffaele, Milano, Italy
Clelia Di Serio
The Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
Pietro Liò
CUSSB, Università Vita-Salute San Raffaele, Milano, Italy
Alessandro Nonis
Dipartimento di Informatica, Università degli Studi di Salerno, Fisciano, Salerno, Italy
Roberto Tagliaferri

About this paper

Cite this paper

Aldinucci, M., Bracciali, A., Marschall, T., Patterson, M., Pisanti, N., Torquati, M. (2015). High-Performance Haplotype Assembly. In: Di Serio, C., Liò, P., Nonis, A., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2014. Lecture Notes in Computer Science, vol 8623. Springer, Cham. https://doi.org/10.1007/978-3-319-24462-4_21

DOI: https://doi.org/10.1007/978-3-319-24462-4_21
Published: 18 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24461-7
Online ISBN: 978-3-319-24462-4
eBook Packages: Computer Science, Computer Science (R0)
