Informatics for PacBio Long Reads

  • Yuta Suzuki
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1129)


In this article, we review the wide variety of bioinformatics software, implementing state-of-the-art algorithms, that has been developed since SMRT sequencing technology was introduced to the field. We focus on three major categories of development: read mapping (alignment to reference genomes), de novo assembly, and structural variant detection. Long SMRT reads benefit all of these applications, but those benefits can be realized only when the characteristics of long-read technology, such as read length and error profile, are properly taken into account.



I would like to thank Yoshihiko Suzuki, Yuichi Motai, and Prof. Shinichi Morishita for their insightful comments on the draft.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
