Abstract
Recent advances in next-generation sequencing (NGS) technologies have led to the identification of structural variants (SVs) of genomic DNA in the human population. Several SV detection methods that use NGS data have been proposed. However, analyzing NGS data remains difficult, particularly when handling reads from duplicated loci or low-complexity sequences of the human genome. In this paper, we propose SVEM, a novel statistical method that detects SVs at single-nucleotide resolution and can utilize multi-mapped reads on breakpoints. SVEM estimates the number of reads on breakpoints as parameters and the mapping states as latent variables using the expectation-maximization (EM) algorithm. This framework enables us to handle ambiguous mapping of reads without discarding information for SV detection. We apply SVEM to simulated and real data, where it achieves better precision and recall than existing methods.
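To make the idea of treating mapping states as latent variables concrete, the following is a minimal sketch of a generic EM loop for assigning multi-mapped reads to candidate breakpoints. It is not the SVEM model itself, whose likelihood is not given in the abstract. The function name `em_multimapped`, the input layout `read_candidates`, and the breakpoint labels are all hypothetical, and the per-alignment likelihoods are assumed to be supplied by the caller.

```python
from typing import Dict, List


def em_multimapped(
    read_candidates: List[Dict[str, float]],
    n_iter: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Estimate the fraction of reads originating from each breakpoint.

    read_candidates[i] maps a (hypothetical) breakpoint id to the alignment
    likelihood of read i at that breakpoint. A read with several keys is
    multi-mapped. This is a generic mixture-model EM, assumed for
    illustration only.
    """
    breakpoints = sorted({b for cands in read_candidates for b in cands})
    if not breakpoints:
        return {}

    # Start from a uniform read fraction over all candidate breakpoints.
    theta = {b: 1.0 / len(breakpoints) for b in breakpoints}

    for _ in range(n_iter):
        # E-step: posterior probability of each mapping state for each read,
        # accumulated as expected (fractional) read counts per breakpoint.
        counts = {b: 0.0 for b in breakpoints}
        for cands in read_candidates:
            weights = {b: theta[b] * lik for b, lik in cands.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read carries no usable information
            for b, w in weights.items():
                counts[b] += w / total

        # M-step: re-estimate read fractions from the expected counts.
        n = sum(counts.values())
        if n == 0.0:
            break
        new_theta = {b: counts[b] / n for b in breakpoints}

        # Stop once the largest parameter change falls below the tolerance.
        delta = max(abs(new_theta[b] - theta[b]) for b in breakpoints)
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy input: three reads unique to bpA, one unique to bpB, two ambiguous.
reads = (
    [{"bpA": 1.0}] * 3
    + [{"bpB": 1.0}]
    + [{"bpA": 1.0, "bpB": 1.0}] * 2
)
fractions = em_multimapped(reads)
```

On this toy input the estimates converge to about 0.75 for `bpA` and 0.25 for `bpB`: the two ambiguous reads are shared in proportion to the evidence from the uniquely mapped reads rather than being discarded.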
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ohtsuki, T. et al. (2014). SVEM: A Structural Variant Estimation Method Using Multi-mapped Reads on Breakpoints. In: Dediu, AH., Martín-Vide, C., Truthe, B. (eds) Algorithms for Computational Biology. AlCoB 2014. Lecture Notes in Computer Science, vol 8542. Springer, Cham. https://doi.org/10.1007/978-3-319-07953-0_17
DOI: https://doi.org/10.1007/978-3-319-07953-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07952-3
Online ISBN: 978-3-319-07953-0
eBook Packages: Computer Science, Computer Science (R0)