Abstract
Whole genome amplification (WGA) have been applied to single cell copy number variations (CNVs) analysis, which is a common genomic mutation associated with various diseases and provides new insight for the fields of biology and medicine. However, the WGA-induced bias based on multiple displacement amplification (MDA) significantly limits sensitivity and specificity for CNVs detection. To address the limitations, an empirical algorithm for CNVs detection at single cell level was developed. This proposed method consists of base call amplification, alig- nment and analysis to remove the MDA-induced bias. We generated and analyzed about 50G short read data sets based on MDAsim, a software to amplify the chromosome 21 into various coverage. Simulation experiments have shown that the coverage tended to be less than average in genomic GC-enriched (>45 %) regions, implying a significant amplification bias within these regions. Base substitution error frequencies with G > A transversion is being among the most frequent and C > T, G > T transversions are among the least frequent substitution errors. The estimated substitution was employed to compensate errors to correct bias readings.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Navin, N., Kendall, J., Troge, J., Andrews, P., Rodgers, L., McIndoo, J., Cook, K., Stepansky, A., Levy, D., Esposito, D.: Tumour evolution inferred by single-cell sequencing. Nature 472(7341), 90–94 (2011)
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R.: The sequence alignment/map format and SAMtools. Bioinformatics 25(16), 2078–2079 (2009)
Benjamini, Y., Speed, T.P.: Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40(10), e72 (2012)
Dohm, J.C., Lottaz, C., Borodina, T., Himmelbauer, H.: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 36(16), e105 (2008)
Voet, T., Kumar, P., Van Loo, P., Cooke, S.L., Marshall, J., Lin, M., Esteki, M.Z., Van der Aa, N., Mateiu, L., McBride, D.J.: Single-cell paired-end genome sequencing reveals structural variation per cell cycle. Nucleic Acids Res (2013)
Hou, Y., Song, L., Zhu, P., Zhang, B., Tao, Y., Xu, X., Li, F., Wu, K., Liang, J., Shao, D.: Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148(5), 873–885 (2012)
Talkowski, M.E., Rosenfeld, J.A., Blumenthal, I., Pillalamarri, V., Chiang, C., Heilbut, A., Ernst, C., Hanscom, C., Rossin, E., Lindgren, A.M.: Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell 149(3), 525–537 (2012)
Dean, F.B., Nelson, J.R., Giesler, T.L., Lasken, R.S.: Rapid amplification of plasmid and phage DNA using phi29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11(6), 1095–1099 (2001)
Tagliavi, Z., Draghici, S.: MDAsim: A multiple displacement amplification simulator. In: 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1–4. IEEE (2012)
Paez, J.G., Lin, M., Beroukhim, R., Lee, J.C., Zhao, X., Richter, D.J., Gabriel, S., Herman, P., Sasaki, H., Altshuler, D.: Genome coverage and sequence fidelity of Φ29 polymerasee-based multiple strand displacement whole genome amplification. Nucleic Acids Res. 32(9), e71 (2004)
Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)
Arriola, E., Lambros, M.B., Jones, C., Dexter, T., Mackay, A., Tan, D.S., Tamber, N., Fenwick, K., Ashworth, A., Dowsett, M.: Evaluation of Phi29-based whole-genome amplification for microarray-based comparative genomic hybridisation. Lab. Invest. 87(1), 75–83 (2007)
Bredel, M., Bredel, C., Juric, D., Kim, Y., Vogel, H., Harsh, G.R., Recht, L.D., Pollack, J.R., Sikic, B.I.: Amplification of whole tumor genomes and gene-by-gene mapping of genomic aberrations from limited sources of fresh-frozen and paraffin-embedded DNA. J. Mol. Diagn. 7(2), 171–182 (2005)
Zhang, C., Zhang, C., Chen, S., Yin, X., Pan, X., Lin, G., Tan, Y., Tan, K., Xu, Z., Hu, P.: A single cell level based method for copy number variation analysis by low coverage massively parallel sequencing. PLoS ONE 8(1), e54236 (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xu, B., Li, T., Luo, Y., Xu, R., Cai, H. (2014). An Empirical Algorithm for Bias Correction Based on GC Estimation for Single Cell Sequencing. In: Peng, WC., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science(), vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-13186-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13185-6
Online ISBN: 978-3-319-13186-3
eBook Packages: Computer ScienceComputer Science (R0)