Abstract
Dynamic programming (DP) based change-point methods have shown very good statistical performance in DNA copy-number analysis. However, the quadratic algorithmic complexity of DP has limited their use on high-density arrays and next-generation sequencing data. This complexity issue is particularly critical for segmentation and calling of segments, and for the joint segmentation of many different profiles. Our contribution is two-fold. First, we provide a DP algorithm for segmentation and calling whose complexity is at worst linear, which allows DP-based segmentation to be used on high-density arrays at a considerably reduced computational cost. Second, for joint segmentation, we provide a parallel version of the cghseg package, which can now analyze more than 1,000 profiles of length 100,000 within a few hours. Our method and software package are therefore suited to the next generation of computers (multi-core) and experiments (very large profiles).
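To make the quadratic cost mentioned above concrete, here is a minimal sketch of the classical, unpruned DP for least-squares segmentation of a single profile. It is not the faster algorithm contributed by the chapter, nor the cghseg API; the names `segment_dp` and `seg_cost` are hypothetical, and a Gaussian mean-shift model with squared-error cost is assumed.

```python
from typing import List, Tuple


def segment_dp(y: List[float], n_segments: int) -> Tuple[float, List[int]]:
    """Classical DP: best least-squares split of y into n_segments pieces.

    Returns (total_cost, ends), where ends holds the exclusive end index
    of each segment (the last entry is len(y)).
    Illustrative only: the running time is O(n_segments * n^2).
    """
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")

    # Prefix sums let us evaluate the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_cost(i: int, j: int) -> float:
        # Sum of squared deviations from the mean over y[i:j], with i < j.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting y[:j] into k segments.
    cost = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            # Trying every start i of the last segment is the quadratic part.
            for i in range(k - 1, j):
                c = cost[k - 1][i] + seg_cost(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i

    # Backtrack from the full profile to recover the segment ends.
    ends: List[int] = []
    j = n
    for k in range(n_segments, 0, -1):
        ends.append(j)
        j = back[k][j]
    ends.reverse()
    return cost[n_segments][n], ends
```

For a toy profile such as `[0, 0, 0, 5, 5, 5, 1, 1]` with three segments, the sketch places the segment ends at indices 3, 6 and 8. The two inner loops over `j` and `i` are what make this approach impractical for profiles of length 100,000, which is the bottleneck the abstract refers to.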
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Rigaill, G., Miele, V., Picard, F. (2014). Fast and Parallel Algorithm for Population-Based Segmentation of Copy-Number Profiles. In: Formenti, E., Tagliaferri, R., Wit, E. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2013. Lecture Notes in Computer Science, vol 8452. Springer, Cham. https://doi.org/10.1007/978-3-319-09042-9_18
DOI: https://doi.org/10.1007/978-3-319-09042-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09041-2
Online ISBN: 978-3-319-09042-9
eBook Packages: Computer Science, Computer Science (R0)