Abstract
Genome sequencing centers are flooding the scientific community with data. A single sequencing machine can now generate more data in one day than any machine could have produced during the whole of 2005. As a result, there is strong worldwide demand for efficient algorithms to compress sequencing data. Here, we describe GReEn (Genome Resequencing Encoding), a recently proposed tool that compresses genome resequencing data using a reference genome sequence.
Notes
1. The authors are with the Signal Processing Lab, IEETA, DETI, University of Aveiro, 3810-193 Aveiro, Portugal.
Copyright information
© 2013 Springer Science+Business Media New York
About this protocol
Cite this protocol
Pinho, A.J., Pratas, D., Garcia, S.P. (2013). Compressing Resequencing Data with GReEn. In: Shomron, N. (eds) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 1038. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-514-9_2
DOI: https://doi.org/10.1007/978-1-62703-514-9_2
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-513-2
Online ISBN: 978-1-62703-514-9
eBook Packages: Springer Protocols