Compressing Resequencing Data with GReEn

  • Protocol
Deep Sequencing Data Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 1038)

Abstract

Genome sequencing centers are flooding the scientific community with data. A single sequencing machine can nowadays generate more data in one day than any existing machine could have produced throughout the entire year of 2005. Therefore, the pressure for efficient sequencing data compression algorithms is very high and is being felt worldwide. Here, we describe GReEn (Genome Resequencing Encoding), a compression tool recently proposed for compressing genome resequencing data using a reference genome sequence.
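The benefit of compressing against a reference can be illustrated with a small estimate. The sketch below is not GReEn's actual algorithm. It assumes a hypothetical, simplified setting in which the target and the reference are already aligned, have equal length, and use only the four bases A, C, G, and T. It computes the ideal code length, in bits, that an arithmetic coder would approach if it predicted each target base to be a copy of the reference base at the same position. The function name `copy_model_bits` and the smoothing parameter `alpha` are illustrative assumptions.

```python
import math


def copy_model_bits(reference: str, target: str, alpha: float = 1.0) -> float:
    """Ideal code length (bits) of `target` under an adaptive hit/miss copy model.

    Assumption: sequences are aligned, equal in length, and drawn from {A, C, G, T}.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch assumes aligned, equal-length sequences")

    hits = 0
    misses = 0
    bits = 0.0
    for ref_base, tgt_base in zip(reference, target):
        # Adaptive estimate of the probability that the target copies the reference.
        p_hit = (hits + alpha) / (hits + misses + 2 * alpha)
        if ref_base == tgt_base:
            bits += -math.log2(p_hit)
            hits += 1
        else:
            # Pay for signalling a miss, then log2(3) bits to pick one of
            # the three remaining bases uniformly.
            bits += -math.log2(1.0 - p_hit) + math.log2(3)
            misses += 1
    return bits


if __name__ == "__main__":
    reference = "ACGT" * 50
    # Same sequence with a single substitution at position 100 (A -> G).
    target = reference[:100] + "G" + reference[101:]
    baseline = 2.0 * len(target)  # 2 bits per base without any reference
    print(f"baseline: {baseline:.1f} bits")
    print(f"copy model: {copy_model_bits(reference, target):.1f} bits")
```

When the target differs from the reference in only a few positions, the estimated cost falls far below the 2 bits per base needed without a reference, which is the motivation for reference-based tools in resequencing projects.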

Notes

  1. The authors are with the Signal Processing Lab, IEETA, DETI, University of Aveiro, 3810-193 Aveiro, Portugal.

Copyright information

© 2013 Springer Science+Business Media New York

About this protocol

Cite this protocol

Pinho, A.J., Pratas, D., Garcia, S.P. (2013). Compressing Resequencing Data with GReEn. In: Shomron, N. (eds) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 1038. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-514-9_2

  • DOI: https://doi.org/10.1007/978-1-62703-514-9_2

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-513-2

  • Online ISBN: 978-1-62703-514-9

  • eBook Packages: Springer Protocols
