Porting Referential Genome Compression Tool on Loongson Platform

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 729)

Abstract

With the rapid development of genome sequencing technology, genome sequencing has become faster and more affordable. Consequently, genomic scientists now face an explosive increase in genomic data. Managing, storing, and analyzing this rapidly growing volume of data is challenging, so it is desirable to apply compression techniques to reduce storage and transfer costs. Referential genome compression is one such technique: it exploits the high similarity between genomes of the same or an evolutionarily close species (e.g., two randomly selected humans have at least 99% genetic similarity) and stores only the differences between the sequence being compressed and a well-known reference genome sequence. In this paper, we port two referential compression algorithms to the Loongson platform and profile their performance. We also use multi-process technology to improve compression speed.
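The idea of storing only the differences against a reference can be illustrated with a toy sketch. This is not either of the ported tools; it is a minimal greedy matcher written for illustration, and the names `build_index`, `compress`, `decompress` and the k-mer length `k` are all assumptions introduced here. The target is encoded as a list of operations, each either a `(start, length)` pointer into the reference or a literal string for bases that have no match.

```python
from typing import Dict, List, Tuple, Union

# An operation is either a (reference_start, length) match or a literal string.
Op = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, int]:
    """Map every k-mer of the reference to its first position (hypothetical helper)."""
    index: Dict[str, int] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Op]:
    """Greedily encode target as matches against reference plus literals."""
    index = build_index(reference, k)
    ops: List[Op] = []
    literal: List[str] = []
    i = 0
    while i < len(target):
        start = index.get(target[i:i + k])
        if start is None:
            # No k-mer match here: emit this base as a literal and move on.
            literal.append(target[i])
            i += 1
            continue
        # Extend the seed match as far as both sequences agree.
        length = k
        while (i + length < len(target)
               and start + length < len(reference)
               and target[i + length] == reference[start + length]):
            length += 1
        if literal:
            ops.append("".join(literal))
            literal = []
        ops.append((start, length))
        i += length
    if literal:
        ops.append("".join(literal))
    return ops


def decompress(ops: List[Op], reference: str) -> str:
    """Rebuild the target from the operations and the same reference."""
    parts: List[str] = []
    for op in ops:
        if isinstance(op, str):
            parts.append(op)
        else:
            start, length = op
            parts.append(reference[start:start + length])
    return "".join(parts)
```

When the target equals the reference, the whole sequence collapses to a single match operation; each point of divergence adds a short literal and a new match. Real referential compressors add a compact binary encoding of these operations and far more efficient indexing, which this sketch omits.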

Acknowledgment

This research was jointly supported by the Shenzhen Science & Technology Foundation (JCYJ20150930105133185), the National Natural Science Foundation of China (NSF/GDU1301252), the State Key Laboratory of Computer Architecture, ICTCA (CARCH 201405), and the Guangdong Province Key Laboratory Project (2012A061400024).

Author information

Corresponding author

Correspondence to Qiuming Luo.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd

About this paper

Cite this paper

Du, Z., Guo, C., Zhang, Y., Luo, Q. (2017). Porting Referential Genome Compression Tool on Loongson Platform. In: Chen, G., Shen, H., Chen, M. (eds) Parallel Architecture, Algorithm and Programming. PAAP 2017. Communications in Computer and Information Science, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-6442-5_43

  • DOI: https://doi.org/10.1007/978-981-10-6442-5_43

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6441-8

  • Online ISBN: 978-981-10-6442-5

  • eBook Packages: Computer Science, Computer Science (R0)
