
Distributed Genomic Compression in MapReduce Paradigm

  • Conference paper
Internet and Distributed Computing Systems (IDCS 2019)

Abstract

In recent years, the biological data prepared for computational analysis have grown considerably in size. Although such data are stored in dedicated file formats, analyzing and managing them has become increasingly difficult because of their sheer volume. For these reasons, a computational framework called Hadoop has been introduced to manage and process these data. Hadoop is based on the MapReduce paradigm for handling data in distributed systems. Beyond the performance gains this framework provides, our aim is to introduce a new compression approach based on DSRC that decreases the size of the output file and makes it easier to process with ad-hoc software. A performance analysis shows the reliability and efficiency achieved by our implementation.
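To make the pipeline described in the abstract more concrete, here is a minimal single-machine sketch of the MapReduce pattern applied to FASTQ compression. It is not the authors' implementation and rests on several loudly stated assumptions: it assumes the common four-line FASTQ record layout; zlib stands in for DSRC, which is not part of the Python standard library; a thread pool stands in for Hadoop map tasks; and all function names and the length-prefixed container format are hypothetical.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fastq_records(text: str) -> List[str]:
    """Group FASTQ lines into 4-line records: header, sequence, '+', quality."""
    lines = text.splitlines(keepends=True)
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    return ["".join(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def split_records(records: List[str], records_per_split: int) -> List[str]:
    """Build input splits that always hold whole records, never a partial one."""
    return ["".join(records[i:i + records_per_split])
            for i in range(0, len(records), records_per_split)]


def map_compress(split: str) -> bytes:
    """Map step: compress one split independently (zlib stands in for DSRC)."""
    return zlib.compress(split.encode("ascii"), level=9)


def reduce_concat(blocks: List[bytes]) -> bytes:
    """Reduce step: merge ordered blocks into one length-prefixed output."""
    out = bytearray()
    for block in blocks:
        out += struct.pack(">I", len(block))  # 4-byte big-endian block size
        out += block
    return bytes(out)


def compress_fastq(text: str, records_per_split: int = 1000, workers: int = 4) -> bytes:
    """Hypothetical driver: split, compress splits in parallel, then merge."""
    splits = split_records(fastq_records(text), records_per_split)
    # Threads play the role of parallel map tasks; pool.map keeps split order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(map_compress, splits))
    return reduce_concat(blocks)


def decompress_fastq(container: bytes) -> str:
    """Walk the length-prefixed blocks and restore the original FASTQ text."""
    parts, pos = [], 0
    while pos < len(container):
        (size,) = struct.unpack_from(">I", container, pos)
        pos += 4
        parts.append(zlib.decompress(container[pos:pos + size]).decode("ascii"))
        pos += size
    return "".join(parts)


if __name__ == "__main__":
    sample = (
        "@read1\nACGTACGT\n+\nIIIIIIII\n"
        "@read2\nTTGCA\n+\nIIIII\n"
        "@read3\nGGGG\n+\nHHHH\n"
    )
    packed = compress_fastq(sample, records_per_split=2)
    assert decompress_fastq(packed) == sample  # lossless round trip
```

Compressing each split independently is what allows the map tasks to run in parallel; the trade-off is a somewhat lower compression ratio, because the compressor cannot exploit redundancy across split boundaries.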

Author information

Correspondence to Pasquale De Luca.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

De Luca, P., Fiscale, S., Landolfi, L., Di Mauro, A. (2019). Distributed Genomic Compression in MapReduce Paradigm. In: Montella, R., Ciaramella, A., Fortino, G., Guerrieri, A., Liotta, A. (eds) Internet and Distributed Computing Systems. IDCS 2019. Lecture Notes in Computer Science, vol 11874. Springer, Cham. https://doi.org/10.1007/978-3-030-34914-1_35

  • DOI: https://doi.org/10.1007/978-3-030-34914-1_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34913-4

  • Online ISBN: 978-3-030-34914-1

  • eBook Packages: Computer Science, Computer Science (R0)
