Distributed Genomic Compression in MapReduce Paradigm

  • Pasquale De Luca
  • Stefano Fiscale
  • Luca Landolfi
  • Annabella Di Mauro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11874)


In recent years, the volume of biological data represented for computational analysis has grown considerably. Although such data are stored in dedicated file formats, analyzing and managing them has become increasingly difficult because of their size. For these reasons, the Hadoop framework has been introduced to manage and process such data. Hadoop relies on the MapReduce paradigm to manage data in distributed systems. Although this framework already improves performance, our aim is to introduce a new compression method, based on DSRC, that reduces the size of the output file and makes it easier to process with ad-hoc software. A performance analysis shows the reliability and efficiency achieved by our implementation.
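The abstract describes compression of genomic data within Hadoop's MapReduce model but does not spell out the mechanics. What follows is a minimal, hypothetical sketch in Python, not the authors' implementation. It assumes FASTQ input, the format DSRC is designed for, and simulates the map and reduce phases sequentially in a single process. Because no Python binding for DSRC is assumed, the standard-library zlib is used as a plainly swapped-in stand-in. All function names (`parse_fastq`, `map_compress`, `reduce_collect`, `compress_fastq`, `decompress_blocks`) are illustrative.

```python
import zlib
from typing import List, Sequence, Tuple

# Hypothetical sketch: zlib stands in for DSRC, and the map and reduce
# phases are simulated sequentially in one process, not run on Hadoop.

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def parse_fastq(text: str) -> List[Record]:
    """Split FASTQ text into 4-line records."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def map_compress(split_id: int, records: Sequence[Record]) -> Tuple[int, bytes]:
    """Map phase: serialize one input split and compress it independently."""
    payload = "".join(line + "\n" for rec in records for line in rec)
    return split_id, zlib.compress(payload.encode("ascii"), 9)


def reduce_collect(pairs: Sequence[Tuple[int, bytes]]) -> List[bytes]:
    """Reduce phase: order compressed blocks by split id to preserve input order."""
    return [blob for _, blob in sorted(pairs, key=lambda p: p[0])]


def compress_fastq(text: str, records_per_split: int) -> List[bytes]:
    """Partition records into splits, 'map' each one, then 'reduce' the results."""
    if records_per_split <= 0:
        raise ValueError("records_per_split must be positive")
    records = parse_fastq(text)
    splits = [records[i:i + records_per_split]
              for i in range(0, len(records), records_per_split)]
    mapped = [map_compress(i, split) for i, split in enumerate(splits)]
    return reduce_collect(mapped)


def decompress_blocks(blocks: Sequence[bytes]) -> str:
    """Inverse operation: decompress each block and concatenate in order."""
    return "".join(zlib.decompress(b).decode("ascii") for b in blocks)


if __name__ == "__main__":
    sample = ("@r1\nACGT\n+\nIIII\n"
              "@r2\nGGCC\n+\nIIII\n"
              "@r3\nTTAA\n+\nIIII\n")
    blocks = compress_fastq(sample, records_per_split=2)
    print(len(blocks))                          # 2 blocks: 2 records + 1 record
    print(decompress_blocks(blocks) == sample)  # True: lossless round trip
```

Compressing each split independently lets separate mappers work in parallel and lets blocks be decompressed one at a time. The trade-off is a somewhat lower compression ratio than compressing the whole file at once, because zlib cannot exploit redundancy across split boundaries.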


Keywords: Hadoop, Distributed computing, Genomic compression data


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University of Salerno, Fisciano, Italy
  2. Science and Technologies Department, University of Naples “Parthenope”, Naples, Italy
  3. Department of Pathology, Istituto Nazionale Tumori, IRCCS-Fondazione “G. Pascale”, Naples, Italy
