A High Performance Storage Appliance for Genomic Data

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Abstract

Rapid advancements in next-generation sequencing are revolutionizing the way in which biologists and, increasingly, clinicians analyze genomic data. These advances have substantially decreased the time and cost of sequencing the genomes of new patients, thereby making genomic techniques more mainstream and giving rise to the new era of precision medicine. National-scale genome programs have been launched in various parts of the world, such as the USA, the United Kingdom, and Saudi Arabia, to name a few. One of the key insights from this mainstream adoption is that even though the time and cost of generating sequence data have decreased dramatically, the cost of analyzing the data to yield clinically relevant information has not decreased proportionally. On the contrary, downstream analysis of genomic data now dominates the cost in terms of time, effort and money. This can be attributed to a number of factors: the sheer volume of data; limited knowledge of phenotypic, regulatory and epigenetic artifacts within the genome; and the limited computational capabilities of existing data analysis tools and infrastructure. Overcoming these challenges is central to realizing a more accurate, sophisticated and cost-effective genomic medicine. In this paper we address another challenge, one related to the limited analytic capabilities of existing computational and storage infrastructure. We discuss how novel trends in hardware, including the emergence of cheap, high-performance, high-endurance solid-state storage combined with low-latency interconnects and software-defined orchestration, can help create a high-performance storage tier that improves data acquisition, storage, transmission and analysis over current commercial alternatives.

Acknowledgments

This publication was supported by the Saudi Human Genome Project, King Abdulaziz City for Science and Technology (KACST). Our thanks to Majed Alelaiwi, Gabriele Paciucci, Craig Rhodes, Adam Roe and Ahmad Al-jeshi of Intel for their collaboration throughout the project. We would also like to thank Faheem Karim and Martin Galle of Supermicro for their advice on chassis and configuration. We would like to thank Vaughn Wittorff and Dan Greenfield of PetaGene (Cambridge, UK) for allowing us to use their test compression runs.

Corresponding author

Correspondence to Mohamed Abouelhoda.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kaul, G., Shah, Z.A., Abouelhoda, M. (2017). A High Performance Storage Appliance for Genomic Data. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_43

  • DOI: https://doi.org/10.1007/978-3-319-56154-7_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56153-0

  • Online ISBN: 978-3-319-56154-7

  • eBook Packages: Computer Science, Computer Science (R0)
