A High Performance Storage Appliance for Genomic Data

Kaul, Gaurav; Shah, Zeeshan Ali; Abouelhoda, Mohamed

doi:10.1007/978-3-319-56154-7_43

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10209)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

Rapid advances in next-generation sequencing are revolutionizing the way biologists and, increasingly, clinicians analyze genomic data. These advances have substantially decreased the time and cost required to sequence the genomes of new patients. As a result, genomic techniques have become more mainstream, giving rise to a new era of precision medicine. National-scale genome programs have been launched in various parts of the world, such as the USA, the United Kingdom, and Saudi Arabia, to name a few. A key insight from this mainstream adoption is that, although the time and cost of generating sequence data have decreased dramatically, the cost of analyzing the data to yield clinically relevant information has not decreased proportionally. On the contrary, downstream analysis of genomic data now dominates the cost in terms of time, effort and money. This can be attributed to a number of factors: the sheer volume of data, limited knowledge of phenotypic, regulatory and epigenetic artifacts within the genome, and the limited computational capabilities of existing data analysis tools and infrastructure. Overcoming these challenges is central to realizing a more accurate, sophisticated and cost-effective genomic medicine. In this paper, we address a related challenge: the limited analytic capabilities of existing computational and storage infrastructure. We discuss how novel trends in hardware can help create a high-performance storage tier that improves data acquisition, storage, transmission and analysis over current commercial alternatives. These trends include the emergence of inexpensive, high-performance, high-endurance solid-state storage, combined with low-latency interconnects and software-defined orchestration.


Acknowledgments

This publication was supported by the Saudi Human Genome Project, King Abdulaziz City for Science and Technology (KACST). Our thanks to Majed Alelaiwi, Gabriele Paciucci, Craig Rhodes, Adam Roe and Ahmad Al-jeshi of Intel for their collaboration throughout the project. We would also like to thank Faheem Karim and Martin Galle from Supermicro for their advice on chassis and configuration. We would like to thank Vaughn Wittorff and Dan Greenfield of PetaGene (Cambridge, UK) for allowing us to use their test compression runs.

Author information

Authors and Affiliations

Corp. (UK) Ltd., London, UK
Gaurav Kaul
King Faisal Specialist Hospital and Research Center (KFSHRC), Riyadh, Saudi Arabia
Zeeshan Ali Shah & Mohamed Abouelhoda
Saudi Human Genome Program, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia
Zeeshan Ali Shah & Mohamed Abouelhoda


Corresponding author

Correspondence to Mohamed Abouelhoda.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Ignacio Rojas
Universidad de Granada, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Kaul, G., Shah, Z.A., Abouelhoda, M. (2017). A High Performance Storage Appliance for Genomic Data. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_43

DOI: https://doi.org/10.1007/978-3-319-56154-7_43
Published: 01 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)