Gene Presence and Absence in Genomic Big Data for Precision Medicine

  • Conference paper
Intelligent Computing and Information and Communication

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 673)

Abstract

Twenty-first-century precision medicine aims to use a systems-oriented approach, including molecular pathology tests, to find the root cause of disease specific to an individual. The challenges of genomic data analysis for precision medicine are manifold: the data are big, high-dimensional, and often multimodally distributed. Advanced investigations use techniques such as Next Generation Sequencing (NGS), which rely on complex statistical methods to yield useful insights. Analysis of exome and transcriptome data allows in-depth study of the 22 thousand genes in the human genome, many of which relate to phenotype and disease state. Not all genes are expressed in all tissues, and in a disease state some genes are even deleted from the genome. Therefore, as part of knowledge discovery, exome and transcriptome big data need to be analyzed to determine whether a gene is actually absent (deleted or not expressed) or present. In this paper, we present a statistical technique to identify the genes that are present or absent in exome or transcriptome data (big data), with the aim of improving accuracy for precision medicine.
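The abstract does not spell out the statistical technique, so the following is only an illustrative sketch, not the authors' method. It assumes that log-transformed expression values form a bimodal distribution, with a low-signal mode for absent genes and a higher mode for present genes, and fits a two-component Gaussian mixture by expectation-maximization. The function names `fit_two_component_gmm` and `call_presence`, the 0.5 posterior cutoff, and the synthetic input are all hypothetical choices made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=200, tol=1e-8):
    """Fit a 1-D, two-component Gaussian mixture with plain EM.

    Returns (weights, means, variances); component 0 is the lower-mean
    component, which this sketch interprets as "absent".
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the means from the lower and upper halves of the sorted data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value.
        resp = []
        ll = 0.0
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1] + 1e-300  # guard against log(0)
            resp.append((p[0] / total, p[1] / total))
            ll += math.log(total)
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # floor keeps a component from collapsing
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    if means[0] > means[1]:
        # Keep the lower-mean component at index 0.
        weights.reverse()
        means.reverse()
        variances.reverse()
    return weights, means, variances


def call_presence(values, model, threshold=0.5):
    """Return True for each value whose posterior of belonging to the
    higher-mean ("present") component is at least `threshold`."""
    weights, means, variances = model
    calls = []
    for v in values:
        p0 = weights[0] * normal_pdf(v, means[0], variances[0])
        p1 = weights[1] * normal_pdf(v, means[1], variances[1])
        calls.append(p1 / (p0 + p1 + 1e-300) >= threshold)
    return calls
```

A caller would pass one log-scale expression value per gene (for example, log2 of a normalized read count plus a small offset, which is itself an assumption here), fit the model once, and then use `call_presence` to label each gene. Real exome or transcriptome data may need more than two components or a different distribution family.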

Author information

Corresponding author

Correspondence to Asoke K. Talukder.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Adhil, M., Agarwal, M., Ghosh, K., Sule, M., Talukder, A.K. (2018). Gene Presence and Absence in Genomic Big Data for Precision Medicine. In: Bhalla, S., Bhateja, V., Chandavale, A., Hiwale, A., Satapathy, S. (eds) Intelligent Computing and Information and Communication. Advances in Intelligent Systems and Computing, vol 673. Springer, Singapore. https://doi.org/10.1007/978-981-10-7245-1_22

  • DOI: https://doi.org/10.1007/978-981-10-7245-1_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7244-4

  • Online ISBN: 978-981-10-7245-1

  • eBook Packages: Engineering, Engineering (R0)