Bioinformatic Tools for Identifying Disease Gene and SNP Candidates

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 628)

Abstract

As databases of genome data continue to grow, so does our understanding of the functional elements of the genome. Many genetic changes have now been discovered and characterized, including both disease-causing mutations and neutral polymorphisms. Over the past decade, alongside experimental approaches that characterize specific variants, there has been intense bioinformatic research into the molecular effects of these genetic changes. These bioinformatic efforts, which complement genomic experimental assays, have focused on two general areas. First, researchers have annotated genetic variation data with molecular features that are likely to affect function. Second, statistical methods have been developed to predict which mutations are likely to have a molecular effect. This protocol reviews and describes methods for understanding the molecular functions of single nucleotide polymorphisms (SNPs) and mutations. The intent of this chapter is to introduce online tools that are both easy to use and useful.

Acknowledgments

We gratefully acknowledge support from K22LM009135 (PI: Mooney), R01LM009722 (PI: Mooney), P01AG018397 (PI: Econs), U01GM061373 (PI: Flockhart), and the Indiana Genomics Initiative. The Indiana Genomics Initiative (INGEN) is supported in part by the Lilly Endowment.

Author information

Corresponding author

Correspondence to Sean D. Mooney.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Mooney, S.D., Krishnan, V.G., Evani, U.S. (2010). Bioinformatic Tools for Identifying Disease Gene and SNP Candidates. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_17

  • DOI: https://doi.org/10.1007/978-1-60327-367-1_17

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60327-366-4

  • Online ISBN: 978-1-60327-367-1

  • eBook Packages: Springer Protocols