The amino-acid mutational spectrum of human genetic disease

Research article, Genome Biology

Abstract

Background

Nonsynonymous mutations in the coding regions of human genes are responsible for phenotypic differences between humans and for susceptibility to genetic disease. Computational methods have recently been used to predict the deleterious effects of nonsynonymous human mutations and polymorphisms. Here we focus on understanding the amino-acid mutational spectrum of human genetic disease. We compare the disease spectrum with the spectrum of pairwise amino-acid mutation frequencies, with non-disease polymorphisms in human genes, and with substitutions fixed between species.

Results

We find that the disease spectrum correlates well with the amino-acid mutation frequencies expected from the genetic code. Once normalized by these mutation frequencies, the spectrum can be rationalized in terms of chemical similarities between amino acids. The disease spectrum is almost identical for membrane and non-membrane proteins. Mutations at arginine and glycine residues together account for about 30% of genetic diseases, whereas random mutations at tryptophan and cysteine residues have the highest probability of causing disease.
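The idea of mutation frequencies "expected from the genetic code", and of normalizing an observed spectrum by them, can be illustrated with a small Python sketch. It is not the authors' procedure: it assumes uniform codon usage and equal rates for every single-nucleotide change (real rates differ, for example between transitions and transversions), and the function names `single_nucleotide_missense_counts` and `normalized_spectrum`, as well as the observed counts in the usage example, are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def single_nucleotide_missense_counts():
    """Count codon-level single-nucleotide changes that turn amino acid a into b.

    Assumes (hypothetically) uniform codon usage and equal rates for every
    nucleotide change; synonymous and nonsense changes are skipped.
    """
    counts = Counter()
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # no missense change can start from a stop codon
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODE[mutant]
                if new_aa not in ("*", aa):
                    counts[(aa, new_aa)] += 1
    return counts


def normalized_spectrum(observed, expected):
    """Observed/expected ratio per substitution, both converted to frequencies.

    Substitutions absent from `expected` (not reachable by a single
    nucleotide change) are ignored.
    """
    obs_total = sum(observed.values())
    exp_total = sum(expected.values())
    return {
        pair: (observed.get(pair, 0) / obs_total) / (n / exp_total)
        for pair, n in expected.items()
    }


if __name__ == "__main__":
    expected = single_nucleotide_missense_counts()
    print(expected[("W", "C")], expected[("W", "R")])  # 2 2
    # Hypothetical disease counts, for illustration only.
    observed = {("W", "C"): 4, ("W", "R"): 4, ("G", "R"): 10}
    ratios = normalized_spectrum(observed, expected)
    print(sorted(ratios.items(), key=lambda kv: -kv[1])[:3])
```

A ratio above 1 marks a substitution that appears in the observed set more often than codon mutability alone would predict; a weighted mutation model could be swapped in by changing the `+= 1` increment.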

Conclusions

The overall disease spectrum mainly reflects the mutability of the genetic code. We corroborate earlier results showing that the probability of a nonsynonymous mutation causing a genetic disease increases monotonically as the evolutionary conservation of the mutated site increases and as its solvent accessibility decreases; non-disease polymorphisms show the opposite trends. We estimate that the rate of nonsynonymous mutations with a negative impact on human health is less than one per diploid genome per generation.
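A trend of this kind can be checked by binning variant sites on a per-site score (conservation or solvent accessibility) and computing, in each bin, the fraction of variants that are disease-associated. The sketch below is a hypothetical illustration, not the paper's method: the `(score, is_disease)` input format, the bin edges, and the use of this fraction as a proxy for the probability of causing disease are all assumptions.

```python
from bisect import bisect_right


def disease_fraction_by_bin(sites, edges):
    """Fraction of disease-associated variants in each score bin.

    sites: iterable of (score, is_disease) pairs, e.g. a hypothetical per-site
    conservation or relative solvent-accessibility score.
    edges: sorted inner bin boundaries; len(edges) + 1 bins are produced.
    Returns None for bins that contain no sites.
    """
    n_bins = len(edges) + 1
    disease = [0] * n_bins
    total = [0] * n_bins
    for score, is_disease in sites:
        b = bisect_right(edges, score)  # index of the bin containing score
        total[b] += 1
        disease[b] += int(is_disease)
    return [d / t if t else None for d, t in zip(disease, total)]


def is_monotonic(values, increasing=True):
    """True if the non-empty bins never move against the requested direction."""
    vals = [v for v in values if v is not None]
    pairs = zip(vals, vals[1:])
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical (conservation score, is_disease) pairs, for illustration only.
    sites = [(0.10, False), (0.20, False), (0.15, True),
             (0.50, True), (0.55, False), (0.90, True), (0.95, True)]
    fractions = disease_fraction_by_bin(sites, edges=[0.33, 0.66])
    print(fractions)               # [0.3333333333333333, 0.5, 1.0]
    print(is_monotonic(fractions))  # True
```

For a solvent-accessibility score the expected pattern for disease mutations would instead satisfy `is_monotonic(fractions, increasing=False)`, and non-disease polymorphisms would be expected to run the other way in both cases.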

Acknowledgements

We thank Jay Shendure, John Aach, Patrik D'haeseleer, Daniel Segre, Peter Kharchenko, and Tzachi Pilpel for discussions. This work was supported in part by the US Department of Energy (grant DOE DE-FG02-87-ER60565).

Author information

Correspondence to George M Church.

About this article

Cite this article

Vitkup, D., Sander, C. & Church, G.M. The amino-acid mutational spectrum of human genetic disease. Genome Biol 4, R72 (2003). https://doi.org/10.1186/gb-2003-4-11-r72

  • DOI: https://doi.org/10.1186/gb-2003-4-11-r72
