Abstract
Background
Nonsynonymous mutations in the coding regions of human genes are responsible for phenotypic differences between humans and for susceptibility to genetic disease. Computational methods have recently been used to predict the deleterious effects of nonsynonymous human mutations and polymorphisms. Here we focus on understanding the amino-acid mutation spectrum of human genetic disease. We compare the disease spectrum to the spectra of mutual amino-acid mutation frequencies, non-disease polymorphisms in human genes, and substitutions fixed between species.
Results
We find that the disease spectrum correlates well with the amino-acid mutation frequencies based on the genetic code. Normalized by the mutation frequencies, the spectrum can be rationalized in terms of chemical similarities between amino acids. The disease spectrum is almost identical for membrane and non-membrane proteins. Mutations at arginine and glycine residues are together responsible for about 30% of genetic diseases, whereas random mutations at tryptophan and cysteine have the highest probability of causing disease.
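To make concrete what amino-acid mutation frequencies "based on the genetic code" can mean, here is a minimal Python sketch that counts, for every ordered pair of amino acids, how many single-nucleotide substitutions in the standard genetic code convert a codon for the first into a codon for the second. It assumes every nucleotide change is equally likely and every codon equally used; the abstract does not state the weighting actually applied (for example transition/transversion or CpG effects), and the function name `missense_path_counts` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def missense_path_counts():
    """Count single-nucleotide routes between each ordered amino-acid pair.

    Assumption (not from the source): all base changes and all codons are
    weighted equally. Synonymous changes and changes to or from stop
    codons are ignored.
    """
    counts = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a mutation
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa != aa and new_aa != "*":
                    counts[(aa, new_aa)] += 1
    return counts


if __name__ == "__main__":
    counts = missense_path_counts()
    # Tryptophan (TGG) reaches cysteine via TGT and TGC.
    print("W->C routes:", counts[("W", "C")])
```

Under these simplifying assumptions, pairs with a count of zero (such as tryptophan to phenylalanine) cannot be produced by a single nucleotide change, so they would be expected to be rare in any spectrum dominated by point mutations.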
Conclusions
The overall disease spectrum mainly reflects the mutability of the genetic code. We corroborate earlier results that the probability of a nonsynonymous mutation causing a genetic disease increases monotonically with an increase in the degree of evolutionary conservation of the mutation site and a decrease in the solvent-accessibility of the site; opposite trends are observed for non-disease polymorphisms. We estimate that the rate of nonsynonymous mutations with a negative impact on human health is less than one per diploid genome per generation.
Acknowledgements
We thank Jay Shendure, John Aach, Patrik D'haeseleer, Daniel Segre, Peter Kharchenko, and Tzachi Pilpel for discussions. This work was supported in part by the US Department of Energy through grant DOE DE-FG02-87-ER60565.
About this article
Cite this article
Vitkup, D., Sander, C. & Church, G.M. The amino-acid mutational spectrum of human genetic disease. Genome Biol 4, R72 (2003). https://doi.org/10.1186/gb-2003-4-11-r72
DOI: https://doi.org/10.1186/gb-2003-4-11-r72