Does haplotype diversity predict power for association mapping of disease susceptibility?


Many recent studies have established that haplotype diversity in a small region may not be greatly diminished when the number of markers is reduced to a smaller set of “haplotype-tagging” single-nucleotide polymorphisms (SNPs) that identify the most common haplotypes. These studies are motivated by the assumption that retention of haplotype diversity assures retention of power for mapping disease susceptibility by allelic association. Using two bodies of real data, three proposed measures of diversity, and regression-based methods for association mapping, we found no scenario for which this assumption was tenable. We compared the chi-square for composite likelihood and the maximum chi-square for single SNPs in diplotypes, excluding the marker designated as causal. All haplotype-tagging methods conserve haplotype diversity by selecting common SNPs. When the causal marker has a range of allele frequencies as in real data, chi-square decreases faster than under random selection as the haplotype-tagging set diminishes. Selecting SNPs by maximizing haplotype diversity is inefficient when their frequency is much different from the unknown frequency of the causal variant. Loss of power is minimized when the difference between minor allele frequencies of the causal SNP and a closely associated marker SNP is small, which is unlikely in ignorance of the frequency of the causal SNP unless dense markers are used. Therefore retention of haplotype diversity in simulations that do not mirror genomic allele frequencies has no relevance to power for association mapping. TagSNPs that are assigned to bins instead of haplotype blocks also lose power compared with random SNPs. This evidence favours a multi-stage design in which both models and density change adaptively.
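The allele-frequency argument above can be illustrated with a small calculation. This is only a sketch, and it swaps in the standard r² measure of linkage disequilibrium, not the association measure or the regression-based tests used in the study. For two biallelic SNPs, r² cannot exceed a bound set by their minor allele frequencies (MAFs), and the bound falls as the two frequencies diverge. The function name `max_r2` and the example frequencies are hypothetical, chosen for illustration.

```python
def max_r2(maf_causal: float, maf_marker: float) -> float:
    """Upper bound on r^2 between two biallelic SNPs, given only their MAFs.

    The bound is reached when the rarer minor allele occurs exclusively on
    haplotypes that also carry the other SNP's minor allele. This uses r^2,
    which is an assumption of this sketch, not the measure used in the paper.
    """
    for freq in (maf_causal, maf_marker):
        if not 0.0 < freq <= 0.5:
            raise ValueError("minor allele frequency must lie in (0, 0.5]")
    lo, hi = sorted((maf_causal, maf_marker))
    # D_max = lo * (1 - hi); r^2 = D^2 / (lo * (1 - lo) * hi * (1 - hi))
    return lo * (1.0 - hi) / (hi * (1.0 - lo))


if __name__ == "__main__":
    causal_maf = 0.05  # hypothetical rare causal variant
    for marker_maf in (0.05, 0.10, 0.20, 0.40):
        bound = max_r2(causal_maf, marker_maf)
        print(f"causal MAF {causal_maf:.2f}, marker MAF {marker_maf:.2f}: "
              f"max r^2 = {bound:.3f}")
```

Under the common approximation that the association chi-square at a marker is about r² times the chi-square at the causal SNP, a tag set made up of common SNPs caps the signal recoverable from a rare causal variant, however well that set preserves haplotype diversity.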





We are grateful to Alec Jeffreys and Mark Daly for making their data publicly available. We thank Daniel Stram and Kui Zhang for the tagSNPs and HapBlock programs and suggestions in using them. This work was supported by a grant from the Medical Research Council.

Author information

Correspondence to Newton E. Morton.

About this article

Cite this article

Zhang, W., Collins, A. & Morton, N.E. Does haplotype diversity predict power for association mapping of disease susceptibility? Hum Genet 115, 157–164 (2004).


  • Linkage Disequilibrium
  • Minor Allele Frequency
  • Relative Efficiency
  • Association Mapping
  • Haplotype Diversity