Haplotype Analysis for Case-Control Data

  • Gang Zheng
  • Yaning Yang
  • Xiaofeng Zhu
  • Robert C. Elston
Part of the Statistics for Biology and Health book series (SBH)


Chapter 7 covers haplotype analysis. It starts with haplotype inference, introducing phase and phase ambiguity and the estimation of haplotype frequencies. It then discusses haplotype disequilibrium, testing for linkage disequilibrium (LD), haplotype blocks, and tagging SNPs. Two types of tests are considered. The first is haplotype-based association analysis, including the likelihood ratio test, the regression method, and haplotype similarity. The second comprises LD contrast tests, including composite LD measures and contrasting LD measures.


Keywords: Linkage Disequilibrium; Haplotype Frequency; Linkage Disequilibrium Pattern; Haplotype Inference; Haplotype Pair



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Gang Zheng (1)
  • Yaning Yang (2)
  • Xiaofeng Zhu (3)
  • Robert C. Elston (3)

  1. Bethesda, USA
  2. School of Management, Dept. Statistics & Finance, University of Science & Technology of China, Hefei, People’s Republic of China
  3. School of Medicine, Dept. Epidemiology & Biostatistics, Case Western Reserve University, Cleveland, USA
