Volume 70, Issue 5, pp 279–292

Modeling coverage gaps in haplotype frequencies via Bayesian inference to improve stem cell donor selection

  • Yoram Louzoun
  • Idan Alter
  • Loren Gragert
  • Mark Albrecht
  • Martin Maiers
Original Article


Regardless of sampling depth, accurate genotype imputation is limited in regions of high polymorphism, which often have a heavy-tailed haplotype frequency distribution. Many rare haplotypes are thus unobserved. Statistical methods to improve imputation by extending reference haplotype distributions using linkage disequilibrium patterns that relate allele and haplotype frequencies have not yet been explored. In the field of unrelated stem cell transplantation, imputation of highly polymorphic human leukocyte antigen (HLA) genes has an important application in identifying the best-matched stem cell donor when searching large registries totaling over 28,000,000 donors worldwide. Despite these large registry sizes, a significant proportion of searched patients present novel HLA haplotypes. Supporting this observation, HLA population genetic models have indicated that many extant HLA haplotypes remain unobserved. The absent haplotypes are a significant cause of error in haplotype matching. We have applied a Bayesian inference methodology for extending haplotype frequency distributions, using a model where new haplotypes are created by recombination of observed alleles. Applications of this joint probability model offer significant improvement in frequency distribution estimates over the best existing alternative methods, as we illustrate using five-locus HLA frequency data from the National Marrow Donor Program registry. Transplant matching algorithms and disease association studies involving phasing and imputation of rare variants may benefit from this statistical inference framework.


Imputation · Rare variants · Bayesian inference · DNA typing · HLA

Supplementary material

  • ESM 1: 251_2017_1040_MOESM1_ESM.pdf (PDF)
  • Fig. S1: 251_2017_1040_MOESM2_ESM.pdf (PDF)
  • Fig. S3: 251_2017_1040_MOESM3_ESM.pdf (PDF)
  • Fig. S4: 251_2017_1040_MOESM4_ESM.pdf (PDF)
  • Fig. S5: 251_2017_1040_MOESM5_ESM.pdf (PDF)
  • Fig. S7: 251_2017_1040_MOESM6_ESM.pdf (PDF)



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
  2. Bioinformatics Research, National Marrow Donor Program, Minneapolis, USA
  3. Department of Pathology and Laboratory Medicine, Tulane University School of Medicine, New Orleans, USA
