Human Genetics, Volume 137, Issue 4, pp 281–292

Genotype imputation performance of three reference panels using African ancestry individuals

  • Candelaria Vergara
  • Margaret M. Parker
  • Liliana Franco
  • Michael H. Cho
  • Ana V. Valencia-Duarte
  • Terri H. Beaty
  • Priya Duggal
Original Investigation


Genotype imputation estimates unobserved genotypes from genome-wide markers to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations, for which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 (HRC) individuals were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5–1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency, and common variants. When the imputed variants of the three panels were merged, the total number was 62–63 M, with 20 M overlapping variants imputed by all three panels and 5–15 M variants imputed exclusively with one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
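The panel comparison described above (counting well-imputed variants, variants shared by all panels, and variants exclusive to one panel) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the toy variant IDs and values, and the minor allele frequency cut-offs (rare < 1%, low-frequency 1–5%, common ≥ 5%) are assumptions chosen for illustration; only the R² > 0.5 filter comes from the abstract.

```python
from typing import Dict, Set, Tuple

# variant ID -> (minor allele frequency, imputation R^2)
# Toy values for illustration only; these are not study data.
Panel = Dict[str, Tuple[float, float]]


def well_imputed(panel: Panel, r2_min: float = 0.5) -> Set[str]:
    """Return the IDs of variants whose imputation R^2 exceeds r2_min."""
    return {vid for vid, (_, r2) in panel.items() if r2 > r2_min}


def maf_bin(maf: float) -> str:
    """Classify a variant by MAF using assumed cut-offs (1% and 5%)."""
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


def compare_panels(panels: Dict[str, Panel], r2_min: float = 0.5) -> dict:
    """Count merged, shared, and panel-exclusive well-imputed variants."""
    kept = {name: well_imputed(p, r2_min) for name, p in panels.items()}
    merged = set().union(*kept.values())
    shared = set.intersection(*kept.values())
    exclusive = {}
    for name, ids in kept.items():
        # Everything kept by any of the other panels
        others = set().union(*(s for n, s in kept.items() if n != name))
        exclusive[name] = len(ids - others)
    return {"total": len(merged), "shared": len(shared), "exclusive": exclusive}


panels = {
    "1000G": {
        "chr1:100:A:G": (0.20, 0.95),
        "chr1:200:C:T": (0.004, 0.62),
        "chr1:300:G:A": (0.03, 0.40),  # dropped: R^2 <= 0.5
    },
    "CAAPA": {
        "chr1:100:A:G": (0.21, 0.90),
        "chr1:400:T:C": (0.02, 0.70),
    },
    "HRC": {
        "chr1:100:A:G": (0.20, 0.98),
        "chr1:200:C:T": (0.005, 0.55),
        "chr1:500:A:C": (0.10, 0.80),
    },
}

summary = compare_panels(panels)
print(summary)
```

In practice the per-variant MAF and R² values would be read from the imputation software's output rather than typed in by hand, and variant IDs would need to be harmonized across panels before the set operations are meaningful.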



This project was funded in part with federal funds from the Office of AIDS Research through the Center for Inherited Diseases at Johns Hopkins University, and from the National Institute on Drug Abuse (R01013324). The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. Liliana Franco was supported by the scholarship program for PhD students of COLCIENCIAS (Departamento Administrativo de Ciencia, Tecnología e Innovación; Administrative Department of Science, Technology and Innovation) and by the Epidemiology Group of the National School of Public Health of the University of Antioquia. The COPDGene project (NCT00608764) was supported by Award Numbers R01HL089897 and R01HL089856 from the National Heart, Lung, and Blood Institute. Margaret M. Parker was supported by T32HL007427. The COPDGene project is also supported by the COPD Foundation through contributions made to the Industry Advisory Board comprising AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline.

Compliance with ethical standards

Conflict of interest

M.H.C. has received grant support from GSK. The remaining authors declare that they have no conflict of interest.

Supplementary material

439_2018_1881_MOESM1_ESM.docx (1.1 mb)
Supplementary material 1 (DOCX 1082 kb)
439_2018_1881_MOESM2_ESM.docx (15 kb)
Supplementary material 2 (DOCX 14 kb)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Johns Hopkins University, School of Medicine, Baltimore, USA
  2. Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, USA
  3. National School of Public Health, Universidad de Antioquia, Medellín, Colombia
  4. School of Medicine, Universidad Pontificia Bolivariana, Medellín, Colombia
  5. Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, USA
  6. Johns Hopkins University, Bloomberg School of Public Health, Baltimore, USA
