Abstract
Genotype imputation estimates unobserved genotypes from genome-wide markers to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European-ancestry populations, for which very large reference panels are available. Smaller subsets of African-descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 (HRC) individuals were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5–1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R² > 0.5 and for rare, low-frequency, and common variants. When the imputed variants of the three panels were merged, the total number was 62–63 M, with 20 M overlapping variants imputed by all three panels and 5–15 M variants imputed exclusively by one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
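The panel comparison summarized above (variants imputed by all three panels, variants exclusive to one panel, and the same counts restricted to R² > 0.5) can be illustrated with a minimal sketch. This is not the authors' pipeline: the variant identifiers, R² values, and the helper names `well_imputed` and `summarize_overlap` are hypothetical and exist only to show the set arithmetic.

```python
from typing import Dict, Set

# Hypothetical toy data: variant ID -> imputation R2, one dict per panel.
# Real inputs would come from each panel's imputation output.
panels: Dict[str, Dict[str, float]] = {
    "1000G": {"chr1:100:A:G": 0.92, "chr1:200:C:T": 0.41, "chr1:300:G:A": 0.77},
    "CAAPA": {"chr1:100:A:G": 0.85, "chr1:400:T:C": 0.66},
    "HRC": {"chr1:100:A:G": 0.97, "chr1:300:G:A": 0.81, "chr1:500:A:C": 0.30},
}


def well_imputed(r2_by_variant: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Return the variants whose imputation R2 exceeds the threshold."""
    return {variant for variant, r2 in r2_by_variant.items() if r2 > threshold}


def summarize_overlap(sets: Dict[str, Set[str]]) -> dict:
    """Count merged, shared, and panel-exclusive variants."""
    union = set().union(*sets.values())
    shared = set.intersection(*sets.values())
    exclusive = {}
    for name, variants in sets.items():
        # Variants seen in any other panel are not exclusive to this one.
        others = set().union(*(s for n, s in sets.items() if n != name))
        exclusive[name] = len(variants - others)
    return {"union": len(union), "shared": len(shared), "exclusive": exclusive}


# Counts over every imputed variant, then over well-imputed variants only.
all_summary = summarize_overlap({name: set(r2) for name, r2 in panels.items()})
hq_summary = summarize_overlap({name: well_imputed(r2) for name, r2 in panels.items()})
print(all_summary)
print(hq_summary)
```

Matching variants by a chromosome:position:ref:alt key is an assumption of this sketch; any consistent identifier shared across panels would serve.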
Funding
This project was funded in part with federal funds from the Office of AIDS Research through the Center for Inherited Diseases at Johns Hopkins University and from the National Institute on Drug Abuse (R01013324). The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. Liliana Franco was supported by the PhD scholarship program of COLCIENCIAS (Administrative Department of Science, Technology and Innovation; Departamento Administrativo de Ciencia, Tecnología e Innovación) and by the Epidemiology Group of the National School of Public Health of the University of Antioquia. The COPDGene project (NCT00608764) was supported by Award Number R01HL089897 and Award Number R01HL089856 from the National Heart, Lung, and Blood Institute. Margaret M. Parker was supported by T32HL007427. The COPDGene project is also supported by the COPD Foundation through contributions made to the Industry Advisory Board comprising AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline.
Ethics declarations
Conflict of interest
M.H.C. has received grant support from GSK. The remaining authors declare that they have no conflict of interest.
Cite this article
Vergara, C., Parker, M.M., Franco, L. et al. Genotype imputation performance of three reference panels using African ancestry individuals. Hum Genet 137, 281–292 (2018). https://doi.org/10.1007/s00439-018-1881-4
DOI: https://doi.org/10.1007/s00439-018-1881-4