Abstract
In this chapter, we describe analyses commonly applied to immunogenetic population data, along with the software tools currently available to perform them. Where possible, we focus on tools developed specifically for the analysis of highly polymorphic immunogenetic data. These analytical methods serve both to assess whether a dataset is appropriate for testing a specific hypothesis and to test the hypothesis itself. This chapter should not be treated as a protocol for analyzing any population dataset; instead, each researcher and analyst should first consider their data, the possible analyses, and the available tools in light of the hypothesis being tested. The extent to which the data and analyses are appropriate to each other should be determined before any analyses are performed.
Acknowledgments
This work was supported by National Institutes of Health (NIH) grants U01AI067068 (JAH, SJM) and U19 AI067152 (PAG) awarded by the National Institute of Allergy and Infectious Diseases (NIAID) and by NIH/NIAID contract AI40076 (RMS, GT). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Allergy and Infectious Diseases or the National Institutes of Health.
© 2012 Springer Science+Business Media, LLC
Cite this protocol
Mack, S.J., Gourraud, PA., Single, R.M., Thomson, G., Hollenbach, J.A. (2012). Analytical Methods for Immunogenetic Population Data. In: Christiansen, F., Tait, B. (eds) Immunogenetics. Methods in Molecular Biology, vol 882. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-842-9_13
DOI: https://doi.org/10.1007/978-1-61779-842-9_13
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-841-2
Online ISBN: 978-1-61779-842-9
eBook Packages: Springer Protocols