Ancestry Inference in Complex Admixtures via Variable-Length Markov Chain Linkage Models

  • Sivan Bercovici
  • Jesse M. Rodriguez
  • Megan Elmore
  • Serafim Batzoglou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7262)


Inferring the ancestral origin of chromosomal segments in admixed individuals is key for genetic applications ranging from the analysis of population demographics and history to the mapping of disease genes. Previous methods addressed ancestry inference using either weak models of linkage disequilibrium or large models that make explicit use of ancestral haplotypes. In this paper we introduce ALLOY, an efficient method that incorporates generalized but highly expressive linkage disequilibrium models. ALLOY applies a factorial hidden Markov model to capture the parallel processes producing the maternal and paternal admixed haplotypes, and models the background linkage disequilibrium in the ancestral populations via an inhomogeneous variable-length Markov chain. We test ALLOY across a broad range of scenarios, from recent to ancient admixtures with up to four ancestral populations. We show that ALLOY outperforms the previous state of the art and is robust to uncertainties in model parameters.
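The two model components named above can be illustrated with a small Python sketch. This is not ALLOY's implementation; it is a minimal illustration under stated assumptions. The hypothetical `VLMC` class stores, for each site, a table from preceding-allele contexts to P(allele = 1) and predicts with the longest stored context (the site-specific tables make it inhomogeneous). Each ancestry chain follows a simple switching process with a hypothetical switch probability `r` and admixture proportions `alpha`, and the haplotypes are assumed to be phased. The hypothetical `maternal_posteriors` runs scaled forward-backward over the joint (maternal, paternal) ancestry state space, which is the factorial structure.

```python
import math
from itertools import product


class VLMC:
    """Hypothetical inhomogeneous VLMC for one ancestral population.

    contexts[i] maps a context (tuple of the alleles immediately preceding
    site i, most recent last) to P(allele == 1 at site i | context).
    Every table must contain the empty context ().
    """

    def __init__(self, contexts):
        self.contexts = contexts

    def prob(self, i, hap, allele):
        table = self.contexts[i]
        max_len = max(len(ctx) for ctx in table)
        # Predict from the longest suffix of hap[:i] stored for this site.
        for length in range(min(i, max_len), -1, -1):
            ctx = tuple(hap[i - length:i])
            if ctx in table:
                p1 = table[ctx]
                return p1 if allele == 1 else 1.0 - p1
        raise KeyError(f"site {i} has no empty context")


def chain_transition(a, b, r, alpha):
    # One ancestry chain (assumed model): with probability r a switch event
    # occurs and the new ancestry is drawn from the admixture proportions alpha.
    return (1.0 - r) * (1.0 if a == b else 0.0) + r * alpha[b]


def maternal_posteriors(h_m, h_p, models, alpha, r):
    """Forward-backward over joint (maternal, paternal) ancestry states."""
    K, n = len(models), len(h_m)
    states = list(product(range(K), repeat=2))

    def emit(i, s):
        # Each phased haplotype is scored by the VLMC of its chain's ancestry.
        return models[s[0]].prob(i, h_m, h_m[i]) * models[s[1]].prob(i, h_p, h_p[i])

    def trans(s, t):
        # Factorial structure: the two ancestry chains move independently.
        return chain_transition(s[0], t[0], r, alpha) * chain_transition(s[1], t[1], r, alpha)

    # Scaled forward pass; scales[i] = P(observation i | observations before i).
    fwd, scales = [], []
    for i in range(n):
        if i == 0:
            f = {s: alpha[s[0]] * alpha[s[1]] * emit(0, s) for s in states}
        else:
            f = {t: emit(i, t) * sum(fwd[-1][s] * trans(s, t) for s in states)
                 for t in states}
        c = sum(f.values())
        fwd.append({s: v / c for s, v in f.items()})
        scales.append(c)

    # Scaled backward pass, using the same scaling constants.
    bwd = [None] * n
    bwd[-1] = {s: 1.0 for s in states}
    for i in range(n - 2, -1, -1):
        bwd[i] = {s: sum(trans(s, t) * emit(i + 1, t) * bwd[i + 1][t] for t in states)
                  / scales[i + 1] for s in states}

    # Marginalize the joint posterior onto the maternal chain.
    post = []
    for i in range(n):
        joint = {s: fwd[i][s] * bwd[i][s] for s in states}
        z = sum(joint.values())
        post.append([sum(v for s, v in joint.items() if s[0] == k) / z
                     for k in range(K)])
    return post, sum(math.log(c) for c in scales)


# Toy usage with two hypothetical populations over six sites.
n = 6
pop0 = VLMC([{(): 0.1, (1,): 0.3} for _ in range(n)])
pop1 = VLMC([{(): 0.9, (0, 1): 0.6} for _ in range(n)])
post, log_lik = maternal_posteriors([1] * n, [0] * n, [pop0, pop1], alpha=[0.5, 0.5], r=0.05)
for i, p in enumerate(post):
    print(i, [round(x, 3) for x in p])
```

With phased input, the emissions, transitions and initial distribution all factorize across the two chains, so `post` equals what a single-chain HMM on the maternal haplotype would give. The joint state space becomes indispensable only when an observation depends on both chains at once, as with unphased genotypes.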


Keywords: Population genetics · Ancestry inference · VLMC · FHMM


  1. 1.
    Alkan, C., Coe, B.P., Eichler, E.E.: Genome structural variation discovery and genotyping. Nature Reviews. Genetics 12(5), 363–376 (2011)CrossRefGoogle Scholar
  2. 2.
    Altshuler, D.M., Gibbs, R.A., Peltonen, L., Dermitzakis, E., Schaffner, S.F., Yu, F., Bonnen, P.E., De Bakker, P.I.W., Deloukas, P., Gabriel, S.B., et al.: Integrating common and rare genetic variation in diverse human populations. Nature 467(7311), 52–58 (2010)CrossRefGoogle Scholar
  3. 3.
    Baye, T.M., Wilke, R.A.: Mapping genes that predict treatment outcome in admixed populations. The Pharmacogenomics Journal 10(6), 465–477 (2010)CrossRefGoogle Scholar
  4. 4.
    Bercovici, S., Geiger, D.: Inferring ancestries efficiently in admixed populations with linkage disequilibrium. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 16(8), 1141–1150 (2009)MathSciNetGoogle Scholar
  5. 5.
    Browning, S.R., Browning, B.L.: Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. The American Journal of Human Genetics 81(5), 1084–1097 (2007)CrossRefGoogle Scholar
  6. 6.
    Ghahramani, Z., Jordan, M.I., Smyth, P.: Factorial hidden markov models. In: Machine Learning. MIT Press (1997)Google Scholar
  7. 7.
    Gravel, S., Henn, B.M., Gutenkunst, R.N., Indap, A.R., Marth, G.T., Clark, A.G., Yu, F., Gibbs, R.A., Project, T.G., Bustamante, C.D.: Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences 108(29), 11983–11988 (2011)CrossRefGoogle Scholar
  8. 8.
    Haldane, J.B.S.: The combination of linkage values, and the calculation of distance between the loci of linked factors. J. Genet. 8, 299–309 (1919)CrossRefGoogle Scholar
  9. 9.
    Jakobsson, M., Scholz, S.W., Scheet, P., Gibbs, J.R., VanLiere, J.M., Fung, H.-C., Szpiech, Z.A., Degnan, J.H., Wang, K., Guerreiro, R., et al.: Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451(7181), 998–1003 (2008)CrossRefGoogle Scholar
  10. 10.
    Long, J.C.: The genetic structure of admixed population. Genetics (127), 417–428 (1991)Google Scholar
  11. 11.
    Pasaniuc, B., Sankararaman, S., Kimmel, G., Halperin, E.: Inference of locus-specific ancestry in closely related populations. Bioinformatics 25, i213–i221 (2009)CrossRefGoogle Scholar
  12. 12.
    Pasaniuc, B., Zaitlen, N., Lettre, G., Chen, G.K., Tandon, A., Kao, W.H.L., Ruczinski, I., Fornage, M., Siscovick, D.S., Zhu, X., Larkin, E., Lange, L.A., Cupples, L.A., Yang, Q., Akylbekova, E.L., Musani, S.K., Divers, J., Mychaleckyj, J., Li, M., Papanicolaou, G.J., Millikan, R.C., Ambrosone, C.B., John, E.M., Bernstein, L., Zheng, W., Hu, J.J., Ziegler, R.G., Nyante, S.J., Bandera, E.V., Ingles, S.A., Press, M.F., Chanock, S.J., Deming, S.L., Rodriguez-Gil, J.L., Palmer, C.D., Buxbaum, S., Ekunwe, L., Hirschhorn, J.N., Henderson, B.E., Myers, S., Haiman, C.A., Reich, D., Patterson, N., Wilson, J.G., Price, A.L.: Enhanced statistical tests for GWAS in admixed populations: assessment using African Americans from CARe and a Breast Cancer Consortium. PLoS Genetics 7(4), e1001371 (2011)CrossRefGoogle Scholar
  13. 13.
    Patterson, N., Hattangadi, N., Lane, B., Lohmueller, K.E., Hafler, D.A., Oksenberg, J.R., Hauser, S.L., Smith, M.W., O’Brien, S.J., Altshuler, D., Daly, M.J., Reich, D.: Methods for high-density admixture mapping of disease genes. American Journal of Human Genetics 74(5), 979–1000 (2004)CrossRefGoogle Scholar
  14. 14.
    Price, A.L., Tandon, A., Patterson, N., Barnes, K.C., Rafaels, N., Ruczinski, I., Beaty, T.H., Mathias, R., Reich, D., Myers, S.: Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 5(6), e1000519 (2009)CrossRefGoogle Scholar
  15. 15.
    Rabiner, L.R.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 257–286 (1989)Google Scholar
  16. 16.
    Ron, D., Singer, Y., Tishby, N.: On the learnability and usage of acyclic probabilistic finite automata. Journal of Computer and System Sciences, 31–40 (1995)Google Scholar
  17. 17.
    Rosenberg, N.A., Li, L.M., Ward, R., Pritchard, J.K.: Informativeness of genetic markers for inference of ancestry. The American Journal of Human Genetics (73), 1402–1422 (2003)Google Scholar
  18. 18.
    Sankararaman, S., Sridhar, S., Kimmel, G., Halperin, E.: Estimating Local Ancestry in Admixed Populations. Journal of Human Genetics, 290–303 (February 2008)Google Scholar
  19. 19.
    Seldin, M.F., Pasaniuc, B., Price, A.L.: New approaches to disease mapping in admixed populations. Nature Reviews. Genetics 12(8), 523–528 (2011)CrossRefGoogle Scholar
  20. 20.
    Sundquist, A., Fratkin, E., Do, C.B., Batzoglou, S.: Effect of genetic divergence in identifying ancestral origin using HAPAA. Genome Research 18(4), 676–682 (2008)CrossRefGoogle Scholar
  21. 21.
    Tang, H., Coram, M., Wang, P., Zhu, X., Risch, N.: Reconstructing genetic ancestry blocks in admixed individuals. American Journal of Human Genetics 79(1), 1–12 (2006)CrossRefGoogle Scholar
  22. 22.
    Tian, C., Hinds, D.A., Shigeta, R., Kittles, R., Ballinger, D.G., Seldin, M.F.: A genomewide single-nucleotide polymorphism panel with high ancestry information for african american admixture mapping. The American Journal of Human Genetics (79), 640–649 (2006)Google Scholar
  23. 23.
    Winkler, C.A., Nelson, G.W., Smith, M.W.: Admixture mapping comes of age. Annual Review of Genomics and Human Genetics 11, 65–89 (2010)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sivan Bercovici (1)
  • Jesse M. Rodriguez (1, 2)
  • Megan Elmore (1)
  • Serafim Batzoglou (1)

  1. Department of Computer Science, Stanford University, USA
  2. Biomedical Informatics Program, Stanford University, USA