
Population and genomic lessons from genetic analysis of two Indian populations


Indian demographic history includes distinctive features such as founder effects, interpopulation segregation, a complex social structure with a caste system, and an elevated frequency of consanguineous marriages. India also shows a higher frequency of some rare Mendelian disorders and, over the last two decades, an increased prevalence of some complex disorders. Although India represents about one-sixth of the human population, deep genetic studies of this region have been scarce. In this study, we analyzed high-density genotyping and whole-exome sequencing data from a North Indian and a South Indian population. Indian populations show higher differentiation levels than those reported between populations of other continents. We analyzed the consequences of this differentiation by specifically assessing the transferability of genetic markers from or to Indian populations. We show that genetic markers from available resources such as HapMap or the 1000 Genomes Project have limited portability to Indian populations, which also present an excess of private rare variants. Conversely, tagSNPs show a high level of portability between the two Indian populations, in contrast to the common belief that North and South Indian populations are genetically very different. By estimating kinship between mates and consanguinity from trio data, we also describe different patterns of assortative mating and inbreeding in the two populations, in agreement with distinct mating preferences and social structures. In addition, this analysis allowed us to describe genomic regions under recent adaptive selection, indicating differential adaptive histories for North and South Indian populations. Our findings highlight the importance of considering demography in the design and analysis of genetic studies, as well as the need to extend human genetic variation catalogs to new populations, especially those with distinctive demographic histories.






Acknowledgments

We thank Lara Nonell and Eulàlia Puigdecanet from the Servei d’Anàlisi de Microarrays (IMIM) for their invaluable help. We would like to acknowledge David Sondervan and Ingrid Bakker from the section Medical Genomics of the VUMC for sequencing of the samples. We thank Dr. A. R. Rao and Dr. Namita Sidhu from IASRI, New Delhi, India for statistical assistance in the early part of the study. We deeply thank Txema Heredia, Ángel Carreño and Jordi Rambla for computational support, Marc Pybus for his help in the selection analysis, and David Comas for critical reading of the manuscript. An international fellowship funded by the Center for Neurogenomics and Cognitive Research (CNCR), VU, Amsterdam, The Netherlands to GJ; a research grant from the J C Bose fellowship to BKT; grant # BT/01/COE/07/UDSC to BKT; and salary support to GJ are gratefully acknowledged. FC was supported by a Beatriu de Pinós (2010-BP-B-00128) fellowship and MM by a PhD grant, both from AGAUR (Generalitat de Catalunya). FC was funded by grant SAF2012-35025 from the Ministerio de Economía y Competitividad (Spain). JB was funded by grants BFU2010-19443 from the Ministerio de Ciencia y Tecnología (Spain), PRI-PIBIN-2011-0942 from the Ministerio de Economía y Competitividad (Spain), and from the Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101).

Author information

Correspondence to B. K. Thelma or Ferran Casals.

Additional information

G. Juyal and M. Mondal have contributed equally to this work.

B. K. Thelma and F. Casals are co-senior authors.

Electronic supplementary material


Supplementary material 1 (DOCX 131 kb)


Cite this article

Juyal, G., Mondal, M., Luisi, P. et al. Population and genomic lessons from genetic analysis of two Indian populations. Hum Genet 133, 1273–1287 (2014). https://doi.org/10.1007/s00439-014-1462-0



  • Rare Variant
  • Indian Population
  • Exome Sequencing
  • Assortative Mating
  • Demographic History