Genetic Architecture of Southeast-coastal Indian tribal populations: A Y-chromosomal phylogenetic analysis
Y chromosome single nucleotide polymorphisms (Y-SNPs) are useful markers for reconstructing male lineages, determining haplogroups, and establishing paternity. Because most of the Y chromosome does not recombine, its haplogroups show strongly population-specific geographical distributions, and they play a major role in forensic investigations and population genetics.
Materials and methods
The present study aimed to determine the Y chromosomal phylogeny of two southeast coastal Indian tribal populations (Porja and Savara; N = 217), both Austro-Asiatic (AA) language speakers, using a set of 15 bi-allelic markers on the non-recombining region of the Y chromosome.
Results and conclusions
The phylogenetic analysis revealed four paternal haplogroups, viz., H1*-M52, H1a*-M82, O2a*-M95, and R2-M124. The Austro-Asiatic-specific haplogroup O2a* (M95) was by far the most frequent (84.79%), consistent with O2a* being the paternal signature of the AA language family of Southeast Asians.
Keywords: Forensic sciences, Forensic genetics, Phylogeny, Haplogroups, Bio-geography, Ancestors
Abbreviations
LGM: Last Glacial Maximum
mtDNA: Mitochondrial DNA
PCR: Polymerase chain reaction
PHC: Primary Health Centre
Y-SNPs: Y chromosome single nucleotide polymorphisms
Y-STRs: Y chromosome short tandem repeats
DNA evidence has become an influential tool in forensic science for linking a suspect to a crime scene, resolving questions of biological relationship, and identifying victims of mass disasters. Advances in DNA technology have extended its use to diverse areas, and evidence from Y-SNPs, Y chromosome short tandem repeats (Y-STRs), and mitochondrial DNA (mtDNA) offers immense possibilities for assisting the criminal justice system. Because of the uniqueness of the Y chromosome among the human chromosomes, its haplogroups or haplotypes have been used for the identification of criminals in forensic cases (Jobling et al. 1997), paternal lineages in human evolution (Jobling and Tyler-Smith 1995), diseases in medical genetics (Jobling and Tyler-Smith 2000), and pedigrees in genealogical reconstructions (Jobling 2001). Although forensic genetics covers a broad range of disciplines, such as forensic pathology (Alacs et al. 2010), complex traits (Kayser and Schneider 2009; Pulker et al. 2007), and wildlife forensics (Budowle et al. 2005), short tandem repeat (STR)-based DNA testing (Edwards et al. 1992) is now accepted as the principal approach in paternity investigations (Zupanic Pajnic et al. 2001), identification of skeletal remains (Zupanic Pajnic et al. 2010), and complex criminal cases involving rape and gang rape. STRs occupy nearly 3% of the total human genome and occur on average once in every 10,000 nucleotides (Butler 2005). Multiplexing facilitates the use of these markers in forensic anthropology and medicolegal studies. At present, a number of laboratories conduct STR analysis in population genetic studies and report the results for various ethnic populations (Tandon et al. 2002; Sarkar and Kashyap 2002; Sahoo and Kashyap 2002; Gaikwad and Kashyap 2002; Rajkumar and Kashyap 2002; Narkuti et al. 2008; Dubey et al. 2008; Giroti and Talwar 2010; Ghosh et al. 2011; Chaudhari and Dahiya 2014; Shrivastava et al. 2015; Shrivastava et al. 2016; Jain et al. 2017; Imam et al. 2017).
However, despite being the most consistent and frequently used genetic markers in forensics, STRs have drawbacks that undermine their efficacy. STRs deliver precise results only on well-preserved bone and soft tissue samples: the amplicon size required for STR testing (150–450 bp) is too large to permit reliable amplification of fragmented DNA templates.
Compared with routine STR-centered DNA profiling, SNP markers provide valuable and increasingly important additional information. SNPs offer a practically inexhaustible source of human genome diversity for testing (Cooper et al. 1985; Wang et al. 1998), and SNP profiling as a tool for DNA identification has some benefits over STR markers (Sinha et al. 2017).
Y chromosome phylogeny (phylogeography) can be studied using bi- or multi-allelic markers (Jobling and Tyler-Smith 2000; Y Chromosome Consortium 2002). Its large non-recombining region (NRY) and its many stable markers make the human Y chromosome well suited to evolutionary studies. Owing to their high geographic specificity, Y-SNP haplogroups can be used to understand admixture and stratification between populations (Jobling and Tyler-Smith 2003). The greater mutational stability and lower mutation rate of Y chromosome SNPs make them advantageous when typing highly degraded DNA (Thomson et al. 2000; Sobrino et al. 2005; Chakraborty et al. 1999). The Y chromosome haplogroup O-M175 is an important marker for Eastern and Southeastern Asia, as it is the most ubiquitous Y chromosome lineage there, accounting for about 75% of Y chromosomes in mainland China (Su et al. 1999) and 87% in Southeast Asia (Karafet et al. 2005; Li et al. 2008; Karafet et al. 2010; Delfin et al. 2011; He et al. 2012). In the present study, haplogroup O-M175 was found in 84.79% of the sampled individuals, which is notable because it is the most ubiquitous Y lineage in mainland India, China, Malaysia, Indonesia, and Vietnam (Southeast Asian populations) (Karafet et al. 2008).
Many Indian studies have reported frequencies of Y chromosome haplogroups in tribes and castes of varying ethnicity and language (Kumar et al. 2007; Sharma et al. 2012; Khurana et al. 2014; Singh et al. 2016). The findings of these studies dissect the Y chromosomal haplogroup pool and help in understanding the current genetic scenario of Indian populations. Against this background, the present study was conducted on two important indigenous tribal populations of South India: Porja and Savara.
The Porja population is mainly distributed on the hill slopes of the Munchingputtu, Anantagiri, and Peddabayalu regions of Visakhapatnam, Andhra Pradesh (AP), India. They migrated from Odisha to their present habitat about 300 years ago. The Savara population lives in the Lakaiguda, Mettiguda, Chintalaguda, and Manduguda regions of Srikakulam, AP, India. The Savara language belongs to the Kol Munda group of the Austro-Asiatic language family.
Materials and methodology
Sample collection and DNA extraction
After individual informed consent was obtained from volunteer donors, 5 ml of blood was drawn from each donor into EDTA-coated vacutainers by a trained medical practitioner at the Primary Health Centre (PHC) of the affiliated villages and transported to the DNA Laboratory of the Anthropological Survey of India, Southern Regional Centre, Mysore, Karnataka, India, for extraction and analysis. DNA was extracted by the phenol-chloroform method (Phenol-Chloroform Isoamyl Alcohol (PCI) DNA Extraction 1998) and quantified on a UV-visible spectrophotometer (Perkin-Elmer) from the absorbance at 260 and 280 nm (A260/A280).
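The spectrophotometric quantification mentioned above can be illustrated with a minimal sketch. It relies on the widely used convention that an A260 of 1.0 at a 1 cm path length corresponds to roughly 50 ng/µl of double-stranded DNA, and that an A260/A280 ratio near 1.8 indicates DNA largely free of protein. The readings, the dilution factor, and the function names are hypothetical and are not taken from the study.

```python
# Illustrative only: the absorbance readings and dilution factor below are
# hypothetical values, not measurements reported in this study.

DSDNA_NG_PER_UL_PER_A260 = 50.0  # standard conversion for dsDNA, 1 cm path


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate the dsDNA concentration (ng/ul) of the undiluted sample."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values around 1.8 suggest little protein carry-over."""
    return a260 / a280


# Hypothetical reading of a sample diluted 1:20 before measurement.
concentration = dsdna_concentration(a260=0.45, dilution_factor=20)
ratio = purity_ratio(a260=0.45, a280=0.25)
print(f"{concentration:.0f} ng/ul, A260/A280 = {ratio:.2f}")
```

A ratio well below 1.8 would usually prompt re-extraction or clean-up before PCR, because residual protein or phenol can inhibit amplification.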
PCR and sequencing
A set of 15 bi-allelic SNP markers was analyzed to identify the Y chromosome haplogroups, using primer sets described elsewhere (Karafet et al. 2008). The polymerase chain reaction (PCR) cycling conditions for the specific primers were standardized in the DNA Laboratory, Anthropological Survey of India, Southern Regional Centre, Mysore, Karnataka, India. Initial denaturation was performed at 95 °C for 5 min, followed by cycles of denaturation at 94 °C for 1 min, annealing at a primer-specific temperature of 51–58 °C, and extension at 72 °C for 2 min 30 s, with a final extension at 72 °C for 7 min. The amplified products were directly sequenced using the BigDye™ Terminator Cycle Sequencing kit on an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, USA).
The generated sequences were aligned with the individual reference sequences using SeqScape software v2.5 (Applied Biosystems, USA). Y chromosome binary haplogroups were assigned with reference to the revised Y chromosome phylogenetic tree (Karafet et al. 2008).
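The logic of assigning a binary haplogroup from SNP calls can be sketched as follows. This is an illustration, not the authors' pipeline: the tree is a partial one built only from the markers named in this article, and the names `TREE`, `path_to_root`, and `assign_haplogroup` are hypothetical. A sample is placed in the deepest clade for which its own defining marker and all upstream markers carry the derived allele.

```python
# Partial tree assembled only from markers named in this article; it is an
# illustrative assumption, not the complete Karafet et al. (2008) tree.
# Each entry maps a haplogroup label to (defining marker, parent label).
TREE = {
    "O": ("M175", None),
    "O1": ("MSY2.2", "O"),
    "O2": ("P31", "O"),
    "O2a": ("M95", "O2"),
    "O3": ("M122", "O"),
    "R": ("M207", None),
    "R1": ("M173", "R"),
    "R2": ("M124", "R"),
    "H": ("M69", None),
    "H1": ("M52", "H"),
    "H1a": ("M82", "H1"),
    "H2": ("Apt", "H"),
}


def path_to_root(label):
    """Return the labels from the given clade up to its root clade."""
    path = []
    while label is not None:
        path.append(label)
        label = TREE[label][1]
    return path


def assign_haplogroup(derived_markers):
    """Return 'label-marker' for the deepest fully supported clade, or None.

    Conflicting calls (derived alleles on two sister branches) are not
    resolved here; the first deepest clade in TREE order is returned.
    """
    derived = set(derived_markers)
    best, best_depth = None, 0
    for label in TREE:
        path = path_to_root(label)
        supported = all(TREE[node][0] in derived for node in path)
        if supported and len(path) > best_depth:
            best, best_depth = label, len(path)
    if best is None:
        return None
    return f"{best}-{TREE[best][0]}"


print(assign_haplogroup({"M175", "P31", "M95"}))
print(assign_haplogroup({"M69", "M52"}))
```

Requiring every upstream marker to be derived is a deliberate, conservative choice in this sketch: a sample typed as derived only at M95 is left unassigned rather than placed in O2a on the strength of a single call.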
Results and discussion
Y haplogroup distribution of the studied populations (Porja and Savara)
Haplogroup O, identified by M175 (a 5-bp deletion), was found at the highest frequency, 84.79%. It possibly originated in East Asia (Karafet et al. 2008) and then spread to the South Asia Pacific region. The paternal signature of haplogroup O can be traced at moderate or low frequencies in some parts of Central Asia and Oceania (Cai et al. 2011; Karafet et al. 2001; Underhill et al. 2001; Deng et al. 2004). Haplogroup O is further divided into three sub-clades, defined by O1-MSY2.2, O2-P31, and O3-M122. The most frequent sub-clade observed in the present study was O2a*, which occurs at a frequency of 42.86% in Porja and 41.94% in Savara. O2a lineages are found in Southeast Asian populations of Malaysia, Vietnam, Indonesia, and Southern China (Sengupta et al. 2006).
Haplogroup R is characterized by M207 and is further divided into two sub-clades: R1, identified by the M173 A>C allele, and R2, identified by the M124 C>T allele. Haplogroup R1-M173 is estimated to have arisen in Asia about 27,000 years ago, during the Last Glacial Maximum (LGM), and is likely to be found in Southwestern Asia (Zhao et al. 2009). In the present study, the R2 lineage is present in 3.23% of the Porja and 3.69% of the Savara population.
Haplogroup H is identified by the M69 T>C allele. It is further divided into two sub-clades: H1, identified by the M52 A>C allele, and H2, identified by the APT G>A allele. Because of its high frequency in Indian tribal groups, haplogroup H is often regarded as the original Indian haplogroup, belonging to the ancient settlers. We found H1* and H1a* in the present samples at low frequencies, about 4.61% and 3.69% respectively. Haplogroup H has also been reported from Central Asia, Western Asia, and Europe (Wells et al. 2001; Regueiro et al. 2006). The low frequency of H1* and H1a* in the present study possibly reflects the fact that these lineages are more frequent among the Dravidian-speaking tribes of South India, while their presence in the remaining parts of the Indian subcontinent is limited (Thomson et al. 2000; Li et al. 2008; Zhao et al. 2009; Cordaux et al. 2004).
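The reported frequencies can be cross-checked with simple arithmetic. The sketch below assumes, as an interpretation rather than something stated in the text, that every percentage is expressed relative to the pooled sample of N = 217. Under that assumption the implied counts are whole numbers, they sum to 217, and the two O2a* figures together reproduce the overall 84.79%.

```python
# Assumption (not stated explicitly in the article): every percentage is
# relative to the pooled sample of N = 217, not to each population's own size.
N = 217

reported_percent = {
    ("O2a*-M95", "Porja"): 42.86,
    ("O2a*-M95", "Savara"): 41.94,
    ("R2-M124", "Porja"): 3.23,
    ("R2-M124", "Savara"): 3.69,
    ("H1*-M52", "pooled"): 4.61,
    ("H1a*-M82", "pooled"): 3.69,
}

# Convert each percentage to the nearest whole number of individuals.
implied_counts = {
    key: round(percent * N / 100) for key, percent in reported_percent.items()
}

total = sum(implied_counts.values())
o2a_total = (
    implied_counts[("O2a*-M95", "Porja")] + implied_counts[("O2a*-M95", "Savara")]
)
o2a_percent = round(100 * o2a_total / N, 2)

for key, count in implied_counts.items():
    print(key, count)
print("total individuals:", total)
print("O2a* overall:", o2a_total, f"({o2a_percent}%)")
```

If the percentages were instead relative to each population's own sample size, the O2a* values for Porja and Savara would not add up to the overall figure, which is why the pooled interpretation is adopted in this sketch.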
In the present study, we examined the genetic components of two populations from Southern India to identify their origin and their present-day genetic similarity. We further explored whether sub-population, sub-lingual, or socio-cultural affiliations, or gender-based demographic patterns, have influenced the gene pool of these groups. In this study, 217 individuals were typed for Y chromosome polymorphisms using a set of 15 bi-allelic markers on the non-recombining region of the Y chromosome, a limited panel that may introduce a slight bias against some of the rare lineages. Despite the limited size of the studied samples and of the reference samples from the border areas of Andhra Pradesh, the diversity in Y chromosome lineages suggests that the gene pool of Andhra Pradesh, especially of its tribal populace, is composed of genes with known phylogeographic origins in Europe and Southeast Asia, while their Y chromosomes also retain traces of the original inhabitants of the continent.
The distribution of Y-SNP haplogroups in the populations studied here will increase the resolving power of haplogroup analysis, can play a crucial role in assigning geographical identity to individual haplogroups, and should make it easier to determine the bio-geography of southeast coastal Indians. However, before data on these haplogroups are applied in forensic cases, a good number of populations should be analyzed with different markers, and the distinctiveness of geographic regions should be compared with the distribution of the different haplogroups. These Y chromosomal SNP haplotypes show characteristics similar to some of the mainland haplogroups. The data generated here can be used to trace the patrilineal roots of the tested haplogroups. This information about Y chromosomal haplogroups and haplotypes is restricted to the tested populations but can provide conclusive data for understanding patrilineal bio-geographic ancestry, especially in disaster victim identification (DVI). The different branches of the Y chromosome tree are strongly associated with geographical areas, which makes the tree capable of delivering a route map of the ancestors, as Y-SNP markers link ethnicity and geography with particular haplogroup frequencies. These Y-SNP markers have established their efficacy in casework identification, but uni-parental SNP markers are still little known as an aid to identifying geographical ancestry, although SNP markers for DVI have been used in a major disaster, i.e., the terrorist attack of 11 September 2001 on the World Trade Center (WTC) in New York City, because STR amplicons were too long for the analysis of heavily degraded samples. Finally, Y-SNPs, with a mutation rate about 100,000 times lower than that of STRs, are superior for kinship testing and may replace STRs for such purposes once commercial kits become available.
The authors are thankful to all the participants who voluntarily took part in this study on “DNA Polymorphisms” and provided their blood samples. We wish to thank the Director, Anthropological Survey of India, for granting the ethical clearance to carry out the work.
Funding was obtained from the Anthropological Survey of India.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available from the corresponding author.
ARI performed the wet lab analysis and prepared the overall manuscript. MS contributed to structuring the research paper. VP contributed to the sample collection and to the wet lab and dry lab analysis. AC designed the overall study and integrated the wet lab and dry lab analysis. BA reviewed the complete manuscript and helped in preparing the “Results” section. All the authors read and approved the final manuscript.
Ethics approval and consent to participate
Written informed consent was obtained from all the participants for this study. The ethical committee of Anthropological Survey of India duly approved the present study under the national project “DNA Polymorphisms”.
Consent for publication
All the authors have given the written consent for publication of this article.
The authors declare that they have no competing interests.
- Cai Q, Wen W, Qu S, Li G, Egan KM, Chen K, Deming SL, Shen H, Shen CY, Gammon MD (2011) Replication and functional genomic analyses of the breast cancer susceptibility locus at 6q25.1 generalize its importance in women of Chinese, Japanese, and European ancestry. Cancer Res 71:1344–1355
- Delfin F, Salvador JM, Calacal GC, Perdigon HB, Tabbada KA, Villamor LP, Halos SC, Ttir EG, Myles S, Hughes DA, Xu S, Jin L, Lao O, Kayser M, Hurles ME, Stoneking M, Ungria MC (2011) The Y-chromosome landscape of the Philippines: extensive heterogeneity and varying genetic affinities of Negrito and non-Negrito groups. Eur J Hum Genet 19:224–230
- Karafet TM, Lansing JS, Redd AJ, Reznikova S, Watkins JC, Surata PK, Arthawiguna A, Mayer L, Bamshad M, Jorde LB, Hammer MF (2005) Balinese Y-chromosome perspective on the peopling of Indonesia: genetic contributions from pre-Neolithic hunter-gatherers, Austronesian farmers and Indian traders. Hum Biol 77:93–114
- Kumar V, Reddy AN, Babu JP, Rao TN, Langstieh BT, Thangaraj K, Reddy AG, Singh L, Reddy BM (2007) Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations. BMC Evol Biol 7:47
- Narkuti V, Vellanki RN, Anubrolu N, Doddapaneni KK, Gandhi Kaza PC, Mangamoori LN (2008) Single and double incompatibility at vWA and D8S1179/D21S11 loci between mother and child: implications in kinship analysis. Clinica Chimica Acta 395(1-2):162–165
- Phenol-chloroform isoamyl alcohol (PCI) DNA extraction. Modified from protocols by Barker et al (1998) Available at: http://ccoon.myweb.usf.edu/ecoimmunology.org/About_Home.html
- Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CT, Lin AA, Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder PP, Underhill PA (2006) Polarity and temporality of high-resolution Y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 78:202–221
- Singh S, Singh A, Rajkumar R, Kumar KS, Samy SK, Nizamuddin S, Singh A, Sheikh SA, Peddada V, Khanna V, Veeraiah P, Pandit A, Chaubey G, Singh L, Thangaraj K (2016) Dissecting the influence of Neolithic demic diffusion on Indian Y-chromosome pool through J2-M172 haplogroup. Sci Rep 6:19157. https://doi.org/10.1038/srep19157. PMCID: PMC4709632; PMID: 26754573
- Sinha M, Rao AI, Mitra M (2017) Y-chromosomal and mitochondrial SNP haplogroup distribution in Indian populations and its significance in disaster victim identification (DVI): a review based molecular approach. Austin Journal of Forensic Science and Criminology 4(1)
- Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lander ES et al (1998) Large-scale identification, mapping, and genotyping of single nucleotide polymorphisms in the human genome. Science 280:1077–1082
- Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J, Jin L, Su B, Pitchappan R, Shanmugalakshmi S, Balakrishnan K, Read M, Pearson NM, Zerjal T, Webster MT, Zholoshvili I, Jamarjashvili E, Gambarov S, Nikbin B, Dostiev A, Aknazarov O, Zalloua P, Tsoy I, Kitaev M, Mirrakhimov M, Chariev A, Bodmer WF (2001) The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci U S A 98:10244–10249
- Y Chromosome Consortium (2002) A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res 12:339–348
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.