Development of the Kazakhstan Y-chromosome haplotype reference database: analysis of 27 Y-STR in Kazakh population


To improve the databases available for forensic use, this study brings together all Y-STR haplotypes from the Kazakh population. The reference database has accumulated almost 3650 samples from academic and citizen science. Additionally, 27 Y-STRs from the Yfiler Plus system were newly analyzed in 300 males from Kazakh (Qazaq) populations residing in Kazakhstan. The data are available in the YHRD under accession numbers YA004316 and YA004322. A total of 270 distinct haplotypes were observed, giving a discrimination capacity of 90%. The Y-STR haplotypes obtained exhibited high intra-population diversity. Analysis of pairwise genetic distances showed the lowest RST values with Uighur and Mongolian populations.

Short tandem repeats (STRs) of the Y chromosome are widely used in both forensic casework [1] and genetic genealogy [2]. To achieve high discrimination power, a number of multiplex Y-STR genotyping systems have been developed recently. The best known of these are PowerPlex Y23 [3] and Yfiler Plus [4]. Genetic polymorphisms of distinct populations from a wide range of countries and regions of the world have been investigated with these systems [5,6,7,8]. The volume of Y-STR haplotype data in the Y Chromosome Haplotype Reference Database (YHRD) grows annually [9]. Nevertheless, the amount of Y-STR data for the Kazakh population of Kazakhstan is small. The national database “Kazakhstan” of YHRD consists of 441 minimal haplotypes (MH) or 207 Yfiler haplotypes, including only 41 Yfiler Plus haplotypes (YA004185). Knowledge of haplotype diversity is important for constructing databases and for interpreting the significance of DNA-based forensic evidence. Our objective was to provide additional information on these 27 Y-STR loci in the Kazakh population and to contribute to the development of a domestic reference database.

Kazakhstan is the largest country in Central Asia and home to over 18 million people. The native Kazakh (Qazaq) people are a patrilocal and patrilineal population (12 million; accessed 01/01/2017). The Kazakh population shows an admixture of Eastern (70%) and Western (30%) anthropological traits. The Western traits are archaic and can be traced back to the Bronze Age people of Kazakhstan [10]. The Kazakh Khanate arose in the fifteenth century after the fall of the Golden Horde. Nomadic cultures of the Turkic tribes who lived on the Central Asian steppe became the core of Kazakh culture [11]. The Kazakh language belongs to the Kipchak branch of the Turkic language family [12], and Kazakhs are organized into descent groups whose members claim a distinct common ancestor [13].

At present, genetic studies of the Y chromosome in the Kazakh population of Kazakhstan are limited. Short tandem repeat data have been obtained predominantly with 17-locus Y-STR genotyping systems from East and South Kazakhstan [14, 15]. Previous studies revealed an association between Y-chromosome variation and clans. For example, the Kerey clan showed the highest frequency (76.5%) of haplogroup C2-ST [16]. Another haplogroup, G1, is typical of the Argyn clan [17]. Analysis of Altaian Kazakhs revealed a common paternal gene pool for Kazakhs [18]. The Kazakh population from China showed significant genetic differentiation from the Han and other ethnic groups [19]. The study of Kazakh clans supports historical investigation and biogeographical analysis. However, the available records of Y-chromosome haplotypes are not sufficient, and the reference database therefore requires further development.

A total of 300 blood samples were collected from healthy Kazakh male individuals who were unrelated for at least three generations in the same family line. Samples were collected mainly from North, Central, and South Kazakhstan after obtaining written informed consent, in compliance with the Declaration of Helsinki and with the approval of the Local Ethics Committee. DNA was extracted using organic phenol–chloroform extraction and quantified both spectrophotometrically (NanoDrop 2000) and fluorometrically (Qubit 2.0). PCR amplification was carried out on a GeneAmp PCR System 9700 Thermal Cycler using the Yfiler Plus PCR amplification kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions. GeneMapper ID-X software was used to analyze the raw data obtained from an ABI 3500xL Genetic Analyzer (Thermo Fisher Scientific, Waltham, MA, USA). To contribute haplotype data, the laboratories passed the Quality Control Test of the YHRD (YC000343, YC000346). The 27 Y-STR haplotype data were submitted to YHRD under accession numbers YA004316 and YA004322. Supplementary Table 1 contains a full list of haplotypes, as well as other sample information.

Haplotype and allele frequencies were calculated by direct counting. Haplotype diversities (HD)/genotype diversities (GD) were calculated using the Arlequin program ver 3.5 [20] as $HD/GD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $n$ is the sample size and $p_i$ is the frequency of the $i$-th haplotype/genotype. The haplotype match probability (HMP) was calculated as the sum of squared frequencies of the observed haplotypes, $HMP = \sum_i p_i^2$. Discrimination capacity (DC) was determined as the ratio between the number of distinct haplotypes and the total number of haplotypes in the sample. The pairwise genetic distances (RST) between five Kazakh populations (South Kazakhstan, Jambyl and Almaty Region, Central and North Kazakhstan) were calculated and illustrated using the AMOVA/MDS tool provided on the YHRD website. For comparative purposes, 66 distinct populations in Eastern Asia, Central Asia, and West Asia, containing a total of 8545 haplotypes, were taken from YHRD as reference populations.
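The three indices defined above can be computed by direct counting. The following is a minimal sketch, not the Arlequin implementation; the function name `diversity_indices` and the toy two-locus haplotypes are hypothetical and serve only to illustrate the formulas.

```python
from collections import Counter


def diversity_indices(haplotypes):
    """Compute HMP, HD and DC from a list of haplotypes by direct counting.

    Each haplotype must be hashable, e.g. a tuple of allele calls.
    This is an illustrative helper, not part of any published tool.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    # Haplotype match probability: sum of squared haplotype frequencies.
    hmp = sum((c / n) ** 2 for c in counts.values())
    # Haplotype diversity with the n/(n - 1) sample-size correction.
    hd = n * (1 - hmp) / (n - 1)
    # Discrimination capacity: distinct haplotypes / total haplotypes.
    dc = len(counts) / n
    return {"n": n, "distinct": len(counts), "HMP": hmp, "HD": hd, "DC": dc}


# Toy data (made up): four samples, three distinct haplotypes.
toy = [("14", "13"), ("14", "13"), ("15", "12"), ("16", "11")]
result = diversity_indices(toy)
print(result)
```

For the toy input, one haplotype occurs twice and two occur once, so HMP = 0.375, HD = 4 × 0.625 / 3 ≈ 0.833, and DC = 3/4.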

To establish our own reference database, a literature review was conducted, starting from the first publication in 1999 [21] and extending to recent scientific articles. In total, 31 references were covered (Supplementary Table 2). The resulting database of Kazakh Y-chromosome haplotypes consists of almost 3650 samples from citizen and academic science, including 300 samples that are reported here for the first time. However, the database consists of datasets with different Y-STR resolutions. Academic science has generated 2605 samples from the Kazakh population, but only 1574 samples are available to date (Supplementary Table 3), while citizen science has accumulated 743 samples in the Family Tree DNA Projects (Supplementary Table 4). By presenting all haplotypes from the Kazakh population in one database, we believe that these data will be useful for forensic casework and population studies.

For the 300 Kazakh samples analyzed, haplotype frequencies at the 27 Y-STR loci are shown in Supplementary Table 5. The results revealed 270 distinct Y-STR haplotypes and 216 distinct alleles among them. Allele frequencies and gene diversity for each of the 27 Y-STR loci in the Kazakh population are shown in Supplementary Table 6. Allele frequencies ranged from 0.0033 to 0.7167. The lowest number of allele variants (n = 4) was found at DYS391 and DYS437, whereas the highest (n = 14) was found at DYS481. Gene diversity ranged from 0.442 for DYS391 to 0.85 for DYS481. The highest gene diversities were found in the rapidly mutating (RM) Y-STR markers.

Regarding allelic diversity, we observed microvariant alleles at DYS458 (18.2, 19.2, 21.2, 22.2, 23.2, 24.1), DYS385a/b (12.2), and DYS481 (24.1). Copy number variations in the form of duplications were detected at DYS19 (n = 11) and DYS518 (n = 1). We also observed null alleles at DYS448 (n = 5) and DYS576 (n = 1). Microvariant alleles, duplications, and deletions are known characteristics of specific haplogroups [22, 23]. All variants were confirmed by repeated experiments.

To assess the power of the Y-STR genotyping systems, we calculated diversity indices for the Kazakh population with the Yfiler Plus, Yfiler, and minimal haplotype (MH) marker sets (Table 1). The MH set distinguished 149 different haplotypes, the most frequent of which was observed 28 times, giving a discrimination capacity of 49.66%. Yfiler distinguished 203 haplotypes, of which 163 were unique; the most frequent was observed 14 times, and the discrimination capacity rose to 67.66%. Yfiler Plus differed strikingly from the previous systems: it distinguished 270 haplotypes, a discrimination capacity of 90%. Of these, 246 were unique, 21 were seen twice, and one each was observed three, four, and five times. Haplotype diversity with Yfiler Plus reached 0.9991, the highest value among the three systems.

Table 1 Diversity indices for Kazakh population (N = 300) obtained with minimal haplotype (MH), Yfiler, and Yfiler Plus marker sets
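As a consistency check, the Yfiler Plus indices can be recomputed from the frequency spectrum reported in the text (246 haplotypes seen once, 21 seen twice, and one each seen three, four, and five times). The sketch below assumes only those counts; the variable names are illustrative.

```python
# Frequency spectrum for Yfiler Plus as reported in the text:
# key = number of times a haplotype was observed,
# value = number of distinct haplotypes observed that many times.
spectrum = {1: 246, 2: 21, 3: 1, 4: 1, 5: 1}

# Total number of samples and number of distinct haplotypes.
n = sum(times * haps for times, haps in spectrum.items())
distinct = sum(spectrum.values())

# Discrimination capacity: distinct haplotypes / total haplotypes.
dc = distinct / n

# Haplotype match probability: sum of squared haplotype frequencies.
hmp = sum(haps * (times / n) ** 2 for times, haps in spectrum.items())

# Haplotype diversity with the n/(n - 1) correction.
hd = n * (1 - hmp) / (n - 1)

print(f"N = {n}, distinct = {distinct}, DC = {dc:.2%}")
print(f"HMP = {hmp:.6f}, HD = {hd:.4f}")
```

The spectrum sums to 300 samples and 270 distinct haplotypes, and the recomputed values (DC = 90%, HD ≈ 0.9991) agree with those reported.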

At the inter-population level, genetic distances among the Kazakh populations of South Kazakhstan, Jambyl and Almaty Region, and Central and North Kazakhstan are given in Supplementary Table 7 and shown in the MDS plot (Figure S1). The North Kazakh population did not differ significantly (p = 0.74) from a previously published Kazakh population in YHRD (YA004185). The same applies to Kazakhs from Jambyl and Almaty Region. To explore the genetic relationships among the respective Asian populations, we compared 72 populations at 15 Y-STRs using RST distances (Supplementary Table 8). Significant differences were observed between the total Kazakh population and 49 populations (p < 0.05/2556 = 0.00002). The lowest genetic distances were obtained for Uighur [Xinjiang, China] (RST = 0.0313) and Mongolian [Inner Mongolia, China] (RST = 0.0358). The highest genetic distance was obtained for Yakut [Russian Federation] (RST = 0.3255). Kazakhs from East Kazakhstan (YA003700) and Gansu province of China (YA003979) also showed high genetic distances (RST = 0.2604 and RST = 0.2152, respectively). These results demonstrate that geographically different Kazakh populations are genetically distant from one another and require more detailed study at the intra-population level.
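The significance threshold quoted above is consistent with a Bonferroni-style correction over all pairwise comparisons among the 72 populations. A minimal sketch of that arithmetic, assuming a nominal level of 0.05 as stated in the text:

```python
from math import comb

n_populations = 72   # populations compared, as stated in the text
alpha = 0.05         # nominal significance level

# Number of unordered population pairs: C(72, 2).
n_pairs = comb(n_populations, 2)

# Per-comparison threshold after dividing alpha by the number of pairs.
threshold = alpha / n_pairs

print(n_pairs)               # 2556
print(f"{threshold:.5f}")    # 0.00002
```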

In conclusion, the Kazakh population data obtained in this study (Supplementary Table 1), together with the domestic Kazakhstan Y-chromosome Haplotype Reference Database under development (Supplementary Tables 3 and 4), can be used for genealogical, biogeographical, routine forensic, evolutionary, and population research. This paper follows the guidelines for publication of population data requested by the journal [24].


  1. Kayser M (2017) Forensic use of Y-chromosome DNA: a general overview. Hum Genet 136:621–635.

  2. Calafell F, Larmuseau MHD (2017) The Y chromosome as the most popular marker in genetic genealogy benefits interdisciplinary research. Hum Genet 136:559–573.

  3. Thompson JM, Ewing MM, Frank WE, Pogemiller JJ, Nolde CA, Koehler DJ, Shaffer AM, Rabbach DR, Fulmer PM, Sprecher CJ, Storts DR (2013) Developmental validation of the PowerPlex® Y23 system: a single multiplex Y-STR analysis system for casework and database samples. Forensic Sci Int Genet 7:240–250.

  4. Gopinath S, Zhong C, Nguyen V, Ge J, Lagacé RE, Short ML, Mulero JJ (2016) Developmental validation of the Yfiler® Plus PCR Amplification Kit: an enhanced Y-STR multiplex for casework and database applications. Forensic Sci Int Genet 24:164–175.

  5. Purps J, Siegert S, Willuweit S, Nagy M, Alves C, Salazar R, Angustia SMT, Santos LH, Anslinger K, Bayer B, Ayub Q, Wei W, Xue Y, Tyler-Smith C, Bafalluy MB, Martínez-Jarreta B, Egyed B, Balitzki B, Tschumi S, Ballard D, Court DS, Barrantes X, Bäßler G, Wiest T, Berger B, Niederstätter H, Parson W, Davis C, Budowle B, Burri H, Borer U, Koller C, Carvalho EF, Domingues PM, Chamoun WT, Coble MD, Hill CR, Corach D, Caputo M, D’Amato ME, Davison S, Decorte R, Larmuseau MHD, Ottoni C, Rickards O, Lu D, Jiang C, Dobosz T, Jonkisz A, Frank WE, Furac I, Gehrig C, Castella V, Grskovic B, Haas C, Wobst J, Hadzic G, Drobnic K, Honda K, Hou Y, Zhou D, Li Y, Hu S, Chen S, Immel UD, Lessig R, Jakovski Z, Ilievska T, Klann AE, García CC, de Knijff P, Kraaijenbrink T, Kondili A, Miniati P, Vouropoulou M, Kovacevic L, Marjanovic D, Lindner I, Mansour I, al-Azem M, Andari AE, Marino M, Furfuro S, Locarno L, Martín P, Luque GM, Alonso A, Miranda LS, Moreira H, Mizuno N, Iwashima Y, Neto RSM, Nogueira TLS, Silva R, Nastainczyk-Wulf M, Edelmann J, Kohl M, Nie S, Wang X, Cheng B, Núñez C, Pancorbo MM, Olofsson JK, Morling N, Onofri V, Tagliabracci A, Pamjav H, Volgyi A, Barany G, Pawlowski R, Maciejewska A, Pelotti S, Pepinski W, Abreu-Glowacka M, Phillips C, Cárdenas J, Rey-Gonzalez D, Salas A, Brisighelli F, Capelli C, Toscanini U, Piccinini A, Piglionica M, Baldassarra SL, Ploski R, Konarzewska M, Jastrzebska E, Robino C, Sajantila A, Palo JU, Guevara E, Salvador J, Ungria MCD, Rodriguez JJR, Schmidt U, Schlauderer N, Saukko P, Schneider PM, Sirker M, Shin KJ, Oh YN, Skitsa I, Ampati A, Smith TG, Calvit LS, Stenzl V, Capal T, Tillmar A, Nilsson H, Turrina S, de Leo D, Verzeletti A, Cortellini V, Wetton JH, Gwynne GM, Jobling MA, Whittle MR, Sumita DR, Wolańska-Nowak P, Yong RYY, Krawczak M, Nothnagel M, Roewer L (2014) A global analysis of Y-chromosomal haplotype diversity for 23 STR loci. Forensic Sci Int Genet 12:12–23.

  6. Oh YN, Lee HY, Lee EY, Kim EH, Yang WI, Shin KJ (2015) Haplotype and mutation analysis for newly suggested Y-STRs in Korean father-son pairs. Forensic Sci Int Genet 15:64–68.

  7. Garcia O, Yurrebaso I, Mancisidor ID, Lopez S, Alonso S, Gusmao L (2016) Data for 27 Y-chromosome STR loci in the Basque Country autochthonous population. Forensic Sci Int Genet 20:e10–e12.

  8. Iacovacci G, D'Atanasio E, Marini O et al (2017) Forensic data and microvariant sequence characterization of 27 Y-STR loci analyzed in four eastern African countries. Forensic Sci Int Genet 27:123–131.

  9. Willuweit S, Roewer L (2015) The new Y chromosome haplotype reference database. Forensic Sci Int Genet 15:43–48.

  10. Ismagulov O (1970) Population of Kazakhstan from Bronze Epoch to present: (paleoanthropological research). Nauka, Alma-Ata

  11. Golden PB (1992) An introduction to the history of the Turkic peoples: ethnogenesis and state-formation in medieval and early modern Eurasia and the Middle East. O. Harrassowitz, Wiesbaden

  12. Johanson L, Csató É (1998) The Turkic languages. Routledge, London

  13. Tynyshpaev M (1925) Materials on the history of Kyrgyz-Kazakh people. East Department of Kyrgyz State Press, Tashkent

  14. Tarlykov PV, Zholdybayeva EV, Akilzhanova AR et al (2013) Mitochondrial and Y-chromosomal profile of the Kazakh population from East Kazakhstan. Croat Med J 54:17–24.

  15. Zhabagin M, Balanovska E, Sabitov Z et al (2017) The connection of the genetic, cultural and geographic landscapes of Transoxiana. Sci Rep 7:3085.

  16. Abilev S, Malyarchuk B, Derenko M, Wozniak M, Grzybowski T, Zakharov I (2012) The Y-chromosome C3*star-cluster attributed to Genghis Khan’s descendants is present at high frequency in the Kerey Clan from Kazakhstan. Hum Biol 84:79–89.

  17. Balanovsky O, Zhabagin M, Agdzhoyan A et al (2015) Deep phylogenetic analysis of haplogroup G1 provides estimates of SNP and STR mutation rates on the human Y-chromosome and reveals migrations of Iranic speakers. PLoS One 10:e0122968.

  18. Dulik MC, Osipova LP, Schurr TG (2011) Y-chromosome variation in Altaian Kazakhs reveals a common paternal gene pool for Kazakhs and the influence of Mongolian expansions. PLoS One 6:e17548.

  19. Nothnagel M, Fan GY, Guo F, He Y, Hou Y, Hu S, Huang J, Jiang X, Kim W, Kim K, Li C, Li H, Li L, Li S, Li Z, Liang W, Liu C, Lu D, Luo H, Nie S, Shi M, Sun H, Tang J, Wang L, Wang CC, Wang D, Wen SQ, Wu H, Wu W, Xing J, Yan J, Yan S, Yao H, Ye Y, Yun L, Zeng Z, Zha L, Zhang S, Zheng X, Willuweit S, Roewer L (2017) Revisiting the male genetic landscape of China: a multi-center study of almost 38,000 Y-STR haplotypes. Hum Genet 136:485–497.

  20. Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567.

  21. Perez-Lezaun A, Calafell F, Comas D et al (1999) Sex-specific migration patterns in central Asian populations, revealed by analysis of Y-chromosome short tandem repeats and mtDNA. Am J Hum Genet 65:208–219.

  22. Myres NM, Ekins JE, Lin AA, Cavalli-Sforza LL, Woodward SR, Underhill PA (2007) Y-chromosome short tandem repeat DYS458.2 non-consensus alleles occur independently in both binary haplogroups J1-M267 and R1b3-M405. Croat Med J 48:450–459.

  23. Balaresque P, Parkin E, Roewer L et al (2009) Genomic complexity of the Y-STR DYS19: inversions, deletions and founder lineages carrying duplications. Int J Legal Med 123:15–23.

  24. Poetsch M, Bajanowski T, Pfeiffer H (2012) The publication of population genetic data in the International Journal of Legal Medicine: guidelines. Int J Legal Med 126:489–490.



We gratefully acknowledge all sample donors who participated in this study. We thank Nazarbayev University students (Aigul Abilmazhinova, Rassul Khakimov, Azamat Bashabayev, Kamila Issabayeva, Dariya Kassybayeva, Akylzhan Zhaken) for laboratory assistance. We would like to thank Maxim Solomadin and Ayken Askapuli for their contributions to this work.


This study received primary support from the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. AP05134955).

Author information




Conceived and designed the experiments: MZ, AA; performed the experiments: AS, IT, DE, SL, AE; analyzed the data: MZ, AS; contributed reagents/materials/analysis tools: AR, AA, SL, AE, IT; wrote the paper: MZ; AS, AA participated in the drafting; study initiation: MZ; read and approved the final version of the paper: all coauthors.

Corresponding author

Correspondence to Maxat Zhabagin.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Electronic supplementary material

Fig S1

(PDF 4.80 kb)


(XLSX 782 kb)


Cite this article

Zhabagin, M., Sarkytbayeva, A., Tazhigulova, I. et al. Development of the Kazakhstan Y-chromosome haplotype reference database: analysis of 27 Y-STR in Kazakh population. Int J Legal Med 133, 1029–1032 (2019).



  • Central Asia
  • Kazakh
  • Y chromosome
  • YHRD
  • Forensic
  • Population genetics
  • STR