Genetic diversity of the Thao people of Taiwan using Y-chromosome, mitochondrial DNA and HLA gene systems
Despite attempts to retrace the history of the Thao people in Taiwan using folktales, linguistics, physical anthropology, and ethnic studies, their history remains incomplete. The heritage of the Thao has been associated with the Pazeh Western plains peoples and several other mountain peoples of Taiwan. In the last 400 years, their culture and genetic profile have been reshaped by East Asian migrants. They were displaced by the Japanese and by the construction of a dam, and came close to extinction.
In this paper, genetic information from mitochondrial DNA (mtDNA), human leukocyte antigens (HLA), and the non-recombining Y chromosome of 30 Thao individuals is compared with that of 836 other Taiwan Mountain and Plains Aborigines (TwrIP and TwPp), 384 non-aboriginal Taiwanese (non-TwA), and 149 Continental East Asians.
The phylogeographic analyses of mtDNA haplogroups F4b and B4b1a2 indicated gene flow between Thao, Bunun, and Tsou, and suggested a common ancestry from 10,000 to 3000 years ago. A claim of close contact with the heavily Sinicized Pazeh of the plains was not rejected and suggests that the plains and mountain peoples most likely shared the same Austronesian agriculturist gene pool in the Neolithic.
Having moved repeatedly since their arrival in Taiwan between 6000 and 4500 years ago, the Thao finally settled in the central mountain range. They represent the last plains people whose strong bonds with their original culture allowed them to preserve their genetic heritage, despite significant gene flow from the mainland of Asia.
Representing a considerable contribution to the genealogical history of the Thao people, the findings of this study bear on ongoing anthropological and linguistic debates on their origin.
Keywords: Phylotree, Human population genetics, Mitochondrial DNA, Thao, Taiwan aboriginal people, Phylogeography
Abbreviations
BSP: Bayesian skyline plot
MDS: Multiple dimensional scaling
non-TwA: Non-Taiwan Aborigines
NRY: Non-recombining Y chromosome
PCR: Polymerase chain reaction
TwrIP: Taiwan recognized indigenous peoples
YBP: Years before present
Y-STR: Y-chromosome short tandem repeat
Taiwan’s multicultural and multilingual population reached 23.5 million in 2016. Mandarin, the official language, is almost universally used and understood, while significant portions of the population speak other Sinitic languages, such as Minnan and Hakka, brought by groups originally from Southeast China. It is believed that the very first fully modern humans arrived on the island between 20,000 and 30,000 years before present (YBP), in very small numbers, during the late Pleistocene when Taiwan was still a part of the East Asian mainland. Although a few traces of this era can be inferred from the genetic profile of the current population [3, 4, 5, 6] and from archeological artifacts of Paleolithic cultures [2, 7], it is believed that Paleolithic groups disappeared during the Last Glacial Period of the Mesolithic Age or, at the latest, around the time the Neolithic groups arrived in Taiwan [2, 7, 8, 9]. Their genetic identity, origin, and continuity with the extant aboriginal populations of Taiwan remain unresolved.
Today there are 16 groups of officially recognized indigenous peoples in Taiwan (TwrIP), who represent approximately 2.2% of the Taiwan population. These groups speak Austronesian languages. The greatest genealogical diversity of the Austronesian languages is found in Taiwan, where they diversified and expanded from the ancestral Proto-Austronesian languages that arrived from the East Asian Mainland 6000 YBP [7, 10] with the Neolithic colonization of the island. This language group, often referred to as the Formosan languages, most likely reached its present diversity at the beginning of the Neolithic era. Subsequent human entries include at least Metal Age Austronesian groups from Southeast Asia; European, Chinese, and Japanese colonial settlers; and post-Second World War Chinese exiles, each with substantial cultural and genetic impacts on the island’s population [5, 11, 12, 13].
The Thao comprised three major clans: the Yuan, Shi, and Mau. The arrival of the Han from China over the last four centuries brought armed conflicts and infectious diseases that reduced the population of the plains and mountain peoples and brought the Thao people, who were already small in number, to the brink of extinction.
During the period of Japanese colonial administration (1895–1945), the Japanese government began to modernize Taiwan. In 1919, the colonial authorities decided to build a dam on Sun Moon Lake. Most Thao inhabiting the area were forced to relocate to nearby areas. Further, the Chi-Chi earthquake of 1999 damaged or destroyed 80% of the houses of the Thao people and sent many to look for employment in other cities.
After many episodes of displacement and regrouping, the Mau clan now lives in Shuili and Dapinglin (presently Toushe or Puzi) villages, south of Sun Moon Lake, and part of the Shi clan, who previously resided further north in Yuchi, have now rejoined the groups in Tehuashe (presently Sun Moon village, east of Sun Moon Lake).
However, the home of the Thao clans before they reached the Sun Moon Lake region remains unclear. Were they really in contact with the Pazeh people on the western plain, and did they later come up along the Choshui river? Did they temporarily settle in the neighborhood of the Tsou people? A 1921 tourist-industry version of a tribal legend, in which the chase of a white deer finally led the Thao to Sun Moon Lake, may indicate that the Thao came from further south, possibly the Alishan region near the current home of the Tsou. Interestingly, in 1951, according to this account and following an initial Japanese anthropological classification allowing recognition of only a limited number of Taiwan groups, the Tsou and Thao were classified as belonging to a single group: the Tsou People [20, 21]. However, this classification, along with the origin of the Thao, remains under debate.
Further anthropological studies showed that the Thao peoples were very different from the Tsou. Like the Tsou, the Thao lived by farming, hunting, fishing, and collecting, and now principally sell artifacts to tourists, but they still venerate their ancestral spirits and have conserved a rich and unique culture that differs from that of the Tsou and other neighboring peoples. More importantly, the Thao people have unique rituals, such as rhythmic pestle music and tooth pulling, and scholars nowadays describe them as a unique socio-cultural group. The Thao people are a localized kin group of patrilineal exogamous descent. Traditionally, a single hereditary clan maintained control of the leadership: the chief, who made decisions about ceremonial rituals, inherited this authority from his father, and if there was no first-born son, the next male kin would inherit the title. Information pertaining to specific clans is not included in this study. All Thao now live in the region to the south of the Atayal and Saisiyat peoples and are close neighbors to the Bunun in the southeast, with whom they share some similar linguistic and social traits.
Morphometric differences presented by Yu Chin-Chuan and Tseng Tsung-Ming were coupled with the geographic distribution of other TwrIP. These included 13 items of observation, 20 morphometric measurements, and 20 indexes calculated from these measurements. In brief, the physical characteristics of most Formosan aborigines have been described as (1) straight hair with very little wavy hair, (2) black hair with some black-brown, (3) brown or dark-brown eyes, (4) a high percentage of double eyelids, 90 to 100%, and (5) Mongoloid folds, 61 to 90%. The Thao showed no significant difference from other TwrIP except that they have a lower percentage of Mongoloid folds. Further, Yu and Cheng’s results show that the Thao are physically more similar to the Bunun, the Atayal, and the Paiwan, and more distant from the Amis further to the east and from the Yami. Intriguingly, the same study also described physical anthropological traits closer to the Hakka, perhaps suggesting gene admixture between Thao and non-aboriginal groups and/or drift.
The official classification of ethnic groups today considers the history of individuals or groups, their self-perception, the government’s perception, and the findings of researchers in various fields such as linguistics, culture, and ethnology [21, 22]. Past or present acculturation in Taiwan, sinicization, and recent advances in technology have also influenced the way people view themselves and each other, and where they prefer to live. Presently, the impact of genetics on all fields of study and its easy availability to the public and scientific communities have become generally well accepted, better understood, and taken very seriously. By ascertaining the magnitude and spatial distribution of the genetic diversity in Taiwan, our study aims to shed greater light on the genetic heritage of the Thao people and to detect evidence of past admixture between regional groups. For this, we analyzed the polymorphism of the paternally inherited non-recombining Y chromosome (NRY), of the maternally inherited mitochondrial DNA (mtDNA), and of the diploid human leukocyte antigens (HLA-A, -B and -DRB1) among individuals from most groups and locations within Taiwan, the Philippines, and Fujian.
Gene diversity in three gene systems (NRY, HLA, and mtDNA)

| Group | Y-SNP haplogroups ± SD | Y-STR haplotypes ± SD (7 STRs) | mtDNA haplogroups ± SD | HLA-A, -B and -DRB1 alleles ± SD | HLA-A, -B and -DRB1 haplotypes ± SD |
| --- | --- | --- | --- | --- | --- |
| (label not recovered) | 0.849 ± 0.017 | 0.979 ± 0.006 | 0.990 ± 0.001 | 0.894 ± 0.022 | 0.997 ± 0.003 |
| Taiwan Sinitic speaking groups | | | | | |
| Taiwan mixed sample | 0.887 ± 0.007 | 0.999 ± 0.000 | 0.990 ± 0.000 | 0.879 ± 0.022 | 0.992 ± 0.001 |
| (label not recovered) | 0.889 ± 0.020 | 0.985 ± 0.006 | 0.987 ± 0.002 | 0.892 ± 0.030 | 1.000 ± 0.002 |
| (label not recovered) | 0.886 ± 0.013 | 0.987 ± 0.006 | 0.990 ± 0.001 | 0.893 ± 0.016 | 0.996 ± 0.002 |
| (label not recovered) | 0.893 ± 0.006 | 0.983 ± 0.003 | 0.990 ± 0.001 | 0.882 ± 0.021 | 0.995 ± 0.001 |
| (label not recovered) | 0.689 ± 0.020 | 0.941 ± 0.014 | 0.977 ± 0.006 | 0.878 ± 0.023 | 0.990 ± 0.005 |
| Other Taiwan Plain tribes (Pingpu) | 0.859 ± 0.008 | 0.994 ± 0.001 | 0.981 ± 0.001 | 0.833 ± 0.030 | 0.976 ± 0.004 |
| Taiwan Austronesian speaking groups (Formosan) | | | | | |
| (label not recovered) | 0.177 ± 0.049 | 0.518 ± 0.060 | 0.886 ± 0.012 | 0.809 ± 0.044 | 0.966 ± 0.004 |
| (label not recovered) | 0.095 ± 0.062 | 0.447 ± 0.096 | 0.730 ± 0.025 | 0.813 ± 0.038 | 0.976 ± 0.004 |
| (label not recovered) | 0.229 ± 0.080 | 0.775 ± 0.039 | 0.869 ± 0.014 | 0.728 ± 0.064 | 0.958 ± 0.008 |
| (label not recovered) | 0.227 ± 0.095 | 0.351 ± 0.010 | 0.894 ± 0.023 | 0.788 ± 0.058 | 0.939 ± 0.017 |
| (label not recovered) | 0.490 ± 0.025 | 0.886 ± 0.012 | 0.893 ± 0.008 | 0.768 ± 0.056 | 0.945 ± 0.014 |
| (label not recovered) | 0.181 ± 0.056 | 0.318 ± 0.014 | 0.920 ± 0.010 | 0.692 ± 0.078 | 0.918 ± 0.015 |
| (label not recovered) | 0.669 ± 0.025 | 0.909 ± 0.024 | 0.910 ± 0.013 | 0.702 ± 0.067 | 0.909 ± 0.010 |
| (label not recovered) | 0.461 ± 0.060 | 0.905 ± 0.021 | 0.905 ± 0.012 | 0.686 ± 0.079 | 0.945 ± 0.014 |
| (label not recovered) | 0.701 ± 0.029 | 0.909 ± 0.024 | 0.929 ± 0.005 | 0.634 ± 0.076 | 0.917 ± 0.017 |
| (label not recovered) | 0.688 ± 0.048 | 0.968 ± 0.009 | 0.944 ± 0.005 | 0.785 ± 0.058 | 0.977 ± 0.006 |
| (label not recovered) | 0.627 ± 0.040 | 0.834 ± 0.023 | 0.852 ± 0.009 | 0.711 ± 0.063 | 0.875 ± 0.019 |
| All TwMtA (no Thao) | 0.603 ± 0.018 | 0.831 ± 0.017 | 0.965 ± 0.001 | 0.797 ± 0.010 | 0.979 ± 0.001 |
| Malayo-Polynesian speaking groups | | | | | |
| (label not recovered) | 0.726 ± 0.039 | 0.935 ± 0.018 | 0.923 ± 0.012 | 0.821 ± 0.037 | 0.963 ± 0.012 |
| (label not recovered) | 0.893 ± 0.007 | 0.935 ± 0.018 | 0.952 ± 0.003 | 0.833 ± 0.023 | 0.998 ± 0.003 |
Non-recombining Y chromosome (NRY) of the Thao
Molecular age estimates of subtypes of haplogroup O1 in Thao and other groups using seven Y-STRs
Groups compared: Western Indonesia (N = 192); Taiwan Han (N = 446); Pingpu, no Pazeh (N = 370); Pazeh (N = 44); TwrIP, no Thao (N = 339); Thao (N = 16); Batan (N = 24); Philippines, no Batan (N = 146)
mtDNA molecular variation (age) using rho total (Soares et al. 2009)
B4b1a2 (np T6216C)
B4b1a2f (nps G709A, T14110C)
B4b1a2f2 (nps G709A, T14110C, A10313c)
B4b1a2f3 (nps G709A, T14110C, G6260A)
B4b1a2g (np C16365T)
B4b1a2k (np G207A!, A8014G, C16400T)
B4c1b2a2 (nps T146C, T8772C)
B5a2a (nps A93G, G11149A, C14149T)
B5a2a2 (np T8614C)
B5a2a2b (nps C5027T, C8059T)
B5a2a2b1 (np A4824G)
E1a1anew (nps T6620C, C14766T, G16129A)
F1a3a3 (np C15452T)
F1anew (nps G6962A, T10604C, A14053G)
F4b1 (np A10097c)
F4b1c (nps 8548 s 14215 s 15924 s)
F4b1d (np G513A)
M8a2’3′ (np C16184T)
HLA characterized clear genetic differences between the Continental East Asian multilinguistic areas, such as Fujian, the non-aboriginal or mixed groups (Minnan, Hakka, and TwPp), and the Austronesian-speaking TwrIP (Fig. 4). In brief, excluding HLA-DRB1*08:02 (1.67%) and DRB1*13:12 (1.67%) (Additional file 1: Table S1), all other Thao HLA-A, -B, and -DRB1 alleles were seen at various frequencies in most other Austronesian and non-Austronesian speaking groups of Taiwan and Southeast China [34, 35, 36]. Among these groups, the sole difference in this apparent homogeneity of distribution was most likely brought about by drift. By contrast, except for those haplotypes conserved by selection, recombination between HLA loci contributes to greater HLA haplotype diversity. Accordingly, we used the expectation-maximization (EM) maximum-likelihood procedure in Arlequin to infer HLA-A-B-DRB1 haplotypes and used them as indicators to retrace the events of past migrations and the dispersal history of all groups studied [37, 38]. For example, according to Chu et al. (2004) and Lin et al. (2001), the profile of the distribution of characteristic bi-loci haplotypes seen in Thao and TwrIP (HLA-A*02:07-B*46:01, A*11:01-B*15:01:01, A*11:01-B*40:01, A*11:01-B*55:02, A*33:03-B*58:01, and B*58:01-DRB1*03:01:01) is significantly different from the profile seen in non-TwA [34, 36]. Here, using tri-loci haplotypes, only six (26%) of the 23 Thao triplet haplotypes (Fig. 4 right, Additional file 1: Table S1, and S8) were shared between the Thao (k = 23 haplotypes) and Fujian (k = 82 haplotypes) out of 962 haplotypes in the complete data set. This pattern remained consistent when analyzing other TwrIP groups.
In addition, while three HLA haplotypes represented 55% of the Thao profile (HLA-A*24:02-B*40:01-DRB1*11:01, HLA-A*24:02-B*39:01-DRB1*08:02, and HLA-A*24:02-B*13:01-DRB1*12:02), the MDS plot located the Thao among the central Taiwan mountain peoples and two closely related southern aboriginal peoples, the Paiwan and Rukai (Fig. 4).
Last, the exact test of Hardy-Weinberg equilibrium for the Thao, obtained from all HLA loci using a Markov chain of length 100,000, did not show a departure from expectations (p > 0.12) and corroborated the results described above for mtDNA (data not shown). Moreover, the Ewens-Watterson F test of neutrality [40, 41] for all HLA loci did not show a deviation from expectations (p = 0.8) (Additional file 2: Table S9).
Evolutionary mechanisms inferred from mismatch distribution and Bayesian skyline plot
Multiple dimensional scaling (MDS) and putative parental contribution analysis
After having established a definite ancestral affinity between the Thao and the northern and central TwrIP, we looked at the genetic distribution of the three gene systems, HLA, mtDNA, and Y-chromosome (Fig. 4 right, and Additional file 1: Table S1). The Y-chromosome SNP profile of the Thao showed higher affinity with Atayal and Tsou than with Fujian or non-TwA. Most interesting was the very close mtDNA affinity seen between Thao and Bunun, likely attributable to the confined distribution of the B4b1a2 subclades among the northern and central mountain peoples (Additional file 3: Supplementary text 1), a finding also supported by Blust on linguistic grounds. In sum, with the exception of the HLA affinity of the Thao with the southern Paiwan and Rukai peoples, the Y-chromosome and mtDNA profiles substantiate the HLA profile in characterizing the Thao as a member of the northern/central mountain peoples.
Gene contribution to Thao from two putative parent groups
Putative Parental populations
Taiwan officially recognized Indigenous people not including Thao (TwrIP)
It is generally believed that the Taiwan Pingpu groups (such as Pazeh and Siraya) were initially Austronesian speakers who belonged to the same group of people as the Taiwan mountain peoples today (Fig. 1 and Additional file 8: Figure S1). According to archeological and linguistic evidence, they arrived in Taiwan from Southeast China during the early Neolithic, approximately 6000 years ago. As the result of continuous and numerous arrivals from China, largely Minnan and Hakka, in the last 400 years, the Neolithic settlers who remained in the more hospitable environment of the western plains of Taiwan are presently heavily culturally and genetically Sinicized [25, 31, 34, 35]. Knowledge of the genetic boundaries between Taiwan aborigines and Taiwan Han is important in reconstructing the heritage of these groups in relation to ancient and modern events, and for the design and implementation of genetic epidemiologic studies.
The Thao Aborigines today are a small and sinicized indigenous group in central Taiwan. Because of their language, the Thao peoples have been classified as a plains people. Their language neared extinction in the past few hundred years as the number of individuals fell to approximately 260, and in 2000 it was competently spoken by fewer than 15 Thao individuals [15, 16]. The official recognition by the Taiwan government in 2001 of the Thao as an indigenous people contributed to the revival and preservation of their ethnic culture and language. Presently, their language contains loan words from the Bunun ethnic group, with whom they mixed and intermarried. More interestingly, the presence of specific cognates in the Thao language allows their ancestry to be retraced to Proto-Austronesian groups. However, debates on their ethnic status and origin are ongoing.
Herein we used genetic information obtained from mtDNA, HLA-A-B-DRB1, 16 Y-STRs, and 81 Y-SNPs to shed light on their origin.
First, multidimensional scaling (MDS) analyses using the three gene systems (Fig. 4) invariably grouped the Thao among the mountain peoples. Moreover, MDS showed a strong paternal influence from the northern peoples, Atayal, Saisiyat, and Taroko, and a strong maternal affinity of the Thao with the central peoples, Bunun and Tsou.
The high level of cultural Sinicization of the Thao during the last four centuries contrasts with the lower than expected level of Han genetic admixture observed for mtDNA and the Y chromosome (24.5% and 44.8%, respectively).
This mtDNA admixture result was well supported by the evolutionary mechanisms of the Thao inferred from the mismatch distribution, which produced a multimodal curve indicating a past period of female introduction into the Thao. However, according to Harpending [42, 43], an mtDNA diversity as low as the one seen in the Thao (Additional file 1: Table S1) and a multimodal mismatch distribution curve (Harpending raggedness = 0.035) (Fig. 3, left) possibly indicate an ancestral period with few founding genes, rapid drift, or, most likely, admixture events.
The lower HLA-A-B-DRB1 haplotype diversity in Thao (0.939) than in non-Taiwan aborigines (0.995) and Han (0.997) (Additional file 1: Table S1 and Additional file 9: Table S8) suggested that, despite modernization and the strong Han influence of the last 400 years, the Thao have managed to conserve their genetic heritage. The MDS plots (Fig. 4) clearly reflect the important role played by the central mountain ranges as a physical barrier, isolating the Thao from later Han gene flow and conserving the original Thao genetic profiles seen across the three gene systems used in this study.
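The mismatch distribution and raggedness index discussed above can be illustrated with a minimal sketch. It assumes aligned sequences of equal length and the commonly cited definition of Harpending's raggedness, r = Σ (x_i − x_{i−1})² for i = 1 … d+1, where x_i is the relative frequency of sequence pairs differing at i sites and d is the largest observed difference. The sequences and function names below are hypothetical and are not the Thao data or the software used in this study.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Relative frequency of pairwise differences 0..d among aligned sequences."""
    diffs = Counter(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    total = sum(diffs.values())
    max_d = max(diffs)
    return [diffs.get(i, 0) / total for i in range(max_d + 1)]


def raggedness(freqs):
    """Sum of squared differences between adjacent classes, with x_(d+1) = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Hypothetical toy alignment (not real HVS-I data).
seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
freqs = mismatch_distribution(seqs)
r = raggedness(freqs)
print(freqs, r)
```

A smooth, unimodal distribution gives a small r, whereas a multimodal distribution such as the one described for the Thao tends to give a larger value.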
Previous contacts with the ancestors of the Pazeh plains people proposed by linguistic researchers were not refuted by our results. The sharing of genetic traits between the Thao and Pazeh could only have happened at a very early stage during the settlement of the Austronesian agriculturists in the western plain of Taiwan. At that time, the plains peoples and mountain peoples had not yet separated and had sprung from the same southeastern Mainland Asian gene pool, and Y-SNP haplogroup O1a1*P203 and mtDNA haplogroup B4b1a2 were just beginning to diversify from their ancestral founding branches [3, 29] (Additional file 8: Figure S1). The predominance in Thao of specific gene types such as B4b1a2g’f’k and F1b1’c’d may be the result of later female gene flow from other recognized central mountain peoples (Bunun and Tsou), introduced after the Thao had left the western plain [11, 15, 16, 17] (Additional file 1: Table S1).
For the male counterpart, haplogroup O1a1*P203 in the Thao (87.5%) produced a unique Y-STR network showing no sharing of Y-STR haplotypes with other Formosan groups, and having an age estimate of molecular variation of 1590 ± 690 YBP (Table 2, Fig. 2 and Additional file 1: Table S1). It is possible that this low age estimate is the consequence of a male bottleneck following poor health, or the result of the very small number of Thao survivors forced to relocate several times during the last few centuries. This unique genetic structure further suggests that a small homogeneous group of males, bearers of O1a1*P203 with strong bonds to their patriarchal culture, managed to remain untouched by external male gene flow in the last two millennia. Any contact with the ancestors of the Pazeh could only have happened before that period. Through maintaining their traditions (Shamanism, patrilineality, the Ulalaluan symbol of ancestry, folktales, and most importantly, their plains tribal language), the Thao have succeeded in conserving a cultural heritage that characterizes them as a discrete member among the Formosan groups [11, 15, 16, 17]. In retracing their physical journey from the western plains to the central mountain range, we showed that the Thao also succeeded in preserving a Formosan genetic signature that is highly likely to have been shared by all the plains and mountain peoples of the early Neolithic, before the arrival of Han settlers and genetic Sinicization (Additional file 8: Figure S1).
This study has exploited the advantages of using multiple highly polymorphic gene systems as an efficient method to supplement often restricted uniparental chromosome analysis and to deliver robust support to previous genetic, anthropological, archaeological and linguistic studies linking proto-Austronesians with the Neolithic cultures of Taiwan. At the same time, rapid progress in complete genome sequencing is opening new avenues in population analysis, in particular for disease analyses. The success of this growing field is largely dependent on the availability of data obtained from groups with high homozygosity or out of neutrality equilibrium. This situation presents special problems to research scientists, as the unique genetic structure of the Taiwan aboriginal peoples and other once isolated aboriginal groups is rapidly being modified through dispersal, social interactions, acculturation, and admixture. Many genetic disease association studies would greatly benefit from the analysis of small aboriginal groups and vice versa. This source of important human genetic data has yet to be systematically used. Without urgent action, their genetic data will be lost forever. Despite the shortcomings introduced in this study by the small number of Thao individuals used, we show that a small aboriginal group, under strong admixture pressure, successfully conserved its ancestral genetic structure, and we raise awareness of the urgency to create a methodology for exploring the genetic structure of other rare population groups.
Material and methods
The Thao genetic diversity for Y-chromosome, mtDNA, and HLA was determined in 30 unrelated (back to two generations) and healthy individuals. All individuals had both parents and first-generation grandparents belonging to the same people and gave consent to participate in this study. Approval to conduct this project was obtained from the ethics committee of Mackay Memorial Hospital in Taipei (Taiwan).
The Thao data set (Additional file 9: Table S8) was compared to a panel of other Taiwan individuals that we had previously analyzed for Y-chromosome, mtDNA [31, 33] and HLA. The HLA data are available online at http://www.allelefrequencies.net and in the proceedings of the Anthropology/HLA diversity component of the 13th international histocompatibility workshop [24, 25, 34, 51, 52]. Geographic locations and sampling sites of the Taiwanese groups used for comparative purposes are shown in Fig. 1. This panel comprises a) a dataset of non-Taiwan aborigines that includes Minnan (n = 672), Hakka (n = 200) and a sample comprising an undefined number of Minnan and Hakka, referred to herein as TwMix (n = 3227), b) Taiwan officially recognized indigenous peoples (TwrIP) including Atayal (n = 110), Taroko or Truku (n = 54), Saisiyat (n = 64), Bunun (n = 181), Tsou (n = 60), Rukai (n = 78), Paiwan (n = 172), Amis (n = 294), Puyuma (n = 116), Yami/Tao (n = 88), Ivatan/Batan (n = 50), and c) indigenous Taiwan Pingpu peoples (TwPp, n = 493) including Pazeh (n = 65) and Siraya groups (n = 428). To obtain a more detailed analysis, we selected other in-house material: Eastern Chinese (Fujian, n = 149), Philippines (n = 317), and Batan (n = 50) [31, 33, 53, 54]. Phylogenetic analysis was improved through the use of additional data from the literature, principally complete-mtDNA genome typing from Phylotree [3, 6, 55] and NRY Y-STR [26, 48] (Additional file 10: Table S6).
Preparation and sequencing
Genomic DNA was extracted from 500 μl of buffy coat using the QIAamp DNA Blood Mini Kit (Qiagen Inc., Chatsworth, California, United States) with minor adjustments to the procedure recommended by the manufacturer.
Mitochondrial haplogroup assignments were obtained by comparing known reference genomes to the nucleotide variation of the D-loop HVS-I control region (nucleotide positions, nps, 16,006–16,397) and coding regions (nps 8000–9000, nps 9959–10,917 and nps 14,000–15,000) according to our previously published sequencing protocol. Ambiguous haplogroup assignments were confirmed by further sequencing of pertinent segments of the coding region [31, 56, 57].
Complete mitochondrial genome sequencing for this study was obtained for each representative haplotype of the Thao people using our previously published sequencing protocol.
Y-Chromosome polymorphism was determined using 81 NRY markers, the majority of which are slowly evolving binary markers (Y-SNPs), according to published sequencing protocols [27, 56]. In brief, sequencing was performed on both strands using the DiDeoxy Terminator Cycle Sequencing Kit (Applied Biosystems) according to manufacturer recommendations. Purification on a G50 Sephadex column was performed before the final run on an automated DNA Sequencer (ABI Model 377). The nomenclature used for haplogroup labeling is in agreement with the classification provided by the International Society of Genetic Genealogy for the Y Chromosome Consortium and recent updates [56, 58].
Further genotyping of 16 microsatellite markers (DYS19, DYS385I, DYS385II, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, and Y GATA-H4) was done using the Yfiler kit (Applied Biosystems) following the manufacturer’s instructions. In brief, PCR products were mixed with GeneScan 500LIZ (Applied Biosystems) as an internal size standard and analyzed by capillary electrophoresis with an ABI Prism 310 genetic analyzer (Applied Biosystems) using the standard fragment analysis protocol mode. Genotyper 2.5.2 software (Applied Biosystems) was used for allele scoring. For all statistical and network analyses, DYS389II was used after subtracting the DYS389I repeat count from the DYS389II repeat count.
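The DYS389 adjustment described above amounts to a single subtraction per individual. The following is a minimal sketch, assuming each Y-STR profile is held as a dictionary of repeat counts; the function name and the repeat values are hypothetical and are not taken from the study data.

```python
def adjust_dys389(profile):
    """Return a copy of a Y-STR profile with DYS389II reduced by DYS389I."""
    adjusted = dict(profile)
    adjusted["DYS389II"] = profile["DYS389II"] - profile["DYS389I"]
    return adjusted


# Hypothetical repeat counts for one individual.
example = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24}
adjusted = adjust_dys389(example)
print(adjusted["DYS389II"])
```

The subtraction is needed because the DYS389II amplicon contains the DYS389I repeat block, so the raw DYS389II count would otherwise count that variation twice in distance and network calculations.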
The Thao frequencies of haplogroups of the Y-SNP and mtDNA gene systems, and of the HLA-A, -B and -DRB1 alleles, were obtained by direct counting (Additional file 9: Table S8). The HLA-A-B-DRB1 haplotype frequencies were estimated using the EM algorithm in Arlequin (Additional file 1: Table S1 and Additional file 9: Table S8). To validate these frequencies in the Thao, the linkage disequilibrium of each haplotype was inferred and goodness of fit was calculated using Pearson’s cumulative chi-squared test statistic χ² [59, 60] (Additional file 9: Table S8). The unbiased gene diversity index, h, and its standard error were calculated using the formulas given by Nei (Additional file 8: Figure S1). Molecular diversity, Tajima’s D, Fu’s Fs, mismatch difference analysis (MMDA), and pairwise population distances (FST) were calculated using Arlequin version 3.1143. Demographic variation through time was obtained from a Bayesian skyline plot (BSP) using BEAST with a relaxed molecular clock and a mutation rate of 2.2964 × 10⁻⁷ mutations per site per year for the mtDNA HVS1 data (Fig. 3).
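As an illustration of the diversity index mentioned above, the following minimal sketch computes the unbiased gene diversity in the form usually attributed to Nei, h = n/(n − 1) × (1 − Σ p_i²), where n is the sample size and p_i the frequency of the i-th haplogroup or haplotype. This is an assumption about the exact formula applied; the standard error is not computed here, and the labels and counts are hypothetical.

```python
from collections import Counter


def gene_diversity(labels):
    """Unbiased gene diversity h = n/(n-1) * (1 - sum(p_i^2)) for a list of labels."""
    n = len(labels)
    if n < 2:
        raise ValueError("at least two observations are required")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: 14 individuals with haplogroup "hapA", 2 with "hapB".
sample = ["hapA"] * 14 + ["hapB"] * 2
h = gene_diversity(sample)
print(round(h, 3))
```

The n/(n − 1) factor corrects the downward bias of the plain 1 − Σ p_i² estimate, which matters for small samples such as the Thao.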
Y-STR Median-Joining (MJ) networks restricted to a single Y-SNP haplogroup were constructed using Network (Fluxus Engineering; http://www.fluxus-engineering.com) after processing the data with the reduced-median method and weighting the STR loci proportionally to the inverse of the repeat variance (Fig. 2). The age of Y microsatellite variation was obtained using the rho statistic method of Zhivotovsky et al., modified according to Sengupta et al. (Table 2). Haplogroup age estimates for mtDNA were calculated from the complete genome variation rate of one substitution every 3624 years using the rho statistic and corrected for purifying selection as implemented by Soares (Table 3). Dates were only intended as a rough guide for comparing relative haplogroup ages. Multidimensional scaling (MDS) plots using haplogroup frequencies of the three gene systems (Fig. 4) were constructed with SPSS version 17.01 using ALSCAL Euclidean distances (SPSS Inc., Chicago IL).
The mtDNA HVS1 region and complete mtDNA sequences described herein have been deposited in GenBank (GenBank sequence submission of 38 complete mtDNA genomes, MH177784–MH177821). Y-chromosome STR data and partial mtDNA sequencing are provided in Additional file 10: Table S6 and Additional file 11: Table S7. Other NRY Y-STR and Y-SNP data sets are available from Trejaut (2014).
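The rho-based dating described above can be sketched in its simplest form: rho is the mean number of substitutions separating each sampled sequence from the root haplotype, and the age is rho multiplied by the time per substitution. The sketch below uses the rate of one substitution every 3624 years quoted in the text as a plain linear clock. It does not implement the correction for purifying selection, and the mutation counts are hypothetical.

```python
YEARS_PER_SUBSTITUTION = 3624  # complete mtDNA genome rate quoted in the text


def rho(mutation_counts):
    """Mean number of substitutions from the root haplotype across sampled sequences."""
    return sum(mutation_counts) / len(mutation_counts)


def rho_age(mutation_counts, years_per_substitution=YEARS_PER_SUBSTITUTION):
    """Uncorrected age estimate in years: rho times years per substitution."""
    return rho(mutation_counts) * years_per_substitution


# Hypothetical substitution counts from the root for six sequences in one clade.
counts = [0, 1, 1, 2, 0, 2]
age = rho_age(counts)
print(rho(counts), age)
```

Because the selection correction is non-linear, ages in the paper's tables would not be expected to match this linear approximation exactly.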
The authors wish to thank Dr. John S. Sullivan from Sydney University for revising this manuscript, and Dr. Chu Chen-Chong and Dr. Tse-Yi Wang from the Mackay Memorial Hospital for their helpful discussions and feedback during the manuscript preparation. This work was performed on the Molecular Anthropology database of the Mackay Memorial Hospital of Tamsui in Taiwan.
This work was supported by grant NSC91–2314-B-195-018 from the National Science Council of Taiwan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The raw complete mtDNA genome data used for the construction of the phylogenetic trees shown as supplementary material have been submitted to GenBank with the following accessions: MH177784–MH177821.
Other NRY Y-STR and Y-SNP data sets are available from Trejaut (2014).
The project was conceived and designed by JAT, laboratory work was performed by ZSC and YHL. JAT performed data analysis, JAT and FM drafted the manuscript. All other authors gave useful contributions to the analysis of data and the text of the manuscript. All authors have read and approved the final version of the manuscript.
Ethics approval and consent to participate
All individuals gave consent to participate in this study. Approval to conduct this project was obtained from the ethics committee of Mackay Memorial Hospital in Taipei (Taiwan).
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.