Background

The discovery of cell-free fetal DNA enables noninvasive prenatal testing (NIPT) for common aneuploidies [1, 2], microdeletion/microduplication syndromes [3,4,5], and monogenic disorders. Initially, NIPT of monogenic disorders focused on detecting de novo or paternally inherited variants responsible for dominant monogenic disorders [6, 7]. Reports indicate that the average genomic carrier burden for severe pediatric recessive variants is 2.8 per person [8] and that the cumulative prevalence among live births is approximately 0.8% [9]. NIPT for most recessive monogenic disorders involves several technical challenges and has only been made clinically available for a limited number of recessive conditions [10] despite the relatively high prevalence of such disorders, because analysis of maternally inherited fetal alleles has been hampered by the high background of maternal DNA in cell-free DNA (cfDNA) [11]. The current approaches for NIPT of recessive diseases are typically classified into two categories [12, 13]: relative mutation dosage (RMD) analysis [14] and relative haplotype dosage (RHDO) analysis [15]. The RMD approach focuses on quantitative comparisons between variant and wild-type alleles present in cfDNA and has relatively high sensitivity and specificity [14, 16]. This approach is powerful for detecting single nucleotide variants (SNVs) and small insertions/deletions (InDels) but usually cannot detect large InDels and copy number variants (CNVs) [17,18,19]. Its performance is also affected by sequencing errors and amplification bias of low-abundance fetal variants in cfDNA [20]. Unlike the RMD approach, the RHDO approach determines the relative proportions of variant and normal haplotypes in maternal plasma [21] and can theoretically detect most types of variants, including large InDels and CNVs, in one test [16, 22]. However, RHDO analysis requires parental haplotype information [23]. 
Although molecular phasing approaches to determine parental haplotypes, including linked-read sequencing [24, 25] and targeted locus amplification (TLA) [26], have not been widely used in clinical settings due to their high cost and complex procedures [27,28,29,30,31,32], population-based parental haplotyping provides an alternative approach due to its rapid turnaround and inexpensive and relatively simple procedures. However, the use of this method has been limited to a founder variant (GBA gene, c.1226A>G) [33].

Thalassemia causes hemoglobin deficiency and affects approximately 4.4 per 10,000 live births worldwide [34]. Its genetic complexity involves three types of variants: SNVs, InDels, and CNVs. In southern China, the most prevalent variants of α-thalassemia are -α3.7 deletion, --SEA deletion, -α4.2 deletion, HBA2 c.369C>G, and HBA2 c.427T>C, while those of β-thalassemia are HBB c.126_129delCTTT, HBB c.52A>T, HBB c.316-197C>T, HBB c.-78A>G, and HBB c.79G>A [35, 36]. Our population screening data for thalassemia [35] show that these 10 variants account for 87.9% of β-thalassemia carriers and 96.5% of α-thalassemia carriers (Additional file 1: Fig. S1).

In the present study, we proposed a novel population-based haplotyping-NIPT method (PBH-NIPT) for α-thalassemia and β-thalassemia in which nonfounder variants were detected when the sample size of the reference panel (population data used to infer parental haplotypes) was sufficiently large for accurate deduction of parental haplotypes. The PBH-NIPT model was trained on a large retrospective carrier screening dataset, and its accuracy was verified via invasive prenatal diagnosis. In addition, we assessed the effect of the reference panel sample size on the outcomes of PBH-NIPT.

Methods

Patients and samples

The ethics committees of Guangzhou Women and Children’s Medical Center and BGI approved this study (approval numbers: 2017102408 and BGI-IRB 18043). Fifty-nine couples at risk of having a fetus with thalassemia provided written informed consent. The clinical features of the participants are provided in the supplement (Additional file 2: Table S1). We collected 5 ml of blood from each parent. We promptly isolated maternal plasma using a two-step centrifugation method [37]. We used 10 ml of amniotic fluid (AF) or 5 mg of chorionic villus sample (CVS) for invasive prenatal diagnosis.

Sequencing library preparation

We extracted cfDNA from maternal plasma using a QIAamp Circulating Nucleic Acid Kit (Qiagen, Dusseldorf, Germany) and extracted parental gDNA from peripheral blood and fetal DNA from CVS or AF using a QIAamp DNA Mini Kit (Qiagen).

We used gDNA (500 ng) for library construction and fragmented it ultrasonically with a Bioruptor Pico (Diagenode, Liege, Belgium), yielding 300–700-bp fragments. We then performed end repair, phosphorylation, and A-tailing reactions on the sheared DNA and ligated BGISEQ adaptors with specific barcodes to the A-tailed products. We performed 4–6 cycles of polymerase chain reaction (PCR) amplification to enrich the target regions and performed hybridization capture according to the NimbleGen protocols after pooling twenty barcoded gDNA libraries in equal amounts. Finally, we performed circularization of the post-capture library to generate circular single-stranded DNA (ssDNA). We prepared the maternal plasma DNA library using the same method except without fragmentation and pooled eight cfDNA libraries in equal amounts. After quantitation using Qubit 3.0 (Thermo Fisher, Waltham, USA), we used rolling circle replication to form DNA nanoballs (DNBs) from the ssDNA and loaded each DNB into 1 lane to be processed for 100-bp paired-end sequencing on the BGISEQ-500 and MGISEQ-2000 platforms (BGI, Shenzhen, China).

Reference panel construction

We generated the reference panel from 4356 thalassemia carrier screening-positive cases. Of the total 4356 cases, 3867 were obtained from our previously published paper [35], and 489 were obtained from unpublished in-house data.

We first used our previously published algorithm [35] to call SNPs from the 4356 positive carriers and then filtered out SNPs with a sequencing depth of less than 20-fold in more than 2% of the population or with an allele ratio between 5 and 40% in more than 70% of heterozygous individuals in the population. We used the publicly available software Beagle (version 4.0) to construct haplotypes for the 4356 individuals and used these data as the reference panel for the next step. Because Beagle accepts only SNPs and InDels as input, we treated CNVs as SNPs in the phasing procedure. Each CNV was encoded as a SNP record in the Beagle input file (VCF format), with the genomic position set to the start position of the CNV and the genotypes “0/1” and “1/1” representing heterozygous and homozygous CNVs, respectively.
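The filtering and CNV-encoding rules above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the inclusive handling of the 5–40% window, and the placeholder REF/ALT alleles ("A"/"T") are assumptions made only for illustration.

```python
def keep_snp(depths, het_allele_ratios,
             min_depth=20, max_low_depth_frac=0.02,
             ratio_window=(0.05, 0.40), max_in_window_frac=0.70):
    """Return True if a SNP passes both population-level filters.

    depths: per-individual sequencing depth at this SNP.
    het_allele_ratios: allele ratio of each heterozygous individual.
    """
    # Filter 1: depth below 20-fold in more than 2% of the population.
    low_depth_frac = sum(d < min_depth for d in depths) / len(depths)
    if low_depth_frac > max_low_depth_frac:
        return False
    # Filter 2: allele ratio inside the 5-40% window in more than 70%
    # of heterozygous individuals (window bounds assumed inclusive).
    if het_allele_ratios:
        lo, hi = ratio_window
        in_window = sum(lo <= r <= hi for r in het_allele_ratios)
        if in_window / len(het_allele_ratios) > max_in_window_frac:
            return False
    return True


def cnv_to_vcf_line(chrom, start, cnv_id, homozygous):
    """Encode a CNV as a SNP-like VCF data line for one sample."""
    genotype = "1/1" if homozygous else "0/1"
    # REF "A" and ALT "T" are arbitrary placeholders (an assumption);
    # POS is the start position of the CNV, as described in the text.
    fields = [chrom, str(start), cnv_id, "A", "T", ".", "PASS", ".", "GT", genotype]
    return "\t".join(fields)
```

For example, `cnv_to_vcf_line("chr16", 1000, "cnv_example", homozygous=False)` yields a record whose genotype field is `0/1`; the coordinate and identifier here are made up.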

Construction of parental haplotypes by PBH

We aligned the sequence reads from parental gDNA and maternal plasma DNA to the reference human genome (hg19) using BWA version 0.7.12. We marked duplicate reads with Picard version 1.87 and performed variant calling as previously described [35]. We also treated CNVs as SNPs in the phasing procedure. We used the haplotypes of the reference panel and the genotypes of the parents as inputs to deduce parental haplotypes with Beagle 4.0. Finally, we used only heterozygous SNPs to represent parental haplotypes.

NIPT of thalassemia

We calculated the fetal fraction (FF) as described in Additional file 3 and inferred fetal haplotypes inherited from the father and mother separately. First, we determined paternal inheritance using paternal informative SNPs, which were heterozygous in the father but homozygous in the mother. Second, we determined maternal inheritance using maternal informative SNPs, which included two types of SNPs: (1) SNPs heterozygous in the mother but homozygous in the father and (2) SNPs heterozygous in both parents, within the blocks where the first step had inferred the haplotype that the fetus inherited from the father. Because informative SNPs linked to the inherited haplotype are overrepresented in maternal plasma, we applied a hidden Markov model (HMM) and the Viterbi algorithm [38] to determine the fetal genotypes of pathogenic sites (Additional file 3: Supplementary Methods). For samples with CNVs, no SNPs within the CNV region were selected as informative SNPs for Viterbi decoding.
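As an illustration of the idea, the following Python sketch classifies informative SNPs and runs a generic two-state Viterbi decoder over type (1) maternal informative SNPs. It is not the model in Additional file 3: the binomial emission, the uniform starting probabilities, and the `switch_prob` recombination parameter are assumptions. Type (2) SNPs are not handled.

```python
import math


def classify_snp(father_gt, mother_gt):
    """Label a SNP 'paternal', 'maternal', or None from parental genotypes."""
    father_het = father_gt[0] != father_gt[1]
    mother_het = mother_gt[0] != mother_gt[1]
    if father_het and not mother_het:
        return "paternal"
    if mother_het and not father_het:
        return "maternal"
    return None  # both-heterozygous (type 2) SNPs are not handled here


def expected_hap1_fraction(state, father_matches_hap1, ff):
    """Expected plasma fraction of the allele carried on maternal haplotype 1.

    state 0: fetus inherited maternal haplotype 1; state 1: haplotype 2.
    """
    if state == 0 and father_matches_hap1:
        return 0.5 + ff / 2  # fetus homozygous for the haplotype-1 allele
    if state == 1 and not father_matches_hap1:
        return 0.5 - ff / 2  # fetus homozygous for the haplotype-2 allele
    return 0.5               # fetus heterozygous


def log_binom(k, n, p):
    """Log binomial probability of k successes in n trials."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def viterbi(snps, ff, switch_prob=1e-3):
    """Most likely inherited maternal haplotype (0 or 1) at each SNP.

    snps: list of (hap1_reads, total_reads, father_matches_hap1).
    """
    if not snps:
        return []
    stay, switch = math.log(1 - switch_prob), math.log(switch_prob)
    score, back = [math.log(0.5)] * 2, []
    for i, (k, n, fm) in enumerate(snps):
        emit = [log_binom(k, n, expected_hap1_fraction(s, fm, ff)) for s in (0, 1)]
        if i == 0:
            score = [score[s] + emit[s] for s in (0, 1)]
            continue
        new, ptr = [], []
        for s in (0, 1):
            cands = [score[prev] + (stay if prev == s else switch) for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new.append(cands[best] + emit[s])
        score = new
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With a fetal fraction of 0.2, a SNP at which the father is homozygous for the haplotype-1 allele is expected to show about 60% haplotype-1 reads if the fetus inherited haplotype 1 and about 50% otherwise; the decoder combines such evidence across neighboring SNPs.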

Invasive prenatal diagnosis of thalassemia

We performed invasive prenatal diagnosis via chorionic villus sampling or amniocentesis in accordance with standard protocols. We determined fetal genotypes through gap-PCR and reverse dot blot PCR (RDB-PCR).

The effect of the reference panel sample size on the outcomes of PBH-NIPT

To assess the effect of the reference panel sample size on the outcomes of PBH-NIPT, we randomly selected one-half, one-quarter, one-sixth, one-eighth, one-twelfth, and 50 of the samples from the total reference panel and performed three independent tests.
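A minimal sketch of this subsampling design is given below. The function name, the fixed seed, and the use of floor division to turn fractions into sample counts are assumptions; the source does not state how non-integer sizes were rounded.

```python
import random


def subsample_panels(sample_ids, replicates=3, seed=0):
    """Draw independent random subsets of the reference panel.

    Returns a dict mapping a size label to a list of `replicates` subsets.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    n = len(sample_ids)
    sizes = {
        "1/2": n // 2,
        "1/4": n // 4,
        "1/6": n // 6,
        "1/8": n // 8,
        "1/12": n // 12,
        "50": min(50, n),
    }
    return {
        label: [rng.sample(sample_ids, size) for _ in range(replicates)]
        for label, size in sizes.items()
    }
```

Each subset would then replace the full panel as the phasing reference, and the three replicates per size give a rough sense of sampling variability.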

Results

As shown in Fig. 1, the PBH-NIPT workflow involves the following steps. First, we generated the reference panel from 4356 thalassemia carrier screening-positive cases. Of the total 4356 cases, 3867 were obtained from our previously published paper [35], and 489 were obtained from unpublished in-house data. Second, we enrolled 59 couples in whom both partners carried at least one of the 10 aforementioned variants and were at risk of having a fetus with thalassemia major or intermedia [39] (Additional file 2: Table S1). The average gestational age at the time of collection was 12.6+3 weeks (range 10+1–22 weeks), and the average FF was 15.4% (range 6.0–26.1%) (Additional file 4: Table S2). We subjected genomic DNA (gDNA) of the couples and fetuses as well as maternal cfDNA to hybridization-based capture and sequencing using a strategy previously described for thalassemia carrier screening [35]. We obtained an average target region coverage of 177-fold (range 56–678) in maternal plasma and 203-fold (range 85–360) in parental gDNA. Third, we inferred parental haplotypes by PBH (see the “Methods” section and Fig. 1). To evaluate the reliability of PBH, we also constructed parental haplotypes by family-based haplotyping (FBH) and calculated the percentage of concordant single-nucleotide polymorphisms (SNPs) phased by these two methods (Additional file 3: Supplementary Methods). The average concordance rates of phased SNPs in the maternal and paternal haplotypes were 98.7% (range 87.5–100%) and 95.7% (range 59.2–100%), respectively (Additional file 5: Fig. S2; Additional file 6: Table S3).
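One simple way to compute a phasing concordance rate is sketched below in Python. The exact definition used in this study is given in Additional file 3; the version here, which compares two phasings of the same heterozygous SNPs and allows for the arbitrary labeling of the two haplotypes, is an assumption.

```python
def phase_concordance(phasing_a, phasing_b):
    """Fraction of heterozygous SNPs phased identically by two methods.

    Each phasing is a list of (haplotype1_allele, haplotype2_allele) tuples
    over the same SNPs in the same order. Haplotype labels are arbitrary,
    so the better of the two possible orientations is reported.
    """
    if len(phasing_a) != len(phasing_b) or not phasing_a:
        raise ValueError("phasings must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(phasing_a, phasing_b))
    flipped = sum(a == (b[1], b[0]) for a, b in zip(phasing_a, phasing_b))
    return max(same, flipped) / len(phasing_a)
```

For instance, if three of four SNPs agree once the haplotype labels of one method are swapped, the function returns 0.75.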

Fig. 1 Population-based haplotyping-noninvasive prenatal testing (PBH-NIPT) workflow. The workflow involves 3 parts. We first constructed haplotypes of 59 couples using the haplotypes of 4356 samples as the reference panel. Then, we inferred fetal haplotypes using a hidden Markov model (HMM) and the Viterbi algorithm. Finally, we used invasive prenatal diagnosis results to validate PBH-NIPT results

To correctly infer fetal genotypes of pathogenic sites (rather than all SNPs), we developed a hidden Markov model (HMM) and used the Viterbi algorithm. We calculated a confidence score (CS), defined as the probability of obtaining the correct NIPT result, to evaluate the reliability of each prediction. A “no-call” condition was defined when (1) the CS was less than 0.99 or (2) the inferred haplotype contained two haplotype blocks (pathogenic and normal), and neither block spanned the target gene (HBB or HBA) (Additional file 3: Supplementary Methods). Accordingly, NIPT successfully inferred 111/118 (94.1%) alleles, and invasive prenatal diagnosis confirmed these alleles, with 99.1% (110/111 alleles) accuracy (95% CI, 95.1–100%) (Table 1, Fig. 2, and Additional file 7: Fig. S3). Among these 59 fetuses, 52 had both alleles detected; of these 52 fetuses, 15 were normal, 25 were carriers, and 12 were affected. Seven fetuses had only one allele successfully detected; inference of the other allele failed, with a CS of less than 0.99 (Table 1 and Fig. 2). Among the 7 fetuses with only one allele inferred by NIPT, 6 inherited the pathogenic allele. These 6 fetuses therefore required invasive prenatal diagnosis, which showed that 4 were affected and 2 were carriers.
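The reported call rate, accuracy, and confidence interval can be checked with a short Python sketch. The source does not state which interval method was used; the exact (Clopper-Pearson) binomial interval computed here is an assumption, and the function names are illustrative.

```python
import math


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""

    def solve(func, target):
        # Bisection for p in [0, 1]; func must be increasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if func(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_tail_ge(k, n, p), alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper


call_rate = 111 / 118   # alleles inferred by NIPT
accuracy = 110 / 111    # inferred alleles confirmed by invasive diagnosis
ci_low, ci_high = clopper_pearson(110, 111)
```

Under this assumption, the lower bound comes out at roughly 0.951 and the upper bound just below 1, in line with the interval quoted above.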

Table 1 Noninvasive prenatal testing of thalassemia
Fig. 2 Outcomes of PBH-NIPT. We performed PBH-NIPT on 59 couples. Fifty-nine fetuses had 111 total alleles confirmed by invasive prenatal diagnosis, with 99.1% accuracy (95% CI, 95.1–100%). Fifty-two fetuses had both alleles detected for a total of 104 alleles; 7 fetuses had only one allele detected

To evaluate the relationship between the accuracy of NIPT and the reference panel sample size, we randomly selected one-half, one-quarter, one-sixth, one-eighth, one-twelfth, and 50 of the samples from the total reference panel and performed three independent tests. As expected, in the 52 fetuses in whom NIPT inferred both alleles, the NIPT outcome improved as the reference panel sample size increased (Fig. 3). Reduction of the sample size to one-half of the total reference panel yielded accuracies of NIPT of approximately 89.3% for β-thalassemia and 95.1% for α-thalassemia relative to the invasive prenatal diagnosis results.

Fig. 3 Effect of the reference panel sample size on the outcomes of PBH-NIPT. We randomly extracted one-half, one-quarter, one-sixth, one-eighth, one-twelfth, and 50 of the samples from the total reference panel and performed three independent tests for (a) β-thalassemia and (b) α-thalassemia. In the 52 fetuses in whom NIPT successfully inferred both alleles, the outcome of NIPT improved as the sample size of the reference panel increased

Discussion

This study demonstrated the feasibility of PBH-NIPT for thalassemia. PBH-NIPT can be used after carrier screening for thalassemia. For high-risk couples reluctant to undergo an invasive procedure, PBH-NIPT is a more attractive option because it requires only a simple blood draw from the pregnant woman. In most cases, namely when NIPT indicates that the fetus is a carrier or normal, no further confirmation is needed. When NIPT detects an affected fetus (12 cases) or detects only one pathogenic allele (6 cases), invasive prenatal diagnosis is recommended. In our study, PBH-NIPT reduced the number of invasive prenatal diagnoses required by approximately 69.5% (from 59 to 18 fetuses).
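The reduction quoted above follows directly from the case counts in this paragraph, as this short worked calculation shows (it uses only figures already reported here):

```latex
\[
N_{\text{invasive}} = 12 + 6 = 18,
\qquad
\text{reduction} = \frac{59 - 18}{59} = \frac{41}{59} \approx 69.5\%
\]
```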

Here, NIPT successfully inferred 94.1% of the fetal alleles (111/118) from the 59 fetuses. In all 7 no-call cases, fewer than 3 informative SNPs were available. This problem can be resolved by increasing the number of informative SNPs flanking the target gene through expansion of the target region [33]. Moreover, this study demonstrated that the reference panel size could affect the performance of NIPT. However, the reference panel size is not a limiting factor since large-scale expanded carrier screening for recessive monogenic disorders is common in clinical practice [40].

This study aimed to evaluate and provide a simple, fast, and inexpensive NIPT method for thalassemia. Compared with the current linked-read sequencing-based NIPT method, which requires 15–20 days (10 days of wet lab work and 5–10 days of data analysis) [25], PBH-NIPT requires only 5–7 days (4–5 days of wet lab work and 1–2 days of data analysis). Training PBH on a large reference panel requires only a few minutes [41]. The PBH-NIPT method costs approximately 80 dollars, as estimated in the supplement (Additional file 8: Table S4). After accounting for the cost of invasive prenatal diagnosis (~1000 dollars/sample [42]) in 6 cases, the actual cost of PBH-NIPT per sample is 174 dollars, which is substantially less than that of molecular haplotyping (~1500 dollars/sample [25]) or invasive prenatal diagnosis (~1000 dollars/sample [42]).

This study has two limitations. First, because all 59 families and the training reference data were from southern China, the test may not be reliable for individuals whose ethnic backgrounds differ from those of the training population. Currently, we can only rely on self-reported ethnicity before testing. A potential solution would be to add a quantifiable QC parameter that indicates the reliability of the test. We will therefore consider including SNPs that can distinguish ethnic background when designing the next version of the probe [43]. Second, the population frequency of these 10 variants was 0.15–2.66% in our dataset [35], and more data are needed to validate whether PBH-NIPT can detect variants with lower frequencies.

Conclusions

In summary, we developed and verified PBH-NIPT, a novel method for prenatal testing of α-thalassemia and β-thalassemia. Compared with invasive prenatal diagnosis, this method achieved 99.1% accuracy (95% CI, 95.1–100%). Therefore, we propose that this strategy might be extended to detect variants in addition to single-haplotype founder variants in other recessive monogenic disorders. Additional studies with larger sample sizes are required to confirm the application and performance of PBH-NIPT for other populations and variants with lower frequencies.