Advertisement

medizinische genetik

, Volume 31, Issue 2, pp 191–197 | Cite as

Molecular diagnostics of Mendelian disorders via combined DNA and RNA sequencing

  • Holger ProkischEmail author
Open Access
Schwerpunktthema: NGS aktuell
  • 175 Downloads

Abstract

The diagnostic yield in rare disorders is currently less than 50% although sequencing technologies in use are able to detect the majority of possible variants in our genome. The diagnostic gap is in part due to limitations in prioritizing and interpreting identified variants. The integration of functional data, such as transcriptomics, is emerging as a powerful complementary tool in diagnostics. It is able to quantify aberrant splicing, validate nonsense-mediated mRNA decay for potential loss-of-function variants, identify mono-allelically expressed variants, and help prioritize variants not predicted to change the encoded protein. Moreover, RNA-sequencing has been validated as a tool for the discovery of pathogenic variants in novel Mendelian disease genes. As RNA sequencing provides complementary information to DNA sequencing and can easily be established in addition to DNA sequencing, it has great potential for implementation as a routine tool for improving molecular diagnosis.

Keywords

Aberrant expression Mitochondrial disorders Aberrant splicing Whole-genome sequencing 

Molekulare Diagnostik von monogenetischen Erkrankungen durch kombinierte DNA- und RNA-Sequenzierung

Zusammenfassung

Die diagnostische Ausbeute bei seltenen Erkrankungen beträgt derzeit weniger als 50 %, obwohl die verfügbaren Sequenzierungstechnologien in der Lage sind, die meisten der möglichen DNA-Varianten in unserem Genom nachweisen zu können. Die diagnostische Lücke ist zum Teil auf Einschränkungen bei der Priorisierung und Interpretation identifizierter Varianten zurückzuführen. Die Integration funktionaler Daten, wie z. B. Transcriptomics, entwickelt sich zu einem leistungsfähigen ergänzenden Instrument in der Diagnostik. Die RNA-Sequenzierung ist in der Lage, anomales Splicing zu quantifizieren, „nonsense-mediated mRNA decay“ zu validieren, monoallelisch exprimierte Varianten zu identifizieren und bei der Priorisierung von Varianten zu helfen, von denen nicht erwartet wird, dass sie das kodierte Protein verändern. Darüber hinaus kann die systematische Analyse von RNA-Sequenzierungsdaten pathogene Varianten in Genen identifizieren, die noch nicht mit Erkrankungen beschrieben wurden. Da die RNA-Sequenzierung komplementäre Informationen zur DNA-Sequenzierung liefert und leicht über die DNA-Sequenzierung hinaus etabliert werden kann, bietet sie ein hohes Potenzial, als Routineinstrument zur Verbesserung der molekularen Diagnose eingesetzt zu werden.

Schlüsselwörter

Aberrante Genexpression Splicing Mitochondriale Erkrankung Gesamtgenomsequenzierung 

Next-generation sequencing (NGS) has transformed diagnostic protocols for Mendelian diseases. Although in the past it could be a long, frustrating and often futile battle for parents with an affected child to find the cause of their child’s suffering, the availability of whole-exome sequencing (WES) and whole-genome sequencing (WGS) has made molecular diagnosis—at least conceptually—possible for every patient. Genetic confirmation of diagnosis can be key for treatment, removes uncertainty, and may be important for future family planning. However, this promise has not been fully kept. For mitochondrial and other diseases, the analysis of the coding sequence does not lead to a diagnosis in 50–75% of patients. This figure indicates that in numerous cases, the pathogenic variants escape detection, were detected but erroneously classified as a variant of uncertain significance (VUS), or were part of a more complex genetic constellation.

Limitations of DNA sequencing in diagnostics

Only 25–50% of patients receive a firm genetic diagnosis after WES, often because of limitations concerning the coverage of the target regions, the detection of intronic and regulatory variants, the bioinformatic filtering and prioritization of potential pathogenic variants, and knowledge about the molecular and clinical consequences of genetic variants. WGS improves the coverage and allows detection of extra-exonic variants and structural variants [12]. When focusing on the coding region, WGS currently improves diagnostics of recessive disorders only marginally. When the search space is extended to the full genome, the currently most effective filter for minor allele frequency of 0.1–1.0% is not effective. In a single WES dataset, the frequency filter already yields on average 100–200 variants (25 bi-allelic) requiring manual interpretation. Outside the exome, the numbers of such variants are two orders of magnitudes higher [1]. Moreover, although our understanding of coding variants is incomplete, our understanding of non-coding sequences is severely restricted. The capability of sequencing technology and bioinformatics tools are developing quickly and provide comprehensive genome annotation, much faster than our ability to define the clinically relevant impact of detected variants.

Limitations in DNA variant interpretation

A definitive diagnosis is based on the discovery of known pathogenic variant(s) in a patient with a specific clinical presentation similar to the clinical picture reported multiple times, usually listed in the disease-variant database ClinVar [9]. However, this is not the common situation, neither on a variant nor on a phenotype level. We observe a continuous extension of the phenotypic spectrum associated with variants in the same gene or even the same variant. The increasing overlap of clinical presentations of genetically different disorders is additionally weakening the discriminating power of established genotype–phenotype associations. Identification of possible protein-truncating variants in genes for which non-truncating/in-frame pathogenic loss-of-function variants are known can already be challenging. If transcripts affected by such variants escape nonsense-mediated mRNA decay (NMD), they may still produce a functional protein by the mechanism of translation re-initiation [15], by functional alternative transcript isoforms [13], or by preserved/residual function of the truncated protein [7]. A recent systematic study shows that exons present only in tissue specific isoforms may not be essential for protein function [5]. In many cases, the candidate variant is even more difficult to interpret, such as rare missense, (near) splice site, intronic, and synonymous variants. Therefore, data describing the functional consequences on the molecular level are required to advance diagnostics.

The value of RNA sequencing in diagnostics

Transcriptomics by RNA sequencing (RNA-seq) takes advantage of new sequencing protocols and allows direct insights into the transcriptome of cell lines or tissues, reflecting a snapshot of a specific time point [11]. With a focus on protein coding genes, the procedures usually include an enrichment step for full-length Poly(A) transcripts followed by cDNA synthesis and sequencing; however, many other protocols exist to analyze total RNA, circular RNA or micro RNA to name a few. RNA-seq of full-length mRNA has the capability to detect and quantify known pre-defined RNA species, in addition to rare and novel RNA transcript variants and isoforms [3]. Hence, it uncovers the transcriptional consequences of genetic variants either previously prioritized or previously missed by the applied filters in the bioinformatics pipeline. RNA-seq provides a single assay to validate and quantify the impact of potential regulatory or splice defects for all genes expressed in a biological sample. Moreover, RNA-seq has been validated as a tool to indicate novel Mendelian disease genes through the identification of pathogenic variants in the respective genes [8]. With a diagnostic yield between 10 and 35%, two recent studies convincingly demonstrated the power of combined DNA and RNA sequencing [4, 8]. Whereas Kremer et al. performed RNA-seq on fibroblast cell lines from patients with suspected mitochondrial disorders, Cummings studied muscle biopsy samples from patients with muscular disorders. In both cases, the tissue was carefully selected. More than 90% of the known mitochondrial disease genes were reliably detected in fibroblast cell lines, and muscular disease genes in muscle biopsies respectively. However, this is not applicable for the whole spectrum of tissues, e.g., in the usually available tissue, blood, only about two thirds of the known disease genes are expressed.

RNA-seq data can be analyzed using gene-specific questions to refine transcript isoform annotation and to verify the consequence of a suspected variant on a specific transcript, thereby replacing quantitative RT-PCR and cDNA sequencing in a comprehensive assay including a number of controls. In cases where only the index cases is available, it enables haplotype phasing of two variants in different exons represented on continuous RNA reads. Examples were transcriptome analysis providing complementary information to DNA sequencing, including three cases with non-pathogenic protein truncating variants (Fig. 1). In cases of a homozygous frameshift mutation in exon 2 of ATP7B, we detected mRNA expression comparable with healthy controls, suggesting that NMD could be bypassed by the mechanism of translation re-initiation. This was confirmed by Western blot and functional tests of copper export capacity [15]. In another case, we identified bi-allelic frameshift variants in FLAD1, which encodes FAD synthase. Because FADS is essential for cellular supply of FAD cofactors, the finding of bi-allelic frameshift variants was unexpected. RNA-seq analysis discovered a novel FLAD1 isoform missing the affected exon, explaining why bi-allelic FLAD1 frameshift variants still harbor substantial FADS activity [13]. In a patient with a mitochondrial disorder, we found homozygous, protein-truncating variants in LYRM7 and MTO1, two genes encoding essential mitochondrial proteins. Transcriptome and proteome studies confirmed normal expression of the truncated MTO1 and we did not find any indication of impaired MTO1 activity [7].
Fig. 1

Selected examples where transcriptome analysis provided complementary information to DNA sequencing. UTR untranslated region, NMD nonsense-mediated decay

In addition to the validation of the impact of an identified VUS on the corresponding transcript, RNA-seq data can also be analyzed transcriptome-wide to detect aberrant gene expression. In such systematic analysis of RNA-seq data, searching for extremes (as detailed below) allows candidate disease-causing genes for rare disorders to be identified and prioritized. To focus on rare and recessive diseases, we applied stringent filtering for rare events with strong effect sizes, as described below.

Mono-allelic expression (MAE) is where one allele is silenced, leading to expression of only the second allele. When assuming a recessive mode of inheritance, genes with a single heterozygous rare coding variant identified by WES or WGS analysis are not prioritized [6]. However, MAE of such variants fits the recessive mode of inheritance assumption. Detection of mono-allelic expression can thus help to re-prioritize heterozygous rare variants. Our setting is based on the use of fibroblast cell lines, where about 7500 heterozygous SNPs identified by genotyping are covered by RNA-seq reads at least ten times, allowing detection of alleles expressed by at least 90% [8]. Six of the MAE alleles carry rare single-nucleotide variants (SNVs) affecting the protein sequence.

Aberrant expression, identified as gene expression outliers, occurs when expression is outside their physical range and usually implies impaired gene expression of both alleles with decreased expression levels of less than 50% of the controls. It can result from RNA degradation through nonsense-mediated decay (NMD) based on either apparently protein-truncating variants or splice defects, but it can also result from non-coding variants in regulatory regions such as promoters, enhancers, suppressors or variants in the untranslated region of the transcripts or combinations thereof. The genome-wide analysis reveals a median of only one aberrantly expressed gene per sample (Fig. 2; [8]).
Fig. 2

Complex I subunit NDUFA10 expression outlier in a patient with complex I-deficiency. a Volcano plot visualization shows five expression outliers (red dots) with NDUFA10 showing the lowest z score and p value. b Normalized read counts over 160 samples indicate this sample to be the only expression level outlier for NDUFA10. c Integrative genomics viewer (IGV) representation of a homozygous 20 base pair deletion in the 5′UTR of NDUFA10

Aberrant splicing has been recognized as a major cause of Mendelian disorders for a long time [14]. A systematic study of SNVs in ClinVar predicted that 20 to 30% of VUS and pathogenic variants cause aberrant splicing patterns [10]. However, the prediction of splicing defects from genetic sequences is difficult, because splicing involves a complex set of cis-regulatory elements that are not yet fully understood. Some of them can have deep intronic location and are thus not covered by WES. Hence, direct probing of splice isoforms by RNA-seq is important, and has led to the discovery of multiple splicing defects based on single-gene studies. To detect aberrant splicing events, we adapted an algorithm for splicing quantitative trait loci to the context of rare disorders. This pipeline is based on an annotation-free algorithm that is also able to detect novel splice sites. A median of five aberrantly spliced genes are detected per sample [8]. Aberrant splicing is not only caused by variation affecting known splice sites or splice motifs, it can also be the consequence of variants creating novel splice sites or splice motifs within coding or deep-intronic regions (Fig. 3). Splicing abnormalities include exon creation, skipping, extension and truncation, or a combination thereof, but also intron inclusion and often leads to premature in-frame stop codons, provoking degradation of the RNA by NMD, which may frequently be detected as aberrant expression. The RNA-seq data allow the characterization of all novel transcript isoforms. Quantification of the reads connecting the reference and aberrantly spliced exons may provide a direct readout of the DNA variant’s consequences (Fig. 4).
Fig. 3

Example of a synonymous homozygous variant in C10orf2 creating a novel splice acceptor site. a Sashimi plot visualization of a detected exon truncation. The main transcript has a deletion of 62 base pairs of exon 2. b IGV visualization of DNA and RNA sequencing data. The predicted synonymous variant enhances splicing 4 base pairs downstream and changes the minor isoform (10% of reads in controls) to the main isoform (14 out of 18 reads)

Fig. 4

Example of a complex pattern of aberrant splicing due to a homozygous near splice variant in MRPL44. As a consequence of the near splice variant, three new transcript isoforms are produced, including exon truncation (isoform 2), intron retention (isoform 3), and exon elongation (isoform 4), in addition to the reference transcript isoform 1 (18% of all transcripts)

The small number of less than 20 aberrantly expressed genes per sample allows a manual inspection and evaluation of the RNA-seq data and improved clinical interpretation in the context of the genetic and clinical data.

The RNA-seq protocols and bioinformatics pipelines presently in use are focused on the gene level for expression outliers, on exon/intron or splice site level for aberrant splicing, and on SNPs for mono-allelic expression in a specific tissue or cell line. The development of long-read sequencing will also allow consideration of more complex situations in large genes with multiple transcript isoforms and single-cell RNA-seq protocols will increase the resolution of average expression level from a certain tissue to specific cell types and will allow the cell specific regulation and imprinting mechanism to be studied. However, the methods provide only a snapshot of the cells studied and the non-detection of aberrant expression in a surrogate tissue does not allow normal splicing in the affected tissue to be concluded, which represents a clear limitation. Currently, several RNA-seq analysis pipelines are available, but further improvement is necessary to optimize sensitivity and specificity. To automate and optimize the correction of confounding technical, environmental, or common genetic variations, we recently developed OUTRIDER. OUTRIDER improved the detection of aberrant expression, based on the assessment of statistical significance [2]. Further method development is nevertheless required, especially for the detection of aberrant splicing events and the prediction of causal variants.

Practical conclusion

By integrating phenotype and genotype information only, less than 50% of Mendelian disorders are diagnosed
  • This diagnostic gap is in part due to limitations in prioritizing and interpreting identified variants

  • Transcriptomis by RNA sequencing provides complementary functional information to DNA sequencing

  • RNA sequencing delivers quantitative data on RNA expression level, aberrant splicing, and allele specific expression

  • The systematic analysis helps prioritizing variants predicted or not predicted to change the encoded protein

  • RNA sequencing has been validated as a tool for the discovery of pathogenic variants in novel Mendelian disease genes

  • RNA sequencing has a high potential to be implemented as a routine tool to improve molecular diagnosis.

Notes

Acknowledgements

I am grateful for the cooperation of the patients, clinicians, and scientists in the German and European networks for mitochondrial diseases: mitoNET (BMBF), GENOMIT (BMBF, Horizon2020). I would also like to thank the research team of Prof. Gagneur (Technical University, Munich), for collaborating on establishing the RNA analysis pipeline.

Compliance with ethical guidelines

Conflict of interest

H. Prokisch declares that he has no competing interests.

For this article no studies with human participants or animals were performed by any of the authors. All studies performed were in accordance with the ethical standards indicated in each case.

References

  1. 1.
    1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR (2015) A global reference for human genetic variation. Nature 526(7571):68–74CrossRefGoogle Scholar
  2. 2.
    Brechtmann F, Mertes C, Matusevičiūtė A, Yépez VA, Avsec Ž, Herzog M, Bader DM, Prokisch H, Gagneur J (2018) OUTRIDER: a statistical method for detecting aberrantly expressed genes in RNA sequencing data. Am J Hum Genet 103(6):907–917CrossRefGoogle Scholar
  3. 3.
    Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, Craig DW (2016) Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 17(5):257–271CrossRefGoogle Scholar
  4. 4.
    Cummings BB, Marshall JL, Tukiainen T, Lek M, Donkervoort S, Foley AR, Bolduc V, Waddell LB, Sandaradura SA, O’Grady GL, Estrella E, Reddy HM, Zhao F, Weisburd B, Karczewski KJ, O’Donnell-Luria AH, Birnbaum D, Sarkozy A, Hu Y, Gonorazky H, Claeys K, Joshi H, Bournazos A, Oates EC, Ghaoui R, Davis MR, Laing NG, Topf A, Genotype-Tissue Expression Consortium, Kang PB, Beggs AH, North KN, Straub V, Dowling JJ, Muntoni F, Clarke NF, Cooper ST, Bönnemann CG, MacArthur DG (2017) Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci Transl Med 9(386).  https://doi.org/10.1126/scitranslmed.aal5209 Google Scholar
  5. 5.
    Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom KF, ODonnell-Luria A, Poterba T, Seed C, Solomonson M, Alfoldi J, The Genome Aggregation Database Production Team, The Genome Aggregation Database Consortium, Daly MJ, MacArthur DG (2019) Transcript expression-aware annotation improves rare variant discovery and interpretation. https://www.biorxiv.org/.  https://doi.org/10.1101/554444 Google Scholar
  6. 6.
    Eilbeck K, Quinlan A, Yandell M (2017) Settling the score: variant prioritization and Mendelian disease. Nat Rev Genet 18(10):599–612CrossRefGoogle Scholar
  7. 7.
    Kremer LS, L’hermitte-Stead C, Lesimple P, Gilleron M, Filaut S, Jardel C, Haack TB, Strom TM, Meitinger T, Azzouz H, Tebib N, Ogier de Baulny H, Touati G, Prokisch H, Lombès A (2016) Severe respiratory complex III defect prevents liver adaptation to prolonged fasting. J Hepatol 65(2):377–385CrossRefGoogle Scholar
  8. 8.
    Kremer LS, Bader DM, Mertes C, Kopajtich R, Pichler G, Iuso A, Haack TB, Graf E, Schwarzmayr T, Terrile C, Koňaříková E, Repp B, Kastenmüller G, Adamski J, Lichtner P, Leonhardt C, Funalot B, Donati A, Tiranti V, Lombes A, Jardel C, Gläser D, Taylor RW, Ghezzi D, Mayr JA, Rötig A, Freisinger P, Distelmaier F, Strom TM, Meitinger T, Gagneur J, Prokisch H (2017) Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat Commun 8:15824CrossRefGoogle Scholar
  9. 9.
    Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, Karapetyan K, Katz K, Liu C, Maddipatla Z, Malheiro A, McDaniel K, Ovetsky M, Riley G, Zhou G, Holmes JB, Kattman BL, Maglott DR (2018) ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46(D1):D1062–D1067CrossRefGoogle Scholar
  10. 10.
    Lee H, Deignan JL, Dorrani N, Strom SP, Kantarci S, Quintero-Rivera F, Das K, Toy T, Harry B, Yourshaw M, Fox M, Fogel BL, Martinez-Agosto JA, Wong DA, Chang VY, Shieh PB, Palmer CG, Dipple KM, Grody WW, Vilain E, Nelson SF (2014) Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA 312(18):1880–1887CrossRefGoogle Scholar
  11. 11.
    Lowe R, Shirley N, Bleackley M, Dolan S, Shafee T (2017) Transcriptomics technologies. Plos Comput Biol 13(5):e1005457CrossRefGoogle Scholar
  12. 12.
    Merker JD, Wenger AM, Sneddon T, Grove M, Zappala Z, Fresard L, Waggott D, Utiramerur S, Hou Y, Smith KS, Montgomery SB, Wheeler M, Buchan JG, Lambert CC, Eng KS, Hickey L, Korlach J, Ford J, Ashley EA (2018) Long-read genome sequencing identifies causal structural variation in a Mendelian disease. Genet Med 20(1):159–163CrossRefGoogle Scholar
  13. 13.
    Olsen RKJ, Koňaříková E, Giancaspero TA, Mosegaard S, Boczonadi V, Mataković L, Veauville-Merllié A, Terrile C, Schwarzmayr T, Haack TB, Auranen M, Leone P, Galluccio M, Imbard A, Gutierrez-Rios P, Palmfeldt J, Graf E, Vianey-Saban C, Oppenheim M, Schiff M, Pichard S, Rigal O, Pyle A, Chinnery PF, Konstantopoulou V, Möslinger D, Feichtinger RG, Talim B, Topaloglu H, Coskun T, Gucer S, Botta A, Pegoraro E, Malena A, Vergani L, Mazzà D, Zollino M, Ghezzi D, Acquaviva C, Tyni T, Boneh A, Meitinger T, Strom TM, Gregersen N, Mayr JA, Horvath R, Barile M, Prokisch H (2016) Riboflavin-Responsive and -Non-responsive Mutations in FAD Synthase Cause Multiple Acyl-CoA Dehydrogenase and Combined Respiratory-Chain Deficiency. Am J Hum Genet 98(6):1130–1145CrossRefGoogle Scholar
  14. 14.
    Scotti MM, Swanson MS (2016) RNA mis-splicing in disease. Nat Rev Genet 17(1):19CrossRefGoogle Scholar
  15. 15.
    Stalke A, Pfister ED, Baumann U, Eilers M, Schäffer V, Illig T, Auber B, Schlegelberger B, Brackmann R, Prokisch H, Krooss S, Bohne J, Skawran B (2019) Homozygous frame shift variant in ATP7B exon 1 leads to bypass of nonsense-mediated mRNA decay and to a protein capable of copper export. Eur J Hum Genet.  https://doi.org/10.1038/s41431-019-0345-1 Google Scholar

Copyright information

© The Author(s) 2019

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. 1.Institut für Humangenetik, Klinikum rechts der IsarTechnische Universität MünchenMunichGermany
  2. 2.Institut für HumangenetikHelmholtz Zentrum MünchenNeuherbergGermany
  3. 3.Beijing Children’s HospitalCapital Medical UniversityBeijingChina

Personalised recommendations