Introduction

The rapid adoption of genetic sequencing in clinical laboratories has been largely driven by the evolution of faster, better, and cheaper sequencing methods. The inflection point for clinical application was likely the introduction of automated capillary sequencing methods in the 1990s. With the introduction of next-generation sequencing (NGS) into the clinical space, and some early significant successes, the growth of genetic sequencing in the clinical laboratory is likely to accelerate even more. Today, genetic testing for more than 2,000 genes is offered by clinical laboratories around the world (http://www.genetests.org) [1], and this number increases annually. The rapid increase in availability of genetic sequence information has also enabled clinical discovery, which in turn forms the basis of new clinical tests. As our knowledge of disease biology and genetics increases, the reach and utility of clinical genetic testing will only continue to expand and improve. The implementation of NGS will undoubtedly further accelerate both discovery and testing. In this chapter, we focus on the implementation of whole-genome sequencing (WGS) as a clinical laboratory test. The chapter is organized according to the workflow, with sections arranged in terms of pre-analytic, analytic, and post-analytic considerations (Fig. 17.1).

Figure 17-1

Process and workflow for genome sequencing. This figure depicts the major steps in the processing of a genome through next-generation sequencing. The pre-analytic section illustrates the important steps in establishing the test and good communication between the ordering physician and the laboratory. The analytic section shows the processing of the physical sample in the laboratory and the calling of the data using bioinformatics processes. The post-analytic section depicts the steps involved in aggregating information about the results, interpreting those results, and generating a report that can be returned to the ordering physician. The sections in this chapter provide detailed descriptions of these steps

Whereas WGS may appear to be a single test, it has many possible indications for use, and each requires different handling throughout the process. Therefore, we discuss the possible clinical indications for testing and the pre-analytical, analytical, and post-analytical requirements for each of these applications. These issues are addressed with regard to current professional and regulatory best practices, guidelines, and resources [2, 3]. However, this field is evolving rapidly, and although the principles in this chapter are likely to remain consistent, many details, such as the specific resources or databases discussed, are likely to change; it is therefore strongly recommended that additional resources be consulted when implementing WGS in a clinical laboratory. It is an exciting time to be involved in clinical genetic testing, as there is an opportunity to help drive important advances in medical care. However, WGS is also a nascent test type, and as such its uses, potential, challenges, and concerns have not yet been fully characterized. Good communication between laboratory and physician, careful analytical and bioinformatics processes, and thoughtful policy development are necessary to offer WGS as a clinical test.

Pre-analytical Considerations: Test Definition, Physician Support, and Process Development

The pre-analytical phase encompasses all steps taken prior to the actual testing of the sample. The introduction of any new test in a clinical laboratory requires several considerations prior to its launch. Several guidelines have been published to aid clinical laboratorians in evaluating when, how, and why to implement a new test ([4]; CLSI publications (multiple); CAP checklists). These guidelines include discussions of clinical and regulatory concerns as well as financial and workflow considerations. Additional guidelines and recommendations specific to the implementation and offering of genomic sequencing tests have also recently been published [5–8]. The principles established in previous guidelines and best practice recommendations are still very much applicable and certainly should be included in the planning process. However, when the test involves a relatively new methodology that can be applied in a number of different ways, these considerations must be refined and developed by the individual laboratories offering the testing. Newer guidelines, such as the ACMG (American College of Medical Genetics and Genomics) clinical laboratory standards for NGS [9], are particularly useful for considering the additional complexities that this new technology may introduce.

In the case of WGS, it is important to begin the initial assessment with a test definition and intended use statement to clarify capabilities and expectations. First, it must be clear that WGS, and in particular clinical WGS, does not represent every base position of the genome, nor can it detect all the types of sequence variants that might be present in a whole genome. For example, all sequencing methodologies tend to be error prone in regions with large nucleotide repeat expansions, such as the CGG repeat expansion associated with fragile X syndrome; WGS using NGS is no exception. Additionally, while WGS can potentially detect many types of variants, including single nucleotide variants, copy number variants, insertions, deletions, and translocation events, it cannot detect all of these variant types with the same levels of sensitivity and specificity. In particular, for clinical WGS, thresholds or statistical algorithms can be used to determine whether each variant call meets strict quality metrics, ensuring that calls made in a clinical context meet a minimum threshold of accuracy. This is discussed further in the analytical portion of this chapter, but is called out here to emphasize that clinical WGS requires additional rigor that might, in some cases, reduce the fraction of the genome that can be confidently reported. Enthusiastic physicians may consider ordering this test for a patient without realizing that WGS might require supplementary testing to prove useful and, in some cases, may not be the most appropriate test. Therefore, the first consideration when deciding to offer clinical WGS should be what the test can and cannot be used for, and the degree to which the clinical laboratory is able to support the wide range of potential clinical questions for which WGS might be employed.

In defining the intended use statement for a WGS test, important components to consider include the following:

  1. Is WGS to be used as a preliminary screen, confirmatory test, test in aid of diagnosis, or as a test to make prognostic or management decisions after a diagnosis has been made?

  2. Is it intended to address conditions caused by inherited or somatic genetic variants?

  3. What types of variants are clinically relevant for the population being assessed, and how well does the technology detect these different types of variants?

      (a) Will multiple analyses or methods be combined?

  4. What are the technical requirements for the condition(s) being assessed?

  5. Who are the ordering physicians and what level of support will they need?

      (a) Are genetic counselors available to support questions from physicians?

      (b) What marketing materials, clear instructions, and definitions of terms will be needed? Will supplementary educational materials be needed?

  6. Consent and information return policies for the laboratory

      (a) Who owns or has access to results, and for how long?

      (b) Do results constitute only the clinical report, or could the data be reanalyzed to address different questions at other dates?

When the clinical laboratory answers these questions, it rapidly becomes clear that the same whole-genome sequence could support multiple different test definitions, and might require different support staff and educational materials, as well as multiple processing and reporting policies. A thorough evaluation of the laboratory, the population it serves, and the abilities and needs of both will be critical to defining how WGS is offered. It may be beneficial for a laboratory to analyze the community it serves and to identify the most important needs of that population before defining and offering a test.

Today, the most common use of WGS is in the assessment of rare diseases of suspected genetic etiology where symptoms may be overlapping or nonspecific and first-tier testing has been inconclusive [10–12], or where WGS presents the fastest possible aid for differential diagnostic evaluation [13]. Inherent in this approach is the expectation that the disease is caused by variants in a single gene (sometimes called monogenic or Mendelian conditions). The primary intention of clinical testing is not gene discovery; however, as with microarray testing, variants may be identified in genes whose function is not yet established, but only suspected or perhaps completely undefined. In such cases, if those variants are thought to be likely causative, additional testing may be required to establish clinical validity, and ideally clinical laboratories should have plans for how to make such recommendations to the physicians who have ordered the test.

Analytical Considerations: Analytical and Bioinformatics Validations and Quality Control

The analytic phase begins after the pre-analytic phase and involves all of the processes that enable the actual testing of the sample to produce an analytical result. For NGS, this includes DNA extraction, DNA shearing and size selection, adapter ligation, library preparation, cluster generation, sequencing, alignment, variant calling, and all of the quality metrics associated with every stage of these processes. The process of DNA extraction depends on the type of sample received, which may differ between different types of WGS tests. Diagnostic testing for Mendelian conditions is typically performed using DNA extracted from peripheral blood, whereas other types of testing may accept other sample types, for example saliva. Preparation of the sample therefore includes extraction of DNA from the specific sample type being tested. As part of this process, the DNA must be evaluated for quality and quantity prior to testing and must meet all established quality parameters.

The subsequent process of DNA shearing and size selection, adapter ligation, library preparation, cluster generation, and sequencing are generally the same for all WGS testing, with potential differences being associated with factors such as read length and targeted depth of coverage. Before offering a test clinically, the clinical laboratory must validate the test for the specific performance metrics that were established in the test definition. For example, if one intends to detect substitutions, that must be validated, but a validation performed on substitutions does not assess the ability to accurately identify copy number variation (CNV) or insertion or deletion (indel) events. Additionally, regions that are variable across the genome, such as high and low GC regions, should be evaluated to understand the consistency of base calling quality. Validations are intended to evaluate the analytical sensitivity and specificity, limits of detection, and reportable range. During the process of validation, quality metrics and filters should be established that can then be used during ongoing quality control and assessment.

When considering an entire genome and the resulting number of data points that must be evaluated, multiple tiered validation approaches may be appropriate. One method of validation is to test performance against a “truth set” of DNA. Many samples are available that have been sequenced using orthogonal technology and contain specific, well-characterized, clinically valid mutations that are known to be causative for diseases (repositories such as Coriell, the Hospital for Sick Children, etc.). Many of these available samples include parent–child pedigrees, so in addition to confirming variant detection, subtraction and filters can be tested using these known relationships. The analysis should account for background conflicts that can be attributed to de novo mutations in each generation (<100/genome). The number of conflicts observed that exceed this background rate depends on the choice of aligner and variant caller and the settings used to align reads and make genotype calls. Another approach involves deep sequencing of targeted regions that are relevant to the WGS; in such a case, it is recommended that multiple samples representing various regions of the genome, various GC content and other regional genomic characteristics, and various types and complexities of variants be included in the analysis. These samples, if amplified by PCR and sequenced, typically yield a pool of sequence data that is quite deep, for example several hundreds of thousands or even millions of independently sequenced fragments. Re-sampling (bootstrapping) analyses can then be used to evaluate the depth and quality filters that yield high quality sequence. If done across multiple regions and using multiple samples, this experiment can be very useful in establishing confidence in specific types of calls and in assessing how they calibrate to quality metrics. Additionally, confidence in different types of calls and in different genomic regions (e.g., percent GC) can be established. Validation of WGS should be updated in the event of any processing change, even if the chemistry and platform remain the same.
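To illustrate the re-sampling idea, the following is a minimal Python sketch, assuming a deep pileup of base calls at a position whose genotype has been confirmed orthogonally. The naive allele-fraction caller and the 20–80 % heterozygote band are illustrative placeholders, not a validated clinical caller; the point is only to show how bootstrap draws at candidate depths can be calibrated against a known truth.

```python
import random
from collections import Counter

def bootstrap_concordance(pileup_bases, truth_genotype, depth,
                          n_replicates=1000, het_band=(0.2, 0.8)):
    """Fraction of bootstrap replicates in which a naive allele-fraction
    caller recovers the known genotype when only `depth` reads are drawn
    (with replacement) from a deep pileup at one position."""
    truth = tuple(sorted(truth_genotype))
    correct = 0
    for _ in range(n_replicates):
        sample = random.choices(pileup_bases, k=depth)
        counts = Counter(sample).most_common(2) + [(None, 0)]
        (top_base, _), (alt_base, alt_n) = counts[0], counts[1]
        alt_frac = alt_n / depth
        if het_band[0] <= alt_frac <= het_band[1]:
            call = tuple(sorted((top_base, alt_base)))
        else:
            call = (top_base, top_base)  # treat as homozygous for the major base
        correct += (call == truth)
    return correct / n_replicates

# Illustrative deep pileup at a confirmed heterozygous A/G site (~0.5 % noise).
deep_pileup = ["A"] * 480 + ["G"] * 470 + ["T"] * 5
for d in (10, 20, 30, 40):
    print(d, bootstrap_concordance(deep_pileup, ("A", "G"), d))
```

Repeating this across many positions, variant types, and GC contexts is what allows the depth and quality thresholds to be tied to an empirical concordance rate.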

The quality of an NGS sequence relies on both the sequencing platform itself and the methods used to analyze the resulting data. For that reason, validations must be designed to assess both the sequencing and the analysis pipeline. The bioinformatics pipeline can be evaluated separately from the platform using datasets that are rapidly becoming available through efforts such as those of the National Institute of Standards and Technology (NIST) and the Genetic Testing Reference Materials Coordination Program (GeT-RM). Synthetically generated data can also be used to test specific challenges to calling algorithms.
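As a simple illustration of synthetic data generation, the sketch below injects a known single nucleotide variant into reads simulated from a reference segment; fragment sizing, quality strings, and paired-end handling are deliberately omitted, and the uniform error rate is an assumption rather than a platform model.

```python
import random

def synthetic_reads(reference, var_pos, alt_base, allele_fraction,
                    read_length=100, n_reads=2000, error_rate=0.001, seed=17):
    """Simulate reads over a reference segment with a known SNV injected at
    `var_pos` in `allele_fraction` of the overlapping reads. The output can
    be written as FASTQ and pushed through the alignment and variant-calling
    pipeline to confirm that the engineered call is recovered."""
    rng = random.Random(seed)
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length)
        seq = list(reference[start:start + read_length])
        # Inject the engineered variant into the chosen fraction of reads.
        if start <= var_pos < start + read_length and rng.random() < allele_fraction:
            seq[var_pos - start] = alt_base
        # Sprinkle in substitution errors at the assumed per-base error rate.
        for j in range(read_length):
            if rng.random() < error_rate:
                seq[j] = rng.choice("ACGT".replace(seq[j], ""))
        reads.append((f"sim_read_{i}", "".join(seq)))
    return reads
```

Varying the allele fraction, error rate, and variant context in such simulations probes exactly the kinds of calling challenges that truth-set samples cannot cover exhaustively.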

Transformation of the signals produced during NGS into calls of DNA bases is a highly complex process that relies on sophisticated bioinformatic analyses. Generally, there are three steps in the analysis: (1) preprocessing of reads, (2) alignment, and (3) variant calling. Preprocessing involves filtering out raw sequence data that do not meet certain quality criteria. Alignment involves mapping reads to the reference human genome sequence, which may be obtained from the National Center for Biotechnology Information (NCBI), the University of California Santa Cruz (UCSC) Genome Browser, or Ensembl. Many tools employing different algorithms are available to align reads; each offers trade-offs between speed and accuracy [14, 15]. Mapping is complicated by the fact that the reference genome is incomplete and that some genomic regions are highly variable between individuals. Because of this, approximately 5–10 % of reads will fail to align. Mapping quality is measured, and a confidence score is assigned to each read placement. One of the community-accepted standards for representing alignments is the Binary Alignment/Map (BAM) file format [16], which captures the above-mentioned data, allows efficient compression, and (when sorted) enables random access to the reads that align to a particular segment of the genome. Once alignment is complete, the BAM file serves as input to the next step in the bioinformatics pipeline, variant calling, in which genetic variants are identified. Depending on the intended use of the test, a variety of variant calling tools, each specializing in detecting small (single nucleotide variants (SNVs) and indels) or large (structural variants (SVs) and CNVs) genomic alterations, might be employed. In some cases, several tools might be used in conjunction to identify different types of variants.
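As a small illustration of monitoring these alignment outputs, here is a sketch using the pysam library (an assumption; any BAM-reading toolkit would serve) to tally the unaligned fraction and low mapping-quality reads in a BAM file. The file name and quality threshold are hypothetical.

```python
import pysam  # assumed available; any BAM-reading library would work

def alignment_summary(bam_path, mapq_threshold=20):
    """Tally total, unmapped, and low-mapping-quality reads in a BAM file."""
    total = unmapped = low_mapq = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:  # sequential pass over every record
            if read.is_secondary or read.is_supplementary:
                continue  # count each fragment once
            total += 1
            if read.is_unmapped:
                unmapped += 1
            elif read.mapping_quality < mapq_threshold:
                low_mapq += 1
    return {"total": total,
            "unaligned_fraction": unmapped / total if total else 0.0,
            "low_mapq_fraction": low_mapq / total if total else 0.0}

# Hypothetical usage: flag runs whose unaligned fraction drifts far outside
# the 5-10 % range noted above.
print(alignment_summary("sample_genome.bam"))
```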

Variant calling algorithms are typically based on two main paradigms: the first relies on base counting and allelic fraction to distinguish between heterozygous and homozygous genotype calls; the second uses probabilistic methods (Bayes' theorem) to calculate a posterior probability given the observed read data and a genomic prior probability [17]. The latter method accounts for noise in the data and provides a measure of the statistical uncertainty associated with each genotype call in the form of a score, which usually represents the confidence in the call. Although many algorithms report only variant positions, it is important to consider that the reference genome may contain a non-wild-type allele, and to monitor the quality of positions called homozygous for the reference allele; no-calls and poor quality homozygous reference calls should be considered in the downstream interpretation effort.
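To make the Bayesian paradigm concrete, the following is a minimal sketch of a posterior genotype calculation at a single biallelic site, assuming a binomial read-sampling model; the error rate and prior values are illustrative, not calibrated to any platform.

```python
from math import comb

def genotype_posteriors(ref_count, alt_count, error_rate=0.01, priors=None):
    """Posterior probabilities of hom-ref, het, and hom-alt genotypes given
    read counts at one biallelic site, via Bayes' theorem with a binomial
    likelihood. Priors reflect the expectation that most positions match
    the reference (illustrative values only)."""
    priors = priors or {"hom_ref": 0.998, "het": 0.001, "hom_alt": 0.001}
    n = ref_count + alt_count
    # P(observing an alt base | genotype): error, balanced sampling, or ~all alt.
    p_alt = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1 - error_rate}
    unnorm = {g: priors[g] * comb(n, alt_count)
                 * p ** alt_count * (1 - p) ** ref_count
              for g, p in p_alt.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

# A balanced split at 30x depth yields an overwhelmingly confident het call.
posteriors = genotype_posteriors(ref_count=14, alt_count=16)
print(max(posteriors, key=posteriors.get), posteriors)
```

A Phred-scaled quality for the winning genotype can then be derived as −10·log₁₀(1 − posterior), which is the form in which such confidence scores are typically reported.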

During the validation process, a clinical laboratory implementing WGS should be aware of, and test for, potential artifacts in processing. For example, the reference genome is not necessarily wild type; perhaps the most prominent example is that the reference genome carries the Factor V Leiden mutation. If a laboratory considers only the variants called against the reference, such mutations may be missed in an individual who also carries this genotype. Assessment of reference allele frequency based on 1000 Genomes Project data shows approximately 63,000 positions at which the reference genome carries an allele that is present in populations at less than 1 % allele frequency. Additionally, for regions of the genome such as the Human Leukocyte Antigen (HLA) locus, there is not necessarily a “wild type” per se, and additional information such as phasing may be necessary to confidently evaluate the variants found. While similar challenges exist for many types of clinical tests, laboratories should be aware of and prepared to manage such issues.

One challenge to the implementation of WGS in a clinical laboratory is that the analytical validity may not be the same for all regions of the genome, nor for all types of variants that may be of interest. The specific weaknesses and strengths of WGS must be considered when launching a test, and then communicated effectively and evaluated, potentially on a case-by-case basis, for appropriateness given the needs of the test in each specific situation.

Post-analytical Considerations: Interpretation and Reporting

The post-analytic process occurs after the analytic phase and includes the interpretation and reporting of the analytical calls produced in the analytic phase. As with the previous phases, the type of testing being performed has a significant impact on the post-analytic process.

A genome comprises approximately 3.1 billion data points and contains around three to four million variable positions, including on average 9,600 amino acid-changing positions and 73 premature termination positions (internal data). Given such a large amount of information, a thoughtful plan must exist for identifying and evaluating the information most likely to be relevant and informative for the clinical questions being considered.

After achieving confidence in the quality of the genetic calls that have been made and defining the regions for which calling can be done with confidence, the clinical implications of the calls should be assessed. This process can be divided into annotation, in which information and meta-data are gathered about the variant calls; interpretation, in which all the information is evaluated in the clinical context; and reporting, in which the information is communicated back to the ordering physician.

Historically, the assessment of clinical validity, or the strength of the relationship between a variant (or call) and a disease, has been recommended but not required in genetic reporting. This is changing, and recent College of American Pathologists (CAP) guidelines now address how clinical laboratories should support the assessment of the clinical implications of a call. For a single-gene test, this typically consists of an expert or panel of experts within the laboratory who evaluate each variant based on peer-reviewed publications and other factual evidence and categorize it for inclusion in the report. This process has become significantly more sophisticated in recent years, as several databases and online tools are now available to aid in the assessment of the clinical implications of variants.

The process of information gathering can be automated and is commonly referred to as annotation. As WGS is implemented in a laboratory, a series of automated annotation tools becomes necessary to support the large number of variants that are detected and require downstream evaluation. Tools are available online, such as the Variant Effect Predictor (VEP, part of the Ensembl suite of resources), that will gather information from a variety of databases as well as predict characteristics such as the amino acid change for a given transcript. Recommendations for the types of information that should be gathered can be found in official publications by CAP and ACMG; these include the gene and transcript in which a specific variant is found, the position of the variant in the genome, the DNA and amino acid change produced by the variant (using Human Genome Variation Society (HGVS) nomenclature), the consequence of the variant (e.g., intronic, upstream, missense, stop gained, synonymous), and characteristics such as the frequency of the variant and the conservation of that position. Additionally, in silico structure or function prediction software such as SIFT [18] or PolyPhen [19] may provide additional information. When implementing the annotation process, it is very important to assess the annotation software suites that will be used and to confirm that variants are being searched correctly and that the information gathered is being downloaded and displayed properly. It is also important to keep in mind the unproven nature of many of the prediction software tools; while these may be useful in an assessment, they are not yet reliable on their own.
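As a sketch of what such automated annotation can look like, the example below queries the public Ensembl VEP REST endpoint for a single variant; the endpoint path and response field names follow the Ensembl REST documentation current at the time of writing and, like any external resource, should be re-verified before clinical deployment.

```python
from urllib.parse import quote
import requests  # assumes the requests library is installed

def vep_annotate(hgvs_notation, species="human"):
    """Query the public Ensembl VEP REST endpoint for a single variant and
    extract a few of the annotation fields discussed above."""
    url = f"https://rest.ensembl.org/vep/{species}/hgvs/{quote(hgvs_notation)}"
    resp = requests.get(url, headers={"Content-Type": "application/json"},
                        timeout=30)
    resp.raise_for_status()
    record = resp.json()[0]  # one record per submitted variant
    return {
        "most_severe_consequence": record.get("most_severe_consequence"),
        "transcripts": [
            (tc.get("gene_symbol"), tc.get("consequence_terms"),
             tc.get("sift_prediction"), tc.get("polyphen_prediction"))
            for tc in record.get("transcript_consequences", [])
        ],
    }

# Hypothetical HGVS input; a real query should use validated nomenclature.
print(vep_annotate("ENST00000366667:c.803C>T"))
```

In production, results from any such service should be cached, version-stamped, and spot-checked against the source databases, per the verification caveats above.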

Having annotated the positions, interpretation of the variants for reporting can begin. The evaluation of evidence regarding the clinical implications of variants is a critical process that is guided by both professional expertise [20, 21] and a pipeline that can support such evaluations (Fig. 17.2). Several biological and clinical characteristics of a variant should be considered. Biological characteristics include the type of variant, where it occurs in the gene, the frequency of the variant, and possibly in silico evaluations of the variant. Clinical characteristics include whether the variant has been reported to be associated with a condition or phenotype; such reports can take the form of case studies, case–control studies, and functional evaluations of the effect of the mutation in vitro or in vivo. Peer-reviewed publications in which a clinical phenotype or functional effect has been measured in individuals who carry the variant under review are often the most compelling evidence in variant assessment. Careful literature searches, or searches of appropriate databases, may be helpful in identifying the full body of literature that exists. It is important to remember that these databases may or may not be updated regularly, and may or may not be complete with regard to the publications that actually exist. Furthermore, many variants have been characterized in databases based on old information; therefore, if a database reports a variant as pathogenic or of uncertain significance, it is important that the clinical laboratory perform an updated, independent assessment to ensure that this information is still valid.

The gene in which the variant was detected must also be considered, including the strength of the relationship between the gene and the disease. This includes familiarity with phenomena such as whether certain types of mutations (e.g., activating) or regions of genes (specific exons) are known to be more or less likely to be associated with a particular disease. With regard to the disease, the mode of inheritance, prevalence of disease, and age of onset are important considerations. For example, if a disease has a prevalence of 1/100,000 and is autosomal recessive, then, using Hardy–Weinberg principles, a variant with a frequency higher than 1 % is unlikely to be causing that disease (see the worked example below). Likewise, if mutations known to cause disease are exclusively gain-of-function, then a stop mutation or silent mutation is less likely to be considered pathogenic. Finally, when reporting the results, the clinical questions and context of the patient must be considered: is this a diagnostic evaluation or a carrier screen? What other tests have already been performed? Is there any additional phenotypic information that might be relevant to the results and how they should be considered? This is a complex set of considerations and requires knowledge of clinical and technical genetics.
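A worked version of the Hardy–Weinberg example above: for a fully penetrant autosomal recessive disease with prevalence 1/100,000, the combined frequency q of all disease alleles satisfies

$$ {q}^2=\frac{1}{100{,}000},\kern2em q=\sqrt{\frac{1}{100{,}000}}\approx 0.0032 $$

so even a single variant accounting for every disease allele would be expected at roughly 0.32 % in the population; a candidate variant observed at more than 1 % is therefore several-fold too common to be causative under this model.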

Figure 17-2

Decision tree for the evaluation of clinical implications associated with sequencing calls. The process shown is the one that the Illumina Clinical Services Laboratory uses for the evaluation of evidence that links a particular allele to a clinical condition

Literature has historically been an important source of information regarding the clinical association of a genetic variant with a disease. The ability to publish case reports has been critical in helping to identify genes and variants that are suspected of causing a disease. However, these are often just first steps, and subsequent studies, in which cases are evaluated against controls, either in pedigrees or populations, together with additional functional evaluations, may be critical for providing evidence that a variant is likely causative for a disease. Very often an initial report will appear in which a variant is found in a gene known to be associated with disease, and the variant may seem a compelling explanation for the disease, but further studies show that the variant is also found in unaffected individuals or has no effect on protein function. It is therefore imperative to review all the literature associated with a variant, and to evaluate how strong the data in the papers are, before determining one's confidence regarding a variant's pathogenicity or lack thereof. Some articles are more robust than others and should be weighted accordingly. This process is often where clinical laboratories have real expertise, because they typically have a well-trained, clinically oriented staff of MDs, PhDs, and genetic counselors who review the evidence presented in these papers and bring all of the considerations listed in Fig. 17.2 into the final reporting language. This process is challenging and time-consuming at the single gene level, and it represents what is likely the biggest challenge at the scale of genome sequencing.

There are many tools and approaches that have been, or can be, developed to help manage this burden on the clinical laboratory. Clearly defining how filtering tools will be applied based on the case at hand and the indications for use will significantly reduce the interpretive burden. Other approaches, such as ruling out variants with high frequencies or those that are synonymous before doing additional downstream investigation, are commonly practiced in clinical laboratories. When choosing to do this, however, it is important to consider aspects such as how common a disease might be in a particular ethnic population and incomplete penetrance, which might otherwise lead a laboratory to incorrectly rule a variant out. Natural language processing tools have also been suggested to assist with the burden of reading published literature. Collection of the literature associated with variants requires a standardized set of terms, such as HGVS nomenclature [22, 23], that search tools can use. The extent to which natural language software and other software approaches can automate the evaluation of variants is still highly debated. It is clear that these tools are invaluable for collating the information. One challenge is simply reading the papers: individuals must be able to read through a paper, evaluate the strength of evidence regardless of the authors' conclusions, and document this. This currently requires professionals spending a significant amount of time sifting through that information. Every clinical laboratory faced with the number of variants to be assessed in WGS will be challenged to hire a qualified staff large enough to support such efforts. For this reason, databases in which such information is available and could be shared become extremely valuable. At the same time, each laboratory that builds up these databases incurs a huge expense in the effort. How laboratories can create community access that will benefit other laboratories, and ultimately the patients, while still paying for the effort required to create this information is an interesting and active area of exploration.

Designing the Post-analytic Process for Monogenic Conditions

Given the daunting number of variants to consider, approaches must be developed to apply filters so that only variants of potential relevance are identified and evaluated. Both biological and clinical features can be used to help refine the search for genomic information. In cases where parental samples are available, a geneticist or genetic counselor should begin by taking a family history to identify whether the current condition is most likely to reflect autosomal recessive, autosomal dominant (possibly with reduced penetrance), or de novo inheritance. If both or even one parent sample can be sequenced along with the affected individual, then subtraction can be performed across the entire genome in order to evaluate variants that meet the biological hypothesis of the following conditions:

  • Autosomal recessive, in which one would expect to find at least two variants within a single gene, one inherited from each parent. To perform this search, all three samples are sequenced and the child’s variants are filtered to match the expectation of two variants in a gene, one from each parent. This can significantly reduce the number of variants that must be considered. After this subset of gene/variants is identified, the genes and specific variants can be filtered further. For example, common variants with allele frequencies above 5 % might be excluded from consideration; when making such decisions, patient ethnicity, prevalence of the condition in that ethnic group, penetrance, and modes of inheritance should be considered because sometimes common variants are pathogenic. Through the use of these types of filters, the resulting subset of variants should be of a tractable number that can be individually evaluated by qualified clinical laboratory staff.

  • Autosomal dominant, in which one would expect to find only a single causative variant within a gene. This model is more difficult because there are significantly more possible variants to evaluate; however, if there is a family history (even with reduced penetrance), one can subtract variants from the unaffected side of the family and look for matches to the presumed carrier parent (who may or may not be affected). Again, additional filters to remove high frequency variants can be applied and the resulting variants can be considered.

  • De novo, in which the causative variant arose within the proband. In this case, all variants inherited from both parents can be subtracted and only those variants that arose in the affected individual can be considered.

For all of the above methods, the process may also include evaluation of the resulting variant set in the context of the clinical phenotype or of a defined set of genes specified by the physician/medical geneticist. This could take the form of a filtering tool that narrows the list of variants to those in genes known to be associated with the phenotype, or simply be part of the context that the clinical laboratory staff uses during the evaluation process.
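As one concrete illustration, here is a minimal Python sketch of the autosomal recessive trio filter described above, assuming variant calls have already been loaded into per-individual dictionaries. The record fields ('gene', 'allele_freq') and the 5 % frequency cutoff are illustrative, not a standard format or a clinical threshold.

```python
def recessive_candidates(child_vars, mother_vars, father_vars,
                         max_allele_freq=0.05):
    """Return genes carrying >=2 rare child variants with at least one
    inherited from each parent. Each *_vars argument maps a variant key,
    e.g. (chrom, pos, ref, alt), to a record dict with 'gene' and
    'allele_freq' fields (illustrative field names)."""
    by_gene = {}
    for key, rec in child_vars.items():
        if rec["allele_freq"] > max_allele_freq:
            continue  # drop common variants; see caveats in the text above
        parents = {p for p, calls in (("mother", mother_vars),
                                      ("father", father_vars)) if key in calls}
        if parents:  # keep only variants observed in at least one parent
            by_gene.setdefault(rec["gene"], []).append((key, parents))
    return {gene: hits for gene, hits in by_gene.items()
            if len(hits) >= 2
            and {"mother", "father"} <= set().union(*(p for _, p in hits))}
```

A real pipeline would additionally keep homozygous variants inherited once from each parent, use phasing to confirm that paired variants sit on opposite haplotypes, and apply the ethnicity and penetrance caveats discussed above.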

These approaches, based on filtering by mode of inheritance using parental genomes, are currently the most popular way of performing WGS testing. However, they carry the added expense of sequencing multiple genomes in order to identify the potentially causative variant(s). Sometimes the parental samples are not available, or the additional cost may be prohibitive. In such cases, a clinical phenotype approach can be used on its own. This is not likely to be as effective as a biological approach, but it has been used successfully in several cases. This approach requires access to thorough clinical phenotype information, such as all presenting features and previous testing results (e.g., no increased creatine kinase). This information can then be used to search the phenotype-to-gene information that is available in various databases (e.g., Online Mendelian Inheritance in Man (OMIM) [24]), or accessible within phenotype software tools, to identify and rank genes that might be involved with the symptoms affecting the proband. All variants within that subset of genes can then be considered, with additional filters applied to remove variants that are too common to be plausibly involved with the disease.

Each of these approaches is labor intensive and requires a clinical laboratory staff trained in the evaluation of genetic disease, preferably formally trained and certified through the American Board of Medical Genetics (ABMG), the American Board of Pathology (ABP), or the American Board of Genetic Counseling (ABGC). If the first assessment is inconclusive, multiple different approaches might need to be performed. The clinical laboratory team performing the filtering and variant assessment should expect to spend several hours per genome evaluating the resulting variants, and this effort should be budgeted for when planning this type of testing. Laboratories that have implemented these approaches have reported diagnostic yields ranging from 30 to 40 % (personal communication), keeping in mind that these are often patients for whom all other testing has failed. The cost and time investment of the WGS test must be weighed against the potential costs and consequences to affected individuals of undiagnosed genetic disease.

Designing the Post-analytic Process for Oncology Applications

Another possible use for WGS is in the assessment of the molecular profile of tumors in patients who have already been diagnosed with cancer. This type of testing can be useful in identifying candidate therapeutic treatments when standard of care approaches have been exhausted. In these cases, a tumor sample and a normal DNA sample are procured from the patient. Variants found in the normal sample are subtracted from the tumor sample, so that only variants that have arisen somatically are identified. Somewhat uniquely among clinical tests, most laboratories that perform clinical oncological testing will have a tumor board associated with the laboratory that reviews the findings and contributes to the interpretation. Using the results from this type of testing, the clinical laboratory and the associated tumor board may be able to identify the most promising chemotherapeutic options based on the molecular profile. Of particular interest for oncological applications are the large chromosomal rearrangements, insertion or deletion events, and copy number variants that can be identified. Anecdotal reports of these approaches have been very encouraging [25, 26].

The analyses required for the detection of tumor variants are significantly more complex than those described above for Mendelian conditions. In the analytic phase, special consideration should be given to the sample type based on the type of cancer being tested. For example, blood samples from leukemic patients would likely be more representative of the tumor than of the normal signal, and the type of tissue most appropriate for the normal sample should be thoughtfully considered. Beyond that, the analytic process for the normal sample is essentially the same as for monogenic conditions (described above). Tumor samples, however, require additional processing and handling. To begin with, the DNA isolated from a tumor may come from fresh, fresh frozen, or, more commonly, formalin-fixed paraffin-embedded (FFPE) tissue. The different tissues may require significantly different extraction techniques and evaluations of DNA quality. Laboratories must evaluate their abilities to support each of these extraction techniques and the subsequent evaluations of appropriate DNA quality and quantity. The downstream informatics processing of tumor samples also has some unique requirements. Tumor samples are often contaminated with some number of normal cells; quantifying this fraction is difficult and imprecise, and it has implications for downstream informatics processing that must be incorporated into the process. Additionally, NGS methods sequence individual molecules separately; therefore, in a diploid situation a heterozygote would be expected to have approximately half of the sequences showing one allele and half the other. The algorithms developed for NGS typically have been optimized for this scenario, and general recommendations regarding the required number of independent sampling events are also usually made with this expectation. However, a tumor does not represent a diploid scenario. Therefore, one must establish the allele fraction at which one wishes to detect somatic variants; this might be 20 %, 5 %, 1 %, or less. Depending on what the laboratory decides, sequencing must be done to a depth that ensures likely detection of such variants. The depth required to attain the required sensitivity can be estimated using a sampling statistic:

$$ P\left( X\ge x\right)={\displaystyle \sum}_{k= x}^{N}\frac{N!}{k!\left( N- k\right)!}{p}^{k}{q}^{\left( N- k\right)} $$

where N is the total read depth at the position, x is the minimum number of variant-supporting reads required to make a call, p is the expected variant allele fraction, and q = 1 − p.
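A minimal sketch of applying this statistic, assuming SciPy is available; the 5 % allele fraction, five-read minimum, and 99 % detection target are illustrative choices, not recommended clinical parameters.

```python
from scipy.stats import binom  # assumes SciPy is available

def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least `min_alt_reads` variant-supporting reads at this depth),
    i.e., the upper tail of the binomial sampling statistic above."""
    return binom.sf(min_alt_reads - 1, depth, allele_fraction)

def required_depth(allele_fraction, min_alt_reads=5, target=0.99):
    """Smallest depth giving `target` probability of seeing the variant
    on at least `min_alt_reads` reads (parameters are illustrative)."""
    depth = min_alt_reads
    while detection_probability(depth, allele_fraction, min_alt_reads) < target:
        depth += 1
    return depth

# e.g., a 5 % somatic variant, requiring >=5 supporting reads 99 % of the time
print(required_depth(0.05))  # on the order of a few hundred reads
```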

Empirical validation of these detection thresholds follows the principles discussed in the validation section above. In addition to the different processing requirements, the bioinformatics algorithms used to detect variants may also need to be optimized, and additional or alternative algorithms may be needed. In some cases, different algorithms may be called for to detect different types of variants, for example copy number or structural variants (chromosomal rearrangements). Laboratories planning to launch tumor–normal WGS analyses should be prepared to evaluate these needs and plan appropriately for implementation. This can be an arduous process, and a team may be needed to identify the requirements and evaluate the appropriate set of tools.

Cancer is not the only disease type associated with the occurrence of somatic variants; certain genetic conditions (often associated with hemi-hypertrophy or skin lesions and an increased likelihood of developing cancer later in life) may also involve somatic variants and be of interest to a clinical molecular laboratory. Additionally, in testing for mitochondrial diseases, it may be critical to enable detection of mitochondrial heteroplasmy. All of these applications involve the challenges described above for tumor scenarios and may require the same or similar planning and evaluations before implementation.

Designing the Post-analytic Process for Screening for Fetal Aneuploidies

WGS can also be used for various forms of screening tests. Screening involves identifying genetic variants with potential clinical implications, typically before there is any clinical presentation; the findings would often be confirmed by additional testing before any medical action is taken. Currently, the most common screen involving WGS is for aneuploidy in prenatal settings. Commonly called noninvasive prenatal screening or testing (NIPS or NIPT), this involves deep sequencing of either targeted regions or the whole genome in an effort to identify chromosomal regions that are present at non-diploid copy numbers. These screens have become available only in the last few years, but their sensitivity and specificity are greatly improved over serum screening paradigms, and they are therefore being rapidly adopted, particularly for high-risk pregnancies. These tests are performed on a maternal blood sample, the DNA being tested is fetal DNA circulating in the maternal blood stream, and the test is thus considered noninvasive from the perspective of the fetus. Because this testing requires isolation and enrichment of the fetal DNA, specific planning should be given to the additional techniques that might be necessary for implementation, such as DNA isolation and quality evaluation, to ensure that the appropriate quality and quantity of DNA are present to perform testing. This test also requires quantification of genomic regions present at non-diploid copy numbers and the subsequent analyses.
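A simplified sketch of the counting analysis, assuming reads have already been assigned to chromosomes and that a panel of euploid reference samples is available; GC correction, fetal-fraction estimation, and other refinements used in production pipelines are omitted, and the data layout is illustrative.

```python
import statistics

def aneuploidy_z(sample_counts, reference_fractions, target="chr21"):
    """Z-score for the target chromosome's read fraction in a test sample,
    relative to the fraction observed in a panel of euploid references.
    `sample_counts`: dict of chromosome -> aligned read count (test sample).
    `reference_fractions`: list of the target chromosome's read fraction
    in each euploid reference sample."""
    frac = sample_counts[target] / sum(sample_counts.values())
    mu = statistics.mean(reference_fractions)
    sigma = statistics.stdev(reference_fractions)
    return (frac - mu) / sigma

# A z-score well above ~3 for chr21 would flag a potential trisomy 21,
# to be confirmed diagnostically before any medical action is taken.
```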

Designing the Post-analytic Process for Predisposition and Carrier Screening

Finally, WGS can also be used for more traditional screening for genetic variants for which individuals may be carriers or at risk. While this type of testing is currently more likely to be performed using targeted panels, it is possible to employ WGS for this purpose. The post-analytic process for this type of testing depends heavily on the test definition established in the pre-analytic phase. Typically, this definition will have identified a set of genes to be included in the test, with established clinical utility for a specified set of diseases. In the case of WGS, this can be a many-to-many relationship: many genes may provide information about predisposition or carrier status for one disease, and any one gene may have multiple diseases clinically associated with it. The test definition will also define the regions within those genes that are included in the test (e.g., exonic regions, parts of intronic regions directly adjacent to the exons). Therefore, the set of variants requiring interpretation from the analytical stage would be filtered to those included in the established test definition. Most laboratories performing this testing restrict reporting to those variants assessed as clinically significant; however, ancillary documentation of all assessed variants and their classifications is included in some cases.

Once the clinical implications of a particular individual's variants have been decided, the information must be put into the clinical context for which the test was ordered. Incidental findings can potentially be quite numerous, and a single answer might not be found: there could be three or four variants in two or three genes that plausibly lead to a patient's symptoms and have equally inconclusive or conclusive evidence supporting them. Indeed, in at least a few cases, patients have been found to be suffering from more than one genetic disease [10], explaining a perplexing clinical presentation. Reports must be flexible enough to convey the benefit of a personalized survey of the genome, but standardized enough to enable clear communication of results. A searchable electronic report might be the best solution; this could provide links to disease descriptions and additional evidence that practitioners could then access as needed. The goal is to provide a succinct answer to the major question of the moment, but also to enable both the physician and patient to benefit from the additional information that may be present and of concern. One challenge with whole-genome evaluation is that our understanding of genetics and biology is neither perfect nor complete; most variants that are detected will be of uncertain significance. This is also an area in which it would be of benefit for clinical laboratories to communicate more effectively with physicians and genetic counselors. Although it will require an upfront time commitment as well as tools that enable communication, it might well be worthwhile for laboratories to ensure that doctors and genetic counselors have access to the following information before they receive their reports:

  1. The standards that a laboratory uses in order to make calls

  2. How laboratories classify variants into the standard bins of Pathogenic, Likely Pathogenic, Variant of Uncertain Significance (VUS), Likely Benign, Benign, or other

  3. How much confidence a practitioner should have in a given call

  4. What the weaknesses of the test are, and any recommendations regarding additional testing that could supplement these weaknesses

Communication tools might be readily located on clinical laboratory Web sites, where quick, 5-minute podcast-style communications could provide both doctors and genetic counselors with information that significantly increases the power and confidence with which they use a test.

An ongoing challenge will be the large number of variants about which there is uncertainty. While a large number of VUS is a point of concern, this is not new to the field. The International Standards for Cytogenomic Arrays Consortium (http://www.iscaconsortium.org) [27] has demonstrated approaches to dealing with the large number of novel and uncertain variants that are detected when genomic evaluations are performed routinely. In less than a decade, the cytogenetics community has made huge strides in understanding the nature and degree of variation at the cytogenetic level. Similar approaches could be used in the field of sequencing to better understand the nature of human genetic variation, which will aid significantly in improving and refining interpretation in the future. Meanwhile, clinical laboratories can make every effort to communicate a priori that VUS are an anticipated outcome of these tests, and help prepare physicians and genetic counselors for managing the information.

Communication and Support

Once a test has been defined and the performance specifications and capabilities established, it is critical to develop support materials. The laboratory should also be staffed with trained genetic support specialists. These specialists should be available to help physicians decide whether WGS is the best test for the presenting situation, and also to help plan for alternative or supplemental testing that might be necessary. Communicating this is particularly important, but also particularly challenging, when the very name “whole-genome sequencing” might imply all things to everyone. It is helpful to provide information through a Web site that can help individuals evaluate what the test supports and what it does not.

Depending on the breadth of WGS services that a laboratory intends to offer, it may be helpful to develop an overview section that clarifies which tests offer what and which are likely to be most appropriate. Because information such as analytical validity, limits of detection, and reportable regions needs to be included in test definitions, and because these will vary depending on the application of WGS, it will likely be necessary to create multiple test definitions and descriptions. Including general educational materials will help physicians and patients navigate the options and choose most appropriately. Importantly, information should be readily available to help physicians understand the limits of detection, such as the ability to detect variants present in a tumor sample at, for example, 10 % but not 5 %, or the ability to detect deletions within certain size ranges. Laboratories should be prepared to monitor and track their capabilities to make calls of any type throughout the genome. As tests are ordered, laboratory staff should review the test requisitions and evaluate the laboratory's ability to support each request. If there are concerns about whether WGS is appropriate for the sample being ordered, the laboratory should contact the physician and discuss the options before testing is initiated.

Genetic counseling is a best practice recommendation for genetic tests in which the results may have direct medical implications for immediate family members or in which the results might be predictive. WGS produces information that meets those criteria, not only for the specific indication for testing, but also for secondary findings. The ACMG has issued a series of recommendations for clinical genomic testing, counseling, and consent [28]. The ACMG has stated that genome or exome sequencing is appropriate in a series of circumstances, including a strong reason to suspect a genetic etiology, symptoms associated with multiple genetic conditions for which simultaneous evaluation of multiple genes would be practical, inconclusive previous testing, and, in special cases, prenatal diagnosis. WGS is not advised at this time for prenatal or newborn screening. The recommendations specifically advise that the following elements be addressed in counseling and consent sessions: (1) pretest counseling including written documentation, (2) discussion of the potential for incidental findings, (3) discussion of the expected outcomes as well as the incidental findings to be returned to the physician, (4) the potential benefits, risks, and limitations of testing, and whether there are alternatives, (5) the distinction between clinical testing and research, (6) the potential for results to be identifiable in databases, and (7) policies for updating information. It is also recommended that such testing be performed on minors only in cases where the testing can lead to a diagnosis of conditions for which interventions might be possible, or under institutional review board (IRB)-approved research. Additionally, the ACMG has recommended that everyone who undergoes WGS, regardless of indication, should have results reported for a defined set of 56 genes. These genes are associated with highly penetrant conditions for which potentially life-saving interventions are available. Although these recommendations have been controversial, they are indicative of the medical community's rapid adoption of, and preparation to manage, this information in regular clinical practice.

After WGS analysis and interpretation has been performed, additional communication with the ordering physician is likely to be necessary. While inconclusive test results are not uncommon for physicians, findings may require additional communication, particularly with regard to the management or further testing of VUS.

Infrastructure Considerations

After identifying what the WGS test will be used for, the clinical laboratory should consider its current infrastructure and any additional needs that would require further build-out. Depending on what resources and infrastructure a laboratory has, an assessment of necessary components includes the following:

  • Facility

    • NGS sequencers are not usually very bulky, but they require space that is stable and climate controlled, with both power and Internet support. Specific requirements include an uninterruptible power supply (UPS) and emergency power setup, with heating, ventilation, and air conditioning (HVAC) providing temperature and humidity control at around 68–72 °F and 70 % relative humidity. Laboratories are required to maintain spatial separation between pre- and post-amplification activities, and ideally would have negative pressure control in rooms where contamination could occur, or use a pressure-controlled hood. Additional safety precautions may also be necessary depending on specific requirements.

  • Staff

    • NGS is considered high complexity testing and involves many steps, so a well-trained staff is critical. Typically, a staff supporting WGS will require people with expertise in high complexity molecular assays, genetic analyses, bioinformatics, and genetic counseling.

  • Workflow process

    • WGS may be among the easier of the NGS assays to perform, in that there are no capture or amplification steps (Fig. 17.1). Nonetheless, several manual steps are still required, and each of these can potentially introduce a contaminant or a sample swap. To avoid such complications, a good workflow process, and ideally a laboratory information management system (LIMS) to track and document a sample's progress through the assay steps, should be implemented. Assessment of the error-prone steps in the process is critical to designing a robust laboratory workflow, and consideration of appropriate controls, performance metrics, and tracking systems is prudent. In particular, positive sample controls are recommended because pre-analytical sample swapping is one of the most common errors introduced into clinical testing.

  • Computing and bioinformatics infrastructure

    • A high-performance storage and computing cluster (a set of connected computers that work together as a single system) is necessary to perform whole-genome sequence analyses at high volume. These analyses can be performed on a computing cluster consisting of many multi-core computers. An evaluation of these needs should be based on predicted volumes and the specific analytical requirements of the test(s) that will be supported. Additionally, a tracking system for recording quality metrics across and within each sequencing run, lane, and sample is extremely useful for catching runs that go poorly, avoiding wasted time and money on failed runs. These tracking systems can also enable users to identify when additional sequencing will be necessary. Finally, bioinformaticians who are skilled in these analyses are important members of the NGS clinical team.

    • A data management system for the storage of genomic information should be planned before implementing WGS in the clinical laboratory. Various guidelines suggest that sequencing results that could be used in the evaluation of hereditary conditions should be stored for multiple years [3, 4]. The recently released CAP NGS checklist requires that data be stored for a minimum of 2 years to enable reanalysis of NGS results; this is in addition to other requirements around storage of the actual clinical deliverables. What will be stored, and how it will be stored, requires thorough consideration.

    • Many software tools are available to support the multiple steps involved in WGS analysis. An evaluation of which tools should be used, based on the intended use of the test, should be performed. Once the right set of tools is identified, users may need to create a workflow using custom scripts that tie the tools together, keeping in mind that input and output formats and requirements may vary among them. The software tools used in analytical calling and in the downstream analysis and classification of variants are among the most variable aspects of clinical WGS being performed today. It is critical that laboratories understand the caveats and limitations associated with every software tool used in their data analysis pipeline.

  • Security

    • It is likely that WGS data will be considered impossible to fully anonymize. Privacy concerns around how these data are stored, when and how they are updated, who should have access, and what should go into the medical record are currently not well addressed by policies. However, laboratories are thinking about how this is likely to change and what safeguards and options they will be able to offer the doctors and patients who are interested in ordering WGS.

Ongoing Quality Assessment and Control

After validations have been performed and quality filters and metrics established, mechanisms must be developed to monitor ongoing performance during the testing of clinical samples. The process of genome sequencing can be divided into three stages: wet-lab processing, bioinformatic analysis, and interpretation and report generation. The wet-lab component encompasses DNA extraction, DNA shearing and size selection, ligation of oligonucleotide adaptors to create a size-selected library, and physical isolation of the library fragments during amplification and sequencing. Each step of the process should be considered for the implications of a failure or contamination event; accordingly, quality monitoring should be designed to detect the most likely or most significant possible failures. Specifically, DNA extraction, library preparation, cluster generation, and the sequencing run should be assayed for quality. There are many ways in which quality can be monitored, including establishing run metrics at various steps and performing quality assessment steps (such as quantitative PCR (qPCR) and DNA quantification and purity measures). Robotics and automation are valuable additions to a protocol to minimize the possibility of human error, and future advances that further combine the sequencing laboratory steps with automation will increasingly reduce potential errors. Controls can also be useful in the assessment of run quality. External controls, such as lambda DNA fragments, can be spiked into samples to measure the success of the run. Alternatively, orthogonal assays such as microarrays can be utilized to measure sequencing accuracy at a high level by comparing the concordance of calls from a genome-wide microarray with the sequencing calls.
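A minimal sketch of such a concordance check, assuming genotype calls from both platforms have been reduced to a shared, site-keyed representation; the data structures are illustrative rather than any standard interchange format.

```python
def array_concordance(seq_calls, array_calls):
    """Concordance between genotype calls from sequencing and an orthogonal
    microarray on the same sample. Both inputs map a site key, e.g.
    (chrom, pos), to an unordered genotype such as frozenset({'A', 'G'});
    only sites assayed by both platforms are compared."""
    shared = seq_calls.keys() & array_calls.keys()
    concordant = sum(seq_calls[s] == array_calls[s] for s in shared)
    return {"sites_compared": len(shared),
            "concordant": concordant,
            "concordance_rate": concordant / len(shared) if shared else None}
```

Discordant sites can then be stratified by context (e.g., GC content or repeat class) to focus troubleshooting on the genomic regions most likely to be responsible.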

Proficiency testing is one method used as part of ongoing quality assessment. Molecular pathology on-site inspections by the CAP occur every 2 years, but ongoing proficiency testing with both intra- and inter-laboratory analysis improves testing procedures and helps to prevent errors (reviewed in [4]). As several clinical laboratories currently offer genomic level sequencing, alternative proficiency testing programs are used to enable laboratories offering exome and genome sequencing to compare their calls. In a recent exchange between the Illumina Clinical Services Laboratory and the University of California, Los Angeles (UCLA) molecular pathology laboratory, comparing two samples that had been run and reported in both laboratories, both laboratories made calls at 3,573,631 sites, of which 19,340 represented variants from the reference. Across all the calls made, 16 positions were called discordantly between the two laboratories, a per-site discordance rate of roughly 4.5 × 10⁻⁶ (greater than 99.999 % concordance). Investigation of such discordantly called sites, along with the relative quality metrics from each run and the types of variants these sites represent (e.g., high GC regions or repeat regions), will help participating laboratories improve quality.

Conclusions

The implementation of clinical WGS is not trivial, and the suggestions made in this chapter highlight the need for well-trained teams that bring diverse expertise to the clinical laboratory. One challenge that is often raised is the lack of available experts; this is a legitimate concern, and for that reason community efforts to establish guidelines and to promote education and best practices are critically needed. Ongoing training and certification, active participation in societies and meetings, and regular review of recent guidelines and publications will be necessary, particularly during the early phases when the learning curve is steep and policies are likely to evolve. That said, this is also a great opportunity for clinical laboratorians to work closely with their medical practitioner colleagues, as well as with experts in diverse fields such as bioinformatics, population genetics, and information technology, to create a new approach to evaluating, diagnosing, and managing genetic disease using entire genomes of information.