Systematics for types and effects of DNA variations
Numerous different types of variations can occur in DNA and have diverse effects and consequences. The Variation Ontology (VariO) was developed for systematic descriptions of variations and their effects at DNA, RNA and protein levels.
VariO use and terms for DNA variations are described in depth. VariO provides systematic names for variation types and detailed descriptions for changes in DNA function, structure and properties. The principles of VariO are presented along with examples from published articles or databases, most often in relation to human diseases. VariO terms describe local DNA changes, chromosome number and structure variants, chromatin alterations, as well as genomic changes, whether of genetic or non-genetic origin.
DNA variation systematics facilitates unambiguous descriptions of variations and their effects and further reuse and integration of data from different sources by both humans and computers.
Keywords: DNA variations, Mutation, Variation, Systematics, Variation ontology, VariO, Annotation, Databases, Ontology
Abbreviations
BTK: Bruton tyrosine kinase
CNV: Copy number variation
DRD4: Dopamine receptor D4
ECO: Evidence & Conclusion Ontology
ENCODE: The Encyclopedia of DNA Elements
GNA11: G protein subunit alpha 11
H3K4: Histone H3 at lysine 4
HGNC: HUGO Gene Nomenclature Committee
HGVS: Human Genome Variation Society
IL1R2: Interleukin 1 receptor type 2
ISCN: International System for human Cytogenetic Nomenclature
LAMB1: Laminin subunit beta 1
LINE: Long interspersed element
LOVD: Leiden Open (source) Variation Database
LRP1B: LDL receptor related protein 1B
LSDB: Locus specific variation database
LTR: Long terminal repeat
LYST: Lysosomal trafficking regulator
MECP2: Methyl-CpG binding protein 2
MYC: MYC proto-oncogene, bHLH transcription factor
NDB: Nucleic Acid Database
NHE: Nuclease sensitive element
NSD1: Nuclear receptor binding SET domain protein 1
PDB: Protein Data Bank
POLG: DNA polymerase gamma, catalytic subunit
PTPN22: Protein tyrosine phosphatase, non-receptor type 22
SINE: Short interspersed nuclear element
TAD: Topologically associating domain
TERT: Telomerase reverse transcriptase
Variations at DNA are frequent and form the foundation of evolution. Some variants are related to diseases but many do not have any associated phenotype. The range of changes is very wide, from single nucleotide substitutions to changes in the number of entire chromosome sets. We can distinguish four categories, those in local DNA regions, such as genes; chromosomal variations; chromatin changes; and genome-wide alterations. To fully understand variants and their mechanisms and significance it is necessary to investigate them from different angles, e.g. to identify types of variants, but also to understand how they may affect structure, function, interactions, properties etc. For a systematic description of variations and their consequences, effects and mechanisms a framework called Variation Ontology (VariO) was developed . As an ontology VariO facilitates systematic and detailed descriptions of variants. VariO includes terms for all kinds of alterations in DNA, RNA and protein.
Experimental studies provide the most reliable interpretation for variants and their effects and consequences. However, the huge volume of variants, e.g. about 3 million substitutions in the genome of a human individual, does not allow extensive experimental studies. Therefore, different kinds of prediction methods have been developed. The number of such tools is much higher for protein variants (see e.g. ). Non-coding variants are more difficult to predict, largely due to the lack of examples with known outcome. DeepSEA is an example of a DNA predictor.
The Encyclopedia of DNA Elements (ENCODE) project has annotated functional elements at genomic regions, largely based on predictions . There are data for transcription, transcription factor association, chromatin structure and histone modifications. For transcription factor binding sites and expression regulation, several predictors are available, reviewed in , that take into account sequence motifs, chromatin features and others. There are also methods to predict effects of cis regulatory elements and variants  including enhancers .
Dedicated methods are available for insertions and deletions whether affecting the reading frame or not [8, 9, 10]. When considering using these tools, one should bear in mind that most of them have not been systematically benchmarked as has been done for e.g. amino acid substitutions [11, 12]. Systematic method assessments are available for nucleosome position prediction methods [13, 14] as well as for predictors of topologically associating domains (TADs) .
Here, DNA variations, their types, functions, structural effects and properties are described in the systematic framework of VariO, similar to a previous article for protein variations. As far as the author knows, this is the first systematic treatise of DNA variations, and it is applicable to all organisms and to all kinds of variations and mechanisms. Variations at DNA level are important as such but also because they constitute the basis for inherited variations at RNA and protein levels. Examples are presented to highlight the different features of variants, usually in the context of human diseases.
Databases for DNA variations
Examples of DNA variation databases
General variation databases
Ensembl Variation Database
Database of Short Genetic Variations (dbSNP)
Exome and complete genome sequences
NHLBI Exome Sequencing Project (ESP) Exome Variant Server (EVS)
The 1000 Genomes Project
European Nucleotide Archive (ENA)
Locus specific variation databases
Leiden Open Variation Databases (LOVD)
Universal Mutation Database (UMD)
ImmunoDeficiency Variation Databases (IDbases)
The TP53 web site
Allele frequency databases
The ALlele FREquency Database (ALFRED)
Allele Frequency Net Database (AFND)
Allele Frequency Community (AFC)
Cancer variation databases
Catalogue of Somatic Mutations in Cancer (COSMIC)
The Cancer Genome Atlas (TCGA)
International Cancer Genome Consortium (ICGC)
Pakistan Genetic Mutation Database
The Singapore Human Mutation And Polymorphism Database
Databases of genomic structural variations
Database of Genomic Variants (DGV)
Database of Genomic Variants archive (DGVa)
Mitelman Database of Chromosome Aberrations in Cancer
Human Polymorphic Inversion Database (InvFEST)
The European database of L1-HS retrotransposon insertions in humans (euL1db)
L1base, LINE-1 insertions
Short Tandem Repeat DNA Internet DataBase (STRBase)
Methylation Bank (MethBank)
miRNA target databases
Polymorphism in microRNAs and their TargetSites (PolymiRTS)
Somatic mutations altering microRNA-ceRNA interactions (SomamiR DB)
DNA loop database
Databases have been established for many diseases; those for cancer contain large amounts of data. Structural variants form a special group of alterations, and there are specific data collections for them. Several resources share information on short repeat sequences and on methylation. Dedicated databases list microRNA and target variants, as well as DNA loops.
For efficient use, reuse, search and integration of variation information it is essential to describe it in a systematic way. VariO (http://variationontology.org/) was developed for the systematic description of variation types, effects, consequences and mechanisms. The ontology is used to annotate information in databases at the three molecular levels: DNA, RNA and protein. Each of these levels contains further terms for variation type, function, structure and various properties. Here, DNA variation types and effects will be discussed. VariO annotations are always made in relation to a reference state, e.g. a reference sequence or a wild type property. A new version of VariO has been released with new terms, especially for DNA. The basic structure of VariO has remained the same since the first release; however, terms have been added, reorganized, clarified and redefined when the need has arisen, as in the latest releases for some areas of the DNA and RNA terms. New terms, clarifications and updates can be suggested via the web site.
Systematic annotations consist of two parts: the VariO prefix and a number followed by the term. As an example, VariO:0132 is for “chromosomal variation”. The number with the prefix is mandatory for annotation; the term name can be derived from that information. This article is organized according to VariO: DNA variations are divided into the four major sublevels - DNA variation type, function, structure and properties. Subheadings are VariO terms; in the text, terms are written in quotation marks. Detailed guidelines for the use and annotation have been published. Consistent database annotations can be made with the VariOtator annotation tool. VariO annotations are already used in a number of databases including some of those in the LOVD (Leiden Open (source) Variation Database) LSDB system, such as BTKbase and SH2base, as well as in UniProt and VariBench. VariO is available in several ways including the website, AmiVariO, Ontology Lookup Service (https://www.ebi.ac.uk/ols/ontologies/vario), OBO Foundry (http://www.obofoundry.org/ontology/vario.html), NCBO BioPortal (https://bioportal.bioontology.org/ontologies/VARIO), Ontobee (http://www.ontobee.org/ontology/VariO), AgroPortal (http://agroportal.lirmm.fr/ontologies/VARIO), FAIRsharing (https://fairsharing.org/bsg-s000776/) and others.
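The identifier format described above (prefix plus number, with the term name derivable from it) can be illustrated with a minimal Python sketch. The four-digit pattern is an assumption based on the identifiers quoted in this article, and the small lookup table and the function name parse_annotation are hypothetical; a real tool would load the complete ontology instead.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: the "VariO:" prefix followed by four digits,
# as in the identifiers quoted in this article.
VARIO_ID = re.compile(r"^VariO:\d{4}$")

# Illustrative subset of terms taken from the text; not the full ontology.
TERM_NAMES = {
    "VariO:0132": "chromosomal variation",
    "VariO:0135": "DNA chain variation",
    "VariO:0136": "DNA substitution",
    "VariO:0141": "DNA deletion",
}


class Annotation(NamedTuple):
    term_id: str
    name: Optional[str]  # None when the identifier is not in the subset


def parse_annotation(text: str) -> Annotation:
    """Validate an identifier and look up its term name if known."""
    term_id = text.strip()
    if VARIO_ID.match(term_id) is None:
        raise ValueError(f"not a VariO identifier: {text!r}")
    return Annotation(term_id, TERM_NAMES.get(term_id))
```

Only the identifier is needed as input; the name is looked up, which mirrors the rule that the number with the prefix is the mandatory part of an annotation.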
VariO is used to describe the outcome of the mutation, i.e. the changed nucleotides etc., not the mechanism that led to the alteration. The latter we cannot explain just by looking at the variant. Note that “mutation” (VariO:0139) in VariO means “any process generating variation”, not the outcome of these processes.
VariO annotations can be enriched with additional systematics, as described in the original article . To provide details on the methods based on which the annotations are made, Evidence & Conclusion Ontology (ECO) terms  can be used to indicate whether and which laboratory experiments, computational methods, literature curation, or other means have been applied.
DNA variation type (VariO:0129)
There are four levels for the descriptions: DNA chain, chromosomal, genomic and chromatin levels, depending on the type and size of the variation. With VariOtator, the variation type annotations at DNA, RNA and protein level can be made automatically, including for Leiden Open Variation databases (LOVD), from the HGVS names. In the following examples, HUGO Gene Nomenclature Committee (HGNC) names are indicated for genes. The HGVS prefixes for DNA (c. for coding DNA, g. for genomic sequence, m. for mitochondrial) are used in the text. In some instances protein variants are discussed; they are indicated with the prefix p.
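As a small illustration of the prefixes listed above, the following Python sketch maps the prefix of an HGVS-style name to the kind of sequence it refers to. The function name hgvs_level is hypothetical, only the four prefixes mentioned in this article are covered, and an optional reference sequence before a colon is simply skipped; this is not an HGVS parser.

```python
# Prefixes as used in this article; other HGVS prefixes are not handled.
HGVS_PREFIXES = {
    "c": "coding DNA",
    "g": "genomic sequence",
    "m": "mitochondrial",
    "p": "protein",
}


def hgvs_level(name: str) -> str:
    """Return the sequence type indicated by the prefix of an HGVS-style name."""
    # Drop an optional reference part such as "REF:" before the description.
    description = name.strip().rsplit(":", 1)[-1]
    prefix, dot, _rest = description.partition(".")
    if not dot or prefix not in HGVS_PREFIXES:
        raise ValueError(f"unrecognised prefix in {name!r}")
    return HGVS_PREFIXES[prefix]
```

For example, hgvs_level("g.101374535del") reports a genomic sequence and hgvs_level("p.R562P") a protein.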
DNA variation classification (VariO:0322)
Histone variants or alterations in remodeler and modifier enzymes or their expression affect “chromatin variation” (VariO:0153). These alterations are frequent in cancers. “Chromosomal variation” (VariO:0132) is either “variation of chromosome number” (VariO:0133) or “variation of chromosome structure” (VariO:0134). Down syndrome with trisomy of chromosome 21 is an example of “variation of chromosome number”, while Rett syndrome due to an inversion in the X chromosome is a “variation of chromosome structure”.
There are five categories of “DNA chain variation” (VariO:0135) types, some of them with subcategories. “DNA deletion” (VariO:0141) of G from the region for intron 3 (g.101374535del) in the BTK gene coding for Bruton tyrosine kinase causes a splice defect and leads to X-linked agammaglobulinemia (XLA). “DNA indel” (VariO:0143) is a variant that is due to both insertion and deletion. Alteration from C to TG in the BTK gene region coding for exon 17 (c.1684_1685delinsT), which causes XLA due to RNA frameshift and truncated protein, is an example of a DNA indel. The original base C is deleted and TG inserted instead. “DNA insertion” (VariO:0142) introduces one or more new bases to the DNA, such as insertion of T to the BTK gene region for exon 3 (g.101374623insT) introducing a new stop codon. “DNA substitution” (VariO:0136) is the most common single nucleotide variation type and DNA variation in general. G to C substitution in the BTK gene region coding for the TH domain (g.101362620C>G) causes amino acid substitution in the Zn finger leading to XLA. DNA substitutions are either transitions or transversions. “Transition” (VariO:0313) changes a purine base (A, G) to another purine or a pyrimidine (C, T) to another pyrimidine. “Transversion” (VariO:0316) is a substitution from a purine to a pyrimidine or vice versa. The G to C substitution is a transversion. Transitions can be classified further to “purine transition” (VariO:0315) and “pyrimidine transition” (VariO:0314).
When a sequence stretch is moved to a new location within a chromosome it is called “DNA translocation” (VariO:0144). “DNA inversion” (VariO:0145) is a special type of translocation where the sequence is inverted in its original place. Microinversions are rare, such as a 95 nucleotide inversion at 22q11.21 (Database of Genomic Variants nsv1129408).
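The transition and transversion rules given above are easy to express in code. The following Python sketch is an illustration only; the function name classify_substitution is hypothetical, while the identifiers and term names are those quoted in the text.

```python
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> Tuple[str, str]:
    """Classify a single-base DNA substitution as described in the text."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and variant base are identical")
    if ref in PURINES and alt in PURINES:
        return ("VariO:0315", "purine transition")
    if ref in PYRIMIDINES and alt in PYRIMIDINES:
        return ("VariO:0314", "pyrimidine transition")
    # One purine and one pyrimidine, in either direction.
    return ("VariO:0316", "transversion")
```

With this sketch, the G to C substitution discussed above is classified as a transversion, and an A to G change as a purine transition.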
Genomic variations affect the entire genome. Autopolyploidy, which means duplication of chromosome sets originating from the same organism, is an example of “genomic variation” (VariO:0131) and common in human liver .
DNA variation origin (VariO:0127)
There are two types of “DNA variation origin” (VariO:0127), namely “DNA variation of genetic origin” (VariO:0130) and “DNA variation of non-genetic origin” (VariO:0146). Variants of genetic origin have appeared on DNA (or RNA) level and therefore directly affect the protein, when in a coding region.
Insertion in the non-coding region of exon 2 in BTK is a “de novo variation” (VariO:0444) and has occurred in that individual, while G to C substitution (c.1685G>C) for codon 562 causing p.R562P substitution in protein is a “germinal variation” (VariO:0445) that has occurred in the germ cell of the mother. Melanoma-related A to T transversion in the GNA11 (G protein subunit alpha 11) gene leading to a G209L substitution is a “somatic variation” (VariO:0446).
Several variation types are of non-genetic origin. Replacement of A by C in BTK leading to the amino acid substitution p.Y334S was made in a construct and is thus an “artificial DNA variation” (VariO:0172). Novel genome editing technologies allow generation of specific DNA alterations, e.g. to correct genetic defects as in β-thalassemia, leading to “edited DNA” (VariO:0407). This example is an artificial variation, but genomic editing appears naturally in some organisms. Changes in DNA methylation pattern are a form of “epigenetic DNA variation” (VariO:0147) and are associated with systemic lupus erythematosus due to changes in transcription activation. DNA lesion, such as incorporation of 8-hydroxyguanine to DNA, causes a form of “modified DNA” (VariO:0337).
Variation affecting DNA function (VariO:0148)
DNA molecules have several functions. Some DNA molecules have catalytic deoxyribozyme activities. Self-catalyzed sequence-specific DNA depurination is the only known DNA catalytic activity . Variations to the required cruciform structure could have an “effect on catalytic DNA activity” (VariO:0412).
Deletion of G from the region for intron 3 in BTK gene causes splice defect and XLA  due to “effect on DNA information transfer” (VariO:0150). The type of DNA variation affects DNA repair mechanisms. T/G or U/G mismatches are corrected by base excision repair, but lead also to increased frequency of variations i.e. have an “effect on DNA repair” (VariO:0151) as reviewed in . Variation A to C in the TATA box of the HBB gene for hemoglobin subunit beta leads to β-thalassemia  because of “effect on regulatory function of DNA” (VariO:0152). DNA replication fidelity can be affected by numerous factors including DNA variations such as DNA adducts caused by reactions with e.g. environmental mutagens, and sequence context [79, 80], thus having an “effect on DNA replication” (VariO:0154).
Variations at two major TERT (telomerase reverse transcriptase) gene promoter sites are frequent in melanoma patients and generate binding sites for Ets/TCF transcription factors . These variants are classified to have “effect on transcription” (VariO:0149).
Variation affecting DNA property (VariO:0227)
DNA properties affected by variations are described by terms in this category. Insertion of T to the BTK gene region coding for exon 3 introduces a new stop codon and has “association of DNA variation to pathogenicity” (VariO:0229). Variation c.82C>T in BTK causing p.R28C affects “conservation of DNA variation site” (VariO:0231) by altering a highly conserved position. Variations at the TERT gene promoter in melanoma patients generate binding sites for Ets/TCF transcription factors and have “effect on DNA interaction” (VariO:0230).
Variation affecting DNA structure (VariO:0155)
Affected DNA level (VariO:0159)
DNA level terms are used to indicate what kind of DNA molecule and region is affected by the variation. A Rett syndrome-causing inversion in the X chromosome  has “chromosome affected” (VariO:0164). “DNA chain affected” (VariO:0160) has three subcategories. TERT gene promoter variants in melanoma patients that generate binding sites for Ets/TCF transcription factors  are “variation at intergenic DNA” (VariO:0163). G to C substitution in the BTK gene leads to amino acid substitution at zinc finger motif causing XLA  and is a “variation in exon” (VariO:0162). Deletion of G from the region for intron 3 in the BTK gene causes splice defect and XLA  and is a “variation in intron” (VariO:0161) .
G to A substitution coding for codon 467 (p.A467T) in the mitochondrial POLG (DNA polymerase gamma, catalytic subunit) gene causing progressive external ophthalmoplegia and other diseases has “extrachromosomal DNA affected” (VariO:0072) of type “organellar DNA affected” (VariO:0448) and even more specifically “mitochondrial DNA affected” (VariO:0450). Mitochondria are essential organelles for energy production in eukaryotes, whereas plastids, the other compartments with their own DNA, are unique to plants and algae and thus appear only in some eukaryotes. Substitutions in the plastid infA (IF1 homolog) gene in spring barley lead to cytoplasmic line 2 (CL2) syndrome and have “plastid DNA affected” (VariO:0451).
There are two additional forms of “extrachromosomal DNA affected” (VariO:0072). Variants in an H group plasmid make its maintenance temperature sensitive in Escherichia coli. In this case the variant has “plasmid affected” (VariO:0391). Plasmids are independently replicating circular DNA units that are common in bacteria but can appear also in other organisms. Plasmids can be transferred between cells, even between organisms. Many plasmids contain toxin or antibiotic resistance genes. “Extrachromosomal circular DNA” (VariO:0449) is common in many organisms and is widely variable in size and contents, as it originates from material in linear chromosomes.
Trisomy of chromosome 21  has “genome affected” (VariO:0391).
Chromatin structure variation (VariO:0226)
GAA triplet expansions in the FXN (frataxin) gene are the most common cause of Friedreich ataxia, a form of progressive damage of the nervous system. The triplet expansion alters nucleosome positioning so that transcriptional activity is reduced because the start site is not accessible. It is thus a “chromatin structure variation” (VariO:0226) due to its effect on “nucleosome positioning” (VariO:0158).
Topologically associating domains (TADs) are higher-order chromatin structures where genomic regions interact with each other. These regions are thought to be involved e.g. in regulation. “Variation in topology associating domain” (VariO:0454) appears in diseases including various forms of cancer where boundaries of TADs are altered.
Chromosome variation (VariO:0176)
“Chromosome variation” (VariO:0176) is divided into two categories: “chromosome number variation” (VariO:0206) and “chromosome structure variation” (VariO:0180).
Chromosome number variation (VariO:0206)
Chromosome structure variation (VariO:0180)
The numerous types of variations in this category are depicted in Fig. 4.
Chromosomal amplification (VariO:0183)
Numerous variation types and mechanisms affect the number of chromosomal region copies. “Copy number variation” (CNV) (VariO:0187) ranges in size from 1 kb up to several megabases and can be either amplification or deletion (Fig. 4b). CNV duplication of LAMB1 (laminin B1) gene causes autosomal dominant leukodystrophy .
DNA mobile genetic element insertion (VariO:0192)
“DNA mobile genetic element insertion” (VariO:0192) and its subcategories are used to describe insertions of various mobile genetic elements. The transposition of a “DNA transposon” (VariO:0378) is catalysed by transposase enzymes with a cut-and-paste mechanism .
“Insertion sequence” (IS) (VariO:0392) is a short transposable element that contains only genes for transposition activity. Thereby, IS differs from other transposons that can contain or can be loaded with additional genetic material. Insertion sequence 2404 specific for Mycobacterium ulcerans originating from a crayfish can cause Buruli ulcer, a severe infectious skin disease in humans.
“Retrotransposon insertion” (VariO:0377) means a transposon insertion via an RNA intermediate which is reverse transcribed to DNA. There are three types of retrotransposons: LINE, LTR and SINE. “LINE” (VariO:0379), long interspersed element, copies constitute in total about 17% of the human genome. Insertion of LINE elements about 6000 bp long into or close to human genes leads to a number of diseases including familial hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism. “SINE” (VariO:0380), short interspersed nuclear element, is 100–700 nucleotides long and requires LINE for replication. The Alu element is the most common form of SINE and is involved in numerous human diseases. “LTR” (VariO:0388) (long terminal repeat) transposons form the third category of retrotransposons. They are between 100 and 5000 bp in size. Similar to SINEs, LTRs require LINE for transposition.
“Nucleotide expansion” (VariO:0430) is a large group of variations where repeated nucleotide sequences are inserted to DNA. “Microsatellite” (VariO:0188) means repetitive sequences formed by units of one to six nucleotides. CAG expansion in the HTT gene for huntingtin is an example of “trinucleotide expansion” (VariO:0189) . This microsatellite expansion introduces polyglutamine tract of variable length to the amino terminus of the encoded protein. There are terms from “mononucleotide expansion” (VariO:0190) to “heptanucleotide expansion” (VariO:0452) to describe these types of variants.
“Minisatellite” (VariO:0186) is a somewhat longer repeated sequence unit, in length from 10 to 60 bp, repeated up to 50 times. 48 bp minisatellite in dopamine receptor D4 gene, DRD4, is associated with Tourette syndrome, a neuropsychiatric disease .
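The size ranges given above for microsatellites (units of one to six nucleotides) and minisatellites (units of 10 to 60 bp) can be sketched in Python as follows. Both function names are hypothetical, unit lengths outside the two ranges stated in the text are reported as unclassified rather than guessed, and the repeat counter uses a simple scan that is adequate only for short illustrative sequences.

```python
def repeat_class(unit_length: int) -> str:
    """Assign a repeat unit length to the ranges stated in the text."""
    if unit_length < 1:
        raise ValueError("unit length must be positive")
    if unit_length <= 6:
        return "microsatellite"
    if 10 <= unit_length <= 60:
        return "minisatellite"
    return "unclassified"  # the text gives no range for these lengths


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive copies of `unit` found in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = sequence.upper(), unit.upper()
    n = len(unit)
    best = 0
    for start in range(len(seq) - n + 1):
        copies = 0
        pos = start
        # Count back-to-back copies beginning at this start position.
        while seq[pos:pos + n] == unit:
            copies += 1
            pos += n
        best = max(best, copies)
    return best
```

A CAG unit has length three and therefore falls in the microsatellite range, whereas the 48 bp DRD4 repeat unit falls in the minisatellite range.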
“Type of chromosomal amplification” (VariO:0427) indicates whether the amplification is interspersed (Fig. 4c) or tandem repeat (Fig. 4b). Insertion of an Alu element, a SINE transposon, is an example of “interspersed repeat” (VariO:0184), where the repeat units are separated from each other. The CAG trinucleotide repeat in Huntington’s disease is a form of “tandem repeat” (VariO:0185).
Chromosomal deletion (VariO:0193)
Variants with “chromosomal deletion” (VariO:0193) are highly variable in size. “Copy number variation” (VariO:0187) can in addition to increasing copies of a DNA stretch also mean deletion. Williams-Beuren syndrome-causing deletions at 7q11.23 appear in the middle of the chromosome 7  and are thus of “interstitial deletion” (VariO:0194) type (Fig. 4d). Deletions at chromosome 11 leading to Jacobsen syndrome are 5 to 20 Mb long and typically include the chromosome end  and are thus “terminal deletion” (VariO:0195) (Fig. 4e).
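The distinction between interstitial and terminal deletions can be written as a simple coordinate check. In this Python sketch the function name classify_deletion, the 1-based inclusive coordinates and the example chromosome length are all assumptions made for illustration; the identifiers are those quoted in the text.

```python
from typing import Tuple


def classify_deletion(start: int, end: int, chrom_length: int) -> Tuple[str, str]:
    """Classify a deletion given 1-based inclusive coordinates (assumed convention)."""
    if not 1 <= start <= end <= chrom_length:
        raise ValueError("coordinates must satisfy 1 <= start <= end <= length")
    # A deletion that reaches either chromosome end is terminal.
    if start == 1 or end == chrom_length:
        return ("VariO:0195", "terminal deletion")
    return ("VariO:0194", "interstitial deletion")
```

For a hypothetical chromosome of length 1000, classify_deletion(400, 600, 1000) gives an interstitial deletion and classify_deletion(900, 1000, 1000) a terminal one.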
Chromosomal translocation (VariO:0197)
“Chromosomal translocation” (VariO:0197) rearranges genomic regions by moving them within and between chromosomes. There are several types of these changes as depicted in Fig. 4. When translocation occurs between coding regions, gene fusions occur, like the BCR-ABL1 fusion in the Philadelphia chromosome formed between chromosomes 9 and 22, which is a hallmark of chronic myelogenous leukemia. “Interchromosomal translocation” (VariO:0202) occurs between different chromosomes, e.g. t(11;14)(q13;q32) in mantle cell lymphoma patients. In “dicentric translocation” (VariO:0405) both the joined segments contain a centromere (Fig. 4f). The acentric segments are lost. This kind of variation leads e.g. to Kabuki syndrome. “Reciprocal chromosomal translocation” (VariO:0203) happens between two chromosomes, such as t(11;14)(q13;q32) in mantle cell lymphoma patients (Fig. 4g). “Robertsonian translocation” (VariO:0204) is a special type of translocation where the long arms of chromosomes are fused (Fig. 4h). This occurs between so-called acrocentric chromosomes, which have very short p arms. In humans, chromosomes 13, 14, 15, 21, 22 and Y are acrocentric. The infertile population has a tenfold higher prevalence of Robertsonian translocations than the general population (1% vs 0.1%). Translocation rob(14;15)(q10;q10) is one such variation among females with recurrent abortions.
“Intrachromosomal translocation” (VariO:0198) occurs within one chromosome. “Chromosomal inversion” (VariO:0199) is a special type of translocation where the segment is joined inverted end to end back to the same chromosome (Fig. 4i). “Paracentric inversion” (VariO:0200) occurs within a single chromosome arm, such as in the X-chromosome in Rett syndrome patient where the epigenetic changes lead to overexpression of MECP2 (methyl-CpG binding protein 2) gene  (Fig. 4i). “Pericentric inversion” (VariO:0201) includes the centromere, as an example leading to disruption of the NSD1 (nuclear receptor binding SET domain protein 1) gene in Sotos syndrome  (Fig. 4j).
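The definitions above reduce to two comparisons: whether the chromosomes involved are the same, and whether both inversion breakpoints lie in the same arm. The Python sketch below illustrates this; the function names are hypothetical, and breakpoints are passed as plain values rather than parsed from cytogenetic notation.

```python
from typing import Tuple


def classify_translocation(chrom_a: str, chrom_b: str) -> Tuple[str, str]:
    """Translocation within one chromosome or between two chromosomes."""
    if chrom_a == chrom_b:
        return ("VariO:0198", "intrachromosomal translocation")
    return ("VariO:0202", "interchromosomal translocation")


def classify_inversion(arm_start: str, arm_end: str) -> Tuple[str, str]:
    """Inversion type from the arms ('p' or 'q') of its two breakpoints."""
    arms = {"p", "q"}
    if arm_start not in arms or arm_end not in arms:
        raise ValueError("arms must be 'p' or 'q'")
    if arm_start == arm_end:
        # Both breakpoints in one arm: the centromere is not included.
        return ("VariO:0200", "paracentric inversion")
    # Breakpoints in different arms: the inverted segment spans the centromere.
    return ("VariO:0201", "pericentric inversion")
```

For the t(11;14)(q13;q32) example, classify_translocation("11", "14") returns the interchromosomal term.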
Complex chromosomal variation (VariO:0196)
Immunological recognition molecule diversification (VariO:0447)
To achieve the huge variability of immunological recognition molecules (antibodies, B and T-cell receptors, and major histocompatibility complex type I and II), special mechanisms have evolved. The human body can generate up to 10 billion different antibodies; thus, effective diversity-generating mechanisms are required, as there are only about 22,000 genes in humans.
“Immunological receptor gene rearrangement” (VariO:0166) is the major somatic recombination step where fragments for immunological receptor genes are joined to form a gene  (Fig. 4l). During “immunological receptor gene conversion” (VariO:0170) secondary diversification happens by replacing homologous DNA segments  (Fig. 4m). During “somatic hypermutation” (VariO:0168) variations are introduced to the antigen variable region  (Fig. 4n).
“Class switch recombination” (VariO:0169) is the final diversification step for antibodies where immunoglobulin M is switched to other isotypes by changing a portion of the heavy chain coding region (see ) (Fig. 4o).
Isochromosome has one arm duplicated and the other one completely lacking (Fig. 4p). An example is the tetrasomy 18p syndrome where the isochromosome appears in addition to the normal chromosome pair .
Ring chromosome (VariO:0182)
Telomere length change (VariO:0177)
Telomeres are repetitive structures at the chromosome ends which are required for chromosome replication. During this process they are shortened because the RNA primer of the last Okazaki fragment on the lagging strand cannot be replaced with DNA, which prevents complete replication of the chromosome end. “Telomere extension” (VariO:0179) means variation that extends the telomere. In “telomere shortening” (VariO:0178) the telomere structure is shortened, a phenomenon that is related to many diseases (see ) (Fig. 4r).
DNA sugar variation (VariO:0434)
DNA stands for deoxyribonucleic acid. It is composed of nucleobases, deoxyribose sugars, and phosphate groups. Most DNA variations affect the bases; however, “DNA sugar variation” (VariO:0434) also exists, e.g. due to carcinogens, and has special properties that could be beneficial for biotechnological and research applications.
Effect on DNA tertiary structure (VariO:0171)
DNA tertiary structure means the three-dimensional shape of the DNA. Primary structure indicates the nucleotide sequence, secondary structure the base pairing of the molecule, and quaternary structure describes intermolecular interactions or interactions with other molecules. These structural levels are analogous to protein structural levels. Experimentally determined DNA structural forms are available at Protein Data Bank (PDB) and Nucleic Acid Database (NDB). The structures were visualized with Jmol: an open-source Java viewer for chemical structures in 3D (http://jmol.sourceforge.net/).
Effect on DNA form (VariO:0167)
“Effect on A-motif” (VariO:0455) is an example of “effect on DNA form” (VariO:0167), more specifically of “effect on single stranded DNA structure” (VariO:0455). A-motif has a single-stranded helical structure at alkaline and neutral pH, while at acidic pH it forms a right-handed helical duplex. The structure requires an A-rich DNA or RNA sequence and is important e.g. for the mRNA molecules that contain long poly-A tails.
Effect on DNA double helix (VariO:0390)
Effect on DNA triple helix (VariO:0175)
“Effect on DNA triple helix” (VariO:0175) means alteration to triple helical nucleotide chain structure. “Effect on D loop” (VariO:0433) is a form of “effect on intermolecular DNA triple helix” (VariO:0423). In this structure the strands in double-stranded DNA are separated and one of them pairs with a third strand which can be DNA or RNA (Fig. 5e) . D loops are essential for the replication of mitochondrial DNA, which is circular. Variants at the D loop are common in cancers  and in some other diseases.
“Effect on intramolecular DNA triple helix” (VariO:0422) is the other type. The triple helix in H DNA requires mirror repeat symmetry. Supercoiling provides energy for opening of double-stranded DNA, then one of the chains swivels its backbone parallel to the remaining duplex DNA to form a triple helical structure. These structures are abundant in genomes and appear e.g. in regions that regulate expression of many genes involved in diseases. Variation can affect these structures and have “effect on H DNA” (VariO:0419).
Effect on four-stranded DNA (VariO:0420)
“Effect on four-stranded DNA” (VariO:0420) means change to DNA structures where four chains are involved. DNA cruciform is formed on inverted repeat sequences when they form a cross-shaped structure with intrastrand base pairing. There are two conformations: in the extended conformation the arms are at the tips of a tetrahedron, whereas in the closed conformation the arms are almost parallel. Cruciforms are involved in numerous interactions at DNA usage processes including gene expression regulation, replication and recombination. Variations can have “effect on DNA cruciform” (VariO:0394). Cruciform structures are prone to translocations and DNA instability.
i-Motifs appear in C-rich sequences. Two parallel C-rich strands that form a duplex are intercalated in antiparallel orientation, see Fig. 5f . The structures are uni-, bi-, or tetramolecular. Variations at these C-rich segments can have an “effect on i-motif” (VariO:0174). The MYC (MYC proto-oncogene, bHLH transcription factor) gene has in its promoter region seven nuclease sensitive element (NHE) III1 regions. Its expression is mainly (up to 90%) regulated by NHE III1 which can form an i-motif structure .
“Effect on nucleic acid G-quadruplex” (VariO:0173) describes changes where a G-quadruplex structure is involved (Fig. 5g). These structures can be unimolecular, bimolecular or tetramolecular, and the chains in the first two can be either parallel or antiparallel, and formed by DNA, RNA or DNA-RNA hybrids. Certain diseases are associated with these structures, including neurological diseases such as fragile X syndrome.
Effect on DNA-RNA hybrid (VariO:0424)
DNA and RNA chains can bind complementarily and form hybrids. The D loop is one such structure.
An R loop consists of a DNA:RNA hybrid and a displaced single-stranded DNA. The RNA strand is produced by transcription. These loops are rather rare and unstable, being targets for nuclease cleavage. They are implicated in human diseases, such as trinucleotide repeat-associated diseases. Changes to these hybrids can have an “effect on R loop” (VariO:0431) (Fig. 5h). R-loop DB includes both predicted and detected R loops in 8 organisms, including human.
T loops appear at telomeres, where the single-stranded chromosome terminus forms a loop that prevents the DNA repair system from recognizing it. The T loop is part of a large complex in which several proteins are involved; in human this is the shelterin complex of six proteins. Variations to these structures cause an “effect on T loop” (VariO:0432).
Epigenetic DNA modification (VariO:0156)
Epigenetic changes are heritable traits that do not change the DNA sequence. There are three major types of “epigenetic DNA modification” (VariO:0156): DNA methylation, histone modification and nucleosome positioning.
“Epigenetic DNA methylation” (VariO:0157) occurs almost exclusively on cytosines at CpG dinucleotides in C + G rich regions called CpG islands. Methylation in these islands is often associated with gene silencing, including genomic imprinting, which causes monoallelic gene expression. DNA methylation is significantly affected in systemic lupus erythematosus, including at numerous cytokine genes. An example of “epigenetic DNA methylation” (VariO:0157) is decreased methylation of the interleukin 1 receptor type 2 gene, IL1R2, which is a suppressor of IL1 signalling. This leads to downregulation of IL1 and can be used as a biomarker for lupus. Further, trimethylation of histone H3 at lysine 4 (H3K4) at the PTPN22 (protein tyrosine phosphatase, non-receptor type 22) and LRP1B (LDL receptor related protein 1B) genes correlates positively with lupus severity and is annotated as “histone modification” (VariO:0453).
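Whether a region is C + G rich and enriched in CpG dinucleotides can be quantified with two simple statistics. The Python sketch below computes the GC fraction and the observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G). The function name is an illustrative assumption, and the thresholds mentioned in the comment are commonly cited screening criteria from the literature, not definitions from VariO or this article.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed_expected_cpg_ratio) for a DNA sequence.

    Commonly cited screening criteria for a CpG island are a GC fraction
    above 0.5 and an observed/expected ratio above 0.6 over at least
    200 bp; these thresholds are an assumption, not part of VariO.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so this is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * n) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return gc_fraction, obs_exp


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(cpg_stats("ATATATAT"))
```

These statistics describe sequence composition only; the methylation state itself has to come from experimental data.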
The GAA triplet expansion of the FXN gene in Friedreich ataxia alters nucleosome positioning and reduces transcription by making the start site inaccessible. This is an example of “nucleosome positioning” (VariO:0158).
Genome variation (VariO:0428)
Genome-wide alterations are described at this level.
“Chromosome set number variation” (VariO:0215) is used to annotate variations that affect the number of entire chromosome sets. The variations range from “nulliploidy” (VariO:0221) to “polyploidy” (VariO:0218), that is, from 0 to several genomic copies, respectively. “Polyploidy” (VariO:0218) also appears naturally in some human cells, including those of the liver. In “allopolyploidy” (VariO:0220) the chromosome sets originate from different organisms; this is quite common in plants, such as wheat. In “autopolyploidy” (VariO:0219) the chromosome sets originate from the same organism, as in human liver polyploidy.
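The choice among these terms can be written as a small decision rule. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, polyploidy is taken to mean more than two chromosome sets, and only the term identifiers quoted in the text above are used. Set counts of one or two return None because the corresponding terms are not discussed here.

```python
def chromosome_set_term(set_origins):
    """Pick a chromosome set number term from the origin of each set.

    set_origins holds one organism label per complete chromosome set,
    e.g. ["human", "human", "human", "human"] for a tetraploid cell.
    Assumption: more than two sets counts as polyploidy.
    """
    n_sets = len(set_origins)
    if n_sets == 0:
        return ("VariO:0221", "nulliploidy")
    if n_sets <= 2:
        return None  # terms for these cases are not covered in this sketch
    if len(set(set_origins)) == 1:
        return ("VariO:0219", "autopolyploidy")
    return ("VariO:0220", "allopolyploidy")


if __name__ == "__main__":
    print(chromosome_set_term(["human"] * 4))
    # Hypothetical labels for sets derived from three different ancestors.
    print(chromosome_set_term(["A", "A", "B", "B", "D", "D"]))
```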
“Complex genomic variation” (VariO:0429) describes genomic variations that contain several complex components within a single chromosome or between several chromosomes. In chromothripsis, one or several chromosomes are shattered into segments, some of which are randomly combined while others are lost. This is an ultimate example of “complex genomic variation” (VariO:0429).
VariO facilitates detailed description of all kinds of DNA variants and their effects and consequences. These annotations can be made for any organism. DNA has four major sublevels for terms: variation type, function, structure and properties. DNA molecules have four levels: DNA chain, chromosome, chromatin and genome. By combining the terms, very detailed annotations are possible. By applying Evidence & Conclusion Ontology annotations, the quality and type of the methods used for obtaining the data for the annotations can be described. For consistent annotation, the use of the VariOtator tool is recommended. It can generate variation type annotations automatically from HGVS descriptions, and the annotations can be directly ported to LOVD databases. Other types of annotations are made manually; VariOtator writes the annotation summary once all terms for a variant have been selected. VariO annotations will make data integration easier and more reliable. In this article, the full spectrum of DNA variations and their effects is presented in a systematic way with examples.
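The idea of combining several terms into one annotation can be sketched as a small data structure. The Python code below is a hypothetical illustration only: the class name, the field names and the summary format are assumptions and do not reproduce the VariOtator output or the LOVD data model. The example reuses the FXN case and term identifiers given earlier in the text.

```python
import re
from dataclasses import dataclass, field

# Assumed identifier shape, based on the identifiers quoted in the text.
_VARIO_ID = re.compile(r"^VariO:\d{4}$")


@dataclass
class VariantAnnotation:
    """A free-text variant label with a list of (term_id, term_name) pairs."""
    variant: str
    terms: list = field(default_factory=list)

    def add(self, term_id, name):
        """Attach one VariO term after a basic identifier format check."""
        if not _VARIO_ID.match(term_id):
            raise ValueError(f"unexpected identifier format: {term_id}")
        if (term_id, name) not in self.terms:
            self.terms.append((term_id, name))
        return self

    def summary(self):
        """Return a one-line summary (hypothetical format)."""
        listed = "; ".join(f"{name} ({term_id})" for term_id, name in self.terms)
        return f"{self.variant}: {listed}"


if __name__ == "__main__":
    annotation = VariantAnnotation("FXN GAA triplet expansion")
    annotation.add("VariO:0156", "epigenetic DNA modification")
    annotation.add("VariO:0158", "nucleosome positioning")
    print(annotation.summary())
```

Keeping identifiers and names together makes the record readable for humans while leaving the identifier available for computational integration.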
This work was supported by the Swedish Research Council [VR 2015–02510]. The funding body did not have any role in the design of the study and collection, analysis, or interpretation of data or in the writing of the manuscript.
Availability of data and materials
The Variation Ontology is available at http://www.variationontology.org/.
The author performed the study and wrote the manuscript alone. The author read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The author declares that he has no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.