Proposed nomenclature for microhaplotypes
Microhaplotypes are a new type of genetic marker in forensics and population genetics, and a standardized nomenclature is desirable. We propose a simple approach that does not require a central authority for approval. The proposed nomenclature follows the recommendation of the HUGO Gene Nomenclature Committee (http://www.genenames.org): “We strongly encourage naming families and groups of genes related by sequence and/or function using a ‘root’ symbol. This is an efficient and informative way to name related genes, and already works well for a number of established gene families…” The proposed symbol consists of the root “mh”, followed by the two-digit chromosome number and unique characters established by the authors in the initial publication. We suggest that these unique characters identify the laboratory, followed by characters unique to the chromosome and laboratory. For instance, the microhaplotype symbol mh01KK-001 refers to a locus on chromosome 1 published by the Kidd Lab (KK-) as its #001. Its initial publication defines mh01KK-001 as composed of four single nucleotide polymorphisms (SNPs): rs4648344, rs6663840, rs58111155, and rs6688969.
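The symbol format described above is regular enough to check mechanically. The following Python sketch parses and rebuilds symbols such as mh01KK-001. It illustrates the format under stated assumptions and is not part of the proposal. The regular expression assumes that the lab code is one or more uppercase letters and that the lab-assigned number has exactly three digits. It also accepts only autosomes 01 to 22, because the proposal does not say how sex chromosomes would be written. The names `MicrohapSymbol`, `parse_mh_symbol`, `MH_PATTERN`, and `DEFINING_SNPS` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed layout: "mh" + two-digit chromosome + lab code (uppercase letters,
# an assumption) + "-" + three-digit lab-assigned number (also an assumption).
MH_PATTERN = re.compile(r"mh(?P<chrom>\d{2})(?P<lab>[A-Z]+)-(?P<number>\d{3})")

# Defining SNPs for the example locus, as given in its initial publication.
DEFINING_SNPS = {
    "mh01KK-001": ["rs4648344", "rs6663840", "rs58111155", "rs6688969"],
}


class MicrohapSymbol(NamedTuple):
    chromosome: int
    lab: str
    number: int

    def __str__(self) -> str:
        # Zero-pad the chromosome to two digits and the number to three.
        return f"mh{self.chromosome:02d}{self.lab}-{self.number:03d}"


def parse_mh_symbol(symbol: str) -> MicrohapSymbol:
    """Split a microhaplotype symbol into chromosome, lab code, and number."""
    match = MH_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a microhaplotype symbol: {symbol!r}")
    chrom = int(match["chrom"])
    if not 1 <= chrom <= 22:
        # Only autosomes are handled in this sketch.
        raise ValueError(f"chromosome out of range: {chrom}")
    return MicrohapSymbol(chrom, match["lab"], int(match["number"]))


example = parse_mh_symbol("mh01KK-001")
print(example.chromosome, example.lab, example.number)  # 1 KK 1
print(str(example), DEFINING_SNPS[str(example)])
```

Zero-padding the chromosome places it at a fixed position in every symbol. As a result, a plain string sort lists autosomal loci in chromosome order.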
A microhaplotype locus has been defined as two to five (or more) SNPs within the length of a single DNA sequence read, arbitrarily set at about 200 to 300 bp. This length was chosen so that, when an individual is genotyped by current massively parallel sequencing (MPS), all the SNPs of a locus are covered by the same read and the locus is therefore phase-known [1, 2, 3]. The alleles at the locus are the haplotypes: the specific combinations of alleles at the defining SNPs that are seen on individual chromosomes in the population. Microhaplotypes have been advocated as potentially very useful in forensics and population genetics [1, 2, 3]. This nomenclature proposal grew out of our own lab’s problems with naming microhaplotypes. It builds on previous experience with early DNA polymorphism nomenclature, as well as on ongoing issues in maintaining ALFRED [4]. The proposal is not meant to be dictatorial but to inspire thought and discussion. Feedback is welcome, especially positive and constructive feedback. Negative feedback is also welcome, especially if an alternative system is proposed.
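The allele definition can be made concrete. Because all defining SNPs fall within one read, each read that spans them directly shows one haplotype, and that haplotype is the allele. The Python sketch below assumes that reads have already been aligned and reduced to base calls at the defining sites. The site names SNP1 to SNP4 and all the bases are made-up placeholders, and the sketch does no read-quality or depth filtering. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Placeholder names for the defining SNPs of a hypothetical locus, in order.
SITES = ["SNP1", "SNP2", "SNP3", "SNP4"]


def read_to_allele(calls: Dict[str, str], sites: List[str]) -> Optional[str]:
    """Return the haplotype carried by one read, or None if the read
    does not cover every defining site."""
    if any(site not in calls for site in sites):
        return None
    # All sites come from the same read, so the bases are already in phase.
    return "-".join(calls[site] for site in sites)


def allele_counts(reads: Iterable[Dict[str, str]], sites: List[str]) -> Counter:
    """Tally the haplotypes (alleles) observed across reads."""
    counts: Counter = Counter()
    for calls in reads:
        allele = read_to_allele(calls, sites)
        if allele is not None:
            counts[allele] += 1
    return counts


# Made-up base calls from four reads of one individual.
reads = [
    {"SNP1": "A", "SNP2": "C", "SNP3": "G", "SNP4": "T"},
    {"SNP1": "G", "SNP2": "C", "SNP3": "A", "SNP4": "T"},
    {"SNP1": "A", "SNP2": "C", "SNP3": "G", "SNP4": "T"},
    {"SNP1": "A", "SNP2": "C", "SNP3": "G"},  # does not reach SNP4
]
print(allele_counts(reads, SITES))  # Counter({'A-C-G-T': 2, 'G-C-A-T': 1})
```

In this made-up individual, the two alleles are A-C-G-T and G-C-A-T. The fourth read is excluded because it does not span the whole locus.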
In presenting and discussing data in papers, it is simply too cumbersome to list the series of SNP symbols, usually rs numbers from dbSNP, each time a microhaplotype (microhap) locus or one of its alleles is mentioned. Standard procedure in the scientific literature is to define a short symbol or acronym early in the paper and use it throughout, e.g., SNP for single nucleotide polymorphism. In our publications to date, we have used the nearby gene or our lab symbols for the locus, while acknowledging that this is not ideal. Other laboratories are now searching for and publishing microhaplotype loci [6, 7]. Because different labs and authors may refer to the same microhap locus by different short symbols, cross-referencing loci and incorporating data into a common database can become a problem.
In the early 1980s, the gene mapping community established an initial system, known as D numbers, to catalog and assign symbols to DNA polymorphisms. A D number consists of the letter D (for “DNA”), the chromosome number, the letter S (for “site” or “sequence”), and a centrally assigned sequential catalog number. While dbSNP rs numbers have superseded D numbers for individual SNPs, D numbers persist for many short tandem repeat polymorphisms (STRPs). These include many STRPs commonly used in forensics, such as D18S51 (the 51st site cataloged on chromosome 18) and D3S1358 (the 1358th site cataloged on chromosome 3). Based on that experience (for several years I was in charge of the central cataloging and assignment of D numbers using resources at the Yale Human Gene Mapping Library), an analogous system could be adopted by the genetics community for microhaps. Note that this is also analogous to the nomenclature used for open reading frames that may be functional genes, e.g., “C14ORF43,” which was one of the ad hoc microhaplotype “names” in Kidd et al. The problem is that there is no central authority with the funding to assign official names. (We note that the correct previous symbol, C14orf43, has now been replaced by the gene name ELMSAN1 [9].)
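The D-number format is equally mechanical. The Python sketch below parses the simple form (D, chromosome, S, catalog number) used by symbols such as D18S51 and D3S1358. It deliberately ignores other variants of the historical scheme, and the function name is hypothetical. The sketch also shows one practical difference from the proposed mh symbols. D numbers do not zero-pad the chromosome, so a plain string sort does not put them in chromosome order. The mh symbols in the last line are hypothetical.

```python
import re
from typing import Tuple

# "D" + chromosome (one or two digits, or X or Y) + "S" + catalog number.
# Only this simple form is handled; other historical variants are not.
D_NUMBER = re.compile(r"D(?P<chrom>[0-9]{1,2}|X|Y)S(?P<catalog>[0-9]+)")


def parse_d_number(symbol: str) -> Tuple[str, int]:
    """Return (chromosome, catalog number) for a simple D-number symbol."""
    match = D_NUMBER.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a simple D number: {symbol!r}")
    return match["chrom"], int(match["catalog"])


for symbol in ["D18S51", "D3S1358"]:
    chrom, catalog = parse_d_number(symbol)
    print(f"{symbol}: site {catalog} cataloged on chromosome {chrom}")

# Without zero-padding, string order is not chromosome order:
print(sorted(["D3S1358", "D18S51"]))  # ['D18S51', 'D3S1358']
# With a two-digit chromosome (hypothetical mh symbols), it is:
print(sorted(["mh18KK-001", "mh03KK-001"]))  # ['mh03KK-001', 'mh18KK-001']
```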
[Table: Symbol previously used; SNPs currently involved]
If papers used the proposed standardized symbols, starting with the initial publication, considerable confusion could be avoided. Under this scheme, each lab could maintain its own records and create its own unique symbol when its microhaplotype data are published. The lab’s subsequent papers, as well as papers by other researchers, could then use that symbol for the microhaplotype.
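Under this scheme, a lab’s own record keeping could be as small as the Python sketch below. The sketch is hypothetical in several respects. The class and method names are invented, and the lab code XY and the SNP identifiers are placeholders. It also assumes that numbering runs separately on each chromosome, which is one reading of “characters unique to the chromosome and laboratory.” An alias table lets a symbol used before the scheme was adopted still be resolved.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional


class LabRegistry:
    """Hypothetical record of the microhaplotype symbols one lab has published."""

    def __init__(self, lab_code: str) -> None:
        self.lab_code = lab_code
        # Next unused number on each chromosome (assumed to start at 1).
        self._next: DefaultDict[int, int] = defaultdict(lambda: 1)
        self.loci: Dict[str, List[str]] = {}   # symbol -> defining SNPs
        self.aliases: Dict[str, str] = {}      # earlier ad hoc name -> symbol

    def register(self, chromosome: int, snps: List[str],
                 previous_symbol: Optional[str] = None) -> str:
        """Assign the next symbol on a chromosome and record its SNPs."""
        number = self._next[chromosome]
        self._next[chromosome] += 1
        symbol = f"mh{chromosome:02d}{self.lab_code}-{number:03d}"
        self.loci[symbol] = list(snps)
        if previous_symbol is not None:
            self.aliases[previous_symbol] = symbol
        return symbol

    def resolve(self, name: str) -> str:
        """Map an earlier name to its standard symbol; otherwise return it unchanged."""
        return self.aliases.get(name, name)


lab = LabRegistry("XY")
first = lab.register(1, ["snpA", "snpB", "snpC"], previous_symbol="GENE1-mh")
second = lab.register(1, ["snpD", "snpE"])
third = lab.register(14, ["snpF", "snpG", "snpH"])
print(first, second, third)     # mh01XY-001 mh01XY-002 mh14XY-001
print(lab.resolve("GENE1-mh"))  # mh01XY-001
```

No central authority is involved. Uniqueness across labs relies only on each lab using a distinct lab code.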
What we propose is not perfect; we recognize problems with the definition of alleles, and even of the extent of the locus, when additional variants are identified. In addition to the SNPs initially used to study the locus, microhaplotype data obtained by MPS will include other variants already known and characterized in dbSNP and 1000 Genomes, either as other polymorphic sites or as rare single nucleotide variants (SNVs). Novel variation is likely to be identified when “new” populations are studied. In such cases, the initially specified SNPs remain the basis for the definition of alleles. While the same locus symbol would ideally be used, any publication would need to specify the sites studied and identified and to define the alleles (haplotypes) based on those sites. Possibly a system for indicating a modification of a previously defined microhaplotype could be devised, rather than defining a completely new microhaplotype symbol. This situation has arisen in some studies of P450 genes (e.g., [11, 12]), in which haplotypes were identified that did not correspond to the definitions on the “cypalleles” web site [13]. When individual SNPs are typed and haplotypes are defined by statistical phasing, a SNP in the initial definition may also be omitted in a particular study. That omission could be specifically noted as the alleles are defined for that study. Manuscript-specific definitions of alleles (haplotypes) will be less of a problem if at least a common symbol is used for the more broadly defined microhap locus.
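When a study omits a SNP from the initial definition, its alleles can still be compared with the original ones by restricting both to the shared sites. The Python sketch below shows that step for haplotype counts. It is one possible approach and not a method from the proposal. The site names, haplotypes, counts, and function names are all invented for illustration. Dropping a site can merge alleles that were previously distinct.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def project(allele: str, sites: List[str], keep: List[str], sep: str = "-") -> str:
    """Restrict a haplotype written over `sites` to the subset `keep`."""
    bases = dict(zip(sites, allele.split(sep)))
    return sep.join(bases[site] for site in keep)


def harmonize(counts: Dict[str, int], sites: List[str], keep: List[str]) -> Dict[str, int]:
    """Re-tally haplotype counts after restricting them to the sites in `keep`."""
    merged: DefaultDict[str, int] = defaultdict(int)
    for allele, n in counts.items():
        merged[project(allele, sites, keep)] += n
    return dict(merged)


# Original definition (placeholder sites) and a later study that omitted S3.
original_sites = ["S1", "S2", "S3", "S4"]
study_sites = ["S1", "S2", "S4"]
shared = [site for site in original_sites if site in study_sites]

original_counts = {"A-C-G-T": 5, "A-C-A-T": 2, "G-T-G-T": 3}
print(harmonize(original_counts, original_sites, shared))
# {'A-C-T': 7, 'G-T-T': 3}: without S3, A-C-G-T and A-C-A-T can no longer be told apart
```

The comparison works only because both studies refer to the same locus. A common locus symbol, together with an explicit list of the sites each study typed, provides that link.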
Our own papers cited above illustrate the difficulty of maintaining a consistent symbolism when publications occur at different stages of the overall research in the lab. If subsequent papers used the standardized symbolism proposed starting with the initial publication, considerable confusion could be avoided. Each lab could maintain its own records and create its own unique symbol, but a common theme would preclude much potential confusion.
This work was supported in part by grants 2013-DN-BX-K023, 2014-DN-BX-K030, and 2015-DN-BX-023 to KKK awarded by the National Institute of Justice, Office of Justice Programs, US Department of Justice. Points of view in this document are those of the author and do not necessarily represent the official position or policies of the US Department of Justice. The work was also supported in part by grant BCS-1444279 from the US National Science Foundation.
I thank Weibo Liang and Daniele Podini for the helpful discussions on this issue. I thank Usha Soundararajan for help with the manuscript and William Speed and Françoise Friedlaender for help with the tables and figures.
The author declares that he has no competing interests.
- 4. The ALlele FREquency Database ALFRED. https://alfred.med.yale.edu/alfred/. Accessed: 31st March, 2016
- 9. HUGO Gene Nomenclature Committee. http://www.genenames.org/. Accessed: 31st March, 2016
- 13. CYP2E1 allele nomenclature. http://www.cypalleles.ki.se/cyp2e1.htm. Accessed: 31st March, 2016
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.