Modern genome sequencing technology has transformed human molecular genetics. Exome and whole genome sequence data from patients with diseases of unknown etiology have provided several hundred new candidate genes for many diseases. Comparison of gene, phenotype, and genetic disease data from mouse models aids in refinement of candidate genes, and thus, in identifying causative mutations and potential therapeutic targets. To enable use of mouse model data in translational research, the Mouse Genome Informatics (MGI, www.informatics.jax.org) resource offers access to integrated mouse data, including experimentally and computationally generated data sets, to correlate mouse phenotype with human clinical signs and symptoms (Eppig et al. 2015a). MGI catalogs all mouse mutant alleles, annotates genotypes of mice carrying mutations to standardized phenotype terms and Online Mendelian Inheritance in Man (OMIM, www.omim.org) human disease descriptions, and includes links to other supporting gene information including sequence, polymorphism, spatiotemporal expression, genomic location, biochemical function and process, subcellular topology, and mammalian gene homology. Standardization of nomenclature and annotation of data with bio-ontologies and standardized vocabularies ensure that data are consistently annotated, making precise data mining possible.
Data from studies of the laboratory mouse have proven valuable in driving the discovery of human disease mutations. For example, a recent study describes a variant of human PNPLA8 that causes mitochondrial myopathy (OMIM 251950, Saunders et al. 2015); a mouse mutation in the same gene with a similar phenotype had previously been described (Mancuso et al. 2009). Similarly, a mutation in the human FAT1 gene was recently shown to cause a facioscapulohumeral dystrophy (FSHD)-like disease (Puppo et al. 2015). Mice carrying either a muscle-specific conditional mutation in Fat1 or a constitutive hypomorphic allele were shown to develop muscular and non-muscular defects mimicking human FSHD, and these studies anticipated the discovery of a causative human mutation in the same gene (Caruso et al. 2013). Mouse models can also provide valuable insights into molecular mechanisms and therapies. For example, studies of mutations in the mouse Trp53 gene have been used to infer functions of human TP53, the most frequently mutated gene in human cancer, and have been key to understanding both its functions and its role as a therapeutic target (Donehower 2014). MGI continues to provide integrated access to mouse mutations to aid such discoveries related to human disease.
MGI collects, integrates, and presents data describing published and contributed spontaneous, induced, and genetically engineered mouse mutations. Data downloaded from large-scale mouse mutagenesis projects, including N-ethyl-N-nitrosourea (ENU), gene trap, and knockout mutagenesis projects (Smith and Eppig 2012), are also incorporated into the MGI dataset. The phenotypic consequences of all of these mutations are described using the Mammalian Phenotype Ontology (Smith and Eppig 2012) and associated with human disease terms from OMIM (Eppig et al. 2015a), enabling consistent searching and data retrieval across all mutation types. MGI currently holds (Table 1) over 43,000 alleles present in mice; these alleles have been used to investigate phenotypes in over 55,600 genotypes and diseases in over 4500 genotypes. These genotypes have contributed to understanding the phenotypic consequences of mutations in over 15,300 genetic markers and the disease consequences of mutations in over 2000 markers. These disease, mutation, and phenotype data are fully integrated with all other MGI data, including genomic, expression, tumor, and pathway knowledge, to enable complex queries across multiple data types.
Table 1 Selection of counts of disease, mutant allele, and phenotype data in MGI (data compiled on May 13, 2015)
Mouse phenotype data generated by the International Mouse Phenotyping Consortium (IMPC) are the newest addition to MGI’s automated phenotype data integration pipeline, with new data added weekly. This worldwide consortium plans to generate knockout mice using the embryonic stem cell knockout resource developed by the International Knockout Mouse Consortium (IKMC, Bradley et al. 2012). Mice made with these mutant alleles are subjected to high-throughput phenotyping screens. Statistically robust phenodeviants are automatically assigned Mammalian Phenotype Ontology (MP) terms, and these phenotype calls are imported into MGI (Koscielny et al. 2014).
Integration of these data with all other phenotypic data at MGI facilitates comparison of the effects of different mutation types to support comparative and correlative discovery and hypothesis building (Fig. 1a, b).