Medaka: a promising model animal for comparative population genomics
Within-species genome diversity has been best studied in humans. The international HapMap project has revealed a tremendous number of single-nucleotide polymorphisms (SNPs) among humans, many of which show signals of positive selection during human evolution. In most cases, however, functional differences between the alleles remain experimentally unverified because of the inherent difficulty of human genetic studies. It would therefore be highly useful to have a vertebrate model with the following characteristics: (1) high within-species genetic diversity, (2) a variety of gene-manipulation protocols already developed, and (3) a completely sequenced genome. Medaka (Oryzias latipes) and its congeneric species, tiny freshwater teleosts distributed broadly in East and Southeast Asia, meet these criteria.
Using Oryzias species from 27 local populations, we conducted a simple screening of nonsynonymous SNPs for 11 genes with apparent orthology between medaka and humans. We found medaka SNPs for which the same sites in human orthologs are known to be highly differentiated among the HapMap populations. Importantly, some of these SNPs show signals of positive selection.
These results indicate that medaka is a promising model system for comparative population genomics exploring the functional and adaptive significance of allelic differentiations.
Keywords: Brown Adipose Tissue; International HapMap Project; Japanese Medaka; Natural Library; Medaka Genome
The accumulation of human genetic polymorphism data provided by sources such as the international HapMap project [1, 2] has revealed a number of SNP sites with markedly different allele frequencies among human populations. Such data make systematic searches for disease-causing or drug-responsive genomic regions possible [3, 4], and the accumulated SNP data can also provide compelling evidence of positive selection during human evolution [5, 6]. An inevitable issue, however, is that mutagenesis and/or crossing experiments to elucidate functional differences between alleles at these polymorphic sites are practically impossible in humans. A vertebrate model animal with a broad geographic distribution and documented high genetic polymorphism could serve as a "natural library" of genetic variation in humans for orthologous genes that could be under similar selective pressures.
The medaka (Oryzias latipes) is a notable candidate for such a model animal. This small freshwater fish is found in East Asia, with closely related congeneric species broadly distributed throughout Southeast Asia, and it has been used as an experimental animal since the early 20th century. A number of inbred medaka strains have been established, and transgenesis and mutagenesis protocols have been developed, suggesting that medaka has great potential for use in systematic genetic analyses [7, 8, 9, 10]. Medaka genome sequences are also available. The greatest advantage of using medaka is its enormous genetic diversity compared to the other fish models (zebrafish, pufferfish, etc.): the average nucleotide difference of 3.4% between two inbred medaka strains is the highest among vertebrates documented thus far. In this study, our purpose is to assess the validity of medaka as a useful resource for comparative population genomics.
The Japanese medaka (Oryzias latipes) comprises four geographical populations. We selected 24 wild-type strains of the Japanese medaka (see Additional file 1) and three closely related congeneric species (O. curvinotus, O. luzonensis and O. celebensis; see Additional file 2). We also examined an inbred strain (Hd-rR) of Southern Japanese origin.
PCR-direct sequence, mRNA extraction and cDNA sequence
The 11 genes examined in this study
Gene ontology "biological process" annotation
alcohol metabolic process
I-kappaB kinase/NF-kappaB cascade
coagulation factor II
regulation of G-protein coupled receptor protein signaling pathway
required for axial rotation and left-right specification
solute carrier family 24, member 5
solute carrier family 30 (zinc transporter), member 9
solute carrier family 45, member 2
opsin 1 (cone pigments), long-wave-sensitive (color blindness, protan)
response to temperature stimulus
Statistical and phylogenetic analysis
Nucleotide sequences were aligned using CLUSTALW [12]. The pairwise dN and dS values among strains were calculated for the 11 genes with DnaSP (version 4.0) according to the Nei-Gojobori method. Insertions and deletions (indels) were excluded from the analysis. For the entire nucleotide sequence of RTTN, dN - dS and the associated p-values were calculated with MEGA 4 according to the Nei-Gojobori method, with statistical significance assessed by Z-tests.
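The pairwise dN and dS calculation described above can be illustrated with a minimal Python sketch of the unweighted Nei-Gojobori procedure with Jukes-Cantor correction. This is not the DnaSP or MEGA implementation; the function names (`syn_sites`, `codon_diffs`, `nei_gojobori`) are hypothetical, and the handling of stop codons and gapped codons reflects assumptions noted in the comments. The variance estimate needed for the Z-test is not shown.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO))


def syn_sites(codon):
    """Number of synonymous sites in one sense codon.

    Assumption: changes to stop codons count as nonsynonymous.
    """
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s


def codon_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over mutational paths."""
    pos = [i for i in range(3) if c1[i] != c2[i]]
    if not pos:
        return 0.0, 0.0
    sd_total = nd_total = 0.0
    n_paths = 0
    for path in permutations(pos):
        cur, sd, nd, ok = c1, 0, 0, True
        for i in path:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":  # ignore paths that pass through a stop codon
                ok = False
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if ok:
            sd_total += sd
            nd_total += nd
            n_paths += 1
    if n_paths == 0:
        # Assumption: if every path hits a stop, treat all changes as nonsynonymous.
        return 0.0, float(len(pos))
    return sd_total / n_paths, nd_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor distance; undefined (NaN) when p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3) if p < 0.75 else math.nan


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for k in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        # Skip codons containing gaps, ambiguous bases or stops (indels excluded).
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = codon_diffs(c1, c2)
        Sd += sd
        Nd += nd
        n_codons += 1
    N = 3 * n_codons - S
    dS = jukes_cantor(Sd / S) if S > 0 else math.nan
    dN = jukes_cantor(Nd / N) if N > 0 else math.nan
    return dN, dS
```

Calling `nei_gojobori(seq_a, seq_b)` on two aligned coding sequences returns the pair (dN, dS); the difference dN - dS is the quantity tested for RTTN.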
Protein structure prediction
The GeneSilico metaserver was used to predict protein secondary structure and order/disorder, and to carry out fold recognition (i.e. to match the query sequence with structurally characterized templates). Potential phosphorylation sites were predicted using a semi-independent component of the metaserver available at the URL http://genesilico.pl/Phosphoserver/. For the THEA2 protein, the metaserver indicated very high similarity (PCONS score 3.28) of residues 1–360 (human numbering) to known acyl-CoA hydrolase structures (e.g. 2gvh in the Protein Data Bank) and high similarity of residues 360–607 (PCONS score 2.00) to lipid transfer proteins from the STAR family (e.g. 1ln1 in the PDB). Long regions of intrinsic conformational disorder were predicted for loops connecting structural domains (around residues 160–200 and 340–370). For the RTTN protein, the metaserver identified the α-helical armadillo domain of β-catenin (1i7w in the Protein Data Bank) as the best modeling template, in particular for residues 1–120, with a high confidence score (PCONS score 1.67). Long regions of structural disorder, devoid of secondary and tertiary structure, were predicted for residues 120–160 and 280–370. Three-dimensional structural models of the ordered (i.e. stably folded) parts of the THEA2 and RTTN proteins were generated and optimized using the FRankenstein's Monster method [16]. The final models were evaluated as good quality by the PROQ server. The models were expected to exhibit a root mean square deviation from the true structures on the order of 2–4 Å, suggesting that they are sufficiently reliable to make functional predictions at the level of individual amino acid residues. The atomic details of these models, however, should be interpreted with caution.
Results and discussion
The dN - dS values (upper diagonal) and the significance (lower diagonal) based on RTTN cDNA (5.8 kb) sequences
Although its exact function is not known, RTTN is reported to be involved in determining the rotation of the body axis and the left-right asymmetry of internal organs during the embryonic development of mice. The conspicuous differentiation of RTTN alleles among human populations also suggests differential natural selection acting on different populations: at a nonsynonymous SNP site (rs3911730) in the RTTN exon 3, the A/A genotype occurs in 90% of Africans, 2% of Europeans and is absent in Asians, while the C/C genotype occurs in 3% of Africans, 80% of Europeans and 100% of Asians.
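To make the degree of differentiation at rs3911730 concrete, the genotype frequencies quoted above can be converted to allele frequencies and summarized with a fixation index. The following is a rough sketch only: the heterozygote (A/C) frequency is not given in the text and is assumed here to be the remainder, the three populations are weighted equally, and Wright's FST is used purely as an illustration rather than as a statistic reported in this study. The names `allele_freq_A` and `fst` are hypothetical.

```python
# Genotype frequencies for rs3911730 as quoted in the text. The heterozygote
# (A/C) frequency is NOT given there; it is assumed to be the remainder.
GENOTYPES = {
    "African": {"AA": 0.90, "CC": 0.03},
    "European": {"AA": 0.02, "CC": 0.80},
    "Asian": {"AA": 0.00, "CC": 1.00},
}


def allele_freq_A(genotype):
    """Frequency of allele A: homozygotes plus half of the assumed heterozygotes."""
    het = 1.0 - genotype["AA"] - genotype["CC"]
    return genotype["AA"] + het / 2.0


def fst(freqs):
    """Wright's FST for a biallelic site, assuming equal population weights."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_sub = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within populations
    return (h_total - h_sub) / h_total if h_total > 0 else 0.0


freq_A = {pop: allele_freq_A(g) for pop, g in GENOTYPES.items()}
fst_value = fst(list(freq_A.values()))

for pop, p in freq_A.items():
    print(f"{pop}: freq(A) = {p:.3f}")
print(f"FST = {fst_value:.2f}")
```

Under these assumptions the A allele is close to fixation in the African sample and rare or absent elsewhere, which is the pattern of strong between-population differentiation described in the text.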
Previous studies have reported that genes identified in fish through "forward genetic" analysis of phenotypic mutants are involved in forming variations of related phenotypes in humans, e.g. of skin pigmentation [20, 21, 22, 23, 24] and epithelial development. Our approach in this study extends these previous studies as a form of "reverse genetics" of genes that show, as a signature of natural selection acting on them, a prominent level of diversification in allele frequency among populations with different ecological histories in both fish and humans. We found that, out of the 11 genes in our analysis, the medaka THEA2 gene has a nonsynonymous polymorphic site at exactly the same position as its ortholog in humans, and the RTTN gene shows signs of population differentiation that can be explained plausibly by natural selection. The aim of our analysis is not to demonstrate evidence of natural selection in medaka, but to indicate that medaka is a valuable resource as a "natural library" of genetic diversity, and that this approach is efficient enough to find candidate genes targeted by natural selection in both humans and medaka. The exact function of the genes and the exact nature of the functional differences between alleles can be studied more feasibly in medaka, where crossing experiments between different genotypes of interest and transgenic techniques have already been established [7, 8]. This method can be applied to any polymorphic gene in humans, and larger-scale and more systematic screening of orthologous gene polymorphisms in medaka will find various target genes for further functional analyses. Because the medaka has been widely used in carcinogenesis and ecotoxicological studies, for example in screening for genetic variants that affect carcinogenesis and responses to ecotoxins, it could also be used to test variation in drug response in humans.
Thus, we conclude that the medaka is a good vertebrate model of the functional diversity caused by human DNA polymorphisms that have been identified by recent resequencing and typing efforts.
This work was supported by a Grant-in-Aid for Scientific Research (A) from the Japan Society for the Promotion of Science (JSPS) (19207018) to SK, by a Grant-in-Aid for Scientific Research (C) from JSPS (19570226) to HO, and by a Grant-in-Aid for Scientific Research in the Priority Area "Comparative Genomics" (#015) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) to HM. We thank Professor Emeritus Akihiro Shima and Dr. Atsuko Shimada (the University of Tokyo) for their efforts in maintaining medaka stocks from wild populations.
- 12. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- 16. Kosinski J, Cymerman IA, Feder M, Kurowski MA, Sasin JM, Bujnicki JM: A "FRankenstein's monster" approach to comparative modeling: merging the finest fragments of Fold-Recognition models and iterative model refinement aided by 3D structure evaluation. Proteins. 2003, 53 (Suppl 6): 369-379. 10.1002/prot.10545.
- 18. Adams SH, Chui C, Schilbach SL, Yu XX, Goddard AD, Grimaldi JC, Lee J, Dowd P, Colman S, Lewin DA: BFIT, a unique acyl-CoA thioesterase induced in thermogenic brown adipose tissue: cloning, organization of the human gene and assessment of a potential link to obesity. Biochem J. 2001, 360: 135-142. 10.1042/0264-6021:3600135.
- 29. Oota H, Pakstis AJ, Bonne-Tamir B, Goldman D, Grigorenko E, Kajuna SL, Karoma NJ, Kungulilo S, Lu RB, Odunsi K, et al: The evolution and population genetics of the ALDH2 locus: random genetic drift, selection, and low levels of recombination. Ann Hum Genet. 2004, 68: 93-109. 10.1046/j.1529-8817.2003.00060.x.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.