ZFNGenome: A comprehensive resource for locating zinc finger nuclease target sites in model organisms
Zinc Finger Nucleases (ZFNs) have tremendous potential as tools to facilitate genomic modifications, such as precise gene knockouts or gene replacements by homologous recombination. ZFNs can be used to advance both basic research and clinical applications, including gene therapy. Recently, the ability to engineer ZFNs that target any desired genomic DNA sequence with high fidelity has improved significantly with the introduction of rapid, robust, and publicly available techniques for ZFN design such as the Oligomerized Pool ENgineering (OPEN) method. The motivation for this study is to make resources for genome modifications using OPEN-generated ZFNs more accessible to researchers by creating a user-friendly interface that identifies and provides quality scores for all potential ZFN target sites in the complete genomes of several model organisms.
ZFNGenome is a GBrowse-based tool for identifying and visualizing potential target sites for OPEN-generated ZFNs. ZFNGenome currently includes more than 11.6 million potential ZFN target sites, mapped within the fully sequenced genomes of seven model organisms: S. cerevisiae, C. reinhardtii, A. thaliana, D. melanogaster, D. rerio, C. elegans, and H. sapiens. These sites can be visualized within the flexible GBrowse environment. Additional model organisms will be included in future updates. ZFNGenome provides information about each potential ZFN target site, including its chromosomal location and position relative to transcription initiation site(s). Users can query ZFNGenome using several different criteria (e.g., gene ID, transcript ID, target site sequence). Tracks in ZFNGenome also provide "uniqueness" and ZiFOpT (Zinc Finger OPEN Targeter) "confidence" scores that estimate the likelihood that a chosen ZFN target site will function in vivo. ZFNGenome is dynamically linked to ZiFDB, allowing users access to all available information about zinc finger reagents, such as the effectiveness of a given ZFN in creating double-stranded breaks.
ZFNGenome provides a user-friendly interface that allows researchers to access resources and information regarding genomic target sites for engineered ZFNs in seven model organisms. This genome-wide database of potential ZFN target sites should greatly facilitate the utilization of ZFNs in both basic and clinical research.
Keywords: Zinc Finger, Zinc Finger Nuclease, Protein Code Transcript, Potential Target Site, Zinc Finger Array
List of Abbreviations Used
OPEN: Oligomerized Pool ENgineering
ZFA: Zinc Finger Array
ZFP: Zinc Finger Protein
ZFN: Zinc Finger Nuclease
ZiFOpT: Zinc Finger OPEN Targeter
DSBs: double stranded breaks
NHEJ: non-homologous end joining
GMOD: generic model organism database project
SVMs: support vector machines
ZiFDB: Zinc Finger Database
The ability to efficiently modify the genome of an organism with a high degree of specificity would advance both research with model organisms and human gene therapy clinical trials [1, 2, 3]. In recent studies, zinc finger nuclease (ZFN)-mediated genomic modification rates of 3% - 100% for specific genes have been reported in zebrafish, Arabidopsis, and rat [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Moreover, ZFNs are being evaluated in human gene therapy clinical trials for treating AIDS [11, 17, 18, 19]. Thus, ZFNs are emerging as premier tools for site-specific genomic modification in both animals and plants.
In eukaryotes, repair of double-stranded breaks (DSBs) in DNA is primarily accomplished via one of two pathways, homologous recombination (HR) and non-homologous end-joining (NHEJ) (Figure 1A). Depending on the desired modification, either pathway can be exploited in ZFN-mediated genomic engineering. Because HR relies on homologous DNA to repair the DSB, gene targeting can be achieved by supplying an exogenous "donor" template. This results in replication of the "donor" DNA sequence at the target locus, a process that has been utilized to introduce small mutations or large insertions [4, 9, 12, 13, 16, 24, 25, 26, 27]. In contrast, NHEJ is an error-prone repair process and hence is ideal for generating mutations that can result in gene knockouts or knock-downs when the ZFN-mediated DSB is introduced into the protein coding sequence of a gene [5, 6, 7, 8, 9, 11, 28, 29].
Oligomerized Pool Engineering (OPEN) is a highly robust and publicly available protocol for engineering zinc finger arrays with high specificity and in vivo functionality [9, 30, 31]. OPEN has been successfully used to generate ZFNs that function efficiently in plants [13, 15], zebrafish, and human somatic and pluripotent stem cells. OPEN is a selection-based method in which a pre-constructed randomized pool of candidate ZFAs is screened to identify those with high affinity and specificity for a desired target sequence. Significantly higher in vivo success rates have been reported using OPEN-generated ZFNs, compared with ZFNs generated using the more traditional modular assembly approach [32, 33, 34]. Resources for generating ZFNs using OPEN have been developed and made publicly available by the Zinc Finger Consortium [9, 31, 35]. Currently, OPEN reagents include modules that recognize all 16 possible GNN triplets (i.e., DNA triplets beginning with G, followed by any nucleotide in the second and third positions), as well as several TNN triplets. Thus, all DNA sites that contain only GNN and/or select TNN triplets can potentially be targeted using the OPEN protocol.
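The triplet rule described above can be illustrated with a minimal Python sketch. It checks whether a single zinc finger array binding site is composed only of GNN triplets, optionally allowing a caller-supplied set of additional triplets. The function names and the `extra_triplets` parameter are hypothetical, and the specific TNN triplets covered by OPEN pools are not listed in this text, so they are left as an input rather than hard-coded. This is not the ZiFiT implementation.

```python
def split_triplets(site):
    """Split a DNA site into consecutive 3-bp triplets, read 5' to 3'."""
    site = site.upper()
    if not site or len(site) % 3 != 0:
        raise ValueError("site length must be a non-zero multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


def is_targetable(site, extra_triplets=frozenset()):
    """Return True if every triplet is GNN or in extra_triplets.

    extra_triplets is a hypothetical parameter standing in for the
    'select TNN triplets' mentioned in the text (not enumerated there).
    """
    for triplet in split_triplets(site):
        if set(triplet) - set("ACGT"):
            return False  # ambiguous or non-DNA character
        if triplet[0] != "G" and triplet not in extra_triplets:
            return False
    return True
```

For example, `is_targetable("GAAGCTGGG")` is true because all three triplets begin with G, whereas `is_targetable("GAATCTGGG")` is false unless `"TCT"` is passed in `extra_triplets`.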
To facilitate use of OPEN ZFNs for genome modification, we have developed ZFNGenome, a resource that displays potential ZFN target sites in a genome browser built on the user-friendly GBrowse platform. We analyzed the complete sequenced genomes of seven model organisms and identified all sequences that are potentially targetable using currently available OPEN ZFN reagents. ZFN reagents were obtained from Joung and colleagues, and ZFN target sites were identified using software implemented in the ZiFiT web server [37, 38]. ZFNGenome thus allows users to quickly evaluate "pre-identified" ZFN target sites for any desired gene or region of interest.
To our knowledge, ZFNGenome represents the first compendium of potential ZFN target sites in sequenced and annotated genomes of model organisms. The current version includes ZFN target sites in seven organisms: Saccharomyces cerevisiae (budding yeast), Chlamydomonas reinhardtii (green algae), Arabidopsis thaliana (thale cress), Caenorhabditis elegans (nematode), Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Homo sapiens (human). Additional model organisms will be added in the near future, including three plant species, Glycine max (soybean), Oryza sativa (rice), and Zea mays (maize), and three animal species, Tribolium castaneum (red flour beetle), Mus musculus (mouse), and Rattus norvegicus (brown rat).
Construction and Content
Table 1: Model organism genomes analyzed and the number of OPEN ZFN target sites identified
ZFNGenome utilizes GBrowse 1.7 to display identified potential OPEN target sites, along with basic genome annotations, such as genes, transcripts, exons, introns, and 5' and 3' UTRs. ZFNGenome is hosted on an Apache2 web server and uses a MySQL DB linked to a GBrowse front end via open source adaptors available in BioPerl (version 1.6). The ZFN target sites can be exported for use as annotations in other GBrowse-based genome browsers such as TAIR and WormBase. As described below, each ZFN target site is hyperlinked to ZiFDB.
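To make the idea of exporting target sites as annotations concrete, the following sketch formats one site as a GFF3 line, a tab-delimited annotation format that GBrowse can load. The export format actually used by ZFNGenome is not stated here, so GFF3 is an assumption, and the function name, the `ZFN_target_site` feature type, and the example values are hypothetical.

```python
def gff3_line(seqid, start, end, strand, site_id, score=None,
              source="ZFNGenome", feature_type="ZFN_target_site"):
    """Format one target site as a 9-column GFF3 feature line.

    Coordinates are 1-based and inclusive, as GFF3 requires.
    source and feature_type are illustrative defaults, not values
    confirmed by the ZFNGenome documentation.
    """
    if start < 1 or end < start:
        raise ValueError("GFF3 needs 1 <= start <= end")
    if strand not in {"+", "-", "."}:
        raise ValueError("strand must be '+', '-' or '.'")
    score_field = "." if score is None else f"{score:.2f}"
    columns = [
        seqid, source, feature_type,
        str(start), str(end),
        score_field, strand,
        ".",                 # phase: not applicable to non-CDS features
        f"ID={site_id}",     # attributes column
    ]
    return "\t".join(columns)
```

A call such as `gff3_line("Chr1", 1000, 1023, "+", "site0001", 0.9)` yields one line that could be appended to a GFF3 file and loaded as a custom track.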
Resources available in ZFNGenome
To evaluate the reliability of data presented in ZFNGenome, we compared our results with other published data. Two types of data are presented in ZFNGenome: annotated genomic features and potential ZFN target sites. The sources from which we acquired the genomic features are listed in Table 1. These are widely considered to be the "gold standard" data sources for the model organisms analyzed because they are carefully annotated and repeatedly evaluated by the curators and users of these databases. These source databases are also extensively used by investigators utilizing the various model organisms and are therefore familiar to users. To identify potential errors that may have been introduced during pre-processing or data analysis, we performed quality assurance tests as follows: 1) for each organism, several 5 kb segments of genomic sequence were randomly selected from each chromosome; 2) selected chromosomal DNA sequences were individually re-scanned using the ZiFiT web server to identify potential OPEN ZFN sites; 3) sites identified by the ZiFiT server were directly compared to the results for the corresponding region obtained from the ZFNGenome database; and 4) genomic features were checked against the original database. To improve the user interface and documentation, we incorporated suggestions from at least one expert scientist for each of the model organisms included in ZFNGenome.
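The sampling and comparison steps of this quality assurance procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' scripts: the function names are hypothetical, sites are represented as (start, end) tuples, and the re-scan itself (done with the ZiFiT web server in the text) is outside the sketch.

```python
import random


def sample_segments(chrom_length, n_segments, segment_length=5000, rng=None):
    """Pick random half-open [start, end) windows of segment_length bp."""
    rng = rng or random.Random()
    if chrom_length < segment_length:
        raise ValueError("chromosome is shorter than the segment length")
    last_start = chrom_length - segment_length
    starts = [rng.randrange(0, last_start + 1) for _ in range(n_segments)]
    return [(s, s + segment_length) for s in starts]


def compare_sites(rescanned, stored):
    """Report sites found by only one of the two sources for a region."""
    rescanned, stored = set(rescanned), set(stored)
    return {
        "missing_from_database": sorted(rescanned - stored),
        "missing_from_rescan": sorted(stored - rescanned),
    }
```

If both lists in the result are empty for every sampled segment, the stored sites agree with the re-scan for those regions.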
Utility and Discussion
Currently available ZFNs can target 80 - 95% of protein coding transcripts in 7 model organisms
The results presented in Table 1 illustrate both the power and current limitations of OPEN ZFN engineering technology and identify gaps where further improvement is needed. Most striking is the relatively high level of coverage currently achievable. This ranges from 85% of protein coding transcripts in Caenorhabditis elegans to 95% of protein coding transcripts in Danio rerio. Also noteworthy is the number of potential target sites available within any given transcript: in the model organisms examined to date, each transcript contains, on average, 5 - 23 target sites (Table 1). The current lack of OPEN ZFN reagents for targeting TNN, ANN and CNN triplets is a limitation, especially in organisms with AT rich genomes. However, even in Arabidopsis (35.5% GC) more than 91% of the protein coding transcripts are potentially targetable. As more ZFN reagents for targeting additional triplets become available, the applicability of ZFN technology will continue to increase.
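The two summary statistics discussed here, the percentage of targetable transcripts and the average number of sites per transcript, can be computed from per-transcript site counts as in the sketch below. The function name and input layout are hypothetical, and averaging over targetable transcripts only is an assumption about how the Table 1 figures were defined.

```python
def coverage_summary(sites_per_transcript):
    """Return (percent targetable, mean sites per targetable transcript).

    sites_per_transcript maps a transcript ID to its number of
    potential ZFN target sites (0 if none were found).
    """
    counts = list(sites_per_transcript.values())
    if not counts:
        raise ValueError("no transcripts supplied")
    targetable = [c for c in counts if c > 0]
    percent = 100.0 * len(targetable) / len(counts)
    mean_sites = sum(targetable) / len(targetable) if targetable else 0.0
    return percent, mean_sites
```

With the toy input `{"t1": 4, "t2": 0, "t3": 8, "t4": 12}`, three of four transcripts are targetable (75%), with a mean of 8 sites per targetable transcript.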
The first study in which the entire genome of a model organism was analyzed to identify potential target sites for ZFNs focused on the zebrafish, Danio rerio. In that study, identified ZFN target sites were published in the form of 26 supplemental tables (one for each chromosome). Although this information has apparently proven useful for members of the zebrafish community, ZFNGenome was developed in an effort to make such large datasets searchable and more readily accessible to a broader group of researchers working in zebrafish as well as other model organisms.
Because the experimental generation and testing of ZFNs using the OPEN protocol is not a trivial undertaking, the utility of a method to discriminate between ZFN target sites that are likely to function successfully in vivo and those that are not cannot be over-emphasized. Our analysis discussed above reveals that, on average, every transcript in the zebrafish genome contains ~ 8 potential ZFN target sites (see Table 1). In ZFNGenome, the incorporation of "uniqueness" and ZiFOpT "confidence" scores [42] should help improve the time and cost-effectiveness of genomic modification experiments utilizing ZFNs.
In the first implementation of ZFNGenome, we used GBrowse version 1.67 with a BerkeleyDB back end to display all potential ZFN target sites found in Arabidopsis. A total of 381,497 sites were identified, 171,409 of which were located within coding regions (an average of 5.7 sites per targetable transcript). The current version of ZFNGenome (2.0) includes S. cerevisiae, C. reinhardtii, A. thaliana, C. elegans, D. melanogaster, D. rerio, and H. sapiens. In addition, it has been implemented in the newer GBrowse 1.7 with a MySQL database, which results in a more dynamic and user-friendly interface. GBrowse 1.7 is a robust and highly customizable browser available from the Generic Model Organism Database project (GMOD). A noteworthy feature is the ability to share tracks with other GBrowse-based resources. To date, ~119 implementations of GBrowse are available (http://gmod.org/wiki/GMOD_Users). Users accustomed to using popular model organism resources, such as TAIR for Arabidopsis or FlyBase for Drosophila, can simply export tracks containing ZFN target sites from ZFNGenome into their browser of choice for further analysis.
Several existing databases house information on ZFPs and associated binding sites. ZiFDB (http://bindr.gdcb.iastate.edu/ZiFDB) contains information about engineered zinc finger arrays and individual modules that have been experimentally evaluated for function in vivo. ZifBase (http://web.iitd.ac.in/~sundar/zifbase/) is a repository that includes information about both naturally occurring and engineered zinc finger proteins. Sequences of ZFP binding sites are also collected in TRANSFAC (http://www.biobase-international.com/index.php?id=transfac) and JASPAR (http://jaspar.genereg.net/). Tools for predicting the DNA target sites for a selected ZFP include ZIFIBI (http://bioinfo.hanyang.ac.kr/ZIFIBI/frameset.php), a hidden Markov model based predictor that takes into account the interdependence between positions -1, +3 and +6 of a chosen ZFP to predict its potential DNA binding site(s). Also, Persikov et al. have used support vector machines (SVMs) to predict and rank potential ZFP binding sites for a selected ZFP.
Several web-based tools for identifying potential ZFN binding sites within a given DNA sequence are currently available. Zinc Finger Tools (http://www.scripps.edu/mb/barbas/zfdesign/zfdesignhome.php) can be used to identify target sites for zinc finger arrays composed of available modules (16 GNN, 15 ANN, 15 CNN), generated by the Barbas laboratory, within any given DNA sequence up to 10 kb in length. ZifBase tools (http://web.iitd.ac.in/~sundar/zifbase/) can identify target sites in a given DNA sequence, with the option of using target site triplet composition (i.e., the number of GNN, CNN, TNN and ANN triplets) as a selection criterion. TagScan (http://www.isrec.isb-sib.ch/tagger/tagscan.html) is capable of performing searches for either exact or nearly exact matches (≤ 2 mismatches) between a given query sequence, such as a ZFP target site, and a large database, such as a genomic sequence database. ZiFiT (http://bindr.gdcb.iastate.edu/zifit/) is similar to Zinc Finger Tools in that it allows users to identify target sites for ZFNs. ZiFiT can also identify sites potentially targetable with ZFPs made from zinc finger modules developed and/or characterized by the Barbas lab, Sangamo BioSciences, Inc., and Toolgen (http://www.toolgen.com).
In contrast to all of these existing web-based tools, which identify potential ZFN target sites within a user-provided DNA sequence (typically < 10 kb), ZFNGenome is a comprehensive repository that contains all potential ZFN sites targetable using available OPEN reagents in the complete genomic sequences of 7 model organisms. To help users distinguish between high and low quality potential ZFN target sites, ZFNGenome provides two metrics: a "uniqueness" score showing the number of times a sequence is found within the given genome and a ZiFOpT score providing a prediction of the likelihood that a given ZFN will be active in vivo.
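The "uniqueness" metric, described above as the number of times a sequence is found within a genome, can be sketched as a simple occurrence count. This is an illustration only: the function names are hypothetical, and counting matches on both strands and allowing overlapping matches are assumptions, since the text does not specify how ZFNGenome counts occurrences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def uniqueness(site, chromosomes):
    """Count occurrences of site across chromosomes on both strands."""
    site = site.upper()
    if not site:
        raise ValueError("site must not be empty")
    rc = reverse_complement(site)
    total = 0
    for seq in chromosomes:
        seq = seq.upper()
        total += count_overlapping(seq, site)
        if rc != site:  # avoid double-counting palindromic sites
            total += count_overlapping(seq, rc)
    return total
```

Under this definition a score of 1 means the site occurs exactly once in the supplied sequences, and higher values indicate repeated sequences that a ZFN might also cleave elsewhere.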
Planned future development
ZFNGenome will be updated regularly to incorporate revisions in genomic DNA sequences and annotations, and to take into account new potential ZFN target sites that can be considered when new reagents, such as additional OPEN pools, become available. The genomes of several other established and emerging model organisms are currently in the pipeline, including maize, rice, soybean, red flour beetle, mouse, and rat. We also intend to implement additional features, including capabilities for identifying target sites for ZFNs made by other publicly available engineering methods such as modular assembly.
OPEN is a robust, publicly available, experimental platform for the generation of engineered ZFNs that function with high specificity in vivo. ZFNGenome was developed to enhance and broaden the applicability of ZFNs for genomic modification by providing an online resource that contains all potential target sites for OPEN-generated ZFNs in the sequenced genomes of several model organisms. ZFNGenome has a user-friendly interface and is seamlessly integrated with other publicly available Zinc Finger Consortium resources, such as ZiFiT, ZiFDB, and ZiFOpT. ZFNGenome should be a valuable resource for scientists and clinicians who wish to exploit the powerful technologies for genome modification now available as a result of recent developments in ZFP design and engineering.
Availability and Requirements
We thank members of our research groups for helpful discussions and Chris Campbell for assistance with 64 bit conversion. We also thank Jo Anne Powell-Coffman, Jeff Essner, David Wright, Ben Lewis and Rasna Walia for critical comments on the ZFNGenome server and the manuscript. This work was supported by NSF DBI 0923827 to DFV, DD, and JKJ, NIH grants R01 GM069906 and R01 GM088040 to JKJ, The Roy J. Carver Charitable Trust 08-3185 to CRC, and the Center for Integrated Animal Genomics at Iowa State University to DD. JDS was supported by the NIH T32 CA009216.
- 5. Doyon Y, McCammon JM, Miller JC, Faraji F, Ngo C, Katibah GE, Amora R, Hocking TD, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Amacher SL: Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases. Nat Biotechnol. 2008, 26 (6): 702-708. 10.1038/nbt1409.
- 7. Geurts AM, Cost GJ, Freyvert Y, Zeitler B, Miller JC, Choi VM, Jenkins SS, Wood A, Cui X, Meng X, Vincent A, Lam S, Michalkiewicz M, Schilling R, Foeckler J, Kalloway S, Weiler H, Menoret S, Anegon I, Davis GD, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Jacob HJ, Buelow R: Knockout rats via embryo microinjection of zinc-finger nucleases. Science. 2009, 325 (5939): 433. 10.1126/science.1172447.
- 9. Maeder ML, Thibodeau-Beganny S, Osiak A, Wright DA, Anthony RM, Eichtinger M, Jiang T, Foley JE, Winfrey RJ, Townsend JA, Unger-Wallace E, Sander JD, Muller-Lerch F, Fu F, Pearlberg J, Gobel C, Dassie JP, Pruett-Miller SM, Porteus MH, Sgroi DC, Iafrate AJ, Dobbs D, McCray PB, Cathomen T, Voytas DF, Joung JK: Rapid "open-source" engineering of customized zinc-finger nucleases for highly efficient gene modification. Mol Cell. 2008, 31 (2): 294-301. 10.1016/j.molcel.2008.06.016.
- 11. Perez EE, Wang J, Miller JC, Jouvenot Y, Kim KA, Liu O, Wang N, Lee G, Bartsevich VV, Lee YL, Guschin DY, Rupniewski I, Waite AJ, Carpenito C, Carroll RG, Orange JS, Urnov FD, Rebar EJ, Ando D, Gregory PD, Riley JL, Holmes MC, June CH: Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nat Biotechnol. 2008, 26 (7): 808-816. 10.1038/nbt1410.
- 12. Shukla VK, Doyon Y, Miller JC, DeKelver RC, Moehle EA, Worden SE, Mitchell JC, Arnold NL, Gopalan S, Meng X, Choi VM, Rock JM, Wu YY, Katibah GE, Zhifang G, McCaskill D, Simpson MA, Blakeslee B, Greenwalt SA, Butler HJ, Hinkley SJ, Zhang L, Rebar EJ, Gregory PD, Urnov FD: Precise genome modification in the crop species Zea mays using zinc-finger nucleases. Nature. 2009, 459 (7245): 437-441. 10.1038/nature07992.
- 15. Zhang F, Maeder ML, Unger-Wallace E, Hoshaw JP, Reyon D, Christian M, Li X, Pierick CJ, Dobbs D, Peterson T, Joung JK, Voytas DF: High frequency targeted mutagenesis in Arabidopsis thaliana using zinc finger nucleases. Proc Natl Acad Sci USA. 2010, 107 (26): 12028-12033. 10.1073/pnas.0914991107.
- 16. Zou J, Maeder ML, Mali P, Pruett-Miller SM, Thibodeau-Beganny S, Chou BK, Chen G, Ye Z, Park IH, Daley GQ, Porteus MH, Joung JK, Cheng L: Gene targeting of a disease-related gene in human induced pluripotent stem and embryonic stem cells. Cell Stem Cell. 2009, 5 (1): 97-110. 10.1016/j.stem.2009.05.023.
- 23. Mani M, Smith J, Kandavelou K, Berg JM, Chandrasegaran S: Binding of two zinc finger nuclease monomers to two specific sites is required for effective double-strand DNA cleavage. Biochem Biophys Res Commun. 2005, 334 (4): 1191-1197. 10.1016/j.bbrc.2005.07.021.
- 25. Hockemeyer D, Soldner F, Beard C, Gao Q, Mitalipova M, DeKelver RC, Katibah GE, Amora R, Boydston EA, Zeitler B, Meng X, Miller JC, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Jaenisch R: Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat Biotechnol. 2009, 27 (9): 851-857. 10.1038/nbt.1562.
- 26. Lombardo A, Genovese P, Beausejour CM, Colleoni S, Lee YL, Kim KA, Ando D, Urnov FD, Galli C, Gregory PD, Holmes MC, Naldini L: Gene editing in human stem cells using zinc finger nucleases and integrase-defective lentiviral vector delivery. Nat Biotechnol. 2007, 25 (11): 1298-1306. 10.1038/nbt1353.
- 34. Ramirez CL, Foley JE, Wright DA, Muller-Lerch F, Rahman SH, Cornu TI, Winfrey RJ, Sander JD, Fu F, Townsend JA, Cathomen T, Voytas DF, Joung JK: Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods. 2008, 5 (5): 374-375. 10.1038/nmeth0508-374.
- 38. Sander JD, Zaback P, Joung JK, Voytas DF, Dobbs D: Zinc Finger Targeter (ZiFiT): an engineered zinc finger/target site design tool. Nucleic Acids Res. 2007, 35 (Web Server): W599-605. 10.1093/nar/gkm349.
- 39. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12 (10): 1611-1618. 10.1101/gr.361602.
- 40. Fu F, Sander JD, Maeder M, Thibodeau-Beganny S, Joung JK, Dobbs D, Miller L, Voytas DF: Zinc Finger Database (ZiFDB): a repository for information on C2H2 zinc fingers and engineered zinc-finger arrays. Nucleic Acids Res. 2009, 37 (Database): D279-283. 10.1093/nar/gkn606.
- 42. Sander JD, Reyon D, Maeder ML, Foley JE, Thibodeau-Beganny S, Li X, Regan MR, Dahlborg EJ, Goodwin MJ, Fu F, Voytas DF, Joung JK, Dobbs D: Predicting success of oligomerized pool engineering (OPEN) for zinc finger target site sequences. BMC Bioinformatics. 2010, 11: 543. 10.1186/1471-2105-11-543.
- 43. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 2008, 36 (Database): D1009-1014.
- 44. Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, Millburn G, Osumi-Sutherland D, Schroeder A, Seal R, Zhang H: FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res. 2009, 37 (Database): D555-559. 10.1093/nar/gkn788.
- 46. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34 (Database): D108-110. 10.1093/nar/gkj143.
- 47. Bryne JC, Valen E, Tang MH, Marstrand T, Winther O, da Piedade I, Krogh A, Lenhard B, Sandelin A: JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 2008, 36 (Database): D102-106.
- 50. Mandell JG, Barbas CF: Zinc Finger Tools: custom DNA-binding domains for transcription factors and nucleases. Nucleic Acids Res. 2006, 34 (Web Server): W516-523. 10.1093/nar/gkl209.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.