Soybean transcription factor ORFeome associated with drought resistance: a valuable resource to accelerate research on abiotic stress resistance
Whole genome sequencing provides the most comprehensive collection of an organism’s genetic information. The availability of complete genome sequences is expected to have a dramatic impact on biology. However, to achieve this impact in crop improvement, significant effort is still required in functional genomics, including gene annotation, cloning, expression profiling, and functional validation.
Here we report our efforts in generating the first transcription factor (TF) open reading frame (ORF)eome resource associated with drought resistance in soybean (Glycine max), a major oil/protein crop grown worldwide. This study provides a highly annotated soybean TF-ORFeome associated with drought resistance. It contains information from experimentally verified protein-coding sequences (CDS), expression profiling under several abiotic stresses (drought, salinity, dehydration and ABA), and computationally predicted protein subcellular localization and cis-regulatory elements (CREs) analysis. All the information is available to plant researchers through a freely accessible and user-friendly database, Soybean Knowledge Base (SoyKB).
The soybean TF-ORFeome provides a valuable public resource for functional genomics studies, especially in the area of plant abiotic stresses. It will accelerate findings in the areas of abiotic stresses and lead to the generation of crops with enhanced resistance to multiple stresses.
Keywords: Soybean, Transcription factor, ORFeome, Abiotic stresses, Drought, Salinity, Dehydration, ABA, cis-regulatory element
Binding to DRE
Ethylene-responsive element binding protein
Expressed sequence tag
Fragments per kilobase of exon per million fragments mapped
Inducer of CBF expression
Myelocytomatosis oncogene homolog 2
Open reading frame
Polymerase chain reaction
Soybean Knowledge Base
Whole genome sequencing
Whole genome sequencing (WGS) provides the most comprehensive collection of an organism’s genetic information. Large-scale genome sequencing is expected to change the way in which biology has traditionally been conducted. The ever-decreasing cost of sequencing is ushering in a new era in plant genetic and genomic studies. By taking advantage of large data acquisition platforms, the genomes of more than 40 plants of agronomic importance have been sequenced so far. However, to realize the promise of WGS in research focused on crop improvement, significant efforts are required in functional genomics, including gene annotation, cloning, expression, and further functional analysis.
Knowledge of gene sequences and of the deduced protein sequences is very important in determining protein functions. In this process, large genomic resources such as expressed sequence tag (EST) databases, full-length complementary DNA (cDNA) libraries, and open reading frame (ORF) collections (ORFeomes) have played important roles. Although EST databases and computational predictions are useful, EST databases usually provide only partial transcribed sequences, which can be misleading, and automated computational predictions are not fully accurate. Full-length cDNA libraries contain full ORFs plus the 5′ and 3′ untranslated regions (UTRs), which allows large-scale functional screening in various fields of biology. However, cDNA libraries have clear drawbacks: interference from the 5′ and 3′ UTRs, and low coverage of total gene transcripts. ORFeome collections not only overcome these problems, but also have additional advantages. By using gene-specific primers, genuine full ORFs can be obtained, which ensures high coverage and no interference from the 5′ or 3′ UTRs. Recombination-based cloning techniques, including Gateway cloning, have superseded conventional “cut-and-paste” techniques and greatly expedited high-throughput gene cloning. Furthermore, access to ORF cDNA clones facilitates various functional studies of genes and their corresponding proteins, because ORFs can be transferred via LR reactions from Entry clones into Gateway-compatible expression vectors. ORFeome resources have been successfully applied in genome annotation, genome-wide protein localization, metabolic structure studies, proteomics, comparative functional genomics, and global mapping of protein-protein and DNA-protein interactions [6-11]. However, despite all the achievements made so far by plant scientists in building various ORFeomic resources, most existing ORFeomes are too general. 
This leads to a situation wherein researchers working in a specific area (e.g., drought research) have to spend a significant amount of time finding information in their area of interest.
Soybean (Glycine max) is an important cash crop, widely grown for its high protein and oil content, beneficial phytochemicals, and use in biodiesel production. However, its growth and grain yield are highly affected by soil water availability. Drought stresses have caused significant yield losses worldwide [12, 13]. Plants respond and adapt to drought stress conditions with an array of molecular, biochemical, and physiological alterations. Although the entire soybean genome was sequenced several years ago, the exact transcript structures of the majority of its protein-coding genes remain experimentally unverified. As such, there is an urgent need in the soybean community for ORFeome clones of protein-coding genes. Since TFs are master regulators of many, if not all, biological processes, such as development, growth, cell division, and responses to environmental stimuli, our efforts in this study are focused on generating the first transcription factor (TF) ORFeome resource associated with drought resistance in soybean. The soybean TF-ORFeome information has been deposited in the Soybean Knowledge Base (SoyKB) [15-17] and is available to the global research community for comprehensive functional characterization. This will greatly accelerate findings in the area of drought resistance research.
Results and discussion
Soybean TF selection and cloning
Sequence analysis of cloned TFs
As expected, most (90.3 %) of the soybean TF ORFeome clones matched the gene annotation in the public database Phytozome (v 9.1) based on sequencing results. For clones showing sequence differences, two independent RT-PCRs were performed to make certain that the sequence differences were not caused by errors during RT and/or PCR. At least two clones for each ORF were used for sequence verification. Nevertheless, our sequence analysis revealed differences in 20 clones, or 9.7 % of the total TF-ORFs cloned in this study (Additional file 3). qRT-PCR analysis of expression changes of these genes upon mild drought stress treatment was conducted (Additional file 2B), and the results showed that most of them were positively regulated by the stress at the transcriptional level. The sequence differences might be due to alternative splicing, nucleotide replacement, insertion, or deletion.
There are several possible explanations for the sequence discrepancies in this study. Nearly 75 % of soybean genes have paralogs, which probably arose from two whole-genome duplication events that occurred approximately 59 and 13 million years ago, respectively. Aligning the discrepant sequences back to the soybean genome excluded the possibility that they are one copy of the many duplicated genes, although it is still possible that the duplicated genes are located in un-sequenced gaps. Another cause of sequence differences in the ORFs might be the genomic heterogeneity of Williams 82, which leads to intra-cultivar variation among individuals. However, there is little chance of error from RT-PCR or sequencing, given the stringent conditions set for these processes and the use of multiple clones for sequence verification, as stated above.
Expression profiles of selected TFs from ORFeome collection under drought, dehydration, salt and ABA conditions
Analysis of gene expression in different tissues and under different conditions is a useful way to predict gene functions. By searching the available whole genome profiling data, gene expression profiles of the TFs in 7 soybean tissues/organs (Additional file 4, data are from ) and under water deficit conditions (Additional file 1) were collected. Both the tissue expression patterns and the expression fold changes under water deficit conditions revealed a large amount of variation among different TFs, suggesting their diverse functions during soybean growth, development, and adaptation to water deficit conditions.
Discovery of cis-regulatory elements (CREs) in soybean TF promoters
Although alternative mechanisms of gene expression regulation exist, the control of gene transcription via CREs in promoters is still a primary mode of gene expression regulation. Our interest in abiotic stress prompted us to investigate abiotic stress-responsive CREs, which may be bound and regulated by other TFs, in the upstream regions of the genes in our soybean TF-ORFeome collection. A total of 21 CREs responsive to abiotic stresses were identified among 200 TF promoters (Additional file 3). However, over-representation analysis did not show any of these CREs to be significantly enriched in the 1 kb promoters of the 200 TFs.
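The text does not state which statistical test was used for the over-representation analysis. A minimal sketch of one common choice, a one-sided hypergeometric test, is given below. The function name and all counts in the example are hypothetical and are not taken from this study; only the number of CREs tested (21) and the promoter set size (200) come from the text, and the Bonferroni correction is likewise an assumption.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric variable X.

    N: promoters in the background set (e.g. all annotated genes)
    K: background promoters that contain the CRE
    n: promoters in the set of interest (e.g. the 200 TF promoters)
    k: promoters in the set of interest that contain the CRE
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = hypergeom_sf(k=30, N=50000, K=5000, n=200)
    # Assumed Bonferroni threshold for testing 21 CREs.
    alpha = 0.05 / 21
    print(f"p = {p:.4g}, enriched at corrected alpha: {p < alpha}")
```

A CRE would be called over-represented only if its p-value falls below the corrected threshold; a result like the one reported above corresponds to no CRE passing it.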
Integration of TF-ORFeome resource into SoyKB website
The TF-ORFeome data have been incorporated into SoyKB [15-17]. The data can be directly accessed via the URL  after registration. The genes have been linked to the gene card pages (Additional file 5A, B), where users can access other relevant genomic information (Additional file 5C) and multi-omics expression datasets (Additional file 5D, E) available in SoyKB. The motif locations can also be browsed in tabular format or with the graphical Motif Viewer tool. All the results can be downloaded as a CSV file.
Subcellular localization prediction of cloned TF-ORFs
Protein subcellular localizations are closely linked to their biological functions, and precisely predicting protein subcellular localizations is important for gene function prediction and genome annotation. To maximize the prediction accuracy, predictions from several publicly available tools [36-39] were carefully analyzed, compared, and combined. Consistent with the putative function of the cloned genes in this study as TFs, most of them were predicted to reside in the nucleus (Additional file 3). However, Glyma13g27280.1 was predicted to be localized in the nucleus or chloroplast. Multiple or altered subcellular localizations of proteins are believed to be associated with multiple or altered functions, which have been observed in both mammals and plants [40-44]. Several lines of evidence also showed that nucleus-encoded TFs might regulate gene expression, directly or indirectly, in other organelles such as mitochondria and chloroplasts [40, 44]. Furthermore, with the aid of another protein, a TF is able to shuttle dynamically between the nucleus and cytoplasm. It is, therefore, possible that Glyma13g27280.1 functions in both organelles. However, experimental investigation is needed to validate this assumption.
Application of soybean TF-ORFeome resources to stresses studies
Since the results presented here are from various comprehensive analyses, plant biologists, especially researchers in the field of abiotic stresses, may find our genomic resources very informative as a starting point in their search for candidate genes. Two examples are given below to demonstrate how the function of a given soybean TF can be inferred by putting all the data together. Glyma06g17420, one TF from our ORFeome collection, is annotated as a member of the bHLH superfamily, of which 393 members have been characterized in silico in the soybean genome; until now, none has been functionally characterized in terms of drought resistance. Its predicted subcellular localization in the nucleus suggests that it might function as a TF (Additional file 3). Its expression was highly up-regulated in shoots upon drought and ABA treatments (Figs. 2 and 6), indicating a role in responding to drought, probably through an ABA-dependent pathway. Since it has little similarity with the well-characterized MYC2 or ICE1, which are positive regulators of drought tolerance [47, 48], exploring the possible novel function of Glyma06g17420 might be interesting.
NAC is one of the largest plant-specific gene families, with 152 genes in soybean, 58 of which are putative stress-responsive genes. Ectopic expression of several of these stress-responsive genes in Arabidopsis enhanced resistance to salinity and freezing. According to our qRT-PCR analysis, Glyma13g35550 (GmNAC101) was highly up-regulated by drought, dehydration, salt, and ABA. Recent studies reported that higher expression of this gene was detected in both shoots and roots of the drought-tolerant cultivar DT51 than in the drought-sensitive cultivar MTD720 under drought conditions [51, 52]. More interestingly, a total of 10 CRE motifs were identified within its 1 kb promoter sequence, indicating that this gene is under complex regulation. All of this evidence suggests that Glyma13g35550 is a potential candidate for in-depth investigation.
The soybean TF-ORFeome provides a valuable public resource for functional genomics studies, especially in the area of plant abiotic stresses. It will accelerate findings in this area and support the generation of crops with enhanced resistance to multiple stresses.
Plant growth, treatments, and tissue collections
Soybean (cv. Williams 82) seedlings were grown in 4-gallon pots containing a mixture of turface and sand (3:1) under the same growth chamber conditions. Drought treatments were initiated by withholding water at the VC stage (the stage at which the cotyledons and unifoliates are fully expanded), while water was provided daily to the well-watered control seedlings. The water potentials for mild and moderate drought were −7 bar and −13 bar, respectively. Dehydration and salt treatments were conducted as previously described. For ABA treatments, two-week-old seedlings were irrigated and sprayed with 200 μM ABA (or a mock solution without ABA as control) and incubated for 0.5, 1, 3, or 5 h. After treatment, tissues were harvested, frozen immediately in liquid nitrogen, and stored at −80 °C. All samples were collected in biological triplicates.
RNA isolation and qRT-PCR
Total RNA isolation and qRT-PCR were carried out as described previously. Three biological and two technical replicates were used in all the qPCR experiments. Gene-specific primers (Additional file 6) for qRT-PCR were designed using Primer3 (version 0.4.0). The efficacy of the primers for qRT-PCR was tested and desirable results were obtained. The soybean Ubiquitin3 gene (Glyma20g27950.1) was used as an internal control for all qRT-PCR analyses.
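The quantification method is not stated here, only that an internal control gene was used. As an illustration, the sketch below assumes the widely used 2^-ΔΔCt calculation with roughly 100 % primer efficiency; the function name and every Ct value are hypothetical, and the reference gene plays the role that Ubiquitin3 has in this study.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs. control) by 2^-ddCt.

    Each argument is a list of Ct values from replicate reactions.
    Assumes ~100 % amplification efficiency for both primer pairs.
    """
    # Normalise the target gene to the reference gene in each condition.
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # Compare the normalised values between conditions.
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values for three biological replicates.
    fold = relative_expression(
        ct_target_treated=[22.1, 21.9, 22.0],
        ct_ref_treated=[18.0, 18.1, 17.9],
        ct_target_control=[24.0, 24.2, 23.8],
        ct_ref_control=[18.0, 17.9, 18.1],
    )
    print(f"fold change: {fold:.2f}")
```

A fold change above 1 indicates up-regulation under treatment relative to the control; if primer efficiencies differ markedly from 100 %, an efficiency-corrected model would be more appropriate than this sketch.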
Soybean TF-ORF gene cloning
PCR was performed using Phusion high-fidelity DNA polymerase (Thermo Scientific, Pittsburgh, PA, USA). PCR products were purified with a gel extraction kit (Epoch Life Sciences, Sugar Land, TX, USA), cloned into the pENTR™/D-TOPO® vector or the pDONR™/Zeo vector (Invitrogen, Carlsbad, CA, USA), and verified by sequencing using M13 forward and reverse primers, with additional gene-specific primers if necessary. Primers were designed based on sequence information obtained from Phytozome (v. 9.1).
TF promoter putative CRE analysis
One thousand base pairs (bp) of the TF promoter sequences retrieved from Phytozome (version 9.1) were subjected to CRE analysis through DNA Pattern Search by referring to the literature [56, 57] and the Stress Responsive Transcription Factor Database (STIFDB).
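The pattern search described above can be approximated in a few lines. The sketch below is not the DNA Pattern Search tool itself; it is a minimal stand-in that scans both strands of a promoter for a motif written in IUPAC codes. The two motifs in the example, ACGTG (an ABRE-like core) and RCCGAC (a DRE-like core), are commonly cited consensus sequences used here for illustration only and are not necessarily among the 21 CREs of this study. The promoter string is invented.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string, including IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(promoter, motif):
    """Return sorted (0-based start, strand) hits of an IUPAC motif.

    Minus-strand hits are reported at their plus-strand coordinates.
    Overlapping hits are found by using a zero-width lookahead.
    """
    promoter = promoter.upper()
    motif = motif.upper()
    hits = set()
    for strand, m in (("+", motif), ("-", reverse_complement(motif))):
        body = "".join(IUPAC[base] for base in m)
        pattern = re.compile("(?=(" + body + "))")
        for match in pattern.finditer(promoter):
            hits.add((match.start(), strand))
    return sorted(hits)


if __name__ == "__main__":
    promoter = "TTACGTGTTGTCGGCAA"  # invented sequence
    for name, motif in (("ABRE-like", "ACGTG"), ("DRE-like", "RCCGAC")):
        print(name, find_motif(promoter, motif))
```

Running the same scan over each 1 kb promoter and tabulating hits per gene yields the kind of motif-location table that can then be browsed or tested for enrichment.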
TF subcellular localization prediction
Deduced TF protein sequences from experimentally verified ORF sequences were used to predict the subcellular localization of the TF proteins with on-line tools, including WoLF PSORT, PlantLoc, Cell-PLoc, and Euk-mPLoc2.0.
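How the outputs of the four tools were combined is not specified. One simple possibility, shown here purely as an assumption, is a majority vote that keeps all tied locations, so that an ambiguous case such as a nucleus/chloroplast split is reported rather than hidden. The function name and the prediction values are hypothetical; only the tool names come from the text.

```python
from collections import Counter


def consensus_localization(predictions):
    """Return the most frequently predicted location(s), sorted.

    predictions: dict mapping tool name -> predicted compartment.
    Ties are kept, so the result may contain more than one location.
    """
    counts = Counter(predictions.values())
    top = max(counts.values())
    return sorted(loc for loc, n in counts.items() if n == top)


if __name__ == "__main__":
    # Hypothetical tool outputs for a single protein.
    clear_case = {
        "WoLF PSORT": "nucleus",
        "PlantLoc": "nucleus",
        "Cell-PLoc": "nucleus",
        "Euk-mPLoc2.0": "chloroplast",
    }
    tied_case = {
        "WoLF PSORT": "nucleus",
        "PlantLoc": "chloroplast",
        "Cell-PLoc": "nucleus",
        "Euk-mPLoc2.0": "chloroplast",
    }
    print(consensus_localization(clear_case))
    print(consensus_localization(tied_case))
```

Tied results are the cases that most warrant manual inspection or experimental follow-up.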
Availability of supporting data
The data sets supporting the results of this article are included within the article and its additional files.
We would like to thank Dr. Scott Jackson (University of Georgia, USA) for helpful discussion on the source of soybean ORF sequence discrepancy. We thank Theresa Musket for carefully editing this manuscript. We thank Jiaojiao Wang for adding the data to SoyKB. We thank the two anonymous reviewers for their constructive comments, which helped us to improve the manuscript. Funding support from United Soybean Board Grant number 1204 (High-Impact Research for Soybean Improvement Using Genetics and Genomics) is gratefully acknowledged.
- 17. Soybean Knowledge Base (SoyKB). [http://soykb.org].
- 24. Phytozome (v 9.1). [http://www.phytozome.net/].
- 27. Urbaniak GC, Plous S. Research Randomizer (Version 4.0). [http://www.randomizer.org/]. Retrieved on 22 June 2013.
- 28. Valliyodan B, Nguyen HT. Genomics of Abiotic Stress in Soybean. In: Genetics and Genomics of Soybean. Edited by Stacey G, vol. 2: Springer; 2008. p. 343–72.
- 35. Chai C, Wang Y, Joshi T, Valliyodan B, Prince S, Michel L, et al. Soybean transcription factor ORFeome associated with drought. [http://soykb.org/TF_ORFeome/].
- 49. Le DT, Nishiyama R, Watanabe Y, Mochida K, Yamaguchi-Shinozaki K, Shinozaki K, et al. Genome-wide survey and expression analysis of the plant-specific NAC transcription factor family in soybean during development and dehydration stress. DNA Res. 2011;18(4):263–76.
- 54. Primer3 (version 0.4.0). [http://frodo.wi.mit.edu/].
- 55. DNA Pattern Search. [http://www.geneinfinity.org/sms/sms_DNApatterns.html].
- 58. Stress Responsive Transcription Factor Database (STIFDB, V2.0). [http://caps.ncbs.res.in/stifdb].
- 59. Yamaguchi M, Valliyodan B, Zhang J, Lenoble ME, Yu O, Rogers EE, et al. Regulation of growth response to water stress in the soybean primary root. I. Proteomic analysis reveals region-specific regulation of phenylpropanoid metabolism and control of free iron in the elongation zone. Plant Cell Environ. 2010;33(2):223–43.
- 60. BAR HeatMapper Plus Tool. [http://bar.utoronto.ca/ntools/cgi-bin/ntools_heatmapper_plus.cgi].
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.