Plant Molecular Biology, Volume 57, Issue 3, pp 445–460

Evaluation of five ab initio gene prediction programs for the discovery of maize genes

  • Hong Yao
  • Ling Guo
  • Yan Fu
  • Lisa A. Borsuk
  • Tsui-Jung Wen
  • David S. Skibbe
  • Xiangqin Cui
  • Brian E. Scheffler
  • Jun Cao
  • Scott J. Emrich
  • Daniel A. Ashlock
  • Patrick S. Schnable
Article

Abstract

Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. Two of these programs, GeneMark.hmm and GENSCAN, had been trained for maize; FGENESH had been trained for monocots (including maize); and the others had been trained for rice or Arabidopsis. Initial evaluations were conducted using eight maize genes (gl8a, pdc2, pdc3, rf2c, rf2d, rf2e1, rth1, and rth3) whose sequences had not been released to the public before this evaluation was conducted. The significant advantage of this data set is that these genes could not have been included in the training sets of the prediction programs. FGENESH yielded the most accurate predictions and GeneMark.hmm the second most accurate. The five programs were used in conjunction with RT-PCR to identify and establish the structures of two new genes in the a1-sh2 interval of the maize genome. FGENESH, GeneMark.hmm and GENSCAN were then tested on a larger data set (data set 2) consisting of 1353 maize assembled genomic islands (MAGIs) that had been aligned to ESTs. On this data set, FGENESH, GeneMark.hmm and GENSCAN correctly predicted gene models in 773, 625, and 371 MAGIs, respectively.
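To put the data set 2 counts on a common scale, the short sketch below converts them to percentages. The counts and the total of 1353 MAGIs come from the abstract; the variable and function names are illustrative only and are not part of the original study.

```python
# Counts taken from the abstract: MAGIs with a correctly predicted gene model,
# out of the 1353 MAGIs in data set 2.
TOTAL_MAGIS = 1353
correct_models = {"FGENESH": 773, "GeneMark.hmm": 625, "GENSCAN": 371}


def percent_correct(correct: int, total: int) -> float:
    """Return the percentage of MAGIs with a correctly predicted gene model."""
    return 100.0 * correct / total


# Hypothetical helper dictionary: program name -> percentage of correct models.
percentages = {
    name: percent_correct(count, TOTAL_MAGIS)
    for name, count in correct_models.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}% of {TOTAL_MAGIS} MAGIs")
```

By this arithmetic, FGENESH is correct on roughly 57% of the MAGIs, GeneMark.hmm on roughly 46%, and GENSCAN on roughly 27%.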

Keywords

ab initio gene prediction, a1-sh2 interval, maize, maize assembled genomic islands, maize genomic survey sequences

Abbreviations

  • AE: actual exon
  • CC: correlation coefficient
  • FN: false negative
  • FP: false positive
  • GSSs: genome survey sequences
  • HC: high Cot
  • MAGIs: maize assembled genomic islands
  • ME: missing exon
  • MF: methylation filtration
  • OE: overlapped exon
  • PE: partial exon
  • RACE: Rapid Amplification of cDNA Ends
  • SN: sensitivity
  • SP: specificity
  • TE: true exon
  • TP: true positive
  • WE: wrong exon
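The abbreviations SN, SP and CC refer to accuracy measures built from TP, FP and FN counts. The abstract does not give their formulas, so the sketch below assumes the definitions commonly used in gene-prediction benchmarks: sensitivity as TP / (TP + FN), specificity as TP / (TP + FP) (what other fields call precision), and CC as the Matthews-style correlation coefficient, which also needs a true-negative count (TN, not listed among the abbreviations). The function names and the example counts are hypothetical.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Assumed definition: fraction of actual coding positions that were predicted."""
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Assumed gene-prediction convention: fraction of predictions that are correct."""
    return tp / (tp + fp)


def correlation_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Assumed Matthews-style correlation coefficient; NaN if it is undefined."""
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denominator == 0:
        return math.nan
    return (tp * tn - fn * fp) / denominator


# Hypothetical nucleotide-level counts, for illustration only.
tp, tn, fp, fn = 80, 880, 20, 20
print(f"SN = {sensitivity(tp, fn):.2f}")
print(f"SP = {specificity(tp, fp):.2f}")
print(f"CC = {correlation_coefficient(tp, tn, fp, fn):.2f}")
```

CC is useful as a single summary because it penalizes both over-prediction and under-prediction, whereas SN and SP each capture only one of the two.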


Copyright information

© Springer 2005

Authors and Affiliations

  • Hong Yao (1, 4)
  • Ling Guo (1, 6)
  • Yan Fu (1, 4)
  • Lisa A. Borsuk (1, 6)
  • Tsui-Jung Wen (2)
  • David S. Skibbe (1, 5)
  • Xiangqin Cui (1, 4, 9)
  • Brian E. Scheffler (8)
  • Jun Cao (1, 4)
  • Scott J. Emrich (6)
  • Daniel A. Ashlock (3, 6)
  • Patrick S. Schnable (1, 2, 4, 5, 6, 7)
  1. Department of Genetics, Development, and Cell Biology, Iowa State University, Ames
  2. Department of Agronomy, Iowa State University, Ames
  3. Department of Mathematics, Iowa State University, Ames
  4. Interdepartmental Graduate Programs in Genetics, Iowa State University, Ames
  5. Department of Molecular, Cellular and Developmental Biology, Iowa State University, Ames
  6. Department of Electrical and Computer Engineering and Department of Bioinformatics and Computational Biology, Iowa State University, Ames
  7. Center for Plant Genomics, Iowa State University, Ames
  8. Mid South Area Genomics Facility, USDA-ARS, Stoneville, USA
  9. Department of Biostatistics, Birmingham, USA
