GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data

  • Jens Keilwagen
  • Frank Hartung
  • Jan Grau
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1962)

Abstract

GeMoMa is a homology-based gene prediction program that predicts gene models in target species based on gene models in evolutionarily related reference species. GeMoMa utilizes amino acid sequence conservation, intron position conservation, and RNA-seq data to accurately predict protein-coding transcripts. Furthermore, GeMoMa supports combining predictions based on several reference species, which allows high-quality annotations from different reference species to be transferred to a target species. Here, we present a detailed description of the GeMoMa modules and the GeMoMa pipeline and show how they can be used on the command line to address particular biological problems.

Key words

Gene prediction, Homology, Intron position conservation, RNA-seq, Open-source

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Quedlinburg, Germany
  2. Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
