Integrated Modeling of Structural Genes Using MCuNovo

  • Xiaolong Cao
  • Haobo Jiang (corresponding author)
Part of the Methods in Molecular Biology book series (MIMB, volume 1858)


Correct modeling of protein-coding genes based on genome and cDNA data is a prerequisite for functional studies. Various programs such as MAKER, Cufflinks, Oases, and Trinity have been developed, each with advantages and drawbacks. Manually integrating the different models of a single gene is cumbersome, and it becomes a daunting task for the 14,000–18,000 genes of a typical holometabolous insect. We developed methods to evaluate the output of MAKER, Cufflinks, Oases, and Trinity and to select the best models to constitute the MCOT1.0 set for Manduca sexta, a biochemical model insect. To apply these methods to other organisms, we improved the algorithm (designated MCuNovo Gene Selector) and automated the data processing. In this chapter, we describe the background of the algorithm's development and explain how to prepare and run the program.

Key words

Insect, Genomics, Transcriptome, Gene modeling, Python, Arthropod



This study was supported by NIH grants GM58634 and AI112662. This work was approved for publication by the Director of the Oklahoma Agricultural Experiment Station and supported in part under project OKLO2450.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, USA
