Correct modeling of protein-coding genes based on genome and cDNA data is a prerequisite for functional studies. Various programs, such as MAKER, Cufflinks, Oases, and Trinity, have been developed for this purpose, each with its own advantages and drawbacks. Manually integrating the different models for a single gene is cumbersome, and it becomes a daunting task for the 14,000–18,000 genes of a typical holometabolous insect. We developed methods to evaluate the output of MAKER, Cufflinks, Oases, and Trinity and to select the best models, which constitute the MCOT1.0 set for Manduca sexta, a biochemical model insect. To apply these methods to other organisms, we improved the algorithm (designated MCuNovo Gene Selector) and automated the data processing. In this chapter, we describe the background of the algorithm's development and explain how to prepare and run the program.
This study was supported by NIH grants GM58634 and AI112662. This work was approved for publication by the Director of the Oklahoma Agricultural Experiment Station and supported in part under project OKLO2450.