CRM Discovery Beyond Model Insects

  • Majid Kazemian
  • Marc S. Halfon
Part of the Methods in Molecular Biology book series (MIMB, volume 1858)


Although sequenced insect genomes now number in the hundreds, little is known about gene regulatory sequences in any species other than the well-studied Drosophila melanogaster. We provide here a detailed protocol for using SCRMshaw, a computational method for predicting cis-regulatory modules (CRMs, also known as “enhancers”) in sequenced insect genomes. SCRMshaw is effective for CRM discovery throughout the range of holometabolous insects and potentially in even more diverged species, with true-positive prediction rates of 75% or better. The minimal requirements for using SCRMshaw are a genome sequence and training data in the form of known Drosophila CRMs; a comprehensive set of the latter can be obtained from the SCRMshaw download site. For basic applications, a user with only modest computational know-how can run SCRMshaw on a desktop computer. SCRMshaw can be run with a single, narrow set of training data to predict CRMs regulating a specific pattern of gene expression, or with multiple sets of training data covering a broad range of CRM activities to provide an initial rough regulatory annotation of a complete, newly sequenced genome.

Key words

Non-model insects; Regulatory genomics; Transcriptional gene regulation; Genome annotation; Enhancer prediction



We thank Kushal Suryamohan for comments on the manuscript. This work was supported by USDA grant 2012-67013-19361 (MSH) and NIH grant 5K22HL125593-02 (MK).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Departments of Biochemistry and Computer Science, Purdue University, West Lafayette, USA
  2. Departments of Biochemistry, Biomedical Informatics, and Biological Sciences, University at Buffalo-State University of New York, Buffalo, USA
  3. NY State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, USA
  4. Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Comprehensive Cancer Center, Buffalo, USA
