RetroSpect, a New Method of Measuring Gene Regulatory Evolution Rates Using Co-mapping of Genomic Functional Features with Transposable Elements

  • Daniil Nikitin
  • Maxim Sorokin
  • Victor Tkachev
  • Andrew Garazha
  • Alexander Markov
  • Anton Buzdin


Transposable elements (TEs) are selfish genetic sequences that proliferate in host genomes by inserting copies of themselves at new genomic locations. TEs reside in the genomes of all groups of living organisms. TE sequences may be recruited by host cells to serve as regulatory sites for neighboring genes. These regulatory sites can be transcription factor binding sites (TFBS), histone modification loci, DNase I hypersensitivity sites, etc. Insertion of a TE in the neighborhood of a gene shifts the balance of regulatory sequences that control the functioning of that gene. The more regulatory sites that can be identified within gene-proximal TEs, the faster the regulation of that gene is expected to have evolved. We proposed a method for measuring evolutionary rates of gene regulation based on relative quantitation of regulatory sites located within TEs next to gene transcriptional start sites. It allows regulatory evolution to be interrogated in organisms with TE-rich genomes. This method, termed RetroSpect, was first applied to study human gene evolution using co-mapping of TFBS with human retroelements (REs). REs are a subgroup of TEs that were active in mammals before and after their radiation. We characterized human genes and molecular pathways either enriched or deficient in RE-linked TFBS regulation for 563 transcription factors in thirteen human cell lines. We found that the major groups enriched in RE-linked regulation are involved in gene control by microRNAs, olfaction, color vision, fertilization, cellular immune response, amino acid and fatty acid metabolism, and detoxification. The deficient groups were involved in protein translation, RNA transcription and processing, chromatin organization, and molecular signaling.
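The idea of relative quantitation can be illustrated with a minimal sketch. It is not the authors' scoring formula, which the abstract does not give. It assumes a single chromosome, TFBS peaks reduced to single coordinates, half-open RE intervals, and an arbitrary 5 kb window around each transcriptional start site. The function name `re_linked_fraction` and all coordinates are hypothetical.

```python
def re_linked_fraction(tss, tfbs_positions, re_intervals, window=5000):
    """Share of TFBS near a TSS that lie inside retroelements.

    Illustrative assumption only: the window size and the simple
    ratio are not taken from the RetroSpect publication.
    """
    lo, hi = tss - window, tss + window
    # TFBS that fall within the gene neighborhood
    nearby = [p for p in tfbs_positions if lo <= p <= hi]
    if not nearby:
        return None  # no TFBS near this gene, so the ratio is undefined
    # Count nearby TFBS covered by at least one RE interval [start, end)
    in_re = sum(
        any(start <= p < end for start, end in re_intervals)
        for p in nearby
    )
    return in_re / len(nearby)


# Toy data (hypothetical coordinates on one chromosome)
genes = {"geneA": 10_000, "geneB": 50_000}
tfbs = [9_000, 9_500, 10_200, 12_000, 49_000, 51_000]
retroelements = [(8_900, 9_600), (11_900, 12_100)]

scores = {name: re_linked_fraction(tss, tfbs, retroelements)
          for name, tss in genes.items()}
print(scores)
```

Under these assumptions a gene with a higher fraction would be read as having more RE-linked regulation. A genome-scale analysis would need interval indexing and per-transcription-factor normalization, both omitted here.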


Genome evolution; Gene regulation; Human genetics; Transcription factor binding sites; Transposable elements; Retrotransposons; Molecular pathways; ChIP-seq; Omics approach in evolutionary biology



We acknowledge grants from Amazon and Microsoft Azure for cloud-based computations, which helped us complete this study. We thank the Oncobox/OmicsWay research program in machine learning and digital oncology for providing access to software and pathway databases. The authors (A.B. and M.S.) were supported by the Russian Science Foundation grant no. 18-15-00061.

Conflicts of Interests

The authors declare that they have no competing interests.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Daniil Nikitin (1, 2, 4)
  • Maxim Sorokin (1, 3)
  • Victor Tkachev (2)
  • Andrew Garazha (2)
  • Alexander Markov (4)
  • Anton Buzdin (1, 2, 3), corresponding author

  1. I.M. Sechenov First Moscow State Medical University, Moscow, Russia
  2. Omicsway Corp., Walnut, USA
  3. Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
  4. Faculty of Biology, Moscow State University, Moscow, Russia
