Prediction and Integration of Regulatory and Protein–Protein Interactions

  • Duangdao Wichadakul
  • Jason McDermott
  • Ram Samudrala
Part of the Methods in Molecular Biology book series (MIMB, volume 541)


Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. Although genome-wide analyses of transcriptional regulatory networks are available, they are limited to model organisms such as yeast (1) and worm (2). Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method that integrates various data sets to predict TRIs for 54 organisms in the Bioverse (3). We describe how to compile data sets from different sources, how to handle their various formats and identifiers, and how to predict TRIs from the compiled data sets using a homology-based approach. The integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein subcellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. Integrating experimentally verified and predicted TRIs with other known protein–protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, all of which are essential for exploring functional genomics and systems biology.
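The homology-based transfer summarized above can be illustrated with a minimal sketch. It is not the chapter's implementation or the Bioverse pipeline: the function name `predict_tris`, the identifiers, the 0.4 identity cutoff, and the use of the lower of the two identities as a score are all assumptions chosen for illustration. The idea shown is only that a known (transcription factor, target gene) pair in a source organism is transferred to a target organism when both partners have sufficiently similar homologs there.

```python
from collections import defaultdict


def predict_tris(known_tris, homologs, min_identity=0.4):
    """Transfer known TRIs to another organism through homologs.

    known_tris: iterable of (tf_id, gene_id) pairs in the source organism.
    homologs: iterable of (source_id, target_id, identity) tuples, where
        identity is a fraction between 0 and 1.
    min_identity: hypothetical cutoff; not a value taken from the chapter.
    Returns a dict mapping predicted (tf, gene) pairs to a score.
    """
    # Index homologs by source identifier, keeping only pairs above the cutoff.
    by_source = defaultdict(list)
    for source_id, target_id, identity in homologs:
        if identity >= min_identity:
            by_source[source_id].append((target_id, identity))

    predictions = {}
    for tf, gene in known_tris:
        for tf_hom, tf_ident in by_source.get(tf, []):
            for gene_hom, gene_ident in by_source.get(gene, []):
                # Assumed scoring: the weaker of the two homology links.
                score = min(tf_ident, gene_ident)
                key = (tf_hom, gene_hom)
                # Keep the best-supported transfer for each predicted pair.
                if score > predictions.get(key, 0.0):
                    predictions[key] = score
    return predictions


# Toy data with made-up identifiers.
known = [("TF_A", "GENE_X"), ("TF_A", "GENE_Y")]
homs = [
    ("TF_A", "tfA_new", 0.62),
    ("GENE_X", "geneX_new", 0.55),
    ("GENE_Y", "geneY_new", 0.31),  # below the cutoff, so not transferred
]
predicted = predict_tris(known, homs)
```

In this toy run only the pair whose partners both pass the cutoff is transferred. A real pipeline would also need the identifier mapping and the supporting evidence the abstract lists (binding sites, promoter sequences, subcellular localization, protein families), none of which is modeled here.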

Key words

Regulog, interolog, protein–DNA interaction prediction, transcriptional regulatory interaction (TRI) prediction, protein–protein interaction (PPI) prediction, homology-based approach, transferability of homologs



This work was supported in part by a Searle Scholar Award and NSF Grant DBI-0217241 to R.S., and by the University of Washington’s Advanced Technology Initiative in Infectious Diseases. It was also supported in part by the National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Thailand.


  1. Lee, T.I., et al., Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 2002. 298(5594): 799–804.
  2. Deplancke, B., et al., A gene-centered C. elegans protein-DNA interaction network. Cell, 2006. 125(6): 1193–1205.
  3. McDermott, J., et al., BIOVERSE: Enhancements to the framework for structural, functional and contextual modeling of proteins and proteomes. Nucl. Acids Res., 2005. 33(suppl_2): W324–W325.
  4. Caron, H., et al., The Human Transcriptome Map reveals a clustering of highly expressed genes in chromosomal domains. Science, 2001. 291: 1289–1292.
  5. Shen-Orr, S.S., et al., Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet., 2002. 31(1): 64–68.
  6. Martinez-Antonio, A. and J. Collado-Vides, Identifying global regulators in transcriptional regulatory networks in bacteria. Curr. Opin. Microbiol., 2003. 6(5): 482–489.
  7. Harbison, C.T., et al., Transcriptional regulatory code of a eukaryotic genome. Nature, 2004. 431(7004): 99.
  8. Proft, M., et al., Genomewide identification of Sko1 target promoters reveals a regulatory network that operates in response to osmotic stress in Saccharomyces cerevisiae. Eukaryot. Cell, 2005. 4(8): 1343–1352.
  9. Sharma, M.R., et al., Transcriptional networks in a rat model for nonalcoholic fatty liver disease: A microarray analysis. Exp. Mol. Pathol., 2006. [Epub ahead of print].
  10. Reymann, S. and J. Borlak, Transcriptome profiling of human hepatocytes treated with Aroclor 1254 reveals transcription factor regulatory networks and clusters of regulated genes. BMC Genomics, 2006. 7(1): 217.
  11. Makita, Y., et al., DBTBS: Database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics. Nucl. Acids Res., 2004. 32(suppl_1): D75–D77.
  12. Matys, V., et al., TRANSFAC(R) and its module TRANSCompel(R): Transcriptional gene regulation in eukaryotes. Nucl. Acids Res., 2006. 34(suppl_1): D108–D110.
  13. Salgado, H., et al., RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucl. Acids Res., 2006. 34(suppl_1): D394–D397.
  14. Segal, E., et al., Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet., 2003. 34(2): 166–176.
  15. Pilpel, Y., P. Sudarsanam, and G.M. Church, Identifying regulatory networks by combinatorial analysis of promoter elements. Nat. Genet., 2001. 29: 153–159.
  16. Yeger-Lotem, E., et al., Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. PNAS, 2004. 101(16): 5934–5939.
  17. Yu, T. and K.-C. Li, Inference of transcriptional regulatory network by two-stage constrained space factor analysis. Bioinformatics, 2005. 21(21): 4033–4038.
  18. Zhang, L., et al., Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J. Biol., 2005. 4(2): 6.
  19. Jiang, R., et al., Network motif identification in stochastic networks. PNAS, 2006. 103(25): 9404–9409.
  20. Mandel-Gutfreund, Y. and H. Margalit, Quantitative parameters for amino acid-base interaction: Implications for prediction of protein-DNA binding sites. Nucl. Acids Res., 1998. 26(10): 2306–2312.
  21. Luscombe, N.M. and J.M. Thornton, Protein–DNA interactions: Amino acid conservation and the effects of mutations on binding specificity. J. Mol. Biol., 2002. 320(5): 991–1009.
  22. Kato, M., et al., Identifying combinatorial regulation of transcription factors and binding motifs. Genome Biol., 2004. 5(8): R56.
  23. Morozov, A.V., et al., Protein-DNA binding specificity predictions with structural models. Nucl. Acids Res., 2005. 33(18): 5781–5798.
  24. Gertz, J., et al., Discovery, validation, and genetic dissection of transcription factor binding sites by comparative and functional genomics. Genome Res., 2005. 15(8): 1145–1152.
  25. Tompa, M., et al., Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol., 2005. 23: 137–144.
  26. GuhaThakurta, D., Computational identification of transcriptional regulatory elements in DNA sequence. Nucl. Acids Res., 2006. 34(12): 3585–3598, doi: 10.1093/nar/gkl372.
  27. Yu, H., et al., Annotation transfer between genomes: Protein-protein interologs and protein-DNA regulogs. Genome Res., 2004. 14(6): 1107–1118.
  28. Du, W., et al., RBF, a novel RB-related gene that regulates E2F activity and interacts with cyclin E in Drosophila. Genes Dev., 1996. 10(10): 1206–1218.
  29. Walhout, A.J.M., et al., Protein interaction mapping in C. elegans using proteins involved in vulval development. Science, 2000. 287: 116–122.
  30. Matthews, L.R., et al., Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or “Interologs”. Genome Res., 2001. 11(12): 2120–2126.
  31. Lehner, B. and A.G. Fraser, A first-draft human protein-interaction map. Genome Biol., 2004. 5(9): R63.1–9.
  32. Huang, T.-W., et al., POINT: A database for the prediction of protein-protein interactions based on the orthologous interactome. Bioinformatics, 2004. 20(17): 3273–3276.
  33. Kemmer, D., et al., Ulysses – an application for the projection of molecular interactions across species. Genome Biol., 2005. 6(12): R106.
  34. Brown, K.R. and I. Jurisica, Online predicted human interaction database. Bioinformatics, 2005. 21(9): 2076–2082.
  35. von Mering, C., et al., STRING: Known and predicted protein-protein associations, integrated and transferred across organisms. Nucl. Acids Res., 2005. 33(suppl_1): D433–D437.
  36. Zhu, J. and M.Q. Zhang, SCPD: A promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics, 1999. 15(7): 607–611.
  37. Bader, G.D., D. Betel, and C.W. Hogue, BIND: The biomolecular interaction network database. Nucl. Acids Res., 2003. 31(1): 248–250.
  38. Alfarano, C., et al., The biomolecular interaction network database and related tools 2005 update. Nucl. Acids Res., 2005. 33(suppl_1): D418–D424, doi: 10.1093/nar/gki051.
  39. Chen, N., et al., WormBase: A comprehensive data resource for Caenorhabditis biology and genomics. Nucl. Acids Res., 2005. 33(suppl_1): D383–D389.
  40. Schwarz, E.M., et al., WormBase: Better software, richer content. Nucl. Acids Res., 2006. 34(suppl_1): D475–D478, doi: 10.1093/nar/gkj061.
  41. Hayakawa, J., et al., Identification of promoters bound by c-Jun/ATF2 during rapid large-scale gene activation following genotoxic stress. Mol. Cell, 2004. 16(4): 521.
  42. Kim, J., et al., Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat. Meth., 2005. 2(1): 47.
  43. Kim, T.H., et al., Direct isolation and identification of promoters in the human genome. Genome Res., 2005. 15(6): 830–839.
  44. Hong, E.L., et al., Saccharomyces Genome Database. http://, 2006.
  45. Hinrichs, A.S., et al., The UCSC genome browser database: Update 2006. Nucl. Acids Res., 2006. 34(suppl_1): D590–D598.
  46. Michael, J.M., Initial sequencing and analysis of the human genome. Nature, 2001. 409(6822): 860.
  47. Waterston, R.H., et al., Initial sequencing and comparative analysis of the mouse genome. Nature, 2002. 420(6915): 520.
  48. Gibbs, R.A., et al., Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature, 2004. 428(6982): 493.
  49. Adams, M.D., et al., The genome sequence of Drosophila melanogaster. Science, 2000. 287(5461): 2185–2195.
  50. The C. elegans Sequencing Consortium, Genome sequence of the nematode C. elegans: A platform for investigating biology. Science, 1998. 282(5396): 2012–2018.
  51. Rhee, S.Y., et al., The Arabidopsis Information Resource (TAIR): A model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucl. Acids Res., 2003. 31(1): 224–228.
  52. The Arabidopsis Genome Initiative, Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 2000. 408(6814): 796.
  53. Theologis, A., et al., Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana. Nature, 2000. 408(6814): 816.
  54. European Union Chromosome 3 Arabidopsis Genome Sequencing Consortium, The Institute for Genomic Research, and Kazusa DNA Research Institute, Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature, 2000. 408(6814): 820.
  55. Kazusa DNA Research Institute, et al., Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature, 2000. 408(6814): 823.
  56. Yuan, Q., et al., The Institute for Genomic Research Osa1 rice genome annotation database. Plant Physiol., 2005. 138(1): 18–26.
  57. Goff, S.A., et al., A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 2002. 296(5565): 92–100.
  58. HUGO Gene Nomenclature Committee, September 2006.
  59. Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD), September 2006. World Wide Web URL:
  60. Huh, W.-K., et al., Global analysis of protein localization in budding yeast. Nature, 2003. 425: 686–691.
  61. Drawid, A., R. Jansen, and M. Gerstein, Genome-wide analysis relating expression level with protein subcellular localization. Trends Genet., 2000. 16(10): 426.
  62. Ross-Macdonald, P., et al., Large-scale analysis of the yeast genome by transposon tagging and gene disruption. Nature, 1999. 402(6760): 413.
  63. Kumar, A., et al., TRIPLES: A database of gene function in Saccharomyces cerevisiae. Nucl. Acids Res., 2000. 28(1): 81–84.
  64. Tatusova, T.A. and T.L. Madden, Blast 2 sequences – a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett., 1999. 174: 247–250.
  65. Altschul, S.F., et al., Basic local alignment search tool. J. Mol. Biol., 1990. 215(3): 403–410.
  66. Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl. Acids Res., 1997. 25(17): 3389–3402.
  67. Smith, T.F. and M.S. Waterman, Identification of common molecular subsequences. J. Mol. Biol., 1981. 147(1): 195–197.
  68. Pearson, W.R., Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics, 1991. 11(3): 635–650.
  69. Thompson, J.D., D.G. Higgins, and T.J. Gibson, CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res., 1994. 22(22): 4673–4680.
  70. Yu, J., et al., A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 2002. 296(5565): 79–92.
  71. Maglott, D., et al., Entrez Gene: Gene-centered information at NCBI. Nucl. Acids Res., 2005. 33(Database): D54–D58.
  72. Bashton, M. and C. Chothia, The geometry of domain combination in proteins. J. Mol. Biol., 2002. 315(4): 927.
  73. Bjorklund, A.K., et al., Domain rearrangements in protein evolution. J. Mol. Biol., 2005. 353(4): 911.
  74. Geer, L.Y., et al., CDART: Protein homology by domain architecture. Genome Res., 2002. 12(10): 1619–1623, doi: 10.1101/gr.278202.
  75. Hegyi, H. and M. Gerstein, Annotation transfer for genomics: Measuring functional divergence in multi-domain proteins. Genome Res., 2001. 11(10): 1632–1640, doi: 10.1101/gr.183801.
  76. The UniProt Consortium, The Universal Protein Resource (UniProt). Nucl. Acids Res., 2007. 35(suppl_1): D193–D197, doi: 10.1093/nar/gkl929.
  77. Luscombe, N.M., et al., Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 2004. 431: 308–312.
  78. Guldener, U., et al., CYGD: The comprehensive yeast genome database. Nucl. Acids Res., 2005. 33(suppl_1): D364–D368, doi: 10.1093/nar/gki053.
  79. Andreoli, C., et al., MitoP2, an integrated database on mitochondrial proteins in yeast and man. Nucl. Acids Res., 2004. 32(1): D459–D462.
  80. Fink, J.L., et al., LOCATE: A mouse protein subcellular localization database. Nucl. Acids Res., 2006. 34(suppl_1): D213–D217, doi: 10.1093/nar/gkj069.
  81. Nakai, K. and P. Horton, PSORT: A program for detecting the sorting signals of proteins and predicting their subcellular localization. Trends Biochem. Sci., 1999. 24(1): 34–35.
  82. Drawid, A. and M. Gerstein, A Bayesian system integrating expression data with sequence patterns for localizing proteins: Comprehensive application to the yeast genome. J. Mol. Biol., 2000. 301: 1059–1075.
  83. Nair, R. and B. Rost, LOC3D: Annotate sub-cellular localization for protein structures. Nucl. Acids Res., 2003. 31(13): 3337–3340.
  84. Emanuelsson, O., et al., Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol., 2000. 300: 1005–1016.
  85. Hua, S. and Z. Sun, Support vector machine approach for protein subcellular localization prediction. Bioinformatics, 2001. 17(8): 721–728.
  86. Mulder, N.J., et al., InterPro, progress and status in 2005. Nucl. Acids Res., 2005. 33(Database issue): D201–D205.
  87. Mulder, N.J., et al., New developments in the InterPro database. Nucl. Acids Res., 2007. 35(suppl_1): D224–D228, doi: 10.1093/nar/gkl841.
  88. Finn, R.D., et al., Pfam: Clans, web tools and services. Nucl. Acids Res., 2006. 34(Database issue): D247–D251.
  89. Hulo, N., et al., The PROSITE database. Nucl. Acids Res., 2006. 34(Database issue): D227–D230.
  90. Bru, C., et al., The ProDom database of protein domain families: More emphasis on 3D. Nucl. Acids Res., 2005. 33(Database issue): D212–D215.
  91. Henikoff, S., J.G. Henikoff, and S. Pietrokovski, Blocks+: A non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics, 1999. 15(6): 471–479.
  92. Henikoff, J.G., et al., Increased coverage of protein families with the blocks database servers. Nucl. Acids Res., 2000. 28: 228–230.
  93. Attwood, T.K., et al., PRINTS and its automatic supplement, prePRINTS. Nucl. Acids Res., 2003. 31: 400–402.
  94. Haft, D.H., J.D. Selengut, and O. White, The TIGRFAMs database of protein families. Nucl. Acids Res., 2003. 31: 371–373.
  95. Meinel, T., A. Krause, H. Luz, M. Vingron, and E. Staub, The SYSTERS protein family database in 2005. Nucl. Acids Res., 2005. 33(Database issue): D226–D229.
  96. Murzin, A.G., et al., SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 1995. 247: 536–540.
  97. Andreeva, A., et al., SCOP database in 2004: Refinements integrate structure and sequence family data. Nucl. Acids Res., 2004. 32: D226–D229.
  98. Letunic, I., et al., SMART 5: Domains in the context of genomes and networks. Nucl. Acids Res., 2006. 34(suppl_1): D257–D260, doi: 10.1093/nar/gkj079.
  99. Gough, J., et al., Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol., 2001. 313(4): 903–919.
  100. Gough, J. and C. Chothia, SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucl. Acids Res., 2002. 30(1): 268–272, doi: 10.1093/nar/30.1.268.
  101. Pearl, F., et al., The CATH domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucl. Acids Res., 2005. 33(suppl_1): D247–D251, doi: 10.1093/nar/gki024.
  102. Yeats, C., et al., Gene3D: Modelling protein structure, function and evolution. Nucl. Acids Res., 2006. 34(suppl_1): D281–D284, doi: 10.1093/nar/gkj057.
  103. Wu, C.H., et al., PIRSF: Family classification system at the protein information resource. Nucl. Acids Res., 2004. 32(suppl_1): D112–D114, doi: 10.1093/nar/gkh097.
  104. Mi, H., et al., The PANTHER database of protein families, subfamilies, functions and pathways. Nucl. Acids Res., 2005. 33(suppl_1): D284–D288, doi: 10.1093/nar/gki078.
  105. Mi, H., et al., PANTHER version 6: Protein sequence and function evolution data with expanded representation of biological pathways. Nucl. Acids Res., 2007. 35(suppl_1): D247–D252, doi: 10.1093/nar/gkl869.
  106. Marchler-Bauer, A., et al., CDD: A conserved domain database for protein classification. Nucl. Acids Res., 2005. 33(suppl_1): D192–D196, doi: 10.1093/nar/gki069.

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Duangdao Wichadakul (1)
  • Jason McDermott (2)
  • Ram Samudrala (3)

  1. BIOTEC, Pathumthani, Thailand
  2. Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, USA
  3. Department of Microbiology, University of Washington, Seattle, USA
