Topological Metrics in Blast Data Mining: Plasmid and Nitrogen-Fixing Proteins Case Studies

  • Conference paper
Bioinformatics Research and Development (BIRD 2008)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Abstract

In recent years, a number of metrics have been introduced to characterize the topology of complex networks. We use these methodologies to analyze networks obtained through Blast data mining. The algorithm we present consists of four steps: (1) encode the results of Blast searches as a distance matrix of e-values; (2) perform entropy-controlled clustering analysis to identify the communities; (3) analyze the resulting network statistically; (4) use gene ontology and data mining in sequence databases to infer the function of the identified clusters. We report on the analysis of two data sets: the first is formed by over 3300 plasmid-encoded proteins, and the second comprises over 4200 sequences related to nitrogen fixation proteins. In the first case we observed strong selective pressures for horizontal transfer and maintenance of genes encoding proteins for antibiotic resistance, plasmid stability and conjugal transfer. On the basis of our results, nitrogen fixation proteins can be divided into three groups: proteins with no paralogs in any of the genomes considered, proteins with paralogs belonging to different metabolic processes (O–paralogs), and proteins with paralogs in both the same and other metabolic processes (I/O–paralogs).
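Steps 1 to 3 of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the e-value-to-distance mapping, the distance threshold and the sample hits are all hypothetical, and plain connected components stand in for the entropy-controlled clustering, which the abstract does not specify. The input is assumed to be BLAST tabular output with the standard 12 columns, where the e-value is the 11th field.

```python
import math
from collections import defaultdict


def parse_blast_tabular(lines):
    """Yield (query, subject, evalue) from 12-column tabular BLAST output."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[10])


def evalue_distance(evalue, cap=200.0):
    """Assumed mapping: -log10(e-value), clipped to [0, cap], rescaled to a distance in [0, 1]."""
    score = cap if evalue <= 0.0 else min(cap, max(0.0, -math.log10(evalue)))
    return 1.0 - score / cap


def build_distance_matrix(hits):
    """Sparse symmetric matrix: smallest distance seen for each unordered protein pair."""
    dist = {}
    for query, subject, evalue in hits:
        if query == subject:
            continue  # ignore self-hits
        key = tuple(sorted((query, subject)))
        dist[key] = min(evalue_distance(evalue), dist.get(key, 1.0))
    return dist


def threshold_graph(dist, max_distance):
    """Undirected graph linking pairs whose distance is at most max_distance."""
    adj = defaultdict(set)
    for (a, b), d in dist.items():
        if d <= max_distance:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Stand-in for community detection: connected components via depth-first search."""
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbours for v in neighbours if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical hits (query, subject, 8 unused columns, e-value, bit score).
sample = [
    "A\tB\t0\t0\t0\t0\t0\t0\t0\t0\t1e-50\t0",
    "B\tC\t0\t0\t0\t0\t0\t0\t0\t0\t1e-60\t0",
    "A\tC\t0\t0\t0\t0\t0\t0\t0\t0\t1e-40\t0",
    "D\tE\t0\t0\t0\t0\t0\t0\t0\t0\t1e-30\t0",
    "C\tD\t0\t0\t0\t0\t0\t0\t0\t0\t1e-2\t0",
]
dist = build_distance_matrix(parse_blast_tabular(sample))
adj = threshold_graph(dist, max_distance=0.9)
components = connected_components(adj)
degrees = {node: len(neighbours) for node, neighbours in adj.items()}
```

With the threshold chosen here, the weak C to D hit is dropped, so the sample splits into two groups. A real analysis would replace `connected_components` with the clustering method used in the paper and would compute further topological metrics on each cluster.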

Editor information

Mourad Elloumi Josef Küng Michal Linial Robert F. Murphy Kristan Schneider Cristian Toma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lió, P., Brilli, M., Fani, R. (2008). Topological Metrics in Blast Data Mining: Plasmid and Nitrogen-Fixing Proteins Case Studies. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_16

  • DOI: https://doi.org/10.1007/978-3-540-70600-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70598-7

  • Online ISBN: 978-3-540-70600-7

  • eBook Packages: Computer Science (R0)