
Screening for Ortholog Clusters Using Multipartite Graph Clustering by Quasi-Concave Set Function Optimization

Conference paper, Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC 2005)

Abstract

Finding orthologous genes, that is, similar genes in different genomes, is a fundamental problem in comparative genomics. We present a model for automatically extracting candidate ortholog clusters from a large set of genomes using a new clustering method for multipartite graphs. Groups of orthologous genes are found by focusing on gene similarities across genomes rather than on similarities between genes within a genome. The clustering problem is formulated as a series of combinatorial optimization problems whose solutions are interpreted as ortholog clusters. The objective function in each optimization problem is a quasi-concave set function, which can be maximized efficiently. We present the properties of these functions and an algorithm to maximize them. We applied our method to find ortholog clusters in the data underlying the manually curated Clusters of Orthologous Groups (COG) database: 108,090 sequences from 43 genomes. Candidate ortholog clusters were validated by comparison against the manually curated ortholog clusters in COG and by checking annotations in Pfam and SCOP; in most cases they show strong correlation with the known results. An analysis of Pfam and SCOP annotations and COG membership for sequences in the 7,701 clusters that include sequences from at least three organisms shows that 7,474 (97%) of these clusters contain sequences that are all consistent in at least one of the annotations or in their COG membership.
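The abstract does not spell out the paper's objective function, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes a hypothetical linkage that sums a gene's similarity scores to genes from *other* genomes only (mirroring the focus on cross-genome similarity) and takes F(H) = min over i in H of that linkage. Because the linkage never decreases when genes are added to H, F satisfies F(H1 ∪ H2) ≥ min(F(H1), F(H2)), and a greedy peeling procedure (repeatedly drop the weakest gene, keep the best intermediate set) returns a maximizer. The names `linkage`, `max_quasi_concave`, `extract_clusters`, the stopping rule, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical inputs: each gene id maps to its genome, and pairwise
# similarity scores (e.g. derived from BLAST) are non-negative and symmetric.
GenomeOf = Dict[str, str]
Similarity = Dict[Tuple[str, str], float]


def linkage(i: str, subset: Set[str], genome_of: GenomeOf, sim: Similarity) -> float:
    """Assumed cross-genome linkage: summed similarity of gene i to members
    of `subset` from other genomes. Within-genome pairs are ignored, and
    adding genes to `subset` can never lower the value."""
    return sum(
        sim.get((i, j), sim.get((j, i), 0.0))
        for j in subset
        if genome_of[j] != genome_of[i]
    )


def max_quasi_concave(genes: Iterable[str], genome_of: GenomeOf,
                      sim: Similarity) -> Tuple[Set[str], float]:
    """Maximize F(H) = min_{i in H} linkage(i, H) by sequential peeling:
    drop the gene with the weakest linkage at each step and remember the
    intermediate set with the largest F value."""
    remaining = set(genes)
    best_set: Set[str] = set()
    best_value = float("-inf")
    while remaining:
        scores = {i: linkage(i, remaining, genome_of, sim) for i in remaining}
        # F(remaining) is the smallest score; sorting makes ties deterministic.
        weakest = min(sorted(scores), key=lambda g: scores[g])
        if scores[weakest] > best_value:
            best_value, best_set = scores[weakest], set(remaining)
        remaining.remove(weakest)
    return best_set, best_value


def extract_clusters(genes: Iterable[str], genome_of: GenomeOf,
                     sim: Similarity) -> List[Tuple[List[str], float]]:
    """Solve a series of problems: extract the optimal set, remove its genes,
    and repeat. Stopping once no positive cross-genome linkage remains is an
    assumption of this sketch."""
    remaining = set(genes)
    clusters: List[Tuple[List[str], float]] = []
    while remaining:
        cluster, value = max_quasi_concave(remaining, genome_of, sim)
        if value <= 0:
            break
        clusters.append((sorted(cluster), value))
        remaining -= cluster
    return clusters


# Toy example with invented data: three genomes A, B and C.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
sim = {("a1", "b1"): 5.0, ("a1", "c1"): 4.0, ("b1", "c1"): 6.0,
       ("a2", "b2"): 3.0, ("b2", "c1"): 1.0,
       ("a1", "a2"): 9.0}  # within-genome pair: ignored by linkage()

clusters = extract_clusters(list(genome_of), genome_of, sim)
for members, score in clusters:
    print(members, score)
```

On this toy data the sketch prints `['a1', 'b1', 'c1'] 9.0` followed by `['a2', 'b2'] 3.0`; the high within-genome score between a1 and a2 plays no role, which is the point of clustering on the multipartite (cross-genome) graph.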




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vashist, A., Kulikowski, C., Muchnik, I. (2005). Screening for Ortholog Clusters Using Multipartite Graph Clustering by Quasi-Concave Set Function Optimization. In: Ślęzak, D., Yao, J., Peters, J.F., Ziarko, W., Hu, X. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. RSFDGrC 2005. Lecture Notes in Computer Science, vol 3642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11548706_43


  • DOI: https://doi.org/10.1007/11548706_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28660-8

  • Online ISBN: 978-3-540-31824-8

  • eBook Packages: Computer Science, Computer Science (R0)
