Ortholog Clustering on a Multipartite Graph

Vashist, Akshay; Kulikowski, Casimir; Muchnik, Ilya

doi:10.1007/11557067_27

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3692)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

We present a method for automatically extracting groups of orthologous genes from a large set of genomes by means of a new clustering method on a weighted multipartite graph. The method assigns a score to an arbitrary subset of genes from multiple genomes to assess the orthologous relationships between the genes in the subset. This score is computed from the sequence similarities between the member genes and the phylogenetic relationships between the corresponding genomes. An ortholog cluster is the subset with the highest score, so ortholog clustering is formulated as a combinatorial optimization problem. The algorithm for finding an ortholog cluster runs in time O(|E| + |V| log |V|), where V and E are the sets of vertices and edges, respectively, of the graph. If the similarity scores are discretized into a constant number of bins, the running time improves to O(|E| + |V|). The proposed method was applied to seven complete eukaryote genomes for which manually curated ortholog clusters, KOG (eukaryotic ortholog clusters, http://www.ncbi.nlm.nih.gov/COG/new/), have been constructed. A comparison shows that our clusters are well correlated with these manually curated clusters. Finally, we demonstrate how gene order information can be incorporated into the proposed method to improve ortholog detection.
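
The abstract does not state the score function, so the following is only an illustrative sketch of the general idea (pick the subset of genes with the highest score on a weighted multipartite graph), not the authors' method. It assumes a max-min objective: the score of a subset is the smallest "linkage" of any member, where a gene's linkage is the sum of its similarities to subset members from other genomes. Under that assumption, greedily removing the weakest gene and remembering the best intermediate subset finds an optimal subset. The phylogenetic weighting mentioned in the abstract is left out, and the names `best_subset_by_peeling`, `genome_of` and `edges` are hypothetical.

```python
import heapq
from collections import defaultdict


def best_subset_by_peeling(genome_of, edges):
    """Greedy peeling for an ASSUMED max-min score (not the paper's exact score).

    genome_of: dict gene -> genome label (defines the parts of the graph).
    edges:     dict (gene_u, gene_v) -> similarity weight.
    Returns (subset, score), where score is the minimum linkage in subset.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        if genome_of[u] == genome_of[v]:
            continue  # multipartite graph: ignore within-genome edges
        adj[u][v] = w
        adj[v][u] = w

    # Linkage of each gene with respect to the current (initially full) set.
    linkage = {g: sum(adj[g].values()) for g in genome_of}
    alive = set(genome_of)
    heap = [(value, g) for g, value in linkage.items()]
    heapq.heapify(heap)

    removal_order = []
    best_score, best_index = float("-inf"), 0
    while alive:
        value, g = heapq.heappop(heap)
        if g not in alive or value != linkage[g]:
            continue  # stale heap entry left over from an earlier update
        # 'value' is the minimum linkage, i.e. the score of the current set.
        if value > best_score:
            best_score, best_index = value, len(removal_order)
        alive.remove(g)
        removal_order.append(g)
        for h, w in adj[g].items():
            if h in alive:
                linkage[h] -= w
                heapq.heappush(heap, (linkage[h], h))

    best = set(genome_of) - set(removal_order[:best_index])
    return best, best_score


# Toy data (invented for illustration): three genomes A, B, C.
toy_genomes = {"a1": "A", "b1": "B", "c1": "C", "a2": "A", "b2": "B"}
toy_edges = {("a1", "b1"): 5, ("a1", "c1"): 4, ("b1", "c1"): 6,
             ("a2", "b2"): 1, ("a2", "c1"): 1}
toy_cluster, toy_score = best_subset_by_peeling(toy_genomes, toy_edges)
```

On the toy data the weakly linked genes `a2` and `b2` are peeled off first, leaving `{a1, b1, c1}` with score 9. This sketch uses a binary heap with lazy deletion, which costs O(|E| log |V|). The O(|E| + |V| log |V|) bound quoted in the abstract would need a priority queue with constant amortized decrease-key, and the O(|E| + |V|) bound for binned similarities would need something like an array of buckets indexed by the discretized linkage. Both remarks are our reading of the stated bounds, not details taken from the paper.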

Author information

Authors and Affiliations

Department of Computer Science,
Akshay Vashist, Casimir Kulikowski & Ilya Muchnik
DIMACS Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
Ilya Muchnik

Editor information

Editors and Affiliations

Biocomputing Group, University of Bologna, Italy
Rita Casadio
Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, USA
Gene Myers

Cite this paper

Vashist, A., Kulikowski, C., Muchnik, I. (2005). Ortholog Clustering on a Multipartite Graph. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_27

DOI: https://doi.org/10.1007/11557067_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer Science, Computer Science (R0)
