Protein Function Annotation Based on Ortholog Clusters Extracted from Incomplete Genomes Using Combinatorial Optimization

Vashist, Akshay; Kulikowski, Casimir; Muchnik, Ilya

doi:10.1007/11732990_10

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3909)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

Reliable automatic protein function annotation requires methods for detecting orthologs with known function from closely related species. While current approaches are restricted to finding ortholog clusters from complete proteomes, most annotation problems arise in the context of partially sequenced genomes. We use a combinatorial optimization method for extracting candidate ortholog clusters robustly from incomplete genomes. The proposed algorithm focuses exclusively on sequence relationships across genomes and finds a subset of sequences from multiple genomes where every sequence is highly similar to other sequences in the subset. We then use an optimization criterion similar to the one for finding ortholog clusters to annotate the target sequences.

We report a candidate annotation of proteins in the rice genome, using ortholog clusters constructed from four partially complete cereal genomes (barley, maize, sorghum, and wheat) and the complete genome of Arabidopsis.
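The cluster-extraction idea in the abstract ("a subset of sequences from multiple genomes where every sequence is highly similar to other sequences in the subset") can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper. A sequence's linkage to a candidate subset is taken to be the sum of its similarity scores to members from *other* genomes. A subset is scored by its weakest member's linkage. A greedy peeling procedure repeatedly removes that weakest member and keeps the best-scoring subset it encounters. The functions `sim`, `linkage`, `extract_cluster`, and `annotate`, the toy scores, and the acceptance rule in `annotate` are all hypothetical. This is not the authors' published formulation.

```python
from typing import Dict, Tuple

# Toy, hypothetical data: sequence id -> genome label ("R" stands in for the target genome).
GENOME: Dict[str, str] = {
    "a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C",
    "r1": "R", "r2": "R",
}
# Hypothetical symmetric similarity scores (e.g. derived from BLAST); missing pairs count as 0.
SIMS: Dict[Tuple[str, str], float] = {
    ("a1", "b1"): 9.0, ("a1", "c1"): 8.0, ("b1", "c1"): 7.0,
    ("a1", "a2"): 10.0,  # same-genome pair: ignored by linkage()
    ("a2", "b1"): 1.0, ("a1", "c2"): 0.5,
    ("r1", "a1"): 10.0, ("r1", "b1"): 6.0, ("r1", "c1"): 5.0,
    ("r2", "a1"): 2.0,
}


def sim(s, a, b):
    """Look up a symmetric score stored under either key order (0.0 if absent)."""
    return s.get((a, b), s.get((b, a), 0.0))


def linkage(i, subset, genome, s):
    """Sum of i's similarities to members of `subset` that come from other genomes."""
    return sum(sim(s, i, j) for j in subset if j != i and genome[j] != genome[i])


def extract_cluster(seqs, genome, s):
    """Greedy peeling: score the current subset by its weakest member's linkage,
    drop that member, and return the best-scoring subset seen (the first, i.e. largest)."""
    current = set(seqs)
    best, best_score = set(), float("-inf")
    while current:
        scores = {i: linkage(i, current, genome, s) for i in current}
        weakest = min(scores, key=scores.get)
        if scores[weakest] > best_score:
            best, best_score = set(current), scores[weakest]
        current.remove(weakest)
    return best, best_score


def annotate(target, clusters, genome, s):
    """Hypothetical rule: attach `target` to the cluster it links to most strongly,
    but only if that linkage is at least the cluster's own (weakest-member) score."""
    best_idx, best_link = None, float("-inf")
    for idx, (members, score) in enumerate(clusters):
        link = linkage(target, members, genome, s)
        if link >= score and link > best_link:
            best_idx, best_link = idx, link
    return best_idx


cluster, score = extract_cluster(["a1", "a2", "b1", "c1", "c2"], GENOME, SIMS)
print(sorted(cluster), score)                            # ['a1', 'b1', 'c1'] 15.0
print(annotate("r1", [(cluster, score)], GENOME, SIMS))  # 0
print(annotate("r2", [(cluster, score)], GENOME, SIMS))  # None
```

Peeling works here because similarity scores are non-negative, so a sequence's linkage never decreases as the subset grows. Under that property, removing the current weakest member never discards the best-scoring subset before it is reached. The sketch recomputes every linkage on each round for clarity. A practical version would update the scores incrementally as members are removed.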

Author information

Authors and Affiliations

Department of Computer Science,
Akshay Vashist, Casimir Kulikowski & Ilya Muchnik
DIMACS Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
Ilya Muchnik

Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova,
Alberto Apostolico
Topic Chairs, P.O. Box
Concettina Guerra
Center for Molecular Biology and Computer Science Department, Brown University, 115 Waterman St., 02912, Providence, RI, USA
Sorin Istrail
University of California, San Diego, USA
Pavel A. Pevzner
Department of Molecular and Computational Biology, University of Southern California, 1050 Childs Way, 90089-2910, Los Angeles, CA, USA
Michael Waterman

About this paper

Cite this paper

Vashist, A., Kulikowski, C., Muchnik, I. (2006). Protein Function Annotation Based on Ortholog Clusters Extracted from Incomplete Genomes Using Combinatorial Optimization. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_10

DOI: https://doi.org/10.1007/11732990_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)