Protein Function Annotation Based on Ortholog Clusters Extracted from Incomplete Genomes Using Combinatorial Optimization

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3909)

Abstract

Reliable automatic protein function annotation requires methods for detecting orthologs with known function from closely related species. While current approaches are restricted to finding ortholog clusters from complete proteomes, most annotation problems arise in the context of partially sequenced genomes. We use a combinatorial optimization method for extracting candidate ortholog clusters robustly from incomplete genomes. The proposed algorithm focuses exclusively on sequence relationships across genomes and finds a subset of sequences from multiple genomes where every sequence is highly similar to other sequences in the subset. We then use an optimization criterion similar to the one for finding ortholog clusters to annotate the target sequences.
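The abstract does not state the exact objective, so the following is a hypothetical illustration of the kind of subset search it describes, not the authors' method. It assumes a symmetric, nonnegative similarity matrix `sim`, a genome label per sequence in `genome`, and an assumed linkage that sums a sequence's similarity to subset members from *other* genomes only, reflecting the focus on cross-genome relationships. A subset is scored by its weakest member's linkage. The search starts from all sequences, repeatedly removes the weakest-linked one, and keeps the best intermediate subset.

```python
from typing import Sequence, Set, Tuple


def linkage(i: int, subset: Set[int], sim: Sequence[Sequence[float]], genome: Sequence[str]) -> float:
    """Assumed linkage: total similarity of sequence i to subset members from other genomes."""
    # Same-genome members (including i itself) are skipped.
    return sum(sim[i][j] for j in subset if genome[j] != genome[i])


def peel_cluster(sim: Sequence[Sequence[float]], genome: Sequence[str]) -> Tuple[Set[int], float]:
    """Return the subset whose weakest-linked member is as strong as possible.

    Starts from all sequences, removes the weakest-linked one each round, and
    remembers the intermediate subset with the highest weakest-link score.
    """
    current = set(range(len(genome)))
    best_set: Set[int] = set()
    best_score = float("-inf")
    while current:
        scores = {i: linkage(i, current, sim, genome) for i in current}
        weakest = min(scores, key=scores.get)
        if scores[weakest] > best_score:
            best_set, best_score = set(current), scores[weakest]
        current.discard(weakest)
    return best_set, best_score


# Toy data (hypothetical): sequences 0-2 are mutually similar across three genomes,
# while sequences 3 and 4 are only weakly similar to everything.
genome = ["barley", "maize", "wheat", "maize", "wheat"]
sim = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.0],
]
members, score = peel_cluster(sim, genome)
print(members, score)  # members {0, 1, 2} (one per genome), weakest link 1.8
```

With nonnegative similarities, removing a sequence can only lower the remaining sequences' linkage. Under that monotonicity, the subset reached just before the first member of an optimal subset is peeled already scores at least as well as the optimum, so the peeling order finds a maximizer of the weakest-link score. This naive version recomputes every linkage each round (on the order of n³ lookups). A proteome-scale implementation would update linkages incrementally instead.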

We report on a candidate annotation for proteins in the rice genome using ortholog clusters constructed from four partially complete cereal genomes (barley, maize, sorghum and wheat) and the complete genome of Arabidopsis.
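The abstract says only that target sequences are annotated with "an optimization criterion similar to the one for finding ortholog clusters." One hypothetical reading is sketched below. A cluster's function is transferred to a target (for example, a rice sequence) when the target's total similarity to the cluster's members is at least the cluster's weakest-member score, meaning the target would not become the weakest member if added. Among accepting clusters, the best-linked one wins. The function `annotate`, the cluster record layout, the sequence IDs and the labels are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cluster record: (member sequence IDs, weakest-member linkage score, function label).
Cluster = Tuple[List[str], float, str]


def annotate(target_sim: Dict[str, float], clusters: Dict[str, Cluster]) -> Optional[str]:
    """Return the label of the best-linked cluster that would accept the target, else None.

    Assumed criterion (not taken from the paper): the target's linkage to a cluster
    is the sum of its similarities to the cluster's members, and the cluster accepts
    the target only if that linkage is at least the cluster's weakest-member score.
    """
    best_label: Optional[str] = None
    best_link = float("-inf")
    for members, weakest_score, label in clusters.values():
        # Missing entries are treated as zero similarity to that member.
        link = sum(target_sim.get(m, 0.0) for m in members)
        if link >= weakest_score and link > best_link:
            best_label, best_link = label, link
    return best_label


clusters = {
    "c1": (["barley_1", "maize_7", "wheat_3"], 1.8, "function A"),
    "c2": (["maize_2", "sorghum_5"], 0.9, "function B"),
}
print(annotate({"barley_1": 0.8, "maize_7": 0.7, "wheat_3": 0.6, "maize_2": 0.2}, clusters))  # function A
```

Returning `None` when no cluster accepts the target leaves weakly supported sequences unannotated rather than forcing a label onto them.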

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vashist, A., Kulikowski, C., Muchnik, I. (2006). Protein Function Annotation Based on Ortholog Clusters Extracted from Incomplete Genomes Using Combinatorial Optimization. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_10

  • DOI: https://doi.org/10.1007/11732990_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33295-4

  • Online ISBN: 978-3-540-33296-1

  • eBook Packages: Computer Science, Computer Science (R0)
