Advertisement

Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment

  • Yinglei Song
  • Chunmei Liu
  • Xiuzhen Huang
  • Russell L. Malmberg
  • Ying Xu
  • Liming Cai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)

Abstract

Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity but also spatially conserved conformations caused by residue interactions, and consequently is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases.

This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where in general the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity O(k t N 2) for the structure of N residues and its tree decomposition of width t. The parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. The paper demonstrates a successful application of the algorithm to developing a fast program for RNA structural homology search.

Keywords

Covariance Model Structure Graph Parameterized Algorithm Tree Decomposition Subgraph Isomorphism 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Arnborg, S., Proskurowski, A.: Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Applied Mathematics 23, 11–24 (1989)zbMATHCrossRefMathSciNetGoogle Scholar
  2. 2.
    Arnborg, S., Lagergren, J., Seese, D.: Easy problems for tree-decomposable graphs. Journal of Algorithms 12, 308–340 (1991)zbMATHCrossRefMathSciNetGoogle Scholar
  3. 3.
    Bowie, J., Luthy, R., Eisenberg, D.: A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164–170 (1991)CrossRefGoogle Scholar
  4. 4.
    Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing 25, 1305–1317 (1996)zbMATHCrossRefMathSciNetGoogle Scholar
  5. 5.
    Bryant, S.H., Altschul, S.F.: Statistics of sequence-structure threading. Curr. Opinion Struct. Biol. 5, 236–244 (1995)CrossRefGoogle Scholar
  6. 6.
    Brown, M., Wilson, C.: RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. In: Pacific Symposium on Biocomputing, pp. 109–125 (1995)Google Scholar
  7. 7.
    Cai, L., Malmberg, R., Wu, Y.: Stochastic Modeling of Pseudoknot Structures: A Grammatical Approach. Bioinformatics 19, i66–i73 (2003)Google Scholar
  8. 8.
    Dandekar, T., Schuster, S., Snel, B., Huynen, M., Bork, P.: Pathway alignment: application to the comparative analysis of glycolytic enzymes. Biochemical Journal. 1, 115–124 (1999)CrossRefGoogle Scholar
  9. 9.
    Dandjinou, A.T., Lévesque, N., Larose, S., Lucier, J., Elela, S.A., Wellinger, R.J.: A phylogenetically based secondary structure for the yeast telomerase RNA. Current Biology 14, 1148–1158 (2004)CrossRefGoogle Scholar
  10. 10.
    Doudna, J.A.: Structural genomics of RNA. Nature Structural Biology 7(11) supp., 954–956 (2000)Google Scholar
  11. 11.
    Downey, R., Fellows, M.: Parameterized Complexity. Springer, Heidelberg (1999)Google Scholar
  12. 12.
    Eddy, S.R.: Computational genomics of non-coding RNA genes. Cell 109, 137–140 (2002)CrossRefGoogle Scholar
  13. 13.
    Eddy, S., Durbin, R.: RNA sequence analysis using covariance models. Nucleic Acids Research 22, 2079–2088 (1994)CrossRefGoogle Scholar
  14. 14.
    Eppstein, D.: Subgraph isomorphism in planar graphs and related problems. Journal of Graph Algorithms and Applications 3.3, 1–27 (1999)Google Scholar
  15. 15.
    Geobel, S.J., Hsue, B., Dombrowski, T.F., Masters, P.S.: Characterization of the RNA components of a Putative Molecular Switch in the 3’ Untranslated Region of the Murine Coronavirus Genome. Journal of Virology 78, 669–682 (2004)CrossRefGoogle Scholar
  16. 16.
    Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R.: Rfam: an RNA family database. Nucleic Acids Research 31, 439–441 (2003)CrossRefGoogle Scholar
  17. 17.
    Klein, R.J., Eddy, S.R.: RSEARCH: Finding Homologs of Single Structured RNA Sequences. BMC Bioinformatics 4, 44 (2003)CrossRefGoogle Scholar
  18. 18.
    Lathrop, R.H.: The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Engineering 7, 1069–1068 (1994)Google Scholar
  19. 19.
    Lathrop, R.H., Rogers Jr., R.G., Bienkowska, J., Bryant, B.K.M., Buturovic, L.J., Gaitatzes, C., Nambudripad, R., White, J.V., Smith, T.F.: Analysis and algorithms for protein sequence-structure alignment. In: Salzberg, Searls, Kasif (eds.) Computational Methods in Molecular Biology. Elsevier, Amsterdam (1998)Google Scholar
  20. 20.
    Lenhof, H.-P., Reinert, K., Vingron, M.: A polyhedral approach to RNA sequence structure alignment. Journal of Computational Biology 5(3), 517–530 (1998)CrossRefGoogle Scholar
  21. 21.
    Liu, C., Song, Y., Malmberg, R., Cai, L.: Profiling and searching for RNA pseudoknot structures in genomes. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2005. LNCS, vol. 3515, pp. 968–975. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  22. 22.
    Lowe, T.M., Eddy, S.R.: tRNAscan-SE: A Program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955–964 (1997)CrossRefGoogle Scholar
  23. 23.
    Lyngso, S.B., Pedersen, C.N.: RNA pseudoknot prediction in energy-based models. Journal of Computational Biololgy 7(3), 409–427 (2000)CrossRefGoogle Scholar
  24. 24.
    Matousek, J., Thomas, R.: On the complexity of finding iso- and other morphisms for partial k-trees. Discrete Mathematics 108, 343–364 (1992)zbMATHCrossRefMathSciNetGoogle Scholar
  25. 25.
    Marcotte, E.M., Matteo, P., Ng, H.L., Rice, D.W., Yeates, T.O., Eisenberg, D.: Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751–753 (1999)CrossRefGoogle Scholar
  26. 26.
    Nameki, N., Felden, B., Atkins, J.F., Gesteland, R.F., Himeno, H., Muto, A.: Functional and structural analysis of a pseudoknot upstream of the tag-encoded sequence in E. coli tmRNA. Journal of Molecular Biology 286(3), 733–744 (1999)CrossRefGoogle Scholar
  27. 27.
    Pervouchine, D.D.: IRIS: Intermolecular RNA Interaction Search. Genome Informatics 15(2), 92–101 (2004)MathSciNetGoogle Scholar
  28. 28.
    Reinert, K., Lenhof, H.-P., Mutzel, P., Mehlhorn, K., Kececioglu, J.D.: A branch-and-cut algorithm for multiple sequence alignment. In: Proceedings of the first annual international conference on Computational molecular biology, pp. 241–250 (1997)Google Scholar
  29. 29.
    Rivas, E., Eddy, S.R.: Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2, 8 (2001)CrossRefGoogle Scholar
  30. 30.
    Robertson, N., Seymour, P.D.: Graph Minors II. Algorithmic aspects of tree-width. Journal of Algorithms 7, 309–322 (1986)zbMATHCrossRefMathSciNetGoogle Scholar
  31. 31.
    Song, Y., Liu, C., Malmberg, R., Pan, F., Cai, L.: Tree decomposition based fast search of RNA secondary structures in genomes. In: Proceedings of 2005 IEEE Computational Systems Biology Conference (2005), (in press)Google Scholar
  32. 32.
    Uemura, Y., Hasegawa, A., Kobayashi, Y., Yokomori, T.: Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210, 277–303 (1999)zbMATHCrossRefMathSciNetGoogle Scholar
  33. 33.
    Wang, G., Dunbrack Jr., R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)CrossRefGoogle Scholar
  34. 34.
    Xu, D., Unseren, M.A., Xu, Y., Uberbacher, E.C.: Sequence-structure specificity of a knowledge based energy function at the secondary structure level. Bioinformatics 16, 257–268 (2000)CrossRefGoogle Scholar
  35. 35.
    Xu, J.: Rapid side-chain packing via tree decomposition. In: Proceedings of 2005 International Conference on Research in Computational Biology (to appear)Google Scholar
  36. 36.
    Xu, J., Li, M., Kim, D., Xu, Y.: RAPTOR: optimal protein threading by linear programming. Journal of Bioinformatics and Computational Biology 1(1), 95–113 (2003)CrossRefGoogle Scholar
  37. 37.
    Xu, Y., Xu, D., Uberbacher, E.C.: An efficient computational method for globally optimal threading. Journal of Computational Biology 5(3), 597–614Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Yinglei Song
    • 1
  • Chunmei Liu
    • 1
  • Xiuzhen Huang
    • 2
  • Russell L. Malmberg
    • 3
  • Ying Xu
    • 4
  • Liming Cai
    • 1
  1. 1.Dept. of Computer ScienceUniv. of GeorgiaAthensUSA
  2. 2.Dept. of Computer ScienceArkansas State Univ., State UniversityUSA
  3. 3.Dept. of Plant BiologyUniv. of GeorgiaAthensUSA
  4. 4.Dept. of Biochemistry and Molecular BiologyUniv. of GeorgiaAthensUSA

Personalised recommendations