Abstract
We describe algorithms for pattern-matching and pattern-learning in TOPS diagrams (formal descriptions of protein topologies). These problems can be reduced to checking for subgraph isomorphism and finding maximal common subgraphs in a restricted class of ordered graphs. We have developed a subgraph isomorphism algorithm for ordered graphs that performs well on the given data set. The maximal common subgraph problem is then solved by repeated subgraph extension and checking for isomorphisms. Despite its apparent inefficiency, this approach yields an algorithm whose time complexity is proportional to the number of graphs in the input set, and it remains practical on the given data set. As a result, we obtain fast methods that can be used to build a database of protein topological motifs and to compare a given protein of known secondary structure against a motif database.
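The two steps named in the abstract, an order-preserving subgraph isomorphism test and pattern growth by repeated extension, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. A TOPS diagram is encoded here as a hypothetical ordered graph: a tuple of vertex labels in sequence order (the labels "E+", "H+" and so on, standing for strand or helix plus direction, are invented for the example) and a set of edges (i, j, label) with i < j (the edge labels "A" and "P", for antiparallel and parallel pairing, are likewise invented). An embedding is assumed to be a strictly increasing vertex map that preserves vertex labels and pattern edges. The names `embed`, `extensions` and `greedy_maximal_pattern` are introduced only for this sketch, and the greedy loop returns one maximal common pattern, not all of them or the largest one.

```python
from typing import FrozenSet, Iterator, List, Optional, Sequence, Tuple

# Hypothetical encoding (an assumption, not the TOPS file format): vertex
# labels in sequence order, plus edges (i, j, label) with i < j.
Edge = Tuple[int, int, str]
Graph = Tuple[Tuple[str, ...], FrozenSet[Edge]]


def embed(pattern: Graph, target: Graph) -> Optional[List[int]]:
    """Return a strictly increasing vertex map from pattern into target that
    preserves vertex labels and pattern edges, or None if none exists."""
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    # For each pattern vertex k, the edges (i, k, lab) arriving from earlier vertices.
    back = [[(i, lab) for (i, j, lab) in p_edges if j == k] for k in range(len(p_labels))]
    f: List[int] = []

    def extend(k: int, start: int) -> bool:
        if k == len(p_labels):
            return True
        # Stop early enough to leave room for the remaining pattern vertices.
        for t in range(start, len(t_labels) - (len(p_labels) - k) + 1):
            if t_labels[t] != p_labels[k]:
                continue
            if all((f[i], t, lab) in t_edges for (i, lab) in back[k]):
                f.append(t)
                if extend(k + 1, t + 1):
                    return True
                f.pop()  # backtrack and try the next target vertex
        return False

    return list(f) if extend(0, 0) else None


def extensions(pattern: Graph, vertex_alphabet: Sequence[str],
               edge_alphabet: Sequence[str]) -> Iterator[Graph]:
    """Yield every pattern obtained by adding one edge or one vertex."""
    labels, edges = pattern
    n = len(labels)
    for i in range(n):  # add an edge between two existing vertices
        for j in range(i + 1, n):
            for lab in edge_alphabet:
                if (i, j, lab) not in edges:
                    yield labels, edges | {(i, j, lab)}
    for p in range(n + 1):  # insert a vertex at position p, shifting later indices
        shifted = frozenset((i + 1 if i >= p else i, j + 1 if j >= p else j, lab)
                            for (i, j, lab) in edges)
        for a in vertex_alphabet:
            yield labels[:p] + (a,) + labels[p:], shifted


def greedy_maximal_pattern(targets: Sequence[Graph], vertex_alphabet: Sequence[str],
                           edge_alphabet: Sequence[str]) -> Graph:
    """Grow a pattern one element at a time while it still embeds in every target."""
    pattern: Graph = ((), frozenset())
    grown = True
    while grown:
        grown = False
        for cand in extensions(pattern, vertex_alphabet, edge_alphabet):
            if all(embed(cand, t) is not None for t in targets):
                pattern, grown = cand, True
                break
    return pattern


t1: Graph = (("E+", "E-", "E+"), frozenset({(0, 1, "A"), (1, 2, "A")}))
t2: Graph = (("E+", "H+", "E-", "E+"), frozenset({(0, 2, "A"), (2, 3, "A")}))
print(embed(t1, t2))  # [0, 2, 3]
print(greedy_maximal_pattern([t1, t2], ["E+", "E-", "H+", "H-"], ["A", "P"]))
```

On the two toy graphs the search returns `t1` itself, because `t1` embeds into `t2` at positions [0, 2, 3]. Because vertex order is fixed, two ordered graphs are isomorphic only when they are identical, so this sketch needs no canonical-form computation to recognise duplicate patterns. Each candidate extension is checked with one `embed` call per input graph, which is where the linear dependence on the size of the input set enters.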
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Víksna, J., Gilbert, D. (2001). Pattern Matching and Pattern Discovery Algorithms for Protein Topologies. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_8
DOI: https://doi.org/10.1007/3-540-44696-6_8
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive