Searching Protein 3-D Structures in Linear Time
Finding similar structures in 3-D structure databases of proteins is becoming an increasingly important issue in post-genomic molecular biology. To compare the 3-D structures of two molecules, biologists mostly use the RMSD (root mean square deviation) as the similarity measure. We propose new algorithms, fast both in theory and in practice, for a fundamental problem: given a structure database of chain molecules (such as proteins), find all substructures whose RMSD to the query is within a given constant threshold. We first propose a linear-expected-time algorithm for the problem, whereas the previous best-known time complexity was O(N log m), where N is the database size and m is the query size. For the expected-time analysis, we propose to use the random-walk model (or the ideal chain model) as the model of average protein structures. We furthermore propose a series of preprocessing algorithms that enable faster queries. We checked the performance of our linear-expected-time algorithm through computational experiments over the whole PDB database. According to the experiments, our algorithm is 3.6 to 28 times faster than previously known algorithms for ordinary queries. Moreover, the experimental results support the validity of our theoretical analyses.
Keywords: Time Complexity, Chain Molecule, Query Size, Fast Query, Searching Protein
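The random-walk (ideal chain) model named in the abstract can be illustrated with a minimal sketch. It is not the paper's search algorithm; it only generates a freely jointed chain and checks the standard property that the mean squared end-to-end distance of such a chain with n bonds of length b is n·b². The function names, the unit bond length, and the trial counts are assumptions made for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly from the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def ideal_chain(n_bonds, bond_length=1.0, rng=None):
    """Return n_bonds + 1 points of a freely jointed chain starting at the origin.

    Hypothetical helper: each bond has a fixed length and an independent,
    uniformly random direction (the ideal chain assumption).
    """
    if rng is None:
        rng = random.Random()
    chain = [(0.0, 0.0, 0.0)]
    for _ in range(n_bonds):
        dx, dy, dz = random_unit_vector(rng)
        x, y, z = chain[-1]
        chain.append((x + bond_length * dx,
                      y + bond_length * dy,
                      z + bond_length * dz))
    return chain


def mean_squared_end_to_end(n_bonds, trials, rng):
    """Estimate the mean squared end-to-end distance over many random chains."""
    total = 0.0
    for _ in range(trials):
        x, y, z = ideal_chain(n_bonds, rng=rng)[-1]
        total += x * x + y * y + z * z
    return total / trials


if __name__ == "__main__":
    estimate = mean_squared_end_to_end(50, 2000, random.Random(0))
    # For unit bonds the estimate should be close to n_bonds = 50.
    print(f"estimated <R^2> for 50 unit bonds: {estimate:.2f}")
```

A model like this gives an expected-time analysis a concrete probability distribution over substructures to average over; how the paper uses it in its analysis is not stated in the abstract.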