Abstract
A protein’s structure is determined by its amino acid sequence alone. To describe the relation between amino acid sequences and their corresponding structural sequences, we use an association rule mining approach. Traditional association rule mining is not appropriate in a sequential context. We therefore develop the FS-tree, a structure that represents subsequences and their frequencies in a sequence database, together with the underlying construction algorithm.
An FS-tree is a prefix tree that stores subsequences in a compact way. The sequential context obliges us to modify the support concept: we introduce the relative support, which does not give too much weight to short sequences. A 2-dimensional FS-tree for sequence pairs over different alphabets makes it possible to obtain rules that establish the relation within the pairs.
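As a rough illustration of the prefix-tree idea, the following minimal sketch stores every contiguous subsequence up to a maximum length together with its occurrence count. It is not the construction algorithm of the chapter: the names `Node`, `build_tree`, `count_of`, and `relative_support` are hypothetical, and the normalization used for the relative support (occurrences divided by the number of windows of the same length in the database) is only an assumed reading, since the abstract does not give the definition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One prefix-tree node; count is the frequency of the subsequence ending here."""
    count: int = 0
    children: dict = field(default_factory=dict)


def build_tree(sequences, max_len):
    """Insert every contiguous subsequence of length <= max_len into a prefix tree."""
    root = Node()
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            # Walk one path per start position; shared prefixes share nodes.
            for symbol in seq[start:start + max_len]:
                node = node.children.setdefault(symbol, Node())
                node.count += 1
    return root


def count_of(root, pattern):
    """Return how often pattern occurs in the database (0 if it is absent)."""
    node = root
    for symbol in pattern:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


def relative_support(root, sequences, pattern):
    """ASSUMED definition: occurrences divided by the number of windows of equal length."""
    k = len(pattern)
    windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    return count_of(root, pattern) / windows if windows else 0.0


# Toy database, invented for illustration only.
database = ["ABAB", "BAB"]
tree = build_tree(database, max_len=3)
print(count_of(tree, "AB"), relative_support(tree, database, "AB"))
```

Because every start position contributes a single path, subsequences with a common prefix share nodes, which is what keeps the representation compact.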
Mining a 2-dimensional FS-tree of amino acid sequences and their corresponding secondary structures enables us to generate rules for their relation. We analyze hypothetical and observed tree sizes, inferring that there are short residue sequences acting as determinants of specific secondary structures. The most important rules relate to pure structure sequences; a rule composition analysis reveals that the rules for turns and helices far exceed those for strands. By cross-validation we verified that residue sequences with a high propensity for specific structure sequences apply generally, independent of a specific protein sample. These promising results motivate us to explore FS-tree related analysis in a wider range of applications, including the development of rule-based prediction algorithms.
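The kind of rule described above, a residue subsequence implying a structure subsequence, can be sketched with the standard association rule confidence, count(residues, structure) / count(residues). The sketch below uses plain counters over aligned fixed-length windows rather than a 2-dimensional FS-tree, and the function name `mine_rules`, the parameters `k` and `min_count`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def mine_rules(residues, structures, k, min_count=1):
    """Map (residue k-mer, structure k-mer) pairs to their rule confidence."""
    antecedent = Counter()  # occurrences of each residue k-mer
    pair = Counter()        # co-occurrences of residue k-mer and aligned structure k-mer
    for res, ss in zip(residues, structures):
        if len(res) != len(ss):
            raise ValueError("residue and structure sequences must be aligned")
        for i in range(len(res) - k + 1):
            r, s = res[i:i + k], ss[i:i + k]
            antecedent[r] += 1
            pair[(r, s)] += 1
    # Confidence of the rule r -> s: share of r's occurrences that carry structure s.
    return {
        (r, s): n / antecedent[r]
        for (r, s), n in pair.items()
        if n >= min_count
    }


# Toy aligned pairs (H = helix, C = coil), invented for illustration only.
rules = mine_rules(["AAKA", "AAK"], ["HHHC", "HHC"], k=2)
for (r, s), conf in sorted(rules.items()):
    print(f"{r} -> {s}: confidence {conf:.2f}")
```

A rule whose consequent consists of a single structure symbol, such as `AA -> HH` here, corresponds to what the abstract calls a pure structure sequence.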
© 2014 Springer International Publishing Switzerland
Cite this paper
Mossos, N., Mejia-Carmona, D.F., Tischer, I. (2014). FS-Tree: Sequential Association Rules and First Applications to Protein Secondary Structure Analysis. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_28
DOI: https://doi.org/10.1007/978-3-319-01568-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01567-5
Online ISBN: 978-3-319-01568-2
eBook Packages: Engineering (R0)