FS-Tree: Sequential Association Rules and First Applications to Protein Secondary Structure Analysis

Mossos, Nilson; Mejia-Carmona, Diego Fernando; Tischer, Irene

doi:10.1007/978-3-319-01568-2_28

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 232)

Abstract

A protein’s structure is determined by its amino acid sequence alone. To describe the relation between amino acid sequences and their corresponding structural sequences, we use an association rule mining approach. Traditional association rule mining is not appropriate in a sequential context, because it treats each transaction as an unordered set of items. We therefore develop the FS-tree, a structure that represents subsequences and their frequencies in a sequence database, together with its construction algorithm.
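A minimal sketch of such a construction, under stated assumptions: "subsequences" are taken to be contiguous segments, and their length is capped by a hypothetical parameter `max_len`. The names `FSNode`, `build_fs_tree` and `frequency` are illustrative and are not taken from the chapter. Inserting every window that starts at each position into a prefix tree counts all of that window's prefixes in a single pass, so every node stores the frequency of the segment spelled by its path.

```python
from dataclasses import dataclass, field


@dataclass
class FSNode:
    # Number of occurrences of the segment spelled by the path to this node.
    count: int = 0
    # Child nodes keyed by the next symbol (hypothetical layout).
    children: dict = field(default_factory=dict)


def build_fs_tree(sequences, max_len):
    """Count every contiguous segment of length <= max_len (assumption)."""
    root = FSNode()
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            # Walking the window increments the count of each of its prefixes.
            for symbol in seq[start:start + max_len]:
                node = node.children.setdefault(symbol, FSNode())
                node.count += 1
    return root


def frequency(root, pattern):
    """Return the stored count of pattern, or 0 if it was never inserted."""
    node = root
    for symbol in pattern:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count
```

For example, `build_fs_tree(["ABAB"], 3)` records `"AB"` twice and `"BA"` once. `"ABAB"` gets no count because it is longer than `max_len`.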

An FS-tree is a prefix tree that stores subsequences compactly. The sequential context obliges us to modify the support concept: we introduce the relative support, which does not give too much weight to short sequences. A 2-dimensional FS-tree for sequence pairs over different alphabets allows us to obtain rules that establish the relation within the pairs.
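One way to picture a 2-dimensional FS-tree is as a prefix tree whose symbols are (residue, structure) pairs. A single path then stores a residue segment together with its structural labelling. The sketch below is an assumption and not the chapter's data structure. The names `PairNode`, `build_pair_tree`, `pair_count`, `residue_count` and `confidence` are hypothetical, as is the use of standard association-rule confidence (joint count divided by residue count). It does not implement the relative support, because this abstract does not define it.

```python
from dataclasses import dataclass, field


@dataclass
class PairNode:
    count: int = 0
    # Children keyed by (residue, structure) symbol pairs (hypothetical layout).
    children: dict = field(default_factory=dict)


def build_pair_tree(pairs, max_len):
    """pairs: list of (residue_seq, structure_seq) strings of equal length."""
    root = PairNode()
    for residues, structure in pairs:
        if len(residues) != len(structure):
            raise ValueError("sequences in a pair must have equal length")
        symbols = list(zip(residues, structure))
        for start in range(len(symbols)):
            node = root
            for sym in symbols[start:start + max_len]:
                node = node.children.setdefault(sym, PairNode())
                node.count += 1
    return root


def pair_count(root, residues, structure):
    """Count of the residue segment occurring with exactly this labelling."""
    node = root
    for sym in zip(residues, structure):
        node = node.children.get(sym)
        if node is None:
            return 0
    return node.count


def residue_count(root, residues):
    """Count of the residue segment, summed over every structural labelling."""
    frontier = [root]
    for r in residues:
        frontier = [child for node in frontier
                    for (res, _), child in node.children.items() if res == r]
    return sum(node.count for node in frontier)


def confidence(root, residues, structure):
    """Confidence of the rule residues -> structure (assumed definition)."""
    total = residue_count(root, residues)
    return pair_count(root, residues, structure) / total if total else 0.0
```

Treating each pair as a single symbol keeps both alphabets in one tree. The cost is that a residue-only count has to sum over all structural branches, which is what `residue_count` does.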

Mining a 2-dimensional FS-tree of amino acid sequences and their corresponding secondary structures enables us to generate rules for their relation. We analyze hypothetical and observed tree sizes and infer that there are short residue sequences acting as determinants of specific secondary structures. The most important rules relate to pure structure sequences, and a rule composition analysis reveals that rules for turns and helices far exceed those for strands. By cross-validation we verified that residue sequences with a high propensity for specific structure sequences apply generally, independent of a specific protein sample. These promising results motivate us to explore FS-tree-related analysis in a wider range of applications, including the development of rule-based prediction algorithms.
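To illustrate rule generation, the sketch below enumerates rules "residue segment → structure segment" using minimum-count and minimum-confidence thresholds. It also flags pure structure sequences, assumed here to mean segments made of a single structure letter. For brevity it counts segments with flat `collections.Counter` tables in place of an FS-tree, so it is a swapped-in technique. The function names, the thresholds and the tie-breaking order are all hypothetical.

```python
from collections import Counter


def mine_rules(pairs, max_len, min_count, min_conf):
    """Return (residues, structure, count, confidence) tuples, best first.

    pairs: list of (residue_seq, structure_seq) strings of equal length.
    """
    residue_counts = Counter()
    joint_counts = Counter()
    for residues, structure in pairs:
        n = len(residues)
        for start in range(n):
            # Every contiguous window of length 1..max_len starting here.
            for end in range(start + 1, min(start + max_len, n) + 1):
                x = residues[start:end]
                y = structure[start:end]
                residue_counts[x] += 1
                joint_counts[(x, y)] += 1
    rules = []
    for (x, y), c in joint_counts.items():
        conf = c / residue_counts[x]
        if c >= min_count and conf >= min_conf:
            rules.append((x, y, c, conf))
    # Highest confidence first, then highest count, then lexicographic.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


def is_pure(structure):
    """True if the structure segment uses a single structure letter."""
    return len(set(structure)) == 1
```

With a simplified alphabet where H, E and T stand for helix, strand and turn, `is_pure("HHH")` is `True` and `is_pure("HHT")` is `False`. Keeping only rules whose consequent passes `is_pure` gives the kind of pure-structure subset that the rule composition analysis concerns.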

Author information

Authors and Affiliations

School of System Engineering and Computer Science, Universidad del Valle, Cali, Colombia
Nilson Mossos & Irene Tischer
Graduate School of Biomedical Sciences, Universidad del Valle, Calle 13 # 100-00, Cali, Colombia
Diego Fernando Mejia-Carmona

Editor information

Editors and Affiliations

University of Caldas, Manizales, Colombia
Luis F. Castillo
Cenicafé - Centro Nacional de Investigaciones del Café en Colombia, Chinchiná, Colombia
Marco Cristancho
University of Caldas, Manizales, Colombia
Gustavo Isaza
BIOS - Centro Bioinformática y Biologia Computacional de Colombia, Manizales, Colombia
Andrés Pinzón
Department of Computer Science, School of Science, University of Salamanca, Salamanca, Spain
Juan Manuel Corchado Rodríguez

About this paper

Cite this paper

Mossos, N., Mejia-Carmona, D.F., Tischer, I. (2014). FS-Tree: Sequential Association Rules and First Applications to Protein Secondary Structure Analysis. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_28

DOI: https://doi.org/10.1007/978-3-319-01568-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01567-5
Online ISBN: 978-3-319-01568-2
eBook Packages: Engineering, Engineering (R0)