FS-Tree: Sequential Association Rules and First Applications to Protein Secondary Structure Analysis

  • Conference paper
Advances in Computational Biology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 232)

Abstract

A protein’s structure is determined by its amino acid sequence alone. To describe the relation between amino acid sequences and the corresponding structural sequences, we use an association rule mining approach. Because traditional association rule mining is not appropriate in a sequential context, we develop the FS-tree, a structure that represents subsequences and their frequencies in a sequence database, together with its construction algorithm.

An FS-tree is a prefix tree that stores subsequences compactly. The sequential context obliges us to modify the support concept: we introduce the relative support, which does not give too much weight to short sequences. A 2-dimensional FS-tree for sequence pairs over different alphabets allows us to obtain rules that establish the relation within the pairs.
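As a rough illustration of the prefix-tree idea, the following Python sketch stores every contiguous subsequence up to a maximum length together with its frequency. It is not the authors' construction algorithm: the names `FSNode`, `build_tree`, `count_of`, and `relative_support` are hypothetical, and the normalization used for relative support (occurrences divided by the number of windows of the same length) is only one plausible reading, since the abstract does not give the definition.

```python
class FSNode:
    """One prefix-tree node; count is the frequency of the subsequence on its path."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_tree(sequences, max_len):
    """Insert every contiguous subsequence of length <= max_len into a prefix tree."""
    root = FSNode()
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            # Walking one window increments all of its prefixes at once.
            for symbol in seq[start:start + max_len]:
                node = node.children.setdefault(symbol, FSNode())
                node.count += 1
    return root


def count_of(root, subseq):
    """Return the stored frequency of subseq, or 0 if it never occurs."""
    node = root
    for symbol in subseq:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


def relative_support(root, sequences, subseq):
    """ASSUMED definition: occurrences / number of windows of the same length."""
    k = len(subseq)
    windows = sum(max(len(s) - k + 1, 0) for s in sequences)
    return count_of(root, subseq) / windows if windows else 0.0


# Toy database, not real protein data.
database = ["ABAB", "AB"]
tree = build_tree(database, max_len=2)
```

With this toy database, `relative_support(tree, database, "AB")` compares the three occurrences of `AB` against the four length-2 windows available, so a pattern is judged only against patterns of its own length.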

Mining a 2-dimensional FS-tree of amino acid sequences and their corresponding secondary structures enables us to generate rules for their relation. We analyze hypothetical and observed tree sizes and infer that short residue sequences act as determinants of specific secondary structures. The most important rules relate to pure structure sequences, where rules for turns and helices far exceed the rules for strands, as revealed by a rule composition analysis. By cross-validation we verified that residue sequences with a high propensity for specific structure sequences apply generally, independent of the specific protein sample. These promising results motivate us to explore FS-tree-related analysis in a wider range of applications, including the development of rule-based prediction algorithms.
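To make the idea of rules between residue and structure subsequences concrete, the sketch below counts aligned window pairs and computes the confidence of a rule "residue window implies structure window". It uses flat counters as a stand-in for the 2-dimensional FS-tree, which the abstract does not specify in detail; the function names, the toy sequences, and the one-letter structure labels (H for helix, E for strand) are assumptions made for illustration.

```python
from collections import Counter


def count_pairs(pairs, max_len):
    """Count aligned (residue window, structure window) pairs up to max_len."""
    pair_counts = Counter()
    residue_counts = Counter()
    for residues, structure in pairs:
        if len(residues) != len(structure):
            raise ValueError("residue and structure sequences must be aligned")
        for start in range(len(residues)):
            for k in range(1, max_len + 1):
                if start + k > len(residues):
                    break
                r = residues[start:start + k]
                s = structure[start:start + k]
                pair_counts[(r, s)] += 1
                residue_counts[r] += 1
    return pair_counts, residue_counts


def confidence(pair_counts, residue_counts, r, s):
    """Confidence of the rule r -> s: fraction of r windows labelled with s."""
    total = residue_counts[r]
    return pair_counts[(r, s)] / total if total else 0.0


# Toy aligned pairs (hypothetical, not taken from any protein database).
toy_pairs = [("AKAK", "HHEE"), ("AK", "HH")]
pair_counts, residue_counts = count_pairs(toy_pairs, max_len=2)
```

In the toy data the residue window `AK` occurs three times, twice labelled `HH`, so `confidence(pair_counts, residue_counts, "AK", "HH")` is two thirds. A rule whose structure side consists of a single label, such as `HH`, would correspond to what the abstract calls a pure structure sequence.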

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Mossos, N., Mejia-Carmona, D.F., Tischer, I. (2014). FS-Tree: Sequential Association Rules and First Applications to Protein Secondary Structure Analysis. In: Castillo, L., Cristancho, M., Isaza, G., Pinzón, A., Rodríguez, J. (eds) Advances in Computational Biology. Advances in Intelligent Systems and Computing, vol 232. Springer, Cham. https://doi.org/10.1007/978-3-319-01568-2_28

  • DOI: https://doi.org/10.1007/978-3-319-01568-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01567-5

  • Online ISBN: 978-3-319-01568-2

  • eBook Packages: Engineering, Engineering (R0)
