Abstract
Protein sequence motifs are short conserved subsequences shared by related protein sequences. Information about motifs is important to the study of biologically significant conserved regions in protein families, because these regions can determine the function and conformation of proteins. Conventionally, recurring patterns in proteins are explored by taking short protein segments and classifying them with similarity measures between the segments. Two protein sequences are assigned to the same class if they show high homology in the feature patterns extracted by sequence alignment algorithms. This methodology finds position-specific motifs only. In this chapter, we propose a new algorithm that explores protein sequences by studying subsequences with relative positioning of amino acids, followed by K-Means clustering of fixed-size segments. The dataset used in our work is the most up to date among studies of sequence motifs. Biochemical tests reported in the literature are used to assess the significance of the motifs, and these tests show that the generated motifs are of both structural and functional interest. The results suggest that the method may also be applied to the closely related problem of finding DNA motifs.
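The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm: the relative-positioning encoding is not described here, so the sketch assumes a plain one-hot encoding of each residue, a window size of 5, k = 3, and toy sequences, all chosen only for illustration. The helper names (`windows`, `one_hot`, `kmeans`) are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def windows(seq, size):
    """Return every contiguous fixed-size segment of a sequence."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]

def one_hot(segment):
    """Encode a segment as a flat 0/1 vector, 20 entries per position.
    Assumption: this stands in for the chapter's own segment encoding."""
    vec = []
    for aa in segment:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy sequences, invented for illustration only.
sequences = ["MKVLAAGIVGLK", "MKVLSAGIVGMK", "GGSPTWYHHDEQ"]
segments = [w for s in sequences for w in windows(s, 5)]
labels, centroids = kmeans([one_hot(s) for s in segments], k=3)

for seg, lab in zip(segments, labels):
    print(lab, seg)
```

Segments that land in the same cluster are candidates for a shared motif. In practice the encoding, the window size, and the number of clusters would be chosen from the data rather than fixed as above.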
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Venugopal, K.R., Srinivasa, K.G., Patnaik, L.M. (2009). Exploring Structurally Similar Protein Sequence Motifs. In: Soft Computing for Data Mining Applications. Studies in Computational Intelligence, vol 190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00193-2_16
DOI: https://doi.org/10.1007/978-3-642-00193-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00192-5
Online ISBN: 978-3-642-00193-2
eBook Packages: Engineering, Engineering (R0)