Abstract
Consensus and sequence pattern analyses of family alignments are widely used to identify new family members and to determine functionally and structurally important identities. Because these common approaches emphasize the dominant characteristics of the family and assume that residue identities are independent at each position, they cannot describe residue preferences outside of the family consensus. In this study, we propose a novel information-theoretic approach to detect motifs outside the consensus of a protein family alignment. Inspired by the classic Apriori algorithm, we implemented an algorithm that discovers frequent residue motifs that are high in information content and lie outside of the family consensus; we call these informative motifs. We observed that these informative motifs are mostly spatially localized and present distinctive features of various members of the family. Availability: The source code is available upon request from the authors.
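The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an alignment supplied as equal-length strings, treats each non-consensus, non-gap (position, residue) pair as an item, and grows frequent item sets level by level in the Apriori manner. The function names, the thresholds `min_support`, `min_score`, and `max_size`, and the scoring rule (log2 of the observed joint frequency over the product of the individual item frequencies) are all illustrative assumptions standing in for the paper's information-content measure.

```python
from collections import Counter
from itertools import combinations
from math import log2


def consensus(alignment):
    """Return the most frequent residue in each alignment column."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*alignment)]


def informative_motifs(alignment, min_support=2, min_score=0.5, max_size=3):
    """Apriori-style search for co-occurring non-consensus residues.

    Hypothetical sketch: thresholds and scoring are assumptions,
    not values or formulas taken from the paper.
    """
    n = len(alignment)
    cons = consensus(alignment)

    # Each sequence becomes a set of (position, residue) items that
    # differ from the column consensus; gaps are ignored.
    rows = [
        frozenset((i, r) for i, r in enumerate(seq) if r != cons[i] and r != "-")
        for seq in alignment
    ]
    item_counts = Counter(item for row in rows for item in row)

    # Level 1: single items that meet the support threshold.
    current = {
        frozenset([item]): count
        for item, count in item_counts.items()
        if count >= min_support
    }

    results = []
    k = 1
    while current and k < max_size:
        k += 1
        # Candidate generation: join frequent (k-1)-sets, keep one residue
        # per column, and prune candidates that have an infrequent subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            cand = a | b
            if len(cand) != k or len({pos for pos, _ in cand}) != k:
                continue
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1)):
                candidates.add(cand)

        # Support counting for the surviving candidates.
        next_level = {}
        for cand in candidates:
            support = sum(1 for row in rows if cand <= row)
            if support >= min_support:
                next_level[cand] = support

        # Assumed score: how much more often the items co-occur than
        # expected if the positions were independent.
        for motif, support in next_level.items():
            expected = 1.0
            for item in motif:
                expected *= item_counts[item] / n
            score = log2((support / n) / expected)
            if score >= min_score:
                results.append((tuple(sorted(motif)), support, round(score, 3)))

        current = next_level

    return sorted(results, key=lambda r: (-r[2], r[0]))


if __name__ == "__main__":
    toy = ["ACDE", "ACDE", "ACDE", "ACDE", "GCDK", "GCDK", "ACDE", "ACDE"]
    for motif, support, score in informative_motifs(toy):
        print(motif, support, score)
```

In the toy alignment, two of the eight sequences carry G at position 0 together with K at position 3, while the consensus is ACDE. A per-column consensus or profile would hide this pairing, whereas the level-wise search reports it as a single two-residue motif.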
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ozer, H.G., Ray, W.C. (2007). Informative Motifs in Protein Family Alignments. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_15
DOI: https://doi.org/10.1007/978-3-540-74126-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)