Aligning Discovered Patterns from Protein Family Sequences

  • En-Shiun Annie Lee
  • Dennis Zhuang
  • Andrew K. C. Wong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7632)


A basic task in protein analysis is to discover a set of sequence patterns that characterizes the function of a protein family. To address this task, we introduce a synthesized pattern representation called the Aligned Pattern (AP) Cluster to discover potential functional segments in protein sequences. We apply our algorithm to identify and display the binding segments for the Cytochrome C and Ubiquitin protein families. The resulting AP Clusters correspond to protein binding segments that surround the binding residues. Compared with the results from the protein annotation databases PROSITE and Pfam, ours are more efficient to compute and more comprehensive. The significance of the AP Cluster is that it can capture subtle variations of the binding segments in protein families. It could thus help reduce time-consuming simulation and experimentation in protein analysis.
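The abstract does not spell out how an AP Cluster is built, so the following is only an illustrative sketch of the general idea of grouping similar sequence patterns, not the authors' algorithm. It assumes plain Levenshtein (edit) distance as the similarity measure and single-linkage agglomerative clustering with a cut-off. The function names, the `max_distance` parameter, and the toy pattern strings are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def cluster_patterns(patterns, max_distance):
    """Single-linkage agglomerative clustering with a distance cut-off.

    Hypothetical helper: repeatedly merges the two closest clusters until
    no pair of clusters is within max_distance of each other.
    """
    clusters = [[p] for p in patterns]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(edit_distance(x, y)
                    for x in clusters[i] for y in clusters[j])
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]


# Invented toy patterns, not data from the paper.
toy_patterns = ["CAACH", "CSGCH", "CAQCH", "KTFVG", "KTFIG"]
clusters = cluster_patterns(toy_patterns, max_distance=2)
for members in clusters:
    print(members)
```

With these toy inputs, patterns that differ by a few substitutions end up in the same group while unrelated ones stay apart. A real pipeline would also need a pattern discovery step and a column-wise alignment of each group, which this sketch omits.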


Protein Analysis; Protein Function Identification; Pattern Discovery; Pattern Clustering; Hierarchical Clustering; Motif Finding; Local Alignment; Approximate String Matching



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • En-Shiun Annie Lee (1)
  • Dennis Zhuang (1)
  • Andrew K. C. Wong (1)
  1. Centre of Pattern Analysis and Machine Intelligence, University of Waterloo, Waterloo, Canada
