A Combinatorial Approach to Automatic Discovery of Cluster-Patterns

  • Revital Eres
  • Gad M. Landau
  • Laxmi Parida
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon, and may also aid in function prediction of genes. Such gene clusters also aid in solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In this paper we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the πpattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns without any loss of information. We demonstrate the automatic pattern discovery tool on motifs in E. coli protein sequences.
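
To make the discovery problem concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It assumes that a cluster occurs at a location when the window of a fixed length starting there contains the same symbols as the cluster in any order, and that a cluster is reported when it occurs in at least a given number of locations. The function name `find_pi_patterns` and the parameters `length` and `quorum` are hypothetical names chosen for illustration, and the sketch does not address the maximal-pattern notation.

```python
from collections import defaultdict


def find_pi_patterns(sequences, length, quorum):
    """Brute-force sketch of order-independent cluster discovery.

    Assumptions (not taken from the abstract): a fixed window `length`,
    and a minimum number of occurrences `quorum`.
    Returns a dict mapping each cluster (a sorted tuple of symbols) to the
    list of (sequence index, start position) pairs where it occurs.
    """
    occurrences = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for start in range(len(seq) - length + 1):
            # Sorting the window discards the order of the symbols, so two
            # windows that are permutations of each other share one key.
            key = tuple(sorted(seq[start:start + length]))
            occurrences[key].append((seq_idx, start))
    # Keep only clusters that meet the quorum.
    return {k: v for k, v in occurrences.items() if len(v) >= quorum}


# Toy input: each sequence is a list of gene (or domain) identifiers.
example = [
    ["a", "b", "c", "d"],
    ["x", "c", "a", "b"],
    ["b", "a", "y", "c"],
]
clusters = find_pi_patterns(example, length=3, quorum=2)
print(clusters)  # {('a', 'b', 'c'): [(0, 0), (1, 1)]}
```

In the toy input, the genes a, b and c are adjacent in the first two sequences in different orders, so they form one cluster; in the third sequence they are interrupted by y and no window of length 3 matches.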


Keywords: design and analysis of algorithms, combinatorial algorithms on words, discovery, data mining, clusters, patterns, motifs





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Revital Eres (1)
  • Gad M. Landau (1, 2)
  • Laxmi Parida (3)
  1. Department of Computer Science, Haifa University, Haifa, Israel
  2. Department of Computer and Information Science, Polytechnic University, Brooklyn, USA
  3. Computational Biology Center, IBM TJ Watson Research Center, New York, USA
