Using PQ Trees for Comparative Genomics

  • Gad M. Landau
  • Laxmi Parida
  • Oren Weimann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3537)

Abstract

Permutations on strings representing gene clusters on genomes have been studied earlier in [3, 12, 14, 17, 18], and the idea of a maximal permutation pattern was introduced in [12]. In this paper, we present a new tool for the representation and detection of gene clusters in multiple genomes, using PQ trees [6]. This tool succinctly describes the inner structure of clusters and the relations between them, helps separate meaningful clusters from apparently meaningless ones, and gives a natural way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal πpattern [12] and that each subgraph of the PQ tree corresponds to a non-maximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear-time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole-genome data sets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; with the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters. We also show specific instances of functionally related genes in the two cases.
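To make the role of the PQ tree concrete, the following minimal sketch enumerates the gene orders that a small PQ tree represents, using the standard semantics from [6]: the children of a P node may appear in any order, while the children of a Q node may only keep their order or be reversed. The nested-tuple encoding, the function name `frontiers`, and the toy genes are assumptions made for illustration; this is a brute-force enumeration, not the linear-time construction described in the paper.

```python
from itertools import permutations, product


def frontiers(node):
    """Return the set of leaf orders (as tuples) represented by a PQ tree.

    A node is either a string (a leaf, i.e. one gene) or a pair
    (kind, children) with kind "P" or "Q". This encoding is hypothetical.
    """
    if isinstance(node, str):
        return {(node,)}
    kind, children = node
    if kind == "P":
        # P node: children may be arranged in any order.
        orders = list(permutations(children))
    elif kind == "Q":
        # Q node: children keep their left-to-right order or are reversed.
        orders = [tuple(children), tuple(reversed(children))]
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    result = set()
    for order in orders:
        # Combine every admissible frontier of each child, in this order.
        for combo in product(*(frontiers(child) for child in order)):
            result.add(tuple(gene for part in combo for gene in part))
    return result


# Toy cluster: genes a, b, c appear together in any order, followed by d, e
# (or the whole arrangement reversed).
tree = ("Q", [("P", ["a", "b", "c"]), "d", "e"])
orders = frontiers(tree)
```

Here a single tree stands for every admissible arrangement of the cluster, which is the sense in which a PQ tree summarizes many permutations succinctly.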

Keywords

Pattern discovery, data mining, clusters, patterns, motifs, permutation patterns, PQ trees, comparative genomics, whole genome analysis, evolutionary analysis

References

  1. Alexandersson, M., Cawley, S., Pachter, L.: SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Research 13(3), 496–502 (2003)
  2. Bergeron, A., Blanchette, M., Chateau, A., Chauve, C.: Reconstructing ancestral gene orders using conserved intervals. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 14–25. Springer, Heidelberg (2004)
  3. Bergeron, A., Corteel, S., Raffinot, M.: The algorithmic of gene teams. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 464–476. Springer, Heidelberg (2002)
  4. Bergeron, A., Mixtacki, J., Stoye, J.: Reversal Distance without Hurdles and Fortresses. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 388–399. Springer, Heidelberg (2004)
  5. Bergeron, A., Stoye, J.: On the similarity of sets of permutations and its applications to genome comparison. In: Warnow, T.J., Zhu, B. (eds.) COCOON 2003. LNCS, vol. 2697, pp. 68–79. Springer, Heidelberg (2003)
  6. Booth, K., Lueker, G.: Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms. Journal of Computer and System Sciences 13, 335–379 (1976)
  7. Bray, N., Couronne, O., Dubchak, I., Ishkhanov, T., Pachter, L., Poliakov, A., Rubin, E., Ryaboy, D.: Strategies and Tools for Whole-Genome Alignments. Genome Research 13(1), 73–80 (2003)
  8. Bray, N., Dubchak, I., Pachter, L.: AVID: A Global Alignment Program. Genome Research 13(1), 97–102 (2003)
  9. Bryan, S.K., Hagensee, M.E., Moses, R.E.: DNA Polymerase III Requirement for Repair of DNA Damage Caused by Methyl Methanesulfonate and Hydrogen Peroxide. Journal of Bacteriology 16(10), 4608–4613 (1987)
  10. Burns, K.H., Matzuk, M.M., Roy, A., Yan, W.: Tektin3 encodes an evolutionarily conserved putative testicular microtubules-related protein expressed preferentially in male germ cells. Molecular Reproduction and Development 67, 295–302 (2004)
  11. Didier, G.: Common intervals of two sequences. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 17–24. Springer, Heidelberg (2003)
  12. Eres, R., Parida, L., Landau, G.M.: A combinatorial approach to automatic discovery of cluster-patterns. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 139–150. Springer, Heidelberg (2003)
  13. He, X., Goldwasser, M.H.: Identifying conserved gene clusters in the presence of orthologous groups. In: Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 272–280 (2004)
  14. Heber, S., Stoye, J.: Finding all common intervals of k permutations. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 207–218. Springer, Heidelberg (2001)
  15. McConnell, R.M.: A certifying algorithm for the consecutive-ones property. In: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), vol. 15, pp. 761–770 (2004)
  16. Mulley, J., Holland, P.: Small genome, big insights. Nature 431, 916–917 (2004)
  17. Schmidt, T., Stoye, J.: Quadratic time algorithms for finding common intervals in two and more sequences. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 347–358. Springer, Heidelberg (2004)
  18. Uno, T., Yagiura, M.: Fast algorithms to enumerate all common intervals of two permutations. Algorithmica 26(2), 290–309 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Gad M. Landau (1, 2)
  • Laxmi Parida (3)
  • Oren Weimann (1)

  1. Department of Computer Science, University of Haifa, Haifa, Israel
  2. Department of Computer and Information Science, Polytechnic University, Six MetroTech Center, Brooklyn, USA
  3. Computational Biology Center, IBM TJ Watson Research Center, Yorktown Heights, USA