Discovering Patterns in Gene Order

Part of the Methods in Molecular Biology book series (MIMB, volume 855)


Genetic events of many kinds shape the landscape of genomes over the course of natural evolution. In this chapter, we explore an approach to investigating multiple genomes that unravels relationships more complex than their placement on a phylogeny. To this end, we treat genes as the smallest syntactic units on the genome and study their relative organization across multiple genomes. In the first half of the chapter, we discuss mathematical models that capture the combinatorial structure of this relative organization and statistical models for studying the distribution of these structures. In the second half, we apply these models to analyze the relationships among three closely related plant genomes.
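As a rough illustration of what it means to compare the relative organization of genes across genomes, the sketch below looks for gene sets that appear contiguously, in any order, in every input genome. It is a brute-force toy and not the chapter's method. The function names `window_sets` and `common_clusters` and the single-letter gene labels are hypothetical, and each gene is assumed to occur at most once per genome.

```python
from typing import FrozenSet, List, Sequence, Set


def window_sets(genome: Sequence[str], size: int) -> Set[FrozenSet[str]]:
    """Return the gene sets of all contiguous windows of the given size."""
    return {
        frozenset(genome[i:i + size])
        for i in range(len(genome) - size + 1)
    }


def common_clusters(
    genomes: List[Sequence[str]], min_size: int = 2
) -> List[FrozenSet[str]]:
    """Gene sets that occur contiguously, in any order, in every genome.

    Assumption (not from the chapter): each gene appears at most once
    per genome, so a window of k positions holds k distinct genes.
    """
    shortest = min(len(g) for g in genomes)
    found: List[FrozenSet[str]] = []
    for size in range(min_size, shortest + 1):
        # Start from the windows of the first genome, then keep only
        # those gene sets that also form a window in every other genome.
        shared = window_sets(genomes[0], size)
        for genome in genomes[1:]:
            shared &= window_sets(genome, size)
        # Sort for a deterministic order within each cluster size.
        found.extend(sorted(shared, key=sorted))
    return found


if __name__ == "__main__":
    # Three made-up gene orders over the same five genes.
    toy = [list("abcde"), list("cbaed"), list("debac")]
    for cluster in common_clusters(toy):
        print(sorted(cluster))
```

The enumeration is quadratic in genome length for each genome and ignores gaps, duplicated genes, and statistical significance, so it only conveys the idea of order-independent clusters shared across genomes.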

Key words

Gene cluster; Gene order comparison; Pattern significance; Permutation pattern; Whole genome analysis; Pattern discovery; Statistical significance; Maximal PQ trees; Plant gene order



We would like to thank Alex Feltus for providing the genomic characteristics for the three plant species (Fig. 3).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. IBM Thomas J. Watson Research Center, Yorktown Heights, USA
