A Retrospective on Genomic Preprocessing for Comparative Genomics

Part of the book series: Computational Biology (COBO, volume 19)

Abstract

In this paper, we survey research on genomic preprocessing for comparative genomics, i.e., handling genomes with gene repetitions and with missing or redundant genes, initiated by David Sankoff in 1999. This research has produced several interesting results within and beyond computational biology and bioinformatics, and new contributions remain possible. We describe the history of this research and review the current status of the corresponding problems. For the problem of handling missing genes (scaffold filling), we also present some technical details that were not given in previous papers. Some open problems are listed at the end for further research.

Acknowledgements

I would like to thank my collaborators for this series of research: Zhixiang Chen, Richard Fowler, Bin Fu, Haitao Jiang, Minghui Jiang, Zhong Li, Guohui Lin, Nan Liu, David Sankoff, Weitian Tong, Lusheng Wang, Boting Yang, Zhiyu Zhao, Chunfang Zheng and Daming Zhu.

Author information

Corresponding author

Correspondence to Binhai Zhu.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Zhu, B. (2013). A Retrospective on Genomic Preprocessing for Comparative Genomics. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds) Models and Algorithms for Genome Evolution. Computational Biology, vol 19. Springer, London. https://doi.org/10.1007/978-1-4471-5298-9_9

  • DOI: https://doi.org/10.1007/978-1-4471-5298-9_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5297-2

  • Online ISBN: 978-1-4471-5298-9

  • eBook Packages: Computer Science, Computer Science (R0)
