Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2452)

Abstract

In this paper we consider several variations of the following basic tiling problem: given a sequence of real numbers and two size bound parameters, find a set of tiles that satisfy the size bounds and whose total weight is maximized. The solution to this problem is important to a number of computational biology applications, such as selecting genomic DNA fragments for amplicon microarrays or performing homology searches with long sequence queries. Our goal is to design efficient algorithms that run in linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we discuss the solution of a basic online interval maximum problem via a sliding window approach and show how to use this solution in a nontrivial manner for many of our tiling problems. We also discuss NP-hardness and approximation algorithms for generalizations of our basic tiling problem to higher dimensions.
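As an illustration of how a sliding-window maximum can drive a tiling computation, the following is a minimal Python sketch and not the algorithm from the paper itself. It assumes that tiles are disjoint contiguous intervals of the sequence, that each tile's length must lie between a lower bound `lo` and an upper bound `hi`, and that a tile's weight is the sum of the numbers it covers. The function name `max_weight_tiling` and its parameters are hypothetical. The dynamic program keeps, in a monotonic deque, the best value of `best[j] - prefix[j]` over the window of admissible tile start positions, so each position is pushed and popped at most once.

```python
from collections import deque
from itertools import accumulate


def max_weight_tiling(weights, lo, hi):
    """Return (best_total, tiles) for one reading of the basic tiling problem.

    Assumptions (not taken from the paper): tiles are disjoint half-open
    intervals [start, end) with lo <= end - start <= hi, and a tile's weight
    is the sum of the covered entries. Using no tiles at all is allowed.
    """
    if not 1 <= lo <= hi:
        raise ValueError("need 1 <= lo <= hi")
    n = len(weights)
    prefix = [0, *accumulate(weights)]  # prefix[i] = sum(weights[:i])
    best = [0] * (n + 1)                # best[i] = optimum using weights[:i]
    choice = [None] * (n + 1)           # start of the tile ending at i, if any
    window = deque()                    # candidate starts, keys non-increasing

    def key(j):
        return best[j] - prefix[j]

    for i in range(1, n + 1):
        best[i] = best[i - 1]           # option 1: position i-1 is uncovered
        j_new = i - lo                  # newest start that is long enough
        if j_new >= 0:
            while window and key(window[-1]) <= key(j_new):
                window.pop()            # dominated candidates never win again
            window.append(j_new)
        while window and window[0] < i - hi:
            window.popleft()            # tile would be longer than hi
        if window:
            j = window[0]               # maximizes best[j] - prefix[j]
            candidate = best[j] + prefix[i] - prefix[j]
            if candidate > best[i]:     # option 2: a tile [j, i) ends here
                best[i] = candidate
                choice[i] = j

    tiles = []
    i = n
    while i > 0:                        # walk back through recorded choices
        if choice[i] is None:
            i -= 1
        else:
            tiles.append((choice[i], i))
            i = choice[i]
    tiles.reverse()
    return best[n], tiles


total, tiles = max_weight_tiling([3, -1, 4, -5, 2, 2], lo=2, hi=3)
print(total, tiles)
```

Under these assumptions the sketch runs in time and space linear in the sequence length, independent of `hi - lo`, because the deque replaces a scan over all admissible tile lengths at each position.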

Supported in part by National Library of Medicine grant LM05110

Supported in part by NIH grants P50 HG02357 and R01 CA77808

Supported in part by NSF grant CCR-0296041 and a UIC startup fund

Supported in part by NSF grant EIA-0112934




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berman, P., Bertone, P., DasGupta, B., Gerstein, M., Kao, MY., Snyder, M. (2002). Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_32

  • DOI: https://doi.org/10.1007/3-540-45784-4_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44211-0

  • Online ISBN: 978-3-540-45784-8

  • eBook Packages: Springer Book Archive
