Abstract
In this paper we consider several variations of the following basic tiling problem: given a sequence of real numbers and two size-bound parameters, find a set of tiles that satisfy the size bounds and whose total weight is maximized. The solution to this problem is important to a number of computational biology applications, such as selecting genomic DNA fragments for amplicon microarrays or performing homology searches with long sequence queries. Our goal is to design efficient algorithms for these problems that run in linear or near-linear time and space in the normal range of parameter values. For this purpose, we discuss the solution of a basic online interval maximum problem via a sliding window approach and show how to use this solution in a nontrivial manner for many of our tiling problems. We also discuss NP-hardness and approximation algorithms for generalizations of our basic tiling problem to higher dimensions.
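The abstract gives no implementation details, so the following is only a minimal sketch of how a sliding-window maximum might yield a linear-time solution to the basic tiling problem. It rests on assumptions not stated above: tiles are disjoint contiguous runs of the sequence, a tile's weight is the sum of its entries, and the two size bounds are a minimum and maximum tile length. The names `max_tiling_weight`, `lo`, and `hi` are hypothetical, and the monotonic deque is the standard sliding-window technique rather than necessarily the authors' own construction.

```python
from collections import deque
from itertools import accumulate


def max_tiling_weight(weights, lo, hi):
    """Maximum total weight of disjoint tiles whose lengths lie in [lo, hi].

    Assumed model (not taken from the paper): a tile is a contiguous slice
    weights[j:i], and its weight is the sum of the entries it covers.
    """
    if not 1 <= lo <= hi:
        raise ValueError("need 1 <= lo <= hi")
    n = len(weights)
    prefix = [0.0, *accumulate(weights)]  # prefix[i] = sum(weights[:i])
    best = [0.0] * (n + 1)                # best[i] = optimum for weights[:i]
    window = deque()                      # candidate tile starts, keys decreasing

    def key(j):
        # A tile weights[j:i] contributes best[j] + prefix[i] - prefix[j];
        # only this part depends on the start index j.
        return best[j] - prefix[j]

    for i in range(1, n + 1):
        best[i] = best[i - 1]             # option 1: leave position i-1 uncovered
        j_new = i - lo                    # newest start that respects the minimum length
        if j_new >= 0:
            while window and key(window[-1]) <= key(j_new):
                window.pop()              # dominated starts can never win again
            window.append(j_new)
        while window and window[0] < i - hi:
            window.popleft()              # start would make the tile longer than hi
        if window:                        # option 2: end a tile at position i-1
            best[i] = max(best[i], prefix[i] + key(window[0]))
    return best[n]
```

Each index enters and leaves the deque at most once, so the whole pass takes O(n) time and space under these assumptions. For example, `max_tiling_weight([3, -1, 4, -10, 2, 2], 2, 3)` selects the tiles covering the first three and the last two entries.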
Supported in part by National Library of Medicine grant LM05110
Supported in part by NIH grants P50 HG02357 and R01 CA77808
Supported in part by NSF grant CCR-0296041 and a UIC startup fund
Supported in part by NSF grant EIA-0112934
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berman, P., Bertone, P., DasGupta, B., Gerstein, M., Kao, MY., Snyder, M. (2002). Fast Optimal Genome Tiling with Applications to Microarray Design and Homology Search. In: Guigó, R., Gusfield, D. (eds) Algorithms in Bioinformatics. WABI 2002. Lecture Notes in Computer Science, vol 2452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45784-4_32
DOI: https://doi.org/10.1007/3-540-45784-4_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44211-0
Online ISBN: 978-3-540-45784-8
eBook Packages: Springer Book Archive