Abstract
Suffix-trees are popular indexing structures for a variety of sequence-processing problems in biological data management. We investigate whether the search efficiency of disk-resident suffix-trees can be improved through customized layouts of tree nodes on disk pages. Specifically, we propose a new layout strategy, called Stellar, that provides significantly improved search performance on a representative set of real genomic sequences. Further, Stellar supports both the standard root-to-leaf lookup queries and the more sophisticated sequence-search algorithms that exploit the suffix-links of suffix-trees. Our results are encouraging with regard to the ultimate objective of seamlessly integrating sequence processing into database engines.
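To make the notion of a node-to-page layout concrete, the following is a minimal sketch and not the Stellar strategy itself, whose details the abstract does not give. It assumes an uncompressed suffix trie in place of a true suffix-tree, a fixed page capacity counted in nodes, and two generic layouts (depth-first and breadth-first packing). All names here (`build_suffix_trie`, `assign_pages`, `lookup_pages`) are hypothetical and chosen only for illustration.

```python
from collections import deque


class Node:
    """One trie node; `page` is the disk page the layout assigns it to."""
    def __init__(self):
        self.children = {}
        self.page = None


def build_suffix_trie(text):
    """Naive O(n^2) suffix trie (assumption: stands in for a suffix-tree)."""
    root = Node()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
    return root


def dfs_order(root):
    """Nodes in depth-first (pre-order) sequence."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.children[c] for c in sorted(node.children, reverse=True))
    return order


def bfs_order(root):
    """Nodes in breadth-first (level) sequence."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children[c] for c in sorted(node.children))
    return order


def assign_pages(order, capacity):
    """Pack nodes into pages of `capacity` nodes, following `order`."""
    for idx, node in enumerate(order):
        node.page = idx // capacity


def lookup_pages(root, pattern):
    """Walk down from the root along `pattern`.

    Returns (found, page_reads), where page_reads counts the root's page
    plus every step that crosses to a different page, i.e. the reads
    needed with a buffer that holds a single page.
    """
    node, reads = root, 1
    for ch in pattern:
        child = node.children.get(ch)
        if child is None:
            return False, reads
        if child.page != node.page:
            reads += 1
        node = child
    return True, reads


if __name__ == "__main__":
    root = build_suffix_trie("GATTACAGATTACA$")
    for name, order_fn in (("depth-first", dfs_order), ("breadth-first", bfs_order)):
        assign_pages(order_fn(root), capacity=8)
        found, reads = lookup_pages(root, "TTACAG")
        print(f"{name} layout: found={found}, page reads={reads}")
```

The same lookup can touch a different number of pages depending only on how nodes were packed, which is the effect a search-optimized layout aims to exploit. Traversals that follow suffix-links jump across subtrees and would need their own accounting, which this sketch does not model.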
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bedathur, S.J., Haritsa, J.R. (2005). Search-Optimized Suffix-Tree Storage for Biological Applications. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds) High Performance Computing – HiPC 2005. HiPC 2005. Lecture Notes in Computer Science, vol 3769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602569_8
DOI: https://doi.org/10.1007/11602569_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30936-9
Online ISBN: 978-3-540-32427-0
eBook Packages: Computer Science, Computer Science (R0)