Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Index Structures for Biological Sequences

  • Tamer Kahveci
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1434

Definition

Biological sequence databases are mainly composed of DNA, RNA, and protein sequences. DNA and RNA sequences are polymers of nucleotides, whereas proteins are polymers of amino acids. A database of biological sequences contains a set of sequences of the same type. Sequence lengths vary from fewer than a hundred to several hundred million bases. An index structure on such a database helps to quickly identify the sequences in it that are similar to a given query sequence. The definition of similarity depends on two orthogonal parameters: the similarity function and the length of the similarity of interest.

The simplest similarity function is the edit distance, which measures the minimum number of substitutions, insertions, and deletions needed to transform one sequence into the other. More complex functions involve variable gap penalties and substitution scores based on how frequently substitutions are observed in nature. The length of the...
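
As an illustration of the edit distance described above, the following is a minimal sketch of the standard dynamic-programming computation with unit costs. The function name `edit_distance` and the row-by-row layout are choices made for this example, not something prescribed by the entry. The sketch compares two sequences directly and does not use an index.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between sequences a and b (illustrative sketch)."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    # Before any character of a is read, reaching b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Turning a[:i] into the empty sequence takes i deletions.
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (no cost on a match)
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("ACGT", "AGT")` returns 1, because deleting the `C` is enough. The running time is proportional to the product of the two sequence lengths, which suggests why comparing a query against every sequence in a large database is costly without an index.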



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Florida, Gainesville, USA

Section editors and affiliations

  • Louiqa Raschid (1)
  1. Robert H. Smith School of Business, University of Maryland, College Park, USA