Synonyms
Querying DNA sequences; Querying protein sequences
Definition
A common type of data used in life science applications is biological sequence data. DNA and protein sequence datasets are growing at a very fast rate. For example, the data at GenBank [GB07] has been growing exponentially, doubling roughly every 18 months. These sequence datasets are often queried in complex ways, and the methods required to query them go far beyond the simple string matching used in more traditional string applications. To let users easily pose sophisticated queries on biological sequences, several languages have been designed that support a rich library of functions. In addition, some database systems have been extended to support a rich set of operators on the sequence data type. Compared to the stand-alone approach, the database method brings the power of algebraic query optimization and the use of indexes making it...
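To illustrate why sequence queries go beyond simple string matching, the following minimal sketch contrasts an exact substring test with a local alignment score in the style of Smith and Waterman (listed under Recommended Reading). It is not taken from this entry: the function name `local_alignment_score`, the scoring values (match 2, mismatch -1, gap -2), and the example sequences are assumptions chosen for illustration, and real systems typically use substitution matrices and heuristics rather than this plain dynamic program.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Smith-Waterman style dynamic program with a linear gap penalty.
    The scoring parameters are illustrative assumptions, not values
    prescribed by any particular tool.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    query = "ACGTTGCA"          # hypothetical query sequence
    target = "TTACGATGCATT"     # hypothetical database sequence

    # Exact string matching finds nothing because of one substitution.
    print("exact match:", query in target)

    # Local alignment still reports a strong similarity.
    print("alignment score:", local_alignment_score(query, target))
```

An exact match test misses the target because of a single substituted base, while the alignment score remains high. Similarity-based operators of this kind are what sequence query languages and extended database systems need to support in addition to plain string predicates.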
Recommended Reading
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Barbara A, Eckman AK. Querying BLAST within a data federation. Q Bull IEEE TC Data Eng. 2004;27(3):12–9.
Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. Atlas Protein Seq Struct. 1978;5(3):345–52.
Hammer J, Schneider M. Genomics algebra: a new, integrating data model, language, and tool for processing and querying genomic information. In: Proceedings of the 1st Biennial Conference on Innovative Data Systems Research; 2003. p. 176–87.
Henikoff S, Henikoff J. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci. 1992;89(22):10915–9.
Hsiao R-L, Stott Parker Jr D, Yang H-C. Support for BioIndexing in BLASTgres. In: Data Integration in the Life Sciences (DILS), LNCS, vol. 3615. Berlin: Springer; 2005. p. 284–7.
Mao R, Weijia X, Neha S, Miranker DP. An assessment of a metric space database index to support sequence homology. In: Proceedings of the IEEE 3rd International Symposium on Bioinformatics and Bioengineering; 2003. p. 375–82.
Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci. 1988;85(8):2444–8.
Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–7.
Stephens S, Chen JY, Davidson MG, Thomas S, Trute BM. Oracle database 10g: a platform for BLAST search and regular expression pattern matching in life sciences. Nucleic Acids Res. 2005;33(Database-Issue):675–9.
Stephens S, Chen JY, Thomas S. ODM BLAST: sequence homology search in the RDBMS. Q Bull IEEE TC Data Eng. 2004;27(3):20–3.
Tata S, Lang W, Patel JM. Periscope/SQ: interactive exploration of biological sequence databases. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007. p. 1406–9.
Tata S, Patel JM. PiQA: an algebra for querying protein data sets. In: Proceedings of the 15th International Conference on Scientific and Statistical Database Management; 2003. p. 141–50.
Tata S, Patel JM, Friedman JS, Swaroop A. Declarative querying for biological sequences. In: Proceedings of the 22nd International Conference on Data Engineering; 2006. p. 87.
Weiner P. Linear pattern matching algorithm. In: Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory; 1973. p. 1–11.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Tata, S., Patel, J.M. (2018). Query Languages and Evaluation Techniques for Biological Sequence Data. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_630
DOI: https://doi.org/10.1007/978-1-4614-8265-9_630
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering