Abstract
Bioinformatics has entered a post-genomic era characterized by the release of large amounts of data, driven by the scientific revolution in high-throughput technologies. This document presents an approach to this massive data-processing problem in a paradigmatic application from which useful lessons can be learned. An out-of-core, modular implementation of traditional applications that compute High-scoring Segment Pairs (HSPs) removes the limit on genome size and performs the work in linear time with controlled computational requirements. Despite the heavy I/O load this entails, the full system runs faster than state-of-the-art reference tools. It also offers additional advantages: monitoring and interactive analysis, the exploitation of important intermediate results, and, because the work is split into specific modules instead of monolithic software, the ability to plug in external components to extract more from the results.
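The abstract describes the pipeline only at a high level. As a hedged illustration of what "out-of-core" and "modular" can mean for HSP computation, the Python sketch below splits the work into three stages. All function names and parameters (`kmer_dictionary`, `seed_hits`, `extend_hit`, `compute_hsps`, `max_in_memory`, `x_drop`) are hypothetical and not taken from the paper. The first stage builds a k-mer dictionary by sorting bounded batches, spilling them to temporary files and merging them back. The second merge-joins two sorted dictionaries into seed hits. The third extends each hit into an HSP by ungapped X-drop extension. It assumes exact k-mer seeds and a simple +1/-1 scoring scheme. It uses a comparison-based external sort, so it does not reproduce the linear-time bound claimed above. For brevity, it keeps the per-diagonal bookkeeping in memory.

```python
# Hypothetical sketch: names, exact k-mer seeding and +1/-1 scoring are assumptions,
# not the paper's actual modules.
import heapq
import os
import pickle
import tempfile
from itertools import groupby

def _spill_run(batch, run_paths):
    """Sort one bounded in-memory batch of (kmer, pos) pairs and write it to a temp file."""
    batch.sort()
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "wb") as fh:
        for entry in batch:
            pickle.dump(entry, fh)
    run_paths.append(path)

def _read_run(path):
    """Stream (kmer, pos) pairs back from one sorted run file."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return

def kmer_dictionary(seq, k, max_in_memory):
    """Yield (kmer, pos) pairs in k-mer order, holding at most max_in_memory pairs in RAM."""
    run_paths, batch = [], []
    for pos in range(len(seq) - k + 1):
        batch.append((seq[pos:pos + k], pos))
        if len(batch) >= max_in_memory:
            _spill_run(batch, run_paths)
            batch = []
    if batch:
        _spill_run(batch, run_paths)
    try:
        # k-way merge of the sorted runs: only one pair per run is buffered at a time.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)

def _grouped(stream):
    """Collapse a sorted (kmer, pos) stream into (kmer, [positions]) groups."""
    for kmer, entries in groupby(stream, key=lambda e: e[0]):
        yield kmer, [pos for _, pos in entries]

def seed_hits(dict_x, dict_y):
    """Merge-join two sorted k-mer streams, yielding (pos_x, pos_y) for every shared k-mer."""
    gx, gy = _grouped(dict_x), _grouped(dict_y)
    ax, ay = next(gx, None), next(gy, None)
    while ax is not None and ay is not None:
        if ax[0] < ay[0]:
            ax = next(gx, None)
        elif ax[0] > ay[0]:
            ay = next(gy, None)
        else:
            for px in ax[1]:
                for py in ay[1]:
                    yield px, py
            ax, ay = next(gx, None), next(gy, None)

def _extend(x, y, sx, sy, step, match, mismatch, x_drop):
    """Walk from (sx, sy) in direction step; return (best_score, length_at_best)."""
    best = length = run = i = 0
    while 0 <= sx + i * step < len(x) and 0 <= sy + i * step < len(y):
        run += match if x[sx + i * step] == y[sy + i * step] else mismatch
        i += 1
        if run > best:
            best, length = run, i
        elif best - run > x_drop:
            break  # score fell too far below the best seen: stop extending
    return best, length

def extend_hit(x, y, px, py, k, match=1, mismatch=-1, x_drop=5):
    """Ungapped X-drop extension of a k-mer hit -> (start_x, start_y, length, score)."""
    best_r, len_r = _extend(x, y, px + k, py + k, 1, match, mismatch, x_drop)
    best_l, len_l = _extend(x, y, px - 1, py - 1, -1, match, mismatch, x_drop)
    return px - len_l, py - len_l, len_l + k + len_r, k * match + best_l + best_r

def compute_hsps(x, y, k=8, max_in_memory=1000, min_score=12):
    """Chain the stages: two out-of-core dictionaries -> seed hits -> extended HSPs."""
    hsps, covered = [], {}  # covered: diagonal -> list of (start_x, end_x) already extended
    hits = seed_hits(kmer_dictionary(x, k, max_in_memory),
                     kmer_dictionary(y, k, max_in_memory))
    for px, py in hits:
        spans = covered.setdefault(px - py, [])
        if any(start <= px < end for start, end in spans):
            continue  # this hit already lies inside an HSP on the same diagonal
        hsp = extend_hit(x, y, px, py, k)
        spans.append((hsp[0], hsp[0] + hsp[2]))
        if hsp[3] >= min_score:
            hsps.append(hsp)
    return hsps
```

For example, with `x = "TTTTT" + "ACGTGCATGCAAGT" + "CCCCC"` and `y = "GGG" + "ACGTGCATGCAAGT" + "AAA"`, `compute_hsps(x, y, k=8, max_in_memory=3)` returns `[(5, 3, 14, 14)]`: the shared 14-base segment on diagonal 2. Each stage consumes and produces a sorted stream, so an intermediate result such as the k-mer dictionary or the list of seed hits could be written to disk, inspected or replaced by an external component without changing the other stages. This loosely mirrors the modularity the abstract argues for.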
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moreno, A.R., Tirado, Ó.T., Salazar, O.T. (2013). Out of Core Computation of HSPs for Large Biological Sequences. In: Rojas, I., Joya, G., Cabestany, J. (eds) Advances in Computational Intelligence. IWANN 2013. Lecture Notes in Computer Science, vol 7903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38682-4_22
DOI: https://doi.org/10.1007/978-3-642-38682-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38681-7
Online ISBN: 978-3-642-38682-4
eBook Packages: Computer Science, Computer Science (R0)