Out of Core Computation of HSPs for Large Biological Sequences

  • Conference paper
Advances in Computational Intelligence (IWANN 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7903)

Abstract

Bioinformatics is facing a post-genomic era characterized by the release of large amounts of data, driven by the revolution in high-throughput technologies. This paper presents an approach to this massive data processing problem in a paradigmatic application from which useful lessons can be learned. The design of an out-of-core, modular implementation of traditional High-scoring Segment Pairs (HSPs) applications removes the limits on genome size and performs the work in linear time with controlled computational requirements. Despite the heavy I/O load that is expected, the full system runs faster than state-of-the-art reference tools and offers additional advantages: monitoring and interactive analysis, the exploitation of important intermediate results, and, because it is built from specialized modules rather than as monolithic software, the ability to plug in external components to get more out of the results.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moreno, A.R., Tirado, Ó.T., Salazar, O.T. (2013). Out of Core Computation of HSPs for Large Biological Sequences. In: Rojas, I., Joya, G., Cabestany, J. (eds) Advances in Computational Intelligence. IWANN 2013. Lecture Notes in Computer Science, vol 7903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38682-4_22

  • DOI: https://doi.org/10.1007/978-3-642-38682-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38681-7

  • Online ISBN: 978-3-642-38682-4

  • eBook Packages: Computer Science, Computer Science (R0)
