A Practical Index for Genome Searching

  • Heikki Hyyrö
  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2857)

Abstract

Current search tools for computational biology sacrifice precision for efficiency, losing many relevant matches. We aim to obtain maximum efficiency from an indexing scheme that does not lose any relevant match. We show that it is feasible to search the human genome efficiently on an average desktop computer.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Heikki Hyyrö (1)
  • Gonzalo Navarro (2)
  1. Dept. of Comp. and Inf. Sciences, Univ. of Tampere, Finland
  2. Dept. of Comp. Science, Univ. of Chile

Personalised recommendations