
Data Structures for Accelerating Tanimoto Queries on Real Valued Vectors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6293)

Abstract

Previous methods for accelerating Tanimoto queries have been based on representing molecules as bit strings. No previous work has examined how to accelerate Tanimoto queries on real-valued descriptors, even though these offer a much more fine-grained measure of similarity between molecules. This study uses a recently discovered reduction from Tanimoto queries to distance queries in Euclidean space to accelerate Tanimoto queries with standard metric data structures. The experiments presented show that a significant speedup is possible, and that on vectors generated from molecular data, general metric data structures are better suited than a data structure tailored to Euclidean space.
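The abstract does not spell out the reduction, so the following is a minimal sketch of one derivation we are confident of, not necessarily the exact form used in the paper. For real-valued vectors the Tanimoto similarity is T(q, x) = q.x / (|q|^2 + |x|^2 - q.x). The denominator is positive unless both vectors are zero, so for a threshold 0 < t <= 1 the condition T(q, x) >= t rearranges to |x|^2 - ((1+t)/t) q.x + |q|^2 <= 0. Completing the square gives |x - c|^2 <= r^2 with centre c = ((1+t)/(2t)) q and radius r = |q| sqrt((1-t)(1+3t)) / (2t). A Tanimoto threshold query is therefore exactly a Euclidean range query, which any metric data structure can answer. `PivotIndex` below is a hypothetical toy index (one pivot plus triangle-inequality pruning), a much simplified stand-in for the general metric data structures the abstract refers to; all function and class names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto(a: Vector, b: Vector) -> float:
    """Tanimoto similarity of real-valued vectors: a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    # denom is zero only when both vectors are zero; calling them identical is an assumption.
    return ab / denom if denom > 0 else 1.0


def tanimoto_ball(q: Vector, t: float) -> Tuple[List[float], float]:
    """Centre and radius of the Euclidean ball {x : tanimoto(q, x) >= t}, for 0 < t <= 1."""
    if not 0.0 < t <= 1.0:
        raise ValueError("threshold t must satisfy 0 < t <= 1")
    scale = (1.0 + t) / (2.0 * t)
    centre = [scale * qi for qi in q]
    radius = math.sqrt(dot(q, q) * (1.0 - t) * (1.0 + 3.0 * t)) / (2.0 * t)
    return centre, radius


class PivotIndex:
    """Hypothetical toy metric index: one pivot plus triangle-inequality pruning."""

    def __init__(self, points: List[Vector]):
        self.points = points
        self.pivot = points[0]  # arbitrary pivot choice, for illustration only
        self.pivot_dist = [euclidean(p, self.pivot) for p in points]

    def range_query(self, centre: Vector, radius: float) -> List[int]:
        """Indices of all points within `radius` of `centre` (Euclidean distance)."""
        dc = euclidean(centre, self.pivot)
        hits = []
        for i, p in enumerate(self.points):
            # Triangle inequality: |d(p, pivot) - d(centre, pivot)| <= d(p, centre),
            # so if this lower bound already exceeds the radius, p cannot be a hit.
            if abs(self.pivot_dist[i] - dc) > radius:
                continue
            if euclidean(p, centre) <= radius:
                hits.append(i)
        return hits


def tanimoto_query(index: PivotIndex, q: Vector, t: float) -> List[int]:
    """Answer the Tanimoto threshold query (q, t) as a Euclidean range query."""
    centre, radius = tanimoto_ball(q, t)
    return index.range_query(centre, radius)


if __name__ == "__main__":
    db = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
    index = PivotIndex(db)
    print(tanimoto_query(index, [1.0, 0.0], 0.5))  # -> [0, 1, 4]
```

Because the ball is an exact reformulation, any exact Euclidean range-query structure returns the same hits as a brute-force Tanimoto scan (up to floating-point ties on the ball's boundary); the index only changes how many distance computations are needed.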




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kristensen, T.G., Pedersen, C.N.S. (2010). Data Structures for Accelerating Tanimoto Queries on Real Valued Vectors. In: Moulton, V., Singh, M. (eds) Algorithms in Bioinformatics. WABI 2010. Lecture Notes in Computer Science, vol 6293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15294-8_3


  • DOI: https://doi.org/10.1007/978-3-642-15294-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15293-1

  • Online ISBN: 978-3-642-15294-8

  • eBook Packages: Computer Science, Computer Science (R0)
