
Data Structures for Accelerating Tanimoto Queries on Real Valued Vectors

  • Thomas G. Kristensen
  • Christian N. S. Pedersen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6293)

Abstract

Previous methods for accelerating Tanimoto queries have represented molecules as bit strings. No previous work has examined how to accelerate Tanimoto queries on real-valued descriptors, even though such descriptors offer a much more fine-grained measure of similarity between molecules. This study uses a recently discovered reduction from Tanimoto queries to distance queries in Euclidean space, which allows Tanimoto queries to be accelerated with standard metric data structures. The experiments show that a significant speedup is possible. They also show that, on vectors generated from molecular data, general metric data structures are better suited than a data structure tailored for Euclidean space.
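The abstract does not state the reduction itself. As an illustration only, a reduction of this kind can be derived by assuming the usual real-valued Tanimoto similarity T(a, b) = a·b / (‖a‖² + ‖b‖² − a·b). This is well defined for non-zero vectors because the denominator is then positive. For a query q and threshold 0 < t ≤ 1, multiplying T(q, b) ≥ t by that denominator gives t‖b‖² − (1 + t) q·b + t‖q‖² ≤ 0. Dividing by t and completing the square with c = (1 + t)/(2t) gives ‖b − c·q‖² ≤ (c² − 1)‖q‖². A Tanimoto query therefore becomes a Euclidean range query centred at c·q with radius √(c² − 1)·‖q‖, which any metric data structure can answer. The Python sketch below pairs this derivation with a basic vantage point tree in the style of Yianilos. The function names (tanimoto_to_ball, build_vp, range_query) are hypothetical, and neither the derivation nor the tree is claimed to match the authors' implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def euclid(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tanimoto(a: Vector, b: Vector) -> float:
    """Real-valued Tanimoto similarity a.b / (|a|^2 + |b|^2 - a.b) (assumed definition)."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


def tanimoto_to_ball(q: Vector, t: float) -> Tuple[List[float], float]:
    """Turn the query T(q, b) >= t, 0 < t <= 1, into a Euclidean ball (centre, radius).

    Illustrative derivation, not taken from the paper:
    T(q, b) >= t  <=>  |b - c*q|^2 <= (c^2 - 1)|q|^2  with c = (1 + t) / (2t).
    """
    c = (1.0 + t) / (2.0 * t)
    centre = [c * x for x in q]
    radius = math.sqrt(c * c - 1.0) * math.sqrt(dot(q, q))
    return centre, radius


class VPNode:
    """Vantage point tree node: points closer than mu to `point` go to `inside`."""

    def __init__(self, point, mu, inside, outside):
        self.point, self.mu = point, mu
        self.inside, self.outside = inside, outside


def build_vp(points: List[Vector]) -> Optional[VPNode]:
    """Build a basic vantage point tree, using the first point as the vantage point."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = [euclid(vp, p) for p in rest]
    mu = sorted(dists)[len(dists) // 2]  # median distance to the vantage point
    inside = [p for p, d in zip(rest, dists) if d < mu]
    outside = [p for p, d in zip(rest, dists) if d >= mu]
    return VPNode(vp, mu, build_vp(inside), build_vp(outside))


def range_query(node, centre, radius, out=None):
    """Collect all stored points within `radius` of `centre` (Euclidean distance)."""
    if out is None:
        out = []
    if node is None:
        return out
    d = euclid(node.point, centre)
    if d <= radius:
        out.append(node.point)
    # Triangle inequality: the inside subtree can only hold answers if d <= mu + radius,
    # and the outside subtree only if d + radius >= mu; otherwise it is pruned.
    if d <= node.mu + radius:
        range_query(node.inside, centre, radius, out)
    if d + radius >= node.mu:
        range_query(node.outside, centre, radius, out)
    return out


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(1000)]
    tree = build_vp(data)
    query, threshold = data[0], 0.9
    centre, radius = tanimoto_to_ball(query, threshold)
    hits = range_query(tree, centre, radius)
    print(len(hits), "vectors with Tanimoto similarity >=", threshold)
```

Running the module builds a tree over 1,000 random 8-dimensional vectors and counts those inside the ball for a threshold of 0.9. Uniform random data is only a stand-in here. The abstract's comparison between general metric structures and a Euclidean-specific structure concerns vectors generated from molecular data.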

Keywords

Virtual Screening, Vantage Point, Distance Calculation, Random Data, Query Point
These keywords were added by machine, not by the authors.


References

  1. Baldi, P., Hirschberg, D.S., Nasr, R.J.: Speeding up chemical database searches using a proximity filter based on the logical exclusive OR. Journal of Chemical Information and Modeling 48(7), 1367–1378 (2008)
  2. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
  3. Brin, S.: Near neighbor search in large metric spaces. The VLDB Journal, 574–584 (1995)
  4. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Jarke, M., Carey, M.J., Dittrich, K.R., Lochovsky, F.H., Loucopoulos, P., Jeusfeld, M.A. (eds.) VLDB 1997, Proceedings of 23rd International Conference on Very Large Data Bases, Athens, Greece, August 25-29, pp. 426–435. Morgan Kaufmann, San Francisco (1997)
  5. Gillet, V.J., Willett, P., Bradshaw, J.: Similarity searching using reduced graphs. Journal of Chemical Information and Computer Sciences 43(2), 338–345 (2003)
  6. Xu, H., Agrafiotis, D.K.: Nearest neighbor search in general metric spaces using a tree data structure with a simple heuristic. Journal of Chemical Information and Computer Sciences 43(6), 1933–1941 (2003)
  7. Irwin, J.J., Shoichet, B.K.: ZINC: A free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling 45(1), 177–182 (2005)
  8. Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A tree-based method for the rapid screening of chemical fingerprints. Algorithms for Molecular Biology 5(1), 9 (2010)
  9. Kristensen, T.G.: Transforming Tanimoto queries on real valued vectors to range queries in Euclidian space. Journal of Mathematical Chemistry (March 2010)
  10. Leach, A.R., Gillet, V.J.: An Introduction to Chemoinformatics, revised edn. Kluwer Academic Publishers, Dordrecht (2007)
  11. Lipkus, A.H.: A proof of the triangle inequality for the Tanimoto distance. Journal of Mathematical Chemistry 26(1-3), 263–265 (1999)
  12. Molegro: Molegro Virtual Docker User Manual version 3.0.0 (2008)
  13. Späth, H.: Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Ellis Horwood (1980)
  14. Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., Willighagen, E.: The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences 43(2), 493–500 (2003)
  15. Swamidass, S.J., Baldi, P.: Bounds and algorithms for fast exact searches of chemical fingerprints in linear and sublinear time. Journal of Chemical Information and Modeling 47(2), 302–317 (2007)
  16. Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB 1998: Proceedings of the 24th International Conference on Very Large Data Bases, pp. 194–205. Morgan Kaufmann Publishers Inc., San Francisco (1998)
  17. Willett, P.: Similarity-based approaches to virtual screening. Biochemical Society Transactions 31(Pt 3), 603–606 (2003)
  18. Willett, P., Barnard, J.M., Downs, G.M.: Chemical similarity searching. Journal of Chemical Information and Computer Sciences 38(6), 983–996 (1998)
  19. Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of the Fourth ACM-SIAM Symposium on Discrete Algorithms (1993)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Thomas G. Kristensen (1)
  • Christian N. S. Pedersen (1)
  1. Bioinformatics Research Center, Aarhus University, Aarhus C, Denmark
