Succinct Multibit Tree: Compact Representation of Multibit Trees by Using Succinct Data Structures in Chemical Fingerprint Searches

  • Conference paper
Algorithms in Bioinformatics (WABI 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7534)

Abstract

Similarity searches in databases of chemical fingerprints are a fundamental task in discovering novel drug-like molecules. The multibit tree is a data structure that enables fast similarity searches of chemical fingerprints (Kristensen et al., WABI’09). However, the standard pointer-based representation of multibit trees consumes a large amount of memory when indexing large-scale fingerprint databases. To make matters worse, the original fingerprint database must also be kept in memory to filter out false positives. Succinct data structures are compact yet support fast operations; many have been proposed and applied in fields such as full-text indexing and genome mapping. We present compact representations of both multibit trees and fingerprint databases by applying these data structures. Experiments show that our representations use much less memory than the standard pointer-based representation. Moreover, they enable efficient PubChem-scale similarity searches.
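To make the search task concrete, the following is a minimal sketch of a brute-force fingerprint similarity search, the baseline that index structures such as multibit trees aim to speed up. The abstract does not name the similarity measure; this sketch assumes Tanimoto (Jaccard) similarity, which is named in the titles of the cited references. The function names, the integer encoding of fingerprints, and the toy database are illustrative assumptions, not the paper's method.

```python
from typing import List


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints encoded as Python ints (one bit per feature)."""
    union = bin(a | b).count("1")
    if union == 0:
        # Assumed convention: two all-zero fingerprints are treated as identical.
        return 1.0
    return bin(a & b).count("1") / union


def linear_search(query: int, database: List[int], threshold: float) -> List[int]:
    """Return the indices of all database fingerprints whose similarity to the query is >= threshold."""
    return [i for i, fp in enumerate(database) if tanimoto(query, fp) >= threshold]


if __name__ == "__main__":
    # Hypothetical 4-bit fingerprints; real fingerprints have hundreds or thousands of bits.
    database = [0b1111, 0b1100, 0b0011, 0b1110]
    print(linear_search(0b1110, database, threshold=0.7))
```

A linear scan like this touches every fingerprint for every query, which is why a tree index that prunes whole groups of fingerprints, and a compact encoding of that index, matter at database scale.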

References

  1. Aung, Z., Ng, S.-K.: An Indexing Scheme for Fast and Accurate Chemical Fingerprint Database Searching. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 288–305. Springer, Heidelberg (2010)

  2. Baldi, P., Hirschberg, D.: An Intersection Inequality Sharper than the Tanimoto Triangle Inequality for Efficiently Searching Large Databases. Journal of Chemical Information and Modeling 49, 1866–1870 (2009)

  3. Baldi, P., Hirschberg, D., Nasr, R.: Speeding Up Chemical Database Searches Using a Proximity Filter Based on the Logical Exclusive-OR. Journal of Chemical Information and Modeling 48, 1367–1378 (2008)

  4. Chazelle, B.: A Functional Approach to Data Structures and its Use in Multidimensional Searching. SIAM Journal on Computing 17 (1988)

  5. Elias, P.: Efficient Storage and Retrieval by Content and Address of Static Files. Journal of the ACM 21, 246–260 (1974)

  6. Ferragina, P., Manzini, G.: An experimental study of an opportunistic index. In: Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 269–278. Society for Industrial and Applied Mathematics (2001)

  7. Jacobson, G.: Space-efficient Static Trees and Graphs. In: Proceedings of the 30th Annual Symposium of Foundations of Computer Science, pp. 549–554 (1989)

  8. Keiser, M., Roth, B., Armbruster, B., Ernsberger, P., Irwin, J., Shoichet, B.: Relating protein pharmacology by ligand chemistry. Nature Biotechnology 25(2), 197–206 (2007)

  9. Leach, A., Gillet, V.: An introduction to chemoinformatics. Kluwer Academic Publishers, The Netherlands, rev. ed. (2007)

  10. Nasr, R., Hirschberg, D., Baldi, P.: Hashing Algorithms and Data Structures for Rapid Searches of Fingerprint Vectors. Journal of Chemical Information and Modeling 50, 1358–1368 (2010)

  11. Nasr, R., Kristensen, T., Baldi, P.: Tree and hashing data structures to speed up chemical searches: Analysis and experiments. Molecular Informatics 30, 791–800 (2011)

  12. Navarro, G., Providel, E.: Fast, Small, Simple Rank/Select on Bitmaps. In: Proc. SEA, pp. 295–306 (2012)

  13. Okanohara, D., Sadakane, K.: Practical Entropy-Compressed Rank/Select Dictionary. In: Workshop on Algorithm Engineering & Experiments (2007)

  14. Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 232–242 (2002)

  15. Swamidass, S., Baldi, P.: Bounds and Algorithms for Exact Searches of Chemical Fingerprints in Linear and Sublinear time. Journal of Chemical Information and Modeling 47, 302–317 (2007)

  16. Tarjan, R.E., Yao, A.C.: Storing a Sparse Table. Communications of the ACM 22, 606–611 (1979)

  17. Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A Tree Based Method for the Rapid Screening of Chemical Fingerprints. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 194–205. Springer, Heidelberg (2009)

  18. Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A tree-based method for the rapid screening of chemical fingerprints. Algorithms for Molecular Biology 5 (2010)

  19. Turan, G.: Succinct Representation of Graphs. Discrete Applied Math. 8, 289–294 (1984)

  20. Williams, H.E., Zobel, J.: Compressing integers for fast file access. Comput. J. 42, 193–201 (1999)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tabei, Y. (2012). Succinct Multibit Tree: Compact Representation of Multibit Trees by Using Succinct Data Structures in Chemical Fingerprint Searches. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_16

  • DOI: https://doi.org/10.1007/978-3-642-33122-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33121-3

  • Online ISBN: 978-3-642-33122-0

  • eBook Packages: Computer Science, Computer Science (R0)
