Efficient Approximate Similarity Search Using Random Projection Learning

  • Conference paper
Web-Age Information Management (WAIM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6897)

Abstract

Efficient similarity search on high-dimensional data is an important research topic in the database and information retrieval fields. In this paper, we propose a random projection learning approach to the approximate similarity search problem. First, the random projection technique of locality-sensitive hashing is applied to generate high-quality binary codes. The binary codes are then treated as labels, and a group of SVM classifiers is trained on the labeled data to predict the binary code of each similarity query. Experiments on real datasets demonstrate that our method substantially outperforms existing work in terms of preprocessing time and query processing.
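The two stages described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual parameters or training procedure, so everything below is an assumption made for illustration: sign-of-random-hyperplane codes stand in for the random projection step, a plain Pegasos-style linear SVM (written with the standard library only) stands in for the SVM classifiers, and all function names (`random_projection_codes`, `train_bit_classifiers`, `predict_code`, `hamming_search`) are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def random_projection_codes(data, n_bits, rng):
    """Assumed stage 1: one bit per random hyperplane (sign of the projection)."""
    dim = len(data[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [[1 if dot(p, x) >= 0 else 0 for p in planes] for x in data]

def train_linear_svm(data, labels, rng, epochs=20, lam=0.01):
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss. Labels are +1 or -1. No bias term is learned, matching
    hyperplanes through the origin."""
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = labels[i] * dot(w, data[i]) < 1
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:  # hinge-loss subgradient step
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, data[i])]
    return w

def train_bit_classifiers(data, codes, rng, epochs=20):
    """Assumed stage 2: one linear SVM per code bit, with the bit as the label."""
    n_bits = len(codes[0])
    return [
        train_linear_svm(data, [1 if c[b] else -1 for c in codes], rng, epochs)
        for b in range(n_bits)
    ]

def predict_code(svms, query):
    """Predict the binary code of a query from the per-bit classifiers."""
    return [1 if dot(w, query) >= 0 else 0 for w in svms]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def hamming_search(codes, query_code, k):
    """Return indices of the k stored codes closest in Hamming distance."""
    ranked = sorted(range(len(codes)), key=lambda i: hamming(codes[i], query_code))
    return ranked[:k]

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only; not the datasets used in the paper.
    data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(40)]
    codes = random_projection_codes(data, n_bits=8, rng=rng)
    svms = train_bit_classifiers(data, codes, rng)
    query_code = predict_code(svms, data[3])
    print(hamming_search(codes, query_code, k=5))
```

In this sketch the query never touches the original random hyperplanes: its code comes only from the trained classifiers, and candidates are ranked by Hamming distance between codes. A real system would likely use an optimized SVM library and an index over the codes rather than a linear scan.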

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yuan, P., Sha, C., Wang, X., Yang, B., Zhou, A. (2011). Efficient Approximate Similarity Search Using Random Projection Learning. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_44

  • DOI: https://doi.org/10.1007/978-3-642-23535-1_44

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23534-4

  • Online ISBN: 978-3-642-23535-1

  • eBook Packages: Computer Science, Computer Science (R0)
