Abstract
Efficient similarity search on high dimensional data is an important research topic in database and information retrieval fields. In this paper, we propose a random projection learning approach for solving the approximate similarity search problem. First, the random projection technique of the locality sensitive hashing is applied for generating the high quality binary codes. Then the binary code is treated as the labels and a group of SVM classifiers are trained with the labeled data for predicting the binary code for the similarity queries. The experiments on real datasets demonstrate that our method substantially outperforms the existing work in terms of preprocessing time and query processing.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613 (1998)
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC, pp. 380–388 (2002)
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: FOCS, pp. 459–468. MIT, Cambridge (2006)
Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Quality and efficiency in high dimensional nearest neighbor search. In: SIGMOD, pp. 563–576 (2009)
Min, K., Yang, L., Wright, J., Wu, L., Hua, X.S., Ma, Y.: Compact Projection: Simple and Efficient Near Neighbor Search with Practical Memory Requirements. In: CVPR, pp. 3477–3484 (2010)
Salakhutdinov, R., Hinton, G.: Semantic Hashing. International Journal of Approximate Reasoning 50(7), 969–978 (2009)
Zhang, D., Wang, J., Cai, D., Lu, J.: Self-taught hashing for fast similarity search. In: SIGIR, pp. 18–25 (2010)
Joachims, T.: Training linear SVMs in linear time. In: SIGKDD, pp. 217–226 (2006)
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
World Wide Knowledge Base project (2001), http://www.cs.cmu.edu/~webkb/
Reuters21578 (1999), http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. Addison Wesley, Reading (1999)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Communications of the ACMÂ 18(9), 517 (1975)
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD, pp. 47–57 (1984)
Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD 19(2), 322–331 (1990)
Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB, pp. 194–205 (1998)
Fagin, R., Kumar, R., Sivakumar, D.: Efficient similarity search and classification via rank aggregation. In: SIGMOD, pp. 301–312 (2003)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66(4), 614–656 (2003)
Yao, B., Li, F., Kumar, P.: k-nearest neighbor queries and knn-joins in large relational databases (almost) for free. In: ICDE, pp. 4–15 (2010)
Ramsak, F., Markl, V., Fenk, R., Zirkel, M., Elhardt, K., Bayer, R.: Integrating the UB-tree into a database system kernel. In: VLDB, pp. 263–272 (2000)
Liao, S., Lopez, M., Leutenegger, S.: High dimensional similarity search with space filling curves. In: ICDE, pp. 615–622 (2001)
Baluja, S., Covell, M.: Learning to hash: forgiving hash functions and applications. Data Mining and Knowledge Discovery 17(3), 402–430 (2008)
Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. NIPS 21, 1753–1760 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yuan, P., Sha, C., Wang, X., Yang, B., Zhou, A. (2011). Efficient Approximate Similarity Search Using Random Projection Learning. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_44
Download citation
DOI: https://doi.org/10.1007/978-3-642-23535-1_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer ScienceComputer Science (R0)