Locality-Sensitive Hashing Without False Negatives for $l_p$

Pacuk, Andrzej; Sankowski, Piotr; Wegrzycki, Karol; Wygocki, Piotr

doi:10.1007/978-3-319-42634-1_9


Conference paper
First Online: 20 July 2016


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9797)

Abstract

In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., functions that ensure a collision for every pair of points within a given radius $R$ in $d$-dimensional space equipped with the $l_p$ norm, for $p \in [1,\infty]$. Furthermore, we show how to use these hash functions to solve the $c$-approximate nearest neighbor search problem without false negatives. Namely, if there is a point within distance $R$, we will certainly report it, and points at distance greater than $cR$ will not be reported, for $c = \varOmega(\max\{\sqrt{d}, d^{1-\frac{1}{p}}\})$. The constructed algorithms work:

with preprocessing time $\mathcal{O}(n \log(n))$ and sublinear expected query time, or
with preprocessing time $\mathcal{O}(\mathrm{poly}(n))$ and expected query time $\mathcal{O}(\log(n))$.

Our paper reports progress on the open problem posed by Pagh [8], who considered nearest neighbor search without false negatives for the Hamming distance.


Notes

1. $\|\cdot\|_p$ denotes the standard $l_p$ norm for a fixed $p$.
2. However, one may try to obtain a tighter bound (e.g., $c = d^{1/2}/\log(d)$) or show that, for every $\epsilon > 0$, the approximation factor $c = d^{1/2-\epsilon}$ does not work.

References

[1] Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)
[2] Andoni, A., Razenshteyn, I.: Optimal data-dependent hashing for approximate near neighbors. In: Servedio, R.A., Rubinfeld, R. (eds.) Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, 14–17 June 2015, pp. 793–801. ACM (2015)
[3] Bentley, J.L.: K-d trees for semidynamic point sets. In: Proceedings of the Sixth Annual Symposium on Computational Geometry, SCG 1990, pp. 187–197. ACM, New York (1990)
[4] Datar, M., Indyk, P.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, SCG 2004, pp. 253–262. ACM Press (2004)
[5] Haagerup, U.: The best constants in the Khintchine inequality. Stud. Math. 70(3), 231–283 (1981)
[6] Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
[7] Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 604–613. ACM, New York (1998)
[8] Pagh, R.: Locality-sensitive hashing without false negatives. In: Krauthgamer, R. (ed.) Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, 10–12 January 2016, pp. 1–9. SIAM (2016)
[9] Veraar, M.: On Khintchine inequalities with a weight. Proc. Am. Math. Soc. 138, 4119–4121 (2010)
[10] Williams, R.: A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci. 348(2), 357–365 (2005)


Acknowledgments

This work was supported by ERC PoC project PAAl-POC 680912 and FET project MULTIPLEX 317532. We would also like to thank Rafał Latała for meaningful discussions.

Author information

Authors and Affiliations

Institute of Informatics, University of Warsaw, Warsaw, Poland
Andrzej Pacuk, Piotr Sankowski, Karol Wegrzycki & Piotr Wygocki


Corresponding author

Correspondence to Piotr Wygocki.

Editor information

Editors and Affiliations

Virginia Commonwealth University, Richmond, Virginia, USA
Thang N. Dinh
University of Florida, Gainesville, Florida, USA
My T. Thai

Appendices

A Proof of Observation 1

Proof

We will use the fact that for any $x, y \in \mathbb{R}$ we have $|\lfloor x \rfloor - \lfloor y \rfloor| \le 1 \Rightarrow |x-y| < 2$. Then the following implications hold:

$$\begin{aligned} |h_p(x) - h_p(y)| \le 1 &\iff \left| \left\lfloor \frac{\langle x, v \rangle}{\rho_p r} \right\rfloor - \left\lfloor \frac{\langle y, v \rangle}{\rho_p r} \right\rfloor \right| \le 1 \\ &\implies \left| \frac{\langle x, v \rangle}{\rho_p r} - \frac{\langle y, v \rangle}{\rho_p r} \right| < 2 \\ &\iff |\langle x-y, v \rangle| < 2 \rho_p r. \end{aligned}$$

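Hence the event $\{|h_p(x) - h_p(y)| \le 1\}$ is contained in the event $\{|\langle x-y, v \rangle| < 2\rho_p r\}$. By the monotonicity of probability,

$$\begin{aligned} \mathrm{if} \; A \subseteq B \; \mathrm{then} \; \mathbb{P}\left[ A \right] \le \mathbb{P}\left[ B \right], \end{aligned}$$

we obtain $\mathbb{P}\left[ |h_p(x) - h_p(y)| \le 1 \right] \le \mathbb{P}\left[ |\langle x-y, v \rangle| < 2\rho_p r \right]$. $\square$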
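The implication used in this proof can be spot-checked numerically. The following is a minimal Python sketch, assuming the hash $h_p(x) = \lfloor \langle x, v \rangle / (\rho_p r) \rfloor$ as read off the displayed equation; the Gaussian choice of $v$, the values $\rho_p = r = 1$, and the helper names `dot`, `h_p`, `observation1_holds`, and `random_trials` are hypothetical illustrations, not the paper's construction. A random test of this kind can only fail to find counterexamples; it does not replace the proof.

```python
import math
import random

def dot(a, b):
    """Inner product <a, b> of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def h_p(x, v, rho_p, r):
    """Hash h_p(x) = floor(<x, v> / (rho_p * r)), as in the displayed equation."""
    return math.floor(dot(x, v) / (rho_p * r))

def observation1_holds(x, y, v, rho_p, r):
    """If |h_p(x) - h_p(y)| <= 1, check that |<x - y, v>| < 2 * rho_p * r."""
    if abs(h_p(x, v, rho_p, r) - h_p(y, v, rho_p, r)) > 1:
        return True  # premise fails, so the implication holds vacuously
    diff = [xi - yi for xi, yi in zip(x, y)]
    return abs(dot(diff, v)) < 2 * rho_p * r

def random_trials(trials=10_000, d=5, rho_p=1.0, r=1.0, seed=0):
    """Spot-check the implication on random points and a random Gaussian v."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(d)]
        y = [rng.uniform(-3, 3) for _ in range(d)]
        v = [rng.gauss(0, 1) for _ in range(d)]  # hypothetical choice of v
        if not observation1_holds(x, y, v, rho_p, r):
            return False
    return True

print(random_trials())  # True
```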

B Proof of Observation 2

Proof

We will use the fact that for $x, y \in \mathbb{R}$: $|x-y| < 1 \Rightarrow |\lfloor x \rfloor - \lfloor y \rfloor| \le 1$.

$$\begin{aligned} |\langle x-y, v \rangle| < \rho_p r &\iff \left| \frac{\langle x, v \rangle}{\rho_p r} - \frac{\langle y, v \rangle}{\rho_p r} \right| < 1 \\ &\implies \left| \left\lfloor \frac{\langle x, v \rangle}{\rho_p r} \right\rfloor - \left\lfloor \frac{\langle y, v \rangle}{\rho_p r} \right\rfloor \right| \le 1 \\ &\iff |h_p(x) - h_p(y)| \le 1. \end{aligned}$$

$\square $
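Observation 2 guarantees that points whose projections onto $v$ are close land in the same or adjacent buckets. As a minimal Python sketch (hypothetical helpers, not necessarily the paper's query procedure), the code below stores points by $h_p$ and, for a query $q$, collects the buckets $h_p(q)-1$, $h_p(q)$, $h_p(q)+1$; by the implication above, every point $y$ with $|\langle q - y, v \rangle| < \rho_p r$ must be among these candidates. The Gaussian $v$, $\rho_p = r = 1$, and the query at the origin are arbitrary assumptions.

```python
import math
import random

def dot(a, b):
    """Inner product <a, b> of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def h_p(x, v, rho_p, r):
    """Hash h_p(x) = floor(<x, v> / (rho_p * r)), as in the displayed equation."""
    return math.floor(dot(x, v) / (rho_p * r))

def build_buckets(points, v, rho_p, r):
    """Hypothetical helper: map each hash value to the indices of its points."""
    buckets = {}
    for i, p in enumerate(points):
        buckets.setdefault(h_p(p, v, rho_p, r), []).append(i)
    return buckets

def candidates(query, buckets, v, rho_p, r):
    """Hypothetical helper: indices stored in buckets h(q) - 1, h(q), h(q) + 1."""
    hq = h_p(query, v, rho_p, r)
    return {i for k in (hq - 1, hq, hq + 1) for i in buckets.get(k, [])}

rng = random.Random(1)
d, rho_p, r = 4, 1.0, 1.0
v = [rng.gauss(0, 1) for _ in range(d)]  # hypothetical choice of v
points = [[rng.uniform(-5, 5) for _ in range(d)] for _ in range(500)]
buckets = build_buckets(points, v, rho_p, r)
query = [0.0] * d  # query at the origin (arbitrary choice)

# Every point with |<p - q, v>| < rho_p * r must appear among the candidates.
close = {i for i, p in enumerate(points)
         if abs(dot([pi - qi for pi, qi in zip(p, query)], v)) < rho_p * r}
print(close <= candidates(query, buckets, v, rho_p, r))  # True
```

Scanning only three buckets keeps the candidate set small, while the observation ensures that no point with a close projection is missed.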

C Proof of Observation 4

Proof

For every $0 < b \le a$, every vector $z \in \mathbb{R}^d$ satisfies the inequality:

$$\begin{aligned} ||z ||_a \le ||z ||_b \le d^{(\frac{1}{b} - \frac{1}{a})} ||z ||_a . \end{aligned} \tag{1}$$

For $p>2$ we have $\max \{ d^\frac{1}{2} , d^{1-\frac{1}{p}} \} = d^{1-\frac{1}{p}}$. Then, using inequality (1) for $a=p$ and $b=2$, we have:

$$\begin{aligned} ||z ||_2 \ge ||z ||_p = \frac{\rho _p}{d^{1-\frac{1}{p}}} ||z ||_p = \frac{\rho _p}{\max \{d^\frac{1}{2}, d^{1 - \frac{1}{p} } \} } ||z ||_p \end{aligned}$$

For $1 \le p \le 2$ we have $\max \{ d^\frac{1}{2}, d^{1-\frac{1}{p}} \} = d^\frac{1}{2}$. Analogously by using inequality (1) for $a = 2$ and $b=p$:

$$\begin{aligned} ||z ||_p \le d^{\frac{1}{p} - \frac{1}{2}} ||z ||_2 = ||z ||_2 \frac{d^{\frac{1}{2}}}{\rho _p} \end{aligned}$$

Hence, multiplying both sides by $\rho_p / d^{\frac{1}{2}}$, in both cases we obtain:

$$\begin{aligned} ||z ||_p \frac{\rho _p}{\max \{ d^\frac{1}{2}, d^{1-\frac{1}{p}} \}} \le ||z ||_2 \end{aligned}$$

$\square $
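The two cases can be checked numerically. This Python sketch assumes $\rho_p = d^{1-\frac{1}{p}}$, which is our reading of the equalities in the proof above (the constant is defined in the main text, not in this appendix); it tests the final inequality on random Gaussian vectors for several values of $p$, including $p = \infty$, with a small floating-point tolerance. The helper names are hypothetical.

```python
import random

def lp_norm(z, p):
    """l_p norm of z for p in [1, inf]."""
    if p == float("inf"):
        return max(abs(t) for t in z)
    return sum(abs(t) ** p for t in z) ** (1.0 / p)

def observation4_holds(z, p, tol=1e-9):
    """Check ||z||_p * rho_p / max{d^(1/2), d^(1-1/p)} <= ||z||_2."""
    d = len(z)
    rho_p = d ** (1.0 - 1.0 / p)  # assumption: rho_p = d^(1-1/p); 1/inf == 0.0
    factor = rho_p / max(d ** 0.5, d ** (1.0 - 1.0 / p))
    return lp_norm(z, p) * factor <= lp_norm(z, 2) * (1 + tol)

rng = random.Random(2)
ps = [1, 1.5, 2, 3, 10, float("inf")]
ok = all(observation4_holds([rng.gauss(0, 1) for _ in range(d)], p)
         for d in (1, 2, 10, 100) for p in ps for _ in range(200))
print(ok)  # True
```

For $p \ge 2$ the factor equals 1 and the check reduces to $\|z\|_p \le \|z\|_2$; for $p \le 2$ the all-ones vector attains equality, which is why a tolerance is used.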

D Hoeffding Bound

Here we give the technical details used in the proof in Sect. 3.1. We start with the Hoeffding inequality. Let $X_1, \ldots , X_d$ be independent random variables bounded by $a_i \le X_i \le b_i$, and let $\overline{X} = \sum _{i=1}^{d}X_i / d$ be their mean. Theorem 2 of Hoeffding [6] states:

$$\begin{aligned} \mathbb {P}\left[ | \overline{X} - \mathbb {E}\left[ \overline{X} \right] |\ge t \right]&\le 2 \cdot \exp \left( -\frac{2d^2t^2}{\sum _{i=1}^d(b_i - a_i)^2} \right) . \end{aligned}$$

In our case, $D_1, \ldots , D_d$ are bounded by $a_i = -1 \le D_i \le 1 = b_i$ and satisfy $\mathbb {E}D_i=0$, so $\sum_{i=1}^{d}(b_i - a_i)^2 = 4d$. The Hoeffding inequality implies:

$$\begin{aligned} \mathbb {P}\left[ \left| \frac{\sum _{i=1}^{d} D_i}{d} \right| \ge t \right]&\le 2 \cdot \exp \left( -\frac{2d^2t^2}{\sum _{i=1}^d(b_i - a_i)^2} \right) = 2 \cdot \exp \left( -\frac{dt^2}{2} \right) . \end{aligned}$$

Taking $t=d^{-1/2 +\epsilon }$ we get the claim:

$$\begin{aligned} \mathbb {P}\left[ \bigg |\frac{\sum _{i=1}^{d} D_i}{d} \bigg | \ge d^{-1/2 +\epsilon } \right]&\le 2 \cdot \exp \left( -\frac{d^{2 \epsilon }}{2} \right) . \end{aligned}$$
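The bound can be compared against a quick simulation. A minimal Python sketch, assuming the $D_i$ are independent Rademacher variables (uniform on $\{-1, 1\}$); this appendix only states that they are bounded in $[-1, 1]$ with mean zero, so the distribution, $\epsilon = 0.1$, and the helper names are illustrative choices.

```python
import math
import random

def empirical_tail(d, eps, trials=5_000, seed=3):
    """Estimate P[|(D_1 + ... + D_d) / d| >= d^(-1/2 + eps)] by simulation.

    Assumption: each D_i is an independent uniform +/-1 (Rademacher) variable,
    which is bounded in [-1, 1] and has mean 0.
    """
    rng = random.Random(seed)
    t = d ** (-0.5 + eps)
    hits = 0
    for _ in range(trials):
        ones = bin(rng.getrandbits(d)).count("1")  # number of D_i equal to +1
        s = 2 * ones - d                           # D_1 + ... + D_d
        if abs(s / d) >= t:
            hits += 1
    return hits / trials

def hoeffding_bound(d, eps):
    """Right-hand side 2 * exp(-d^(2 eps) / 2) of the final inequality."""
    return 2 * math.exp(-(d ** (2 * eps)) / 2)

for d in (50, 100, 200):
    print(d, empirical_tail(d, 0.1), round(hoeffding_bound(d, 0.1), 3))
```

The simulated frequencies sit well below the bound, as expected from a worst-case inequality that holds for every distribution satisfying the stated constraints.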

E Preprocessing Complexity Bounds for the Distributions Introduced in Lemma 1

By Lemma 1, we have: ${\textsf {p}_{\textsf {fp}}}_1 = 1-\frac{(1-\frac{\tau _1^2}{c^2})^2}{3}$, so:

$$\begin{aligned} \lim _{c \rightarrow \infty } {\gamma }=\lim _{c \rightarrow \infty } \frac{\ln {3}}{-\ln {{\textsf {p}_{\textsf {fp}}}_1}} = {\frac{\ln {3}}{\ln {1.5}}} \approx {2.71} . \end{aligned}$$

If we omit factors polynomial in $d$, the preprocessing time of the algorithm from Theorem 2 approaches $\mathcal {O}(n^{3.71})$ as $c \rightarrow \infty$.
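The limit can be checked numerically. A minimal Python sketch, assuming $\tau_1 = 1$ as a placeholder (its value is fixed in Lemma 1, which is not reproduced here, and the limit does not depend on it); in this instance $\gamma$ decreases toward $\ln 3 / \ln 1.5 \approx 2.7095$ as $c$ grows. The function names are hypothetical.

```python
import math

def p_fp1(c, tau1):
    """p_fp1 = 1 - (1 - tau1^2 / c^2)^2 / 3, as stated in Lemma 1."""
    return 1 - (1 - tau1 ** 2 / c ** 2) ** 2 / 3

def gamma(c, tau1=1.0):
    """gamma = ln 3 / (-ln p_fp1); tau1 = 1.0 is a hypothetical placeholder."""
    return math.log(3) / -math.log(p_fp1(c, tau1))

limit = math.log(3) / math.log(1.5)
for c in (2, 10, 100, 1000):
    print(c, round(gamma(c), 4))
print(round(limit, 4))  # 2.7095
```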


About this paper

Cite this paper

Pacuk, A., Sankowski, P., Wegrzycki, K., Wygocki, P. (2016). Locality-Sensitive Hashing Without False Negatives for $l_p$. In: Dinh, T., Thai, M. (eds) Computing and Combinatorics. COCOON 2016. Lecture Notes in Computer Science, vol 9797. Springer, Cham. https://doi.org/10.1007/978-3-319-42634-1_9


DOI: https://doi.org/10.1007/978-3-319-42634-1_9
Published: 20 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42633-4
Online ISBN: 978-3-319-42634-1
eBook Packages: Computer Science, Computer Science (R0)
