A New Algorithm for Finding Closest Pair of Vectors (Extended Abstract)

Xie, Ning; Xu, Shuai; Xu, Yekun

doi:10.1007/978-3-319-90530-3_27

A New Algorithm for Finding Closest Pair of Vectors (Extended Abstract)

Ning Xie¹⁵,
Shuai Xu¹⁵ &
Yekun Xu¹⁵

Conference paper
First Online: 25 April 2018

1047 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10846))

Abstract

Given n vectors \(x_0, x_1, \ldots , x_{n-1}\) in \(\{0,1\}^{m}\), how to find two vectors whose pairwise Hamming distance is minimum? This problem is known as the Closest Pair Problem. If these vectors are generated uniformly at random except two of them are correlated with Pearson-correlation coefficient \(\rho \), then the problem is called the Light Bulb Problem. In this work, we propose a novel coding-based scheme for the Close Pair Problem. We design both randomized and deterministic algorithms, which achieve the best known running time when the minimum distance is very small compared to the length of input vectors. When applied to the Light Bulb Problem, our algorithms yields state-of-the-art deterministic running time when the Pearson-correlation coefficient \(\rho \) is very large.

A full version of the paper is available at https://arxiv.org/abs/1802.09104.

N. Xie, S. Xu and Y. Xu—Research supported in part by NSF grant 1423034.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
Subqubic fast matrix multiplication algorithms are practical only for Strassen-based ones [11, 22]. Even though the recent breakthrough results [20, 35, 42] achieve asymptotically faster than Strassen’s algorithm [36], however, these algorithms are all based on Coppersmith-Winograd’s algorithm [15], and to the best of our knowledge, there are no practical implementations of these trilinear based algorithms.
2.
Actually, we only need to “check” when the two adjacent decoded vectors are identical.
3.
The Hamming bound, also known as the sphere packing bound, specifies an upper bound on the number codewords a code can have given the block length and the minimum distance of the code.
4.
The GV bound is known to be attainable by the random codes.
5.
All these results are due to the fact that Valiant’s algorithms are much more robust to weak correlations than other algorithms. Our algorithms therefore do not give improved bounds for these learning problems in the general settings.

References

Abboud, A., Rubinstein, A., Williams, R.: Distributed PCP theorems for hardness of approximation in P. In: Proceedings of 58th Annual IEEE Symposium on Foundations of Computer Science, pp. 25–36 (2017)
Google Scholar
Abboud, A., Williams, R., Yu, H.: More applications of the polynomial method to algorithm design. In: Proceedings of 26th ACM-SIAM Symposium on Discrete Algorithms, pp. 218–230 (2015)
Google Scholar
Alman, J., Chan, T.M., Williams, R.: Polynomial representations of threshold functions and algorithmic applications. In: Proceedings of 57th Annual IEEE Symposium on Foundations of Computer Science, pp. 467–476 (2016)
Google Scholar
Alman, J., Williams, R.: Probabilistic polynomials and Hamming nearest neighbors. In: Proceedings of 56th Annual IEEE Symposium on Foundations of Computer Science, pp. 136–150 (2015)
Google Scholar
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51, 117–122 (2008)
Article Google Scholar
Andoni, A., Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J., O’Rourke, J., Toth, C.D. (eds.) Handbook of Discrete and Computational Geometry, 3rd (edn). Chapman and Hall/CRC (2017)
Google Scholar
Andoni, A., Laarhoven, T., Razenshteyn, I., Waingarten, E.: Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of 28th ACM-SIAM Symposium on Discrete Algorithms, pp. 47–66 (2017)
Google Scholar
Arya, S., Mount, D.: Computational geometry: proximity and location. In: Mehta, D.P., Sahni, S. (eds) Handbook of Data Structures and Applications. Chapman and Hall/CRC (2005)
Google Scholar
Aston, C., Ralph, D., Lalo, D., Manjeshwar, S., Gramling, B., DeFreese, D., West, A., Branam, D., Thompson, L., Craft, M., et al.: Oligogenic combinations associated with breast cancer risk in women under 53 years of age. Hum. Genet. 116(3), 208–221 (2005)
Article Google Scholar
Becker, A., Joux, A., May, A., Meurer, A.: Decoding random binary linear codes in \(2^{n/20}\): how \(1+1=0\) improves information set decoding. In: EUROCRYPT, pp. 520–536 (2012)
Google Scholar
Benson, A., Ballard, G.: A framework for practical parallel fast matrix multiplication. In: ACM SIGPLAN Notices, pp. 42–53 (2015)
Article Google Scholar
Bentley, J.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980)
Article MathSciNet MATH Google Scholar
Cho, J., Nicolae, D., Gold, L., Fields, C., LaBuda, M., Rohal, P., Pickles, M., Qin, L., Fu, Y., Mann, J., et al.: Identification of novel susceptibility loci for inflammatory bowel disease on chromosomes 1p, 3q, and 4q: evidence for epistasis between 1p and IBD1. Proc. Natl. Acad. Sci. 95(13), 7502–7507 (1998)
Article Google Scholar
Clarkson, K.: A randomized algorithm for closest-point queries. SIAM J. Comput. 17(4), 830–847 (1988)
Article MathSciNet MATH Google Scholar
Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. In: Proceedings of 19th Annual ACM Symposium on the Theory of Computing, pp. 1–6 (1987)
Google Scholar
Cordell, H.: Detecting gene \(\times \) gene interactions that underlie human diseases. Nat. Rev. Genet. 10(6), 392–404 (2009)
Article Google Scholar
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.: Locality-sensitive hashing scheme based on \(p\)-stable distributions. In: Proceedings of 20th Symposium on Computational Geometry, pp. 253–262 (2004)
Google Scholar
Dubiner, M.: Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Trans. Inf. Theory 56(8), 4166–4179 (2008)
Article MathSciNet MATH Google Scholar
Frazer, K., Ballinger, D., Cox, D., Hinds, D., Stuve, L., Gibbs, R., Belmont, J., Boudreau, A., Hardenbol, P., Leal, S., et al.: A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164), 851–861 (2007)
Article Google Scholar
Le Gall, F.: Faster algorithms for rectangular matrix multiplication. In: Proceedings of 53rd Annual IEEE Symposium on Foundations of Computer Science, pp. 514–523 (2012)
Google Scholar
Golin, M.J., Raman, R., Schwarz, C., Smid, M.: Simple randomized algorithms for closest pair problems. Nord. J. Comput. 2(1), 3–27 (1995)
MathSciNet Google Scholar
Huang, J., Rice, L., Matthews, D., van de Geijn, R.: Generating families of practical fast matrix multiplication algorithms. In: 2017 IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pp. 656–667 (2017)
Google Scholar
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of 30th Annual ACM Symposium on the Theory of Computing, pp. 604–613 (1998)
Google Scholar
Karppa, M., Kaski, P., Kohonen, J.: A faster subquadratic algorithm for finding outlier correlations. In: Proceedings of 27th ACM-SIAM Symposium on Discrete Algorithms, pp. 1288–1305 (2016)
Google Scholar
Khuller, S., Matias, Y.: A simple randomized sieve algorithm for the closest-pair problem. Inf. Comput. 118(1), 34–37 (1995)
Article MathSciNet MATH Google Scholar
Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of 30th Annual ACM Symposium on the Theory of Computing, pp. 614–623 (1998)
Google Scholar
May, A., Ozerov, I.: On computing nearest neighbors with applications to decoding of binary linear codes. In: EUROCRYPT, pp. 203–228 (2015)
Google Scholar
Meiser, S.: Point location in arrangements of hyperplanes. Inf. Comput. 106(2), 286–303 (1993)
Article MathSciNet MATH Google Scholar
Motwani, R., Naor, A., Panigrahi, R.: Lower bounds on locality sensitive hashing. In: Proceedings of 22nd Symposium on Computational Geometry, pp. 154–157 (2006)
Google Scholar
Musani, S., Shriner, D., Liu, N., Feng, R., Coffey, C., Yi, N., Tiwari, H., Allison, D.: Detection of gene \(\times \) gene interactions in genome-wide association studies of human population data. Hum. Hered. 63(2), 67–84 (2007)
Article Google Scholar
O’Donnell, R., Wu, Y., Zhou, Y.: Optimal lower bounds for locality-sensitive hashing (except when \(q\) is tiny). ACM Trans. Comput. Theory 6(1), 5 (2014)
Article MathSciNet Google Scholar
Paturi, R., Rajasekaran, S., Reif, J.: The light bulb problem. Inf. Comput. 117, 187–192 (1995)
Article MATH Google Scholar
Rabin, M.: Probabilistic algorithms. In: Algorithms and Complexity, pp. 21–30 (1976)
Google Scholar
Smid, M.: Closest-point problems in computational geometry. In: Sack, J., Urrutia, J. (eds.) Handbook of computational geometry, pp. 877–935. Elsevier Science Publishing (1997)
Chapter MATH Google Scholar
Stothers, A.: On the complexity of matrix multiplication. Ph.D. thesis, The University of Edinburgh (2010)
Google Scholar
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)
Article MathSciNet MATH Google Scholar
Valiant, G.: Finding correlations in subquadratic time, with applications to learning parities and juntas. J. ACM 62, 13 (2015). Earlier version in FOCS 2012
Article MathSciNet MATH Google Scholar
Valiant, L.G.: Functionality in neural nets. In: AAAI, pp. 629–634 (1988)
Google Scholar
Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: Proceedings of 46th Annual ACM Symposium on the Theory of Computing, pp. 664–673 (2014)
Google Scholar
Williams, R.: The polynomial method in circuit complexity applied to algorithm design (invited talk). In: Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS), pp. 47–60 (2014)
Google Scholar
Williams, R.: On the difference between closest, furthest, and orthogonal pairs: nearly-linear vs barely-subquadratic complexity. In: Proceedings of 29th ACM-SIAM Symposium on Discrete Algorithms, pp. 1207–1215 (2018)
Chapter Google Scholar
Williams, V.: Multiplying matrices faster than Coppersmith-Winograd. In: Proceedings of 44th Annual ACM Symposium on the Theory of Computing, pp. 887–898 (2012)
Google Scholar

Download references

Acknowledgements

We would like to thank Karthik C.S. and the anonymous referees for their valuable comments.

Author information

Authors and Affiliations

Florida International University, Miami, FL, 33199, USA
Ning Xie, Shuai Xu & Yekun Xu

Authors

Ning Xie
View author publications
You can also search for this author in PubMed Google Scholar
Shuai Xu
View author publications
You can also search for this author in PubMed Google Scholar
Yekun Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Ning Xie , Shuai Xu or Yekun Xu .

Editor information

Editors and Affiliations

Department of Informatics, University of Bergen, Bergen, Norway
Fedor V. Fomin
Steklov Mathematical Institute, Moscow, Russia
Vladimir V. Podolskii

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Xie, N., Xu, S., Xu, Y. (2018). A New Algorithm for Finding Closest Pair of Vectors (Extended Abstract). In: Fomin, F., Podolskii, V. (eds) Computer Science – Theory and Applications. CSR 2018. Lecture Notes in Computer Science(), vol 10846. Springer, Cham. https://doi.org/10.1007/978-3-319-90530-3_27

Download citation

DOI: https://doi.org/10.1007/978-3-319-90530-3_27
Published: 25 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90529-7
Online ISBN: 978-3-319-90530-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics