Skip to main content

A New Algorithm for Finding Closest Pair of Vectors (Extended Abstract)

  • Conference paper
  • First Online:
  • 1047 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10846))

Abstract

Given n vectors \(x_0, x_1, \ldots , x_{n-1}\) in \(\{0,1\}^{m}\), how to find two vectors whose pairwise Hamming distance is minimum? This problem is known as the Closest Pair Problem. If these vectors are generated uniformly at random except two of them are correlated with Pearson-correlation coefficient \(\rho \), then the problem is called the Light Bulb Problem. In this work, we propose a novel coding-based scheme for the Close Pair Problem. We design both randomized and deterministic algorithms, which achieve the best known running time when the minimum distance is very small compared to the length of input vectors. When applied to the Light Bulb Problem, our algorithms yields state-of-the-art deterministic running time when the Pearson-correlation coefficient \(\rho \) is very large.

A full version of the paper is available at https://arxiv.org/abs/1802.09104.

N. Xie, S. Xu and Y. Xu—Research supported in part by NSF grant 1423034.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    Subqubic fast matrix multiplication algorithms are practical only for Strassen-based ones [11, 22]. Even though the recent breakthrough results [20, 35, 42] achieve asymptotically faster than Strassen’s algorithm [36], however, these algorithms are all based on Coppersmith-Winograd’s algorithm [15], and to the best of our knowledge, there are no practical implementations of these trilinear based algorithms.

  2. 2.

    Actually, we only need to “check” when the two adjacent decoded vectors are identical.

  3. 3.

    The Hamming bound, also known as the sphere packing bound, specifies an upper bound on the number codewords a code can have given the block length and the minimum distance of the code.

  4. 4.

    The GV bound is known to be attainable by the random codes.

  5. 5.

    All these results are due to the fact that Valiant’s algorithms are much more robust to weak correlations than other algorithms. Our algorithms therefore do not give improved bounds for these learning problems in the general settings.

References

  1. Abboud, A., Rubinstein, A., Williams, R.: Distributed PCP theorems for hardness of approximation in P. In: Proceedings of 58th Annual IEEE Symposium on Foundations of Computer Science, pp. 25–36 (2017)

    Google Scholar 

  2. Abboud, A., Williams, R., Yu, H.: More applications of the polynomial method to algorithm design. In: Proceedings of 26th ACM-SIAM Symposium on Discrete Algorithms, pp. 218–230 (2015)

    Google Scholar 

  3. Alman, J., Chan, T.M., Williams, R.: Polynomial representations of threshold functions and algorithmic applications. In: Proceedings of 57th Annual IEEE Symposium on Foundations of Computer Science, pp. 467–476 (2016)

    Google Scholar 

  4. Alman, J., Williams, R.: Probabilistic polynomials and Hamming nearest neighbors. In: Proceedings of 56th Annual IEEE Symposium on Foundations of Computer Science, pp. 136–150 (2015)

    Google Scholar 

  5. Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51, 117–122 (2008)

    Article  Google Scholar 

  6. Andoni, A., Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J., O’Rourke, J., Toth, C.D. (eds.) Handbook of Discrete and Computational Geometry, 3rd (edn). Chapman and Hall/CRC (2017)

    Google Scholar 

  7. Andoni, A., Laarhoven, T., Razenshteyn, I., Waingarten, E.: Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of 28th ACM-SIAM Symposium on Discrete Algorithms, pp. 47–66 (2017)

    Google Scholar 

  8. Arya, S., Mount, D.: Computational geometry: proximity and location. In: Mehta, D.P., Sahni, S. (eds) Handbook of Data Structures and Applications. Chapman and Hall/CRC (2005)

    Google Scholar 

  9. Aston, C., Ralph, D., Lalo, D., Manjeshwar, S., Gramling, B., DeFreese, D., West, A., Branam, D., Thompson, L., Craft, M., et al.: Oligogenic combinations associated with breast cancer risk in women under 53 years of age. Hum. Genet. 116(3), 208–221 (2005)

    Article  Google Scholar 

  10. Becker, A., Joux, A., May, A., Meurer, A.: Decoding random binary linear codes in \(2^{n/20}\): how \(1+1=0\) improves information set decoding. In: EUROCRYPT, pp. 520–536 (2012)

    Google Scholar 

  11. Benson, A., Ballard, G.: A framework for practical parallel fast matrix multiplication. In: ACM SIGPLAN Notices, pp. 42–53 (2015)

    Article  Google Scholar 

  12. Bentley, J.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980)

    Article  MathSciNet  MATH  Google Scholar 

  13. Cho, J., Nicolae, D., Gold, L., Fields, C., LaBuda, M., Rohal, P., Pickles, M., Qin, L., Fu, Y., Mann, J., et al.: Identification of novel susceptibility loci for inflammatory bowel disease on chromosomes 1p, 3q, and 4q: evidence for epistasis between 1p and IBD1. Proc. Natl. Acad. Sci. 95(13), 7502–7507 (1998)

    Article  Google Scholar 

  14. Clarkson, K.: A randomized algorithm for closest-point queries. SIAM J. Comput. 17(4), 830–847 (1988)

    Article  MathSciNet  MATH  Google Scholar 

  15. Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. In: Proceedings of 19th Annual ACM Symposium on the Theory of Computing, pp. 1–6 (1987)

    Google Scholar 

  16. Cordell, H.: Detecting gene \(\times \) gene interactions that underlie human diseases. Nat. Rev. Genet. 10(6), 392–404 (2009)

    Article  Google Scholar 

  17. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.: Locality-sensitive hashing scheme based on \(p\)-stable distributions. In: Proceedings of 20th Symposium on Computational Geometry, pp. 253–262 (2004)

    Google Scholar 

  18. Dubiner, M.: Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Trans. Inf. Theory 56(8), 4166–4179 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  19. Frazer, K., Ballinger, D., Cox, D., Hinds, D., Stuve, L., Gibbs, R., Belmont, J., Boudreau, A., Hardenbol, P., Leal, S., et al.: A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164), 851–861 (2007)

    Article  Google Scholar 

  20. Le Gall, F.: Faster algorithms for rectangular matrix multiplication. In: Proceedings of 53rd Annual IEEE Symposium on Foundations of Computer Science, pp. 514–523 (2012)

    Google Scholar 

  21. Golin, M.J., Raman, R., Schwarz, C., Smid, M.: Simple randomized algorithms for closest pair problems. Nord. J. Comput. 2(1), 3–27 (1995)

    MathSciNet  Google Scholar 

  22. Huang, J., Rice, L., Matthews, D., van de Geijn, R.: Generating families of practical fast matrix multiplication algorithms. In: 2017 IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pp. 656–667 (2017)

    Google Scholar 

  23. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of 30th Annual ACM Symposium on the Theory of Computing, pp. 604–613 (1998)

    Google Scholar 

  24. Karppa, M., Kaski, P., Kohonen, J.: A faster subquadratic algorithm for finding outlier correlations. In: Proceedings of 27th ACM-SIAM Symposium on Discrete Algorithms, pp. 1288–1305 (2016)

    Google Scholar 

  25. Khuller, S., Matias, Y.: A simple randomized sieve algorithm for the closest-pair problem. Inf. Comput. 118(1), 34–37 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  26. Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of 30th Annual ACM Symposium on the Theory of Computing, pp. 614–623 (1998)

    Google Scholar 

  27. May, A., Ozerov, I.: On computing nearest neighbors with applications to decoding of binary linear codes. In: EUROCRYPT, pp. 203–228 (2015)

    Google Scholar 

  28. Meiser, S.: Point location in arrangements of hyperplanes. Inf. Comput. 106(2), 286–303 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  29. Motwani, R., Naor, A., Panigrahi, R.: Lower bounds on locality sensitive hashing. In: Proceedings of 22nd Symposium on Computational Geometry, pp. 154–157 (2006)

    Google Scholar 

  30. Musani, S., Shriner, D., Liu, N., Feng, R., Coffey, C., Yi, N., Tiwari, H., Allison, D.: Detection of gene \(\times \) gene interactions in genome-wide association studies of human population data. Hum. Hered. 63(2), 67–84 (2007)

    Article  Google Scholar 

  31. O’Donnell, R., Wu, Y., Zhou, Y.: Optimal lower bounds for locality-sensitive hashing (except when \(q\) is tiny). ACM Trans. Comput. Theory 6(1), 5 (2014)

    Article  MathSciNet  Google Scholar 

  32. Paturi, R., Rajasekaran, S., Reif, J.: The light bulb problem. Inf. Comput. 117, 187–192 (1995)

    Article  MATH  Google Scholar 

  33. Rabin, M.: Probabilistic algorithms. In: Algorithms and Complexity, pp. 21–30 (1976)

    Google Scholar 

  34. Smid, M.: Closest-point problems in computational geometry. In: Sack, J., Urrutia, J. (eds.) Handbook of computational geometry, pp. 877–935. Elsevier Science Publishing (1997)

    Chapter  MATH  Google Scholar 

  35. Stothers, A.: On the complexity of matrix multiplication. Ph.D. thesis, The University of Edinburgh (2010)

    Google Scholar 

  36. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)

    Article  MathSciNet  MATH  Google Scholar 

  37. Valiant, G.: Finding correlations in subquadratic time, with applications to learning parities and juntas. J. ACM 62, 13 (2015). Earlier version in FOCS 2012

    Article  MathSciNet  MATH  Google Scholar 

  38. Valiant, L.G.: Functionality in neural nets. In: AAAI, pp. 629–634 (1988)

    Google Scholar 

  39. Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: Proceedings of 46th Annual ACM Symposium on the Theory of Computing, pp. 664–673 (2014)

    Google Scholar 

  40. Williams, R.: The polynomial method in circuit complexity applied to algorithm design (invited talk). In: Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS), pp. 47–60 (2014)

    Google Scholar 

  41. Williams, R.: On the difference between closest, furthest, and orthogonal pairs: nearly-linear vs barely-subquadratic complexity. In: Proceedings of 29th ACM-SIAM Symposium on Discrete Algorithms, pp. 1207–1215 (2018)

    Chapter  Google Scholar 

  42. Williams, V.: Multiplying matrices faster than Coppersmith-Winograd. In: Proceedings of 44th Annual ACM Symposium on the Theory of Computing, pp. 887–898 (2012)

    Google Scholar 

Download references

Acknowledgements

We would like to thank Karthik C.S. and the anonymous referees for their valuable comments.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Ning Xie , Shuai Xu or Yekun Xu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Xie, N., Xu, S., Xu, Y. (2018). A New Algorithm for Finding Closest Pair of Vectors (Extended Abstract). In: Fomin, F., Podolskii, V. (eds) Computer Science – Theory and Applications. CSR 2018. Lecture Notes in Computer Science(), vol 10846. Springer, Cham. https://doi.org/10.1007/978-3-319-90530-3_27

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-90530-3_27

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90529-7

  • Online ISBN: 978-3-319-90530-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics