Skip to main content
Log in

Improving and Extending the Testing of Distributions for Shape-Restricted Properties

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

Distribution testing deals with what information can be deduced about an unknown distribution over \(\{1,\ldots ,n\}\), where the algorithm is only allowed to obtain a relatively small number of independent samples from the distribution. In the extended conditional sampling model, the algorithm is also allowed to obtain samples from the restriction of the original distribution on subsets of \(\{1,\ldots ,n\}\). In 2015, Canonne, Diakonikolas, Gouleakis and Rubinfeld unified several previous results, and showed that for any property of distributions satisfying a “decomposability” criterion, there exists an algorithm (in the basic model) that can distinguish with high probability distributions satisfying the property from distributions that are far from it in the variation distance. We present here a more efficient yet simpler algorithm for the basic model, as well as very efficient algorithms for the conditional model, which until now was not investigated under the umbrella of decomposable properties. Additionally, we provide an algorithm for the conditional model that handles a much larger class of properties. Our core mechanism is an algorithm for efficiently producing an interval-partition of \(\{1,\ldots ,n\}\) that satisfies a “fine-grain” quality. We show that with such a partition at hand we can avoid the search for the “correct” partition of \(\{1,\ldots ,n\}\).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

Notes

  1. The lower bounds for unconditional and non-adaptive conditional testing of L-decomposable properties with \(L=1\) are exactly the lower bounds for uniformity testing; the lower bound for adaptive conditional testing follows easily from the proved existence of properties that have no sub-linear complexity adaptive conditional tests; finally, the lower bound for properties k-characterized by atlases with \(k=1\) is just a bound for a symmetric property constructed there. About the last one, we conjecture that there exist properties with much higher lower bounds.

  2. The behavior of the conditional oracle on sets A with \(\mu (A)=0\) is as per the model of Chakraborty et al. [9]. However, upper bounds in this model also hold in the model of Canonne et al. [10], and most known lower bounds can be easily converted to it.

References

  1. Alon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R., Xie, N.: Testing k-wise and almost k-wise independence. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (New York, NY, USA), STOC ’07, ACM, pp. 496–505 (2007)

  2. Acharya, J., Canonne, C.L., Kamath, G.: A chasm between identity and equivalence testing with conditional queries. In: Garg, N., Jansen, K., Rao, A., Rolim, J.D.P., (eds.) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2015, August 24–26, 2015, Princeton, NJ, USA, LIPIcs, vol. 40, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, pp. 449–466 (2015)

  3. Acharya, J., Daskalakis, C., Kamath, G.: Optimal testing for properties of distributions. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7–12, 2015, Montreal, Quebec, Canada, pp. 3591–3599 (2015)

  4. Batu, T., Fortnow, L., Fischer, E., Kumar, R., Rubinfeld, R., White, P.: Testing random variables for independence and identity. In: 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14–17 October 2001, Las Vegas, Nevada, USA, pp. 442–451 (2001)

  5. Batu, T., Fortnow, L., Rubinfeld, R., Smith, W.D., White, P.: Testing that distributions are close. In: 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12–14 November 2000, Redondo Beach, California, USA, pp. 259–269. IEEE Computer Society (2000)

  6. Batu, T., Kumar, R., Rubinfeld, R.: Sublinear algorithms for testing monotone and unimodal distributions,. In: Babai, L. (ed.) Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13–16, 2004, pp. 381–390. ACM (2004)

  7. Canonne, C.L.: A survey on distribution testing: your data is big. But is it blue? Electron. Colloq. Comput. Complex. (ECCC) 22, 63 (2015)

    Google Scholar 

  8. Canonne, C.L., Diakonikolas, I., Gouleakis, T., Rubinfeld, R.: Testing shape restrictions of discrete distributions. In: 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17–20, 2016, Orléans, France, pp. 25:1–25:14 (2016)

  9. Chakraborty, S., Fischer, E., Goldhirsh, Y., Matsliah, A.: On the power of conditional samples in distribution testing. SIAM J. Comput. 45(4), 1261–1296 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  10. Canonne, C.L., Ron, D., Servedio, R.A.: Testing probability distributions using conditional samples. SIAM J. Comput. 44(3), 540–616 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  11. Diakonikolas, I.: Learning structured distributions. In: Handbook of Big Data, p. 267 (2016)

  12. Diakonikolas, I., Kane, D.M.: A new approach for testing properties of discrete distributions. In: IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9–11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pp. 685–694 (2016)

  13. Diakonikolas, I., Lee, H.K., Matulef, K., Onak, K., Rubinfeld, R., Servedio, R.A., Wan, A.: Testing for concise representations, In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20–23, 2007, Providence, RI, USA, Proceedings, pp. 549–558 (2007)

  14. Goldreich, O., Ron, D.: On testing expansion in bounded-degree graphs. Electron. Colloquium Comput. Complex. (ECCC) 7(20) (2000)

  15. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963). MR 0144363

    Article  MathSciNet  MATH  Google Scholar 

  16. Indyk, P., Levi, R., Rubinfeld, R.: Approximating and testing k-histogram distributions in sub-linear time. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20–24, 2012, pp. 15–22 (2012)

  17. Paninski, L.: A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inf. Theory 54(10), 4750–4755 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  18. Servedio, R.A.: Testing by implicit learning: a brief survey. In: Property Testing—Current Research and Surveys [outgrow of a workshop at the Institute for Computer Science (ITCS) at Tsinghua University, January 2010], pp. 197–210 (2010)

  19. Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. Electron. Colloq. Comput. Complex. (ECCC) 17, 179 (2010)

    Google Scholar 

  20. Valiant, G., Valiant, P.: An automatic inequality prover and instance optimal identity testing. SIAM J. Comput. 46(1), 429–455 (2017)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to E. Fischer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version with less refined bounds appeared in the Proceedings of the 34th STACS (2017).

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fischer, E., Lachish, O. & Vasudev, Y. Improving and Extending the Testing of Distributions for Shape-Restricted Properties. Algorithmica 81, 3765–3802 (2019). https://doi.org/10.1007/s00453-019-00598-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00453-019-00598-1

Keywords

Navigation