Sublinear Algorithms in the External Memory Model

Andoni, Alexandr; Indyk, Piotr; Onak, Krzysztof; Rubinfeld, Ronitt

doi:10.1007/978-3-642-16367-8_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6390)

Abstract

We initiate the study of sublinear-time algorithms in the external memory model. In this model, the data is stored in blocks of a certain size B, and the algorithm is charged a unit cost for each block access. This model is well-studied, since it reflects the computational issues occurring when the (massive) input is stored on a disk. Since each block access operates on B data elements in parallel, many problems have external memory algorithms whose number of block accesses is only a small fraction (e.g. 1/B) of their main memory complexity.
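
The cost model described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `BlockStore` and the functions `scan_sum` and `sample_elements` are hypothetical names introduced here and do not come from the chapter. The store charges one unit per block fetched, so a full scan of n elements costs about n/B accesses, while naive sampling of individual elements pays one block access per sampled element.

```python
import random


class BlockStore:
    """Toy external-memory array (hypothetical helper, not from the chapter).

    Data is grouped into blocks of `block_size` elements, and every
    block fetch is charged one unit of cost.
    """

    def __init__(self, data, block_size):
        self.data = list(data)
        self.block_size = block_size
        self.block_accesses = 0

    def num_blocks(self):
        # Ceiling division: the last block may be only partially filled.
        return (len(self.data) + self.block_size - 1) // self.block_size

    def read_block(self, block_index):
        # One unit of cost per block, regardless of how many of its
        # elements the caller actually uses.
        self.block_accesses += 1
        start = block_index * self.block_size
        return self.data[start:start + self.block_size]


def scan_sum(store):
    """Sequential scan: reads each block exactly once."""
    return sum(sum(store.read_block(i)) for i in range(store.num_blocks()))


def sample_elements(store, num_samples, rng):
    """Naive element sampling: each sample fetches a whole block
    and keeps a single element from it."""
    samples = []
    for _ in range(num_samples):
        pos = rng.randrange(len(store.data))
        block = store.read_block(pos // store.block_size)
        samples.append(block[pos % store.block_size])
    return samples
```

For example, with 1024 elements and blocks of size 32, `scan_sum` performs 32 block accesses, whereas drawing 10 elements with `sample_elements` performs 10 block accesses and uses only one element from each fetched block.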

However, to the best of our knowledge, no such reduction in complexity is known for any sublinear-time algorithm. One plausible explanation is that the vast majority of sublinear-time algorithms use random sampling and thus exhibit no locality of reference. This state of affairs is quite unfortunate, since both sublinear-time algorithms and the external memory model are important approaches to dealing with massive data sets, and ideally they should be combined to achieve the best performance.

We show that such a combination is indeed possible. In particular, we consider three well-studied problems: testing distinctness, uniformity, and identity of an empirical distribution induced by data. For these problems we show random-sampling-based algorithms whose number of block accesses is smaller than the main memory complexity of those problems by a factor of up to \(\sqrt{B}\). We also show that this improvement is optimal for those problems.
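
To make the distinctness problem concrete, here is a minimal sketch of a collision-based check that samples whole blocks instead of single elements. It is an illustration only and should not be read as the chapter's algorithm: the function name `find_duplicate_by_block_sampling` and the parameter `num_blocks_to_read` are assumptions made for this example, and the number of blocks to read is left to the caller rather than derived from the chapter's analysis. The check is one-sided: it reports a duplicate only when it actually sees the same value twice.

```python
import random


def find_duplicate_by_block_sampling(data, block_size, num_blocks_to_read, rng):
    """Read random blocks and look for a repeated value among all
    elements seen so far (hypothetical helper, not from the chapter).

    Returns (duplicate_found, block_accesses).
    """
    total_blocks = (len(data) + block_size - 1) // block_size
    # Choose distinct blocks uniformly at random.
    chosen = rng.sample(range(total_blocks),
                        min(num_blocks_to_read, total_blocks))
    seen = set()
    block_accesses = 0
    for b in chosen:
        block_accesses += 1  # one unit of cost per block fetched
        for value in data[b * block_size:(b + 1) * block_size]:
            if value in seen:
                # A collision proves the data is not all distinct.
                return True, block_accesses
            seen.add(value)
    return False, block_accesses
```

Every element of a fetched block is used, so each block access contributes up to `block_size` values to the set in which collisions are sought. On an input with all values distinct the function never reports a duplicate; on an input where every value is identical it stops after the first block.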

Since these problems are natural primitives for a number of sampling-based algorithms for other problems, our tools improve the external memory complexity of other problems as well.

The research was supported in part by a David and Lucile Packard Fellowship, by MADALGO (Center for Massive Data Algorithmics, funded by the Danish National Research Association), by Marie Curie IRG Grant 231077, by NSF grants 0514771, 0728645, and 0732334, and by a Symantec Research Fellowship.

Author information

Authors and Affiliations

Princeton University and Center for Computational Intractability, Princeton, NJ, USA
Alexandr Andoni
Massachusetts Institute of Technology, Cambridge, MA, USA
Piotr Indyk, Krzysztof Onak & Ronitt Rubinfeld
Tel-Aviv University, Tel Aviv, Israel
Ronitt Rubinfeld

Editor information

Editors and Affiliations

Faculty of Mathematics and Computer Science, Weizmann Institute of Science, 76100, Rehovot, Israel
Oded Goldreich

About this chapter

Cite this chapter

Andoni, A., Indyk, P., Onak, K., Rubinfeld, R. (2010). Sublinear Algorithms in the External Memory Model. In: Goldreich, O. (ed.) Property Testing. Lecture Notes in Computer Science, vol 6390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16367-8_15

DOI: https://doi.org/10.1007/978-3-642-16367-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16366-1
Online ISBN: 978-3-642-16367-8
eBook Packages: Computer Science, Computer Science (R0)
