Abstract
In this work we study sublinear space algorithms for detecting periodicity over data streams. A sequence of length n is said to be periodic if it consists of repetitions of a block of length p for some \(p \leq \frac{n}{2}\). In the first part of this paper, we give a 1-pass randomized streaming algorithm that uses \(O(\log^2 n)\) space and reports the shortest period if the given stream is periodic. At the heart of this result is a 1-pass \(O(\log n \log m)\) space streaming pattern matching algorithm. This algorithm uses similar ideas to Porat and Porat’s algorithm in FOCS 2009, but it does not need an offline pre-processing stage and is simpler.
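To make the definition of periodicity concrete, here is a minimal offline sketch. It is not the streaming algorithm of the paper: it stores the whole sequence and uses the classical failure function of Knuth, Morris and Pratt, so it takes linear rather than polylogarithmic space. It assumes that "period p" means s[i] = s[i+p] for every valid i, so the last repetition of the block may be partial. The name `shortest_period` is hypothetical.

```python
def shortest_period(s):
    """Return the smallest p <= len(s) // 2 such that s[i] == s[i + p]
    for all valid i, or None if no such p exists.

    Offline baseline (linear space), NOT a streaming algorithm.
    Assumption: the last repetition of the block may be partial.
    """
    n = len(s)
    if n == 0:
        return None

    # fail[i] = length of the longest proper border of s[:i + 1],
    # i.e. the longest proper prefix that is also a suffix.
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k

    # The shortest period equals n minus the longest border of s.
    p = n - fail[-1]
    return p if p <= n // 2 else None
```

For example, `shortest_period("abcabcab")` finds the border `"abcab"` of length 5 and reports period 3, while `shortest_period("abcd")` has no border and reports no period.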
In the second part, we study distance to p-periodicity under the Hamming metric, where we estimate the minimum number of character substitutions needed to make a given sequence p-periodic. In streaming terminology, this problem can be described as computing the cascaded aggregate \(L_1\circ F_1^{res(1)}\) over a matrix \(A_{p \times \lfloor\frac{n}{p}\rfloor}\) given in column ordering. For this problem, we present a randomized streaming algorithm with approximation factor \(2 + \epsilon\) that takes \(\tilde{O}(\frac{1}{\epsilon^2})\) space. We also show a randomized streaming algorithm with approximation factor \(1 + \epsilon\) that uses \(\tilde{O}(\frac{1}{\epsilon^{5.5}}p^{1/2})\) space.
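The quantity being approximated can be computed exactly when the whole sequence is available, which helps in reading the cascaded aggregate \(L_1\circ F_1^{res(1)}\). The sketch below is an offline baseline, not either of the streaming algorithms in the paper. It assumes that positions with the same index modulo p form one row of the matrix, that \(F_1^{res(1)}\) of a row is its length minus the count of its most frequent symbol, and that \(L_1\) sums these residuals over the rows. The name `distance_to_p_periodicity` is hypothetical.

```python
from collections import Counter


def distance_to_p_periodicity(s, p):
    """Exact Hamming distance from s to the nearest p-periodic sequence.

    Offline baseline (stores all of s), NOT a streaming algorithm.
    Assumption: row r of the matrix holds the symbols s[r], s[r + p], ...
    """
    if p <= 0:
        raise ValueError("p must be a positive integer")

    total = 0
    for r in range(p):
        row = s[r::p]  # all positions congruent to r modulo p
        if not row:
            continue
        counts = Counter(row)
        # Keep the most frequent symbol and substitute every other
        # position in this row: the residual F_1 after removing the top item.
        total += len(row) - max(counts.values())
    return total
```

For instance, `distance_to_p_periodicity("abcabdabc", 3)` is 1, because only the single `d` has to be substituted to make the sequence 3-periodic.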
References
Alon, N., Matias, Y., Szegedy, M.: Space complexity of approximating the frequency moments. In: STOC 1996 (1996)
Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. In: SODA 2000 (2000)
Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Sampling algorithms: lower bounds and applications. In: CCC 2002 (2002)
Berinde, R., Cormode, G., Indyk, P., Strauss, M.: Space-optimal heavy hitters with strong error bounds. In: PODS 2009 (2009)
Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: SODA 2006 (2006)
Bose, P., Kranakis, E., Morin, P., Tang, Y.: Bounds for frequency estimation of packet streams. In: Proceedings of the 10th International Colloquium on Structural Information and Communication Complexity (2003)
Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. Theor. Comput. Sci. 312(1), 3–15 (2004)
Cole, R., Hariharan, R.: Approximate string matching: a simpler faster algorithm. In: SODA 1998 (1998)
Cormode, G., Muthukrishnan, S.: Space efficient mining of multigraph streams. In: PODS 2005, pp. 271–282 (2005)
Czumaj, A., Gąsieniec, L.: On the complexity of determining the period of a string. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 412–422. Springer, Heidelberg (2000)
Elfeky, M.G., Aref, W.G., Elmagarmid, A.K.: STAGGER: periodicity mining of data streams using expanding sliding windows. In: ICDM 2006 (2006)
Ergun, F., Muthukrishnan, S., Sahinalp, C.: Sublinear methods for detecting periodic trends in data streams. In: Farach-Colton, M. (ed.) LATIN 2004. LNCS, vol. 2976, pp. 16–28. Springer, Heidelberg (2004)
Ganguly, S., Kesh, D., Saha, C.: Practical algorithms for tracking database join sizes. In: Sarukkai, S., Sen, S. (eds.) FSTTCS 2005. LNCS, vol. 3821, pp. 297–309. Springer, Heidelberg (2005)
Indyk, P.: Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM 53(3), 307–323 (2006)
Indyk, P., Woodruff, D.: Optimal approximations of the frequency moments of data streams. In: STOC 2005 (2005)
Jayram, T.S., Woodruff, D.: The data stream space complexity of cascaded norms. In: FOCS 2009 (2009)
Kane, D.M., Nelson, J., Woodruff, D.: An optimal algorithm for the distinct elements problem. In: PODS 2010 (2010)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern matching algorithms. IBM Journal of Res. and Dev. 31(2), 249–260 (1987)
Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comp. 6, 323–350 (1977)
Lachish, O., Newman, I.: Testing periodicity. In: Chekuri, C., Jansen, K., Rolim, J.D.P., Trevisan, L. (eds.) APPROX 2005 and RANDOM 2005. LNCS, vol. 3624, pp. 366–377. Springer, Heidelberg (2005)
Lipsky, O., Porat, E.: Improved sketching of hamming distance with error correcting. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 173–182. Springer, Heidelberg (2007)
Misra, J., Gries, D.: Finding repeated elements. Technical Report, Cornell University (1982)
Monemizadeh, M., Woodruff, D.: 1-Pass relative-error Lp-sampling with applications. In: SODA 2010 (2010)
Muthukrishnan, S.: Data stream algorithms. In: The Barbados Workshop on Computational Complexity (2009)
Porat, B., Porat, E.: Exact and approximate pattern matching in the streaming model. In: FOCS 2009 (2009)
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Ergun, F., Jowhari, H., Sağlam, M. (2010). Periodicity in Streams. In: Serna, M., Shaltiel, R., Jansen, K., Rolim, J. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM APPROX 2010. Lecture Notes in Computer Science, vol 6302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15369-3_41
DOI: https://doi.org/10.1007/978-3-642-15369-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15368-6
Online ISBN: 978-3-642-15369-3
eBook Packages: Computer Science (R0)