Finding Frequent Elements in Compressed 2D Arrays and Strings

Gagie, Travis; He, Meng; Munro, J. Ian; Nicholson, Patrick K.

doi:10.1007/978-3-642-24583-1_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7024)

Abstract

We show how to store a compressed two-dimensional array such that, when we are asked for the elements with high relative frequency in a range, we can quickly return a short list of candidates that includes them. More specifically, given an \(m \times n\) array \(A\) and a fraction \(\alpha > 0\), we can store \(A\) in \(\mathcal{O}(mn(H + 1)\log^2(1/\alpha))\) bits, where \(H\) is the entropy of the elements' distribution in \(A\), such that later, given a rectangular range in \(A\) and a fraction \(\beta \ge \alpha\), in \(\mathcal{O}(1/\beta)\) time we can return a list of \(\mathcal{O}(1/\beta)\) distinct array elements that includes all the elements that have relative frequency at least \(\beta\) in that range. We do not verify that the elements in the list have relative frequency at least \(\beta\), so the list may contain false positives. In the case when \(m = 1\), i.e., \(A\) is a string, we improve this space bound by a factor of \(\log(1/\alpha)\) and explore a space-time trade-off for verifying the frequency of the elements in the list. This leads to an \(\mathcal{O}(n \min(\log(1/\alpha), H + 1)\log n)\)-bit data structure for strings that, in \(\mathcal{O}(1/\beta)\) time, can return the \(\mathcal{O}(1/\beta)\) elements that have relative frequency at least \(\beta\) in a given range, without false positives, for \(\beta \ge \alpha\).
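The abstract does not describe how the structure is built, so the following is only a naive sketch of the query semantics under stated assumptions, not the authors' data structure. It scans the queried range directly (time proportional to the range size, not \(\mathcal{O}(1/\beta)\)) with the classic Misra-Gries frequent-elements summary, which keeps \(\lfloor 1/\beta \rfloor\) counters and therefore returns at most that many candidates, possibly with false positives. Keeping \(\lfloor 1/\beta \rfloor\) counters suffices because each decrement step cancels \(\lfloor 1/\beta \rfloor + 1 > 1/\beta\) occurrences, so an element occurring at least \(\beta\ell\) times in a range of length \(\ell\) cannot be cancelled out entirely. For the string case it then removes false positives by counting each candidate with two binary searches over per-symbol position lists, one simple point on the space-time trade-off. The names `candidates`, `rectangle` and `StringVerifier` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from math import floor
from typing import Dict, Hashable, Iterable, Iterator, List, Sequence


def candidates(elements: Iterable[Hashable], beta: float) -> List[Hashable]:
    """Misra-Gries summary over one pass of `elements`.

    Returns at most floor(1/beta) distinct elements that include every element
    whose relative frequency is at least beta; false positives are possible.
    """
    k = floor(1 / beta)  # k + 1 > 1/beta, so no beta-frequent element is lost
    counters: Dict[Hashable, int] = {}
    for x in elements:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # Cancel x together with one occurrence of each tracked element
            # (k + 1 occurrences in total) and drop counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return list(counters)


def rectangle(A: Sequence[Sequence[Hashable]],
              r1: int, r2: int, c1: int, c2: int) -> Iterator[Hashable]:
    """Yield A[r1..r2][c1..c2] (inclusive bounds) in row-major order."""
    for i in range(r1, r2 + 1):
        yield from A[i][c1:c2 + 1]


class StringVerifier:
    """Hypothetical verifier for the string case (m = 1).

    Stores, for each symbol, the sorted list of its positions, so counting a
    symbol in S[i..j] takes two binary searches (O(log n) per candidate).
    """

    def __init__(self, S: Sequence[Hashable]):
        self.S = S
        self.positions: Dict[Hashable, List[int]] = defaultdict(list)
        for idx, x in enumerate(S):
            self.positions[x].append(idx)  # indices arrive in increasing order

    def count(self, x: Hashable, i: int, j: int) -> int:
        """Number of occurrences of x in S[i..j] (inclusive bounds)."""
        pos = self.positions.get(x, [])
        return bisect_right(pos, j) - bisect_left(pos, i)

    def frequent(self, i: int, j: int, beta: float) -> List[Hashable]:
        """Elements of S[i..j] with relative frequency >= beta, no false positives."""
        length = j - i + 1
        cands = candidates(self.S[i:j + 1], beta)
        return [x for x in cands if self.count(x, i, j) >= beta * length]
```

For example, on the string `abracadabra` with \(\beta = 0.4\), `candidates` returns `['a']` and `StringVerifier("abracadabra").frequent(0, 10, 0.4)` keeps it, since `a` occurs 5 times in 11 positions. On `abcd` with \(\beta = 0.5\), the summary reports `d` as a false positive, which the verification step discards.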

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Aalto University, Finland
Travis Gagie
Cheriton School of Computer Science, University of Waterloo, Canada
Meng He, J. Ian Munro & Patrick K. Nicholson

Editor information

Editors and Affiliations

Università di Pisa, Italy
Roberto Grossi
Consiglio Nazionale delle Ricerche, Area della Ricerca di Pisa, Istituto di Scienza e Tecnologia dell’Informazione “Alessandro Faedo”, Via Giuseppe Moruzzi 1, 56124, Pisa, Italy
Fabrizio Sebastiani & Fabrizio Silvestri

About this paper

Cite this paper

Gagie, T., He, M., Munro, J.I., Nicholson, P.K. (2011). Finding Frequent Elements in Compressed 2D Arrays and Strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_29

DOI: https://doi.org/10.1007/978-3-642-24583-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24582-4
Online ISBN: 978-3-642-24583-1
eBook Packages: Computer Science (R0)
