Abstract
We consider the prefix sums problem: given a (static) sequence of positive integers \(\vec{x} = (x_1, \ldots, x_n)\), such that \(\sum_{i=1}^n x_i = m\), we wish to support the operation \({\sf sum}(\vec{x},j)\), which returns \(\sum_{i=1}^{j} x_i\). Our interest is in minimising the space required for storing \(\vec{x}\), where ‘minimal space’ is defined according to some compressibility criteria, while supporting sum as rapidly as possible.
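As a point of reference for the definition above, here is a minimal sketch of an uncompressed baseline that answers \({\sf sum}\) in constant time by storing every prefix sum explicitly. The class name `PrefixSums` and its layout are illustrative assumptions, not structures from the paper; the compressed solutions aim to use far less space than this.

```python
from itertools import accumulate


class PrefixSums:
    """Uncompressed baseline (illustrative): stores all n prefix sums."""

    def __init__(self, xs):
        if any(x <= 0 for x in xs):
            raise ValueError("elements must be positive integers")
        # prefix[j] = x_1 + ... + x_j, with prefix[0] = 0 for the empty sum
        self.prefix = [0] + list(accumulate(xs))

    def sum(self, j):
        """Return x_1 + ... + x_j, using 1-based j as in the text."""
        return self.prefix[j]


ps = PrefixSums([3, 1, 4, 1, 5])
print(ps.sum(3))  # 3 + 1 + 4
```

This baseline spends a full machine word per element, which is the cost that the compressibility criteria are meant to reduce.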
There are two main compressibility criteria: (a) the succinct space bound, \(B(m, n) = \lceil \log_2 {{m-1}\choose{n-1}} \rceil\) bits, applies to any sequence \(\vec{x}\) whose elements add up to \(m\); (b) data-aware measures, which depend on the values in \(\vec{x}\), and can be lower than the succinct bound for some sequences. Appropriate data-aware measures have been studied extensively in the information retrieval (IR) community [17].
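The difference between the two criteria can be seen on small inputs. The sketch below computes \(B(m, n)\) exactly and compares it with one example of a data-aware cost. For illustration only, the data-aware side is represented by the total length of Elias gamma codes; that choice is an assumption of this sketch, not a measure named in the text above, and the function names are hypothetical.

```python
from math import comb


def succinct_bound(m, n):
    """B(m, n) = ceil(log2 C(m-1, n-1)), computed exactly on integers."""
    if not 1 <= n <= m:
        raise ValueError("need 1 <= n <= m for n positive integers summing to m")
    c = comb(m - 1, n - 1)
    # For an integer c >= 1, ceil(log2 c) equals (c - 1).bit_length().
    return (c - 1).bit_length()


def gamma_bits(xs):
    """Total Elias gamma code length: 2*floor(log2 x) + 1 bits per element.

    Used here only as an example of a cost that depends on the values.
    """
    return sum(2 * x.bit_length() - 1 for x in xs)


skewed = [1, 1, 1, 1, 60]       # m = 64, n = 5
even = [13, 13, 13, 13, 12]     # same m and n, so the same succinct bound

print(succinct_bound(64, 5), gamma_bits(skewed), gamma_bits(even))
```

Both sequences share the same \(B(m, n)\), since it depends only on \(m\) and \(n\). The value-dependent cost falls below it for the skewed sequence and rises above it for the even one, which is the behaviour described in (b).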
We demonstrate a close connection between the succinct bound and the data-aware measure that is the best in practice for an important IR application. We give theoretical solutions that use space close to other data-aware compressibility measures (often within \(o(n)\) bits) and support \({\sf sum}\) in doubly-logarithmic (or better) time, together with experimental evaluations of practical variants thereof.
A bit-vector is a data structure that supports ‘rank/select’ on a bit-string, and is fundamental to succinct and compressed data structures. We describe a new bit-vector that is robust and efficient.
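To make the 'rank/select' interface concrete, the following naive sketch shows the two operations and one standard way a prefix-sum query reduces to select: mark position \(\sum_{i=1}^{j} x_i\) with a 1 in a bit-string of length \(m\), so that \({\sf sum}(\vec{x}, j)\) is the position of the \(j\)-th 1. This is not the new bit-vector described in the paper, whose design is not given in the abstract; the names `NaiveBitVector` and `encode` are hypothetical, and the representation is neither succinct nor compressed.

```python
from bisect import bisect_right
from itertools import accumulate


class NaiveBitVector:
    """Illustrative rank/select over a bit-string, using 1-based positions."""

    def __init__(self, bits):
        self.bits = list(bits)
        # Sorted 1-based positions of the set bits.
        self.ones = [i + 1 for i, b in enumerate(self.bits) if b]

    def rank1(self, i):
        """Number of 1s among the first i bits."""
        return bisect_right(self.ones, i)

    def select1(self, j):
        """Position of the j-th 1, for 1 <= j <= number of 1s."""
        return self.ones[j - 1]


def encode(xs):
    """Set bit number (x_1 + ... + x_j) for every j; total length is m."""
    bits = [0] * sum(xs)
    for p in accumulate(xs):
        bits[p - 1] = 1
    return NaiveBitVector(bits)


bv = encode([3, 1, 4, 1, 5])
print(bv.select1(3))  # equals sum(x, 3) = 3 + 1 + 4
print(bv.rank1(8))    # number of prefix sums that are <= 8
```

In this encoding, rank answers the inverse question (how many prefix sums do not exceed a given value), which is why the two operations are usually provided together.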
Delpratt is supported by PPARC e-Science Studentship PPA/S/E/2003/03749.
References
Clark, D., Munro, J.I.: Efficient Suffix Trees on Secondary Storage. In: Proc. 7th ACM-SIAM SODA, pp. 383–391. ACM Press, New York (1996)
Delpratt, O., Rahman, N., Raman, R.: Engineering the LOUDS Succinct Tree Representation. In: Àlvarez, C., Serna, M.J. (eds.) WEA 2006. LNCS, vol. 4007, pp. 134–145. Springer, Heidelberg (2006)
Elias, P.: Efficient Storage and Retrieval by Content and Address of Static Files. J. ACM 21, 246–260 (1974)
Fredman, M.L., Willard, D.E.: Trans-Dichotomous Algorithms for Minimum Spanning Trees and Shortest Paths. J. Comput. Sys. Sci. 48, 533–551 (1994)
Geary, R.F., Rahman, N., Raman, R., Raman, V.: A Simple Optimal Representation for Balanced Parentheses. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 159–172. Springer, Heidelberg (2004)
Grossi, R., Sadakane, K.: Squeezing Succinct Data Structures into Entropy Bounds. In: Proc. 17th ACM-SIAM SODA, pp. 1230–1239. ACM Press, New York (2006)
Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. In: Proc. ACM STOC (Prel. vers.), pp. 397–406. ACM Press, New York (2002)
Grossi, R., Vitter, J.S.: Private communication (2004)
Gupta, A., Hon, W.-K., Shah, R., Vitter, J.S.: Compressed Data Structures: Dictionaries and Data-Aware Measures. In: Proc. DCC ’06, IEEE, pp. 213–222 (2006)
Gupta, A., Hon, W.-K., Shah, R., Vitter, J.S.: Compressed Dictionaries: Space Measures, Data Sets, and Experiments. In: Àlvarez, C., Serna, M.J. (eds.) WEA 2006. LNCS, vol. 4007, pp. 158–169. Springer, Heidelberg (2006)
Hagerup, T.: Sorting and Searching on the Word RAM. In: Meinel, C., Morvan, M. (eds.) STACS 1998. LNCS, vol. 1373, pp. 366–398. Springer, Heidelberg (1998)
Hagerup, T., Tholey, T.: Efficient Minimal Perfect Hashing in Nearly Minimal Space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001)
Kim, D.-K., Na, J.C., Kim, J.E., Park, K.: Efficient Implementation of Rank and Select Functions for Succinct Representation. In: Nikoletseas, S.E. (ed.) WEA 2005. LNCS, vol. 3503, pp. 315–327. Springer, Heidelberg (2005)
Raman, R., Raman, V., Rao, S.S.: Succinct Indexable Dictionaries, with Applications to Representing k-Ary Trees and Multisets. In: Proc. 13th ACM-SIAM SODA, pp. 233–242. ACM Press, New York (2002)
UW XML Repository, http://www.cs.washington.edu/research/xmldatasets/
VOTable Documentation, http://www.us-vo.org/VOTable/
Witten, I., Moffat, A., Bell, T.: Managing Gigabytes, 2nd edn. Morgan Kaufmann, San Francisco (1999)
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Delpratt, O., Rahman, N., Raman, R. (2007). Compressed Prefix Sums. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds) SOFSEM 2007: Theory and Practice of Computer Science. SOFSEM 2007. Lecture Notes in Computer Science, vol 4362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69507-3_19
DOI: https://doi.org/10.1007/978-3-540-69507-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69506-6
Online ISBN: 978-3-540-69507-3
eBook Packages: Computer Science, Computer Science (R0)