Abstract
For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of \(\sigma\) characters, we can store a nearly optimal alphabetic prefix-free code in \(o(\sigma)\) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords’ lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in \(\mathcal{O}(\sigma \log L + 2^{\epsilon L})\) bits, where \(L\) is the maximum codeword length and \(\epsilon\) is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of \(\ell\) bits in time \(\mathcal{O}(\ell)\) using \(\mathcal{O}(\sigma \log L)\) bits of space.
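To make the three properties named in the abstract concrete, here is a minimal Python sketch that checks them by brute force on small codes given as bit strings. The function names and example codes are illustrative assumptions, not taken from the paper, and the sketch does not attempt the compact representations or constant-time operations the paper describes.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of (or equal to) another."""
    ordered = sorted(codewords)
    # In lexicographic order, a codeword that is a prefix of some other
    # codeword is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def is_alphabetic(code, alphabet):
    """Return True if codewords increase lexicographically with the characters.

    `code` maps each character to its codeword; `alphabet` lists the
    characters in their fixed order.
    """
    words = [code[c] for c in alphabet]
    return all(a < b for a, b in zip(words, words[1:]))


def lengths_nondecreasing_by_reverse(codewords):
    """Return True if codeword lengths are non-decreasing once the codewords
    are arranged so that their reverses are in lexicographic order."""
    by_reverse = sorted(codewords, key=lambda w: w[::-1])
    lengths = [len(w) for w in by_reverse]
    return all(x <= y for x, y in zip(lengths, lengths[1:]))


# Hypothetical example codes (not from the paper).
good = {"a": "0", "b": "10", "c": "110", "d": "111"}
bad = ["1", "01", "000", "001"]

print(is_prefix_free(good.values()))
print(is_alphabetic(good, "abcd"))
print(lengths_nondecreasing_by_reverse(good.values()))
print(lengths_nondecreasing_by_reverse(bad))
```

In this sketch the code `good` passes all three checks, while `bad` is prefix-free but fails the reversed-order length condition: sorting its codewords by their reverses puts the length-3 codeword `000` before the length-1 codeword `1`.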
Funded in part by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690941 (project BIRDS). The first author was supported by: MINECO (PGE and FEDER) grants TIN2013-47090-C3-3-P and TIN2015-69951-R; MINECO and CDTI grant ITC-20151305; ICT COST Action IC1302; and Xunta de Galicia (co-funded with FEDER) grant GRC2013/053. The second author was supported by Academy of Finland grants 268324 and 250345 (CoECGR). The fourth author was supported by Millennium Nucleus Information and Coordination in Networks ICM/FIC P10-024F, Chile.
Notes
1. Since the code tree has height L and \(\sigma \) leaves, it follows that \(L < \sigma \).
2. This descent is conceptual; we do not have a concrete node v at each level, but we do know \(r_v\).
References
Claude, F., Navarro, G., Ordóñez, A.: The wavelet matrix: an efficient wavelet tree for large alphabets. Inf. Syst. 47, 15–32 (2015)
Evans, W., Kirkpatrick, D.G.: Restructuring ordered binary trees. J. Algorithms 50, 168–193 (2004)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms 3(2), 20 (2007)
Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011)
Gagie, T., Navarro, G., Nekrich, Y., Ordóñez, A.: Efficient and compact representations of prefix codes. IEEE Trans. Inf. Theory 61(9), 4999–5011 (2015)
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings SODA, pp. 841–850 (2003)
Itai, A.: Optimal alphabetic trees. SIAM J. Comp. 5, 9–18 (1976)
Kraft, L.G.: A device for quantizing, grouping, and coding amplitude modulated pulses. M.Sc. thesis, EE Dept., MIT (1949)
Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comp. 31(3), 762–776 (2001)
Navarro, G.: Wavelet trees for all. J. Discr. Algorithms 25, 2–20 (2014)
Navarro, G., Providel, E.: Fast, small, simple rank/select on bitmaps. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 295–306. Springer, Heidelberg (2012)
Pǎtraşcu, M.: Succincter. In: Proceedings FOCS, pp. 305–313 (2008)
Schwartz, E.S., Kallick, B.: Generating a canonical prefix encoding. Commun. ACM 7, 166–169 (1964)
Wessner, R.L.: Optimal alphabetic search trees with restricted maximal height. Inf. Proc. Lett. 4, 90–94 (1976)
Acknowledgements
This research was carried out in part at the University of A Coruña, Spain, while the second author was visiting and the fifth author was a PhD student there. It started at a StringMasters workshop at the Research Center on Information and Communication Technologies (CITIC) of the university. The workshop was partly funded by EU RISE project BIRDS (Bioinformatics and Information Retrieval Data Structures). The authors thank Nieves Brisaboa and Susana Ladra.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Fariña, A., Gagie, T., Manzini, G., Navarro, G., Ordóñez, A. (2016). Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_5
DOI: https://doi.org/10.1007/978-3-319-46049-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46048-2
Online ISBN: 978-3-319-46049-9
eBook Packages: Computer Science, Computer Science (R0)