Abstract
Relative Lempel-Ziv is a popular algorithm designed to compress sets of strings relative to a given reference string, which acts as a kind of dictionary. It can still be applied even when there is no obvious natural reference string for a dataset, by sampling substrings from the dataset and concatenating them to obtain an artificial reference. This works well in practice, but a theoretical analysis has been lacking. In this paper we provide such an analysis and verify it experimentally.
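The sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_reference`, `rlz_parse`, `rlz_decode`), the uniform sampling with a fixed seed, and the phrase format are all assumptions made for illustration, and the matching uses a naive `str.find` scan rather than the suffix-based indexes a practical tool would likely use.

```python
import random


def build_reference(text, sample_len, num_samples, seed=0):
    """Build an artificial reference by concatenating sampled substrings.

    Assumption: start positions are drawn uniformly at random; the
    parameter names are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    max_start = max(len(text) - sample_len, 0)
    starts = sorted(rng.randint(0, max_start) for _ in range(num_samples))
    return "".join(text[s:s + sample_len] for s in starts)


def rlz_parse(text, reference):
    """Greedily parse text into phrases relative to reference.

    Each phrase is either ("copy", pos, length), the longest prefix of the
    remaining text that occurs in the reference, or ("lit", char) when the
    next character does not occur in the reference at all.
    """
    phrases = []
    i = 0
    while i < len(text):
        pos, length = -1, 0
        # Extend the match one character at a time (naive, for clarity).
        while i + length < len(text):
            p = reference.find(text[i:i + length + 1])
            if p < 0:
                break
            pos, length = p, length + 1
        if length == 0:
            phrases.append(("lit", text[i]))
            i += 1
        else:
            phrases.append(("copy", pos, length))
            i += length
    return phrases


def rlz_decode(phrases, reference):
    """Rebuild the original text from the phrases and the reference."""
    out = []
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, pos, length = phrase
            out.append(reference[pos:pos + length])
    return "".join(out)
```

For example, one could call `build_reference(data, 8, 4)` on a repetitive string `data`, parse with `rlz_parse(data, ref)`, and check that `rlz_decode` returns `data` unchanged. The number of phrases produced gives a rough sense of how well the sampled reference covers the dataset.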
Supported by the Academy of Finland through grants 268324, 284598, and 294143. Part of this work was done while the first author visited the University of A Coruña, Spain. The authors thank Paweł Gawrychowski for his suggestions.
Notes
1. Available at http://pizzachili.dcc.uchile.cl/.
2. For more details, see http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm.
3. Though RLZ does have attractive properties other than ease of compression; for example, support for random access.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Gagie, T., Puglisi, S.J., Valenzuela, D. (2016). Analyzing Relative Lempel-Ziv Reference Construction. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_16
DOI: https://doi.org/10.1007/978-3-319-46049-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46048-2
Online ISBN: 978-3-319-46049-9
eBook Packages: Computer Science, Computer Science (R0)