Abstract
Most modern lossless data compression techniques are based on dictionaries. If a string in the data being compressed matches a portion seen previously, the string is added to the dictionary and a reference to it is emitted each time it appears. A possible generalization of this scheme is to consider not only strings of consecutive symbols, but also more general patterns with gaps between their symbols. The main problems with this approach are the complexity of pattern discovery algorithms and the complexity of selecting a good subset of patterns. In this paper we address the second of these problems. We show that this problem is NP-complete and present preliminary results on heuristics that point toward its solution.
Categories and Subject Descriptors: E.4 [Coding and Information Theory]–data compaction and compression; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Problems; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search–heuristic methods.
General Terms: Algorithms, Theory
Additional Key Words and Phrases: Genetic algorithms, optimization, NP-hardness
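To make the idea of gapped patterns in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes a hypothetical representation in which a pattern maps relative offsets to required symbols (missing offsets are gaps), and it uses a simple greedy coverage heuristic in place of the genetic algorithms the key words refer to. The names `Pattern`, `occurrences`, `covered_positions`, and `greedy_select` are illustrative only.

```python
from typing import Dict, List, Set

# Assumption of this sketch: a gapped pattern is a non-empty mapping from
# relative offset to the symbol required there. Absent offsets are gaps,
# where any symbol may appear.
Pattern = Dict[int, str]


def occurrences(text: str, pattern: Pattern) -> List[int]:
    """Return every start index at which the pattern matches the text."""
    span = max(pattern) + 1  # window length up to the last fixed symbol
    return [
        start
        for start in range(len(text) - span + 1)
        if all(text[start + off] == sym for off, sym in pattern.items())
    ]


def covered_positions(text: str, pattern: Pattern) -> Set[int]:
    """Text positions accounted for by the pattern's fixed symbols.

    Overlapping occurrences are merged by the set; a real coder would
    have to resolve such conflicts explicitly.
    """
    return {
        start + off
        for start in occurrences(text, pattern)
        for off in pattern
    }


def greedy_select(text: str, candidates: List[Pattern], limit: int) -> List[Pattern]:
    """Pick up to `limit` patterns, each time taking the candidate that
    covers the most positions not yet covered. A heuristic only: it gives
    no optimality guarantee for the subset selection problem."""
    chosen: List[Pattern] = []
    covered: Set[int] = set()
    remaining = list(candidates)
    while remaining and len(chosen) < limit:
        best = max(remaining, key=lambda p: len(covered_positions(text, p) - covered))
        gain = covered_positions(text, best) - covered
        if not gain:
            break  # no candidate adds new coverage
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen
```

For example, under this representation the pattern `{0: "a", 2: "c"}` stands for "a, any symbol, c" and matches the text `"axcayc"` at two places, even though the substrings `"axc"` and `"ayc"` differ. Coverage is only a rough proxy for compression gain, since it ignores the cost of storing the patterns and the symbols that fall in the gaps.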
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuri, Á., Galaviz, J. (2004). Pattern-Based Data Compression. In: Monroy, R., Arroyo-Figueroa, G., Sucar, L.E., Sossa, H. (eds) MICAI 2004: Advances in Artificial Intelligence. MICAI 2004. Lecture Notes in Computer Science, vol 2972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24694-7_1
DOI: https://doi.org/10.1007/978-3-540-24694-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21459-5
Online ISBN: 978-3-540-24694-7
eBook Packages: Springer Book Archive