Abstract
Traditional wisdom has it that the better a theory compresses the learning data concerning some phenomenon under investigation, the better we learn and generalize, and the better the theory predicts unknown data. This belief is vindicated in practice but apparently has not been rigorously proved in a general setting. Making these ideas rigorous involves the length of the shortest effective description of an individual object: its Kolmogorov complexity. In a previous paper we have shown that optimal compression is almost always a best strategy in hypothesis identification (an ideal form of the minimum description length (MDL) principle). Although the single best hypothesis does not necessarily give the best prediction, we demonstrate that compression is nonetheless almost always the best strategy in prediction methods in the style of R. Solomonoff.
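To fix notation for readers new to this style of prediction (a standard formulation following Solomonoff [22, 23] and Li and Vitányi [14], not a restatement of this paper's results): Solomonoff's universal prior gives a string x the total weight of all programs that make a fixed universal monotone machine U output a sequence beginning with x, and prediction proceeds by conditioning,

    M(x) = \sum_{p\,:\,U(p)=x\ast} 2^{-|p|}, \qquad M(y \mid x) = \frac{M(xy)}{M(x)},

and Solomonoff's convergence theorem [23] bounds the total expected squared prediction error against any computable source \mu by a constant depending only on \mu:

    \sum_{n \ge 1} \mathbf{E}_{\mu}\bigl( M(0 \mid x_{1:n-1}) - \mu(0 \mid x_{1:n-1}) \bigr)^2 \;\le\; \frac{\ln 2}{2}\, K(\mu).

Since M, like Kolmogorov complexity itself, is uncomputable, any executable illustration must substitute a real compressor. The following toy sketch is ours, not the authors'; it uses Python's zlib as a crude computable stand-in for description length and prefers the candidate continuation that adds the least compressed length to the observed data. All identifiers in it are illustrative.

    import zlib

    def compressed_length(s: bytes) -> int:
        # Compressed size in bytes: a crude, computable upper-bound proxy
        # for the (uncomputable) Kolmogorov complexity of s.
        return len(zlib.compress(s, 9))

    def best_continuation(history: bytes, candidates: list) -> bytes:
        # Prediction by compression: prefer the continuation under which
        # the extended sequence compresses best, i.e. the one that adds
        # the least description length beyond the regularities already
        # present in the history.
        return min(candidates, key=lambda c: compressed_length(history + c))

    history = b"01" * 200                 # a highly regular, periodic source
    candidates = [b"01" * 25,             # continues the period
                  b"10" * 25,             # phase-shifted continuation
                  b"11" * 25]             # breaks the pattern
    for c in candidates:
        print(c[:6], compressed_length(history + c))
    print("preferred continuation:", best_continuation(history, candidates)[:6])

Real compressors work at byte granularity, so single-symbol comparisons often tie; comparing longer candidate continuations, as above, makes the compressed-length differences visible. The sketch captures the slogan "compress better, predict better" only informally, whereas the paper's results concern the ideal, uncomputable setting.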
Paul Vitányi is also affiliated with the University of Amsterdam. He was supported by NSERC through International Scientific Exchange Award ISE0125663, by the European Union through NeuroCOLT ESPRIT Working Group Nr. 8556, and by NWO through NFI Project ALADDIN under contract number NF 62–376. Ming Li was supported in part by NSERC operating grant OGP-046506, ITRC, a CGAT grant, and the Steacie Fellowship; he was on sabbatical leave from the Department of Computer Science, University of Waterloo (Email: mli@math.uwaterloo.ca).
References
J.L. Doob, Stochastic Processes, Wiley, 1953.
P. Gács, On the symmetry of algorithmic information, Soviet Math. Dokl., 15 (1974) 1477–1480. Correction: ibid., 15 (1974) 1480.
P. Gács, On the relation between descriptional complexity and algorithmic probability, Theoret. Comput. Sci., 22(1983), 71–93.
A.N. Kolmogorov, Three approaches to the quantitative definition of information, Problems Inform. Transmission 1:1 (1965) 1–7.
L.A. Levin, On the notion of a random sequence, Soviet Math. Dokl., 14(1973), 1413–1416.
M. Li and P.M.B. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag, New York, 1993.
M. Li and P.M.B. Vitányi, Computational machine learning in theory and praxis, in: J. van Leeuwen (Ed.), Computer Science Today, Lecture Notes in Computer Science, Vol. 1000, Springer-Verlag, Heidelberg, 1995, 518–535.
P.M.B. Vitányi and M. Li, Ideal MDL and its relation to Bayesianism, in: Proc. ISIS: Information, Statistics and Induction in Science, World Scientific, Singapore, 1996, 282–291.
P. Martin-Löf, The definition of random sequences, Inform. Contr., 9(1966), 602–619.
J.J. Rissanen, Modeling by the shortest data description, Automatica-J.IFAC 14 (1978) 465–471.
J.J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1989.
J.J. Rissanen, Fisher information and stochastic complexity, IEEE Trans. Inform. Theory, IT-42:1(1996), 40–47.
J. Segen, Pattern-Directed Signal Analysis, PhD Thesis, Carnegie-Mellon University, Pittsburgh, 1980.
R.J. Solomonoff, A formal theory of inductive inference, Part 1 and Part 2, Inform. Contr., 7(1964), 1–22, 224–254.
R.J. Solomonoff, Complexity-based induction systems: comparisons and convergence theorems, IEEE Trans. Inform. Theory IT-24 (1978) 422–432.
A.M. Turing, On computable numbers, with an application to the Entscheidungsproblem, Proc. London Math. Soc., Ser. 2, 42(1936), 230–265; Correction, ibid., 43(1937), 544–546.
R. von Mises, Grundlagen der Wahrscheinlichkeitsrechnung, Mathemat. Zeitsch., 5(1919), 52–99.
V. Vovk, Minimum description length estimators under the universal coding scheme, in: P. Vitányi (Ed.), Computational Learning Theory, Proc. 2nd European Conf. (EuroCOLT '95), Lecture Notes in Artificial Intelligence, Vol. 904, Springer-Verlag, Heidelberg, 1995, pp. 237–251; Learning about the parameter of the Bernoulli model, J. Comput. System Sci., to appear.
C.S. Wallace and D.M. Boulton, An information measure for classification, Computer Journal 11 (1968) 185–195.
C.S. Wallace and P.R. Freeman, Estimation and inference by compact coding, J. Royal Stat. Soc., Series B, 49 (1987) 240–251. Discussion: ibid., 252–265.
K. Yamanishi, A Randomized Approximation of the MDL for Stochastic Models with Hidden Variables, Proc. 9th ACM Comput. Learning Conference, ACM Press, 1996.
A.K. Zvonkin and L.A. Levin, The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms, Russian Math. Surveys 25:6 (1970) 83–124.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Vitányi, P., Li, M. (1997). On prediction by data compression. In: van Someren, M., Widmer, G. (eds) Machine Learning: ECML-97. ECML 1997. Lecture Notes in Computer Science, vol 1224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62858-4_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62858-3
Online ISBN: 978-3-540-68708-5