As discussed above, our main interest is to obtain a measure of both the complexity and the (useful) information in a data set. As in the algorithmic theory, complexity is the primary notion, which then allows us to define the more intricate notion of information. Our plan is to define the complexity as the shortest code length obtainable when the data are encoded with a class of models serving as codes. In the previous section we saw that this leads to the noncomputability problem if the class of models is allowed to include the set of all computer programs, where a “model” is identified with a computer program (code) that generates the given data. If we select a smaller class, the noncomputability problem can be avoided, but we must overcome another difficulty: how are we to define the shortest code length? It seems that, in order not to fall back on the Kolmogorov complexity, we must spell out exactly how the distributions as models are to be used to restrict the coding operations. Here we adopt a different strategy: we define the idea of shortest code length in a probabilistic sense, which turns out to satisfy all practical requirements unless the data strings are too short. Clearly, if the strings are short, there are too many ways to model them. As a matter of fact, the strings must be long even for the desired algorithmic results to hold.
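To make the idea of a code length defined relative to a restricted model class concrete, the following is a minimal sketch, not necessarily the construction developed in this chapter. It assumes the Bernoulli class as the model class and uses the normalized maximum likelihood code as one probabilistic notion of "shortest" code length; the function names are hypothetical and chosen only for illustration.

```python
import math

def max_likelihood_bits(k, n):
    """Code length in bits, -log2 of the maximized Bernoulli likelihood,
    for a binary string with k ones among n symbols (assumed model class)."""
    bits = 0.0
    for count in (k, n - k):
        if count > 0:  # a zero count contributes a factor of 1, i.e. 0 bits
            bits -= count * math.log2(count / n)
    return bits

def nml_normalizer(n):
    """Sum of maximized likelihoods over all binary strings of length n,
    grouped by the number of ones k (suitable for small n only)."""
    return sum(
        math.comb(n, k) * 2.0 ** (-max_likelihood_bits(k, n))
        for k in range(n + 1)
    )

def nml_code_length(string):
    """Normalized maximum likelihood code length in bits of a 0/1 sequence."""
    n = len(string)
    k = sum(string)
    return max_likelihood_bits(k, n) + math.log2(nml_normalizer(n))

if __name__ == "__main__":
    regular = [0] * 16        # highly regular string
    mixed = [0, 1] * 8        # equal numbers of zeros and ones
    print(nml_code_length(regular), nml_code_length(mixed))
```

Because the normalizer makes the code lengths satisfy the Kraft equality over all strings of a given length, the result is a computable code length tied to the chosen class rather than to all computer programs. For very short strings the normalizer term dominates the total, which is in line with the remark that the strings must not be too short.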
Keywords: Code Length, Universal Model, Fisher Information Matrix, Kolmogorov Complexity, Optimal Degree