New Generation Computing, Volume 13, Issue 2, pp. 187–214

Computing as compression: An overview of the SP theory and system

  • J. Gerard Wolff
Regular Papers


This article is an overview of a programme of research based on the conjecture that all kinds of computing and formal reasoning may usefully be understood as information compression by pattern matching, unification and metrics-guided search.

The research aims to develop this idea into a theory of computing to integrate and simplify diverse concepts in the field. The research also aims to develop a ‘new generation’ computing system, based on the theory, to integrate and simplify diverse kinds of computing and to achieve more flexibility and ‘intelligence’ than conventional computers. Software simulations of the proposed new system provide a concrete expression of the developing theory and a test-bed for the ideas.

The background to the research is briefly reviewed including evidence that information compression is a significant element in biological information processing systems.

Concepts of information and redundancy are described as a basis for explaining how information compression may be achieved by the comparison or matching of patterns, the merging or unification of patterns which are the same, together with metrics-guided search (e.g., ‘hill climbing’, ‘beam search’) to maximise compression for a given computational effort.
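The idea of compression by matching, unification and metrics-guided search can be illustrated with a toy sketch (an illustration only, not the SP system itself): repeated patterns in a sequence are matched, unified into a single stored copy, and each occurrence replaced by a short reference code, with a greedy, hill-climbing-style search choosing at each step the unification that saves the most symbols. All names here (`compress`, the `<0>`-style codes) are invented for the example.

```python
# Toy sketch of compression by pattern matching and unification,
# with a greedy (hill-climbing-style) search for the best pattern.
# Not the SP system itself; an illustration of the general idea.

def compress(tokens):
    """Greedily factor repeated sub-patterns out of a token list."""
    rules = {}       # code -> the unified pattern it stands for
    next_code = 0
    while True:
        best_save, best_pat = 0, None
        n = len(tokens)
        # Match: count every candidate pattern of length >= 2.
        for length in range(2, n // 2 + 1):
            seen = {}
            for i in range(n - length + 1):
                pat = tuple(tokens[i:i + length])
                seen[pat] = seen.get(pat, 0) + 1
            for pat, count in seen.items():
                if count < 2:
                    continue
                # Metric: each occurrence shrinks to one code symbol,
                # at the cost of storing the unified pattern once.
                save = count * (length - 1) - length
                if save > best_save:
                    best_save, best_pat = save, pat
        if best_pat is None:
            break    # no unification saves anything: stop climbing
        code = f"<{next_code}>"
        next_code += 1
        rules[code] = list(best_pat)   # unify: store one copy
        # Replace non-overlapping occurrences with the short code.
        out, i, L = [], 0, len(best_pat)
        while i < len(tokens):
            if tuple(tokens[i:i + L]) == best_pat:
                out.append(code)
                i += L
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens, rules
```

For example, `compress(list("abcabcabc"))` unifies the three copies of `abc` into one stored pattern and leaves three short references, so the total size (encoded sequence plus dictionary) is smaller than the original nine symbols.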

The main elements of the SP theory and of the proposed SP system are described with a summary of developments to date.

Some of the kinds of computing which may be interpreted as information compression are briefly reviewed. These include: the ‘low level’ workings of conventional computers; information retrieval, pattern recognition and de-referencing of identifiers; unsupervised inductive learning (grammatical inference, data mining, automatic organisation of software and of knowledge bases); the execution of mathematical or computing functions; deductive and probabilistic inference; parsing and natural language processing; planning and problem solving.
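One of these interpretations, the de-referencing of identifiers, can be given a small illustrative sketch (hypothetical, not part of the SP system): an identifier is a short code standing for a larger stored pattern, so the stored knowledge is in compressed form, and de-referencing recovers the full information by look-up and substitution. The `expand` function and the `rules` dictionary below are names invented for this example.

```python
# Hypothetical sketch: de-referencing identifiers as decompression.
# `rules` maps short codes to the larger patterns they abbreviate;
# expansion recovers the full (redundant) form by substitution.

def expand(tokens, rules):
    """Recursively replace each code with the pattern it names."""
    out = []
    for t in tokens:
        if t in rules:
            out.extend(expand(rules[t], rules))  # de-reference
        else:
            out.append(t)                        # literal symbol
    return out
```

Codes may themselves contain codes, so nested de-referencing expands a deeply compressed structure back to its full form, much as a chain of identifiers in a program or knowledge base ultimately resolves to concrete data.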

Areas of uncertainty where further work is needed are indicated at appropriate points throughout the article.


Keywords: Information Compression, Theory of Computing, Learning, Information Retrieval, Pattern Recognition, Deduction, Abduction


References
 1) Aamodt, A. and Plaza, E., “Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches,” AI Communications, 7, pp. 39–59, 1994.
 2) Atick, J. J. and Redlich, A. N., “Towards a Theory of Early Visual Processing,” Neural Computation, 2, pp. 308–320, 1990.
 3) Attneave, F., “Informational Aspects of Visual Perception,” Psychological Review, 61, pp. 183–193, 1954.
 4) Barlow, H. B., “Possible Principles Underlying the Transformations of Sensory Messages,” in Sensory Communication (W. A. Rosenblith, ed.), Cambridge, Mass.: MIT Press, pp. 217–234, 1961.
 5) Barlow, H. B., “Trigger Features, Adaptation and Economy of Impulses,” in Information Processing in the Nervous System (K. N. Leibovic, ed.), New York: Springer, pp. 209–230, 1969.
 6) Barlow, H. B., “Single Units and Sensation: A Neuron Doctrine for Perceptual Psychology,” Perception, 1, pp. 371–394, 1972.
 7) Barlow, H. B. and Földiák, P., “Adaptation and Decorrelation in the Cortex,” in The Computing Neuron (R. M. Durbin, C. Miall, and G. J. Mitchison, eds.), Chapter 4, Wokingham: Addison-Wesley, pp. 54–72, 1989.
 8) Barlow, H. B., Kaushal, T. P., and Mitchison, G. J., “Finding Minimum Entropy Codes,” Neural Computation, 1, pp. 412–423, 1989.
 9) Becker, K.-H. and Dörfler, M., Dynamical Systems and Fractals, Cambridge: Cambridge University Press, 1989.
10) Chaitin, G. J., Algorithmic Information Theory, Cambridge: Cambridge University Press, 1987.
11) Cheeseman, P., “On Finding the Most Probable Model,” in Computational Models of Scientific Discovery and Theory Formation (J. Shrager and P. Langley, eds.), San Mateo, Ca.: Morgan Kaufmann, pp. 73–95, 1990.
12) Collins, A. M. and Quillian, M. R., “Experiments on Semantic Memory and Language Comprehension,” in Cognition in Learning and Memory (L. W. Gregg, ed.), New York: Wiley, pp. 117–147, 1972.
13) Cook, C. M. and Rosenfeld, A., “Some Experiments in Grammatical Inference,” in Computer Oriented Learning Processes (J. C. Simon, ed.), Leyden: Noordhoff, pp. 157–174, 1976.
14) Cottrell, G. W., Munro, P., and Zipser, D., “Image Compression by Back Propagation: An Example of Extensional Programming,” in Models of Cognition: A Review of Cognitive Science (N. E. Sharkey, ed.), pp. 209–238, 1989.
15) Enderle, G., Kansy, K., and Pfaff, G., Computer Graphics Programming, Berlin: Springer-Verlag, 1987.
16) Földiák, P., “Forming Sparse Representations by Local Anti-Hebbian Learning,” Biological Cybernetics, 64, pp. 165–170, 1990.
17) Forsyth, R. S., “Ockham’s Razor as a Gardening Tool: Simplifying Discrimination Trees by Entropy Min-Max,” in Research and Development in Expert Systems X (M. A. Bramer and R. W. Milne, eds.), Cambridge: Cambridge University Press, pp. 183–195, 1992.
18) Fries, C. C., The Structure of English, New York: Harcourt, Brace & World, 1952.
19) Gammerman, A., “The Representation and Manipulation of the Algorithmic Probability Measure for Problem Solving,” Annals of Mathematics and Artificial Intelligence, 4, pp. 281–300, 1991.
20) Gammerman, A., “Geometric Analogy Problems by Minimum Length Encoding,” 4th Conference of the International Federation of Classification Societies (IFCS-93), Paris, August–September 1993.
21) Gazdar, G. and Mellish, C., Natural Language Processing in Prolog, Wokingham: Addison-Wesley, 1989.
22) Harris, Z. S., “Distributional Structure,” Linguistics Today, 10, pp. 146–162, 1954.
23) Held, G. and Marshall, T. R., Data Compression: Techniques and Applications, Hardware and Software Considerations, second edition, Chichester: Wiley, 1987.
24) Hinton, G. E. and Sejnowski, T. J., “Learning and Relearning in Boltzmann Machines,” in Parallel Distributed Processing, Vol. 1 (D. E. Rumelhart and J. L. McClelland, eds.), Cambridge, Mass.: MIT Press, pp. 282–317, 1986.
25) Hopfield, J. J., “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” Proceedings of the National Academy of Sciences, USA, 79, pp. 2554–2558, 1982.
26) Kolmogorov, A. N., “Three Approaches to the Quantitative Definition of Information,” Problems of Information Transmission, 1, 1, pp. 1–7, 1965.
27) Kumar, V., “Algorithms for Constraint-Satisfaction Problems,” AI Magazine, 13, 1, pp. 32–44, 1992.
28) Li, M. and Vitányi, P. M. B., “Kolmogorov Complexity and Its Applications,” in Handbook of Theoretical Computer Science (J. van Leeuwen, ed.), Chapter 4, Amsterdam: Elsevier, pp. 188–254, 1990.
29) Li, M. and Vitányi, P. M. B., “Inductive Reasoning and Kolmogorov Complexity,” Journal of Computer and System Sciences, 44, pp. 343–384, 1992.
30) Linsker, R., “Self-Organization in a Perceptual Network,” IEEE Computer, 21, pp. 105–117, 1988.
31) Mahowald, M. A. and Mead, C., “The Silicon Retina,” Scientific American, 264, 5, pp. 40–47, 1991.
32) Mandrioli, D. and Ghezzi, C., Theoretical Foundations of Computer Science, New York: Wiley, 1987.
33) Muggleton, S., “Inductive Logic Programming,” New Generation Computing, 8, 4, pp. 295–318, 1991.
34) Newell, A., Unified Theories of Cognition, Cambridge, Mass.: Harvard University Press, 1990.
35) Newell, A., Shaw, J. C., and Simon, H., “Elements of a Theory of Human Problem Solving,” Psychological Review, 65, pp. 151–166, 1958.
36) Oja, E., “A Simplified Neuron Model as a Principal Component Analyser,” Journal of Mathematical Biology, 15, pp. 267–273, 1982.
37) Oldfield, R. C., “Memory Mechanisms and the Theory of Schemata,” British Journal of Psychology, 45, pp. 14–23, 1954.
38) Pednault, E. P. D., “Minimal Length Encoding and Inductive Inference,” in Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. J. Frawley, eds.), Cambridge, Mass.: MIT Press, pp. 71–92, 1991.
39) Phillips, W. A., Hay, I. M., and Smith, L. S., “Lexicality and Pronunciation in a Simulated Neural Net,” British Journal of Mathematical and Statistical Psychology, 46, pp. 193–205, 1993.
40) Redlich, A. N., “Redundancy Reduction as a Strategy for Unsupervised Learning,” Neural Computation, 5, pp. 289–304, 1993.
41) Rissanen, J., “Modelling by the Shortest Data Description,” Automatica, 14, pp. 465–471, 1978.
42) Rissanen, J., “Stochastic Complexity,” Journal of the Royal Statistical Society, Series B, 49, 3, pp. 223–239 and pp. 252–265, 1987.
43) Sanger, T. D., “Optimal Unsupervised Learning in a Single-Layer Linear Feed-Forward Network,” Neural Networks, 2, pp. 459–473, 1989.
44) Shannon, C. E. and Weaver, W., The Mathematical Theory of Communication, Urbana: University of Illinois Press, 1949.
45) Solomonoff, R. J., “A Formal Theory of Inductive Inference, Parts I and II,” Information and Control, 7, pp. 1–22 and pp. 224–254, 1964.
46) Solomonoff, R. J., “The Application of Algorithmic Probability to Problems in Artificial Intelligence,” in Uncertainty in Artificial Intelligence (L. N. Kanal and J. F. Lemmer, eds.), Amsterdam: Elsevier Science, pp. 473–491, 1986.
47) Stanfill, C. and Waltz, D., “Toward Memory-Based Reasoning,” Communications of the ACM, 29, 12, pp. 1213–1228, 1986.
48) Storer, J. A., Data Compression: Methods and Theory, Rockville, Maryland: Computer Science Press, 1988.
49) Southcott, C. B., Boyd, I., Coleman, A. E., and Hammett, P. G., “Low Bit Rate Speech Coding for Practical Applications,” in Speech and Language Processing (C. Wheddon and R. Linggard, eds.), London: Chapman & Hall, 1990.
50) Stephen, G. A. and Mather, P., “Sweeping Away the Problems That Dog the Industry?,” AI Communications, 6, 3/4, pp. 213–218, 1993.
51) Sudkamp, T. A., Languages and Machines: An Introduction to the Theory of Computer Science, Reading, Mass.: Addison-Wesley, 1988.
52) Uspensky, V. A., “Kolmogorov and Mathematical Logic,” Journal of Symbolic Logic, 57, 2, pp. 385–412, 1992.
53) Von Békésy, G., Sensory Inhibition, Princeton, N.J.: Princeton University Press, 1967.
54) Wallace, C. S. and Boulton, D. M., “An Information Measure for Classification,” Computer Journal, 11, 2, pp. 185–195, 1968.
55) Wallace, C. S. and Freeman, P. R., “Estimation and Inference by Compact Coding,” Journal of the Royal Statistical Society, Series B, 49, 3, pp. 240–252, 1987.
56) Watanabe, S., “Pattern Recognition as Information Compression,” in Frontiers of Pattern Recognition (S. Watanabe, ed.), New York: Academic Press, 1972.
57) Watanabe, S., Pattern Recognition: Human and Mechanical, New York: Wiley, 1985.
58) Winston, P. H., Artificial Intelligence, third edition, Reading, Mass.: Addison-Wesley, 1992.
59) Wolff, J. G., “Language Acquisition, Data Compression and Generalisation,” Language & Communication, 2, pp. 57–89, 1982. (Reproduced in Ref. 63), Chapter 3.)
60) Wolff, J. G., “Learning Syntax and Meanings through Optimization and Distributional Analysis,” in Categories and Processes in Language Acquisition (Y. Levy, I. M. Schlesinger, and M. D. S. Braine, eds.), Hillsdale, N.J.: Lawrence Erlbaum, pp. 179–215, 1988. (Reproduced in Ref. 63), Chapter 2.)
61) Wolff, J. G., “The Management of Risk in System Development: ‘Project SP’ and the ‘New Spiral Model’,” Software Engineering Journal, 4, 3, pp. 134–142, 1989.
62) Wolff, J. G., “Simplicity and Power: Some Unifying Ideas in Computing,” Computer Journal, 33, 6, pp. 518–534, 1990. (Reproduced in Ref. 63), Chapter 4.)
63) Wolff, J. G., Towards a Theory of Cognition and Computing, Chichester: Ellis Horwood, 1991.
64) Wolff, J. G., “On the Integration of Learning, Logical Deduction and Probabilistic Inductive Inference,” Proceedings of the First International Workshop on Inductive Logic Programming, Viana de Castelo, Portugal, pp. 177–191, March 1991.
65) Wolff, J. G., “Computing, Cognition and Information Compression,” AI Communications, 6, 2, pp. 107–127, 1993.
66) Wolff, J. G., “Towards a New Concept of Software,” Software Engineering Journal, 9, 1, pp. 27–38, 1994.
67) Wolff, J. G., “A Scaleable Technique for Best-Match Retrieval of Sequential Information Using Metrics-Guided Search,” Journal of Information Science, 20, 1, pp. 16–28, 1994.
68) Wolff, J. G., “Computing as Compression: SP20,” New Generation Computing, 13, 2, pp. 215–241, 1995.
69) Wolff, J. G., “Computing and Information Compression: A Reply,” AI Communications, 7, 3/4, pp. 203–219, 1994.
70) Wolff, J. G., “An Alternative Scaleable Technique for Best-Match Retrieval of Sequential Information Using Metrics-Guided Search,” in preparation.
71) Wolff, J. G. and Chipperfield, A. J., “Unifying Computing: Inductive Learning and Logic,” in Research and Development in Expert Systems VII (T. R. Addis and R. M. Muir, eds.), (Proceedings of Expert Systems ’90, Reading, England, September 1990), pp. 263–276, 1990.
72) Zipf, G. K., Human Behaviour and the Principle of Least Effort, Cambridge, Mass.: Addison-Wesley, 1949.

Copyright information

© Ohmsha, Ltd. and Springer 1995

Authors and Affiliations

  • J. Gerard Wolff, School of Electronic Engineering and Computer Systems, University of Wales, Bangor, UK
