Machine Learning, Volume 94, Issue 2, pp 133–149

From machine learning to machine reasoning

An essay


A plausible definition of “reasoning” could be “algebraically manipulating previously acquired knowledge in order to answer a new question”. This definition covers first-order logical inference and probabilistic inference. It also includes much simpler manipulations commonly used to build large learning systems. For instance, we can build an optical character recognition system by first training a character segmenter, an isolated character recognizer, and a language model, using appropriate labelled training sets. Adequately concatenating these modules and fine-tuning the resulting system can be viewed as an algebraic operation in a space of models. The resulting model answers a new question, namely, how to convert the image of a text page into computer-readable text.
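The concatenation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the essay: the `concatenate` helper, the three toy modules, the glyph-code "page image" format, and the two-word `VOCABULARY`. Real modules would be trained models, and the fine-tuning of the combined system is not modelled here.

```python
from typing import Any, Callable, List

Module = Callable[[Any], Any]


def concatenate(*modules: Module) -> Module:
    """Chain modules left to right into one composite model."""
    def composite(x: Any) -> Any:
        for module in modules:
            x = module(x)
        return x
    return composite


# Hypothetical stand-ins for the three separately trained modules.
def segmenter(page: str) -> List[str]:
    # Toy "page image": glyph codes separated by spaces, e.g. "g8 g5".
    return page.split()


def recognizer(glyphs: List[str]) -> str:
    # Toy recognizer: glyph code "gN" maps to the N-th lowercase letter.
    return "".join(chr(ord("a") + int(g[1:]) - 1) for g in glyphs)


VOCABULARY = ["hello", "world"]  # assumed toy lexicon


def language_model(raw: str) -> str:
    # Toy language model: choose the same-length vocabulary word with the
    # fewest character mismatches; return the raw string if none fits.
    candidates = [w for w in VOCABULARY if len(w) == len(raw)]
    if not candidates:
        return raw
    return min(candidates, key=lambda w: sum(a != b for a, b in zip(w, raw)))


# The composite model answers a question none of the modules answers alone.
ocr = concatenate(segmenter, recognizer, language_model)
print(ocr("g8 g5 g12 g12 g16"))  # the last glyph is misread as "p"
```

In this toy setting the composition is plain function chaining; the point is only that the combined object is itself a model, built from existing ones by an operation on models.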

This observation suggests a conceptual continuity between algebraically rich inference systems, such as logical or probabilistic inference, and simple manipulations, such as the mere concatenation of trainable learning systems. Therefore, instead of trying to bridge the gap between machine learning systems and sophisticated “all-purpose” inference mechanisms, we can algebraically enrich the set of manipulations applicable to training systems, and build reasoning capabilities from the ground up.


Keywords: Machine learning, Reasoning, Recursive networks



Copyright information

© The Author(s) 2013

Authors and Affiliations

  1. Microsoft Research, Redmond, USA
