Minsky, Chomsky and Deep Nets

Kenneth Ward Church
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

When Minsky and Chomsky were at Harvard in the 1950s, they started out their careers questioning a number of machine learning methods that have since regained popularity. Minsky’s Perceptrons was a reaction to neural nets and Chomsky’s Syntactic Structures was a reaction to ngram language models. Many of their objections are being ignored and forgotten (perhaps for good reasons, and perhaps not). While their arguments may sound negative, I believe there is a more constructive way to think about their efforts; they were both attempting to organize computational tasks into larger frameworks such as what is now known as the Chomsky Hierarchy and algorithmic complexity. Section 5 will propose an organizing framework for deep nets. Deep nets are probably not the solution to all the world’s problems. They don’t do the impossible (solve the halting problem), and they probably aren’t great at many tasks such as sorting large vectors and multiplying large matrices. In practice, deep nets have produced extremely exciting results in vision and speech, though other tasks may be more challenging for deep nets.
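For readers who want a concrete picture of the ngram language models that Syntactic Structures reacted against, here is a minimal illustrative sketch (not from the paper; the toy corpus, the token names, and the add-one smoothing are assumptions chosen for brevity). It makes Chomsky's classic point runnable: "colorless green ideas sleep furiously" is grammatical, yet a model built from local co-occurrence counts assigns it a vanishingly small probability, of the same order as its ungrammatical reversal.

# A minimal, hypothetical sketch (not from the paper): an add-one smoothed
# bigram language model of the kind Syntactic Structures argued against.
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])            # contexts for the bigrams below
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def prob(sent, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram probability of a tokenized sentence."""
    tokens = ["<s>"] + sent + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)
    return p

# Toy training corpus (an assumption for illustration).
corpus = [["green", "ideas", "are", "popular"],
          ["people", "sleep", "soundly"]]
unigrams, bigrams = train_bigram(corpus)
V = len(unigrams) + 1                           # +1 as crude room for unseen words

# Chomsky's point: the grammatical sentence below is built almost entirely
# from unseen word pairs, so the model gives it a vanishingly small
# probability, of the same order as its ungrammatical reversal.
print(prob(["colorless", "green", "ideas", "sleep", "furiously"], unigrams, bigrams, V))
print(prob(["furiously", "sleep", "ideas", "green", "colorless"], unigrams, bigrams, V))

Nothing here depends on the toy numbers; the point is that a bigram model measures local co-occurrence, not grammaticality.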

Keywords

Minsky · Chomsky · Deep nets · Perceptrons

References

  1. Church, K.: Emerging trends: artificial intelligence, China and my new job at Baidu. J. Nat. Lang. Eng. (to appear). Cambridge University Press
  2. Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)
  3. Chomsky, N.: Syntactic Structures. Mouton & Co. (1957). https://archive.org/details/NoamChomskySyntcaticStructures
  4. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948). http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
  5. Shannon, C.: Prediction and entropy of printed English. Bell Syst. Tech. J. 30(1), 50–64 (1951). https://www.princeton.edu/~wbialek/rome/refs/shannon51.pdf
  6. Zipf, G.: Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Boston (1949)
  7. Harris, Z.: Distributional structure. Word 10(2–3), 146–162 (1954)
  8. Firth, J.: A synopsis of linguistic theory, 1930–1955. In: Studies in Linguistic Analysis. Basil Blackwell (1957). http://annabellelukin.edublogs.org/files/2013/08/Firth-JR-1962-A-Synopsis-of-Linguistic-Theory-wfihi5.pdf
  9. Church, K.: A pendulum swung too far. Linguist. Issues Lang. Technol. 6(6), 1–27 (2011)
  10. Turing, A.: On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. s2-42(1), 230–265 (1937). http://www.turingarchive.org/browse.php/b/12
  11. Hillis, W.: The Connection Machine. MIT Press, Cambridge (1989)
  12. Blelloch, G., Leiserson, C., Maggs, B., Plaxton, C., Smith, S., Zagha, M.: A comparison of sorting algorithms for the Connection Machine CM-2. In: Proceedings of the Third Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 3–16 (1991). https://courses.cs.washington.edu/courses/cse548/06wi/files/benchmarks/radix.pdf
  13. Church, K.: On memory limitations in natural language processing. Unpublished Master’s thesis (1980). http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-245.pdf
  14. Koskenniemi, K., Church, K.: Complexity, two-level morphology and Finnish. In: COLING (1988). https://aclanthology.info/pdf/C/C88/C88-1069.pdf
  15. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. arXiv (2014). https://arxiv.org/abs/1410.5401
  16. Sun, G., Giles, C., Chen, H., Lee, Y.: The neural network pushdown automaton: model, stack and learning simulations. arXiv (2017). https://arxiv.org/abs/1711.05738
  17. Banko, M., Brill, E.: Scaling to very very large corpora for natural language disambiguation. In: ACL, pp. 26–33 (2001). http://www.aclweb.org/anthology/P01-1005
  18. Church, K., Mercer, R.: Introduction to the special issue on computational linguistics using large corpora. Comput. Linguist. 19(1), 1–24 (1993). http://www.aclweb.org/anthology/J93-1001
  19. West, G.: Scale. Penguin Books, New York (2017)
  20. Hestness, J., Narang, S., Ardalani, N., Diamos, G., Jun, H.: Deep learning scaling is predictable, empirically. arXiv (2017). https://arxiv.org/abs/1712.00409

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Baidu, Sunnyvale, USA
