Some history of the hierarchical Bayesian methodology

  • I. J. Good
Beliefs About Beliefs Invited Papers


A standard technique in subjective “Bayesian” methodology is for a subject (“you”) to make judgements of the probabilities that a physical probability lies in various intervals. In the hierarchical Bayesian technique you make probability judgements (of a higher type, order, level, or stage) concerning the judgements of lower type. The paper will outline some of the history of this hierarchical technique, with emphasis on the contributions by I. J. Good, because I have read every word written by him.


Hierarchical Bayes; Partially-Ordered Probabilities; Upper and Lower Probabilities; Empirical Bayes; Species Frequencies; Multinomial Estimation; Probability Estimation in Contingency Tables; Probability Density Estimation; Maximum Entropy; ML/E Method; Type II Likelihood Ratio; Information in Marginal Totals; Kinds of Probability; Bayes/Non-Bayes Synthesis; Hyper-Razor of Duns and Ockham


Copyright information

© Springer 1980

Authors and Affiliations

  • I. J. Good, Virginia Polytechnic Institute and State University, USA
