Abstract
The Bayes Blind Spot of a Bayesian Agent is, by definition, the set of probability measures on a Boolean \(\sigma\)-algebra that are absolutely continuous with respect to the background probability measure (prior) of a Bayesian Agent on the algebra and which the Bayesian Agent cannot learn by a single conditionalization no matter what (possibly uncertain) evidence he has about the elements in the Boolean \(\sigma\)-algebra. It is shown that if the Boolean algebra is finite, then the Bayes Blind Spot is a very large set: it has the same cardinality as the set of all probability measures (continuum); it has the same measure as the measure of the set of all probability measures (in the natural measure on the set of all probability measures); and it is a “fat” (second Baire category) set in the topological sense in the set of all probability measures taken with its natural topology. Features of the Bayes Blind Spot are determined from the perspective of repeated Bayesian learning when the Boolean algebra is finite. Open problems about the Bayes Blind Spot are formulated in probability spaces with infinite Boolean \(\sigma\)-algebras. The results are discussed from the perspective of Bayesianism.
Keywords
Probability theory · Conditionalization · Bayesianism

1 The main claims
The notion of Bayes Blind Spot of a Bayesian Agent was introduced in Gyenis and Rédei (2017): The Bayes Blind Spot is, by definition, the set of probability measures on a Boolean \(\sigma\)-algebra that are absolutely continuous with respect to the background probability measure (prior) of a Bayesian Agent on the algebra and which the Bayesian Agent cannot learn by a single conditionalization no matter what (possibly uncertain) evidence he has about the elements in the Boolean \(\sigma\)-algebra. Conditionalization in the most general case involving uncertain evidence is to be understood as conditionalizing using the technique of conditional expectations, of which the usual Bayes rule and Jeffrey conditionalization are special cases (Huttegger 2013; Gyenis and Rédei 2017; Gyenis et al. 2017). The aim of this paper is to determine the properties of the Bayes Blind Spot.
It was shown in Gyenis and Rédei (2017) that the Bayes Blind Spot is a nonempty set in so-called standard probability measure spaces (Definition 4.5 in Petersen 1989). Standard probability spaces include probability spaces with a Boolean algebra having a finite number of elements and also probability spaces on \({\mathrm{I\!R}}^n\) where the probability measure is given by a density function with respect to the Lebesgue measure on \({\mathrm{I\!R}}^n\). These results lead naturally to the question (formulated already in Gyenis and Rédei 2017) of how large the Bayes Blind Spot is. This is a nontrivial problem and there is no unique answer to it in general: The answer depends on both what one takes as the “measure of size” of a set and on the specific properties of the probability measure space. We show in this paper that if the Boolean algebra of the probability space representing the Bayesian Agent’s propositional knowledge has a finite number of elements, then the Bayes Blind Spot of this Agent is a very large set, no matter what the prior of the Agent is: The Bayes Blind Spot has the same cardinality as the set of all probability measures on the finite Boolean algebra (continuum); it has the same measure as the measure of the set of all probability measures (in the natural measure on the set of all probability measures); and it is a “fat” (second Baire category) set in the topological sense in the set of all probability measures taken with its natural topology.
The large size of the Bayes Blind Spot displays an aspect of the crucial role of priors in Bayesian learning that to our best knowledge has not been noted in the large literature on Bayesian statistical inference so far. The main focus of discussion about priors in Bayesianism is typically how to choose the prior. Positions about how to select a prior range from strict subjectivism to objectivism [see Williamson (2010, p. 2) for a brief summary of typical positions]. A large variety of formal methods aiming at selecting priors in Bayesian statistical inference in a disciplined manner have also been developed [see Kass and Wasserman (1996) for a review]. Our result shows that, irrespective of where the prior comes from, any selected prior is extremely restrictive from the perspective of how many probability measures are in principle accessible for the Bayesian Agent as posteriors obtained as a result of a single act of conditionalization, provided the propositional knowledge of the Bayesian Agent is represented by a finite Boolean algebra. (This will be further discussed in Sect. 8.)
Two further questions arise naturally:
 (a)
How much can a finite Bayesian Agent learn as a result of repeated conditionalization?
 (b)
How large is the Bayes Blind Spot of a non-finite Bayesian Agent?
Given the notions of conservative and bold Bayesian dynamics, one can also define Bayes N-Blind Spots with respect to both conservative and bold Bayes dynamics (Definition 5.4): The Bayes N-Blind Spot is the set of probability measures that cannot be reached by any (conservative, respectively bold) Bayesian learning path of length less than or equal to N starting from any evidence (N being a natural number). The corresponding infinite Bayes Blind Spots are the intersections of all the (conservative, respectively bold) Bayes N-Blind Spots (\(N=1,2,\ldots\)). We will see that the infinite conservative Bayes Blind Spot is a very large set if the Boolean algebra is finite. In sharp contrast, the bold Bayes 2-Blind Spot (hence also the bold infinite Bayes Blind Spot) of a Bayesian Agent is empty if the Boolean algebra is finite (Proposition 5.5). Thus, given any probability on a finite Boolean algebra, an Agent having a faithful prior can learn this probability from some specific (possibly uncertain) evidence in only two steps of conditioning, provided the Agent discards his prior after the first conditioning and performs the second conditioning using as prior the probability learned in the first step. While this is in principle an attractive feature of Bayesian conditioning, it should be emphasized that the Agent must have access to very specific evidence to be able to infer the given probability in only two steps.
Determining the size of the Bayes Blind Spot of a Bayesian Agent represented by a general probability measure space (question (b) above) seems to be a difficult problem. In Sect. 8 we collect the results on this problem proved in Gyenis and Rédei (2017) and formulate some further possible lines of inquiry.
2 Learning by conditionalizing
A Bayesian Agent is an abstract, ideal person having degrees of belief p(C) about (the truths of) propositions C in a set \(\mathcal{S}\) forming a Boolean \(\sigma\)-algebra. The degrees of belief p(C) behave like probabilities: p is an additive map on \(\mathcal{S}\) formed by (some) subsets of the set X of elementary propositions. The triplet \((X,\mathcal{S},p)\) is a probability measure space (Billingsley 1995; Rosenthal 2006). For monographic works on Bayesianism we refer to Howson and Urbach (1989), Bovens and Hartmann (2004) and Williamson (2010); for papers discussing basic aspects of Bayesianism, including conditionalization, see Howson and Franklin (1994), Howson (1996, 2014), Hartmann and Sprenger (2010), Easwaran (2011a, b) and Weisberg (2011, 2015); for a discussion of Jeffrey conditionalization, see Diaconis and Zabell (1982) and Huttegger (2015). From now on it is assumed that the Boolean algebra \(\mathcal{S}\) has a finite number of elements. (In Sect. 8 we will comment on the situation when \(\mathcal{S}\) is infinite.)
Remark 2.1
A note on terminology: we used the phrase “conditionalizing p via A using Bayes’ rule” above, rather than just saying “conditionalizing on A”, which would be more standard. We do this because we take the position that conditionalizing is a concept and technique in probability theory that is much more general than Bayes’ rule (1) [also called the “ratio formula” (Rescorla 2015)]: Both Bayes’ rule and the Jeffrey rule (see below) are special cases of conditioning with respect to a \(\sigma\)-field [see Billingsley (1995, Chapters 33–34) and Gyenis and Rédei (2017) for further discussion of the relation of the Bayes and Jeffrey rules to the theory of conditionalization via conditional expectations determined by \(\sigma\)-fields]. We will also say that the “Bayesian Agent learns q on the basis of evidence \(q_{\mathcal{A}}\)”. This terminology is common in the literature of machine learning/artificial intelligence (Neal 1996; Barber 2012), and it might be slightly confusing because one also says the “Agent learns” the evidence. But the conceptual structure of the situation is clear: The Agent’s “learning” q means the Agent infers q from evidence \(q_{\mathcal{A}}\) using conditionalization as inference device.
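The reduction of Bayes’ rule to Jeffrey conditionalization can be seen concretely. The sketch below is our illustration, not code from the paper: it implements the Jeffrey rule on a finite outcome set and checks that evidence placing all of its mass on a single partition cell reproduces ordinary Bayes conditionalization on that cell.

```python
# A minimal sketch (our illustration, under the paper's finite setting):
# Jeffrey's rule with certain evidence on one cell A reduces to p(. | A).

def jeffrey(p, partition, evidence):
    """Jeffrey rule: q(x) = p(x)/p(A_i) * evidence(A_i) for x in cell A_i.
    p: dict outcome -> probability; partition: list of frozensets of outcomes;
    evidence: dict cell-index -> probability mass assigned to that cell."""
    q = {}
    for x in p:
        total = 0.0
        for i, cell in enumerate(partition):
            p_cell = sum(p[y] for y in cell)
            if x in cell and p_cell > 0:
                total += p[x] / p_cell * evidence[i]
        q[x] = total
    return q

# Fair-die prior on {1,...,6}
p = {x: 1 / 6 for x in range(1, 7)}
evens = frozenset({2, 4, 6})
odds = frozenset({1, 3, 5})

# Certain evidence "the outcome is even": all mass on the cell `evens`.
q = jeffrey(p, [evens, odds], {0: 1.0, 1: 0.0})
# q agrees with the Bayes rule p(. | evens): mass 1/3 on each even outcome,
# mass 0 on each odd outcome.
```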
Remark 2.2
3 The Bayes Blind Spot
The question whether q is Bayes accessible asks whether there exists a nontrivial partition of the 6-element set \(X_6\) and a probability measure \(q_{\mathcal{A}}\) defined on the elements of this partition such that q can be obtained from \(q_{\mathcal{A}}\) in the manner (3). The question is not trivial: there exist 203 different partitions in \(\mathcal{S}\) [203 is the \(6{\text {th}}\) Bell number (Conway and Guy 1996, pp. 91–93)]. Thus, if one tried to answer the question by “brute force”, one would have to consider all 203 partitions and, for each partition, write out Eq. (3) for every B to obtain a large number of equations to solve with \(q_{\mathcal{A}}(A_i)\) as unknowns, in order to see whether the system of equations admits a solution. While doable, this procedure becomes intractable in the general situation when the number of elements in the Boolean algebra is very large. One can however find a simple, compact condition that can be used to decide whether a probability measure can be obtained as a conditional probability via Jeffrey conditionalization:
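The Bell-number count mentioned above can be verified by brute force. The following sketch (ours, not from the paper) recursively enumerates all set partitions of a 6-element set and confirms that there are exactly 203 of them.

```python
# Brute-force check (our illustration) that a 6-element set has exactly
# 203 set partitions -- the 6th Bell number cited in the text.

def partitions(elems):
    """Yield all set partitions of the list `elems` as lists of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new singleton block
        yield [[first]] + part

count = sum(1 for _ in partitions(list(range(6))))
# count == 203, the 6th Bell number
```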
Proposition 3.1
(cf. Gyenis and Rédei 2017) Let \((X,\mathcal{S},p)\) be a probability space with a finite set X having n elements and \(\mathcal{S}\) the Boolean algebra of subsets of X. A probability measure q on \(\mathcal{S}\) is not Bayes accessible if and only if its Radon–Nikodym derivative \(\frac{dq}{dp}\) is an injective function.
Proof
As the example of die throwing shows, Bayes inaccessible probability measures can exist. More generally, one can show that given any background probability p on a finite Boolean algebra, there exists a q on that Boolean algebra that is Bayes inaccessible (Gyenis and Rédei 2017). Following the terminology introduced in Gyenis and Rédei (2017) we will call the set of probability measures on \(\mathcal{S}\) that are not Bayes accessible for the Bayesian Agent (with respect to the fixed background probability p) the “Bayes Blind Spot” of the Agent. If the p-dependence of the Bayes Blind Spot needs to be made explicit, we say “Bayes p-Blind Spot”.
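Proposition 3.1 yields a simple computational membership test for the Bayes Blind Spot on a finite space with a faithful prior: compute the ratios \(q(x)/p(x)\) over the atoms and check whether they are pairwise distinct. The sketch below is ours; the tolerance parameter for floating-point comparison is a hypothetical implementation detail, not part of the proposition.

```python
# Sketch (our code) of the criterion of Proposition 3.1: with a faithful
# prior p on a finite space, q is in the Bayes Blind Spot iff the
# Radon-Nikodym derivative dq/dp, i.e. the ratios q(x)/p(x) over the
# atoms, is injective (all ratios pairwise distinct).

def in_blind_spot(p, q, tol=1e-12):
    ratios = [q[x] / p[x] for x in p]
    for i in range(len(ratios)):
        for j in range(i + 1, len(ratios)):
            if abs(ratios[i] - ratios[j]) < tol:
                return False   # non-injective derivative => Bayes accessible
    return True                # injective derivative => Bayes inaccessible

p = {x: 1 / 6 for x in range(1, 7)}  # fair-die prior
# Repeated ratios (2, 2, 1/2, 1/2, 1/2, 1/2): Bayes accessible.
q_acc = {1: 1/3, 2: 1/3, 3: 1/12, 4: 1/12, 5: 1/12, 6: 1/12}
# Pairwise distinct ratios: in the Bayes Blind Spot.
q_blind = {1: 0.05, 2: 0.10, 3: 0.15, 4: 0.20, 5: 0.22, 6: 0.28}
```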
Remark 3.2
Note that we assumed the background probability p to be faithful, which entails that each probability measure on the Boolean algebra \(\mathcal{S}\) is absolutely continuous with respect to p. If p is not faithful, then there exist probability measures on \(\mathcal{S}\) that are not absolutely continuous with respect to p, and these are trivially not obtainable as conditional probabilities using p as prior. To exclude these trivially non-Bayes accessible probability measures from the Bayes Blind Spot in the case when p is not faithful, we define the p-Bayes Blind Spot for such a general p as the set of those probability measures that are absolutely continuous with respect to p and are not Bayes accessible for the Bayesian Agent (with respect to p). Since the main results of the paper state the large size of the Bayes Blind Spot, defining the Bayes Blind Spot more restrictively strengthens the results presented.
4 Size of the Bayes Blind Spot
 Cardinality

One can ask what the cardinality of the Bayes Blind Spot is and how its cardinality is related to the cardinality of the set of all probability measures.
 Topological size

One can ask whether the Bayes Blind Spot is a meager (Baire first category) or nonmeager (Baire second category) set in the set of all probability measures with respect to a natural topology.
 Measure theoretical size

One can ask what the size of the Bayes Blind Spot is with respect to a measure on the set of all probability measures.
4.1 Cardinality
Proposition 4.1
The Bayes Blind Spot of a finite Bayesian Agent has the cardinality of the continuum; consequently, for such a Bayesian Agent there exist exactly as many Bayes inaccessible probability measures as there are probability measures altogether (in the sense of cardinality), namely continuum many.
4.2 Topological size—Baire category
Recall that, given a subset E of a topological space T, a point x in T is an interior point of E if there is an open set O such that x belongs to O and O is contained in E. The set of all interior points of E is called the interior of E. A subset E of T is said to be nowhere dense if its closure has empty interior. The sets of the first Baire category in T are those that are countable unions of nowhere dense sets (Rudin 1991, p. 42). Any subset of T that is not of the first Baire category is said to be of the second Baire category. A set E is nowhere dense if and only if its complement \(T\setminus E\) contains an open set that is dense in T. Thus a subset of T which is open and dense is of the second Baire category.
Sets of first category are “meager”, whereas sets of second category are regarded as nonmeager (“fat”) in a topological sense. To see why, it is useful to have examples.
Consider the real line \({\mathbb {R}}\) with its usual topology. Any finite set of points on the line is a nowhere dense set. The set \({\mathbb {Q}}\) of rational numbers is a meager set because \({\mathbb {Q}}\) is a countable union of one-point sets, each of which is nowhere dense.
Uncountable meager sets also exist: the Cantor set is uncountable, closed, compact and nowhere dense in \({\mathbb {R}}\) (see Steen and Seebach 1978). The Cantor set is large in cardinality (within the set of real numbers), small in the sense of topology and also small measure theoretically: it is a nullset with respect to the Lebesgue measure. But a meager set can have large measure: the real line can be decomposed into two disjoint sets, one being of first Baire category, the other having measure zero with respect to the Lebesgue measure (Theorem 1.6 in Oxtoby 1980). An example is the fat Cantor set (Steen and Seebach 1978), which is meager but can have arbitrarily large measure.
Open dense sets are easy to come up with: obviously \({\mathbb {R}}\) is open and dense in itself. Removing a finite number of points from \({\mathbb {R}}\) one obtains an open dense set. A less obvious example is the complement of the Cantor set: since the Cantor set is closed and nowhere dense, its complement is open and dense.
To assess the topological size of the Bayes Blind Spot in the set \(M(\mathcal{S})\) of all probability measures on \(\mathcal{S}\), we need to specify a topology on \(M(\mathcal{S})\). Topologies can be defined by metrics (distance functions), and this is how one can specify a topology in the set of probability measures. There exist several types of metrics among probability measures that one can consider. The Appendix lists five typical ones that occur in different contexts. It turns out (and this is proved in the Appendix) that they all are equivalent in the sense that they determine the same topology, which we will call the standard uniform topology. The content of this topology can be expressed in different ways, one of which is the formulation in terms of the distance \(d_3\) of the Appendix: if the probability measure q is \(d_3\)-close to the probability measure \(q'\), then the supremum of the difference of the expectation values of random variables with respect to q and \(q'\) is small among all the random variables whose expectation values with respect to the background probability p are close.
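For concreteness, here is a sketch (ours, not the paper's) of one standard metric between probability measures on a finite outcome set, the total variation distance; whether it coincides with one of the five metrics listed in the Appendix is our assumption, but it is a natural choice on a finite space.

```python
# Total variation distance on a finite outcome set (our sketch):
# d(q, q') = sup_B |q(B) - q'(B)| = (1/2) * sum_x |q(x) - q'(x)|.

def total_variation(q1, q2):
    return 0.5 * sum(abs(q1[x] - q2[x]) for x in q1)

q = {1: 0.5, 2: 0.5}
r = {1: 0.8, 2: 0.2}
# total_variation(q, r) == 0.3: the largest discrepancy over all events,
# attained here by the event {1} (|0.5 - 0.8| = 0.3).
```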
Given the standard uniform topology, the topological size of the Bayes Blind Spot is characterized by the following proposition (proof of which we give in the Appendix):
Proposition 4.2
The Bayes Blind Spot of a finite Bayesian Agent is an open and dense set in the set \(M(\mathcal{S})\) of all probability measures equipped with the standard uniform topology on the probability measures.
Corollary 4.3
The complement of the Bayes Blind Spot of a finite Bayesian Agent, i.e. the set of Bayes accessible probability measures, is a closed, nowhere dense set in the standard uniform topology on the probability measures.
Proposition 4.2 says that the Bayes Blind Spot is a very large, “fat” set in the topological sense, much larger than the set of Bayes accessible states. Viewed from the perspective of topology, there exist many more Bayes inaccessible states than Bayes accessible ones.
Corollary 4.3 entails that the limit of a sequence of Bayes accessible probability measures is again Bayes accessible. Consequently, a Bayes inaccessible probability measure cannot be approximated with arbitrary precision by Bayes accessible probability measures. Thus one cannot “neutralize” the presence of Bayes inaccessible states by taking the position that the Bayesian Agent can in principle be presented with a series of evidences that get him arbitrarily close to a Bayes inaccessible probability measure.
Furthermore, the set of Bayes accessible probability measures, being the complement of a dense open set, is not only a closed set but a meager set: a closed set with empty interior. Thus, while there exists an uncountably infinite number of Bayes inaccessible probability measures arbitrarily close to every Bayes accessible one, every Bayes inaccessible probability measure has a neighborhood in which there are only Bayes inaccessible probability measures.
The Bayes inaccessible probability measures “dominate” the set of all probability measures completely in a topological sense.
4.3 Measure theoretical size
To assess the measure theoretical size of the Bayes Blind Spot in the set \(M(\mathcal{S})\), one has to specify a \(\sigma\)-algebra on \(M(\mathcal{S})\) and a measure over this algebra. The natural algebra and measure are the ones arising from the Lebesgue measure in the following way:
We can identify measures in \(M(\mathcal{S})\) with functions \(f:X\rightarrow [0,1]\) such that \(\sum _{x\in X}f(x)=1\). Under this identification each probability measure is identified with a point in \([0,1]^n\) (recall: n is the number of elements in X). Thus \(M(\mathcal{S})\subseteq [0,1]^{n}\).
Proposition 4.4
The Bayes Blind Spot has \(\mu \) measure equal to 1. The set of Bayes accessible states is a \(\mu \) measure zero set.
Proposition 4.4 says that the Bayes Blind Spot is a very large set in the set of all probability measures, with respect to the natural (Lebesgue) measure in which the set of all probability measures has nonzero measure.^{1} “Very large” here means: as large as possible, having the same size as the set of all probability measures. This entails that the Bayes accessible states form a measure zero set in this measure.
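Proposition 4.4 can be illustrated numerically: sampling a probability measure "uniformly" from the simplex (by normalizing i.i.d. exponential variates, which yields the flat Dirichlet distribution) and applying the injectivity criterion of Proposition 3.1, one finds that randomly drawn measures land in the Blind Spot essentially always, since the ratios \(q(x)/p(x)\) are almost surely pairwise distinct. The sketch below is ours, with arbitrarily chosen seed and trial count.

```python
# Numerical illustration (our sketch) of Proposition 4.4: measures drawn
# uniformly from the 6-outcome simplex almost surely have an injective
# Radon-Nikodym derivative dq/dp, hence lie in the Bayes Blind Spot.
import random

random.seed(0)
p = [1 / 6] * 6  # fair-die prior

def random_simplex_point(n):
    # Normalizing i.i.d. exponentials gives the flat (uniform) Dirichlet
    # distribution on the n-simplex.
    e = [random.expovariate(1.0) for _ in range(n)]
    s = sum(e)
    return [x / s for x in e]

trials = 1000
in_spot = 0
for _ in range(trials):
    q = random_simplex_point(6)
    ratios = [qi / pi for qi, pi in zip(q, p)]
    if len(set(ratios)) == len(ratios):  # dq/dp injective => in Blind Spot
        in_spot += 1
# in_spot == trials: every sampled measure landed in the Blind Spot
```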
Remark 4.5
Propositions 4.1, 4.2 and 4.4 are proved under the assumption that the background probability measure p is faithful. These propositions remain true, however, if the faithfulness assumption is dropped: If p is not a faithful probability measure, then it has zero probability on some elements in X. In terms of the geometrical picture of the figure in Sect. 4.3 this means that the point in the simplex representing p is on an “edge” E of the simplex. All the probability measures that are absolutely continuous with respect to p, hence all the potentially Bayes p-accessible probability measures, are also on E. This edge can be regarded as the set of all probability measures on the Boolean algebra that is obtained from \(\mathcal{S}\) by removing from \(\mathcal{S}\) the one-element sets on which p is zero, and the restriction \(p'\) of p to this Boolean algebra is faithful. Proposition 4.2 then entails that the set of Bayes \(p'\)-accessible probability measures is a nowhere dense set in E in the relative topology on E inherited from \(M(\mathcal{S})\). But then this set also is a nowhere dense set in \(M(\mathcal{S})\), and its complement, the Bayes \(p'\)-Blind Spot, contains an open dense set, and is thus a set of second Baire category. It follows that the Bayes p-Blind Spot is a set of second Baire category, irrespective of whether p is faithful or not. Since an open and dense set in a complete metric space has to have uncountable cardinality, the Bayes p-Blind Spot has uncountable cardinality irrespective of whether p is faithful or not. Furthermore, since the edge E lies in a proper linear subspace of the linear space in which \(M(\mathcal{S})\) has nonzero \(\lambda^{n-1}\) (Lebesgue) measure, the set of Bayes \(p'\)-accessible measures in E also has \(\lambda^{n-1}\) measure zero. It follows that the Bayes p-Blind Spot has measure 1 in the measure \(\mu\) in which \(M(\mathcal{S})\) has measure 1 too, irrespective of whether p is faithful.
5 Bayesian dynamics and the Bayes Blind Spot
5.1 Bayes Blind Spots of Bayes learning paths
Once the Bayesian Agent with background measure p has inferred a probability measure q from evidence \(q_{\mathcal{A}}\) using the Jeffrey rule (3), he can do one of two things: He can look for further evidence to check which of the inferred probabilities are correct, and, keeping his background probability p, he can perform a second inference that takes into account the new evidence. This learning move, identified and analyzed briefly in Gyenis and Rédei (2017), gives rise to a dynamic of Bayesian learning, which we call here the conservative Bayes dynamic (“conservative” because the Agent keeps his background probability while repeating conditionalization on the basis of new evidence). “Correct” means in this context that the inferred probability measure is equal to a specific probability measure \(p^*\) that the Agent wishes to learn (for instance because \(p^*\) represents objectively given frequencies). The other thing the Agent can do is to transform himself into a different Agent by replacing his background belief p with the inferred probability measure q, and, on the basis of this new background measure, he can infer probability measures from new evidence. This defines a dynamic of Bayesian learning we call the bold Bayes dynamic (“bold” because the Agent accepts the inferred probability measure as background in spite of the fact that the inferred probability might not be the correct \(p^*\) the Agent wants to learn). In this section we analyze the Bayes Blind Spot from the perspective of these two types of dynamics.
The precise definitions of the two dynamics are as follows:
Definition 5.1
 1. Given a probability measure q in \(M(\mathcal{S})\), the sequence of probability measures \(\{q^c_n\}_{n\in {\mathrm{I\!N}}}\) in \(M(\mathcal{S})\) is called a conservative Bayes learning path from q determined by the Bayesian dynamical system \((X,\mathcal{S},p, \{\mathcal{A}_n\}_{n\in {\mathrm{I\!N}}})\) (the superscript c standing for “conservative”) if \(q^c_0=q\), and for all \(n>0\) the \(q^c_n\) is obtained from \(q^c_{n-1}\) via the Jeffrey rule (3) using \(\mathcal{C}_n\); i.e. for all \(n>0\) we have
$$\begin{aligned} q^c_n(B)\doteq \sum ^{K_n}_{i}\frac{p(B\cap A^n_i)}{p(A^n_i)}q^c_{n-1}(A^n_i)\quad \text{ for } \text{ all } B\in \mathcal{S}\end{aligned}$$(16)
where for all i the set \(A^n_i\) is an element of the partition \(\mathcal{C}_n=\{A^n_i : i=1,2,\ldots ,K_n\}\).
 2. Given a sequence \(\{r_n\}_{n\in {\mathrm{I\!N}}}\) of probability measures in \(M(\mathcal{S})\), the sequence of probability measures \(\{q^b_n\}_{n\in {\mathrm{I\!N}}}\) in \(M(\mathcal{S})\) is called a bold Bayes learning path determined by the Bayesian dynamical system \((X,\mathcal{S},p, \{\mathcal{A}_n\}_{n\in {\mathrm{I\!N}}})\) (the superscript b standing for “bold”) based on the evidence sequence \(\{r_n\}_{n\in {\mathrm{I\!N}}}\) if \(q^b_0=p\), and for all \(n>0\) the \(q^b_n\) is obtained from \(q^b_{n-1}\) via the Jeffrey rule (3) using \(\mathcal{C}_n\) and evidence \(r_n\), with the prior being the probability measure \(q^b_{n-1}\) inferred in the preceding step; i.e. if for all \(n>0\) we have
$$\begin{aligned} q^b_n(B)\doteq \sum ^{K_n}_{i}\frac{q^b_{n-1}(B\cap A^n_i)}{q^b_{n-1}(A^n_i)}r_n(A^n_i)\quad \text{ for } \text{ all } B\in \mathcal{S}\end{aligned}$$(17)
where for all i the set \(A^n_i\) is an element of the partition \(\mathcal{C}_n=\{A^n_i : i=1,2,\ldots ,K_n\}\).
To simplify notation, in what follows we use \(\{q^{\alpha }_n\}_{n\in {\mathrm{I\!N}}}\) to refer to both conservative (\(\alpha =c\)) and bold (\(\alpha =b\)) Bayes learning paths.
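The difference between the two dynamics can be made concrete in code. The sketch below is ours, not the paper's: the conservative update (16) keeps the fixed prior p inside the Jeffrey ratio and feeds in the previously inferred measure as evidence, while the bold update (17) uses the previously inferred measure itself as the new prior.

```python
# Minimal sketch (ours) of the two dynamics of Definition 5.1 on a finite
# outcome set, with measures as dicts and partitions as lists of frozensets.

def conservative_step(p, prev, partition):
    # Eq. (16): q_n(x) = p(x)/p(A_i) * q_{n-1}(A_i) for x in cell A_i.
    q = {}
    for cell in partition:
        p_cell = sum(p[x] for x in cell)
        prev_cell = sum(prev[x] for x in cell)
        for x in cell:
            q[x] = p[x] / p_cell * prev_cell
    return q

def bold_step(prev, partition, evidence):
    # Eq. (17): q_n(x) = q_{n-1}(x)/q_{n-1}(A_i) * r_n(A_i) for x in cell A_i.
    q = {}
    for i, cell in enumerate(partition):
        prev_cell = sum(prev[x] for x in cell)
        for x in cell:
            q[x] = prev[x] / prev_cell * evidence[i]
    return q

p = {x: 1 / 6 for x in range(1, 7)}                   # fair-die prior
part = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]   # a nontrivial partition
q0 = {1: 0.3, 2: 0.1, 3: 0.1, 4: 0.2, 5: 0.2, 6: 0.1}

q1_cons = conservative_step(p, q0, part)              # the prior p is kept
q1_bold = bold_step(q0, part, {0: 0.8, 1: 0.2})       # q0 acts as the new prior
# Conservative: within each cell the mass is redistributed in the
# proportions of p, so q1_cons[1] = (1/6)/(1/2) * 0.5 = 1/6.
# Bold: within each cell the proportions of the previous measure are kept,
# so q1_bold[1] = 0.3/0.5 * 0.8 = 0.48.
```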
The bold Bayesian learning is risky in the following sense: When the Bayesian Agent infers \(q^b_1\) from \(r_1\) on the basis of his prior p via a Jeffrey conditionalization in the first step of the learning process specified by Eq. (17), he cannot be certain that the inferred probability measure \(q^b_1\) is correct in the sense of being equal to \(p^*\), because \(p^*\) might be in the Bayes p-Blind Spot. Indeed, we know from the propositions in Sect. 4 that it is “overwhelmingly likely” (as measured in terms of the measure \(\mu\) defined by Eq. (15), in which the Bayes accessible measures are measure zero) that \(p^*\) is in the Bayes p-Blind Spot, hence that \(q^b_1\) is not correct; yet the Agent adopts \(q^b_1\) as his new prior, on the basis of which he performs the second inference. The same holds for the second, third, and any subsequent inferences via Jeffrey conditionalization: if \(p^*\) happens to be in the Bayes \(q^b_{n-1}\)-Blind Spot, then the \(q^b_n\) obtained via (17) will be incorrect. It follows that the risk of adopting a wrong probability measure as prior is present at every step in a bold Bayes learning path. To see whether this risk gets reduced as the Agent moves along a Bayes learning path, one has to look at the Bayes Blind Spot of the whole learning path, defined as the intersection of the Bayes Blind Spots the Agent has at every step:
Definition 5.2
(Recall that BBS(p) denotes the Bayes p-Blind Spot of a probability measure p.)
Since in a conservative Bayes learning path the background measure stays the same, at every step on such a conservative learning path the Bayes Blind Spot remains the same and is identical to the Bayes p-Blind Spot: \(BBS[\{q^c_n\}_{n\in {\mathrm{I\!N}}}]=BBS(p)\). Thus we can conclude that moving along a conservative Bayes learning path does not reduce the size of the Bayes Blind Spot.
In a bold Bayes learning path \(\{q^b_n\}_{n\in {\mathrm{I\!N}}}\) the priors entering the Jeffrey formula may change at every step: for any j, at step \(j+1\) in a bold Bayes learning path \(\{q^b_n\}_{n\in {\mathrm{I\!N}}}\), the probability \(q^b_j\) is the Agent’s background measure on the basis of which \(q^b_{j+1}\) is inferred. For \(j>0\), \(q^b_j\) may not be faithful even if \(q^b_0=p\) is; however, by Remark 4.5 the Bayes \(q^b_{j}\)-Blind Spot also is a large set topologically: it contains an open dense set and is therefore of second Baire category. Since in a complete metric space the intersection of a countable number of open dense sets is dense by the Baire category theorem (Oxtoby 1980), and since the set \(M(\mathcal{S})\) of all probability measures is a complete metric space with respect to any of the metrics discussed in Sect. 8.1, it follows that the intersection of all the Bayes \(q^b_j\)-Blind Spots contains a dense \(G_{\delta}\) set and is thus a fat (nonmeager) set in the topological sense. Also by Remark 4.5 the Bayes \(q^b_j\)-accessible states are measure zero for all j [in the measure defined by (15)]. Thus their (countable) union also has measure zero, and it follows that the intersection of all the Bayes \(q^b_j\)-Blind Spots, i.e. \(BBS[\{q^b_n\}_{n\in {\mathrm{I\!N}}}]\), has measure 1. Thus we have shown the following proposition:
Proposition 5.3
Given any Bayesian dynamical system \((X,\mathcal{S},p, \{\mathcal{A}_n\}_{n\in {\mathrm{I\!N}}})\) with a finite Boolean algebra \(\mathcal{S}\), and given any conservative or bold Bayes Learning Paths \(\{q^c_n\}_{n\in {\mathrm{I\!N}}}\) and \(\{q^b_n\}_{n\in {\mathrm{I\!N}}}\), the Bayes Blind Spots \(BBS[\{q^c_n\}_{n\in {\mathrm{I\!N}}}]\) and \(BBS[\{q^b_n\}_{n\in {\mathrm{I\!N}}}]\) of both the conservative and bold Bayes Learning Paths are large sets: in cardinality, in the sense of topology and with respect to the natural measure on the set of all probability measures.
5.2 Bayes N-accessibility and the infinite Bayes Blind Spot
Given the notions of conservative and bold Bayesian dynamics, one can also define Bayes N-accessibility with respect to both conservative and bold learning paths of length N. This in turn makes it possible to define the corresponding Bayes N-Blind Spots and an infinite Bayes Blind Spot. In this section we define these notions and investigate their properties.
Definition 5.4
 i.
We say that the probability measure r on \(\mathcal{S}\) is Bayes N-accessible for the Bayesian Agent via a conservative (respectively bold) Bayes learning path if there exists a series of (proper, nontrivial) Boolean subalgebras \(\{\mathcal{A}_n\}_{n\in {\mathrm{I\!N}}}\) of \(\mathcal{S}\) and a conservative \(\{q^c_n\}_{n\in {\mathrm{I\!N}}}\) (respectively bold \(\{q^b_n\}_{n\in {\mathrm{I\!N}}}\)) Bayes learning path (in the sense of Definition 5.1) such that \(r=q^c_{N}\) (respectively \(r=q^b_{N}\)) for some natural number N.
 ii. The conservative (respectively bold) Bayes (p, N)-Blind Spots, denoted by \(BBS^c(p,N)\) and \(BBS^b(p,N)\), of the Bayesian Agent are the sets of probability measures on \(\mathcal{S}\) that are absolutely continuous with respect to p and which are not Bayes \(N'\)-accessible via a conservative (respectively bold) Bayes learning path of length \(N'\) smaller than or equal to N. The infinite Bayes Blind Spots \(BBS^{\alpha }_{\infty }(p)\) (\(\alpha =c,b\)) are defined as the intersections:
$$\begin{aligned} BBS^{\alpha }_{\infty }(p)\equiv \cap _{N\in {\mathrm{I\!N}}} BBS^{\alpha }(p,N) \quad \alpha = c, b \end{aligned}$$(19)
Since in a conservative Bayes learning path the background measure stays the same, at every step on such a conservative learning path the Bayes Blind Spot remains the same and is identical to the Bayes p-Blind Spot. Thus \(BBS^c_{\infty }(p)=BBS(p)\): the infinite conservative Bayes Blind Spot is a very large set if the Boolean algebra is finite. The situation is radically different in the case of bold Bayes learning:
Proposition 5.5
Let \((X,\mathcal{S},p)\) be a probability measure space with a finite Boolean algebra \(\mathcal{S}\) having at least 3 atoms on each of which p has nonzero values. Then the bold Bayes (p, 2)Blind Spot is empty. As a consequence, the infinite bold Bayes Blind Spot is also empty: \(BBS^b_{\infty }(p)=\emptyset \).
Proposition 5.5 states that, given any prior p of a finite Bayesian Agent, for any probability measure \(p^*\) (absolutely continuous with respect to the prior) there exists an ordered pair \((r_1, r_2)\) of probability measures as (uncertain) evidence such that \(p^*\) can be obtained as a result of only two subsequent Jeffrey conditionalizations using evidences \(r_1\) and \(r_2\), provided that in the second conditionalization, which uses evidence \(r_2\), the prior p used in the first conditionalization is replaced by the probability inferred in the first step. We prove Proposition 5.5 in the Appendix.
6 The Bayes Blind Spot in infinite probability spaces
The results presented in the previous sections lead to several questions concerning the Bayes Blind Spot in probability measure spaces \((X,\mathcal{S},p)\) with an infinite Boolean \(\sigma \)-algebra \(\mathcal{S}\). In this more general situation the general conditioning rule yielding conditional probabilities with respect to arbitrary sub-\(\sigma \)-fields \(\mathcal{A}\) of \(\mathcal{S}\) is given by the concept of the \((\mathcal{A},p)\)-conditional expectation \({\mathscr {E}}_p(\,\cdot \mid \mathcal{A})\) (Billingsley 1995, p. 445), of which the Jeffrey rule (and hence Bayes’ rule) is just a particular case (Gyenis and Rédei 2017; Gyenis et al. 2017). \({\mathscr {E}}_p(\,\cdot \mid \mathcal{A})\) is a linear map (projection) on the set \(\mathcal{L}^1(X,\mathcal{S},p)\) of p-integrable real valued random variables defined on X. In complete analogy with the Bayes accessibility relation in Sect. 3, this map \({\mathscr {E}}_p(\,\cdot \mid \mathcal{A})\) defines a Bayes accessibility relation on the set of probability measures that are absolutely continuous with respect to p, and the notion of the Bayes p-Blind Spot can also be defined exactly as in Sect. 3 [for an explicit definition see also Gyenis and Rédei (2017)]. Determining the size of the Bayes p-Blind Spot of a general probability measure space \((X,\mathcal{S},p)\) is a nontrivial problem, with a number of questions still open. At this point, the following partial results are known in the general case:
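In the finite case the conditional expectation has a transparent form, which may help fix intuitions before passing to the infinite setting. The following sketch is our own minimal construction with invented numbers: for a sub-algebra \(\mathcal{A}\) generated by a partition of a finite space, \({\mathscr {E}}_p(f \mid \mathcal{A})\) replaces f on each partition block by its p-average, and the code checks that the map is idempotent (a projection) and preserves expectation.

```python
# Finite-case sketch of the (A, p)-conditional expectation, where the
# sub-sigma-field A is generated by a partition of the points.
# Numbers are an invented toy example.

def cond_exp(f, p, partition):
    """E_p(f | A) for the sigma-field A generated by `partition`."""
    out = {}
    for block in partition:
        p_block = sum(p[x] for x in block)
        avg = sum(f[x] * p[x] for x in block) / p_block   # p-average on the block
        for x in block:
            out[x] = avg
    return out

p = {1: 0.2, 2: 0.3, 3: 0.5}          # probability of each point
f = {1: 4.0, 2: 1.0, 3: 2.0}          # an integrable random variable
part = [(1, 2), (3,)]                 # partition generating the sub-algebra

g = cond_exp(f, p, part)
# Projection: applying the map twice changes nothing (up to rounding).
assert all(abs(cond_exp(g, p, part)[x] - g[x]) < 1e-12 for x in p)
# The conditional expectation preserves the p-expectation of f.
assert abs(sum(f[x] * p[x] for x in p) - sum(g[x] * p[x] for x in p)) < 1e-12
```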

There exist probability spaces with an empty Bayes Blind Spot. The only example of such a probability space known to us is the one constructed in Gyenis and Rédei (2017). The set of elementary events X of this probability space is very large: its cardinality \(|X|\) has to satisfy \(|X|>2^{2^{\aleph _0}}\) (with \(\aleph _0\) being the countable cardinal).

The “usual” (technically speaking: the “standard”, see Definition 4.5 in Petersen 1989) infinite probability spaces that occur in applications can be shown to have a Bayes Blind Spot that has the cardinality of the continuum (Gyenis and Rédei 2017). Such probability spaces include the probability measures on \({\mathrm{I\!R}}^n\) given by a density function with respect to the Lebesgue measure on \({\mathrm{I\!R}}^n\). Work is in progress to determine the topological and measure-theoretic size of the Bayes Blind Spot of these standard probability spaces (Gyenis and Rédei 2019).
7 Some Bayesian models of learning and the Bayes Blind Spot
“Bayesian learning” is not a unique concept. There exist different understandings of learning and learning scenarios, and they have different mathematical models based on probability theory; in particular, the mathematical notion of conditioning is used in those models in different ways. In this section we comment on the relation of the concept of Bayesian learning as understood in this paper to two other interpretations of learning in a Bayesian manner: Bayesian parameter estimate and merging of probabilities (opinions). The mathematically explicit and precise description of these scenarios requires many technical definitions. Giving those details would go far beyond the framework of this paper; hence we only summarize the main ideas here, with minimal notation and only to the extent needed to make some points about the relation of the notion of Bayesian learning used in our paper to these scenarios. We also make brief comments on the phenomenon of nonempty Bayes Blind Spots from the perspective of these other ideas about Bayesian learning.
7.1 Bayesian parameter estimate
The Bayesian interpretation of this result is that if one conditionalizes subjective priors defined on the parameter space on the basis of objective probabilities obtained in independent identically distributed trials, then the conditioned subjective prior will concentrate more and more on the parameter that corresponds to the objective probability. While the technical result described above does lend some support to this interpretation, one should not forget about the constraint on this interpretation entailed by the fact that the tendency of the prior to concentrate more and more on \(\theta _0\) does not hold for all \(\theta _0\) in \(\Theta \): it holds only for parameters in a set of probability 1 with respect to the subjective prior \(\Pi \). And, as Belot (2013) argues (and as the discussion of size in Sect. 4 also indicates), a probability 0 set need not be small in some other, relevant senses of size (cardinality, topological size). Thus the subjective prior does constrain what can be learned in a Bayesian parameter estimate, and the constraint can be very significant. The precise content of the constraint depends on the specific properties of the prior \(\Pi \) [see Belot (2013) for a detailed analysis and Barron et al. (1999) for further technical results on this dependence]. This limitation is not exactly the same as the limitation of Bayesian learning displayed by large Bayes Blind Spots of a prior, but it is similar in kind: it shows limits of a specific Bayesian model of learning entailed by the need to fix a specific prior in Bayesian learning.
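A standard concrete instance of this concentration phenomenon can be sketched as follows. The example is our own (numbers invented for illustration): a uniform Beta(1, 1) prior over the bias \(\theta \) of a coin is conditionalized on a sample whose relative frequency equals the "objective" parameter \(\theta _0\); the posterior mean lands close to \(\theta _0\).

```python
# Toy illustration of posterior concentration in Bayesian parameter estimation,
# via conjugate Beta-Bernoulli updating. All numbers are invented.

def beta_posterior(alpha, beta, flips):
    """Conditionalize a Beta(alpha, beta) prior on theta on a sequence of 0/1 outcomes."""
    heads = sum(flips)
    return alpha + heads, beta + (len(flips) - heads)

theta0 = 0.7                       # the "objective" parameter to be learned
flips = [1] * 700 + [0] * 300      # deterministic stand-in for an i.i.d. sample
a, b = beta_posterior(1, 1, flips)
post_mean = a / (a + b)            # close to theta0; tightens as the sample grows
```

As the text stresses, this convergence is guaranteed only for \(\theta _0\) in a set of prior probability 1, and the Agent never thereby learns the exact values \(p_{\theta _0}(A)\) at any finite stage.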
 1.
The conditioning in Bayesian parameter estimate is carried out on a probability space which is the product of the (infinite product of the) “objective” probability measure space and of the space of parameters (i.e. of probability measures on the Boolean algebra of objective random events). This product space, in which conditioning takes place, is infinite. In our framework the prior is not over the joint space of the parameters and outcomes, but only over the space of outcomes. Learning is understood as taking place within this probability space; this is a standard concept of learning, see Diaconis and Zabell (1982). In harmony with this, the main results on the size of the Bayes Blind Spot in our paper hold for finite spaces only.
 2.
The evidence in a Bayesian parameter estimate is an infinite sequence of random events with a specific distribution reflecting the probability to be learned. In our framework the evidence is a sharply defined probability measure on a proper, nontrivial subalgebra of the whole Boolean algebra of random events. It is assumed that this probability is given, i.e. that it is known, and we take Bayesian learning to be inference from this given, probabilistically accurate, precise and sharp but (from the perspective of the whole set of random events) partial information. In a Bayesian parameter estimate the elements in the sequence \((x_1,x_2,\ldots , x_n,\ldots )\) are not restricted to a proper subset of the set of all elementary random events. In harmony with this, at no point in the process of the Agent’s prior approaching the measure concentrating on the true parameter \(\theta _0\) will the Agent necessarily know the precise values of the probabilities \(p_{\theta _0}(A)\) on any element A, hence on any subalgebra.
This example of die throwing shows that the learning scenario we consider is trivial when the Boolean algebra is trivial. This happens in coin-flipping: here the Boolean algebra has only four elements, hence there are no nontrivial sub-Boolean algebras, and thus one cannot learn the distribution (frequency) of heads/tails obtained in a series of flips in a nontrivial manner via conditionalization in this simple, meagre probability space, in the way we define Bayesian learning. To put it differently: the Bayes Blind Spot of the probability space \((X,\mathcal{S},p)\), where X has only two elementary events, is (vacuously) the set of all probability measures on the four-element algebra \(\mathcal{S}\).
Yet, the kind of Bayesian learning situation we are considering is not exceptional or artificial. It occurs every time one has the task of inferring probabilities from coarse-grained probabilities. Suppose one has aggregate frequency data on the occurrence of a certain property P in a population and one wishes to infer from these numbers the frequency of occurrence of P in other portions of the population. For instance, given car accident frequency data (P) in large counties, one may wish to infer car accident frequency data in other (in particular: smaller) municipalities. One can make such an inference by conditioning, and the result of the inference might be factually correct in an infinite number of cases: whenever the true distribution is not in the Bayes Blind Spot of the prior chosen. But our results apply in such a situation: no matter what the prior, there will be a lot of probability measures (the ones in the Bayes Blind Spot) that one cannot infer in such situations, and the true probability might be among them.
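The county example can be sketched as follows (all figures invented for illustration): the evidence is a probability on the coarse subalgebra generated by the counties, and conditioning extends it to the municipalities by keeping the prior's within-county proportions. If the true fine-grained measure does not have this form for any admissible evidence, it lies in the Blind Spot and cannot be reached this way.

```python
# Hedged illustration: inferring fine-grained probabilities from coarse-grained
# ones by conditioning. County-level frequencies (the coarse subalgebra) are
# extended to municipalities (the atoms) using the prior's within-county ratios.

p = {  # prior over municipalities, grouped by county (invented numbers)
    'A1': 0.10, 'A2': 0.30,          # county A
    'B1': 0.24, 'B2': 0.36,          # county B
}
counties = {'A': ('A1', 'A2'), 'B': ('B1', 'B2')}
coarse = {'A': 0.5, 'B': 0.5}        # observed county-level frequencies

inferred = {}
for c, members in counties.items():
    p_county = sum(p[m] for m in members)
    for m in members:
        # conditioning: the prior's ratios within each county are preserved
        inferred[m] = coarse[c] * p[m] / p_county
```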
As described under 2. above, the kind of learning investigated in this paper differs in two significant ways from learning in a Bayesian parameter estimate: (i) the input information is stronger, since the Agent receives precise probability values of elements in a proper subalgebra; and (ii) the success criterion is more demanding, since it asks that the target measure be learned exactly and not just approximated as sample sizes increase. Since in both of these Bayesian learning models the selected prior constrains what can be learned, the question arises\(^{2}\) what the relation of the two constraints is. Specifically: is it true that (some) elements in the Bayes Blind Spot belong to the set of parameters on which the Agent’s prior does not concentrate in the limit? This is an interesting but difficult question to which we do not know the answer. Part of the difficulty is that in Bayesian parameter estimate the probability spaces are infinite and very little is known about the size of the Bayes Blind Spot in the infinite case (cf. Sect. 6). But clarifying the situation would be interesting, because one would like to know whether the constraints imposed by the priors in the two models of Bayesian learning strengthen or compensate each other. This could be a topic for further investigation.
7.2 Merging of probabilities
 (i)
partition \(\mathcal{P}_i\) is finer than partition \(\mathcal{P}_j\) for \(i>j\);
 (ii)
the union of the Boolean algebras \(\mathcal{A}_n\) generated by the partitions \(\mathcal{P}_n\) generates \(\mathcal{S}\);
 (iii)
if \(q(A)>0\) for an A in a partition \(\mathcal{P}_n\), then \(p(A)>0\).
The major result on merging probability measures is:
Proposition 7.1
(Theorems 1 and 2 in Kalai and Lehrer 1994) p merges q if and only if q is absolutely continuous with respect to p.
One also can regard p as the Agent’s prior (i.e. the Agent’s assumption about what the objective probability is) and q as the objective probability describing some phenomenon. Then even if the merging of p and q is interpreted as “learning q” (an interpretation that needs further supporting arguments), this merging is fully compatible with the presence of nonempty Bayes Blind Spots: If p merges q in some information sequence, then q must be absolutely continuous with respect to p by Proposition 7.1; hence q is given by a density function f with respect to p. If f is injective then q is in the Bayes Blind Spot of p [Proposition 3.1 and Gyenis and Rédei (2017, Lemma 6.3)]. This simply means that after any conditionalization based on p as prior (i.e. after extending q from its restriction to any proper Boolean subalgebra \(\mathcal{A}\) to the whole Boolean algebra using the \({\mathscr {E}}_p(\,\cdot \mid \mathcal{A})\) conditional expectation) the conditioned probability (i.e. the extension) will not be equal to q, in spite of the conditional probabilities \(p(\,\cdot \mid A_n)\) and \(q(\,\cdot \mid A_n)\) of p and q getting closer to each other asymptotically (merging).

[...] if the opinions of two individuals, as summarized by p and q, agree only in that \(p(D)> 0 \leftrightarrow q(D) > 0\) [mutual absolute continuity of p and q], then they are certain that after a sufficiently large finite number of observations [...] their opinions will become and remain close to each other, where close means that for every event E the probability that one man assigns to E differs by at most \(\varepsilon \) from the probability that the other man assigns to it, where \(\varepsilon \) does not depend on E. (Blackwell and Dubins 1962, p. 885)
In a finite algebra any filtration generating \(\mathcal{S}\) must contain the whole algebra \(\mathcal{S}\) itself as its last element; so the martingale equation holds trivially in this case. Accordingly, q cannot be approached in the manner of (22) if all the elements in the filtration are proper subalgebras of \(\mathcal{S}\). This lies at the heart of the large size of the Bayes Blind Spot in finite probability spaces.
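The injectivity criterion invoked above is easy to check in the finite case. The sketch below uses our own helper names and invented example numbers: it computes the atomwise density f = dq/dp and tests whether its values are pairwise distinct; when they are, the cited results place q in the Bayes Blind Spot of p.

```python
# Sketch of the density criterion cited in the text (Proposition 3.1; Gyenis
# and Rédei 2017, Lemma 6.3): on a finite algebra, if the density f = dq/dp
# takes pairwise distinct values on the atoms, q is in the Bayes Blind Spot
# of p. Helper names and numbers are our own illustration.

def density(q, p):
    """Atomwise Radon-Nikodym density f = dq/dp (requires q << p)."""
    assert all(p[a] > 0 or q[a] == 0 for a in p), "q must be absolutely continuous w.r.t. p"
    return {a: q[a] / p[a] for a in p if p[a] > 0}

def injective_density(q, p, tol=1e-12):
    """True iff the density values on the atoms are pairwise distinct."""
    f = sorted(density(q, p).values())
    return all(f[i + 1] - f[i] > tol for i in range(len(f) - 1))

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
q = {'a': 0.25, 'b': 0.45, 'c': 0.30}   # densities 0.5, 1.5, 1.5: not injective
# injective_density(q, p) is False: this criterion does not place q in BBS(p).
```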
8 Concluding remarks
The results presented in the previous sections contribute to a better understanding of the role of the prior probability in Bayesian learning and, more generally, of the role of the prior probability in any application of probability theory in which conditionalization is used.
One lesson of the presented results is that the limits of what can be learned in a single probabilistic inference on the basis of a prior are extremely restrictive in the case of a probability theory with a finite set of random events. It should be noted that, from the perspective of the validity of the results presented in this paper, it makes no difference whether the probability measures that are Bayes inaccessible for a Bayesian Agent with a specific prior p are viewed as objectively given or whether they are interpreted as representing subjective degrees of belief: If both the prior probability measure p and the probability measures in the Bayes p-Blind Spot BBS(p) are viewed as representing objective states of affairs (for example frequencies, or some sort of ratios), then the Bayes inaccessibility of the probability measures in BBS(p) presents a difficulty for statistical inference based on p, because the objective states of affairs represented by probabilities in BBS(p) are simply not inferable from any incomplete evidence on the basis of p.
In particular this poses a problem for objective Bayesianism, which intends to avoid arbitrary subjectivism in probabilistic inference; furthermore, the larger the size of BBS(p), the more serious the problem is, because less is then inferable via conditionalizing. Thus the large size of the Bayes Blind Spot can be taken as strengthening the arguments against (Bayes or Jeffrey) conditionalization in Williamson’s version of objective Bayesianism (Williamson 2010). If the probability measures are all interpreted subjectively as degrees of belief, then the results on the size of the Bayes p-Blind Spot and its behavior under repeated inferences contribute to a better understanding of the nature and limits of Bayesian learning dynamics. In particular, the fact that (in the finite case) the Bayes p-Blind Spot is very large for any prior p whatsoever displays a difficulty that is not related to the arbitrariness of the subjective prior: the difficulty is not that an Agent possibly selects a prior that is biased in a particular way which might distort the posterior probabilities in an unacceptable manner. The problem is that any prior of the Agent prevents him from obtaining an enormously large set of probabilities via conditioning. And this fact is rooted deeply in the concept of (Bayes/Jeffrey) conditionalization; it is a structural, inherent feature of Bayesianism that cannot be “cured” by restricting priors on the basis of some arguments of plausibility or rationality.
One can try to weaken the significance of the large size of the Bayes Blind Spot. One way of doing this is to say that not all probability measures on the Boolean algebra are epistemologically relevant, and as a consequence, not all probability measures in the Bayes Blind Spot might be epistemically relevant either. The inaccessibility for the Agent of those irrelevant measures is thus not troubling. For instance, one might say that in a specific context all the epistemologically relevant probability measures are such that they take on values that are rational numbers (e.g. because they represent relative frequencies in finite ensembles). More generally, given some condition of epistemic relevance that restricts the set of all probabilities on a Boolean algebra to a subset \(\mathcal{R}\), one could try to determine the size of the intersection of \(\mathcal{R}\) and the Bayes Blind Spot. And this set might be small. It seems plausible that in specific applications of probability theory such restricting epistemic relevance conditions arise naturally. Another way of curtailing the significance of the large size of the Bayes Blind Spot would be to say that the notions of size used in this paper (cardinality, topological size, size in the natural measure) are arbitrary from an epistemological perspective. To articulate this line of reasoning would require the specification of an “epistemologically relevant size”. It is not clear how one could do this in the abstract; at any rate we do not have any suggestion for such a “size”. But of the three sizes used in our paper to assess the size of the Bayes Blind Spot, the topological one does have a clear epistemological interpretation: it is based on the closeness of probability measures as this closeness is measured in any of the standard metrics on the set of all probability measures. So the topological largeness of the Bayes Blind Spot has a clear epistemological significance.
Another lesson one can draw from the results is that, in a specific sense, repeated learning via Bayesian/Jeffrey conditionalization modeled by either a conservative or a bold Bayes learning path does not mitigate the heavy constraint that large Bayes Blind Spots impose on what can be learned in a Bayesian manner in a finite context: given any starting prior and given any infinite series of (certain or uncertain) evidence, the set of probability measures learnable via an arbitrarily long series of conditionalizations based on this given evidence is a very meager set, just as meager as the set that can be learned in a single act of conditioning. Note that this does not contradict the phenomenon known as the “washing out of priors”. The relation of the washing out of priors [understood in terms of Doob’s upward martingale theorem (Earman 1992, Chapter 6, Sect. 4)] to the inaccessibility of certain probability measures via a possibly infinite series of conditionalizations was clarified in Gyenis and Rédei (2017) (see especially Sect. 7 therein).
From the perspective of the power of Bayesian learning, Proposition 5.5 is however a positive result: this proposition says that, given any nontrivial prior, any probability measure \(p^*\) (absolutely continuous with respect to the prior) can be learned in no more than two bold steps of (Jeffrey) conditioning on the basis of suitable evidence (if the Boolean algebra is finite). In order not to overestimate this positive feature of bold Bayesian learning one should however keep in mind the following: a look at the proof of Proposition 5.5 makes it clear that the two pieces of evidence \(r_1\) and \(r_2\) on which the inferences leading to \(p^*\) are based have to be very specific: the two sub-Boolean algebras of \(\mathcal{S}\) on which \(r_1\) and \(r_2\) are defined must be perfectly fine-tuned in the sense that they have to generate the whole of \(\mathcal{S}\). In other words: the values of \(p^*\) must be revealed on each atom of \(\mathcal{S}\) during the two steps of inference. Thus it does pay off to be bold in Bayesian learning, but only if the Agent is confident that he has access to evidence rich enough to yield information about all the values of the probability measure to be learned. This is in harmony with the fact that Bayesian learning understood as statistical inference via conditionalization is an ampliative inference, not a deductive one. This feature of Bayesian inference is also reflected by the non-axiomatizability of certain modal logics that are defined semantically in terms of the Bayes accessibility relation (Brown et al. 2018).
Footnotes
 1.
One could in principle consider a measure \(\mu \) on the set of all probability measures different from the Lebesgue measure. Then the \(\mu \)-size of the Bayes Blind Spot would be different. We do not see a principled way of choosing such a \(\mu \), but it might be interesting to explore this possibility.
 2.
We thank an anonymous referee for raising this question.
Acknowledgements
Zalán Gyenis would like to acknowledge the Premium Postdoctoral Grant of the Hungarian Academy of Sciences. Miklós Rédei thanks the Munich Center for Mathematical Philosophy for its hospitality; he was staying there in the academic year 2018–2019, supported by a Carl Friedrich von Siemens Research Award of the Alexander von Humboldt Foundation. Research supported in part by the Hungarian Scientific Research Fund (OTKA). Contract number: K115593.
References
Barber, D. (2012). Bayesian reasoning and machine learning. Cambridge: Cambridge University Press.
Barron, A., Schervish, M. J., & Wasserman, L. (1999). The consistency of posterior distributions in nonparametric problems. The Annals of Statistics, 27, 536–561.
Belot, G. (2013). Bayesian orgulity. Philosophy of Science, 80, 483–503.
Billingsley, P. (1995). Probability and measure (3rd ed.). New York: Wiley.
Blackwell, D., & Dubins, L. (1962). Merging of opinions with increasing information. The Annals of Mathematical Statistics, 33, 882–886.
Bovens, L., & Hartmann, S. (2004). Bayesian epistemology. Oxford: Oxford University Press.
Bradley, R. (2005). Radical probabilism and Bayesian conditioning. Philosophy of Science, 72, 342–364.
Brown, W., Gyenis, Z., & Rédei, M. (2018). The modal logic of Bayesian belief revision. Journal of Philosophical Logic. Online first: 3 Dec 2018. Open access: https://doi.org/10.1007/s10992-018-9495-9.
Carothers, N. L. (2000). Real analysis. Cambridge: Cambridge University Press.
Conway, J. H., & Guy, R. (1996). The book of numbers. New York: Copernicus–Springer.
Diaconis, P., & Zabell, S. L. (1982). Updating subjective probability. Journal of the American Statistical Association, 77, 822–830.
Doob, J. L. (1949). Application of the theory of martingales. In Actes du Colloque International Le Calcul des Probabilités et ses Applications (Lyon, 28 Juin–3 Juillet, 1948), Centre National de la Recherche Scientifique, number 13, Paris (pp. 23–27).
Earman, J. (1992). Bayes or bust? Cambridge: MIT Press.
Easwaran, K. (2011a). Bayesianism I: Introduction and arguments in favor. Philosophy Compass, 6, 312–320.
Easwaran, K. (2011b). Bayesianism II: Applications and criticisms. Philosophy Compass, 6, 321–332.
Freedman, D. (1963). On the asymptotic behavior of Bayes’ estimates in the discrete case. Annals of Mathematical Statistics, 34, 1386–1403.
Gyenis, Z., Hofer-Szabó, G., & Rédei, M. (2017). Conditioning using conditional expectations: The Borel–Kolmogorov Paradox. Synthese, 194, 2595–2630.
Gyenis, Z., & Rédei, M. (2017). General properties of Bayesian learning as statistical inference determined by conditional expectations. The Review of Symbolic Logic, 10, 719–755.
Gyenis, Z., & Rédei, M. (2019). Bayesian dynamical systems (in preparation).
Hartmann, S., & Sprenger, J. (2010). Bayesian epistemology. In S. Bernecker & D. Pritchard (Eds.), Routledge companion to epistemology (pp. 609–620). London: Routledge.
Howson, C. (1996). Bayesian rules of updating. Erkenntnis, 45, 195–208.
Howson, C. (2014). Finite additivity, another lottery paradox, and conditionalization. Synthese, 191, 989–1012.
Howson, C., & Franklin, A. (1994). Bayesian conditionalization and probability kinematics. The British Journal for the Philosophy of Science, 45, 451–466.
Howson, C., & Urbach, P. (1989). Scientific reasoning: The Bayesian approach. La Salle, IL: Open Court. Second edition (1993).
Huttegger, S. M. (2013). In defense of reflection. Philosophy of Science, 80, 413–433.
Huttegger, S. M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8, 611–648.
Jeffrey, R. C. (1965). The logic of decision (1st ed.). Chicago: The University of Chicago Press.
Kalai, E., & Lehrer, E. (1994). Weak and strong merging of opinions. Journal of Mathematical Economics, 23, 73–86.
Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.
Miller, J. (2018). A detailed treatment of Doob’s theorem. https://arxiv.org/pdf/1801.03122.pdf.
Neal, R. M. (1996). Bayesian learning for neural networks. Lecture Notes in Statistics. New York: Springer.
Oxtoby, J. C. (1980). Measure and category. Volume 2 of Graduate Texts in Mathematics. New York: Springer. 2nd edition (1980); first edition (1971).
Petersen, K. (1989). Ergodic theory. Cambridge: Cambridge University Press.
Rescorla, M. (2015). Some epistemological ramifications of the Borel–Kolmogorov Paradox. Synthese, 192, 735–767.
Rosenthal, J. S. (2006). A first look at rigorous probability theory. Singapore: World Scientific.
Rudin, W. (1987). Real and complex analysis (3rd ed.). Singapore: McGraw-Hill.
Rudin, W. (1991). Functional analysis. International Series in Pure and Applied Mathematics (2nd ed.). New York: McGraw-Hill.
Ryabko, D. (2011). Learnability in problems of sequential inference. Ph.D. thesis, Université Lille 1 – Sciences et Technologies.
Steen, L. A., & Seebach, J. A., Jr. (1978). Counterexamples in topology. New York: Springer. Reprinted by Dover Publications, New York (1995).
Weisberg, J. (2009). Commutativity or holism? A dilemma for conditionalizers. The British Journal for the Philosophy of Science, 60, 793–812.
Weisberg, J. (2011). Varieties of Bayesianism. In D. M. Gabbay, S. Hartmann, & J. Woods (Eds.), Inductive logic. Volume 10 of Handbook of the History of Logic (pp. 477–551). Oxford: North-Holland (Elsevier).
Weisberg, J. (2015). You’ve come a long way, Bayesians. Journal of Philosophical Logic, 44, 817–834.
Williamson, J. (2010). In defence of objective Bayesianism. Oxford: Oxford University Press.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.