1 Introduction

An analyst, as Hardy said, is a mathematician habitually seen in the company of the real or complex number systems. For simplicity, we restrict ourselves to the reals here, as the complex case is obtainable from this by a cartesian product. In mathematics, one’s underlying assumption is generally Zermelo–Fraenkel set theory (ZF), augmented by the Axiom of Choice (AC) when needed, giving ZFC. One should always be as economical as possible about one’s assumptions; it is often possible to proceed without the full strength of AC. Our object here is to survey, with the working analyst in mind, a range of recent work, and of situations in analysis, where one can usefully assume less than ZFC. This survey is rather in the spirit of that by Wright [229] forty years ago and Mathias’ ‘Surrealist landscape with figures’ survey [143]; cf. [48, 49]. A great deal has happened since in the area, and we feel that the time is ripe for a further survey along such lines.

As classical background for what follows, we refer to the book of Oxtoby [175] on (Lebesgue) measure and (Baire) category. This book explores the duality between them, focussing on their remarkable similarity; for Oxtoby, it is the measure case that is primary. Our viewpoint is rather different: for us, it is the category case that is primary, and we focus on their differences. Our motivation was that a number of results in which category and measure behave interchangeably disaggregate on closer examination. One obtains results in which what can be said depends explicitly on what axioms one assumes. One thus needs to adopt a flexible, or pluralist, approach in order to be able to handle apparently quite traditional problems within classical analysis.

We come to this survey with the experience of a decade of work on problems with wide-ranging contexts, from the real line to topological groups, in which ‘the category method’ has been key. The connection with the Baire Category Theorem, viewed as an equivalent of the Axiom of Dependent Choices (DC: for every non-empty set X and \(R\subseteq X^{2}\) satisfying \(\forall x\,\exists y\,(x,y)\in R\), there is \(h\in X^{\mathbb {N}}\) with \((h(n),h(n+1))\in R\) for all \(n\in \mathbb {N}\)), has sensitized us to a reliance on this as a very weak version of the Axiom of Choice, AC, but one that is often adequate for analysis. This sensitivity has been further strengthened by settings of general theorems on Banach spaces reducible to the separable case (e.g. via Blumberg’s Dichotomy [22, Theorem B]; compare the separable approximations in group theory [147, Chapter II, Section 2.6]) where DC suffices. On occasion, it has been possible to remove dependence on the Hahn–Banach Theorem, a close relative of AC.

For the relative strengths of the usual Hahn–Banach Theorem (HB) and AC, see [178, 179]; [182] provides a model of set theory in which DC holds but HB fails. HB is derivable from the Prime Ideal Theorem (PI), an axiom weaker than AC: for literature see again [25, 178, 179], and [35, Section 16] for the equivalence of HB and the least upper bound axiom (LUB) for ordered vector spaces; for the relation of the Axiom of Countable Choice (ACC) to DC here, see [101]. Note that HB for separable normed spaces is not provable from DC [64, Corollary 4], unless the space is complete; see [25].

When category methods fail, e.g. on account of ‘character degradation’, as when the limsup operation is applied to well-behaved functions (see Sect. 9), the obstacles may be removed by appeal to supplementary set-theoretic axioms, so leading either beyond, or sometimes away from, a classical setting. This calls for analysts to acquire an understanding of their interplay and their standing in relation to ‘classical intuition’ as developed through the historical narrative. Our aim here is to describe this hinterland in a language that analysts may appreciate.

We list some sources that we have found useful, though we have tried to make the text reasonably self-contained. From logic and foundations of mathematics, we need AC and its variants, for which we refer to Jech [105]. For set theory, our general needs are served by [106]; see also Ciesielski [48], Shoenfield [191], Kunen [126, 128]. For descriptive set theory, see Kechris [118] or [139], and Moschovakis [153], especially for its historical comments; for analytic sets see Rogers et al. [189]. For large cardinals, see e.g. Drake [66], Kanamori [113], Woodin [225].

The paper is organized as follows. After a review of the early history of the axiomatic approach to set theory (including also a brief review of some formalities) we discuss the contributions of Gödel and Tarski and their legacy, then of Ramsey and Erdős and their legacy. We follow this with a discussion of the role of infinite combinatorics (the partition calculus) and of ‘large cardinals’. We then sketch the various ‘pre-Cohen’ expansions of Gödel’s universe of constructible sets L (via the ultrapowers of Łoś, or the indiscernibles of Ehrenfeucht–Mostowski models, and the insights they bring to our understanding of L). This is followed by an introduction to the ‘forcing method’ and the generic extensions which it enables. We describe classical descriptive set theory: the early ‘definability theme’ pursued by Suslin, Luzin, Sierpiński and their legacy, and the completion of their programme more recently through a recognition of the unifying role of Banach–Mazur games; these require large cardinals for an analysis of their ‘consistency strength’, and are seen in some of the most recent literature as casting ‘shadows’ (Sect. 8) on the real line.

In order to draw on the ‘definability theme’ more meaningfully, we offer a discussion of the ‘syntax of analysis’ in Sect. 9, and in that light we turn finally in Sect. 10 to the additional axioms which permit a satisfying category-measure duality for the working analyst.

1.1 The canonical status of the reals

The English speak of ‘the elephant in the room’, meaning something that hangs in the air all around us, but is not (or little) spoken of in polite company. The canonical status of the reals is one such, and the sets of reals available is another such. We speak of ‘the reals’ (or ‘the real line’), \(\mathbb {R}\) and ‘the rationals’, \(\mathbb {Q}\)—as in the Hardy saying above. The definite article suggests canonical status, in some sense—what sense?

The rationals are indeed canonical. We may think of them as a ‘tie-rack’, on which irrationals are ‘hung’. But how, and how many? In Sect. 6 we will review the forcing method for selecting from ‘outside the wardrobe’ an initial lot of almost any size that one may wish for, together with their ‘Skolem hull’—further ones required by the operation of the axioms (see the Skolem functions of Sect. 2). For background on canonicity (or to give its proper name, ‘categoricity’) see [36, 37].

In brief: the reals are canonical modulo cardinality, but not otherwise. This is no surprise, in view of the Continuum Hypothesis (CH), which directly addresses the cardinality of the real line/continuum, and which we know from Cohen’s work of 1963/64 [50, 51] will always be just that, a hypothesis. The canonical status of the reals rests on (at least) four things:

  (i) (geometrical; ancient Greeks): lines in Euclidean geometry: any line can be made into a cartesian axis;

  (ii) (analytical; 1872): Dedekind cuts (Footnote 1);

  (iii) (analytical; 1872): Cantor, equivalence classes of Cauchy sequences of rationals (subsequently extended topologically: completion of any metric space);

  (iv) (algebraic; modern): any complete archimedean ordered field is isomorphic to \(\mathbb {R}\) (see e.g. [55, Section 6.6]: our italics; here too Cauchy sequences are used to define completeness).

None of these is concerned with cardinality; CH is. On the other hand, completeness depends on which \(\omega \)-sequences (i.e. functions with domain \(\omega \)) are available. Accordingly, the problems that confront the working analyst split into two types. Some (usually the ‘less detailed’) do not hinge on cardinality, and for these the reals retain their traditional canonical status. By contrast, some do hinge on cardinality; these are the ones that lead the analyst into set-theoretic underpinnings involving an element of choice. Such choices emphasize the need for a plural approach to axiomatic assumptions, and hence to the status of the reals. This is inevitable: as Solovay [202] puts it, ‘it (the cardinality of the reals) can be anything it ought to be’. (The only constraint is that its cofinality be uncountable, by a result of König of 1905; see e.g. [126, Section 10.40] or [128, I.13.12].)

We turn now to the second of the ‘elephants’ above: which sets of reals are available. The spectrum of axiom possibilities which we review in Sect. 10 extends from the ‘prodigal’ AC (see Sect. 2 below) at one end (which yields for example non-measurable Vitali sets) to the restrictive DC with additional components of LM (‘all sets of reals are measurable’) and/or PB (‘all sets of reals have the Baire property’) at the other, and includes intermediate positions for the additional component such as PD (‘all projective sets of reals are determined’), where the sets of reals with the above-mentioned so-called ‘regularity properties’ are qualified (see Sects. 7–9).

Underlying an analysis of these axioms is repeated appeal to simplification of contexts—a mathematical ex oriente lux—typified by passage to a ‘large’ homogeneous/monochromatic subset, as in Ramsey’s Theorem on \( \mathbb {N}\) (Sect. 4.1). This has generalizations to large cardinals, in particular ones that support a \(\{0,1\}\)-valued measure (equivalently, a ‘suitably complete’ ultrafilter—see below). On the one hand, the latter permits an extension of Suslin’s classical tree-like representation of an analytic set (Sect. 7) to sets of far greater logical complexity (via witnessing membership of a set by means of infinite branches in a corresponding tree, the branches being required to pass through ‘large’ sets of nodes at each height/level—see Sect. 7). On the other hand, in the context of the ‘line’ of ordinals, one meets other forms of isomorphic behaviour on ‘large’ sets: on closed unbounded subsets of ordinals, and on the related stationary sets (Sects. 5.2, 6.2).

Notes. 1. This survey arose out of our decade-long probing (see e.g. [17, 19]) of questions in regular variation [16]. In [19] we needed to disaggregate a classical theorem of Delange (see [16, Theorem 2.01]); the category and measure aspects need different set-theoretic assumptions. We regard the category case as primary, as one can obtain the measure case from it by working bitopologically (passing from the Euclidean to the density topology; see [18, 23, 24]); also, measure theory needs stronger set-theoretic assumptions than category theory (Sects. 10.2, 10.3). If one replaces the limits in regular variation by limsups, the Baire property or measurability may be lost; the resulting character degradation is studied in detail in [19, Sections 3, 5, 11].

2. Although our focus here is on analysis, it may be as well to note the great impact of AC on algebra (see e.g. [105]). We confine ourselves here to mentioning that van der Waerden, in his epoch-making Moderne Algebra, used AC in the first edition (1930), convincing his fellow algebraists of its great value to them. He then lost faith in AC (perhaps under the influence of his compatriot Brouwer, Sect. 2), and dropped AC from his second edition (1937). Following a storm of protests, he reinstated AC in his third edition (1950). For background, see Moore [148, Section 4.5]. Echoes of this sentiment, that nothing much of significance can be ventured without relying on AC, reverberate in model theory; in Woodin’s words: “The difficulty is that without the axiom of choice, it is extraordinarily difficult to prove anything about sets” [227, p. 455]. We will briefly return to note the contiguity of our main subject matter with algebra in Sect. 3, when touching on the origins of model theory.

3. We close with a brief mention of ‘yet another elephant in the room’. One can never prove consistency (of sets of rich enough axioms), merely relative consistency. This is related to Gödel’s incompleteness theorems (Sect. 3). Thus we do not know that ZF or ZFC itself is consistent; this is something we have to live with; it is no reason to despair, or give up mathematics; quite the contrary, if anything. In what follows, ‘consistency’ means ‘consistency relative to ZF’.

2 Early history

A little historical background may not come amiss here. The essence of analysis—and the reason behind the Hardy quotation that we began with—is its concern with infinite or limiting processes—most notably, as in calculus, our most powerful single technique in mathematics (and indeed, in science generally). Life being only finitely long, the infinite—actual or potential—takes us beyond direct human experience, even in principle. This underlies the unease the ancient Greeks had with the irrationals (or reals), and why they missed calculus (at least in its differential form, despite their success with areas and volumes under the heading of the ‘method of exhaustion’). One can see, for example in the ordering of the material in the thirteen books of Euclid’s Elements, that they were at ease with rationals, and with geometrical objects such as line-segments etc., but not with reals. Traces of this unease survive in Newton’s handling of the material in his Principia, where he was at pains to use established geometrical arguments rather than his own ‘method of fluxions’. That there was unfinished business here shows, e.g., in the title of a work of one of the founding fathers of analysis, Bolzano, with his Paradoxien des Unendlichen (1852, posthumous). The bridge between the real line and the complex plane (the ‘Argand diagram’—Argand, 1806, Wessel, 1799, Gauss, 1831) pre-dated this. The construction of the reals came independently in two different ways in 1872: Dedekind cuts (or sections), which still dominate settings where one has an order, and Cantor’s construction via (equivalence classes of) Cauchy sequences (of rationals)—still ubiquitous, as the completion procedure for metric spaces.

Cantor. Cantor’s work, in the 1870s to 1890s, established set theory (Mengenlehre) as the basis on which to do mathematics, and analysis in particular. Here we find, for example, the countability of the rationals, and of the algebraic numbers (Cantor, 1874) and the uncountability of the reals (Cantor, 1895), established via the familiar Cantor diagonalization argument. But note what is implicit here: the “other” Cantor diagonalization (as used, say, to prove the countability of the rationals) is an effective argument. But to move from this to saying that ‘the union of countably many countable sets is countable’ (Cantor, 1885) needs the Axiom of Countable Choice (ACC), below.
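The effectiveness of the enumeration argument can be made concrete. The following is a minimal Python sketch of ours (the name `positive_rationals` is hypothetical and not taken from any source cited here): it lists the positive rationals diagonal by diagonal, each step being fixed by an explicit rule, so that no choice principle is involved.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    for s in count(2):               # s = p + q indexes one anti-diagonal
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:       # skip repeats such as 2/4 = 1/2
                yield Fraction(p, q)

# The first ten terms of the enumeration.
first = list(islice(positive_rationals(), 10))
print(first)
```

By contrast, for an arbitrary countable family of countable sets there is no such uniform rule selecting an enumeration of each member, which is where ACC enters.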

Hilbert. Moving to the 20th century, Hilbert famously said [97]: ‘No one shall expel us from the paradise that Cantor has created for us’. Hilbert addressed himself to the programme of re-working the mathematical canon of its time to (then) modern standards of rigour, witness his books on the foundations of geometry [94,95,96] (1899) and of mathematics [98] (1934, 1939, with Bernays), cf. the Hilbert problems of 1900. As we shall see, Hilbert was a man of his time here, and his views on foundational questions were too naive. Meanwhile, Lebesgue introduced measure theory in 1902, Fréchet metric spaces in 1906, and Hausdorff general topology in 1905–1914 (three very different editions of his classic book Grundzüge der Mengenlehre appeared in 1914, 1927 and 1935). Hilbert space emerged c. 1916 (work of Hilbert and Schmidt; named by F. Riesz in 1926). Banach’s book [5] appeared in 1932, effectively launching the field of functional analysis; this magisterial work is still worth reading. But, Banach was again a man of his time; he worked sequentially, rather than using the language of weak topologies, presumably because he felt it to be not yet in final form. However, the language and viewpoint of general topology was already available, and already a speciality of the new Polish school of mathematics, of which Banach himself was the supreme ornament. For a scholarly and sympathetic account of these matters, see Rudin [190, Appendix B].

The need for care in set theory had been dramatically shown by the Russell Paradox of 1902, and its role in showing the limitations of Frege’s programme in logic and foundations, especially his Grundgesetze der Arithmetik (vol. 2 of 1903). The Paradox, far from being a programme wrecker, was pregnant with consequences [78], just as with Gödel’s work later (below), and that too was ultimately based on a Paradox (the ‘Liar paradox’—cf. [155]; see [201] for a textbook account, [87] for a discussion). The familiar Russell, or to give another self-referential case, the liar, paradox has a number of forms; one is as follows. Take a piece of paper; write on both sides ‘The statement on the other side of this piece of paper is false’. Read from either side, one is (ostensibly) confronted by a definite statement; is it true or false?; neither—‘if it’s true it’s false, if it’s false it’s true’.

Foundational questions had been addressed in 1889 by Peano. Zermelo began his axiomatization, and gave the Axiom of Choice (AC) in 1902. Fraenkel, Skolem and others continued and revised this work; what is known nowadays as Zermelo–Fraenkel set theory (ZF), together with \(\mathrm{ZF}+\mathrm{AC}\), or ZFC, emerged by 1930. (For some time the system was called Zermelo–Fraenkel–Skolem; one may regret that this did not survive, abbreviated as ZFS.) AC is most often used in the (equivalent) form of Zorn’s Lemma of 1935 (a misnomer, as the result is due to Kuratowski in 1922, but the usage is now established). It will be helpful for later passages to note that the axioms include the operations of separation (the forming of a subset determined by a property), union and power set (denoted here by \(\wp \)), as well as foundation/regularity, asserting the well-foundedness of the relation of membership \(\in \) (‘no infinite descending \(\in \)-chains’—in the presence of DC). In this context AC is a generator of sets par excellence, with effects of both positive and negative aspects: allowing the construction both to ‘satisfy intuition’ (as in the construction of ‘invariant means’) and to astound it (as in the Banach–Tarski paradox): see the comments in [217, Chapter 15]. The tension between ‘too many’ sets or ‘too few’ pervades the history of set theory through the lens of logic, all the way back to Cantor: see [86]. For a discussion of approaches to axiomatization see [193].

Brouwer. The interplay between analysis (specifically, topology) and foundations in this era is well exemplified by the work of Brouwer. Brouwer is best remembered for two contributions: his fixed-point theorem (of 1911, [30]), and Intuitionism (1920, cf. [31]): for more details see the special issue of Indagationes Mathematicae (volume 29 (2018), no. 1) entitled L.E.J. Brouwer after 50 years. The first is beloved of economists, as it provides existence proofs of economic equilibria (the ‘invisible hand’ of Adam Smith, and his later ‘disciples’). But his proof of the fixed-point theorem was a non-constructive existence proof, and Brouwer lost faith in these for foundational reasons. He reacted by seeking to re-formulate mathematics ‘intuitively’, on new foundations, differing from those in use then and now by, for instance, outlawing proof by contradiction. This led to serious conflict, for instance the Annalenstreit (Annals struggle) of 1928, where Hilbert, as Editor-in-Chief of the Mathematische Annalen, ejected Brouwer from the Editorial Board. For an account of this matter, see e.g. van Dalen [57, Section 14.3], where the term Frosch–Mäusekrieg (the War of the Frogs and Mice, or Batrachomyomachia) is used, following Einstein.

Von Neumann. Von Neumann contributed to foundational questions, e.g. by formalizing the (or a) construction of the natural numbers \(\mathbb {N}\):

$$\begin{aligned} 0:=\varnothing ,\qquad n+1:=n\cup \{n\}=\{0,1,\dots ,n\}; \end{aligned}$$

see [88, Section 11] ([163, 164], 1928), and by his work on amenable groups (see e.g. [176]), with applications to the ‘Banach–Tarski paradox’ (as above) ([15, 217]).

The sets x in Von Neumann’s definition above are ordered by \(\in \) and are transitive: if \(z\in y\in x\), then \(z\in x\). Indeed the ordinals, which form the class On (not a set), are initially introduced as transitive well-ordered structures with \(\in _{x}\) the restriction to x of the membership relation. Thus the transitive set

$$\begin{aligned} \omega :=\{0,1,2,3,\dots \} \end{aligned}$$

follows all the natural numbers, with its existence legitimized by the Axiom of Infinity. Once ordinals \(\alpha \) are established (this uses the Axiom of Regularity), Von Neumann’s scheme above naturally yields the cumulative hierarchy \(V_{\alpha }\), introduced inductively so that \( V_{\alpha +1}=\wp (V_{\alpha })\), with \(\wp \) the power set operation, and \(V_{\lambda }=\bigcup _{\alpha <\lambda }V_{\alpha }\) for \(\lambda \) a limit ordinal. The class of all sets, the “Cantor universe”, is then \(V:=\bigcup _{\alpha \in \mathrm{On}}V_{\alpha }\), and each set x has a well-defined rank: the least \(\alpha \) with \(x\in V_{\alpha }\).
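The finite stages of these constructions can be checked by hand or by machine. The following Python sketch is ours and purely illustrative (all function names are hypothetical): it models hereditarily finite sets as nested `frozenset`s, builds the Von Neumann naturals by the successor step \(n\cup \{n\}\), assumes the usual starting stage \(V_{0}=\varnothing \), and computes the least stage containing a set, following the convention for rank used in the text.

```python
from itertools import combinations

def von_neumann(n):
    """Return the Von Neumann natural n = {0, 1, ..., n-1} as a frozenset."""
    x = frozenset()                      # 0 is the empty set
    for _ in range(n):
        x = x | frozenset([x])           # successor step: n + 1 = n ∪ {n}
    return x

def is_transitive(x):
    """A set is transitive if each element of an element is itself an element."""
    return all(z in x for y in x for z in y)

def power_set(s):
    """All subsets of the finite set s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def V(n):
    """Finite stage V_n, assuming V_0 is empty and V_{k+1} is the power set of V_k."""
    stage = frozenset()
    for _ in range(n):
        stage = power_set(stage)
    return stage

def least_stage(x):
    """Least k with x in V_k (the text's convention), for hereditarily finite x."""
    return 1 + max((least_stage(y) for y in x), default=0)

print([len(V(n)) for n in range(5)])     # sizes of V_0, ..., V_4
print(least_stage(von_neumann(3)))       # first stage containing the natural 3
```

Only finite stages are representable this way; the transfinite stages, and \(\omega \) itself, depend on the Axiom of Infinity and lie beyond such a computation.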

On the brink: 1930. We pause to review the state of matters in 1930. The Zermelo–Fraenkel(–Skolem) axioms had emerged. The epoch-making contributions of Gödel and Tarski were imminent. The Annalenstreit had recently ended, in which Hilbert, whose view of foundations was about to be demolished by Gödel, but whose position was in keeping with the general thinking of his time, and (appropriately modernized) broadly remains so still, was in conflict with Brouwer, an arch-apostle of Intuitionism, which (while not demonstrably untenable, as Hilbert’s position was soon shown to be) had not found wide acceptance then, nor has done so since. One may well sympathize with both parties. The important point is the shock to which the mathematical world was about to be exposed, in the new era of Gödel and Tarski: it is perhaps difficult for us, with the benefit of so much hindsight, to appreciate how disturbing all this was at the time.

There are parallels to be drawn with our own situation today. The text below is the substance of what we have to say here; the Coda which ends the paper summarizes (in the light of this text) our thinking as of today.

Formalities. The formal language of set theory LST builds formulas from a defined sequence of free variables (e.g. \( v_{0},v_{1},\dots \)), the atomic ones taking the form \(x\in y\) and \(x=y\), with x and y standing for free variables; the syntactically more complex ones then arise from the usual logical connectives and quantifiers (\(\forall x\) and \(\exists \, y\), creating bound variables from the free variables x, y). The idea is that the free and bound variables of a formula are restricted to range only over the elements in the universe of discourse (thus yielding a ‘first-order’ language). This language is a necessary ingredient of the axiomatic method, its first purpose being to give meaning to the notion of ‘property’ (so that e.g. \(\{x\in A:\varphi (x)\}\) is recognized as a (sub)set when \(\varphi \) is a formula with one free variable x, an instance of the Axiom of Separation).

The language LST is minimal as compared to the language of, say, group theory, whose type (officially: signature) involves more distinguished constants (a designated element 1, functions like \(\circ \) and \(\cdot ^{-1}\), relations, etc.). Each such language is interpreted in a mathematical structure; for instance, at its simplest a group structure has the form \( \mathscr {G}:=\langle G,1_{G},\circ _{G},\cdot ^{-1}\rangle \) and so lists its domain, designated elements and operations. Below, structures are assumed to be sets unless otherwise qualified; it is sometimes convenient (despite formal complications) to allow a class as a domain, e.g. \(\langle V,\in \rangle \).

The (metamathematical, i.e. ‘external’ to the discourse in the language) semantic relation \(\models \) of satisfaction/truth (below), due to Tarski (see [215], cf. [11, Chapter 3, Section 2]), is read as ‘models’, or informally as ‘thinks’ (adopting a common enough anthropomorphic stance). A formula \( \varphi \) of LST with free variables \(x,y,\dots ,z\) may be interpreted in the structure \(\mathscr {M}:=\langle M,\in _{M}\rangle \) (with \(\in _{M}\) now a binary set relation on the set M) for a given assignment \(a,b,\dots ,c\) in M for these free variables, and one writes

$$\begin{aligned} \mathscr {M}\models \varphi [a,b,\dots ,c] \end{aligned}$$

if the property holds for the said assignment; this requires an induction on the syntactic complexity of the formula starting with the atomic formulas (for instance, the atomic case \(x\in y\) is interpreted under the assignment a, b as holding iff \(a\in _{M}b\)). Compare the reduction of complexity in the forcing relation of Sect. 6.
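For a finite structure the induction on syntactic complexity just described can be carried out mechanically. The following Python sketch is ours and makes several illustrative assumptions: formulas are encoded as nested tuples, the connectives are restricted to negation and conjunction, and the three-element structure is a hypothetical example. The function `satisfies` decides satisfaction by recursion on the formula, with the atomic case \(x\in y\) read off from the relation interpreting membership.

```python
def satisfies(M, E, phi, env):
    """Decide whether the finite structure (M, E) satisfies phi under env.

    M   : finite set, the domain
    E   : set of pairs (a, b), meaning 'a is a member of b' in the structure
    phi : formula as a nested tuple, e.g. ("in", "x", "y")
    env : dict assigning elements of M to the free variables of phi
    """
    op = phi[0]
    if op == "in":                       # atomic case x ∈ y
        return (env[phi[1]], env[phi[2]]) in E
    if op == "eq":                       # atomic case x = y
        return env[phi[1]] == env[phi[2]]
    if op == "not":
        return not satisfies(M, E, phi[1], env)
    if op == "and":
        return satisfies(M, E, phi[1], env) and satisfies(M, E, phi[2], env)
    if op == "exists":                   # quantifiers range over M only
        return any(satisfies(M, E, phi[2], {**env, phi[1]: a}) for a in M)
    if op == "forall":
        return all(satisfies(M, E, phi[2], {**env, phi[1]: a}) for a in M)
    raise ValueError(f"unknown connective: {op!r}")

# Hypothetical structure: 0 ∈ 1, 0 ∈ 2, 1 ∈ 2.
M = {0, 1, 2}
E = {(0, 1), (0, 2), (1, 2)}

is_empty = ("forall", "z", ("not", ("in", "z", "x")))   # 'x has no members'
exists_empty = ("exists", "x", is_empty)                # 'there is an empty set'

print(satisfies(M, E, is_empty, {"x": 0}))
print(satisfies(M, E, exists_empty, {}))
```

The recursion terminates because each call passes to a strictly shorter formula; the quantifier clauses are computable here only because the domain is finite.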

This apparatus enables definition of ‘suitably qualified’ forms of definability; by contrast, unrestricted ‘definability’ leads to such difficulties as the ‘least ordinal that is not definable’, so is to be avoided (compare Sect. 3 with Tarski’s undefinability of truth). A simple example is that of an element \(w\in M\) being definable over M from a parameter \(v\in M\), in which case for some formula \(\varphi (x,y)\) with two free variables:

$$\begin{aligned} \{w\}=\{u\in M:\mathscr {M}\models \varphi [u,v]\}. \end{aligned}$$

Thus Gödel introduced the constructible hierarchy \(L_{\alpha }\) by analogy with \(V_{\alpha }\): however, \(L_{\alpha +1}\) comprises only sets definable over \(L_{\alpha }\) from a parameter in \(L_{\alpha }\); here \(L_{\lambda }=\bigcup _{\alpha <\lambda }L_{\alpha }\) for \(\lambda \) a limit ordinal, a matter we return to later, yielding the class \(L:=\bigcup _{\alpha \in \mathrm{On}}L_{\alpha }\).

Certain formulas, like \(\varphi (x,y)\) above (which can be explicitly, and so effectively, enumerated, as \(\varphi _{m}\) say), may give rise, via the substitution of a parameter v for y, to a family of not necessarily unique elements \(u\in M\) satisfying \(\varphi (u,v)\). An appeal, in general, to AC, but in the ‘metamathematical’ setting (i.e. the context of the mathematics studying relations between the language and the structures), selects a witness w of the relation \(\varphi (x,v)\) holding in M: the corresponding function \(v\mapsto w\) is called a Skolem function (for M and \(\varphi \)). We will see a striking application presently; for background on this key notion see e.g. [99], and for a historical account [100]. Evidently, a structure like \(\mathscr {M}:=\langle L_{\alpha },\in _{L_{\alpha }}\rangle \) contains enough canonical well-orderings of its initial parts \(L_{\beta }\) for \(\beta <\alpha \) (induced by the enumeration \(\varphi _{m}\) and the well-ordering of the ordinal parameters) that reference to AC here becomes unnecessary. (Incidentally, this is why AC holds in the class structure \(\langle L,\in \rangle \).)

We will refer to some other definability classes below in Sect. 6, so as an introduction we mention two classical ones. The class OD of ordinally definable sets comprises those that are definable from ordinal parameters over \(\langle V_{\alpha },\in _{V_{\alpha }}\rangle \) for some \( \alpha \). An element of a set in OD need not itself be in OD; the class HOD is the smaller class of those elements x whose transitive closure consists entirely of sets in OD, so HOD is a transitive class; see [161] for a discussion.

In view of the finitary character of formulas, the Löwenheim–Skolem(–Tarski) theorem (see e.g. [99], or [11, Chapter 4.3]), as applied to the language of set theory LST, asserts that if a set \({\Sigma } \) of sentences is modelled in an infinite structure \(\mathscr {M}\), then there exist structures \(\mathscr {N}\) of any infinite cardinality satisfying \( {\Sigma }\), including countable ones. The latter ones are generated by induction by iterative application of all the Skolem functions; so this needs only the Axiom of Dependent Choices. A familiar example is the countable subring with domain \(\mathbb {Q}\) of the ordered ring structure \(\langle \mathbb {R},0,1,+,\times ,<\rangle \). Passing to above-continuum cardinalities yields models of non-standard analysis with infinitesimals and infinite integers (see below); but here AC is needed to construct Skolem functions with which to generate the much larger structure.

The axioms of set theory comprise a finite set of axioms together with an axiom schema corresponding to the Axiom of Replacement (which asserts that the image of a set under a functional relation \(\varphi (x,y)\) expressed in LST is again a set). In order to model these axioms in structures like \(\langle M,\in _{M}\rangle \) with M a set, it is necessary to restrict attention to the use of a finite number of instances of the axiom schema; this causes no practical loss of generality, since any amount of mathematical argument will necessarily do just that (for instance, a deduction of an inconsistency). Thus, assuming the consistency of the axioms of set theory, any finite subset of the axioms has a model \(\mathscr {M}\) (by the Gödel–Mal’cev–Henkin Completeness Theorem (Footnote 2); see e.g. [11, Theorem 4.2], [186, Theorem 1.6.2, p. 22], and Sect. 3) and so also a countable model \(\mathscr {N}\). This reference to an appropriate finite subset of axioms is conventionally and systematically replaced (or by-passed) by the assumption that the axioms of set theory have a countable model. Of course, in ZFC we cannot prove the existence of such a model (as that would be tantamount to proving consistency of ZFC within ZFC). For a discussion of this see [126, Chapter 7, Section 9, Appendix: other approaches and historical remarks] and the more recent [128, IV.5 The metamathematics of forcing].

By its very nature the countable model \(\mathscr {N}\) will contain far fewer bijections than exist in Cantor’s world V. If transitive, the domain N of \(\mathscr {N}\) will have an initial segment of the ordinals in V; however, there might be countable ordinals which \(\mathscr {N}\) ‘thinks’ are uncountable, owing to missing bijections. The rule to observe is that ordinals are absolute whereas cardinality is relative. This is exploited in arranging the failure of the Continuum Hypothesis (CH) by the model extension process of forcing (see below for details and references). In the context of a transitive model of set theory \(\mathscr {M}\) we will write e.g. \(\omega _{1}^{M}\) for the ordinal which \(\mathscr {M}\) takes to be its first uncountable ordinal. In the absence of a superscript the implied context is V. Provided the Regularity Axiom is included, the structure \(\mathscr {N} =\langle N,\in _{N}\rangle \), being then well-founded, is isomorphic to a transitive structure; the isomorphism \(\pi \) is given inductively by:

$$\begin{aligned} \pi (x):=\{\pi (y):y\in _{N}x\}, \end{aligned}$$

and is known as the Mostowski collapse. Thus, for example, \(\pi (\varnothing ^{N})=\varnothing \).
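On a finite well-founded structure the collapse can be computed directly from the recursion \(\pi (x)=\{\pi (y):y\in _{N}x\}\). The following Python sketch is ours; the function name and the three-element example are hypothetical, and the example relation is chosen to be extensional (distinct elements have distinct sets of predecessors), which is what makes \(\pi \) injective.

```python
def mostowski_collapse(N, E):
    """Collapse a finite well-founded structure (N, E) to genuine sets.

    N : finite set of labels, the domain
    E : set of pairs (y, x), meaning 'y is a member of x' in the structure
    Returns a dict sending each label x to the frozenset pi(x).
    """
    cache = {}

    def pi(x):
        # pi(x) collects the images of the E-predecessors of x; the recursion
        # terminates because E is assumed well-founded (no cycles).
        if x not in cache:
            cache[x] = frozenset(pi(y) for y in N if (y, x) in E)
        return cache[x]

    return {x: pi(x) for x in N}

# Hypothetical structure on labels a, b, c with a E b, a E c, b E c.
N = {"a", "b", "c"}
E = {("a", "b"), ("a", "c"), ("b", "c")}
images = mostowski_collapse(N, E)

empty = frozenset()
print(images["a"] == empty)                                   # a collapses to ∅
print(images["c"] == frozenset([empty, frozenset([empty])]))  # c collapses to {∅, {∅}}
```

In this example the labels collapse to the Von Neumann naturals 0, 1 and 2, and the relation E is carried to true membership between the images.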

3 Skolem, Gödel, Tarski, Mal’cev and their legacy

The use of formal language brought greater clarity to the axiomatic method: thus Skolem helpfully clarified one of Zermelo’s axioms by replacing the latter’s use of the informal notion of ‘definite property’ with a formal rendering (i.e. by reference to formulas in a formal language). He initially came to these matters from number theory, an area to which he repeatedly returned, probing their interconnections (see e.g. [200] and the Coda below). This was soon to be followed by the discovery of the limitations of formal language: the publication in 1931 of Gödel’s two incompleteness theorems, preceded by the results of his 1930 thesis on the completeness of first-order logic (that every universally valid sentence is provable—[11, Theorem 12.1.3]) and on compactness (a corollary). The latter was to bear fruit at the hands of Tarski much later (1958 on). We recall Tychonoff’s Compactness Theorem in topology and its proofs (see e.g. [121, p. 143], [68, 3.2.4]). As the name implies, the Compactness Theorem for predicate calculus (that a set of sentences has a model iff each finite subset has a model [11, Chapter 5, Section 4]) and Tychonoff’s Compactness Theorem in topology are deeply connected, as are both to AC; see [105]. See also [11, Chapter 5, especially Section 5] for the status of variants and the connection with the ultraproducts of Sect. 5, and Beth [14] for his topological proof of the ‘Löwenheim–Skolem–Gödel’ theorems and references to similar topological approaches to assertions in logic. The ‘inter-disciplinary’ nature of this illuminating connection is not altogether surprising: the language of topology has the power to embrace analogues of semantical arguments both in converting (perhaps, a reduct of) a model into a space (see e.g. [166]), or dually in using a space of points to represent certain models of a fixed set of sentences (cf. again [14]).

The two incompleteness theorems concern any axiomatic system rich enough to encompass arithmetic: firstly, the existence, in the formal language of the axiom system, of sentences that can be neither proved nor disproved; and secondly, the inability of such a system to provide a proof of its own consistency. Rather than just wreck Hilbert’s programme, this produced untold benefits to the richness of mathematics. On the one hand, it introduced the plurality of the possible interpretations of a set of axioms (as in Skolem’s non-standard arithmetic), and the accompanying search for ways to reduce incompleteness. On the other hand, it focussed attention on the need to test or justify any belief in consistency, especially in the case of the axioms of set theory. See [210].

Gödel’s enduring insight was the embedding by arithmetic coding (hence the need for the ‘rich enough’ presence of arithmetic) of (aspects of) a ‘metalanguage’—the informal language of discourse needed to examine a formal language as a mathematical entity—back into the formal language, specifically the concepts of proof and provability—see below.

Addressing the incompleteness of set theory, Gödel’s second legacy relates to ‘relative consistency’: proof in 1938 (published in 1940) of the consistency relative to ZF of both AC (a matter of supreme importance, given the Banach–Tarski paradox, dating back to 1924) and of GCH. The key idea in the proof was the introduction (see Sect. 2) of the cumulative hierarchy \(L_{\alpha }\) of constructible sets whose totality comprising the class L is an inner model (i.e. a subuniverse of the full universe V, specifically a transitive class containing On). This was to be the foundation stone for the advances of the ‘next one hundred years’ in two ways. The first was to invite extensions of L by appropriate choice of sets outside L. The second, more technical, derives from Skolem’s method (1912) of constructing countable sub-models, enshrined in a condensation principle, that if \(M\) is a countable ‘submodel’ of L (see below), then it is isomorphic to a set \(L_{\alpha }\). More accurately, and with some hindsight, here M is an ‘elementary substructure’, i.e. any sentence referring to only a finite number of elements of M holds in M iff it holds in L [11, Chapter 4], written \(M\prec L\).

Contemporaneously with Gödel’s earliest contributions, and blending and intertwining with them, there occurs a ‘volcanic eruption’ of ideas and results from the fertile mind of Tarski: bursting forth in 1924 with the Banach–Tarski paradox (mentioned above) and evidenced by the working seminars of 1927–1929, laying the foundations of Tarski’s remarkable legacy, both that published in its time and that published later. This included work on the definability or otherwise (definable if ‘external’, not if ‘internal’) of the concept of truth, a result closely allied to Gödel’s incompleteness result and of similar vintage. Suffice it to point to the role of ‘elementary substructure’ (term due to Tarski, made explicit in 1961, but implicit long beforehand) in the condensation principle above, and the naming of the discipline of model theory by Tarski in 1954 (see [100] for the prior absence of a consensus on a name).

Deficiencies in Hilbert’s approach to geometry (e.g., its tacit assumption of set theory) led Tarski to re-examine the axiomatic basis of geometry. In 1930 Tarski was able to prove the decidability of ‘elementary geometry’, via a reduction to ‘elementary algebra’ where he was able to generalize Sturm’s algorithm for counting zeros of polynomials—see [220] for references and [206] for recent developments in this area.

Digressing briefly one more time from our concerns with analysis to algebra, we mention again Mal’cev’s name here, for his far-reaching and lasting contributions, starting in 1936, at least a decade ahead of his time, to ‘model theory’ and its interface with algebra, a trail-blazing endeavour. This was based initially on his independently conceived, extended version of the completeness and compactness theorems (Sect. 2): see [137], and also the references to him in [55, 99, 186].

4 Ramsey, Erdős and their legacy: infinite combinatorics; the partition calculus and large cardinals

4.1 Ramsey and Erdős

Pursuing a special case of Hilbert’s Entscheidungsproblem of 1928—proposing the task of finding an effective algorithm to decide the validity of a formula in first-order logic—Ramsey was led to results in both finite and infinite combinatorics (obtained late that same year, and published in 1930, [185]), the finite version of which yielded the desired algorithm for the special (“though common”) universal type of formula. In general no computable algorithm exists, as was shown by Church (using Gödel’s coding) in 1935, and independently by Turing in 1936 (via Turing machines). The Infinite Ramsey Theorem (which acted as a paradigm for its finite variants) asserts in its simplest form that if the distinct unordered pairs (doubletons) of natural numbers are partitioned into two (disjoint) classes, then there exists an infinite subset \(\mathbb {M}\subseteq \mathbb {N} \) all doubletons from which fall in the same partitioning class; thus \( \mathbb {M}\), which may be said to be a homogeneous (monochromatic) subset for the partition, is large—see [66, Chapters 2.8.1, 7.2 which both use DC]. (Homogeneity is a constantly recurring theme in what follows.) Thus, as a corollary, a Cauchy sequence in \(\mathbb {R}\) contains either an increasing or a decreasing subsequence. The combinatorial result extends from doubletons to (unordered) n-tuples (called by Ramsey ‘combinations’) and from dichotomous partitions to ones allowing any finite number k of partitioning classes. Further analogues and generalizations form the substance of the partition calculus, the founding fathers of which were Paul Erdős and Richard Rado: see [69, 115].
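
The corollary on monotone subsequences follows from the Infinite Ramsey Theorem by a standard colouring. The following is a sketch only; the sequence \((x_{n})\) and the colouring \(c\) are notation introduced here for illustration. For \(m<n\) in \(\mathbb {N}\), put

$$\begin{aligned} c(\{m,n\}):={\left\{ \begin{array}{ll} 0, &{} \text {if } x_{m}\leqslant x_{n},\\ 1, &{} \text {if } x_{m}>x_{n}. \end{array}\right. } \end{aligned}$$

If \(\mathbb {M}\subseteq \mathbb {N}\) is an infinite set homogeneous for \(c\), then the subsequence \((x_{n})_{n\in \mathbb {M}}\) is non-decreasing when the common colour is 0, and strictly decreasing when it is 1. (Here ‘increasing’ is read in the weak sense.)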

Given its origins, it is not altogether surprising that Ramsey’s theorem and its generalizations continue to play a key role in the logical foundations of set theory.

4.2 Partitions from large cardinals

We are particularly concerned below with the partition property that follows. As usual we regard any ordinal (including any cardinal) as the set of its predecessors. The partition property (partition relation) of concern is

$$\begin{aligned} \kappa \rightarrow (\alpha )_{2}^{<\omega }, \end{aligned}$$

by which is meant that if \([\kappa ]^{<\omega }\) (the finite subsets of \( \kappa \)) is partitioned into two classes, then there is a homogeneous subset of \(\kappa \) of order type \(\alpha \). (Ramsey’s result as stated above is recorded in this notation as \(\omega \rightarrow (\omega )_{2}^{2}\), and its immediate generalization to n-tuples and k classes as \(\omega \rightarrow (\omega )_{k}^{n}\).)
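
Spelled out in symbols, the relation may be read as follows. This is a sketch of the usual convention, with the names \(F\) and \(H\) introduced here only for illustration; \([\kappa ]^{<\omega }\) denotes the set of finite subsets of \(\kappa \), \([H]^{n}\) the set of n-element subsets of \(H\), and \(\mathrm {otp}\) the order type:

$$\begin{aligned} \forall F:[\kappa ]^{<\omega }\rightarrow \{0,1\}\;\;\exists H\subseteq \kappa \;\bigl (\mathrm {otp}(H)=\alpha \ \text { and }\ F\text { is constant on }[H]^{n}\text { for each }n<\omega \bigr ). \end{aligned}$$

Homogeneity is thus required separately for each size n, and the constant value is allowed to depend on n.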

For any \(\alpha \geqslant \omega \) the least cardinal \(\kappa \) for which \(\kappa \rightarrow (\alpha )_{2}^{<\omega }\) holds, denoted \(\kappa (\alpha )\), is called the \(\alpha \)-th Erdős cardinal (or partition cardinal); but do such cardinals exist? One may show in ZFC that \(\kappa (\alpha )\), if it exists, is regular (below), and, when \(\alpha \) is a limit ordinal, that \(\kappa =\kappa (\alpha )\) is strongly inaccessible (below) [66, Chapter 7], and so \(V_{\kappa }\) is then a model of ZFC, written \( V_{\kappa }\models \) ZFC [66, Chapter 4]. Hence, by Gödel’s second incompleteness theorem, we cannot deduce its existence in ZFC.

Of particular importance are cardinals \(\kappa \), in particular \(\kappa =\kappa (\omega _{1})\), for which \(\kappa \rightarrow (\omega _{1})_{2}^{<\omega }\) holds: see the next section. So if, as we do, we need them, then we must add their existence to our axiom system. To gauge the consistency strength of this assumption we refer to one of the earliest notions of a ‘large cardinal’: a measurable cardinal \(\kappa \). Such a cardinal was defined by Ulam [219] in 1930 by the condition that it supports a \(\{0,1\}\)-valued \(\kappa \)-additive (i.e. additive over families of cardinality \(\lambda \), for all \(\lambda <\kappa \)) non-trivial measure on the power set \(\wp (\kappa )\). This may be reformulated as asserting the existence of a non-principal \(\kappa \)-complete ultrafilter on \( \kappa \) [41, 56, 82, 106]. It turns out that for \(\kappa \) measurable, the stronger relation \(\kappa \rightarrow (\kappa )_{2}^{<\omega }\) holds. The latter is taken as the defining property of a Ramsey cardinal, through its similarity with \(\omega \rightarrow (\omega )_{2}^{2}\).

We stop to notice that the relation \(\kappa \rightarrow (\kappa )_{2}^{2}\) (taken to be the definition of a weakly compact cardinal [66, Chapter 10.2]; see also Sect. 4.3) holds iff \(\kappa \) is strongly inaccessible and \(\kappa \) has the tree property: every tree of cardinality \(\kappa \) having fewer than \(\kappa \) nodes at each level has a path, i.e. a branch of full length \(\kappa \). It is interesting that, as with the Cauchy sequences in \(\mathbb {R}\) above, if \(\kappa \rightarrow (\kappa )_{2}^{2}\), then every linearly ordered set of cardinality \(\kappa \) has a subset of cardinality \(\kappa \) which is either well-ordered or reversely well-ordered by the linear ordering.

4.3 Large cardinals continued

A first source for the notion of a large cardinal takes its motivation from the conceptual leap from the finite to the infinite, as exemplified by the set of natural numbers viewed as \(\mathbb {N}\), or, better for this context, as the first infinite ordinal \(\omega \), sanctioned by the Axiom of Infinity (Sect. 2). The arithmetic operations of summation (equivalently, union) and multiplication/exponentiation (equivalently, the power set operation \(\wp )\) applied to members of \(\omega \) lead again to members of \(\omega \): they do not reach above \(\omega \).

This observation can be copied by a direct reference to the two corresponding operations that generate a union of a given family and the power set of a given set, each operation being guaranteed by the corresponding axiom. Thus a cardinal is said to be weakly inaccessible if it is a limit cardinal above \(\omega \) which is regular (a regular limit cardinal), meaning, firstly, that it is the limit, i.e. supremum (union), of the unbounded set of all the preceding ordinals, and, secondly, that nonetheless it is not the union (supremum) of a smaller set of ordinals. A cardinal \(\kappa \) is strongly inaccessible, or just (plain) inaccessible, if it is a strong regular limit cardinal, i.e. additionally \(2^{\lambda }<\kappa \) for all \(\lambda <\kappa \). (Here \(2^{\lambda }\) is the cardinality of \(\wp (\lambda )\).) Further such notions (of hyper-inaccessibility), which we omit here, have been introduced by reference to the idea of a ‘large limit’ (limit over a large set) of ‘large cardinals’. The axioms ZFC, assumed consistent, cannot imply the existence of an inaccessible \(\kappa \), as then \(V_{\kappa }\), being a model for ZFC, provides proof within ZFC of the consistency of ZFC, in contradiction to Gödel’s incompleteness theorem. This presents the opportunity of adjoining a stronger axiom of infinity asserting its existence.
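
To see that regularity is a genuine restriction, consider the first limit cardinal above \(\omega \). The example of \(\aleph _{\omega }\) is supplied here for illustration and is not drawn from the discussion above:

$$\begin{aligned} \aleph _{\omega }=\sup \{\aleph _{n}:n<\omega \}. \end{aligned}$$

This exhibits \(\aleph _{\omega }\) as the supremum of only countably many smaller cardinals, so it is a limit cardinal that is not regular, and hence is not weakly inaccessible.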

A second source of largeness is motivated by the study of infinitary languages, the idea being to overcome some of the limitations of first-order languages. For example, in the language \(\mathscr {L}_{\kappa \kappa }\) one admits \(\kappa \) many free variables and permits both infinite conjunctions/disjunctions of any family of formulas of cardinality below \( \kappa \) and the use of quantification over fewer than \(\kappa \) free variables. This leads to the desirability of these languages having a compactness property analogous to Gödel’s compactness property of the ordinary language \(\mathscr {L}_{\omega \omega }\) (see above). Examples of the failure of compactness abound; so it emerges that the desired \(\kappa \), if it exists, needs to be large. Thus a cardinal \(\kappa \) is called strongly compact [66, Chapter 10.3] if the language \(\mathscr {L}_{\kappa \kappa }\) is \((\lambda ,\kappa )\)-compact for each \(\lambda \geqslant \kappa \), that is: for each \(\lambda \geqslant \kappa \) and any set \({\Sigma } \) of sentences in that language with \(|{\Sigma } |\leqslant \lambda \), if each subset \( {\Sigma } ^{\prime }\) with \(|{\Sigma }'|<\kappa \) has a model, then \( {\Sigma } \) has a model. (So the cardinality of \({\Sigma }\) here is not constrained.) The property may be characterized without reference to the language more simply, as saying that every \(\kappa \)-complete filter can be extended to a \(\kappa \)-complete ultrafilter (see Sect. 5.1).
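
To illustrate one such failure, here is a standard example, given as a sketch with the constants \(c\) and \(c_{n}\) introduced only for the purpose. Once countable disjunctions are admitted, the finite-subset form of compactness already breaks down. Consider

$$\begin{aligned} {\Sigma } :=\Bigl \{\bigvee _{n<\omega }(c=c_{n})\Bigr \}\cup \{c\ne c_{n}:n<\omega \}. \end{aligned}$$

Each finite subset of \({\Sigma } \) has a model (interpret \(c_{n}\) as n, and \(c\) as some m for which the sentence \(c\ne c_{m}\) does not occur in the subset), but \({\Sigma } \) itself has none.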

Analogously, a cardinal \(\kappa \) is weakly compact [66, Chapter 10.3] if the language \(\mathscr {L}_{\kappa \kappa }\) is \((\kappa ,\kappa )\)-compact: for any set of sentences \({\Sigma } \) with \(|{\Sigma } |\leqslant \kappa \), if each of its subsets of cardinality \(<\kappa \) has a model, then \( {\Sigma } \) has a model.

A third source, more promising as will emerge, is more in keeping with the first (‘operational’) viewpoint. It is motivated by the ‘substructures’ analysis initiated in Gödel’s proof that GCH holds in the universe of constructible sets. Attention focusses now on the properties that the operation of elementary embedding could or should have. We recall that the range of such an embedding is an elementary substructure (Sect. 3). Suppose that \(j:N\rightarrow M\) is an elementary embedding, where N and \(M\) are transitive classes and j is definable in N by a formula of set theory with parameters from N. (Here we refer to \(\langle M,\in \rangle \) more simply as M, etc.) Then j must take ordinals to ordinals and j must be strictly increasing. Also \(j(\omega )=\omega \) and \(j(\alpha )\geqslant \alpha \), so when there is a least \(\delta \) with \(j(\delta )>\delta \) this is called the critical point of j. Then

$$\begin{aligned} \mathscr {U}:=\{X\subseteq \delta :\delta \in j(X)\} \end{aligned}$$

is a non-principal \(\delta \)-complete ultrafilter on \(\delta \), i.e. \(\delta \) is a measurable cardinal. In fact, the converse is also true; see [207, Theorem 1.2]. Interestingly, here a non-principal ultrafilter is defined by membership of a single point, albeit via images.
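
A brief sketch of why such a family is a non-principal ultrafilter, assuming the standard definition \(\mathscr {U}=\{X\subseteq \delta :\delta \in j(X)\}\) and using only the elementarity of j together with \(j(\alpha )=\alpha \) for \(\alpha <\delta \):

$$\begin{aligned} j(\{\alpha \})&=\{j(\alpha )\}=\{\alpha \}\not \ni \delta \qquad (\alpha <\delta ),\\ j(\delta \backslash X)&=j(\delta )\backslash j(X)\qquad (X\subseteq \delta ). \end{aligned}$$

The first line shows that no singleton belongs to \(\mathscr {U}\). The second, together with \(\delta \in j(\delta )\) (which holds as \(\delta <j(\delta )\)), shows that exactly one of \(X\) and \(\delta \backslash X\) belongs to \(\mathscr {U}\).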

The significance of this characterization lies in the ‘operations’ the function j encodes which, on the one hand, pass the test of ‘elementarity’ and, on the other, introduce an upward jump at the critical point (roughly speaking, an ‘inaccessibility from below by elementarity’).

We mention some further canonical large-cardinal notions obtained from variations on this elementary embedding theme; these will be useful not only presently for the establishment of a reference scale of consistency strength, but also later in relation to the regularity properties of subsets of \(\mathbb {R}\) (such as Lebesgue measurability etc., considered in Sects. 7 and 10).

A cardinal \(\kappa \) is supercompact if it is \(\lambda \)-supercompact for all \(\lambda \geqslant \kappa \); here \(\kappa \) is \(\lambda \)-supercompact if there is a (necessarily non-trivial) elementary embedding \( j=j_{\lambda }:V\rightarrow M\) with M a transitive class, such that j has critical point \(\kappa \), and \(M^{\lambda }\subseteq M\), i.e. M is closed under arbitrary sequences of length \(\lambda \). Under AC, w.l.o.g. \( j(\kappa )>\lambda \).

For \(\kappa \) a cardinal and \(\lambda >\kappa \) an ordinal, \(\kappa \) is said to be \(\lambda \)-strong if for some transitive inner model (Sect. 3), M say, there exists an elementary embedding \(j_{\lambda }:V\rightarrow M\) with critical point \(\kappa \), \(j_{\lambda }(\kappa )\geqslant \lambda \), and

$$\begin{aligned} V_{\lambda }\subseteq M. \end{aligned}$$

Furthermore, \(\kappa \) is said to be a strong cardinal if it is \( \lambda \)-strong for all ordinals \(\lambda >\kappa \).

This notion may be relativized to subsets S to yield the concept of \(\lambda \)-S-strong, by requiring in place of the inclusion above only that

$$\begin{aligned} j(S)\cap V_{\lambda }=S\cap V_{\lambda }. \end{aligned}$$

(One says that j ‘preserves’ \(S\) up to \(\lambda \).) This provides passage to our last definition. The cardinal \(\delta \) is a Woodin cardinal if \(\delta \) is strongly inaccessible, and for each \(S\subseteq V_{\delta }\) there exists a cardinal \(\theta <\delta \) which is \(\lambda \)-S-strong for every \(\lambda <\delta \). (So the last of these three calls for more ‘preservation’ than the second, but less than the first.)

The consistency strength of various extensions of the standard axioms ZFC, by the addition of further axioms, may then be compared (perhaps even assessed on a well-ordered scale) by determining which canonical large-cardinal hypothesis will suffice to create a model for the proposed extension. Thus, for \(\kappa \) supercompact, \(V_{\kappa }\models \exists \, \mu \;[\text {“}\mu \text { is strong”}]\), which places supercompact ‘above’ strong, in the sense that the assumption of the existence of a supercompact cardinal is stronger than the assumption of the existence of a strong cardinal (indeed, also of a strong cardinal below a supercompact). Likewise, for \(\kappa \) strong, \(V_{\kappa }\models \exists \, \mu \;[\text {“}\mu \text { is measurable”}]\), placing measurability ‘below’ strong, in the same sense. (And below that is the existence of a Ramsey cardinal (Sect. 4.2), recalling earlier comments.) We now have five concepts in play here. The reader may find it helpful to refer to the following mnemonic, or diagram:

$$\begin{aligned} \text {supercompact}> \mathrm{Woodin}> \mathrm{strong}> \mathrm{measurable} > \mathrm{Ramsey} \end{aligned}$$

(of course, there is no pointwise comparison implied here between each supercompact and each Woodin cardinal, etc.), the positioning of the Woodin cardinal flowing from the degree of required ‘preservation’, as above, etc.

5 Beyond the constructible hierarchy L: I

We have mentioned the Löwenheim–Skolem–Tarski theorem. How else may one construct structures that will contain a given one as elementarily embedded? In topology one naturally reaches for powers and products (as with Tychonoff’s theorem), and also their various substructures such as function spaces. For example, Hewitt [93] in 1948 constructed hyper-real fields by using a quotient operation on the space of continuous functions via a maximal ideal; cf. [83, Chapter 13] and [60]. Incidentally, this Dales–Woodin collaboration [59, 60] arose from problems in analysis, Dales being a functional analyst and Woodin being one originally—see [58].

5.1 Expansions via ultrapowers and intimations of indiscernibles

Jerzy Łoś [132] in 1955, though foreshadowed by Skolem’s construction [199] of non-standard arithmetic in 1934, even by Gödel 1930, and in some sense also by Arrow’s reference to similar ideas in his Impossibility Theorem [2] (cf. [99, Chapter 9, p. 475]), introduced a natural algebraic way of constructing new structures. Łoś relied on the concept, introduced in 1937 by Cartan [40, 41], of an ultrafilter: a maximal filter in the power set of I, say. (Though under the name Kranz, Vietoris had already introduced the concept of filter-base in 1921 in his pioneering work on general topology. The assumption of the existence of ultrafilters (see PI in Sect. 1) is (in general) weaker than AC. See [214] for an existence theorem.) For a family of structures \(\{\mathscr {A}_{i}:i\in I\}\), all of identical type/signature, i.e. each having the same distinguished operations and relations on its domain \(A_{i}\) (and possibly distinguished elements, e.g. \(0\) and \(1\) for an ordered field such as \(\mathbb {R}\)), one first defines the direct product \(\prod _{i\in I}\mathscr {A}_{i}\) as a structure (again of the same type) with domain the set \(\prod _{i\in I}A_{i}\) (its non-emptiness in general by AC, of course) by defining the operations and relations pointwise; thus any distinguished element e, say, if interpreted in \(A_{i}\) as \(e_{i}\), say, is interpreted in the product by the function \(e:i\mapsto e_{i}\) (given, by assumption, so no need for AC here). Next, for \(\mathscr {U}\) an ultrafilter on I, define \(\mathscr {U}\)-equivalence: \(f\sim g\), according as \(\{i\in I:f(i)=g(i)\}\in \mathscr {U}\), i.e. f and g are pointwise \(\mathscr {U}\)-almost equal. Then denote by \(\prod _{i\in I}\mathscr {A}_{i}/\mathscr {U}\) the equivalence classes \([f]_{\mathscr {U}}\) and equip these with the requisite operations and relations suitably interpreted as relations that hold pointwise \(\mathscr {U}\)-almost always.

Łoś’s Theorem (ŁT below) asserts satisfaction in the ultraproduct of arbitrary properties/formulas \(\varphi \), say for simplicity with one free variable v, via

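$$\begin{aligned} \prod _{i\in I}\mathscr {A}_{i}/\mathscr {U}\models \varphi ([f]_{\mathscr {U}})\Leftrightarrow \{i\in I:\mathscr {A}_{i}\models \varphi (f(i))\}\in \mathscr {U}, \end{aligned}$$

for \(\varphi \) any first-order formula (in the language needed to describe a structure of that type/signature). This is proved by induction on the complexity of formulas, the atomic cases holding by fiat (see the definitions above).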

If the \(\mathscr {A}_{i}=\mathscr {A}\) are all equal (with domain A), then \( \mathscr {A}\) embeds elementarily into the ultrapower, when \(a\in A\) is identified with the constant map \( f_{a}:i\mapsto a\).
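
The elementarity of this embedding is a one-line consequence of ŁT. As a sketch, writing \(\mathscr {A}^{I}/\mathscr {U}\) for the ultrapower (a notation assumed for this illustration), for any first-order formula \(\varphi \) and any \(a\in A\):

$$\begin{aligned} \mathscr {A}^{I}/\mathscr {U}\models \varphi ([f_{a}]_{\mathscr {U}})\Leftrightarrow \{i\in I:\mathscr {A}\models \varphi (a)\}\in \mathscr {U}\Leftrightarrow \mathscr {A}\models \varphi (a), \end{aligned}$$

since the set in the middle is either I (which belongs to \(\mathscr {U}\)) or \(\varnothing \) (which does not).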

Consider again \(\mathscr {A}=\mathbb {R}\), \(I=\mathbb {N}\) and \(\mathscr {U}\) an ultrafilter extending the filter of co-finite subsets of \(\mathbb {N}\) (again invoking, say, AC). Then \(\mathbb {R}\) embeds in \(\mathbb {R}^{\mathbb {N}}/\mathscr {U}\), with any real number a represented by the constant function \(f_{a}:n\mapsto a\). We may call the function \(\mathrm{id}(i):=i\) for \(i\in I\) a dominating function since it plays an important role and dominates any constant function \(f_{m}\) for \(m\in \mathbb {N}\); indeed, \(f_{m}<\mathrm{id}\), since \(\{n\in \mathbb {N}:m<n\}\in \mathscr {U}\), and so \(\mathrm{id}\) is an element following all of \(\mathbb {N}\), and so follows all of \(\mathbb {R}\) in \(\mathbb {R}^{\mathbb {N}}/\mathscr {U}\). That is, \( \mathrm {id}\) identifies an infinite number; likewise \(1/\mathrm{id}\) identifies a positive (non-zero) element that may be interpreted as an infinitesimal. This observation allowed Abraham Robinson [186, 187] to develop a non-standard analysis within which to interpret and interrogate rigorously Leibniz’s intuitive texts on infinitesimals; see [120] for a rigorous undergraduate development of calculus in this setting, and [131] for a recent textbook account. We note another link with economics here; see [213].
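
A sketch of why \(1/\mathrm{id}\) is infinitesimal, assuming that \(\mathrm{id}(n)=n\) and that order in the ultrapower is decided on a set belonging to \(\mathscr {U}\) (the symbol \(\varepsilon \) is introduced here for illustration). For each real \(\varepsilon >0\),

$$\begin{aligned} \{n\geqslant 1:0<1/n<\varepsilon \}=\{n\geqslant 1:n>1/\varepsilon \}, \end{aligned}$$

and this set is co-finite, hence belongs to \(\mathscr {U}\). By ŁT, \(0<1/\mathrm{id}<f_{\varepsilon }\) holds in the ultrapower for every standard \(\varepsilon >0\). (The value assigned at \(n=0\) is immaterial, as \(\{0\}\notin \mathscr {U}\).)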

The argument just given may be repeated with \(\mathscr {A}:=\langle A,\in _{A}\rangle \) for A a transitive set and \(\in _{A}\) the relation of membership in A. If \(\mathscr {A}\) is a countable model of ZF, then, provided \(\mathscr {U}\) is countably complete (see e.g. [113, Proposition 5.3]), the ultrapower \(\mathscr {A}^{I}/\mathscr {U}\) is well-founded under its ‘interpretation of the membership relation’, so will contain elements that form an interval of ordinals following the ordinals in A. However, there are no means within \( \mathscr {A}\) itself of ‘seeing’ the existence of this extra layer of ordinals: speaking informally (but see below, Sect. 5.2), they are ‘indiscernible’. (Strictly speaking, in the present context, the ultrapower needs to be replaced by an isomorphic structure which is a transitive set, known as the Mostowski collapse, defined inductively by the collapsing function \(\pi \):

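$$\begin{aligned} \pi ([f]_{\mathscr {U}}):=\{\pi ([g]_{\mathscr {U}}):\{i\in I:g(i)\in f(i)\}\in \mathscr {U}\} \end{aligned}$$

(cf. Sect. 2); then interpretations of ordinals collapse to actual ordinals.)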

When \(I=\kappa \) with \(\kappa \) the least measurable cardinal and \(\mathscr {U }\) the (\(\kappa \)-complete) corresponding ultrafilter, Dana Scott considered the extension of L to \(L[\mathscr {U}]\) (the Lévy class of sets ‘constructible relative to’ \(\mathscr {U}\)—obtained by allowing definability over the ordinals to refer also to \(\mathscr {U}\) as a set, cf. end of Sect. 6.2—so a class closed under the intersection with \(\mathscr {U}\); see [113, Chapter 1, Section 3], [66, 5.6.2]), and investigated the ultrapower to conclude the non-existence of a measurable cardinal in L. This is easiest to understand through the lens of the assertion that existence of a measurable cardinal contradicts \(V=L\) [192, 66, Section 6.2.10], [11, Chapter 14, Section 6] (so there is no measurable cardinal in L ). This is done again by referring to the dominating function, which vies with \(\kappa \) for the place of smallest measurable cardinal (in the Mostowski collapse). A proper proof needs to avoid doubtful manipulations of \(\mathscr {U}\)-equivalence classes of subclasses of \(L[\mathscr {U}]\). (To achieve this, one represents any function f by one of least rank that is \(\mathscr {U}\)-equivalent to it—the ‘Scott trick’; under these circumstances well-foundedness of the resulting model needs to be verified, using \(\sigma \)-additivity of \( \mathscr {U}\).)

The gist of the proof is to recreate the following contradictions stemming from ŁT. As before, \(\mathrm{id}(i):=i\) for \(i\in I\), and \(f_{\lambda }:i\mapsto \lambda \) is the constant function on I embedding \(\lambda \) into the ultrapower. By ŁT, the map \(\lambda \mapsto f_{\lambda }\) is injective for \(\lambda <\kappa \) (since \(f_{\lambda }<f_{\mu }\), for \(\lambda<\mu <\kappa \)). By ŁT again, \(f_{\kappa }\) is the smallest measurable cardinal in \(\mathscr {A}\) (since \(f_{\kappa }(i)=\kappa \) is such a cardinal, for all i), hence \( f_{\kappa }=\kappa \) (up to equivalence, really). Now \(\mathrm{id}<f_{\kappa }\) (since \(\mathrm{id}(i)=i<\kappa =f_{\kappa }(i)\) for all \(i\in I\)), so \(\mathrm{id}<f_{\kappa }=\kappa \). But, for each \( \lambda <\kappa \), we have \(f_{\lambda }<\mathrm{id}\) (since \(\{i\in I:\lambda <i\}\in \mathscr {U}\), as \(\mathscr {U}\) is \(\kappa \)-complete). But \(\{f_{\lambda }:\lambda <\kappa \}\) has cardinality \(\kappa \), and so \(\kappa \leqslant \mathrm {id}\), contradicting the earlier deduction that \(\mathrm {id} <\kappa \).

Actually, these observations just demonstrate that the embedding \(j=j_{ \mathscr {U}}\), obtained by composing \(\lambda \mapsto f_{\lambda }\) with the Mostowski collapse, satisfies \(j(\lambda )=\lambda \) for \(\lambda <\kappa \), and \(j(\kappa )\), being the collapsed version of \(f_{\kappa }\), lies strictly above \(\kappa \); thus the ordinal \(\kappa \) is in this sense the critical point of j.

This argument was further investigated by Haim Gaifman, from the point of view of iterating the ultrapower construction, and perfected by Kunen [125].

5.2 Ehrenfeucht–Mostowski models: expansion via indiscernibles

At about the same time as Łoś introduced ultraproducts into model-theory, Ehrenfeucht and Mostowski [67] in 1956 introduced a construction that expands a structure \(\mathscr {A}\) by importing a linearly ordered infinite set of elements in such a way that, speaking anthropomorphically, \(\mathscr {A}\) is incapable of distinguishing between these imports and a certain infinite subset of its own domain. Less than a decade later, first Morley in 1962 (see e.g. [150]) and then Silver in his thesis in 1966 (see [198]) put these features to decisive use, by enabling the imported elements to generate various kinds of information about \( \mathscr {A}\) consistent with that generated by \(\mathscr {A}\) on its own.

The original construction provided an elementary embedding of any infinite structure \(\mathscr {A}\) into another ‘larger’ one—larger in possessing many non-trivial automorphisms, securing in particular a non-trivial elementary embedding. A (copy of a) linearly ordered set X is adjoined to A comprising elements x which are to be ‘indiscernible’ from the viewpoint of \(\mathscr {A}\) (except only in name—as the formal language must adjoin formal names \(c_{x}\) to speak about them) in the sense that:

$$\begin{aligned} (\mathscr {A},(c_{x})_{x\in X})\models \varphi (x_{1},\dots ,x_{n})\Leftrightarrow \varphi (x_{1}^{\prime },\dots ,x_{n}^{\prime }), \end{aligned}$$

for all formulas \(\varphi \) having n free variables, for all n, and all \( x_{1}<\cdots <x_{n}\) and all \(x_{1}^{\prime }<\cdots <x_{n}^{\prime }\) in X. That this is possible in general relies on the Compactness Theorem (and so on AC): the idea here being that if one takes the sentences true in \( \mathscr {A}\) together with the sentences \(\varphi (c_{x_{1}},\dots ,c_{x_{n}})\Leftrightarrow \varphi (c_{x_{1}^{\prime }},\dots ,c_{x_{n}^{\prime }})\) above (and also the inequalities \(c_{x}\ne c_{x^{\prime }}\) for distinct \(x,x^{\prime }\in X\)), then one may satisfy a finite set F of these by interpreting the finite number m of \(c_{x}\)s in play in F, \(c_{x_{1}},\dots ,c_{x_{m}}\) say, with suitably chosen elements of A, as follows. To effect the choice, partition all m-tuples of A dichotomously according as to whether or not \(\mathscr {A}\) can distinguish between them on the basis of the properties defined by the finite number of formulas \(\varphi (v_{1},\dots ,v_{m})\) obtained from the \(\varphi \) in F. (That is: the free variables \(v_{i}\) replace the constants \(c_{x_{i}}\).) Then an infinite homogeneous set for this partition yields a model for F. In particular, for limit ordinal \(\delta \), the structure \(\mathscr {A}=\langle L_{\delta },\in \rangle \) (by abuse of notation \(\in \) here and below denotes membership \(\in \) restricted to \(L_{\delta }\)) can be expanded to a structure with a sequence of indiscernibles to which the formal language gives names \(c_{n}\). Call that \(\mathscr {A}_{0}\). (Here AC may be avoided, as \(L_{\delta }\) is well-ordered.) In turn, for any ordinal \(\alpha \), that expanded structure \(\mathscr {A}_{0}\) may be further extended to a structure \(\mathscr {M}_{\alpha }(\mathscr {A})\) with a set of indiscernibles X of order type \(\alpha \) and with the following additional property: for any formula in the language of \(\mathscr {A}\), \(\varphi (v_{1},\dots ,v_{n})\) say,

$$\begin{aligned} \mathscr {M}_{\alpha }(\mathscr {A})\models \varphi (x_{1},\dots ,x_{n})\Leftrightarrow \mathscr {A}_{0}\models \varphi (c_{1},\dots ,c_{n})\qquad (x_{1}<\cdots <x_{n}\text { in }X). \end{aligned}$$

So, in particular, the indiscernibles X can generate all the true sentences about \(\mathscr {A}\). But are the structures \(\mathscr {M}_{\alpha }( \mathscr {A})\) well-founded for all \(\alpha \)? That depends on whether the structures \(\mathscr {M}_{\alpha }(\mathscr {A})\) for just \(\alpha <\omega _{1}\) are all well-founded (the reduction here is possible, since any descending sequence occurring in the models with larger \(\alpha \) can be captured by a countable submodel). This will be so when \(\mathscr {A}=\langle L_{\kappa },\in \rangle \) and \(\kappa \) satisfies the partition relation

$$\begin{aligned} \kappa \rightarrow (\omega _{1})_{2}^{<\omega }. \end{aligned}$$

(With \(\alpha <\omega _{1}\) as above, the argument is similar to but easier than that in the Ehrenfeucht–Mostowski result. Appealing to the partition relation above in place of Ramsey’s theorem, partition dichotomously according as to whether \( \mathscr {M}_{\alpha }\models \varphi (\xi _{1},\dots ,\xi _{n})\) holds or not; extract a homogeneous subset of \(\kappa \) of order type \(\omega _{1}\) and use its first \( \alpha \) members as the required indiscernibles. Their Skolem hull in \( L_{\kappa }\), a well-founded set, is isomorphic to \(\mathscr {M}_{\alpha }(\mathscr {A})\).)

A first corollary (by appeal to indiscernibility, use of only the first \( \omega \) indiscernibles, and then the countability of the formal language): only a countable number of subsets of \(\omega \) are constructible in L, even though from the viewpoint of L there are uncountably many of them in L; but then, an embellishment of the analysis yields that \(\omega _{1}^{L}\), the ordinal interpreted by L as the first uncountable, is also countable.

Silver deduced deeper results about L along these lines. Some of these were then bettered by Kunen [125], who devised a way for iterating the ultrapower construction of a structure \(\mathscr {M}\) in a setting where the ultrafilter \(\mathscr {U}\) need not be a member of \(\mathscr {M}\). A most remarkable contribution from Silver was the introduction of the set now called \(0^{\#}\) (zero-sharp) following Solovay (originally designated a ‘remarkable’ set); this is the set of Gödel codes \(\lceil \varphi \rceil \) for all the true sentences \(\varphi \) about L generated by the \(\omega \)-sequence of indiscernibles , namely:

(The notation tacitly assumes that \(n=n(\varphi )\) is the number of free variables in \(\varphi \).) This set’s very existence of course depends on suitable large-cardinal assumptions, such as \(\kappa \rightarrow (\omega _{1})_{2}^{<\omega }\) holding for some \(\kappa \). The ‘existence of \(0^{\#}\)’ can be used as a large-cardinal assumption in its own right, lying below the existence of the Erdős cardinal. Indeed, in Sect. 7 we discuss the classical theory of analytic sets and thereafter the determinacy of infinite positional games with a target set T, say; the assumption that games with co-analytic target set are determined (\(\varvec{\Pi }_{1}^{1}\)-determinacy) implies that \(0^{\#}\) exists, a result due to Harrington [90].

Assuming still the partition relation just mentioned, we return to the indiscernibles for the structures , which had been studied initially by Gaifman and by Rowbottom. Silver’s great contribution was to describe the structure, indeed the ‘very good behaviour’ (below), of a (proper) class X of ordinal indiscernibles: closed (under limits, i.e. under suprema), unbounded in any cardinal \( \lambda \) (with \(X\cap \lambda \) of cardinality \(\lambda \)); with \(L_{\alpha }\prec L_{\beta }\) for \(\alpha <\beta \) with both ordinals in X (indeed, stretching the notation to class structures, with \(L_{\alpha }\prec L\)); having the property that every set in L is definable from parameters in \(X\). Among the significant consequences is the, already mentioned, countability of those sets in \(L\) that are definable over L without any parameters (implying immediately that \(V\ne L\)), and more importantly the definability of truth in L. For details see e.g. [66, Theorem 4.8]. We stress these results are subject to the partition assumption.

The point (above) about good behaviour concerns particularly the ‘closed unbounded’ nature of X above. Sets of ordinals with this property should be regarded as ‘large’, since they enable the very important ‘stationary sets’ of the next section to be thought of as non-negligible. The two concepts play a leading role in combinatorial principles (holding in L) isolated by Jensen [107] (see e.g. [62]) from the fine structure of L. These include Jensen’s \(\Diamond \) (diamond—[107]), used in constructing a ‘Suslin continuum’ as a counterexample to Suslin’s hypothesis, SH (see Sect. 6.2); \(\square \) (square—[107]); derived ones like \(\clubsuit \) (club), introduced by Ostaszewski [168] (in ‘counterexample’ constructions for general topology); and generalizations \(\clubsuit _{\text {NS}}\) studied by Woodin [225, Chapter 8]. Compare the use of NT (for No Trump) in [17, 20].

6 Beyond the constructible hierarchy L: II

6.1 Cohen’s legacy: forcing and generic extensions

The undisputed game-changer for set theory was Cohen’s ‘method of forcing’. Just as with Cantor, Cohen’s earliest research was on harmonic analysis, but his arrival on the scene was through a constant awareness, since boyhood, of developments in logic, and as though drawn thither under a slow gravitational process (in his words: ‘The continued pull of logic’—[53, Chapter 19.4]). Inspired by Skolem’s work, especially by the existence of ‘countable models’ of set theory (as in Sect. 2), his approach was model-theoretic rather than syntactical—so in contrast to Gödel’s. He devised a means, not unlike the Skolemization of formulas in Sect. 2, of importing into a countable structure \(\mathscr {M}=\langle M,\in _{M}\rangle \) additional sets from \(V\backslash M\) (V contains the reals; M, being countable, does not), without disturbing the fact that \(\mathscr {M}\) may be a model of ZF. Speaking anthropomorphically, the imported set may have the intention of introducing new information—say, the existence of a transfinite sequence of real numbers viewed by \(\mathscr {M}\) as an \(\omega _{2}^{M}\) sequence (reference here to the interpretation in \(\mathscr {M}\) of the second uncountable cardinal), albeit viewed by V as a countable sequence—without nevertheless encoding such “earth-shattering” information as that M itself is countable. Cohen described his method [52] as ultimately analogous to the construction of a field extension: introduce a name for the algebraically absent element, and then describe its properties via polynomials in that element. For stimulating commentary, see [114]. In truth the extension method shares a family resemblance with non-constructive existence proofs, either via the Baire category method (the desired item has generic features), or the Erdős probabilistic method (measure-theoretic: the desired item has ‘random’ features). 
Indeed, the two canonical instances of forcing to adjoin real numbers, Cohen’s and Solovay’s, are categorical (Cohen reals) or measure-theoretic (‘random reals’, or—perhaps better—‘Solovay reals’). Indeed, following an idea of Ryll-Nardzewski and of Takeuti, Mostowski [156] shows how to guide the selection of an imported set by reference to the points of a Baire topological space (one in which Baire’s theorem holds); avoiding a specified meagre set ensures that the extension of \(\mathscr {M}\) will be a model of ZF. The two canonical cases then correspond to two topological spaces. For an alternative unification, see [127].

One views the forcing method as acting ‘over’ a structure \(\mathscr {M}\) by providing a set P in M of partial possible descriptions of a generic object G yet to be determined. P is thus rendered as a partially ordered set, and under its ordering relation \(q\leqslant p\) is understood as saying that q contains more information about the object to be constructed than does p. There is a syntactic relation \(p\Vdash \varphi \) for \(p\in P\) and \( \varphi \) a sentence, read as ‘p forces \(\varphi \)’, which may be ‘explained’ by an induction reminiscent of the Tarski inductive definition of truth (\(\models \), in Sect. 2), but with significant differences (below).

Before embarking on the details, it is helpful to use an analogy with probability or statistical inference. Indeed, \(p\in P\) is usually called a ‘condition’ and forcing is inspired by the language of ‘conditioning’; its inferences are concerned with information about G given the information in p. Thus the forcing relation must allow for further information which may become available ‘later’, so to speak.

As a first pass, here is a brief glimpse of the character of the forcing relation: as this is a syntactical relation, we refer to a language whose terms are built from functions from P to M, and so we have (see [126, Corollary 3.7] or [128, IV.2.30]):

The final property refers to a function \(\sigma \in M^{P}\) which here acts as a name for an object yet to be interpreted, a matter we return to shortly. (This corresponds to the polynomials in the algebraically ‘absent’ element mentioned above.) A clearer picture will emerge shortly.
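As a guide to the shape of the three properties referred to here, the following sketch records the clauses in the form commonly given in textbook accounts. The exact clauses and their order are an assumption on our part, not a quotation from the sources cited; only the notation \(p\Vdash \varphi \), \(q\leqslant p\) and \(\sigma \in M^{P}\) is taken from the text.

$$\begin{aligned} p\Vdash \varphi \wedge \psi&\iff p\Vdash \varphi \text { and }p\Vdash \psi ,\\ p\Vdash \lnot \, \varphi&\iff \text {there is no }q\leqslant p\text { with }q\Vdash \varphi ,\\ p\Vdash (\exists x)\,\varphi (x)&\iff \{q\leqslant p:(\exists \sigma \in M^{P})\ q\Vdash \varphi (\sigma )\}\text { is dense below }p. \end{aligned}$$

In the last clause, ‘dense below p’ means that every \(r\leqslant p\) has an extension lying in the displayed set. The second clause is the one that makes room for information arriving ‘later’: a condition forces a negation only when no stronger condition can force the statement itself.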

Whilst a variant of the forcing relation above was Cohen’s starting point, this is now a derived concept, the usual starting point being a set G that is a filter on P (meaning here that for any \(g_{1},g_{2}\in G\) there is \(g\in G\) with \(g\leqslant g_{1},g_{2}\), and that \(p\in G\) whenever \( g\leqslant p\) for some \(g\in G\); see [126, Section 2.2.4] or [128, III.3.10]) with the property that whenever \(D\) is a dense subset of P (i.e. for each p there is \(q\leqslant p\) with \(q\in D\)) and \(D\in M\), then

$$\begin{aligned} G\cap D\ne \varnothing . \end{aligned}$$

Then G is said to be P-generic over \(\mathscr {M}\), or just generic over \(\mathscr {M}\), when P is understood. The filter approach to forcing owes much to developments of Cohen’s approach by such contemporaries as Solomon Feferman, Scott, Shoenfield, and Solovay—see [114].

For M countable, the dense subsets of P lying in M may be enumerated as a sequence \(D_{n}\), and we may choose \(p_{n}\), starting with an arbitrary \(p_{0}\in D_{0}\) and inductively with

$$\begin{aligned} p_{n+1}\leqslant p_{n},\qquad p_{n+1}\in D_{n+1}. \end{aligned}$$

The choice is possible precisely because \(D_{n+1}\) is dense. Then \(G:=\{p\in P:(\exists n)\ p_{n}\leqslant p\}\) meets each \(D_{n}\), and so is generic over \(\mathscr {M}\). The construction of such a ‘complete sequence’ is sometimes called the Cohen diagonalization argument, since, in particular, G decides every sentence \(\varphi \). Indeed, the following set is dense:

$$\begin{aligned} D_{\varphi }:=\{p\in P:p\Vdash \varphi \text { or }p\Vdash \lnot \, \varphi \} \end{aligned}$$

(as \(p\notin D_{\varphi }\) implies not (\(p\Vdash \lnot \, \varphi \)) and so \( (\exists \, q\leqslant p)\) [\(q\Vdash \varphi \)]).
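The diagonalization just described can be tried out on a finite toy case. The sketch below is only an illustration under stated assumptions: conditions are finite partial functions from the natural numbers to \(\{0,1\}\) (Python dicts), each dense set is represented by a hypothetical ‘witness’ function that extends a given condition into it, and only finitely many dense sets are met, whereas the argument in the text meets countably many. The names `dense_decide`, `dense_one_beyond` and `diagonalize` are ours.

```python
from typing import Callable, Dict, List

Condition = Dict[int, int]  # finite partial function from naturals to {0, 1}


def extends(q: Condition, p: Condition) -> bool:
    """q <= p in the forcing order: q carries at least the information in p."""
    return all(k in q and q[k] == v for k, v in p.items())


def dense_decide(n: int) -> Callable[[Condition], Condition]:
    """Density witness for D_n = {p : n is in the domain of p}."""
    def step(p: Condition) -> Condition:
        q = dict(p)
        q.setdefault(n, 0)  # if p is silent about n, decide the value 0
        return q
    return step


def dense_one_beyond(n: int) -> Callable[[Condition], Condition]:
    """Density witness for E_n = {p : p(m) = 1 for some m >= n}."""
    def step(p: Condition) -> Condition:
        if any(m >= n and v == 1 for m, v in p.items()):
            return dict(p)
        q = dict(p)
        m = max(list(p) + [n - 1]) + 1  # a fresh position, at least n
        q[m] = 1
        return q
    return step


def diagonalize(p0: Condition,
                witnesses: List[Callable[[Condition], Condition]]) -> List[Condition]:
    """Build p_0 >= p_1 >= ... with each new condition inside the next dense set."""
    seq = [dict(p0)]
    for w in witnesses:
        q = w(seq[-1])
        assert extends(q, seq[-1])  # each step only adds information
        seq.append(q)
    return seq


# Interleave the two families of dense sets and run the construction.
ws = [w for k in range(5) for w in (dense_decide(k), dense_one_beyond(k))]
chain = diagonalize({}, ws)
```

The final condition in `chain` lies in every dense set that was listed; in the countable setting the upward closure of the whole chain plays the role of the generic filter G.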

The idea is that the dense sets provide a structured way of hinting at the properties of G, and about the various ways that G might be selected, albeit conditional on some given state of knowledge p. The sequence \(p_{n}\) above runs through all possible dense sets in an arbitrary order, and brings into existence a particular realization of G. Before G is created, there are only names for G and for all the possible objects in the intended extension, given simply by the functions in \(M^{P}\). (As above, this corresponds to the use of polynomials in field extension.) But, once a generic G is given, one may proceed inductively to give an interpretation \(\tau ^{G}\) to the ‘names’ \(\tau \in M^{P}\) of objects. Inductively, put

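(mirroring the Mostowski collapse of Sect. 2), and so construct the extension \(\mathscr {M}[G]\) as the set of G-interpretations \(\tau ^{G}\). In this setting, one then defines forcing (relative to P and \(\mathscr {M}\)) by:

$$\begin{aligned} p\Vdash \varphi (\tau _{1},\dots ,\tau _{n})\iff \mathscr {M}[G]\models \varphi (\tau _{1}^{G},\dots ,\tau _{n}^{G})\text { for every }G\text { that is }P\text {-generic over }\mathscr {M}\text { with }p\in G. \end{aligned}$$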

This should clarify the three properties of the forcing relation introduced earlier.

It emerges that if \(\mathscr {M}\models \mathrm{ZFC}\), then \(\mathscr {M}[G]\models \mathrm{ZFC}\). Furthermore, if P satisfies the so-called countable chain condition (‘ccc’) (which actually calls for antichains of P in \(M\) to be countable in M), then all ordinals that are cardinals from the viewpoint of \(\mathscr {M}\) continue to be cardinals from the viewpoint of \(\mathscr {M}[G]\), and their cofinalities [106] remain the same; see e.g. [126, Theorem 5.10], or [128, Theorem IV.3.4].

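To secure the failure of CH, Cohen used as his conditions finite sets p with elements of the form:

$$\begin{aligned} (n,\alpha ,i)\qquad (n\in \omega ,\ \alpha <\omega _{2}^{M},\ i\in \{0,1\}), \end{aligned}$$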

which act as coded messages about objects, named as \(c_{\alpha }\), to be imported from outside M asserting that \(n\notin c_{\alpha }\) if \(i=0\) and \( n\in c_{\alpha }\) if \(i=1\). As with the ‘dog that did not bark’, that which p will never say allows us to infer that \(c_{\alpha }\) will be a subset of \( \omega \): this is forced to be the case, since no extension of the coded message p can say otherwise. Thus p ‘hints at information’ by the absence of information.

Formally, the corresponding P, called \(\mathrm{Add}(\omega ,\omega _{2})\) since it adds \(\omega _{2}\) many subsets of \(\omega \), may be defined in M to comprise ‘partial functions’ p with finite domain contained in \(\omega \times \omega _{2}^{M}\) and range in \(\{0,1\}\), and with the ordering of ‘increasing informativeness’ that \(q\leqslant p\) if \(p\subseteq q\), that is, q contains at least all of the information in p. The filter G in \(P\) has the property that, for each n and \(\alpha \), \(p(n,\alpha )=i\) for some \(p\in G\) and some \(i\in \{0,1\}\). Indeed, for \(n,\alpha \) as above, each of the sets

$$\begin{aligned} D_{n,\alpha }:=\{p\in P:(n,\alpha )\in \mathrm{dom}\, p\} \end{aligned}$$

is dense, as may be readily checked. (Hint: Given \(p\notin D_{n,\alpha }\) choose q to contain both p and \((n,\alpha ,0)\), i.e. put \(q(n,\alpha )=0\).) So \(G\) must meet \(D_{n,\alpha }\) for each \(n\in \omega \) and \(\alpha \in M\) (as \(\omega \subseteq M\), since \(\mathscr {M}\models \mathrm{ZFC}\)). For \(\alpha \in M\cap \omega _{2}^{M}\), put

$$\begin{aligned} G_{\alpha }:=\{n\in \omega :(\exists p\in G)\ p(n,\alpha )=1\}. \end{aligned}$$

Moreover, for distinct \(\alpha ,\beta <\omega _{2}\), put

$$\begin{aligned} {\Delta }_{\alpha ,\beta }:=\{p\in P:(\exists n)\ (n,\alpha ),(n,\beta )\in \mathrm{dom}\, p\text { and }p(n,\alpha )\ne p(n,\beta )\}, \end{aligned}$$

which is dense. (Given \(p\notin {\Delta } _{\alpha ,\beta }\) choose q to contain both p and \((m,\alpha ,1),(m,\beta ,0)\) for some large enough m.) So for distinct \(\alpha ,\beta \in M\cap \omega _{2}^{M}\), G contains a condition p with \(p(n,\alpha )=i\) and \(p(n,\beta )=1-i\), for some n and i, and indeed with \(i=1\), say (w.l.o.g.). Then \(n\in G_{\alpha }\backslash G_{\beta }\). Thus in \(\mathscr {M}[G]\) there are \(\omega _{2}^{M}\) distinct subsets of \(\omega \), and so from the viewpoint of \(\mathscr {M}[G]\) the continuum is at least \(\omega _{2}\) (since \(\omega _{2}^{M}\) is still the interpretation of \(\omega _{2}\) in \(\mathscr {M }[G]\) by the ccc, which is satisfied by P here).
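The two density hints for \(D_{n,\alpha }\) and \({\Delta }_{\alpha ,\beta }\) can be checked mechanically on a finite toy version. The sketch below is an illustration under assumptions of our own: a condition is a Python dict from pairs (n, label) to a bit, the labels are three hypothetical strings standing in for ordinals below \(\omega _{2}^{M}\), and `column` reads off from a single condition only the part of \(G_{\alpha }\) that the condition already decides.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple

Condition = Dict[Tuple[int, Hashable], int]  # finite partial function into {0, 1}


def into_D(p: Condition, n: int, a: Hashable) -> Condition:
    """Extend p into D_{n,a}: make sure (n, a) is in the domain."""
    q = dict(p)
    q.setdefault((n, a), 0)
    return q


def in_delta(p: Condition, a: Hashable, b: Hashable) -> bool:
    """p is in Delta_{a,b}: some n has p(n, a) and p(n, b) both defined and different."""
    return any((n, b) in p and p[(n, a)] != p[(n, b)]
               for (n, x) in p if x == a)


def into_delta(p: Condition, a: Hashable, b: Hashable) -> Condition:
    """Extend p into Delta_{a,b} using a fresh m, as in the hint: (m,a,1), (m,b,0)."""
    if in_delta(p, a, b):
        return dict(p)
    m = max([n for (n, _) in p] + [-1]) + 1  # 'large enough': beyond all n used by p
    q = dict(p)
    q[(m, a)] = 1
    q[(m, b)] = 0
    return q


def column(p: Condition, a: Hashable) -> Set[int]:
    """The n that p already places in the subset of omega named by a."""
    return {n for (n, x), bit in p.items() if x == a and bit == 1}


# Meet Delta for every pair of labels, one pair at a time.
labels = ["alpha", "beta", "gamma"]
p: Condition = {}
for a, b in combinations(labels, 2):
    p = into_delta(p, a, b)
```

After the loop, the columns of `p` for distinct labels already differ, which is the finite shadow of the statement that the sets \(G_{\alpha }\) are pairwise distinct.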

We have just given an example of importing a set in order to increase the cardinality of the continuum. (Note that this construction may be repeated with \(\omega _{2}^{M}\) replaced by \(\omega _{\tau }^{M}\) for \(\tau \) with any cofinality other than \(\omega \), that being the only restriction (König’s theorem) on the cofinality of the continuum—see Sect. 1.)

An important ingredient in Solovay’s result on LM in [204] (as simplified by Kenneth McAloon) in constructing a model of \(\mathrm{ZF}+\mathrm{DC}\) in which all sets of reals are Lebesgue measurable (cf. [113, Chapter 13, Section 11]), to which we refer in Sect. 10.2, is the use of a further partial order \(P^{\kappa }\). This had been introduced by Lévy in order to alter/collapse a (strongly) inaccessible cardinal \(\kappa \) so that in the ‘extension’ \(\mathscr {N}=\mathscr {M}[G^{\kappa }]\) (\(G^{\kappa }\) being \(P^{\kappa }\)-generic over \(\mathscr {M}\)) it is the ordinal \(\kappa \) that appears as the first uncountable cardinal \(\omega _{1}^{N}\). Consequently the ordinals below \(\kappa \) are made to be countable by the importation of appropriate (generic) enumerations. Interest focuses on the substructure \(\mathscr {N}_{1}\) with domain the sets that are hereditarily definable (over \(\mathscr {N}\)) from a parameter in \( \mathscr {N}\cap On^{\omega }\) (i.e. from an \(\omega \)-sequence of ordinals in \(\mathscr {N}\)), much as defined earlier in Sect. 2. \(\mathscr {N}_{1}\) satisfies the axioms ZF (see [161]), and, significantly here, shares with \(\mathscr {N}\) the same \(\omega \)-sequences of ordinals, in particular the same reals. (Here the reals are identified via binary expansions (\(\omega \)-sequences) with characteristic functions of subsets of \(\omega \).)

The Lévy conditions (elements of \(P^{\kappa })\) this time are partial functions with finite domain and range in \(\kappa \). Since there are no bounds placed on the range values of the partial function in this P, it follows that for \(\alpha <\kappa \) the functions defined (from \(G^{\kappa }\) above) by:

will collectively witness (by enumeration) that each \(\alpha <\kappa \) is countable. This ensures that \(\kappa \) “viewed from” \(\mathscr {M}[G^{\kappa }]\) is \(\omega _{1}\). Solovay’s purpose is to turn any transfinite sequence of ordinals below an inaccessible \(\kappa \) into an \(\omega \)-sequence. This helps him turn an arbitrary set of reals A that lies in \(\mathscr {N}_{1}\), initially definable in \(\mathscr {N}\) via ordinal parameters, into one that is definable via a real \(a\in N\). (This also carries the advantage that, since \( \kappa \) retains its inaccessibility in the ‘extension’ (see [204, I.2.7]), one may w.l.o.g. argue as though \(\mathscr {M}[a]\) were \(\mathscr {M}\).) As both \( \mathscr {N}\) and \(\mathscr {N}_{1}\) have the same reals, they also have the same open sets (coded by reals detailing the sequence of rational-ended intervals that they contain), so the same Borel sets, and the same null \(\mathscr {G}_{\delta }\)-sets.

Solovay’s surprising innovation was to force over \(\mathscr {M}[a]\) using its non-null Borel sets \(\mathscr {B}_{+}\), ordered by inclusion (smaller sets yielding more information as to location). The key idea here is to introduce the notion of a random real, namely a real that cannot be covered by any null \(\mathscr {G}_{\delta }\)-set coded canonically by a real c of the model \(\mathscr {M}[a]\). (Solovay thought of these as ‘random’ (over \(\mathscr {M}[a]\)); we have already mentioned that Cohen reals are categorical, while random (‘Solovay’) reals are measure-theoretic; the term generic was already in use, so unavailable. Compare our earlier use of the language of probability and statistical inference above. One might also mention the term pseudo-random number in computer simulation.) But, the model being countable, there are only countably many such codes, so in V the set of non-random reals is null. For a set \(A\subseteq \omega ^{\omega }\) (in N) that is definable from an \( \omega \)-sequence of ordinals (i.e., by a sequence from \(On^{\omega }\)), suppose that with a as above, for some formula \(\varphi _{A}\) say, \(A=\{x\in \omega ^{\omega }:\mathscr {N}\models \varphi _{A}(x,a)\}\). It emerges ([204, I.4], [113, 10.21]) that \(\mathscr {N}=\mathscr {M}[G^{\kappa }]\) may also be expressed in the form \(\mathscr {N}=\mathscr {M}[a][G']\) for some filter \(G^{'}\) that is \(P^{\kappa }\)-generic over \({\mathscr {M}}[a]\). This enables one to choose a formula \(\psi _A\) (by some deft ‘unscrambling’; see [204, III.1.4/5], cf. [113, p. 140]) such that, for x random over \(\mathscr {M}[a]\),

$$\begin{aligned} x\in A\iff \mathscr {M}[a][x]\models \psi _{A}(x,a). \end{aligned}$$

In \(\mathscr {B}_{+}\) choose a maximal (necessarily countable, by positivity of measure here) antichain of Borel sets \(\mathscr {C}\) whose elements ‘decide’ the sentence \(\psi _{A}(\dot{r},\check{a})\) (i.e. force the sentence or its negation), where \(\check{a}\) is a name in the language LST for the set a given above, and \(\dot{r}\) is a name for a random real (cf. the use of \(\dot{q}\) in Sect. 6.2). Then, referring to \(\mathscr {B}_{+}\)-forcing, for all x random over \(\mathscr {M}[a]\)

here \(F_{c}\) is a non-null closed set canonically coded by c (cf. [113, p. 140]). So modulo the null set of non-random reals, A is an \(\mathscr {F}_{\sigma }\): so A is (Lebesgue) measurable. In summary: an arbitrary set of reals A in \( \mathscr {N}_{1}\), being hereditarily definable from an ordinal sequence, is measurable. Though we do not pursue the details here, Solovay shows that DC holds in \(\mathscr {N}_{1}\); note that later Mathias [142] proved that also the partition relation \(\omega \rightarrow (\omega )_{2}^{\omega }\) holds in this model.

6.2 Forcing axioms

Solovay’s argument makes heavy use in various ways of ‘two-step extensions’ like \(\mathscr {M}[G][H]\) with G an \(\mathscr {M}\)-generic filter and H an \(\mathscr {M}[G]\)-generic filter. By implication, G is associated with a partial order P in \(\mathscr {M}\) and H with a partial order Q in \(\mathscr {M}[G]\). This can be turned into a one-step extension of \(\mathscr {M}\), and in a perspicuous way (more general than cartesian products), so that a generic extension of a generic extension is again a generic extension, as we now explain. There is a family resemblance here to the use made in probability theory of the law of iterated conditional expectation (the tower law), which involves iterated conditioning by comparable \(\sigma \)-fields. Since the model \(\mathscr {M}[G]\) is created by interpreting ‘names’ (using G as in \(\tau ^{G}\) above), the partial order for the equivalent single step needs to be built out of P and out of a name \(\dot{Q}\) for Q, and must refer to pairs \((p,\dot{q})\) with \(p\in P\) and \(\dot{q}\) a name for something that is P-forced to lie in \(\dot{Q}\); likewise, the order on the resulting composition of the two partial orders, denoted \(P*\dot{Q}\), must make use of how the P-conditions P-force the extension property \( \dot{q}\leqslant \dot{q}^{\prime }\) between relevant names for elements of \(Q\). Thus a kind of syntactical analysis in \(\mathscr {M}\) underlies this ‘iterated forcing’. More generally, any ordinal \(\alpha \) of \( \mathscr {M}\) can provide the basis for \(\alpha \)-step iterations, and, as with the bases for the various topologies on products so too here, various kinds of \(\alpha \)-iterations may be constructed by appropriate constraints on the supports (e.g. finite or countable). 
We omit the details, except to mention that it was by use of such an iteration that Solovay and Tennenbaum [208] showed that it is consistent that no Suslin continuum exists (so otherwise than in L, where such does exist); this led to the more general observation, proved by Martin and Solovay: the consistency of Martin’s Axiom, MA ([140], cf. [75]), namely the statement that for all cardinals \( \kappa \) below the continuum (\(\kappa < \mathfrak {c}\)) the following holds:

For every partial order P satisfying the countable chain condition (ccc), and any family \(\mathscr {F}\) with \(|\mathscr {F}|\leqslant \kappa \) of dense subsets of P there is a filter G in P which meets each \(D\in \mathscr {F}\).

The reader will notice the similarity between the property of G here and that of a filter P-generic over \(\mathscr {M}\); indeed Martin (and independently Rowbottom) proposed this axiom as a combinatorial principle that is ‘forcing-free’—so, in particular, with the potential for immediate applicability without expertise in logic. That potential was so quickly realized both in theorem-proving and counterexample-manufacture—look no further than [75]—that it became the ‘tool of first choice’ when abstaining from CH whilst harbouring CH-like intuitions, because, like Zorn’s Lemma, it encapsulates a ‘construction without (transfinite) induction’: the latter is replaced with side-conditions swept away into \( \mathscr {F}\), the family of dense sets. Of course, the ‘implied’ induction was performed, off-line so to speak, in the Martin–Solovay paper [140], aptly titled ‘Internal Cohen extensions’, reflecting the view that MA asserts that the universe of sets is closed under a large class of generic extensions. This will be a recurring theme below.

In regard to MA’s huge significance as an alternative to the continuum hypothesis: we cite after Martin and Solovay [140] the statistic that at least 71 of 82 consequences of CH, as given in Sierpiński’s monograph [197], are decided by MA or \(\mathrm{MA}+\lnot \mathrm{CH}\). Amongst these are that MA implies:

  (1) \(2^{\aleph _{0}}\) is not a real-valued measurable cardinal;

  (2) the union of less than \(2^{\aleph _{0}}\) (Lebesgue) null/meagre sets of reals is null/meagre;

  (3) Lebesgue measure is \(2^{\aleph _{0}}\)-additive;

and that \(\mathrm{MA}+\lnot \mathrm{CH}\) implies:

  (1) Suslin’s hypothesis (SH) that every complete, dense, linear order without first and last elements in which every family of disjoint intervals is at most countable (the Suslin condition) is order-isomorphic to \(\mathbb {R}\);

  (2) every \(\varvec{\Sigma }_{2}^{1}\) set of reals (for the \( \varvec{\Sigma }\) and \(\varvec{\Pi }\) notation of the projective hierarchy see Sect. 9) is Lebesgue measurable and has the Baire property;

  (3) every set of reals of cardinality \(\aleph _{1}\) is \(\varvec{\Pi }_{1}^{1}\) (co-analytic) iff every \(\aleph _{1}\) union of Borel sets is \(\varvec{\Sigma }_{2}^{1}\).

On a personal note, one of the present authors [167] considered consequences for aspects of the theory of Hausdorff measures [188] and measures of Hausdorff-type, cited in [75, 31I (d)].

It is worth remarking that an equivalent of MA is the topological statement that, in a compact Hausdorff space whose open sets satisfy the countable chain condition, the union of less than \(2^{\aleph _{0}}\) meagre sets is meagre [75, 223]. This identifies MA as a variant of Baire’s Theorem, and gives it a special role in the investigation of the additivity properties etc. of classical ideals such as the null and meagre sets, for which see [8] and Sect. 10.6.

Given its particular usefulness and origin, MA, termed a Forcing Axiom, inspired the search for further, more powerful, forcing axioms. The first to occupy centre-stage is the Proper Forcing Axiom (PFA). This is an extension of \(\mathrm{MA}+\lnot \mathrm{CH}\), which draws in more model theory. At the price of replacing all the cardinals \(\kappa <\mathfrak {c}\) by allowing just \(\kappa =\aleph _{1}\), PFA relaxes the ‘ccc’ restriction. (In fact, Todorčević and Veličković [216, 221] showed that PFA implies that \(\mathfrak {c}=\aleph _{2}\), so allowing ‘back in’ all the, rather few, cardinals \(\kappa <\mathfrak {c}\).) The relaxation widens access to the class of proper partial orders (below), and so asserts:

\(\mathrm{PFA}:\) For every partial order P that is proper and any family \(\mathscr {F}\) with \(|\mathscr {F}|\leqslant \aleph _{1}\) of dense subsets of P there is a filter G in P which meets each \(D\in \mathscr {F}\).

The definition of properness refers to the interplay between the whole of the partial order P and those fragments of P that appear in ‘suitably rich’ countable structures, as follows. A partial order P is proper if, for any regular uncountable cardinal \(\kappa \) and countable model \(\mathscr {M}\prec H(\kappa )\) (the family of sets hereditarily of cardinal less than \(\kappa \) [66, Chapter 3, Section 7]; for the meaning of \(\prec \) see Sect. 3) with \(P\in M\):

For each \(p\in P\cap M\) and each \(q\leqslant p,\) every antichain \(A\in M\) contains an element r compatible with q.

(This formulation obviates the need to refer to ‘maximal antichains’.) The class of proper partial orders includes both those satisfying ccc (which preserve cardinalities and cofinalities) and those with countable closure (i.e. guaranteeing a lower bound for any decreasing \(\omega \)-sequence). A consistency proof for PFA needs use of a supercompact cardinal (for which see Sect. 4.3). See [9] for applications and discussion (especially remarks after Theorem 3.1 there concerning the need for a supercompact and its ‘reflection properties’), and also [63, 128, V.7], and the more recent [149]. A wider variant still is SPFA, based on \(\aleph _{1}\)-semiproper forcing. The maximal version, known as Martin’s Maximum (MM), was introduced by Foreman, Magidor and Shelah [73], and like PFA needs a supercompact cardinal for a proof of its consistency. Here the role of \( \omega _{1}\) as \(\aleph _{1}\) (in merely prescribing a cardinality bound) changes in order to create an \(\omega _{2}\)-chain condition, as we shall see presently. Prominence is given now to the stationary subsets of \( \omega _{1}\) (defined below), cf. Sect. 5.2; these are the ‘non-negligible’ subsets in relation to coding, and their definition draws on some associated ‘large’ sets, namely: the subsets that are closed and unbounded (cofinal) in \(\omega _{1}\), with which we begin. A set \(C\subseteq \omega _{1}\) is closed if it contains all its limit points (i.e. \(\alpha \in C\) for limit \(\alpha \) whenever \(C\cap \alpha \) is cofinal in \( \alpha \)); the closed unbounded sets generate a filter, as any two unbounded closed sets meet (assuming a context where \(\omega _{1}\) has uncountable cofinality). A subset \(S\subseteq \omega _{1}\) is stationary if S meets every closed unbounded set. In MM, the partial orders P are required to preserve stationarity. This condition is motivated by a question about the ‘negligible sets’ comprising the non-stationary ideal, i.e. 
the ideal of non-stationary sets (denoted \(\ell _{\mathrm{NS}}\) or NS\(_{\omega _{1}}\)): whether it is \(\omega _{2}\)-saturated, i.e. whether every \(\omega _{2}\)-sequence of stationary sets contains at least two members intersecting again in a stationary set. If so, then the Boolean algebra \(\wp (\omega _{1})/\ell _{\mathrm{NS}}\) is complete and satisfies the \(\omega _{2}\)-chain condition. MM implies this.

It is interesting to summarize the last paragraph by saying that here, just as in Solovay’s construction of Sect. 6.1 for LM (which uses an inaccessible), large cardinals act as enablers of forcing iterations. For a textbook treatment see [225].

Woodin [225, 226] has forcefully argued for a canonical model where CH fails (cf. Coda); it is a forcing extension of \(L(\mathbb {R})\), i.e. of the Hajnal ‘constructible closure’ of \(\mathbb {R}\) (the class of sets constructible from some real in V; see [66, Chapter 5, Section 6.1], cf. [113, Chapter 1, Section 3]; this is not to be confused with the Lévy class of sets ‘constructible relative to a given set’ [66, Chapter 5, Section 6.2], which occurs in Sect. 5 in the shape of \(L[\mathscr {U}]\) with distinct notation). To distinguish between L(U) and L[U], one may follow Kunen in speaking of constructing respectively “from U as a set” (so that \(U\in L(U)\)) and “from U as a property” (so that \(U\cap x\in L[U]\) for \(x\in L[U]\)). (Recall, however, from Sect. 6.1 that \(\mathscr {M}[G]\) contains G.)

7 Suslin, Luzin, Sierpiński and their legacy: infinite games and large cardinals

After the (necessarily) extensive excursion into logic and model theory, we now re-anchor all this to analytic practice. Henceforth, we intertwine these two aspects. For the analyst’s point of view on set theory, we can do no better at this point than to cite C. A. (Ambrose) Rogers, a modern-day analyst par excellence (with a pedigree of: Geometry of Numbers, Discrete geometry, Convexity, Hausdorff measures, Topological descriptive set theory). In his last phase (post 1960), Rogers famously ‘would often give talks entitled “Which sets do we need?”, his answer being: “analytic sets”’ (cited from [174]). To these we now turn. For background here, see [189].

7.1 Analytic sets

Analytic subsets of \(\mathbb {R}\) are precisely the sets that arise as projections of planar Borel sets. Their initial (‘classical’) study, principally by Suslin, Luzin and Sierpiński, was prompted by Lebesgue’s erroneous assertion, in the course of his research on functions that are ‘analytically representable’, that these projections were Borel. But they need not be, as was first observed by Suslin in 1916. Indeed, an analytic set is Borel iff its complement is also analytic [209]. Until that moment the typical sets considered by analysts were Borel. Fortunately for Lebesgue’s research goals, analytic sets are extremely well-behaved: in the first place projections of analytic sets are inevitably analytic, and furthermore they have the following three regularity properties (the classical regularity properties below): they are measurable [133], they have the property of Baire [165], and likewise the perfect-set property [1] (they are either countable or contain a perfect set), and in certain circumstances are well approximable from within by compact subsets (they are ‘capacitable’—a property discovered independently by Davies [61] in 1952 and in a general topological context by Choquet in 1952 [43,44,45,46]).

The newly discovered sets emerged as the first-level sets of the (Luzin) projective hierarchy (also called the analytical hierarchy) generated from the Borel sets by alternately applying the operation of projection and complementation (a fact later recognized also through the analysis of their logical complexity: counting how many alternations of existential and universal quantifiers over the reals are needed to define them, and identifying the preliminary quantifier: be it existential or universal). However, the very successful classical study of analytic sets struggled to promote much of the ‘good behaviour’ up the hierarchy. At the margins, of particular interest, was Kondô’s uniformization theorem of 1939 (that a co-analytic planar set has a co-analytic uniformization, i.e. contains a co-analytic graph selecting one point from each vertical section). See Jayne and Rogers [104, Introduction] for the role of AC in selection theorems generally.

The message from set theory in Gödel’s inner universe of sets L was particularly depressing: Kondô’s theorem implied the existence in L of an analytic set whose complement failed to have the perfect-set property (the culprit was the canonical well-ordering of L, which relative to L lies at the second projective level—for a particularly insightful analysis, see [85], and also [222] for its ‘black-box’ approach, that tracks only descriptive character).

Further progress seemed doomed. But an unlikely development, in the shape of a game-theoretic rival to AC, unblocked the log-jam. However, it was left to a later generation to pore over the classical achievements to extract the necessary inspiration from the classicists by drawing in a further theme: the Banach–Mazur games.

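To explain this development we need to explore some analytic-set theory. Suslin’s characterization [209] in 1917 of analytic sets \(S\subseteq \mathbb { R}\) asserts they may be represented in the form

$$\begin{aligned} S=\bigcup _{\mathbf {i}\in \mathbb {N}^{\mathbb {N}}}F(\mathbf {i}),\qquad \text {where }F(\mathbf {i}):=\bigcap _{n\in \mathbb {N}}F(\mathbf {i}|n), \end{aligned}$$

where each of the determining sets \(F(\mathbf {i}|n)\) is closed and of diameter at most \(2^{-n}\), so that \(F(\mathbf {i})\) has at most one member; here

$$\begin{aligned} \mathbf {i}|n:=(i_{1},\dots ,i_{n})\quad \text {for }\mathbf {i}=(i_{1},i_{2},\dots ). \end{aligned}$$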

(For this reason, the operation taking a determining system to the set S above is now usually called the Suslin operation, though it is sometimes called the A-operation as in [129], apparently named for Alexandrov, who had devised it to construct perfect subsets of uncountable Borel sets [1].) Implicit in the formula is an operation on the determining system of sets \(\langle F(\mathbf {i}|n)\rangle \), which includes countable intersection and countable union (and preserves analyticity if the determining system comprises analytic sets, rather than specifically closed sets [189, Part 1, Section 2.3]). The Suslin representation goes beyond countable union seemingly towards a continuum union, but one that is constrained by the upper hemi/semi-continuity of the map \(\mathbf {i}\mapsto F(\mathbf {i})\).

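Under this ‘continuous union’ lie hidden the countable ordinals, by virtue of the countable tree T of all finite sequences \(\mathbf {i}{|}n\) (ordered by sequence extension). For any x the associated subtree

$$\begin{aligned} T_{x}:=\{\mathbf {i}|n\in T:x\in F(\mathbf {i}|m)\text { for all }m\leqslant n\} \end{aligned}$$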

is well-founded iff \(x\notin S\), as then \(T_{x}\) has no paths (infinite branches); indeed \(x\notin F(\mathbf {i})\) for all \(\mathbf {i}\). (This tree idea, with the replaced by rationals, goes back, albeit under the name ‘sieve’ (crible), to Lebesgue’s construction of a measurable set that is not Borel.) The overall complexity of the subtree may then be measured by a countable ordinal, known as the Luzin–Sierpiński index of the tree \(T_{x}\) (or of the point x)—[134]. This is obtained rather as the Cantor–Bendixson index of a scattered set is obtained by the repeated (inductive) removal of isolated points, except that here one removes at each stage the terminal nodes of a tree. (A moment’s reflection shows this corresponds to a linear ordering of the finite sequences, akin to lexicographic but adjusted to allow shorter sequences to preceed their longer extensions, such that the tree \(T_{x}\) is well-ordered iff it is well-founded: this is the Kleene–Brouwer order.)
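
In symbols, the pruning procedure just described may be sketched as follows; the notation \(T^{(\alpha )}\) is ours, introduced only for illustration, and conventions for the exact value of the index differ slightly between authors. Starting from a tree \(T_{x}\), put

$$\begin{aligned} T^{(0)}&:=T_{x},\\ T^{(\alpha +1)}&:=T^{(\alpha )}\setminus \{t\in T^{(\alpha )}:t\text { has no proper extension in }T^{(\alpha )}\},\\ T^{(\lambda )}&:=\bigcap _{\alpha <\lambda }T^{(\alpha )}\quad (\lambda \text { a limit ordinal}). \end{aligned}$$

If \(T_{x}\) is well-founded, the sets \(T^{(\alpha )}\) decrease strictly until they are empty, and since \(T_{x}\) is countable this happens at some countable stage; the least \(\alpha \) with \(T^{(\alpha )}=\emptyset \) may then serve as the index. If instead \(T_{x}\) has an infinite branch, the nodes on that branch are never removed, so no stage is empty. For instance, the tree with nodes \(\emptyset ,(0),(1),(1,0)\) loses \((0)\) and \((1,0)\) first, then \((1)\), then \(\emptyset \), and so is empty at stage 3.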

When the determining system of S (i.e. the family of sets \(F(\mathbf {i}|n)\) above) consists of closed sets, it readily follows, via its countable transfinite definition, that the set of points x in the complement of S with index bounded by a fixed \(\alpha <\omega _{1}\) is Borel. It is also immediate that the complement of an analytic set is a union of \(\omega _{1}\) Borel sets, since the index is bounded by \(\omega _{1}\). The important boundedness property of the index (that it remains bounded over any analytic set \(S^{\prime }\) in the complement of S by a corresponding countable ordinal, a matter that hinges on the ‘continuous union’ aspect) leads to a proof of the First Separation Theorem: disjoint analytic sets may be covered by disjoint Borel sets. From here, as an immediate corollary, an analytic set with analytic complement is Borel.
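
The two steps of this argument may be written out as follows; here \(\rho (x)\) denotes the index of a point \(x\notin S\), and \(B_{\alpha }\) is our ad hoc notation, not taken from the sources cited. One has

$$\begin{aligned} \mathbb {R}\backslash S=\bigcup _{\alpha <\omega _{1}}B_{\alpha },\qquad B_{\alpha }:=\{x\notin S:\rho (x)\leqslant \alpha \}\text { Borel}, \end{aligned}$$

and, by boundedness, for an analytic set \(S^{\prime }\) disjoint from S there is \(\alpha <\omega _{1}\) with

$$\begin{aligned} S^{\prime }\subseteq B_{\alpha }\subseteq \mathbb {R}\backslash S, \end{aligned}$$

so that the disjoint Borel sets \(B_{\alpha }\) and \(\mathbb {R}\backslash B_{\alpha }\) cover \(S^{\prime }\) and S respectively. If moreover \(S^{\prime }=\mathbb {R}\backslash S\), this forces \(S^{\prime }=B_{\alpha }\), which is the corollary on sets that are analytic with analytic complement.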

7.2 Banach–Mazur games and the Luzin hierarchy

We recall that a Banach–Mazur game with target set \(S\subseteq \mathbb {R}\) is an infinite positional game which may be viewed as played by two players ‘alternately picking ad infinitum’ the digits of a decimal expansion of a real number—but this needs the interpretation that each player selects a function (a strategy) determining that player’s choice of next digit, given the current position—with the first player declared the winner iff the real number generated from the play of the two strategies falls in S, and otherwise the second. The target set S is said to be determined if one or other of the players has a winning strategy. Mazur proposed the game (this is Problem 43 in the Scottish Book, [144]), and Banach responded in 1935 by characterizing determinacy by the property of Baire. See [80] for an alternative infinite game which offers a measure-theoretic result as a contrast to Banach’s category result.

It is clear from its description that the game offers a natural interpretation for a sequence of choices in a manner related to ACC. In 1962 Mycielski and Steinhaus [158] proposed the Axiom of Determinacy (AD) as an alternative to AC—in essence setting the task of ascertaining its consistency relative to ZF. See [157] for an account of the consequences of AD current in 1964, making the case that, in a hoped-for subuniverse of sets in which AD holds, the well-known ‘paradoxes’ (Hausdorff, Banach–Tarski, etc.) flowing from AC would be ruled out, while at the same time preserving standard analysis in \(\mathbb {R}\) (since ‘countable choice’ for a countable family with union at most a continuum of members follows from AD—and so, in view of the continuum restriction, it is usual to work with \(\mathrm{AD}+\mathrm{DC}\)).

We may pass now to a generalization of Suslin’s representation for analytic sets, which enabled higher-level analogues of the classical regularity properties. Interpreting \(\mathbb {N}^{\mathbb {N}}\) as the set of irrationals (via continued fraction expansion), we may w.l.o.g. assume that \(S\subseteq \mathbb {N}^{\mathbb {N}}\). This carries the simplifying advantage that, ignoring a countable set of lines, we may easily identify planar sets, regarded as lying in \(\mathbb {N}^{\mathbb {N}}\times \mathbb {N}^{\mathbb {N}}\), with subsets of \(\mathbb {N}^{\mathbb {N}}\) (merging a pair \((x,y)\) into a single sequence \(\langle x,y\rangle \)) and so regard projection as an operation from \(\mathbb {N}^{\mathbb {N}}\) to \(\mathbb {N}^{\mathbb {N}}\). Replacing \(F(\mathbf {i}|n)\) by its \(2^{-n}\) open swelling yields that \(s\in S\) iff for some \(\mathbf {i}\in \mathbb {N}^{\mathbb {N}}\)

here we interpret \(s|n\) as a (rational) point of \(\mathbb {R}\) (and implicitly refer to the metric of first difference: \(d(x,y)=2^{-n}\), when x and y differ first in their \(n^{\text {th}}\) term). We can tidy up further while working in \(\mathbb {R}\), by assuming \(F(\mathbf {i}|n)\) compact and replacing it with a union of a finite number of rational-ended closed intervals. Coding such finite unions in \(\mathbb {N}\), we arrive at a reformulation of Suslin’s characterization: for T a tree of finite (pairs \((u,v)\) of) sequences, define the projection of T into \( \mathbb {N}^{\mathbb {N}}\) by

$$\begin{aligned} p(T):=\{x\in \mathbb {N}^{\mathbb {N}}:(\exists y\in \mathbb {N}^{\mathbb {N}})(\forall n)\;(x|n,y|n)\in T\}; \end{aligned}$$

then \(S\) is analytic iff \(S=p(T)\) for some appropriate tree T of finite sequences of elements of \(\mathbb {N}\times \mathbb {N}\). The generalization to a \(\gamma \)-Suslin set for ordinals \(\gamma \) is obtained by taking trees T of finite sequences of elements from \(\mathbb {N}\times \gamma \), and provides the context allowing the regularity properties of category and measure to be lifted up the projective hierarchy.

A set that is \(\gamma \)-Suslin for some \(\gamma \) is said to be a homogeneously Suslin set if there is an \(\omega _{1}\)-complete ultrafilter \(\mathscr {U}_{x|n}\) on \(\gamma ^{n}\) for each \(x|n\) such that for all n

(membership witnessed via a ‘large’ set of nodes), and the following holds

(projection equivalent to passage through ‘large’ sets of nodes at each height/level; the sequence \(\langle \mathscr {U}_{x|n}\rangle \) is then said to be countably complete). In using the index set \(\gamma ^{<\omega }\) these generalizations sound muted echoes of the non-separable theory of analytic sets (pioneered in the West by Stone, Hansell, Sion, for which see [212] and [170], and in Central Europe by Frolík, Holický, Pol).

Martin, generalizing [138], shows in [141, Theorem 2.3] that homogeneously \( \gamma \)-Suslin sets are determined (as well as having the classical regularity properties of Sect. 4.2), and that if Ramsey cardinals exist, then co-analytic sets are homogeneously Suslin. This last result is a re-interpretation of Martin’s earlier theorem [138] that if there is a Ramsey cardinal (e.g. if there is a measurable cardinal), then analytic games are determined.

Two features of the analysis of a co-analytic set C via the Luzin–Sierpiński index are of great significance to the study of projective sets. First, the index maps to the ordinals, i.e. into a well-ordered set, and so the index induces a prewellordering, rather than a well-ordering on the set C (as distinct points of C may be mapped to the same ordinal). Secondly, denoting the index by \(\rho \), the relation

$$\begin{aligned} R^{+}(x,y):=x\in C\text { and }\rho (x)\leqslant \rho (y), \end{aligned}$$

and its negation \(R^{-}(x,y)\) are both Borel, and so both co-analytic. Taking an abstract viewpoint, a class \({\Gamma } \) of sets in \(\mathbb {N}^{ \mathbb {N}}\) may be said to have the prewellordering property if for every set \(C\in {\Gamma } \) there is a map \(\rho :C\rightarrow On\) such that both of \(R^{\pm }(x,y)\) are in \({\Gamma } \). (The map is then called a \( {\Gamma } \)-norm.) Suppose that the complementary class \(\check{{\Gamma }} \) (i.e. of sets with complement in \({\Gamma } \)) is, like the analytic sets, closed under projection; then the class of sets \(\exists ^{1}{\Gamma } \) obtained as the projections of sets in \({\Gamma } \) also has the prewellordering property. This would have been clear to Luzin and Sierpiński; but, with the introduction of determinacy, a new feature arises (we omit one technicality below):

The First Periodicity Theorem ([139, 153]): For a class of sets \({\Gamma }\) for which the sets in the ambiguous class \({\Delta } _{{\Gamma } }:={\Gamma } \cap \check{{\Gamma }}\) are determined: for every \(C\in {\Gamma } \), if C admits a \({\Gamma } \)-norm, then \(\forall ^{1}C:=\{x:(\forall y)\;(x,y)\in C\}\) admits a norm in the class of sets \(\forall ^{1}\exists \, ^{1}{\Gamma } \), i.e. in the class of sets of the form \(\{x:(\forall y)(\exists z)\;(x,y,z)\in C^{\prime }\}\) for some \(C^{\prime }\) in \({\Gamma } \).

Thus, in particular: inductively, if the \(\varvec{\Sigma }_{2n}^{1}\)-class (for the \(\varvec{\Sigma }\) and \(\varvec{\Pi }\) notation of the projective hierarchy, again see Sect. 9) has the prewellordering property, then so does the \(\varvec{\Pi }_{2n+1}^{1}\)-class, assuming determinacy of the ambiguous class \(\varvec{\Delta }_{2n}^{1}\). The \(\varvec{\Pi }_{2n+1}^{1}\)-class yields quite directly a prewellordering for the class \(\varvec{\Sigma } _{2n+2}^{1}\): if \(A=\{x:(\exists y)\;(x,y)\in C\}\) for C in \(\varvec{\Pi }_{2n+1}^{1}\) with norm \(\rho _{C}\), then a norm (of the corresponding class) for A may be defined by

$$\begin{aligned} \rho _{A}(x):=\min \{\rho _{C}(x,y):(x,y)\in C\}. \end{aligned}$$

Thus, given the determinacy, the prewellordering property ‘zig-zags’ between the \(\varvec{\Pi }\) and \(\varvec{\Sigma }\) classes.

Part of the motivation to take a game-theoretic approach to the projective sets was the appearance in 1967 of a new proof of the earlier mentioned Suslin separation theorem for analytic sets (actually of the stronger variant: Kuratowski’s Reduction Theorem, [129, II, Section 26], [189, 5.8]) given by David Blackwell [27] on the basis of the Gale–Stewart proof of the determinacy of open sets [79] of 1953. This caught the attention of Martin and Moschovakis, who thus independently arrived at the first of the periodicity theorems. The wealth of insights thereafter is history: witness the very title of Mathias’s ‘Surrealist landscape with figures’ survey [143], capturing the spirit of the time.

It was a careful reading of Kondô’s proof of the uniformization of \(\varvec{\Pi }_{1}^{1}\)-sets by a \(\varvec{\Pi }_{1}^{1}\) graph that initially led Moschovakis to isolate a more general kind of \({\Gamma } \)-norm: that of a \({\Gamma } \)-scale which refers to an \(\omega \)-sequence of \( {\Gamma } \)-norms \(\rho _{m}\) defined on a set C of \({\Gamma } \) with associated relations in \({\Gamma } \) (as with the single \({\Gamma } \)-norm above), but with an additional ‘convergence-guiding’ property:

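For any sequence \(c_{n}\in C\) with \(c_{n}\rightarrow c_{0}\), if for each m

$$\begin{aligned} \rho _{m}(c_{n})=\lambda _{m}\quad \text {for all sufficiently large }n, \end{aligned}$$

then \(c_{0}\in C\) and \(\rho _{m}(c_{0})\leqslant \lambda _{m}\) for all m. (See e.g. [139, Section 8.2].)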

Mutatis mutandis, the Moschovakis Second Periodicity Theorem [153] has the same form as the First but with \({\Gamma } \)-scale replacing \({\Gamma } \)-norm throughout. Analogously, the Second Theorem implies that the Kondô uniformization property likewise zigzags between the \(\varvec{\Pi }\) and \( \varvec{\Sigma }\) classes—see [151].

Guided by the original \(\varvec{\Pi }_{1}^{1}\)-norm (the Luzin–Sierpiński index), having range in \(\omega _{1}\) (less, if the \(\varvec{\Pi } _{1}^{1}\) set in question is Borel), one defines the projective ordinal of level n by reference to the sets in the ambiguous class \( \varvec{\Delta }_{n}^{1}\)

$$\begin{aligned} \varvec{\delta }_{n}^{1}:=\text {supremum of the lengths of prewellorderings in }\varvec{\Delta }_{n}^{1}. \end{aligned}$$

(Naturally, evaluation or estimation of these ordinals, under suitable axiomatic assumptions throws some light on the size of the continuum.) Martin showed that \(\varvec{\delta }_{2}^{1}\leqslant \omega _{2}\), with equality implied under AD by the Moschovakis result that \(\varvec{\delta }_{n}^{1}\) for \(n\geqslant 1\) is a cardinal and that, under the hypothesis PD that all projective sets are determined (see Sect. 10 and references there), \(\varvec{ \delta }_{2n}^{1}<\varvec{\delta }_{2n+2}^{1}\). Under \(\mathrm{AD}+\mathrm{DC}\), \(\varvec{\delta } _{2n}^{1}=(\varvec{\delta }_{2n-1}^{1})^{+}\) (i.e. the even-indexed ordinal is the successor of the preceding odd-indexed one); furthermore, Jackson’s theorem [102, 103] asserts that under \(\mathrm{AD}+\mathrm{DC}\),

$$\begin{aligned} \varvec{\delta }_{2n+1}^{1}=\aleph _{w(2n-1)+1}, \end{aligned}$$

where \(w(n)\) is defined via iterated (ordinal-) exponentiation inductively so that \(w(n+1)=\omega ^{w(n)}\) with \(w(1)=\omega \). A concerted effort to assess the consistency strength of the determinacy assumption for \(\varvec{\Pi }_{n+1}^{1}\) ultimately led to the result that this is implied by the existence of n Woodin cardinals below a measurable cardinal (see e.g. [113, 32.12]). A measure of the ‘consistency closeness’ at one end is the equiconsistency of \(\varvec{\Delta }_{2}^{1}\)-determinacy with the existence of one Woodin cardinal [113, 32.17], and at the other the equiconsistency of the existence of \(\omega \) Woodin cardinals with AD holding in \(L(\mathbb {R})\); see [130]. (Recall also the connection here, due to Harrington [90], with \(0^{\#}\) mentioned in Sect. 5.)

8 Shadows

Here we wrap up our survey of the set-theoretical domain. We have seen how combinatorial properties, some ‘high up’ in Cantor’s world, affect properties of the real line down below. When powerful axioms extend familiar properties in desirable ways one is led to ask whether one can get away with less and get if not the same outcome, then ‘almost’ the same (in some sense). To this end Mycielski and Tomkowicz [160] speak in very suggestive language of shadows of AC in their chosen setting of \(L(\mathbb {R})\), a model of set theory that resolves some of the hardest set-theory problems. Their quest is theorems of ZFC that have corollaries that are theorems of \(\mathrm{ZF}+\mathrm{AD}\)—see [160]. Recalling the Hajnal notation at the end of Sect. 6.2, in \(L(\mathbb {R})\) AD implies DC [117], and the present authors have come to view DC as a natural ally for analysis. (For reassurance, we may add that \(\omega _{1}\) is a regular cardinal, assuming AD.) We give our favourite example of this, and then, after a brief review of syntactical terminology in Sect. 9, we survey in Sect. 10 results which give further succour, if one is willing in the interests of plurality to conduct mathematics in an appropriate helpful (indeed playful, to borrow the term from [151] and [153], when games are enlisted) subuniverse.

An example with the Axiom of Dependent Choice DC in mind. We begin with an example concerned with real-valued sublinear functions on \( \mathbb {R}\) which ‘almost’ follow Banach’s enduring paradigmatic definition. They are subadditive, i.e. satisfying \(f(x+y)\leqslant f(x)+f(y)\), but in one variant they are only \(\mathbb {N}\)-homogeneous in the sense that \( f(nx)=nf(x) \) for \(n=0,1,2,\dots \) (so \(\mathbb {Q}_{+}\)-homogeneous), for all x. In other variants the quantification over x may also be thinned; see [22]. In electing to study sublinear functions as possible realizations of norms, Berz ([13, 22]) showed, for measurable f, that the graph of f is conical, i.e. comprises two half-lines through the origin; however, his argument relied on AC, in the usual form of Zorn’s Lemma, which he used in the context of \(\mathbb {R}\) over the field of scalars \(\mathbb {Q}\). In spirit he follows Hamel’s construction of a discontinuous additive function [124, Section 4.2], and so ultimately this rests on transfinite induction of continuum length requiring continuum many selections. Our own proof [22] (cf. [23, 25]) of Berz’s theorem, taken in a wider context including Banach spaces, depends in effect on the Baire Category Theorem (BC), or the completeness of \(\mathbb {R}\) (in either of the distinct roles of ‘Cauchy-sequential’ and ‘Cauchy-filter’ completeness, the latter stronger in the absence of AC, see [74, Section 3] and also [64, Sections 2, 7]): we rely on generalizations of the Kestelman–Borwein–Ditor Theorem (KBD) asserting that for any (category/measure theoretic) non-negligible set T and any null sequence \(z_{n}\rightarrow 0\), for quasi all \(t\in T\) the t-translate of some subsequence \(z_{n(m)}\) (dependent on t) embeds in T, i.e. \(t+z_{n(m)}\in T\). See [146] for a discussion of this ‘shift-compactness’ notion. KBD is a variant of BC. So the proof ultimately rests on elementary induction via the Axiom of Dependent Choice(s) DC (thus named in 1948 by Tarski [215, p. 96] and studied in [154], but anticipated in 1942 by Bernays [12, Axiom IV*, p. 86]; see [105, Section 8.1], [106, Chapter 5]); DC in turn is equivalent to BC by a result of Blair [28]. (For further results in this direction see also [84, 92, 180, 181, 224], and the textbook [91].)
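
The parenthetical step from \(\mathbb {N}\)-homogeneity to \(\mathbb {Q}_{+}\)-homogeneity is a one-line computation, sketched here for convenience under the assumption that \(f(nx)=nf(x)\) holds for all real x and all \(n\in \mathbb {N}\). For positive integers m, n,

$$\begin{aligned} f(x)=f\Big (n\cdot \frac{x}{n}\Big )=nf\Big (\frac{x}{n}\Big ),\quad \text {so}\quad f\Big (\frac{m}{n}x\Big )=mf\Big (\frac{x}{n}\Big )=\frac{m}{n}f(x). \end{aligned}$$

No choice principle is involved in this step.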

The relevance of KBD in the setting of a Polish group comes from its various corollaries which include the Steinhaus–Weil interior-points theorem [26], the Open Mapping Theorem and its generalization to group actions: the Effros Theorem (see [145, 171, 172, 173]). For a target set T that is a dense \(\mathscr {G}_{\delta }\), embeddings which are performed simultaneously in any neighbourhood by a perfect subset of T of a fixed set Z (not necessarily a null sequence) into T characterize those sets Z that are strong measure zero; see [80].

We note that DC is equivalent to a statement about trees: a pruned tree has an infinite branch (for which see [118, 20.B]); so by its very nature DC is an ingredient in set-theory axiom systems which consider the extent to which Banach–Mazur-type games (with underlying tree structure) are determined. The latter in turn have been viewed as generalizations of Baire’s Theorem ever since Choquet [47]—cf. [118, 8C, D, E]. Inevitably, determinacy and the study of the relationship between category and measure go hand in hand.
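
To make the first equivalence concrete, here is a sketch of the argument, with DC taken in the form: for every non-empty set X and every \(R\subseteq X^{2}\) such that each \(x\in X\) has some y with \((x,y)\in R\), there is \(h\in X^{\mathbb {N}}\) with \((h(n),h(n+1))\in R\) for all n. The relation R below is our notation. Given a non-empty pruned tree T (one in which every node has a proper extension in T), apply DC with

$$\begin{aligned} X:=T,\qquad R:=\{(s,t)\in T^{2}:t\text { extends }s\text { by exactly one term}\}. \end{aligned}$$

The resulting sequence \(h(0),h(1),\dots \) is a chain of nodes of strictly increasing length, and its union is an infinite branch of T. Conversely, given X and R as in DC, the finite R-chains \((x_{0},\dots ,x_{n})\) form a non-empty pruned tree, and any infinite branch of it is a sequence h as required.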

9 The syntax of Analysis: Category/measure regularity versus practicality

The Baire/measurable property discussed at various points above is usually satisfied in mathematical practice. Indeed, any analytic subset of \(\mathbb {R}\) possesses these properties ([189, Part 1, Section 2.9], [118, 29.5]), hence so do all the sets in the \(\sigma \)-algebra that they generate (the C-sets, [118, Section 29.D], C for criblé as in Sect. 7.1; see [33, 34], cf. [20]). There is a broader class still. Recall first that an analytic set may be viewed as a projection of a planar Borel set P, so is definable as \(\{x:{\Phi } (x)\}\) via the \(\varvec{\Sigma }_{1}^{1}\) formula \({\Phi } (x):=(\exists y\in \mathbb {R})[(x,y)\in P]\); here the notation \(\varvec{ \Sigma }_{1}^{1}\) indicates one quantifier block (the subscripted value) of existential quantification, ranging over reals (type 1 objects, whence the superscripted value). Use of the bold-face version of the symbol indicates the need to refer to arbitrary coding (by reals not necessarily in an effective manner, for which see [81, Section 1.5]) of the various open sets needed to construct P. (As in Sect. 6.1 and elsewhere above, an open set U is coded by the sequence of rational intervals contained in U.) Effective variants are rendered in light-face.

Consider a set A such that both A and \(\mathbb {R}\backslash A\) may be defined by a \(\varvec{\Sigma }_{2}^{1}\) formula, say respectively as \(\{x:{\Phi } (x)\}\) and \(\{x:{\Psi } (x)\}\), where now \({\Phi } (x):=(\exists y\in \mathbb {R})(\forall z\in \mathbb {R})[(x,y,z)\in P]\) for some Borel set P, and similarly \({\Psi } \). This means that A is both \(\varvec{\Sigma }_{2}^{1}\) and \(\varvec{\Pi }_{2}^{1}\) (with \(\varvec{\Pi }\) indicating a leading universal quantifier block), and so is in the ambiguous class \(\varvec{\Delta }_{2}^{1}\). If in addition the equivalence

$$\begin{aligned} {\Phi } (x)\;\Longleftrightarrow \; \lnot \, {\Psi } (x) \end{aligned}$$

is provable in ZF, i.e. without reference to AC, then A is said to be provably \(\varvec{\Delta }_{2}^{1}\). (Here DC is allowed; indeed DC, or ACC, or the weaker principle in [113, p. 152], is needed, to move quantification over \(\mathbb {N}\) to the right of the real-number quantifiers; on this see again [113, p. 155].) It turns out that such sets have the Baire/measurable property; see [71], where these are generalized to the universally (=absolutely) measurable sets (cf. [22, Section 2]); the idea is ascribed to Solovay in [113, Chapter 3, Example 14.4]. How much further this may go depends on what axioms of set theory are admitted, a matter to which we presently turn.

Our interest in such matters derives from the Character Theorems of regular variation, noted in [19, Section 3] (revisited in [21, Section 11]), which identify the logical complexity of the function

$$\begin{aligned} h^{*}(x):=\limsup _{t\rightarrow \infty }\,[h(t+x)-h(t)], \end{aligned}$$

which is \(\varvec{\Delta }_{2}^{1}\) if the function h (more precisely, its graph) is Borel (and is \(\varvec{\Pi }_{2}^{1}\) if h is analytic, and \( \varvec{\Pi }_{3}^{1}\) if h is co-analytic). We argued in [19, Section 5] that \(\varvec{\Delta }_{2}^{1}\) is a natural setting in which to study regular variation.

10 Category-Measure duality

10.1 Practical axiomatic alternatives: LM, PB, AD, PD

While ZF is common ground in mathematics, AC is not, and alternatives to it are widely used, in which for example all sets are Lebesgue-measurable (usually abbreviated to LM) and all sets have the Baire property, sometimes abbreviated to PB (as distinct from BP to indicate individual ‘possession of the Baire property’). One such is DC above. As Solovay [204, p. 25] points out, this axiom is sufficient for the establishment of Lebesgue measure, i.e. including its translation invariance and countable additivity (“... positive results ... of measure theory ...”), and may be assumed together with LM. Another is the Axiom of Determinacy (AD) mentioned above and introduced by Mycielski and Steinhaus [158]; this implies LM, for which see [159], and PB, the latter a result, as mentioned in Sect. 7, due to Banach—see [118, 38.B]. Its introduction inspired remarkable and still current developments in set theory concerned with determinacy of ‘definable’ sets of reals (see [72] and particularly [162]) and consequent combinatorial properties (such as the partition relations) of the alephs (see [122]); again see Sect. 7. Others include the (weaker) Axiom of Projective Determinacy (PD) [118, Section 38.B], cf. Sect. 7, restricting the operation of AD to the smaller class of projective sets. (The independence and consistency of DC versus AD was established respectively in Solovay [205] and Kechris [118]—see also [119]; cf. [59, 169].)

10.2 LM versus PB

In 1983 Raisonnier and Stern [184, Theorem 2] (cf. [6, 7]), inspired by then current work of Shelah (circulating in manuscript since 1980) and earlier work of Solovay, showed that if every \(\varvec{\Sigma } _{2}^{1}\) set is Lebesgue measurable, then every \( \varvec{\Sigma } _{2}^{1}\) set has BP, whereas the converse fails (for the latter see [211]; cf. [8, Section 9.3] and [177]). This demonstrates that measurability is in fact the stronger notion (see [109, Section 1] for a discussion of the consistency of analogues at level 3 and beyond), which is one reason why we regard category rather than measure as primary. For example, the category version of Berz’s theorem implies its measure version; see Note 1 at the end of Sect. 1 and also [22, 23, 25].

Note that the assumption of Gödel’s Axiom of Constructibility \(V=L\), viewed as a strengthening of AC, yields \(\varvec{\Delta }_{2}^{1}\) non-measurable subsets, so that the Fenstad–Normann result on the narrower class of provably \(\varvec{\Delta }_{2}^{1}\) sets mentioned in Sect. 9 marks the limit of such results in a purely ZF framework (at level 2).

10.3 Consistency and the role of large cardinals

While LM and PB are inconsistent with AC, such axioms can be consistent with DC. Justification with scant exception involves some form of large-cardinal assumption, which in turn, as in Sect. 4, calibrates relative consistency strengths; see [113, 123] (cf. [116, 130]). Thus Solovay [204] in 1970 was the first to show the consistency of \(\mathrm{ZF}+\mathrm{DC}+\mathrm{LM}+\mathrm{PB}\) relative to that of \(\mathrm{ZFC}\) + ‘there exists an inaccessible cardinal’. The appearance of the inaccessible in this result is not altogether incongruous, given its emergence in results (from 1930 onwards) due to Banach [5] (under GCH), Ulam [219] (under AC), and Tarski [214], concerning the cardinalities of sets supporting a countably additive/finitely additive [0, 1]-valued/\(\{0,1\}\)-valued measure (cf. [29, 1.12 (x)], [76]). Later, in 1984, Shelah [194, 5.1] showed in \(\mathrm{ZF}+\mathrm{DC}\) that already the measurability of all \(\varvec{\Sigma } _{3}^{1}\) sets implies that \(\aleph _{1}^{V}\) is inaccessible in the sense of L (the symbol \(\aleph _{1}^{V}\) refers to the first uncountable ordinal of V, Cantor’s universe; cf. Sect. 2). As a consequence, Shelah [194, 5.1A] showed that \(\mathrm{ZF}+\mathrm{DC}+\mathrm{LM}\) is equiconsistent with \(\mathrm{ZF}\) + ‘there exists an inaccessible’, whereas [194, 7.17] \(\mathrm{ZF}+\mathrm{DC}+\mathrm{PB}\) is equiconsistent with just ZFC (i.e. without reference to inaccessible cardinals), so driving another wedge between classical measure-category symmetries (see [109] for further, related ‘wedges’). The latter consistency theorem relies on the result [194, 7.16] that any model of ZFC + CH has a generic (forcing) extension satisfying ZF + ‘every set of reals (first-order) defined using a real and an ordinal parameter has BP’. (Here ‘first-order’ restricts the range of any quantifiers, see Sect. 2.) For a topological proof see Stern [211].

10.4 LM versus PB continued

Raisonnier [183, Theorem 5] (cf. [194, 5.1B]) has shown that in \(\mathrm{ZF}+\mathrm{DC}\) one can prove that if there is an uncountable well-ordered set of reals (in particular a subset of cardinality \(\aleph _{1}\)), then there is a non-measurable set of reals. (This motivates Judah and Spinas [110] to consider generalizations including the consistency of the \(\omega _{1}\)-variant of DC.) See also Judah and Rosłanowski [108] for a model (due to Shelah) in which \(\mathrm{ZF}+\mathrm{DC} +\mathrm{LM}+\lnot \,\mathrm{PB}\) holds, and also [195] where an inaccessible cardinal is used to show consistency of \(\mathrm{ZF}+\mathrm{LM}+\lnot \,\mathrm{PB}\) + ‘there is an uncountable set without a perfect subset’. For a textbook treatment of much of this material see again [8].

Raisonnier [183, Theorem 3] notes the result, due to Shelah and Stern, that there is a model for \(\mathrm{ZF}+\mathrm{DC}+\mathrm{PB}+(\aleph _{1}=\aleph _{1}^{L})\) + ‘the ordinally definable subsets of reals are measurable’. So, in particular by Raisonnier’s result, there is a non-measurable set in this model. Shelah’s result indicates that the non-measurable set is either \({\Sigma } _{3}^{1}\) (light-face symbol: all open sets coded effectively) or \(\varvec{\Sigma } _{2}^{1}\) (bold-face); see the comments at the end of the introduction in [211]. Thus here \(\mathrm{PB}+\lnot \,\mathrm{LM}\) holds.

10.5 Regularity of reasonably definable sets

From the existence of suitably large cardinals flows a most remarkable result due to Shelah and Woodin [196] justifying the opening practical remark about BP, which is that every ‘reasonably definable’ set of reals is Lebesgue measurable: compare the commentary in [10] following their Theorem 5.3.2. This is a latter-day sweeping generalization of a theorem due to Solovay (cf. [203]) that, subject to large-cardinal assumptions, \(\varvec{\Sigma } _{2}^{1}\) sets are measurable (and so also have BP by [184]).

10.6 Category and measure: qualitative versus quantitative aspects

Most of the similarities between category and measure [175] can now be seen [24, 25, 26] to flow from density-topology aspects. As Oxtoby points out [175, p. 85], category-measure duality extends as far as qualitative aspects (0-1 laws) but not as far as quantitative aspects (strong law of large numbers etc.). The differences here can be dramatic. For example, the requirement on a series for it to converge almost surely when “random signs” are given to its terms is that it be \(\ell _{2}\) [111]; by contrast for convergence off a meagre set, the corresponding convergence criterion is (minimally!) \(\ell _{1}\) [112]. On occasion discrepancies can be engineered into re-alignment by refining the metric; see [39].
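
A standard illustration of the gap between the two criteria, offered here as a worked instance of our own choosing rather than one taken from [111, 112], is the harmonic sequence \(a_{n}=1/n\):

$$\begin{aligned} \sum _{n}a_{n}^{2}=\sum _{n}\frac{1}{n^{2}}<\infty ,\qquad \sum _{n}|a_{n}|=\sum _{n}\frac{1}{n}=\infty . \end{aligned}$$

Thus \((a_{n})\) lies in \(\ell _{2}\) but not in \(\ell _{1}\), so by the criteria just quoted the series \(\sum _{n}\pm 1/n\) converges for almost all choices of signs, whereas the sign sequences for which it converges form only a meagre set.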

We have pointed out in Sect. 10.2 that measurability is in fact the stronger notion. Such distinctions give rise to two streams of literature. In one, pathology (strange counterexamples) is pursued: see e.g. [49]. In the other, comparisons are made between the various cardinal invariants associated with the \(\sigma \)-ideals of negligible sets; these ask questions, relative to given axioms of set theory, such as: how small may non-negligibles be (the non number), how small a family of negligibles has non-negligible union (the additivity number), how small such a family must be to cover the real line (the covering number), or how small if it is to be cofinal under inclusion (the cofinality number). Two further key ingredients are \(\mathfrak {b}\), the bounding number, and \(\mathfrak {d}\), the dominating number, corresponding to a smallest unbounded family and a smallest dominating family of functions in \(\omega ^{\omega }\) relative to domination mod-finite. (For the connection between the latter and maximal almost disjoint (mad) families of subsets of \(\omega \) see e.g. [65]; for the role of mad families in Ramsey properties of ultrafilters see [142], and for recent developments [218].) A result on the cardinal invariants, memorable for its symmetries, is summarized in the following Cichoń diagram, for which we refer to [8], and the very recent [32].

Here the arrows \(\rightarrow \) indicate \(\leqslant \).
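
For orientation, the diagram in the form in which it is usually displayed in the literature may be set out as follows. The symbols are our choice of notation: \(\mathcal {N}\) and \(\mathcal {M}\) denote the \(\sigma \)-ideals of null and of meagre sets, \(\mathrm{add}\), \(\mathrm{cov}\), \(\mathrm{non}\), \(\mathrm{cof}\) the four invariants described above, and \(\mathfrak {c}\) the cardinality of the continuum. Upward arrows are read in the same way as horizontal ones, each outer column forming a single upward arrow.

$$\begin{array}{ccccccccccc}
 & & \mathrm{cov}(\mathcal {N}) & \rightarrow & \mathrm{non}(\mathcal {M}) & \rightarrow & \mathrm{cof}(\mathcal {M}) & \rightarrow & \mathrm{cof}(\mathcal {N}) & \rightarrow & \mathfrak {c} \\
 & & \uparrow & & \uparrow & & \uparrow & & \uparrow & & \\
 & & \uparrow & & \mathfrak {b} & \rightarrow & \mathfrak {d} & & \uparrow & & \\
 & & \uparrow & & \uparrow & & \uparrow & & \uparrow & & \\
\aleph _{1} & \rightarrow & \mathrm{add}(\mathcal {N}) & \rightarrow & \mathrm{add}(\mathcal {M}) & \rightarrow & \mathrm{cov}(\mathcal {M}) & \rightarrow & \mathrm{non}(\mathcal {N}) & &
\end{array}$$

Two further standard relations tie the middle of the diagram together: \(\mathrm{add}(\mathcal {M})=\min \{\mathfrak {b},\mathrm{cov}(\mathcal {M})\}\) and \(\mathrm{cof}(\mathcal {M})=\max \{\mathfrak {d},\mathrm{non}(\mathcal {M})\}\).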

11 Coda

We close with some comments about connections with other branches of mathematics than analysis.

We have briefly discussed algebra above (Note 2 of Sect. 1, and Sect. 3) and topology (Sects. 3, 10.3).

There has been much to say in Sect. 10 on the reals (canonical from some points of view but not from others). There is even still much to say on number theory (the integers are canonical from any point of view).

This is perhaps at its least surprising in transcendental number theory, as this concerns the irrationals, and so the reals. Here, Cohen ([53, p. 2412] and [54, Section 19.3]) mentions the Thue–Siegel–Roth theorem (see e.g. [89, Notes to Chapter XI] or Baker [4, Chapter 7]) as the first ‘truly non-constructive proof in number theory’, in the context of the controversies over intuitionism (see [114]). Again, Macintyre [135] discusses the logical implications of proofs of Schanuel’s conjecture in transcendental number theory. More surprisingly, this is still true in diophantine equations, a context ostensibly about the integers: Macintyre discusses the logical aspects of Wiles’s proof of Fermat’s Last Theorem [89, Chapter XXV] in some detail [136, Appendix].

Cohen [53], in his historical account of ‘Skolem and pessimism about proof in mathematics’, draws freely on number theory as a source of illustrative examples throughout. In his last paper (posthumous), Cohen [54, Section 19.6] continues this, writing on his interactions with Gödel. Woodin [227, 20.1.3] (‘Three problems and three formal theories’) again explores the links between set theory (in particular large-cardinal axioms) and number theory; cf. [227, 20.8].

To return to the algebraic characterization of the reals as ‘the’ complete archimedean ordered field: it is the ‘complete’ which hides the ‘modulo cardinality’ and ‘modulo which sets are available’ aspects. It is always good to look at familiar mathematics, and ask oneself the analogous question in that context, and so to seek out new ‘illuminating interdisciplinary’ connections.

Cassels [42] gives a number-theoretic treatment of local fields, arguing convincingly that these are as interesting as archimedean ones.

As working analysts ourselves, we feel for those of our colleagues new to these matters, who may look fondly back to an age of ‘bygone innocence’, when ‘one didn’t need to worry about such things’. We prefer instead to marvel at the unfathomable richness of mathematics. As usual, Shakespeare puts his finger on it somewhere:

[Figure a: quotation from Shakespeare]

So we have only mathematical ‘gut-feeling and belief’, as with Mickiewicz:

[Figure b: quotation from Mickiewicz]

—‘Feeling and faith more forcefully persuade, Than the lens and the eye of a sage’.

Thus it is that we close with two ‘high-profile’ attitudes towards Solovay’s dictum that the continuum ‘can be anything it ought to be’, to both of which Woodin has contributed. On the one hand there is a putative L-like ‘ultimate inner model’ (leading to \(V=\mathrm{Ult}\text {-}L\)) [228], which permits adjunction of known large-cardinal axioms; under it the continuum is \(\aleph _{1}\). On the other hand is the argument, offered by Woodin in [226], close in spirit to the Forcing Axioms of Sect. 8, as it depends on closure under (set) forcing in the presence of large cardinals; under this the continuum is \(\aleph _{2}\). See [38].