The receptors of innate immune systems evolve slowly over time. Those that confer some fitness benefit will be naturally selected, and so become the common property of the succeeding generations. In contrast, the receptors of the so-called adaptive immune systems are generated somatically within each individual, by moving evolution from the level of the germline to that of somatic cells. As a result, each individual ends up with a repertoire of adaptive immune receptors that is as distinctive as are their fingerprints. Unlike the fingerprint, however, the repertoire of adaptive immunity in an individual is constantly changing. Adaptive immune systems come in two fundamentally different forms that differ both in the nature and in the source of the pathogen-sensing element. On the one hand are those immune systems that use nucleic acids as the pathogen sensors. In these cases the sensor is only formed after infection, and the key information needed to build it is derived from the pathogen. On the other hand are those “anticipatory” systems that use proteins as pathogen sensors. Here the sensors have been formed prior to infection with the pathogen, and the information used to form the sensors is entirely host derived.

Across the phylogenetic span from nematode worms to mammals, immune defence involves the use of a mix of “innate”, i.e. germ line encoded, and of “adaptive”, i.e. somatically encoded, receptors. Since different species inhabit different environments, it is clear that neither innate nor adaptive immunity can be viewed as an “off the peg” defence system. Though natural selection has ensured that certain features of innate and adaptive immunity are common between the fruit fly and mammals, there are also very considerable differences. In the same way, and for the same reasons, immune defence in the mouse is not identical to immune defence in humans.

4.1 Adaptive Systems that Use Nucleic Acid Sensors

The use of immune systems based on nucleic acid sensors preceded the division between germline and soma. Bacterial cells are germline and soma in one, yet they have evolved sophisticated adaptive systems of defence that go under the name of CRISPR-Cas.

4.1.1 Bacterial CRISPR-Cas

CRISPR-Cas systems provide bacteria with a means to protect themselves against viral infection. Since viruses evolve quickly, the selective pressure on the bacterial CRISPR-Cas systems has been enormous, so that now there are many different variants of the basic strategy. The essence of these systems is that a small part of a pathogenic viral genome is captured and inserted into the bacterial chromosome. The captured sequences can be transcribed and made available as “guide RNA” molecules that form a complex with a bacterial Cas nuclease. This complex of guide RNA and Cas protein functions as a weapon, in which the guide RNA identifies the complementary viral sequences, and the associated Cas nuclease then destroys the viral genome.

The CRISPR-Cas system is highly specific and enormously economical in terms of its requirement for space in the bacterial genome. The captured viral genomic fragment is small, and yet, once transcribed into a “guide RNA”, it is large enough to act as a sequence-specific probe providing precisely tailored defence against a particular virus. However, there is a problem. Once a viral sequence has been captured, the capturing bacterium, and all of its descendants, must clearly distinguish between the copy of the sequence in the attacking virus and the copy that now resides in its own genome. The sequence in the virus must be attacked; the sequence in its own genome must be left untouched. This is achieved by looking at the chromosomal sequences adjacent to the captured viral sequence. In some cases co-transcription of bacterial sequences that flank the viral insert results in a guide RNA containing both viral and bacterial sequences. Pairing of this guide RNA to the bacterial sequence aborts Cas-mediated destruction of the bacterial chromosome. The viral genome, on the other hand, lacks the “tolerising” bacterial sequence and is destroyed. Alternatively, in certain CRISPR-Cas systems, a “Protospacer Adjacent Motif”, present in the viral genome, but not included in the sequence captured by the bacterium, allows the guide RNA-Cas complex to distinguish self from non-self.
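The second of these strategies, discrimination by a Protospacer Adjacent Motif, can be illustrated with a toy model. The sketch below is a deliberate simplification: it treats both genomes as single DNA strings, and the motif "NGG", the spacer and the flanking sequences are all assumptions chosen for illustration rather than details taken from any particular system.

```python
# Toy model of PAM-based self/non-self discrimination (illustrative only).
# The motif "NGG" and all sequences below are assumed for the example.

def matches_pam(site: str, pam: str = "NGG") -> bool:
    """Return True if `site` matches the PAM pattern; 'N' matches any base."""
    return len(site) == len(pam) and all(p in ("N", s) for p, s in zip(pam, site))

def is_cleaved(dna: str, spacer: str, pam: str = "NGG") -> bool:
    """True if `dna` contains the spacer sequence immediately followed by a PAM."""
    start = dna.find(spacer)
    while start != -1:
        end = start + len(spacer)
        if matches_pam(dna[end:end + len(pam)], pam):
            return True
        start = dna.find(spacer, start + 1)
    return False

spacer = "ATGCGTACCTGA"                              # hypothetical captured sequence
viral_genome = "TTAC" + spacer + "TGG" + "CATT"      # target followed by a PAM
crispr_array = "GTTTTAGA" + spacer + "GTTTTAGA"      # same sequence, flanked by repeats

print(is_cleaved(viral_genome, spacer))   # True: match plus adjacent motif
print(is_cleaved(crispr_array, spacer))   # False: match, but no adjacent motif
```

The point of the sketch is only that the decision to cut depends on two things at once: a match to the guide sequence, and the presence of a short motif that the bacterium did not copy into its own genome.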

4.1.2 Eukaryotic RNA Interference

The CRISPR-Cas systems are restricted to bacteria and archaea, where they occur in a large fraction of species, but an alternative nucleic acid-based adaptive immune system is widely used in eukaryotes. This is based on the RNA interference machinery that is used in eukaryotes to help regulate gene expression. RNA interference involves small, so-called microRNA (miRNA) genes that are encoded in the genome and contain sequences complementary to short parts of the genes whose expression they will control. The transcripts are processed to yield short double-stranded miRNAs, and these are then incorporated into a large “RNA-Induced Silencing Complex” (RISC) in the cytoplasm. The miRNAs are analogous to the guide RNAs of the CRISPR systems in that they enable the RISC to specifically locate the target mRNA, which is then either destroyed or translationally suppressed.

4.1.3 Evolution of Adaptive Immune Systems Based on RNA Interference

In the nematode worm Caenorhabditis elegans, this miRNA system has been adapted to provide defence against viruses. After infection of a cell, viral RNA is processed and loaded into a RISC that is then used to locate and destroy the virus. This adaptive defence system thus “borrows” genetic information from the viral pathogen and uses it to identify and destroy the intruder. Such a system works well within a single cell, but to be of real value in a multicellular animal, there has to be a little more. In particular, a way must be found to provide uninfected neighbouring cells with pre-emptive protection. In other words, this cell-autonomous adaptive immune response must be converted into a systemic response. This is achieved both in the nematode worm Caenorhabditis elegans [1] and in the fruit fly Drosophila melanogaster [2] by mechanisms that copy, and thus make large amounts of, the short viral RNA.

The formation of this “extra” RNA in the nematode and in the fly is one example of what is a recurring theme in the evolution of adaptive immunity. This is that two different species facing the same problem may reach similar solutions, but by quite different mechanisms.

Both the fly and the worm must convert a local RNA interference response into a systemic one, and in both cases this is achieved by amplification of the captured viral RNA and its export to other cells. However, in the case of the nematode, the amplification is by an RNA-dependent RNA polymerase, and the extra RNA produced in this way is then exported from the infected cell through special pores in the membrane (Fig. 4.1). Such pores, present on all of the cells, also permit entry of the RNA into cells at remote locations. The fly, in contrast, lacks both of these mechanisms, and instead amplifies the RNA using a reverse transcriptase borrowed from an endogenous retrovirus, after which the amplified product is released from the infected cells in vesicles, which can deliver it to uninfected cells at remote locations (Fig. 4.1). In both cases the RNA interference response in infected cells provides distant uninfected cells with pre-emptive protection against the virus.

Fig. 4.1

One problem—two solutions. Both in the worm and in the fly a local RNA interference response must be converted into a systemic one. In the nematode (upper line) the RNA is amplified by an RNA-dependent RNA polymerase (RdRp), exported from the infected cell through pores, and imported into distant cells by pores. In the fly (lower line) amplification is by a reverse transcriptase (RT), the product is packed and exported from the infected cell in vesicles, which deliver the RNA to distant cells by membrane fusion

At least in the case of nematode worms, this adaptive response is one of those situations in which the accepted norms of evolutionary thought have to give way to the pressing necessities of immune defence. One widely held view is that the strict separation of germline and soma (Sect. 2.1) makes the Lamarckian idea of the inheritance of acquired characteristics impossible. However, in Caenorhabditis elegans the oocytes, like the somatic cells, are able to take up the exported RNA and, as a result, the worm’s generation of immunity to a virus can be transmitted as an acquired characteristic to its progeny [1].

In vertebrates endogenous miRNAs play important roles in controlling gene expression, but this sort of RISC-based system does not play a significant role in defence against viruses. This is something of a puzzle, for the adaptive defence systems in worms and insects based on RNA interference are very ancient, and evolution does not readily give up a good idea. The currently most widely accepted thought is that a good idea becomes dispensable if a better one comes along. The better idea seems to have been to replace the “adaptive” RNA interference response of invertebrates with the “non-adaptive” type 1 interferon response, which is used as an antiviral defence mechanism in the innate systems of fish, birds and mammals [3]. These type 1 interferons are cytokines that are induced as a result of the activation of many of the receptors of innate immunity. They cause changes in the expression of hundreds of genes, many of which have potent antiviral activity. The simultaneous induction of multiple antiviral functions makes it difficult for a viral pathogen to readily evolve escape mechanisms [4]. In mammals, the interferon system has been backed up with the emergence of the APOBEC3G cytidine deaminase, which is involved in providing innate defence specifically against retroviruses (see Sect. 3.6.1), and by the evolution of T-cell-based adaptive immunity (see Sect. 4.6).

4.2 Somatic Evolution of Immune Systems that Use Protein Sensors

Perhaps the simplest way to form a protein-based adaptive immune system would be to make use of the existing array of innate receptors and then, when under pathogen attack, to randomly mutate their genes in the hope of making a somewhat better receptor. Something along these lines may take place in the snail Biomphalaria glabrata. When attacked by parasitic worms the snail’s FREP receptor genes seem to pick up random somatic mutations, which may aid in fending off the parasite [5]. This way of doing things, however, was certainly never more than a footnote in the history of immunity, and effective protein-based adaptive immune systems had to await the evolution of vertebrates.

Vertebrate anticipatory adaptive immune systems based on protein receptors are radically different from the adaptive RNA interference system of invertebrates. The crucial difference is that in the RNA-based systems the information that provides the receptor with its specificity is “borrowed” from the pathogen, while in protein-based systems this information is derived solely from the host. RNA-based adaptive immunity, and the innate systems looked at in Chap. 3, can be fairly thought of as straightforward evolutionary answers to the challenge of detecting and destroying pathogens. Natural selection established these defence systems, over the course of hundreds of millions of years, by screening many billions of random mutations. At each generation the few advantageous changes were favoured, while those individuals expressing mutations that would lead to disaster were selected out. Protein-based adaptive immune systems, in contrast, face the problem that they must carry out this process of mutation and screening of the repertoire “all at once”. The somatically generated repertoire is made in a hurry and will contain not only useful receptors, but also many worthless or even autoimmune ones. In fact this repertoire will certainly be lethally autoimmune unless some way can be found to purge it of these deleterious specificities. Purging the repertoire by somatic “tolerance” mechanisms is therefore crucial to the establishment of an anticipatory adaptive immune system. Only after all this has been done can the repertoire be used to deal with pathogens.

The difficulty for a protein-based immune system is that in order to be able to detect all possible pathogens, it must produce a truly vast number of different receptors. How can such a huge receptor repertoire be formed in the germline on a genome that has space for only around 20,000 genes? The simple answer is that it can’t be done—at least not directly. However, evolution has found a trick to solve this problem. The trick is that the antigen-binding receptor genes are not encoded as such in the germline, but instead are assembled in the developing lymphocytes by mixing and matching sets of pre-existing sequence modules at the DNA level. Useful receptors, made in this way, cannot be passed on to the next generation, because the receptor genes are not formed in cells of the germline. Instead the next generation inherits the collection of DNA modules, and the means to join them together, as a “do-it-yourself kit” with which it too can somatically build receptor genes.

This sort of deliberate induction of mutations in somatic cells is normally avoided, because it brings with it the risk of cell death—or even worse—of cell transformation. Despite this, lymphocytes make use of a potent, induced mutagenic process to form the antigen-binding domain of their immune receptors. Indeed the mutational process induced in these cells is so effective that every new lymphocyte that is generated carries a mutant receptor, and every one of these mutant receptors has a unique binding site. One great advantage of this way of doing things is that, since the mutation process directly alters the DNA sequences coding for the receptor’s binding site, only a small investment of genomic space is sufficient to allow for the generation of many different receptors.

There is, it must be said, one aspect of this story that goes against our usual expectations of how evolution works. One tends to think of evolution as being slow and conservative: a matter of finding some sort of solution to a current problem, and then “tinkering” with that solution till it is gradually improved. That is probably a reasonable picture of how things are done, most of the time. What is most unusual is for evolution to suddenly change course, or for a solution that works to be simply dropped, and replaced with something completely different. Two such exceptional events can be seen in the evolution of the antigen-specific receptors of adaptive immunity. In the first case a system of somatic receptor construction “suddenly” arose in early jawless vertebrates (agnathans) and gave rise to a huge repertoire of receptors composed of Leucine Rich Repeat (LRR) domains. The second abrupt change of direction occurred when this agnathan system was subjected to massive re-organisation and emerged as something rather different in the jawed vertebrates (gnathostomes), where the receptors are all composed of “Immunoglobulin Superfamily” (IgSF) domains (see Appendix D).

4.2.1 Two Different Protein-Based Adaptive Immune Systems

There are two systems of protein-based adaptive immunity in vertebrates. The first is found in the agnathans, whose surviving members are the lampreys and hagfish. The second is found in the gnathostomes, which comprise the vast majority of all vertebrate species. In this chapter we will look at the evolution of function in these two adaptive systems. One must, however, bear in mind that while adaptive immunity in gnathostomes has been studied for well over a century, the analysis of the nature of the agnathan adaptive immune system receptors only began with a seminal publication in 2004 from Max Cooper’s laboratory [6]. Not surprisingly, while there is a mass of detailed knowledge on gnathostome adaptive immunity, very little, by comparison, is known about the workings of the system in agnathans.

4.2.2 Lymphocytes and Their Receptors

The lymphocytes that somatically form immune receptors in vertebrates can be divided into two broad types. The first consists of those that, once activated by contact with a pathogen, will secrete their receptors in the form of highly specific “antibodies”. These are the familiar B-cells of jawed vertebrates and the VLRB cells of agnathans. The second type of lymphocyte, of which the best known are the αβ T-cells of jawed vertebrates and the VLRA and VLRC cells of agnathans, is characterised by the fact that it does not secrete its antigen-specific receptor. Since these two different lymphocyte forms have different functions, it is not surprising that different selective forces act on the formation of their receptor repertoires. However, some general aspects of receptor structure and generation are common to them all.

Any antigen receptor expressed on the surface of a lymphocyte has to be structurally robust, so that it can survive in the frequently difficult extracellular environment. At the same time, the ligand-binding part of the structure must be able to accommodate many variations in its sequence so that many different versions of the receptor can be formed. In reality these criteria are not terribly restrictive, and a number of different protein domains would fit the bill. The LRR domain used in agnathan immune receptors and the IgSF domain used in gnathostomes are found throughout animal phylogeny, and both frequently occur in proteins expressed on cell surfaces. Irrespective of whether the receptor is of the LRR or of the IgSF type, the ligand-binding sites of all adaptive immune receptors in vertebrates are formed somatically during lymphocyte differentiation, by mixing and matching DNA elements from a library of sequence modules in the genome. The advantage of this way of doing things is clear: if the sequence coding for the antigen-binding site of a receptor is built up by bringing together elements from a collection of short sequence modules, then the number of different receptors that can be made greatly exceeds the number of modules. If this were just a numbers game, then there would be no difficulty at all in arranging for the construction of thousands of billions of different receptor-binding sites, for all that would be required would be lots of alternative module sequences. 10⁴ alternative modules would do the job easily, and this would not take up an inordinate amount of space in the genome. However, each of these modules would have to be continually subject to purifying natural selection, otherwise random mutation would quickly destroy their coding capacity. Sadly, the simple truth is that the larger a family of functionally similar sequences becomes, the more difficult it is for natural selection to maintain all of the members intact. With the available selection pressure spread over an increasing number of sequences, random mutations leading to stop codons, frame shifts or disruptive amino acid replacements will inevitably accumulate, and this sets strict limits on the number of modules that can be maintained.

So now we are back to square one. If the number of useful modules is restricted in this way, how can a large number of receptor sequences be generated? The answer is that the problem only arises if the module-joining process is precise, so that joining module 1 to module 2 always yields exactly the same result. If, however, the recombination process were to be sloppy, then joining module 1 to module 2 may yield any one of many different results. In both of the protein-based adaptive systems that we know of, the mechanism to mix and match modules is extremely sloppy, and because of this, the repertoire of different antigen-receptor-binding sites that can be formed is enormous.
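The arithmetic behind this argument can be made concrete with a small sketch. Everything in it is hypothetical: the module sequences are invented, and “sloppiness” is modelled, purely for illustration, as the insertion of up to two random bases at the junction, which is only one of the ways in which joining can be imprecise.

```python
# Illustrative count of how imprecise joining multiplies receptor diversity.
# All sequences and numbers are hypothetical, chosen to make the arithmetic visible.
from itertools import product

BASES = "ACGT"

def precise_joins(left_modules, right_modules):
    """Precise joining: every left/right pair yields exactly one sequence."""
    return {l + r for l, r in product(left_modules, right_modules)}

def sloppy_joins(left_modules, right_modules, max_insert=2):
    """Sloppy joining: each pair may gain 0..max_insert extra bases at the junction."""
    results = set()
    for l, r in product(left_modules, right_modules):
        for n in range(max_insert + 1):
            for insert in product(BASES, repeat=n):
                results.add(l + "".join(insert) + r)
    return results

left = ["AAAC", "CCGA", "GGTC"]     # three hypothetical modules
right = ["TTGA", "CATG"]            # two hypothetical modules

print(len(precise_joins(left, right)))  # 6   = 3 * 2
print(len(sloppy_joins(left, right)))   # 126 = 6 * (1 + 4 + 16)
```

Even in this tiny example, five germline modules yield six products when joined precisely and 126 when the junction is allowed to vary; the number of modules that selection must maintain has not changed.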

4.3 Somatic Formation of the Agnathan Adaptive Immune Receptors

Agnathan immune receptors—the so-called variable lymphocyte receptors (VLRs)—are constructed out of LRR domains. In most cells the genes for these receptors are massively incomplete and consist just of the sequences coding for the beginning and end of the polypeptide chain. The central parts of the gene are only formed in developing lymphocytes, which copy in LRR modules from a collection of flanking sequences (Fig. 4.2). Simply copying in members of the flanking collection of LRR modules would not be enough, for the required large receptor repertoire can only be made if the copying process is imprecise. In agnathans the process is almost endlessly imprecise, because the flanking LRR sequences are not always treated as individual elements. Instead, copying can start in one module and then, somewhere in the middle, jump to another one. In this way, an essentially unlimited number of “hybrid” LRR sequences can be formed from a restricted library of flanking LRR sequences. In agnathans each lymphocyte is limited to expressing one receptor specificity. It is not known how this is imposed, but an analogous phenomenon exists also in gnathostome adaptive immunity (see Sect. 4.5.4).
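The effect of allowing the copying process to jump between modules can be sketched in the same spirit. The model below is an assumption-laden toy: the three “modules” are invented, are all of the same length, and a copy is allowed to switch template only once, at any position.

```python
# Toy model of "hybrid" module formation by template switching (illustrative only).
# Modules are hypothetical, equal-length strings; a single jump per copy is assumed.

def all_hybrids(modules):
    """Enumerate every sequence obtained by copying from one module and then
    jumping, at any position, to the same position in another module."""
    hybrids = set()
    for a in modules:
        for b in modules:
            for switch in range(len(a) + 1):
                hybrids.add(a[:switch] + b[switch:])
    return hybrids

modules = ["AAAAAA", "CCCCCC", "GGGGGG"]
hybrids = all_hybrids(modules)

print(len(modules))   # 3 germline modules
print(len(hybrids))   # 33 distinct products: the 3 originals plus 30 hybrids
```

With longer modules, more of them, and more than one jump per copy, the number of possible products grows very rapidly, which is the sense in which the library is “restricted” but the repertoire is not.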

Fig. 4.2

Agnathan adaptive immune receptor gene rearrangement. (a) Germline configuration of a lamprey variable lymphocyte receptor gene. The gene codes only for the beginning, the N-terminal LRR (LRR-NT), and the end, the C-terminal LRR (LRR-CT), of the receptor (shaded triangles). These are separated by a “spacer” sequence. This almost “empty” gene is flanked on each side by many sequences coding for leucine rich repeat (LRR) domains. (b) Flanking LRR sequences are copied by a gene conversion mechanism into the empty gene where they replace the spacer sequence. The gene conversion process is believed to be initiated by a cytidine deaminase enzyme similar to the mammalian “Activation-Induced Deaminase” (AID). The copying mechanism permits the formation of “hybrid” LRR domains. The number of flanking sequences remains unaltered

4.3.1 The Key Enzyme: An AID-Like Cytidine Deaminase

Central to the formation of the lamprey adaptive immune system receptor repertoire is the ability to somatically construct hybrid LRR receptors from a limited set of germline encoded LRR modules. There are three loci containing genes for the receptors in the agnathan genome. Each individual lymphocyte uses only one of these genes, and so three different types of lymphocyte are formed. These lymphocytes express either the “Variable Lymphocyte Receptor A” (VLRA), or VLRB or VLRC.

Any mechanism to mix and match DNA modules requires the ability to cut DNA. In the agnathan system, the cutting process is initiated by an enzyme with cytidine deaminase activity. Cytidine deaminases of the AID/APOBEC family (Sect. 3.6.1) arose early in vertebrate evolution. Those members of the family that are involved in adaptive immunity remove the amino group from dC residues in the single-stranded stretches of DNA, which may be formed as a result of the supercoiling stress induced during transcription that locally “melts” the double helix. The deamination converts dC to dU, a base that is not accepted as a normal component of DNA by the cell’s DNA repair mechanisms [7]. The repair systems recognise these dU bases as “errors”, and set about removing them. There are a number of different repair systems available, and the result of the repair of the “error” depends on which of them operates. Little is known as yet about the mechanisms that regulate the formation and the repair of dU in DNA in agnathans, but the result of their removal is a single-stranded break in the DNA—and this is the first essential step in the gene conversion process.

4.3.2 Structure of the Receptor Molecules

VLR molecules, like all LRR proteins, form horseshoe-shaped structures similar to those formed by TLRs (Sect. 3.3), though the VLR molecules have fewer LRR domains and consequently form a shallower crescent-shaped structure (Fig. 4.3). Strikingly, they bind antigens in the inner, concave, face of the molecule, rather than on the outer face as do the TLRs. This type of structure provides a large area of interaction with a ligand, and results in an affinity that is comparable to that between a gnathostome antibody and its antigen.

Fig. 4.3

Diagrammatic outline of a VLRB molecule. The LRR domains (LRR) and the connecting peptide (CP) (open ovals) form a curved structure, which is bounded at the N-terminal end by the “capping” domain LRR-NT, and at the C-terminal end by LRR-CT. Antigen is bound on the concave face and the LRR-CT also plays an important role in this interaction. The VLRB molecule is attached to the cell surface by a glycophosphatidylinositol (GPI) anchor [8]

The functions of the VLRA and VLRC bearing cells are at the moment unclear. However, the VLRB expressing cells, once activated, secrete their receptor, and since the secreted form of the VLRB molecule is a complex of eight or ten such units, the effective affinity (avidity) of the complex is substantial and may rival that of a mammalian pentameric IgM molecule.

4.3.3 Tolerance

“There’s no such thing as a free lunch”, said Milton Friedman, and this slogan is as true in immunology as it is in economics. A vast receptor repertoire, which “sees” everything, is splendid in terms of defence against pathogens, but on the other hand such a repertoire “sees” far too much, for it will undoubtedly detect all the myriad components of the organism’s own body. The simple, and correct, conclusion is that any usefully large repertoire, constructed by a process of random mutation, is bound to be lethally autoimmune. An adaptive immune system therefore has to co-evolve along with some powerful mechanisms that remove or suppress the receptors with autoimmune specificity. Though such “tolerance mechanisms” must of necessity exist in the agnathans, nothing is as yet known about how they work.

4.4 Adaptive Immune Receptors in Gnathostomes

In jawed vertebrates lymphocytes also express antigen-specific receptors and these are also somatically formed by DNA rearrangement during lymphocyte development. This sounds like a rehash of the agnathan story—but it is not, for in jawed vertebrates the antigen-specific receptors are composed of a different protein domain, and a different mechanism is used to mix and match the DNA modules that code for them.

Like agnathans, gnathostomes also have three major categories of lymphocytes expressing adaptive system receptors. The first are the “B-cells” which express an antigen-specific B-Cell Receptor (BCR), and which, once activated, may be induced to secrete the receptor in the form of soluble antibody. These B-cells of gnathostomes are thus reminiscent of the VLRB cells in agnathans. The second major category consists of the “CD4+ T-cells”, and the third of the “CD8+ T-cells”. These latter two cell types express a somatically generated antigen-specific T-cell receptor (TCR) that is not released in soluble form. These T-cells are, in this sense, reminiscent of the VLRA and VLRC cells of agnathans. All of these antigen-binding receptors of gnathostomes are composed of “IgSF” domains (Fig. 4.4; Appendix D).

Fig. 4.4

Immunoglobulin super family domains in an antibody. Left: schematic diagram of a gnathostome IgG molecule. The molecule is composed of four chains, two identical heavy (H) chains and two identical light (L) chains. Each H chain consists of one N-terminal “variable” (V-type) domain (VH) followed by “constant” (C1-type) domains. The L chain consists of one V-type domain (VL) followed by one C1-type domain. The antigen-binding part of the molecule is formed by the association of VL and VH. Right: a ribbon diagram of the structure of a V-domain. The domain is formed of two sheets of β-strands and is reinforced by a disulphide bond formed between strands B and F. Loops linking strands B and C, C′ and C″, and F and G form the “Complementarity Determining Regions” (CDR1, CDR2 and CDR3) that make contact with the antigen (see also Fig. 4.7)

4.4.1 The Immunoglobulin Super Family Domain

Many proteins contain domains that are members of the IgSF. These domains fall into four structural sets: the variable set (V), the intermediate set (I) and the constant sets (C1 and C2). V, C2 and I type domains are found in practically all metazoans, while C1 domains are found only in gnathostomes (see Appendix D). In the antigen-specific receptors of gnathostome B-cells and T-cells, the N-terminal V-type domain is always associated with one or more membrane-proximal C1-type domains. The antigen-binding parts of these receptors are all derived from one particular V-type domain, which emerged as a crippled casualty from an ancient battle fought out some 500 million years ago between the precursor of all jawed vertebrates and a DNA transposon.

4.4.2 The “Transib” Transposon Contributed to the Structure of All Antigen-Specific Receptors in Jawed Vertebrates

Transposons are short stretches of parasitic DNA that consist, at a minimum, of terminal sequences that flank a gene coding for a “transposase”. The transposase is a site-specific recombinase that recognises the terminal sequences and can catalyse the precise excision of the transposon out of a genome, and its insertion into a new position. Since a transposon can use this “cut and paste” mechanism to hop around, there is always a risk that sooner or later it will hop into, and disrupt, an essential gene. Because of this, there is considerable selective pressure to make sure that transposons are quickly immobilised, either by inactivation of the transposase or by destruction of the terminal repeats. There are currently no active transposons in humans, though around 3% of the genome is composed of their inactivated remains. One such inactivated transposon belonged to a member of the Transib (Trans Siberian) transposon family [9].

The adaptive immune receptors of jawed vertebrates were born when a Transib transposon (Fig. 4.5) [11] inserted into an exon coding for a V-type IgSF domain. In the aftermath, the recombinase genes and the terminal repeats from the transposon were hijacked, and retooled to produce the lymphocyte receptors. What seems to have happened is that the coding sequences of the transposase genes were removed, and placed elsewhere in the genome, where they sit under the tight control of a lymphocyte-specific promoter. These genes are now known as RAG-1 and RAG-2. The terminal inverted repeats, however, were left in place within the V-domain. In such a situation expression of the recombinase genes will cause precise excision of the terminal inverted repeats and of the sequences lying between them.

Fig. 4.5

The Transib transposon. A Transib transposon integrated into a chromosome. The transposon consists of the transposase genes ProtoRAG-1 and ProtoRAG-2 flanked by terminal inverted repeats [10]. The ProtoRAG recombinase excises the transposon precisely by cutting the DNA at the ends of the terminal repeats

However, it must be admitted that a very great deal has happened in the last 500 million years, and the chromosomal segments, which contain the genes for adaptive immune receptors, are the most complex, segmentally duplicated regions in the human genome [12]. Many steps of duplication and insertion, involving the terminal inverted repeats, were required to form the locus, and at every step natural selection would have determined whether the changes were worth preserving. It will be clear that going from the initial insertion of the transposon to a functional receptor locus would have required a considerable period of time. It is unclear how adaptive immunity was organised over this transitional period (see also Sect. 4.16.7).

The two major lineages of lymphocytes of adaptive immunity in jawed vertebrates—the B-cells and the T-cells—express quite different receptor structures, which are coded for at different loci in the genome, but all of the antigen-binding sites are derived from the original V-type domain, into whose gene the Transib transposon inserted.

The process by which the gene segments coding for these V-domains are formed during lymphocyte development is radically different from that used to build the agnathan adaptive immune receptors. In the lamprey, information is copied by gene conversion from the flanking sequences into the empty receptor gene. In gnathostomes the information is physically moved from one position to another by a recombination process catalysed by the products of the RAG genes (Fig. 4.6). However, just as in the agnathan system, the mere recombination of a limited set of modules is not sufficient to produce a huge repertoire of different receptors, and in the gnathostomes the recombination process has been modulated so as to introduce the necessary degree of sloppiness into the joining of the V, (D), and J modules.

Fig. 4.6

Generation of gnathostome adaptive immune receptors from V, D and J modules. (a) Germline configuration. The example shown illustrates in schematic form the situation in the heavy chain gene of the BCR and in the β chain of the TCR. The germline gene contains separate V, D and J modules, each of which is flanked by Transib-derived terminal repeats. Genes coding for the BCR light chain or for the α chain of the TCR are similar but lack the D module. (b) The rearranged gene in this example is formed by RAG-mediated recombination of one D with one J module, followed by recombination with one V module

4.4.3 The Gnathostome Immune Receptor Antigen-Binding Sites

The V-type IgSF protein domain, encoded by the gene that was invaded by the Transib transposon, consists of a stable central structure from which extend three loops (Fig. 4.4) [13]. As outlined in Fig. 4.7, these loops form the site at which the receptor physically interacts with the accessible surface of an antigen. Since the loops, or parts of them, must form a structure complementary to the surface of the antigen to which they will bind, they are known as the “Complementarity Determining Regions” (CDRs). With rare exceptions, the antigen-binding sites of gnathostome adaptive immune receptors are constructed from two separate V-type IgSF domains. The resulting heterodimer forms a binding site that bears a total of six CDRs (Fig. 4.7). The use of heterodimers to form the binding site substantially increases the repertoire size, for 10 monomers of each chain can, in principle, form 100 different receptor specificities.

Fig. 4.7

The antigen-binding V-type Immunoglobulin Super Family (IgSF) domain of gnathostome adaptive immune receptors: (a) The V-type IgSF domain has a stable backbone of conserved sequence that supports three variable sequence loops. These form the three Complementarity Determining Regions (CDRs), which make contact with the antigen (stippled rectangle). The parts of the receptor derived from the “V” module only (Fig. 4.6) are shown in white (CDR1 and CDR2). The part of the molecule derived from the rearranged sections of the “V”, (“D”) and “J” modules is shaded (CDR3). (b) With few exceptions, the antigen-binding sites of gnathostome adaptive immune receptors are composed of two different chains. In the BCR these are the heavy (H) and light (L) chains, while in the TCR they are the α and β chains. Thus both BCR and TCR have six different CDRs involved in ligand binding

4.5 RAG and Its Limitations

A glance at the chapter on V, (D), J recombination in any immunology textbook will quickly convince you that many proteins other than the RAG recombinases are involved in a process that is seemingly bizarrely complex. Why should this be so? The answer is that precise joining of the handful of V, D and J modules available would yield—at best—a modest repertoire of receptors. A protein-based immune system, however, is of little or no selective value, unless the receptor repertoire produced is vast. RAG, which we “inherited” from the Transib transposon, is a site-specific recombinase, and so the complicated molecular gymnastics, described in the RAG recombination chapter of your immunology textbook, are there for the simple purpose of converting a precise site-specific recombination process into a sloppy one. The result is very sloppy indeed: DNA is cut, ends are nibbled, single-stranded overhangs are filled in or chopped off, bases are moved from one strand to the other, or are even randomly added on to the broken DNA ends. It is a mess, but it is a mess that allows for a very large repertoire of products to be formed.

4.5.1 Somewhere Between Lots of Receptors and Lots of Junk

The sloppy RAG recombination system is there to make lots of different receptor structures. This it does, but it is important to keep in mind that the RAG recombination system is quite astonishingly wasteful. A large part of the problem is that the genetic code is a triplet code, meaning that three bases in DNA sequence will be translated into one amino acid in the sequence of a protein chain. Yet RAG recombination, with its nibbled ends, filled in gaps and its anarchistic addition of random nucleotides, is unable to count bases. Whether multiples of one, two or three bases have been added in, or chopped out, during the recombination process, is simply a matter of chance. Yet all adaptive receptor protein chains consist of a V-type domain, which is followed by at least one essential C-1-type IgSF domain (Fig. 4.4). If the reading frame is shifted during RAG-mediated recombination, then the structure of the downstream C-1 domain will be destroyed. On this score alone 66% of the products of each RAG recombination event are going to end up being molecular junk. But that is not the end, for even if the reading frame is maintained a chaotic mutational process may generate stop codons, or sequences coding for amino acids that disrupt the three-dimensional structure of the protein chain. The bottom line is that something like 70% of each of the recombination events required to form a gnathostome V-type domain will end in catastrophe. Furthermore, with few exceptions these gnathostome adaptive receptors are heterodimers. Thus the 30% of lymphocytes emerging from RAG recombination with the first chain intact must now rearrange the second chain. Once again 70% of these rearrangements of the second chain will result in junk. This is a degree of waste, which natural selection must have viewed critically.
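The arithmetic behind this waste can be made explicit. What follows is a minimal sketch rather than a model taken from the literature. It assumes that the net change in junction length is equally likely to fall into each of the three reading frames, it takes the round figure of 30% productive rearrangements per chain at face value, and it ignores any second attempt on the other allele. The function names are purely illustrative.

```python
from fractions import Fraction


def in_frame_fraction() -> Fraction:
    """Fraction of junctions that keep the reading frame.

    Assumption: the net number of bases added or removed is equally
    likely to be 0, 1 or 2 modulo 3, so one junction in three is in frame.
    """
    return Fraction(1, 3)


def both_chains_productive(per_chain_success: float = 0.30) -> float:
    """Chance that a single attempt on each of the two chains succeeds.

    Assumption: the two rearrangements are independent, and each has the
    same success rate (30% by default, the round figure used in the text).
    """
    if not 0.0 <= per_chain_success <= 1.0:
        raise ValueError("per_chain_success must lie between 0 and 1")
    return per_chain_success ** 2


if __name__ == "__main__":
    lost_to_frameshift = 1 - in_frame_fraction()
    print(f"Lost to frameshift alone: {float(lost_to_frameshift):.1%}")
    print(f"Both chains intact after one attempt each: {both_chains_productive():.1%}")
```

Under these assumptions fewer than one lymphocyte precursor in ten would emerge with both chains intact from a single attempt at each locus, which is one way of seeing why the waste is so striking.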

4.5.2 RAG Recombination: A Half-Hearted Attempt to Build an All-Encompassing Adaptive Repertoire

RAG recombination does make lots of different receptors by messing about with the DNA coding for CDR3, but it leaves CDR1 and CDR2 untouched (Fig. 4.7). Why this hesitation? The answer is that if CDR1 and CDR2 were also to be mutated in the same way, then all of the problems associated with rearrangement at CDR3 would be multiplied by three, and essentially all of the rearranged receptor genes would be junk. This must be avoided at all costs, and the price that must be paid is a reduction in the completeness of the RAG-generated repertoire.

4.5.3 Ligand Binding and “Tolerance” in the B-Cell Lineage

The “sloppiness” of the rearrangement process, which is the secret of adaptive immunity’s success, both in agnathans and in gnathostomes, automatically forms a receptor repertoire that will contain numerous autoimmune specificities. These must be removed, one way or another, by “tolerance” mechanisms, and the jawed vertebrates’ concern with tolerance can be fairly described as obsessive. Not surprisingly, the study of tolerance mechanisms currently forms the forefront of immunological research, because understanding how tolerance works will provide the keys to manipulate immunity in situations as varied as vaccination, autoimmunity, cancer and the treatment of infectious diseases.

Given the random generation process, the BCR receptor repertoire can “see” essentially any and every molecular structure. In particular it can “see” all of the structures in our own body. Experimental analysis of the specificities of newly formed BCRs suggests that approximately 50% of them are directed against “self” structures [14], and that many of these receptors are removed by a “central tolerance” system, operating on the newly formed B-cells in the bone marrow and spleen [15]. Central tolerance deletes the autoimmune specificities by killing the newly formed cells that express them.

The requirement for central tolerance has enormous consequences for the way adaptive immune receptors are expressed. Innate cells like macrophages express many different types of innate system receptors. In contrast, each B-cell expresses a single type of receptor molecule, which consists of two copies of one “heavy” (H) chain and two copies of one “light” (L) chain (Fig. 4.4). Suppose that a B-cell were to express two different heavy chains (H1 and H2) and two different light chains (L1 and L2); then it could express four different combinations (H1L1, H2L2, H1L2 and H2L1), and hence four different receptor specificities. You might think that this would be a good idea, for such a B-cell could now perhaps recognise four different pathogens, but in reality it would be terrible. The reason is very simple: if 50% of B-cells that express one single receptor must be destroyed by central tolerance because they are autoimmune, then 75% of those expressing two different receptors, 88% of those expressing three receptor specificities and 94% of those expressing four specificities would have to be destroyed. By permitting each lymphocyte to express only one receptor, the system maximises the number of B-cells that will survive the central tolerance processes.
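The percentages quoted in this argument follow from a simple calculation, sketched below. The sketch assumes that each receptor specificity on a cell is independently self-reactive with the same probability (50%, the experimental estimate cited above), and that a single self-reactive specificity is enough to have the cell deleted. The function name is illustrative only.

```python
def fraction_deleted(n_specificities: int, p_self: float = 0.5) -> float:
    """Fraction of B-cells removed by central tolerance.

    A cell survives only if none of its n specificities is self-reactive,
    which happens with probability (1 - p_self) ** n under the assumption
    that the specificities are independent of one another.
    """
    if n_specificities < 0:
        raise ValueError("n_specificities must not be negative")
    if not 0.0 <= p_self <= 1.0:
        raise ValueError("p_self must lie between 0 and 1")
    return 1 - (1 - p_self) ** n_specificities


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} specificities: {fraction_deleted(n):.1%} deleted")
```

With one to four specificities the function gives 50%, 75%, 87.5% and 93.75%, which round to the figures used in the text.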

4.5.4 Allelic Exclusion

Limiting each lymphocyte to just one receptor is achieved by the unusual genetic process of “allelic exclusion”. In eukaryotes, there are two copies of each autosomal gene—one maternal and the other paternal. In general, both alleles are expressed at an approximately equal level, but there are a few situations in which pressing selective forces drive the preferential use of either the maternal or the paternal copy. This sort of “allelic exclusion” phenomenon is seen, for example, in mice and humans in the expression of imprinted genes involved in resolving “parental conflict” [16]. It also occurs, though by a different mechanism, in the expression of odour receptor genes in the nasal neuroepithelium, which require monoallelic expression to achieve correct wiring with the CNS [17]. The recombination of lymphocyte receptor genes is a third example. RAG rearrangement is completed first on just one of the two alleles and, if this is successful, then rearrangement on the second allele is blocked. If, however, the rearrangement on the first allele fails to produce a well-formed receptor chain, then the cell is given a second chance by being allowed to try again on the second allele. The result is that, in general, each B lymphocyte expresses only one single receptor specificity.

4.6 Ligand-Binding in the T-Cell Lineages

The major T-cell lineages express heterodimeric “αβ” TCRs. These αβ T-cells differentiate from precursors that are born in the bone marrow, and then migrate to the thymus where the RAG-dependent somatic generation of the TCR takes place. Gene segments, homologous to the V, (D) and J modules of the BCR, encode the ligand-binding domains of the heterodimeric TCR. The overall structure of the TCR’s antigen-binding domains is similar to that of the BCR. Each of the antigen-binding domains contains three CDR loops that make contact with the ligand (Fig. 4.7). Despite the fact that only the CDR3 loop is mutated during RAG recombination, a huge repertoire of different TCRs is generated. As with the BCR, the rearrangement of the TCR is subject to allelic exclusion so that each T-cell normally expresses only one receptor specificity.

These TCRs are quite different from the receptors of B-cells in terms of the sorts of ligands that they can bind. A BCR may be complementary to a small molecule, or it may recognise part of a large molecule such as a protein, or polysaccharide, or it may even recognise part of a multi-molecular complex, such as a virus, a bacterium or a eukaryotic cell. In contrast, the vast majority of αβ TCRs recognise short peptides. These peptides have to be first formed inside some other cell, then complexed with peptide-binding proteins, before being displayed on the cell’s surface. What is the point of this? At least part of the answer has to do with the fact that some bacterial and all viral pathogens live and replicate inside cells where the antibodies produced by B-cells cannot reach them. One principal function of T-cells is to combat such intracellular infections, but therein lies a major problem. The T-cell’s antigen-specific receptor is bound on its surface membrane, pointing out into the extracellular space. If these receptors are to be of use in detecting and combatting pathogens lurking inside other cells, then they must be able to monitor and unobtrusively “see” inside each of the approximately 10¹⁴ cells in an adult human being. How can this be done?

This problem is similar to that faced by the world’s intelligence agencies, all of whom wish to know what each of us is doing. They cannot regularly interrogate all of the roughly 8 × 10⁹ human beings on the planet, and so, as Edward Snowden showed, they use unobtrusive means to come by the information they require. Mobile phones, laptops, credit cards and car navigation systems are the commonplaces of modern life, and each time one of these devices is used, the information is swept up by the security services, and subjected to sophisticated data analysis to decide what, if anything, needs to be done to whom. This is the strategy used by T-cells. In place of the data from electronic devices, they make use of information extracted from the pattern of proteins that are present within a cell. Know that pattern, and you know, with considerable precision, the intimate details of what is going on inside that cell. In a nutshell, this is done by digesting a small fraction of the proteins within a cell into short peptides, which are bound by peptide carrier proteins, and the resulting peptide–peptide carrier complexes are then expressed on the cell surface, where they may be “seen” by circulating T-cells.

4.6.1 The MHC Complex and the Peptide Carrier Molecules

Winston Churchill’s phrase “a riddle, wrapped in a mystery, inside an enigma” was not directed at the Major Histocompatibility (MHC) complex, but it does provide a fair description of it. Two rounds of whole genome duplication early in the evolution of vertebrates have left four identifiable copies of this region in humans with one copy normally present on each of chromosomes 1, 6, 9 and 19. The bona fide human MHC complex, containing the genes for the peptide-binding proteins, is the copy on chromosome 6 [18]. Long before the evolution of adaptive immunity, this ancient section of the genome contained a number of genes involved in innate immunity, stress responses and protein degradation [19]. The riddle of the MHC concerns this “primordial immune complex” [20], for though there are variations in gene content, relative position and chromosomal location throughout phylogeny, many of the genes within it have indeed been kept together for a very long time. Is there a selective advantage in this maintenance of gene linkage? One explanation has been that it was all a matter of chance, for once genes have arisen on a chromosome they will stick together until some chromosomal rearrangement like an inversion or translocation separates them. Since such events are rare, genes and gene order (synteny) can often be traced across vast tracts of phylogeny. Alternatively, the genes may have been kept together for some functional reason. For example, since different chromosomal domains have different degrees of accessibility for the transcriptional apparatus, keeping genes that code for related processes together may be of selective value if it helps ensure their concerted expression [21]. One further possible mechanism is based on the interactions of polymorphic genes. Imagine two genes, “A” and “B”, whose protein products must interact. 
Let’s assume that both of these genes are polymorphic, so that different individuals in the species have different versions of “A” and different versions of “B”. If for each of the polymorphic “A” genes there is an ideal partner “B”, then there will be a clear advantage in keeping each ideal “A + B” pair tightly linked in the chromosome [22].

Beyond this riddle of the “primordial immune complex” lies the mystery surrounding the forces that, early in jawed vertebrate evolution, drove the selection of the genes coding for the peptide-binding MHC proteins. A hypothetical structure for the resulting early gnathostome “Ur-MHC” complex is shown in Fig. 4.8. Numerous speculative scenarios have been developed, but as yet there is no consensus as to where these genes came from, and how their evolution was matched to that of the TCR. Not surprisingly, given these imponderables it remains an enigma whether the presence of the genes coding for these peptide-binding molecules in this particular locus is evolutionarily significant or merely a case of “they had to go somewhere”.

Fig. 4.8

A hypothetical scheme of the “Ur-MHC” of gnathostomes. The genes can be grouped into four categories: the Class-I region, which includes the Class-Ia and Class-Ib genes together with the LMP genes coding for proteasome subunits, and the TAP (transporter associated with peptide antigen processing) genes involved in the generation and transport of the peptide ligands for Class-I (red elements); the Class-II region containing the genes for the Class-II molecules (orange elements); the Class-III region that contains a number of genes involved in innate immunity (blue elements), such as the complement components C2, C4 and B, and cytokines of the tumour necrosis factor superfamily (TNFS). In addition, other genes including NOTCH and Tenascin X (green elements) are encoded in the MHC complex. Little of this hypothetical structure now remains unchanged across all gnathostome groups. Class-I and Class-II genes are not necessarily linked, β2 microglobulin is present in the MHC of modern sharks, but not elsewhere, and the Class-II genes themselves are, under certain circumstances, dispensable, as in the Atlantic cod

Though little is known about the early evolution of the peptide-binding MHC molecules, a great deal is known about how they function in vertebrate adaptive immunity. These molecules come in two forms, Class-I and Class-II, which differ not only in structure (Fig. 4.9), but also in the nature of the peptides that they bind. Class-I molecules bind peptides, which are usually nine amino acids long. Class-II molecules bind longer peptides, but even here the binding energy is derived largely from the nine amino acids that fit into the binding groove. The number of possible peptide ligands for these MHC molecules is large. Since there are 20 amino acids, the number of different nonamer peptides that could, in principle, be produced is 20⁹, which is roughly 5 × 10¹¹. Not all of these peptides will actually be formed, because the proteases responsible for making them do have preferences as to where to cut a protein chain, yet even so the array of peptides available to bind to MHC molecules is large.
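The size of the nonamer sequence space quoted above follows directly from the number of amino acids and the length of the peptide. The short sketch below simply counts sequences. It assumes the 20 standard amino acids and takes no account of protease cleavage preferences, so it gives an upper bound. The names are illustrative.

```python
AMINO_ACIDS = 20      # standard proteinogenic amino acids
PEPTIDE_LENGTH = 9    # nonamer, the typical Class-I ligand length


def peptide_space(length: int = PEPTIDE_LENGTH, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct peptide sequences of a given length."""
    if length < 0 or alphabet < 1:
        raise ValueError("length must be >= 0 and alphabet must be >= 1")
    return alphabet ** length


if __name__ == "__main__":
    total = peptide_space()
    print(f"Possible nonamers: {total:,} (about {total:.1e})")
```

The exact count is 512,000,000,000, which is where the round figure of roughly five hundred billion possible nonamers comes from.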

Fig. 4.9

Class-I and Class-II MHC molecules. Class-I MHC molecules (left) consist of three domains: the α1 and α2 domains, which together form the peptide-binding groove, and the C1 type IgSF domain α3. The α3 domain interacts non-covalently with beta-2 microglobulin, and this interaction is important for the stability of the complex. The molecule is bound to the cell membrane through a transmembrane domain. The Class-II MHC molecule (right) is composed of two chains—α and β—each of which is built of two domains. The α1 and β1 domains together form the peptide-binding groove. The Class-I and Class-II molecules have remarkable structural similarities. The peptide-binding grooves of both are similar, and both have an IgSF domain of the C1 type that associates with the membrane. The homology between the two classes is indicated by similarity of shape and shading. The Class-I α1 domain is homologous to Class-II α1. The Class-I α2 domain is homologous to Class-II β1. Class-I α3 is homologous to Class-II α2 and β2 as well as to β2 microglobulin

4.6.2 Why Two Types of MHC-Peptide Carrier Molecules?

Every nucleated cell in the body digests a fraction of its proteins in a large multi-protein complex called the proteasome. Under inflammatory conditions the proteasome’s structure is altered so that it more efficiently produces peptides that associate with the MHC-Class-I peptide carriers, and the resulting complexes are displayed on the cell surface [23]. This spectrum of peptide-MHC-Class-I complexes on the cell surface thus provides a snapshot of what is currently going on inside that cell. Remember, this is happening in every nucleated cell in the body, so if you think that this sounds like a very large investment of energy and metabolites, then you are right. It is.

Peptide-MHC-Class-II complexes, in contrast, are not expressed on all cells, but rather are restricted to the so-called Antigen Presenting Cells (APCs) such as the phagocytic dendritic cells or macrophages, or the B-cells that internalise antigens via their BCR. These APCs take up material from their surroundings into endosomes where the ingested proteins are digested to peptides. These peptides may then associate in the endosome with MHC-Class-II molecules, and the resulting complexes are displayed on the APC’s surface. These peptide-MHC-Class-II complexes thus provide a snapshot of what an APC has recently eaten.

Each individual human can express 3 to 6 different MHC-Class-I and 4 to 8 different MHC-Class-II molecules. Unlike the receptors of innate immunity or the adaptive immune receptors of B-cells and T-cells, all of which are highly specific for particular ligands, the MHC molecules have a rather relaxed binding specificity. Thus, though each MHC-Class-I molecule has its particular peptide-binding preferences, nevertheless each of them is thought to be able to bind around a million different peptides with useful affinity [24]. However, given that there is a strictly limited number of peptide-binding MHC molecules per haploid human genome, a million is not a large number, for it implies that each of us can, at best, bind roughly 3 to 6 × 10⁶ different peptides on MHC-Class-I molecules. That is not a lot, given that the total possible array of nonamer peptide sequences is around 5 × 10¹¹. For the many species that express only one or two MHC-Class-I molecules, the situation is even worse. So why don’t we all have many more peptide-binding MHC molecules? Why not scores, hundreds or even thousands of them? The answer is that, in a sense, we do, for the MHC locus is highly polymorphic and thousands of different alleles are spread through the population. That tells us that lots of different MHC-Class-I and Class-II alleles are a good idea, and that there is, in fact, no problem in making lots of different MHC structures. So why the restriction to just a handful in each individual? Perhaps the answer is simply that MHC-Class-I molecules are expressed in every nucleated cell and MHC-Class-II molecules in every APC as well as in certain other cells such as, for example, thymic epithelial cells (see Sect. 4.8). This represents a massive investment of energy and metabolites. 
Because of this, the number of MHC pseudo-alleles present in an individual will be a trade-off between the investment of metabolic resources needed to express them and the fitness advantage gained from having a peptide-recognising T-cell system.
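How small a share of the nonamer universe an individual can present on MHC-Class-I is easy to estimate. The sketch below is only a rough upper bound. It assumes about a million peptides per Class-I molecule, as stated above, and it further assumes that the peptide sets bound by different molecules do not overlap, which is an idealisation made here for simplicity. The names are illustrative.

```python
NONAMER_SPACE = 20 ** 9          # all possible nine-residue peptides
PEPTIDES_PER_MHC = 1_000_000     # "around a million" per Class-I molecule


def presentable_fraction(n_class_i: int,
                         peptides_per_mhc: int = PEPTIDES_PER_MHC) -> float:
    """Upper bound on the share of nonamers one individual can present.

    Assumption: the peptide sets of the different Class-I molecules do
    not overlap, so their sizes can simply be added together.
    """
    if n_class_i < 0 or peptides_per_mhc < 0:
        raise ValueError("counts must not be negative")
    return n_class_i * peptides_per_mhc / NONAMER_SPACE


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(f"{n} Class-I molecules: {presentable_fraction(n):.2e} of all nonamers")
```

Even with six Class-I molecules the bound is of the order of one nonamer in 100,000, which puts the trade-off between metabolic cost and peptide coverage into perspective.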

So why two types of peptide carrier molecules? The simple answer is that peptide-MHC-Class-I complexes on the surface of a cell provide information about “non-self”, in the form of pathogens that may be lurking inside a cell. The peptide-MHC-Class-II complexes, in contrast, provide information about “non-self” taken up from the immediate environment of the cell.

4.6.3 TCRs Recognise Complexes of Peptides and MHC Molecules

The two quite different types of information, provided by the Class-I and Class-II peptide complexes, are processed by two different types of αβ T-cells. In some αβ T-cells the TCR on the surface works hand in hand with a co-receptor molecule known as CD4. These CD4+ αβ T-cells recognise peptide-MHC-Class-II complexes present on APCs, and hence they are in the business of recognising peptides derived from proteins that were digested in the endosome. A T-cell that expresses the co-receptor CD8 interacts instead with peptide-MHC-Class-I complexes, and thus is in the business of detecting peptides that were generated by digestion of intracellular proteins in the proteasome. Proteins in the endosome do not, in general, have ready access to the cytosol and hence will not be efficiently presented on MHC-Class-I molecules. On the other hand, proteins in the cytosol do not, in general, have ready access to the endosome, and hence will not be efficiently presented on MHC-Class-II molecules.

The fact that the TCR recognises a complex of peptide with an MHC molecule rather than just a naked peptide has enormous consequences for the way adaptive immunity functions. In the B-cell system a huge, for all practical purposes infinite, array of possible ligands is met, and countered, with an enormous (>10¹⁴) potential repertoire of different BCRs. In the T-cell system things are very different. The RAG-dependent rearrangement of the TCR genes is thought to be able to produce a repertoire of over 10¹⁴ different receptors. An array of something like 5 × 10¹¹ different nonamer peptide ligands on the one side, being faced by 10¹⁴ receptors on the other, would sound not unreasonable. That, however, is not the way it really is, for the ligands that the TCRs recognise are not naked peptides but rather peptide-MHC complexes. Thus, when a TCR recognises a peptide-MHC complex, part of the binding energy comes from interactions between the TCR and the MHC molecule, and part from interactions of the TCR with the peptide. Since the number of MHC molecules available to an individual is strictly limited (Sect. 4.6.2), and each of these binds only a limited range of different peptides, the number of different peptide-MHC complexes that an individual can present to T-cells is also strictly limited.

4.7 Which TCRs Are Potentially Useful: “Positive Selection”

A TCR that cannot productively interact with any of the MHC-Class-I or Class-II molecules available in an individual is a TCR that is of no value to that person’s immune system. Flooding the immune compartment with huge numbers of T-cells that are incapable of recognising any of the available peptide-MHC complexes would clog up the system, and so all the “useless” cells must be got rid of [25]. This process, which only retains the potentially useful receptors, takes place in the thymic cortex and is known as “positive selection”. It is believed to involve some means of measuring the strength of the association of the T-cell’s TCR with MHC-peptide complexes expressed on the surface of the cortical epithelial cells. Those T-cells that fail to interact productively with any of these complexes are directed into apoptosis and die. The question of the number of different MHC genes per genome is thus crucially important here. Natural selection has had to find the best possible compromise between the energy drain involved in expressing MHC molecules in every cell in the body, the waste involved in killing large numbers of T-cells whose receptors do not “fit” any of the individual’s MHC molecules and the fitness benefit derived from having T-cells able to look into what is going on inside other cells.

4.8 “Negative Selection” of T-Cells in the Thymus

Perhaps the greatest challenge in considering the evolutionary origin of adaptive immunity in vertebrates revolves around the simple chicken-and-egg problem that somatic recombination of receptors cannot arise in the absence of appropriate tolerance systems, and tolerance cannot evolve in the absence of the repertoire. It is hard to envisage two such complex systems as repertoire generation and tolerance induction suddenly evolving in tandem. However, the absolute necessity for tolerance systems is underscored by the investment that is made to purge the repertoire of anti-“self” specificities.

Positive selection in the thymic cortex of cells expressing TCRs that will interact with one of the individual’s MHC molecules involves a massacre of the newly formed T-cells, but this is just the start. What follows is a second massacre, one that takes place in the thymic medulla and is known as negative selection. What is the purpose of negative selection? Why is it needed? How does it work? The function of the repertoire of TCRs on mature CD8+ T-cells is to continually scan the peptide-MHC-Class-I complexes exposed on the surface of normal cells. Those cells that display only peptides derived from normal “self” proteins must be left in peace. Those that display peptides derived from the proteins of an intracellular pathogen must be destroyed. How are the CD8+ T-cells to distinguish “self” peptides from “pathogen” derived ones? The answer is that in reality they can’t, and they don’t have to, because those CD8+ T-cells that can recognise “self” peptide-MHC-Class-I complexes are destroyed in the thymus by the process of negative selection [25].

In a similar way the CD4+ T-cells are there to recognise peptide-MHC complexes displayed on the surface of APCs. These complexes are derived from material that the APC has taken up from its surroundings, and processed to peptides in the endosomes. The resulting peptides may be derived from pathogens, like bacteria or viruses, or they may be derived from “self” proteins present in apoptotic cells or in the debris from virus-infected cells that the APC has scavenged. All of this material will be processed in the endosomes, and thus not only pathogen derived but also “self” peptides will be presented by the APC on MHC-Class-II molecules to the CD4+ T-cells. How are the CD4+ T-cells to recognise the “non-self” components, and yet ignore the “self” components? As with the CD8+ T-cells, they can’t, and as with the CD8+ T-cells they don’t have to, because those CD4+ T-cells that recognise “self” peptide-MHC-Class-II complexes are destroyed in the thymic medulla by negative selection [25].

Negative selection works by having the thymic medullary epithelial cells present “self” peptide-MHC-Class-I complexes and “self” peptide-MHC-Class-II complexes to those T-cells that emerge from positive selection in the cortex. T-cells that interact with high affinity with any of the “self” peptide-MHC complexes are driven into apoptosis and die. For this selection to function properly, the medullary thymic epithelial cells face two major problems. The first is that if negative selection is to be effective, then these epithelial cells must be able to present all “self” peptides to the T-cells. The second is that they must be able to present “self” peptides not only on MHC-Class-I to CD8+ T-cells but also on MHC-Class-II molecules to CD4+ T-cells.

4.8.1 Presenting All “Self” Peptides

Thymic epithelial cells are end-differentiated cells, and end-differentiated cells express only a specific subset of all the genes in the genome. Because of this, the expression of many genes is restricted to certain tissues, and so one would not expect epithelial cells in the thymus to be able to express the tissue-restricted genes typical of other tissues. In this respect, however, the medullary thymic epithelial cells are different, for they express a gene called AIRE. This gene codes for a protein that, in ways that are not yet fully understood, licenses these cells to express essentially every gene in the genome. Not all genes are expressed in every medullary thymic epithelial cell all of the time, but at any given moment a tissue-restricted protein is expressed by roughly 1–3% of them. These proteins, synthesised in the medullary epithelial cells, are then processed through the proteasome and presented as peptide-MHC-Class-I complexes to the CD8+ T-cells. This allows for the detection and destruction of those newly made CD8+ T-cells that express autoimmune receptors [25].

4.8.2 Expressing “Self” Peptides on MHC Class-II

The peptides detected by CD4+ T-cells are different from those detected by CD8+ T-cells. This is because peptides formed in the endosome are generated by a different set of proteases from those that are present in the proteasome. In presenting these “self” peptides to CD4+ cells the medullary epithelial cells must overcome a peculiar problem. Normal cellular proteins can be readily routed to the proteasome, and the resulting peptides can be loaded onto MHC-Class-I molecules. However, these normal cellular proteins do not usually have access to the endosomes. Only proteins from phagocytosed material are directed to the endosome. How then can the medullary epithelial cells make “endosome type” peptides from these normal proteins, and present them as MHC-Class-II complexes to CD4+ αβ T-cells? The answer seems to be that these medullary epithelial cells, as well as the medullary dendritic cells, are peculiarly efficient at autophagy, the mechanism of “internal phagocytosis” by which normal cellular proteins are transferred to the endosomes (see Sect. 2.2.5). In this way samples of the cell’s own proteins reach the endosomes where they are digested to peptides that associate with MHC-Class-II molecules.

4.9 The Price of T-Cell Selection in the Thymus

The result of positive and negative selection in the thymus is that the CD4+ and the CD8+ T-cells that emerge into the periphery carry receptors that ought to be able to make use of an MHC molecule present in that individual, and they ought to be incapable of interacting with any “self”-peptide-MHC molecules. The repertoire of receptors on these T-cell populations will, nevertheless, almost certainly contain specificities that will detect peptide-MHC complexes containing peptides derived from pathogens. This is what makes the CD8+ and the CD4+ repertoires selectively advantageous. The price, however, is enormous, for over 95% of the T-cells die during positive and negative selection in the thymus, and only the few survivors enter the circulation. Nevertheless, in a young human being, around 10⁹ survivors per day do exit the thymus [26]. The daily destruction of billions of cells in the thymus is an extraordinary waste of metabolites and energy. Comparable size-adjusted numbers of destroyed thymocytes have been reported for other vertebrates such as mice and chickens.
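The scale of this loss can be made concrete with a little arithmetic. The sketch below is only a back-of-envelope estimate: it reads the daily thymic output quoted above as roughly 10⁹ cells and treats “over 95%” as exactly 95%, so the results are lower bounds. The function name `thymic_turnover` is purely illustrative.

```python
def thymic_turnover(survivors_per_day: float, death_fraction: float):
    """Return (cells entering selection per day, cells dying per day).

    Assumes a simple steady state in which a fixed fraction of the cells
    entering selection dies and the remainder leaves the thymus.
    """
    if not 0.0 <= death_fraction < 1.0:
        raise ValueError("death_fraction must lie in [0, 1)")
    entering = survivors_per_day / (1.0 - death_fraction)
    deaths = entering - survivors_per_day
    return entering, deaths


# Figures taken from the text; both are rough estimates.
entering, deaths = thymic_turnover(survivors_per_day=1e9, death_fraction=0.95)
print(f"cells entering selection per day: {entering:.2e}")
print(f"cells destroyed per day:          {deaths:.2e}")
```

With these inputs, some twenty billion thymocytes would have to enter selection each day to yield one billion survivors, which is consistent with the “billions of cells” destroyed daily.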

4.10 Co-evolution of TCRs and MHC-Molecules

MHC molecules, by displaying peptides on cell surfaces, provide the T-cells with information about what is going on inside other cells; this is the sole established function of MHC molecules. Attempts to find some alternative function of these MHC molecules, which would provide them with a selective advantage in the absence of an adaptive immune repertoire, have not so far been successful. T-cells interpret the information provided by the peptide-MHC complexes; this is the sole function of the TCR. MHC molecules and TCRs thus represent an extraordinary case of co-evolution, and it is unclear how either of them could have arisen on its own, for αβ T-cells, as we know them today, make no sense in the absence of MHC molecules, and MHC molecules make no sense in the absence of αβ T-cells. How these two molecular species, with their very different, but complementary, functions, could have co-evolved is a mystery that has puzzled biologists for decades. One clue may be that MHC molecules share with the TCR the use of the C1-type IgSF domain, which is restricted to gnathostomes, and this suggests that the genesis of the TCR and MHC families was not totally independent. What is clear is that the peptide-binding MHC molecules emerged in parallel with RAG-mediated recombination, as the jawed vertebrates split from the earlier non-jawed vertebrate line [27]. However, it is certainly possible that the “original T-cells” were more like B-cells in that they did not recognise peptides, and so did not require the MHC peptide-binding molecules. The reason for thinking this is that in addition to the αβ T-cells, gnathostomes also possess a second population of T-cells which express a TCR composed of one γ and one δ chain. These γ and δ chains are homologous to the α and β TCR chains of “conventional” T-cells. The “gamma-delta” T-cells are thought to play roles in mucosal immunity and they seem to bind intact proteins rather than peptides, and hence do not require MHC for antigen recognition. The gamma-delta T-cells may represent the vestiges of a T-cell population that preceded the evolution of the αβ T-cells. Were this to be the case, then the evolution of the TCR may well have preceded that of the MHC-Class-I and Class-II molecules.

4.11 Repertoire Change After Central Tolerance: Somatic Hypermutation of BCRs

One might expect that after central tolerance has been established, there would be an absolute prohibition on any subsequent widespread mutation of the sequences coding for the antigen-binding site, for this might well give rise to new autoimmune specificities. This expectation is fulfilled for mammalian T-cells, but for the B-cells things are different.

The binding sites of adaptive immune receptors are formed from the three “Complementarity Determining Regions” (CDRs) which contact the ligand (Fig. 4.7). Since RAG recombination only mutates the sequence of CDR3, but leaves the sequences of CDR1 and CDR2 untouched, the repertoire that is produced is far from being as complete as it might be. Because only a small fraction of an incomplete B-cell repertoire can be expressed at any one time, it is unlikely that a pathogen entering the body will be met by B-cells expressing the “perfect” receptor against it. Nevertheless, the available repertoire is large enough that, in general, some B-cells with less than perfect, and hence low affinity, receptors will be available. This handful of cells would never be able to effectively counter a rapidly growing bacterial or viral pathogen, and so after contact with the pathogen, they are directed to specialised niches in the secondary lymphoid tissues where germinal centres develop, a micro-environment required for the “maturation” of the immune response. In the germinal centres the activated B-cells undergo rapidly accelerated cell divisions and multiple rounds of somatic mutation, followed by a somatic selection process to filter out the cells with the highest affinity receptors. The value of this can be seen by the fact that cold-blooded vertebrates, which lack germinal centres, have problems “maturing” their immune responses [28].

For this process of “somatic hypermutation” in the germinal centres, RAG recombinase is of no value, for its mutagenic action is part and parcel of the rearrangement of the V, (D) and J sequence modules, which was completed before the B-cell emerged into the periphery. Instead a different mutagenic mechanism, one centred on the cytidine deaminase AID, comes into play [29]. AID is the gnathostome homolog of the cytidine deaminases used by agnathans to initiate the gene conversion process by which their adaptive immune receptor repertoires are generated (Sect. 4.3.1). The formation of dU by deamination of dC is a mutational event that is normally repaired by the cell’s DNA repair systems. These repair systems are very ancient and have been naturally selected for their ability to restore the original sequence after a mutational event. It is at the moment unclear what induces these repair systems to generate mutations instead in the context of B-cell somatic hypermutation, of class switch recombination and of gene conversion of V-genes in chickens, rabbits and sheep (Fig. 4.10). It seems, however, that the type of mutation that is induced depends on the nature of the cofactors in the cell that work together with AID.

Fig. 4.10

Induction of mutations by Activation-Induced Deaminase (AID). Deamination of dC by AID to form dU is followed by the removal of the uracil moiety by Uracil-DNA-glycosylase (UNG) and the introduction of a nick in the DNA by the AP-endonuclease. Different elements of the Base Excision Repair (BER) or of the Mismatch Repair (MMR) systems are recruited to repair the damaged DNA strand and may result in somatic hypermutation, class switch recombination and/or gene conversion [29].

During the AID-initiated somatic hypermutation process, mutations are introduced into the rearranged V-gene sequence and thus can alter not just CDR3, but CDR1 and CDR2 as well [30]. This somatic hypermutation process is fraught with problems. Not only will the mutation process result in B-cells with structurally defective receptors but, in addition, the tiny fraction of mutant cells with increased affinity for the antigen must be reliably identified and their numbers quickly expanded in the germinal centres. Selection and expansion are achieved by making the cells compete with each other for access to a limited amount of antigen, which is presented in intact form on the surface of the germinal centre’s “follicular dendritic cells”. Mutant B-cells with low affinity receptors lose out in this competition for antigen, and, having been left empty handed, they are condemned to death by apoptosis. However, those cells with a higher affinity receptor grab the antigen, internalise it, process it to peptides in their endosomes and express the resulting peptides as MHC-Class-II complexes on the cell surface. CD4+ “follicular T-helper cells” that recognise these peptides will then provide the B-cell with essential survival factors. These high affinity winners may differentiate into effector cells or go through new rounds of division, mutation and selection, so that in the space of a few days clones of cells expressing high affinity receptors have been selected and expanded [31]. It is, however, clear that changing the structure of the CDRs can dramatically alter not only the affinity of the BCR for the cognate ligand, but may also radically change its ligand specificity, and thus may generate an autoimmune receptor. The central tolerance mechanisms operating in the bone marrow have been left far behind, so other means of dealing with B-lineage cells that have acquired potentially autoimmune receptors must be available.
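The cycle of division, mutation and competition for limited antigen described above can be caricatured as a simple iterative algorithm. The following toy simulation is a sketch only, not a model of real germinal centre kinetics: affinity is reduced to a single number between 0 and 1, and every parameter (`clone_size`, `antigen_slots`, `defect_rate`, the mutation step size) is an arbitrary assumption chosen for illustration.

```python
import random


def affinity_maturation(rounds=10, clone_size=200, antigen_slots=20,
                        defect_rate=0.3, step=0.05, seed=0):
    """Toy germinal-centre loop; returns mean winner affinity per round."""
    rng = random.Random(seed)
    # Founder B-cells all carry the same low-affinity receptor.
    population = [0.1] * clone_size
    history = []
    for _ in range(rounds):
        mutants = []
        for affinity in population:
            # Some mutations wreck the receptor; these cells are lost.
            if rng.random() < defect_rate:
                continue
            # The rest are nudged up or down at random, kept within [0, 1].
            mutated = affinity + rng.gauss(0.0, step)
            mutants.append(min(1.0, max(0.0, mutated)))
        # Competition: only the best binders obtain the limited antigen
        # (and, in this caricature, the T-cell help that goes with it).
        mutants.sort(reverse=True)
        winners = mutants[:antigen_slots]
        if not winners:
            break
        history.append(sum(winners) / len(winners))
        # Winners divide to refill the germinal centre for the next round.
        population = [winners[i % len(winners)] for i in range(clone_size)]
    return history


for round_no, mean_affinity in enumerate(affinity_maturation(), start=1):
    print(round_no, round(mean_affinity, 3))
```

Even though most individual mutations in this sketch are neutral or harmful, repeated selection of the few improved cells drives the mean affinity of the surviving clones upwards over successive rounds. Note that nothing in the loop checks the mutated receptors for “self” reactivity, which is the problem raised at the end of the paragraph above.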

4.12 Life After Central Tolerance: Peripheral Tolerance and Lymphocyte Activation

Experience teaches us that nothing in life is perfect, and this applies also to central tolerance. Though the T-cell compartment does not do anything as bizarre as mutating its receptors after central tolerance is established, that does not mean that the repertoire is completely safe. All TCRs must be at least a little bit autoimmune, because they must interact with “self” MHC molecules, and so “self” and “non-self”, “safe” and “autoimmune” are neither well defined nor clearly distinguished categories. The highest affinity receptors are potentially “autoimmune”, and the lowest are “safe”, but there is a considerable grey area between “low” and “high”, and no simple affinity cut-off above which cells must be eliminated and below which they are perfectly safe. In consequence, many of the B- and T-cells entering the periphery express receptors that are somewhere between “borderline autoimmune” and “frankly autoimmune”. This should not be viewed merely as a failure of central tolerance, for it is true that “Immunity operates on the edge of autoimmunity. The more potent an immune response is, the greater the risk of auto-reactivity and self-harm” [32]. In addition to the problems posed by the question of an appropriate affinity cut-off for B-cells and T-cells, there is a further problem in the T-cell compartment. This is that the astonishing system of AIRE-mediated global gene expression in thymic medullary epithelial cells is not as all-encompassing as one might think. Tissue-specific splice variants or tissue-specific post-translational modifications of protein structure will not be covered, and thus T-cells cannot be centrally tolerised to these “self” structures. Simply allowing centrally tolerised lymphocytes to do their thing unchecked in the periphery would result in a catastrophe [25]. Somehow such potentially autoimmune cells must be controlled. An intricate set of overlapping controls has evolved to do this, and their importance is seen in the devastating autoimmune diseases that appear should these systems fail. It is also seen when pathogens, and likewise tumours, manipulate these regulatory systems to their own ends. These controls fall roughly into two groups: those based on the “fail-safe two key” strategy, and those due to the presence of regulator cells.

4.12.1 The “Two Key” Fail-Safe Strategy

Innate immune cells, such as macrophages, express many different receptors, and these together provide a “picture” of a potentially dangerous situation. Each of these cells is empowered to analyse the input signals flowing in from its receptor repertoire, and then autonomously decide what should be done. These cell autonomous decisions work, because the receptors that provide the input data have been honed over millions of years of selection to a fine state of tolerance. These receptors can therefore, in general, be trusted. The situation is different with lymphocytes expressing somatically evolved immune receptors. Here the receptors have been screened by the rough and ready business of central tolerance and they cannot be relied on to be truly tolerant. Because of this, in adaptive immunity cell autonomous decisions are avoided like the plague. Instead the “two key” principle guiding the launching of nuclear weapons applies, so that at least two different cell types must agree that there is a problem that requires firm action before an adaptive immune response can be initiated.

One can see this schematically in Fig. 4.11, which shows the activation of a CD8+ killer cell. A brief description of this complex process is given in the figure legend, but the important take-home lessons are actually very simple. First, the activation of a CD8+ T-cell requires the collaboration of three different cell types and involves five different ligand-receptor repertoires. Furthermore, both the CD4+ and CD8+ T-cells require additional information from the dendritic cell in the form of “co-stimulation”, and the CD8+ T-cell also requires “help” from the CD4+ T-cell [33].

Fig. 4.11

Activation of a naïve CD8+ T-cell. This requires three different cell types: a dendritic cell, a CD4+ “helper” T-cell and the CD8+ T-cell. It involves a total of five different immune receptor repertoires: the innate receptors on the dendritic cell; the peptide-MHC-Class-I complexes (pep-MHC-I) displayed on the dendritic cell; the peptide-MHC-Class-II complexes displayed on the dendritic cell; the CD4+ T-cell receptor repertoire; and finally the CD8+ T-cell receptor repertoire. A dendritic cell (a) detects the danger signals associated with the debris of a lysed virus-infected cell by way of its innate immune receptors (IRs). The debris is phagocytosed and peptides generated from it by digestion in the endosome are displayed on the cell surface as peptide-MHC-Class-II complexes (p-MHC-II) to the T-cell receptor (TCR) of a naïve CD4+ T-cell (b). The dendritic cells also carry on their surface the so-called co-stimulator (CS) molecules (narrow stippled arrow), with which they can identify themselves to the CD4+ T-lymphocyte as officially approved “Antigen Presenting Cells”. These dendritic cells are also adept at “cross presentation”, by which proteins in the ingested material are transferred into the cytosol, digested in the proteasome and displayed on the cell surface as peptide-MHC-Class-I (pep-MHC-I) complexes to CD8+ T-cells (c). If a naïve CD8+ T-cell interacts with the peptide-MHC-Class-I complex, and if it is also assured by the co-stimulatory signals on the surface of the DC that this is indeed a bona fide “Antigen Presenting Cell”, and if it also receives appropriate signals from the CD4+ helper cell (thick stippled arrow), then the naïve CD8+ T-cell will start to divide and produce a large clone of activated “killer cells”.

Why does it have to be so complicated? The answer is that neither the CD4+ nor the CD8+ TCR repertoires are entirely trustworthy. Only if both of them, together with the dendritic cell, agree that there is a problem that needs to be addressed, will a clone of activated CD8+ killer cells be formed. This caution is well justified because once the activated CD8+ T-cells have been formed they need no help or permission from any other cell to go about doing their business. Their business is killing. A CD8+ T-cell accidentally activated against a “self” peptide will kill any cell that expresses the appropriate peptide-MHC-Class-I complex on its surface, and that is the worst possible news, because CD8+ T-cells are serial killers. The activation scheme shown in Fig. 4.11 is set up so that if any of the signals are missing, then the activation is aborted. When the system fails, it ought to fail “safe”.

It is not only CD8+ T-killer cells whose activation is regulated in this way, for a similar sort of control regulates the activation of B-cells. Not all autoimmune B-cells are removed by central tolerance in the bone marrow, and some do make it out into the periphery. There, more may be generated during the process of somatic hypermutation in the germinal centres. B-lymphocytes, however, are normally not licensed to autonomously mount a response to their cognate ligand. They require help from T-cells to initiate an immune response. If a B-cell meets an antigen in the periphery, and fails to get the appropriate T-cell help, then it will be converted into an inactive “anergic” state.
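The logic of the “two key” strategy is essentially that of a logical AND over several independent signals, with “no response” as the default outcome. The sketch below restates the scheme of Fig. 4.11 and the B-cell rule just described in that form. It is a deliberate simplification: real signals are graded rather than all-or-nothing, and the names used here (`Cd8Signals`, `cd8_activated`, `b_cell_outcome`) and the “resting” label for a B-cell that has met no antigen are illustrative assumptions, not terms from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cd8Signals:
    """Signals available to a naive CD8+ T-cell (simplified to yes/no)."""
    peptide_mhc_i_recognised: bool  # TCR engages pep-MHC-I on the dendritic cell
    costimulation: bool             # co-stimulator molecules on the dendritic cell
    cd4_help: bool                  # help from a CD4+ T-cell


def cd8_activated(signals: Cd8Signals) -> bool:
    """Fail-safe rule: every key must be turned, otherwise activation aborts."""
    return (signals.peptide_mhc_i_recognised
            and signals.costimulation
            and signals.cd4_help)


def b_cell_outcome(antigen_bound: bool, t_cell_help: bool) -> str:
    """A B-cell that binds antigen without T-cell help becomes anergic."""
    if not antigen_bound:
        return "resting"  # assumed label: no antigen, no decision to make
    return "activated" if t_cell_help else "anergic"


print(cd8_activated(Cd8Signals(True, True, True)))
print(cd8_activated(Cd8Signals(True, True, False)))
print(b_cell_outcome(antigen_bound=True, t_cell_help=False))
```

Written this way, the design choice is plain: a missing or faulty signal can only ever produce a failure to respond, never an unwarranted response.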

4.12.2 T-Regulator Cells

Within the CD4+ and CD8+ T-cell populations there are many functional subdivisions. The CD4+ T-cell population, in particular, is known to be divided into a large number of subsets each of which has a different spectrum of functions and each of which is defined by a typical pattern of expression of transcription factors within it. These transcription factors do not alter the structure of the TCR in any way, but, by deciding which genes will be expressed in the cell, they define how an interaction of the TCR with its cognate ligand will be interpreted, and hence what functions the cell will subsequently undertake.

Perhaps the most extreme example of a function dictated by the expression of a particular transcription factor is provided by those CD4+ T-cells whose TCR fell into the grey area between “safe” and “autoimmune” during negative selection. These CD4+ T-cells may switch on the transcription factor FoxP3, which turns them into “T-regulator” cells (T-regs). In such cells the signal from engagement of the TCR with its ligand does not instruct the cell to provide help for the activation of B-cells or of CD8+ killer cells. Instead T-regs do exactly the opposite: they switch the activation of other cells off.

These different T-cell lineages are not cast in concrete, and they may, under certain circumstances, demonstrate a degree of plasticity. This is exploited by many cancers, which are able to subvert the CD4+ helper cells by turning them into “induced T-regs”. The adaptive immune response to the tumour then grinds to a halt. Learning how to manipulate the plasticity of CD4+ T-cell populations will have obvious clinical relevance.

4.13 Beyond the Receptor Repertoire: Lymphocyte Effector Functions

So far we have concentrated on the mechanisms that generate an adaptive immune receptor repertoire, and on the means that have evolved to purge the repertoire of autoimmune specificities. Only once these processes have been completed can the repertoire be employed to counter pathogen infections. Immune defence requires not just sensors to detect a problem; it also needs effector systems to remedy it.

The MHC-Class-I-dependent CD8+ killer T-cells provide a means to destroy virus-infected cells. These killer cells, once activated, destroy their targets by directly transferring the contents of cytotoxic granules into the target, which then dies by apoptosis. Killer cells appear to have arisen early in vertebrate evolution, for there is genomic sequence evidence indicating that they are already present in the surviving basal cartilaginous fish [27].

The effector functions of gnathostome B-cells are somewhat more complex. During the initial interactions with a pathogen, the B-cell collects information on the cytokines and other factors in its immediate environment and carries out a computation that provides a decision as to what effector system should be linked to the antigen-binding part of a BCR. In some cases it may be best to ensure that the receptor will activate the complement system, in others that the pathogen be phagocytosed by macrophages, in other cases it may be better to have the receptor secreted across a mucosal surface, while in yet others it may be important that the receptor can cross the placenta. Each of these responses is best achieved by treating the entire antigen-binding part of the BCR as a modular unit that can be plugged onto one of a set of different “constant” regions that define the effector function of the molecule. The process that makes this possible is known as “class switch recombination”. It involves yet another recombinational process at the DNA level. In this case the entire V-domain, with its three CDRs, is physically transferred to one of a number of different C1-type domains that enable different functions of the antibody molecule [29]. This recombination process is initiated by the enzyme AID, which is also central to somatic hypermutation, and whose homolog in agnathans is required for initiating the gene conversion process involved in repertoire generation (see Figs. 4.2 and 4.10). A sophisticated form of class switch recombination is already present in amphibians and all later vertebrates, but a simplified version of this process is evident in the earliest jawed vertebrates currently available, the cartilaginous fish [34].

Activation of B-cells via the BCR may lead to differentiation of the B-cell into a plasma cell that synthesises large amounts of the receptor and secretes it as soluble antibody at a rate of around 2000 molecules per second. To be secreted the receptor has to be modified so that it is no longer bound to the cell surface. In principle this change, like V, (D), J rearrangement or like class switch recombination, could be effected by recombination at the DNA level, but there is no law which says that evolution has to be consistent, and the solution that emerged involves removing the exon containing the transmembrane domain by directed alternative splicing at the RNA level.

4.14 Adaptive Memory in Gnathostomes

If you have just been infected with a novel pathogen, then clonally selected B- and T-cells are not going to provide significant defence for the first week to 10 days. Over this crucial initial period those who have not been vaccinated against the pathogen will have to rely on innate immunity to survive. This underscores that the selective advantage provided by adaptive immunity probably lies not in its initial response, which is slow, but rather in the fact that it remembers the pathogens that it has come in contact with. This is because activated T- and B-cells are set aside during an initial response and retained as so-called memory cells.

In the B-cell compartment this memory response is made up of two arms. The first of these consists of antibody secreting plasma cells that are produced after activated B-cells have been selected in germinal centres, and matured by somatic hypermutation and class switching. Once an infection has been overcome these plasma cells may migrate to the bone marrow where a small fraction of them settle down and continue to produce antibody for long periods of time—sometimes for decades [35]. The second arm of memory is provided by antigen stimulated B and T lymphocytes that differentiate into circulating memory cells. Should the pathogen reappear in the future, then circulating antibody provided by the long lived plasma cells will make life hard for it, and at the same time the memory B- and T-cells will be driven into cell division to produce a whole new generation of plasma and memory cells.

Thus, as one grows up in a particular environment, one gradually acquires immunological memory directed against all of the endemic pathogens. Over this initial period the young animal is at risk, but after a while immunological memory provides protection that will effectively neutralise an infection at the earliest stages—often even before symptoms develop. B-cell adaptive immunity is thus a large-scale exercise in vaccination. These vaccinations provide an enormous selective advantage, and they are what justify the enormous costs involved in developing specific adaptive immune responses.

4.15 Diversity of Adaptive Immune Repertoire Formation in Gnathostomes

Across the phylogenetic span from nematode worms to mammals, immune defence involves the use of a mix of “innate”, i.e. germ line encoded, and of “adaptive”, i.e. somatically encoded, receptors. Since different species inhabit different environments, it is clear that neither innate nor adaptive immunity can be viewed as an “off the peg” defence system. Though natural selection has ensured that certain features of innate immunity are common between the fruit fly and mammals, there are also very considerable differences. In the same way, and for the same reasons, the details of the workings of adaptive immune defence vary considerably between different gnathostome species.

The brief outline given here of the gnathostome adaptive immune receptors is heavily skewed to the situation in mice and humans, for these are by far the best-studied vertebrates. If one broadens the perspective to cover other species, then it turns out that there is little in adaptive immunity that is precisely conserved across all jawed vertebrates. This is perhaps not surprising, for no adaptive immune system is ever truly “complete”. New pathogens with new virulence strategies constantly require adjustments to the current immune system, and since different vertebrate species live in very different environments, and face different spectra of pathogens, it would be surprising indeed if there were no differences in the ways their immune systems work. Indeed when the adaptive immune response in any species is examined in detail a host of idiosyncratic features emerge, and these make the application of the results of animal experiments to human clinical medicine an enterprise that is often fraught with uncertainties. True, the division of the lymphocyte universe into T- and B-cells whose receptors are generated by a process involving RAG recombination is a general feature of gnathostomes. Beyond that, however, divergence is the name of the game. A few examples will demonstrate this.

The first example we will take is that the RAG catalysed VDJ recombination system is not entirely sacrosanct. RAG requires multiple alternative V, (D) and J modules with which to generate the repertoire but, on the other hand, there is a general problem with gene families in which the various members all carry out very similar functions. As such a family increases in size, selection pressure on each member drops and their maintenance by purifying selection becomes increasingly difficult. The genes start to accumulate random mutations and degrade to pseudo-genes. For example, in the first human heavy chain locus that was completely sequenced there are 123 V-gene segments, but 79 of these are pseudo-genes, so that there are only 44 intact V-gene segments left that can be used [36]. In a number of vertebrates including the chicken and rabbit this process has gone one step further, for there is just one single V-gene segment left functionally intact. In these species the problem of having too few VH-gene segments is solved by taking a leaf out of the lamprey’s book: the last remaining intact V-gene segment is rearranged to D and J using RAG recombinase, after which AID-mediated gene conversion is used to copy information from the pseudo-genes into the rearranged VDJ-gene segment, and so produce a large and diverse repertoire.

A second example of variation of the adaptive system in gnathostomes is that the antigen-binding site of a BCR does not always consist of two polypeptide chains, as described in Sect. 4.4.3 (Fig. 4.7). There are bizarre forms, like the IgNAR of sharks, in which the antigen-binding site consists of a single V-type domain. The group of camels, dromedaries and llamas separately evolved a rather similar sort of molecule, in which the binding site is formed by a single V-domain, and hence like the shark IgNAR, contains only 3 CDRs. This form of camel immunoglobulin has excited interest in recent years because the isolated variable domain binds antigen, and it is smaller, more soluble, more stable and much easier to produce than similar structures from conventional antibodies. These camel V-domains are of potential pharmacological value and may also be of use for research purposes to provide small, but specific, probes that can be expressed inside eukaryotic cells.

A third example of variation concerns the CD4+ T-helper and the T-regulator cells. In humans and mice these are the central controllers of immune responses, and yet in certain bony fish, including the Atlantic cod, the CD4 co-receptor and the MHC-Class-II molecules have been lost [37]. Since other bony fish do have an intact CD4 system, it seems that the cod’s loss of CD4 is a derived characteristic. In this particular case an expansion of MHC-Class-I pseudo-alleles suggests that the loss of the CD4 compartment may be compensated to some extent by an expansion of the role of CD8+ T-cells. In a similar vein, zebrafish that have been genetically altered so that they are unable to form B- and T-cells are not obviously disadvantaged, at least under laboratory conditions [38], while the same mutation in mice or humans results in a frequently lethal “Severe Combined Immune Deficiency”. Perhaps the aqueous environment is less packed with pathogens than is ours, so that fish can afford to lose bits of their immune system in ways that we cannot.

4.16 The Evolutionary Relationship of the Agnathan and Gnathostome Adaptive Immune Systems

When one looks at the range of solutions to life’s problems that living systems have come up with, then it is fair to conclude that there is no such thing as a problem to which there is only one possible solution. The politicians’ favoured gambit, “there is no alternative”, does not apply in biology. However, as François Jacob pointed out, the range of alternatives that are available is indeed constrained by one important factor: history. A simple system that has not yet developed in any very sophisticated fashion has the freedom to evolve in many different ways, so that when it seeks a solution to some problem it may have access to a large battery of options. However, the number of evolutionary paths open to a complex system is reduced. As a result, two unrelated complex systems may reach analogous solutions to a problem, simply because their levels of complexity have narrowed the range of evolutionary options available to them in a remarkably similar way. The problem therefore is that seemingly similar solutions can be reached by very different means, and, to make matters worse, sometimes seemingly different solutions can in reality be closely related at the genetic level.

“Homology” is a technical term that implies inheritance by descent. A character in two animals is homologous if it is uniquely dependent on shared inherited genes or genetic circuits that were present in the most recent common ancestor. The idea of shared genetic information is important to keep in mind, for it is the key to distinguishing homology from analogy. Analogy describes a situation in which two different species have reached similar solutions to a problem by convergent evolution. Analogy implies no relatedness in terms of descent. However, analogy is much more than just “not homology”, for it tells something about the selective forces which were at work.

When one compares the adaptive immune systems of agnathans and gnathostomes, then it is clear that there are fundamental differences between them. It is equally clear that there are a number of astonishing similarities. What are these similar features that must have been inherited from their most recent common ancestor? What are the divergent features that each of them added on since they diverged from that most recent common ancestor? These questions are not as easy to answer as one might think, for while some homologies are informative, others are—in this context—entirely trivial. Even worse, as we will see in the next section, some analogies may be based on “deep” homology (see also Appendix E).

4.16.1 Homology and Analogy

What we would like to know is how adaptive immunity came to be, and why it developed along two rather different lines, one in agnathans and the other in gnathostomes. The only tool we have available to tease apart what happened is to ask whether homologies exist between the agnathan and gnathostome systems and, if such homologies can be identified, to see what they can tell us about the last common ancestor of all vertebrates and about the evolution of adaptive immunity in agnathans and gnathostomes. When one looks at the antigen-binding receptors (VLRs in the case of agnathans, IgSF domain-based receptors in the case of gnathostomes), it seems clear that they are very different. The same is true if one considers the genetic rearrangement processes employed to generate the receptor repertoires: gene conversion in agnathans, and RAG-driven recombination in gnathostomes. On the other hand, there are many features that make the two systems seem very similar indeed.

Which, if any, of the similarities are real evidence of homology, i.e. of shared inheritance by descent? For example, both agnathan and gnathostome systems use antigen-specific receptors—VLRB in agnathans, BCR in gnathostomes—first as cell surface bound molecules, and then as released effectors. Is this striking shared character evidence of homology? In the gnathostome case, the receptor is held on the surface by virtue of a membrane-spanning domain. Releasing the receptor as a soluble molecule is achieved by alternative splicing of the messenger RNA to remove the membrane spanning exon. In the agnathans, on the other hand, the VLRB receptor molecule is bound to the surface of the cell by means of a post-translational modification that adds a so-called GPI-anchor onto the protein chain. This anchor links the receptor to the cell membrane. Omitting this post-translational modification permits secretion of the product. The two systems thus use different molecular means of linking the receptor to the membrane, and different ways of releasing the cell-bound receptor in soluble form. The result may look broadly similar, but this is a case of convergent evolution—of analogy rather than of homology.

Even in cases where a character in agnathans is clearly homologous to one present in gnathostomes, because it results from inheritance by descent, this may not necessarily be terribly informative. To take a simple example, agnathans and gnathostomes—like all other life forms—use a triplet code to convert the information encoded in their genome into the amino acid sequences of proteins. This is certainly a shared, inherited, homologous feature—but since there are no exceptions, it is, for our purposes, trivial. By the same token, any phenotypic characteristic, which is a direct consequence of the use of such a triplet code, is also, for our purposes, trivial and uninformative. A triplet code has the consequence that random mutation of the coding region of a gene will frequently disrespect the “rule of three” and hence will generate large numbers of frame shift mutations (Sect. 4.5.1). Since agnathans and gnathostomes both use random mutation mechanisms to alter the coding sequences of their adaptive immune receptors, it is inevitable that in both cases a great deal of junk will be formed. In both cases some means must be found to select and preserve the well-formed receptors [39]. The mere fact that selection is necessary, and takes place, in both systems is thus an inevitable consequence of the use of a three letter code. The question of homology then rests on whether some unusual process of selection of the products is shared by the two systems. Though a great deal is known about selection in gnathostomes, at the moment almost nothing is known about how this is achieved in the agnathan system. Until we have that knowledge, the question of whether selection of the products in agnathans and gnathostomes is an example of homology or merely a consequence of the universal use of a triplet code, remains open.
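The “rule of three” argument can be made concrete with a minimal sketch. It assumes, purely for illustration, that a mutation changes the length of a coding sequence by a net number of nucleotides, and that any change not divisible by three shifts the reading frame. The function names and the uniform spread of length changes are hypothetical and are not taken from the text.

```python
def is_frameshift(net_length_change: int) -> bool:
    """Return True if a net insertion (+) or deletion (-) of this many
    nucleotides would shift the triplet reading frame."""
    return net_length_change % 3 != 0


def frameshift_fraction(length_changes) -> float:
    """Fraction of the given net length changes that shift the frame."""
    changes = list(length_changes)
    if not changes:
        raise ValueError("need at least one length change")
    return sum(is_frameshift(c) for c in changes) / len(changes)


# Illustrative assumption: net changes of 1 to 9 nucleotides, equally likely.
example_changes = range(1, 10)
fraction_out_of_frame = frameshift_fraction(example_changes)
```

Under this assumption two thirds of the length changes fall out of frame, which is why any system that alters coding sequences at random needs some means of selecting the well-formed products.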

Other features of agnathan and gnathostome adaptive immunity that appear at first sight to be evidence of striking homologies turn out on closer examination to be less convincing. For example, both systems make use of a large collection of lymphocytes, each of which expresses just one single receptor specificity. Is this remarkable similarity a case of homology? The idea that these two independent systems would reach the same solution to the problem of expressing a very large repertoire by using a “one lymphocyte, one specificity” solution might seem unlikely, but the degree of unlikeliness has to be measured in terms of what other solutions are available. That is, of course, a priori impossible to judge, but what one can say is that any solution would have to be compatible both with the way that fundamental molecular processes of cell biology have evolved, and with the number of receptors that have to be accommodated. Alternative explanations, from Ehrlich’s side chain theory, to Pauling’s instructive theory, or to Jerne’s natural selection theory, all fail this test, and that is what, in the end, made Burnet’s clonal selection theory (one lymphocyte, one anticipatory specificity) so convincing and attractive. Until some shared, but unusual, molecular means of arranging for one specificity per lymphocyte is discovered, this phenomenon will not pass muster as proof of homology. Analogous arguments apply to the fact that both systems use only one of the two parental receptor loci (allelic exclusion; see Sect. 4.5.4), and that in both cases lymphocytes are stimulated to divide and clonally expand when activated.

So are there any features of the adaptive system in agnathans and in gnathostomes that can be fairly said to be homologous in an evolutionary sense? The answer is that there are at least four major factors:

  • They both possess lymphocytes

  • They share particular transcription factors

  • They both form specialised epithelium with the properties of a thymus

  • They share the use of novel cytidine deaminases

4.16.2 Intercalary Evolution and “Deep Homology”

Nowhere have the questions of real and apparent homologies been investigated more extensively than in the evolution of visual systems. And nowhere have they been so deeply studied (see Appendix E). Numerous obviously different types of visual systems occur in metazoans, but all of them share the basic feature that a transcription factor cascade, headed by the master regulator Pax6, regulates their developmental program. This simple basic visual system was then subjected to improvement by the recruitment of genes more or less at random. Those genes that helped were retained; those that did not help were dropped. Since in different species the selective forces operating on the visual system are different, different genes have been selected in different species, and have given rise to different visual systems (Fig. 4.12).

Fig. 4.12 Intercalary evolution of visual systems. Starting from a transcription factor (TF) network centred on Pax6, which specifies photoreceptor cells, genes may be recruited at random and selected on the basis of their ability to improve the visual system. In different species, different combinations are selected and these different combinations give rise to different eye forms [40]

Structurally and developmentally the various visual systems are all very different, but at a deeper level they are all homologous for all depend on the Pax6 transcription factor cascade, which is inherited by descent throughout phylogeny.

4.16.3 The Adaptive Niche

The Pax6 story tells us one more thing, and that is that there is no such thing as a “perfect” eye. Each visual system has been selected to fit the needs of very different animals in very different environments. Each species ends up with the sort of eye that gives it the maximal benefit for the minimal input of resources. The same sort of species-specific selective niches will also apply to immune defence, and they will have shaped the evolution of adaptive immunity. So what are the characteristics of the particular selective niche within which adaptive immunity evolved? Perhaps the central limiting factor is that a protein-based, anticipatory adaptive immune system requires a vast number of different receptors. With just one receptor specificity per lymphocyte, a huge number of lymphocytes must be constantly produced, and the majority of them do not survive the developmental process. Most of those that do will never meet their cognate antigen during the course of their short lives in the periphery, and so will die unused. The production and subsequent loss of all of these cells is metabolically expensive. This sort of adaptive immune system would be unlikely to help a small, short-lived organism with few immune cells, such as a fruit fly or a nematode worm. To make the investment worthwhile, the animal has to have features that will allow it to exploit the benefits of adaptive immunity so as to increase its fitness. For this it must live long enough, and have enough rapidly turned-over immune cells, to be able to sample the huge repertoire during the course of its life. Somewhere at the start of vertebrate evolution the increasing “generation gap” between rapidly dividing pathogens on the one hand and ever-larger animals with longer generation times on the other produced a favourable selective niche within which adaptive immunity made sense. After that the evolution of adaptive immune systems, while not inevitable, should not be considered a great surprise.

4.16.4 Haematopoiesis and the Origin of Lymphocytes

We all know, more or less, what a “lymphocyte” is. In vertebrates they are immune cells produced by the mesoderm-derived haematopoietic system, some, but not all, of which express somatically formed receptors of the adaptive immune system. Where did lymphocytes come from? In Drosophila melanogaster haematopoietic stem cells give rise to three innate immune cell types, all of which are found in the haemolymph. The crystal cells and lamellocytes are specialised mobile cells, which encapsulate intruders like fungi or the eggs of parasitic wasps. This activity is somewhat reminiscent of the formation of granulomas in humans and mice, though these crystal cells and lamellocytes have no clear homologs in vertebrate systems. The third cell type, the plasmatocytes, are mobile phagocytic cells that remove cell debris and bacterial pathogens, and which in this sense are comparable to mammalian granulocytes or monocytes. These invertebrate lineages are considered homologous to vertebrate haematopoietic lineages, since they share with them the use of certain transcription factor cascades. However, in invertebrates there is nothing to compare to the range of blood cells produced by the haematopoietic stem cells in vertebrates. Though some invertebrates do have oxygen exchange molecules, either free in their body fluid or packed into the so-called plasmatocytes, none have an erythroid lineage that produces erythrocytes, and there is nothing in invertebrates that can be compared in its developmental origin to the vertebrate lymphocyte lineages [41]. Somewhere down at the start of the vertebrate line there was a veritable explosion in the number of different cell lineages produced by haematopoietic stem cells.

When a stem cell lineage is extended so as to produce new cell types, as happened in the haematopoietic lineage as vertebrates first evolved, each new cell type requires a network of transcription factors to organise the gene expression pattern that will define its identity. These transcription factors will act to switch on the genes now appropriate for that new cell’s activities, and to switch off genes that are inappropriate. Which transcription factors would be best for the new lineage? The answer is that it really doesn’t matter: just as there is nothing about Pax6 that predestines it to be the master regulator of visual systems, so any transcription factor could in principle be recruited to regulate the development and the life of a new haematopoietic lineage. However, once a network of transcription factors has been chosen, and their short DNA recognition sequences inserted close to the start of the genes they should regulate, it is very hard indeed to change the network. The target genes can be readily swapped by gaining or losing the short recognition sequences, but the transcription factors, once chosen, are fixed. A transcription factor network thus has an evolutionary stability that makes it a particularly powerful means of tracing homology, for there are so many transcription factors available that in cases of convergent evolution the probability that the same ones will be chosen by chance is vanishingly low (see Appendix E).
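The claim that chance reuse of the same transcription factors is vanishingly unlikely can be illustrated with a short calculation. This is a sketch under a deliberately simple assumption, namely that a lineage picks a network of k factors uniformly at random from a pool of N available factors. The function name and the numbers used are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def same_network_probability(pool_size: int, network_size: int) -> Fraction:
    """Probability that two independent lineages, each choosing
    `network_size` factors uniformly at random from `pool_size`
    candidates, end up with exactly the same set."""
    if not 0 < network_size <= pool_size:
        raise ValueError("network_size must be between 1 and pool_size")
    return Fraction(1, comb(pool_size, network_size))


# Hypothetical numbers: a pool of 1000 factors and a network of 3.
p = same_network_probability(1000, 3)
```

With these illustrative numbers there are 166,167,000 possible three-factor networks, so the chance of an accidental match is about six in a billion. Real recruitment is certainly not uniform, so the figure conveys only the order of magnitude behind the argument.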

We have no information as to how lymphocytes evolved, but by any reasonable definition of the term, cells which can be described as “lymphocytes” are present both in agnathans and in gnathostomes. Are these cells homologs, i.e. the product of one “invention” or are they two independent “inventions” that converged to do more or less the same job? Just as the Pax6 transcription factor demonstrates a deep homology between all bilaterian visual systems, one sees evidence of shared transcription factor networks in gnathostome and agnathan lymphocyte lineages. Perhaps the best evidence currently available comes from the transcription factor Pax5.

Pax5 is used throughout metazoan phylogeny in the specification of the nervous system, and it is also expressed in mice during spermatogenesis [42]. However, the extension of the haematopoietic lineages at the start of vertebrate evolution required that each of these new cell types be given a transcription factor network that would uniquely define its identity, and in the case of the B-lymphocytes in gnathostomes, identity is defined by a set of transcription factors that includes Pax5. No other cell in the haematopoietic lineage expresses Pax5, and a B-cell that loses its ability to express Pax5 ceases to be a B-cell. Thus, for example, Pax5 is switched off when a human or mouse B-cell turns into an antibody-secreting plasma cell. In the agnathan lamprey there are three lineages of lymphocytes: the VLRA, B and C cells. Of these the VLRB cell is the only one that, like the gnathostome B-cell, releases its receptor in soluble form once activated. It is also the only VLR cell to express Pax5, and this alone is a powerful argument that VLRB cells in the agnathans are homologous to B-lymphocytes in gnathostomes. In a similar vein, VLRA and VLRC cells express transcription factors whose mammalian homologs are associated with αβ T-cells and γδ T-cells respectively [43]. By the criterion of transcription factor expression, VLRA and VLRC cells are homologs of the gnathostome T-cell lineages, while the VLRB cells are homologs of gnathostome B-cells.

4.16.5 Thymus and “Thymoid”

The similarities between the agnathan and gnathostome systems do not end there. Gnathostome T lymphocytes develop in the thymus, an organ whose development is critically dependent on the expression of the transcription factor FOXN1. In the mouse this transcription factor is responsible for controlling the expression of a number of cytokines and cell surface molecules, including Scf, Cxcl12 and DLL4, and their expression is required to permit T-cell precursors to associate with the thymic micro-environment [44]. In the agnathan lamprey a region of the epithelium close to the gill tips expresses the agnathan homologs of both FOXN1 and DLL4. It is in this “thymoid” that the T-cell-like VLRA and VLRC cells rearrange their receptors. VLRB cells do not rearrange their receptors in this thymus homolog, but rather in a gut-associated tissue known as the typhlosole. The presence of a specialised FOXN1- and DLL4-expressing thymic-type lymphoid organ, within which the T-cell-like lymphocyte lineages are formed, is a second strongly homologous feature shared by agnathans and gnathostomes, which must have been inherited from their last common ancestor.

4.16.6 Evolution of AID-Like Cytidine Deaminase Functions in Immunity

Early in vertebrate evolution there appeared a new family of cytidine deaminases. Members of this “Activation-Induced Deaminase” family are present both in agnathans and in gnathostomes, and so must have been present in their most recent common ancestor. These deaminase enzymes do not themselves cut DNA, but by converting dC in particular, defined DNA sequences into dU, they alert the cell’s extensive array of DNA repair systems (Fig. 4.10). Depending on which of these mechanisms is recruited to deal with the dU, the result may be the induction of local point mutations, or of gene conversion, or of a larger chromosomal rearrangement [45].

In the agnathan lamprey, gene conversion forms the adaptive immune receptor repertoire, and the key players involved are AID-like cytidine deaminases (Fig. 4.2). In gnathostomes, in contrast, the means of forming the repertoire is mechanistically quite different. It is based not on gene conversion but rather on recombination catalysed by RAG (Figs. 4.5 and 4.6). This RAG recombination of adaptive immune receptor genes is restricted to gnathostomes, and hence was unlikely to have been present in the most recent common ancestor of agnathans and gnathostomes.

Since gene conversion and recombination are mechanistically quite different processes, one might expect that an adaptive immune system would be organised around one or the other. This would require that the switch from gene conversion in agnathans to recombination in gnathostomes was an abrupt saltation, one of those sudden transitions that do not happen often in evolution. But was it really that abrupt? There are a couple of observations that tend to make this transition seem rather less like a sudden switch. The first is that at the start of the evolution of gnathostomes AID and RAG may have worked together to form the primary repertoire. The reason for thinking this is that in the nurse shark, a basal gnathostome, the formation of the primary T-cell repertoire seems to involve both enzymes working in concert [46]. Furthermore, even in more highly evolved forms it turns out that AID’s ability to generate the primary repertoire is a skill that has not been entirely forgotten. This can be seen in species like the chicken, in which all but one of the germline V-genes have accumulated mutations that converted them into pseudo-genes. How then does the chicken manage to form a diverse repertoire? The answer is that the one remaining functional germline V-gene is rearranged by RAG to form the initial receptor gene, and the diverse repertoire is then formed by AID-mediated gene conversion, which copies information into it from the flanking pseudo-genes. Thus, in species that have failed to look after their germline V-genes properly, gene conversion can still be reactivated to save their adaptive immune system.
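The chicken strategy described above can be pictured with a toy model. The sketch assumes that gene conversion simply overwrites a tract of the rearranged V-gene with the corresponding tract from a donor pseudo-gene. The sequences, function name and tract coordinates are hypothetical, and the real AID-initiated process is far more involved.

```python
def gene_convert(acceptor: str, donor: str, start: int, length: int) -> str:
    """Copy `length` nucleotides from `donor` into `acceptor`, beginning
    at position `start` (0-based). The donor itself is left unchanged,
    which is what makes the transfer non-reciprocal."""
    if len(acceptor) != len(donor):
        raise ValueError("toy model expects aligned sequences of equal length")
    if start < 0 or length < 0 or start + length > len(acceptor):
        raise ValueError("tract lies outside the sequence")
    end = start + length
    return acceptor[:start] + donor[start:end] + acceptor[end:]


# Hypothetical aligned sequences: one functional V-gene, two pseudo-gene donors.
functional_v = "ATGGCTAGCTTA"
pseudo_genes = ["ATGACTAGGTTA", "ATGGCTTGCTTA"]

# Each conversion event yields a different variant of the single V-gene.
variants = {gene_convert(functional_v, pg, 3, 6) for pg in pseudo_genes}
```

In this simplified picture a single functional V-gene can give rise to as many variants as there are distinct donor tracts, which is the sense in which the flanking pseudo-genes act as a reservoir of diversity.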

In this scenario AID-like deaminases were the key to immune receptor rearrangement in basal agnathans. Later, with the evolution of some primitive version of a RAG-based recombination mechanism, AID and RAG worked in concert to generate the primary repertoire. Later still, the rearrangement process became dependent on RAG recombination alone. However, the RAG recombination process is intolerant of stop or frameshift mutations in the germline V-genes, and if these occur, the species quickly finds itself between a rock and a hard place. It then faces the choice of either trying to survive without adaptive immunity or reactivating repertoire formation by gene conversion. So far the latter choice appears to have been the only one that offered a chance of survival. As always, evolution rarely throws away a good idea, and though AID is no longer the primary generator of diversity in gnathostome adaptive immunity, it was not lost, but rather reassigned to novel roles, such as the initiation of class switch recombination and of somatic hypermutation in germinal centre B-cells, which have no counterpart in the older agnathan system.

4.16.7 The Last Common Ancestor of Agnathans and Gnathostomes

Any attempt to reconstruct the evolution of adaptive immunity is bound to be highly speculative. Nevertheless, the molecular homologies between agnathans and gnathostomes allow us to make a reasonable guess as to the features that would have been required in the most recent common ancestor. The first of these is that the ancestor crossed the borderline of a selective niche, so that natural selection’s cost–benefit analysis now made adaptive immunity a realistic option. The most recent common ancestor’s haematopoietic stem cells generated a whole collection of new cell lineages, not found in basal invertebrates. One of these lineages led to the generation of three new cell types that were the precursors of the VLRB, VLRA and VLRC lymphocytes in agnathans and of the B-cells, CD4+ αβ T-cells, CD8+ αβ T-cells and γδ T-cells in gnathostomes. These precursor cell lineages in the most recent common ancestor were almost certainly mobile cells involved in immune defence. As such they would show another property crucial to mounting a concerted adaptive immune response: the ability to communicate with each other. This is achieved in gnathostomes either by direct cell-to-cell interactions or by the release by one cell of cytokines, which can be detected by the appropriate receptors present on other cells. IL-8, IL-16 and IL-17, along with their receptors, all play important roles in co-ordinating immune responses in gnathostomes and, though their functions have not yet been investigated in agnathans, it is clear that all of these cytokines and receptors are expressed by lamprey lymphocytes [43].

Just as in the visual systems, where a Pax6–photoreceptor cell axis (Fig. 4.12) can be expanded by random recruitment of genes, followed by selection for their ability to improve vision, so in adaptive immunity an axis defined by circulating immune cells on the one side, and a controllable mutagen-like cytidine deaminase on the other can then recruit in gene products that will provide for improved defence and hence increased fitness (Fig. 4.13).

Fig. 4.13 Presumptive intercalary evolution of adaptive immunity in agnathans. Circulating immune cells expressing an AID-like cytidine deaminase activity as a somatic mutator recruit genes at random. A recruited gene that can give rise to a family of receptors will improve fitness and be positively selected

One of the genes recruited must, of course, have coded for a cell surface receptor whose structure was such that it could be mutated so as to interact with a broad range of targets. One that “worked”, in this sense, would be retained as the pathogen-binding receptor of the adaptive system. In the agnathan system we know nothing about the cell surface molecule that gave rise to the receptors, except that its extracellular domain was an LRR structure reminiscent of the TLR molecules of innate immunity. There are dozens of putative candidates in the genomes of modern cephalochordates or tunicates. In the gnathostome system, we need the Transib-invaded IgSF V-domain linked to an IgSF C1 domain. Both of these domains are found only in gnathostomes and so their route of evolution from invertebrate precursors remains speculative [47].

One final point concerns the selection of the somatic mutators: cytidine deaminase in agnathans, and RAG in gnathostomes. Is it possible that RAG-based receptor rearrangement arose as an innate system backup to a cytidine deaminase-based adaptive system, like that in the lamprey? The reason for considering this is that there is an interesting anomaly in the earliest jawed vertebrates to which we have access. This involves an oddity in cartilaginous fish, in which the polypeptide chains of adaptive immune receptors are encoded by V, D and J segments that are rearranged by RAG recombinase. Unlike the situation in more advanced vertebrates, these gene segments are not organised into a single cluster within which recombination takes place. Instead there are many “mini clusters”, each of which consists of just a few V, D and J segments. In many of these clusters the segments are partially recombined in the germline to D-J, V-D or even to fully recombined V-D-J elements. This tells us that in the ancestors of these fish, RAG recombinase must have been expressed in the germline, and raises the possibility that RAG recombination arose as a strategy to generate new, germline-encoded, innate immune system receptors [48]. Perhaps RAG-based adaptive immunity in gnathostomes arose as a result of a molecular misunderstanding, when RAG, instead of being expressed in the germline, was suddenly expressed in lymphocyte precursors. A similar scenario might explain the initial selection of cytidine deaminase in agnathans.

Sadly, however, the details of the transition from agnathan to gnathostome adaptive immunity have been largely lost with the extinction of all agnathans, except for the basal groups represented by the lampreys and hagfish.