The notion of symmetry carries with it the connotation of beauty, harmony and unity. In the words of Hermann Weyl (1952, p. 5):

Symmetry, as wide or narrow as you may define its meaning, is one idea by which man through the ages has tried to comprehend and create order, beauty, and perfection.

The history of symmetry begins with the Greeks, coining the term summetria, derived from the words sun (meaning “with,” “together” or “by association”) and metron (“measure”). In a modern, scientific context, symmetry is recast in terms of invariance: under certain manipulations, namely transformations, specific features of a physical system remain unchanged. Symmetry is thus mathematized as an operator acting on an object, where the defining feature is that the object remains unaltered. Expressed in other words, the object is invariant under the symmetry transformation.

The mathematical structure that underlies the study of symmetry and invariance is known as group theory. Here now is a concrete example of Fig. 2.1: the real-world notion of symmetry is encoded as the mathematical concept of invariance. In order to gain new insights into the workings of the physical world, one needs to burrow deeper into the abstract world. The first gem that can be discovered is group theory, which, as will be discussed, is intimately related to geometry. From the formal rules pertaining to these areas of mathematics encapsulated in the abstract world, three applications can be derived: a universal law of conserved quantities, a tangible grip on elementary particles, and a merger of fragmented forces in nature.

A group G is defined as a set together with an operation that combines any two group elements, satisfying the axioms of closure and associativity, and containing an identity element as well as an inverse for every element. A group action on a set X is defined as a function \(\varPhi : G \times X \rightarrow X\), obeying the axioms of compatibility\(^{1}\) and the existence of an identity function. This defines a bijective map \(\varPhi _g : X \rightarrow X\), where \(\varPhi _g (x) := \varPhi (g,x)\). G is a symmetry group if its group action \(\varPhi \) preserves the structure on X, in other words, if every \(\varPhi _g\) leaves this structure invariant. The set X can be equipped with algebraic, topological, geometric, or analytical structures. See, for instance, Schottenloher (1995).
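The definitions above can be made tangible with a very small example. The following sketch, under the assumption that the group is the four quarter-turn rotations of a square and that the set X consists of the square's four vertex labels, checks the group axioms, the two action axioms, and the preservation of the edge structure by brute force. All names (`op`, `act`, `edges`, `swap`) are illustrative choices and do not appear in the text.

```python
from itertools import product

# Assumed example: G = rotations of a square by k * 90 degrees (k = 0..3),
# combined by addition modulo 4. X = the four vertices, labelled 0..3.
G = range(4)
X = range(4)
identity = 0

def op(g, h):
    """Group operation: compose two rotations."""
    return (g + h) % 4

def act(g, x):
    """Group action Phi(g, x): rotate vertex x by g quarter turns."""
    return (x + g) % 4

# Group axioms, checked exhaustively.
closure = all(op(g, h) in G for g, h in product(G, G))
associative = all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(G, G, G))
has_identity = all(op(identity, g) == g == op(g, identity) for g in G)
has_inverses = all(any(op(g, h) == identity for h in G) for g in G)

# Action axioms: compatibility and the identity acting trivially.
compatible = all(act(op(g, h), x) == act(g, act(h, x)) for g, h, x in product(G, G, X))
identity_is_trivial = all(act(identity, x) == x for x in X)

# Each Phi_g is a bijection of X.
bijective = all(sorted(act(g, x) for x in X) == list(X) for g in G)

# The structure on X: the edges of the square. A symmetry must map edges to edges.
edges = {frozenset((x, (x + 1) % 4)) for x in X}
preserves_edges = all(
    {frozenset(act(g, v) for v in e) for e in edges} == edges for g in G
)

# A bijection of X that is not a symmetry: swapping two neighbouring vertices.
swap = {0: 1, 1: 0, 2: 2, 3: 3}
swap_preserves_edges = {frozenset(swap[v] for v in e) for e in edges} == edges

print(closure, associative, has_identity, has_inverses)
print(compatible, identity_is_trivial, bijective, preserves_edges, swap_preserves_edges)
```

The final check illustrates the distinction drawn in the text: the swap is a perfectly good bijection of X, but it does not leave the edge structure invariant and hence is not an element of the symmetry group.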

Although the history of group theory has many sources and its evolution unfolded in various parallel threads, involving many famous contributors such as Joseph-Louis Lagrange, Carl Friedrich Gauss, and Augustin-Louis Cauchy, Évariste Galois formalized the abstract notion of a group and is generally considered to have been the first to develop group theory (Kleiner 1986). Galois’ life was tragic. His budding mathematical influence started at the age of seventeen, only to be stifled by his early death three years later. He died in a duel in 1832. The manuscripts he had submitted to Cauchy and, later, to Jean-Baptiste Fourier would both be lost, never to reappear. Galois was incarcerated for nine months for political reasons and was shunned by the French mathematical establishment, which he fought against with vitriol and anger. Only posthumously was he awarded recognition for his important contributions to mathematics. See, for instance, Du Sautoy (2008).

This simple idea, that the symmetry transformations of an object with a predefined structure constitute a group, allowing the concept of symmetry to be formalized in terms of group theory, has proven to be very powerful. Indeed, the more symmetries an object has, the larger its symmetry group. As an example, the monster group was constructed by Robert L. Griess as a group of rotations in 196,883-dimensional space. It is a symmetry group that belongs to a structure with the mind-bogglingly large number of symmetries given by \(M_s\), as specified in (2.5).

Although the groups studied by group theory are algebraic structures, it was recognized that they also play a fundamental role in geometry. Felix Klein initiated a research program in 1872 which aimed at classifying and characterizing geometries, utilizing group theory. It was a manifesto for a new kind of mathematics which sought to capture the essence of geometry not in terms of points and lines, but in the group of symmetries that permuted those objects. This effort became known as the Erlanger Program (Hawkins 1984). The notions of geometry and symmetry, and crucially their deep relationship, are perhaps among the most fruitful and far-reaching themes in physics. If the Book of Nature is written in the alphabet of geometric symbols, then symmetry furnishes its syntax.

1 Symmetry in Action: Conservation Laws

In classical physics, a conservation law states that some aspect of a dynamical system remains constant throughout the system’s evolution. In a first mathematical formalization this means that some quantity X exists, a dynamical variable capturing the system’s evolution over time. In other words, X obeys some equation of motion encoded in (2.1) or (3.1). As X is conserved, i.e., \(\dot{X} = 0\), it remains constant along its flow in phase space: it is an invariant. Conserved quantities are often called constants of motion. In effect, this imposes a constraint on the physical system under investigation, albeit a constraint originating in the abstract world: a natural consequence of the equations of motion, driven by the mechanics of derivatives, rather than a physical restriction which would be a manifestation of some force.

The notion of conserved quantities and the general idea of persistence, with the antonyms related to perpetual flux, form the basis of very dissimilar philosophies. The pre-Socratic Greek philosopher Parmenides in the early 5th Century B.C.E. resisted the teachings of Heraclitus, who maintained that everything is change. Parmenides initiated the search “for something not subject to the empire of Time” (Russell 2004, p. 54). He asserted the principle that “nothing comes from nothing,” ex nihilo, nihil fit. In the social network of Greek philosophers, Parmenides was influenced by Pythagoras and, in turn, would leave an impression on Plato’s thinking. His principle, which can also be traced back to the Milesian philosophers (Roecklein 2010), argues that existence is eternal and not the result of a divine act of creation. A related idea, which can be seen to prevail throughout time, is called the principle of sufficient reason. From Anaximander through Baruch Spinoza and, notably, Gottfried Wilhelm Leibniz to Arthur Schopenhauer, the tenet that nothing happens without reason is echoed. Formally, for every fact F, there must be an explanation why F is the case. It is a powerful and controversial philosophical principle and entails bold assertions regarding metaphysics and epistemology. See Melamed and Lin (2013).

The notion that nothing can come from nothing is also entailed in the natural philosophy of atomism proposed by Leucippus and his pupil Democritus. They believed that everything is composed of indivisible, indestructible, and eternal atoms. Around the same time, in India, a similar concept of atoms, called aṇu or paramāṇu, appeared perhaps for the first time in Jain scriptures. Jainism, a radically non-violent Indian religion, shares in its cosmology many of the elements of pre-Socratic Greek philosophies, stating that the universe and its constituents are without beginning or end, and nothing can be destroyed or created. The Jain philosophy contains categories that have a distinct scientific flavor, even today. The part of reality that is “non-spirit,” i.e., not related to consciousness, is divided into time, space, the principles of motion and stability, and matter (Nakamura 1998). Buddhism, too, although originally harboring a qualitative, Aristotelian-style atomic theory, would later, in the 7th Century, develop notions reminiscent of today’s Weltanschauung, considering atoms as point-sized, eternal units of energy (Singh 2010). A general reference discussing naturalism in Indian philosophy is Chatterjee (2012).

However, a lot of time would pass, before the philosophical notions of immutable, eternal entities could be put on a firm footing and recast in the language of conserved quantities in nature. In 1644, René Descartes published an influential book, called Principles of Philosophy. Not only did he describe laws of physics, which would later be incorporated into Newton’s first law of motion (Whiteside 1991), he also introduced a conserved quantity, which he indiscriminately referred to as “motion” or “quantity of motion.” For the first time, an attempt was made to identify an invariant or unchanging feature of mechanical interactions. Moreover, Descartes envisioned the conservation of motion as one of the fundamental governing principles of the cosmos. Indeed, his law falls just short of the modern law for the conservation of momentum. See Slowik (2013).

While it took many scientists over time to tediously formulate and prove the conservation laws for mass and energy, the insights of one person led to the uncovering of an overarching framework and a deep understanding of the specifics relating to conservation laws. In her 1918 publication, the mathematician Emmy Noether spelled this out in a theorem, wrapping up a deep physical truth with the mathematics of symmetry (Noether 1918). In plain words (Thompson 2004, p. 5):

If a system has a continuous symmetry property, then there are corresponding quantities whose values are conserved in time.

To understand what this really means, expressed in the language of mathematics, one needs to embark on a journey starting with some notions from geometry.

1.1 From Geometry \(\dots \)

In the centuries following the introduction of Newton’s dynamical laws of classical mechanics, a restatement and further development of the formalism yielded powerful new tools to investigate mechanical systems. The key concepts were unsurprisingly related to geometry. The encoding of the observables leading to (2.1) can be cast in a new light, uncovering the powerful formalisms of Lagrangian and Hamiltonian mechanics.

Each spatial arrangement of a system of particles, or a rigid body, is captured by a single point in a multidimensional space \(M \subseteq \mathbb {R}^{n}\), called the configuration space. Each point in M is described by a generalized coordinate \(q=(q^1,\ldots ,q^n)\), where n reflects the degrees of freedom of the classical system. In effect, a curve in the configuration space represents the evolution of the physical system in time. Technically, M has the structure of a (differentiable) manifold, a generalization of the notions of curves, surfaces, and volumes to objects of arbitrary dimension. From the coordinates, the generalized velocities can be derived as \(\dot{q}^i = \mathrm {d} q^i / \mathrm {d} t\), defining the phase-space \(P = M \times \mathbb {R}^{n}\) with elements \((q^i,\dot{q}^i)\).

Because the velocities are tangent vectors by construction, the set of these vectors at any point \(x \in M\) forms a vector space \(TM_x\), called the tangent space to M at x. The union of the tangent spaces to M at all points is the so-called tangent bundle, \(TM = \cup _{x \in M} TM_x\). Hence P can be understood as the tangent bundle TM of the configuration space M. In Lagrangian mechanics, a function on the tangent bundle \(L : TM \rightarrow \mathbb {R}\) encodes the structure of the physical system it represents. The equations of motion are given by the Euler–Lagrange equation

$$\begin{aligned} \frac{\partial L(q^i,\dot{q}^i,t)}{\partial q^i} = \frac{\mathrm {d}}{\mathrm {d}t} \left( \frac{\partial L(q^i,\dot{q}^i,t)}{\partial \dot{q}^i} \right) , \end{aligned}$$
(3.1)

equivalent to Newton’s laws of motion.
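As a numerical illustration of (3.1), the following sketch assumes a one-dimensional harmonic oscillator with Lagrangian \(L = \frac{1}{2} m \dot{q}^2 - \frac{1}{2} k q^2\) and checks, by finite differences, that the known solution \(q(t) = \cos (\omega t)\) with \(\omega = \sqrt{k/m}\) satisfies the Euler–Lagrange equation. The parameter values and all function names are hypothetical choices made for this example only.

```python
import math

# Assumed example system: harmonic oscillator with hypothetical parameters.
m, k = 2.0, 8.0
omega = math.sqrt(k / m)

def lagrangian(q, v):
    """L(q, q_dot) = kinetic energy minus potential energy."""
    return 0.5 * m * v * v - 0.5 * k * q * q

def q_exact(t):
    """Known solution of Newton's equation m * q_ddot = -k * q."""
    return math.cos(omega * t)

def v_exact(t):
    return -omega * math.sin(omega * t)

def dL_dq(q, v, h=1e-5):
    """Central finite difference for the partial derivative of L w.r.t. q."""
    return (lagrangian(q + h, v) - lagrangian(q - h, v)) / (2 * h)

def dL_dv(q, v, h=1e-5):
    """Central finite difference for the partial derivative of L w.r.t. q_dot."""
    return (lagrangian(q, v + h) - lagrangian(q, v - h)) / (2 * h)

def euler_lagrange_residual(t, h=1e-4):
    """dL/dq - d/dt (dL/dq_dot), evaluated along the trajectory at time t."""
    lhs = dL_dq(q_exact(t), v_exact(t))

    def momentum(s):
        return dL_dv(q_exact(s), v_exact(s))

    rhs = (momentum(t + h) - momentum(t - h)) / (2 * h)
    return lhs - rhs

# The residual should vanish (up to discretization error) along the solution.
max_residual = max(abs(euler_lagrange_residual(0.1 * n)) for n in range(50))
print(max_residual)
```

The residual is not exactly zero because the derivatives are approximated numerically; it merely stays small compared to the individual terms, which are of order k.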

By introducing the concept of generalized momentum

$$\begin{aligned} p^i = \frac{\partial L}{\partial \dot{q}^i}, \end{aligned}$$
(3.2)

Hamiltonian mechanics can be formulated, where the elements \((q^i,p^i)\) define the (momentum) phase-space. The equations of motion in this point of view are given by

$$\begin{aligned} \dot{q}^i = \frac{\partial H}{\partial p^i} , \,\;\,\; \dot{p}^i = - \frac{\partial H}{\partial q^i}, \end{aligned}$$
(3.3)

where the function H is obtained via a special transformation of L. Although Lagrangian mechanics is contained in Hamiltonian mechanics as a special case, “the Hamiltonian point of view allows us to solve completely a series of mechanical problems which do not yield solutions by other means” (Arnold 1989, p. 161). In geometric terms, the momentum phase-space has the structure of a cotangent bundle \(T^{*}M\). Technically, it is the dual vector space of TM, defined for each \(x \in M\) as

$$\begin{aligned} T^{*}M_x := ( TM_x)^{*} := \{ \eta : TM_x \rightarrow \mathbb {R}; \,\;\eta \,\; \text {linear} \}. \end{aligned}$$
(3.4)

Hence the \(\eta \) are linear functionals or 1-forms. This recasts Hamiltonian mechanics as geometry in phase-space, \(H : T^{*}M \rightarrow \mathbb {R}\). General references are Arnold (1989), Frankel (1999), Nakahara (2003).
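Hamilton's equations in their standard form, \(\dot{q} = \partial H / \partial p\) and \(\dot{p} = - \partial H / \partial q\), lend themselves to direct numerical integration. The sketch below assumes a one-dimensional harmonic oscillator, \(H = p^2 / 2m + k q^2 / 2\), and uses the leapfrog scheme (a common symplectic integrator, chosen here for illustration and not taken from the text) to follow a trajectory in phase space while monitoring the energy H. Parameter values and names are hypothetical.

```python
# Assumed example system with hypothetical parameters.
m, k = 1.0, 4.0

def hamiltonian(q, p):
    return p * p / (2 * m) + 0.5 * k * q * q

def dH_dq(q, p):
    return k * q

def dH_dp(q, p):
    return p / m

def leapfrog_step(q, p, dt):
    """One kick-drift-kick step for q_dot = dH/dp, p_dot = -dH/dq."""
    p -= 0.5 * dt * dH_dq(q, p)  # half kick
    q += dt * dH_dp(q, p)        # full drift
    p -= 0.5 * dt * dH_dq(q, p)  # half kick
    return q, p

q, p = 1.0, 0.0
initial_energy = hamiltonian(q, p)
max_energy_drift = 0.0
for _ in range(10_000):
    q, p = leapfrog_step(q, p, 1e-3)
    max_energy_drift = max(max_energy_drift, abs(hamiltonian(q, p) - initial_energy))

print(initial_energy, max_energy_drift)
```

The energy is a constant of motion of the exact dynamics. The numerical scheme only approximates this, but the drift remains bounded and small for the step size assumed here.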

The geometric language of the Lagrangian or Hamiltonian approach finds its successful application in various domains of physics, fueled by the key concept of what is known as the Lagrangian density. In the following detour, these ideas will be explored.

Both Lagrangian and Hamiltonian mechanics have played highly influential roles in modern physics, as the formalisms can be naturally extended to fields. Lagrangian field theory in particular has become a cornerstone of many physical theories. Here the Lagrangian functionals L are replaced by their field-theoretic counterparts, called Lagrangian densities\(^{2}\) \(\mathcal {L}\). In general, the following discrete point-particle expressions are extended to fields with infinitely many degrees of freedom:

$$\begin{aligned} \begin{aligned} q^i&\longrightarrow \psi ^{j} (x^\nu ),\\ \dot{q}^i&\longrightarrow \partial _\mu \psi ^{j} (x^\nu ),\\ L(q^i,\dot{q}^i,t)&\longrightarrow \mathcal {L}(\psi ^{j},\partial _\mu \psi ^j, t), \\ \end{aligned} \end{aligned}$$
(3.5)

where \(x^\nu := (t, \mathbf {x})\) is a point in four-dimensional space-time and the components \(\psi ^j, j=1, 2, \ldots \), describe a quantum field. The corresponding derivative is \(\partial _\mu := \partial /\partial x^\mu \), or, alternatively, \(\partial _\mu = (\partial _{x^0}, \ldots , \partial _{x^3}) = (\partial _t, \partial _x, \partial _y, \partial _z)\). Now the Euler–Lagrange equations also take on a field-theoretic form

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \psi ^j} = \partial _{\mu } \left( \frac{\partial \mathcal {L}}{\partial (\partial _\mu \psi ^j)} \right) . \end{aligned}$$
(3.6)

As an example, Maxwell’s theory of electromagnetism, encoded as (2.4), can be concisely recast as a field theory in four-dimensional space-time\(^{3}\) building on the field-strength tensor \(F_{\mu \nu }\). The components of \(F_{\mu \nu }\) are derived from the components of the electric and magnetic field vectors \({\mathbf {E}}\) and \({\mathbf {B}}\), respectively. The Lagrangian of electromagnetism takes on the form

$$\begin{aligned} \mathcal {L}_{\text {EM}} \sim F_{\mu \nu } F^{\mu \nu }, \end{aligned}$$
(3.7)

where the Einstein summation convention is assumed and the expression “\(\sim \)” implies equality up to a constant factor. The Euler–Lagrange equations can elegantly\(^{4}\) retrieve Maxwell’s equations, seen in (2.4), by substituting (3.7) in (3.6). See Jackson (1998), Collins et al. (1989) for more details.

Another, more abstract, example is the Lagrangian of the standard model of particle physics. It is a very accurate theory describing the interactions of matter particles via the electromagnetic, weak, and strong forces. In other words, it covers all known forces excluding gravity. The force-carrying particles are called gauge bosons (seen in Sect. 4.2). In detail, there exist four boson fields associated with the electroweak force, a unification of electromagnetism and the weak force (discussed in Sect. 4.2.1), and the gluon boson field which propagates the strong force. Gauge bosons represent one of two categories classifying particles according to the value of the spin they carry. The notion of spin can be understood as an intrinsic form of angular momentum of elementary particles (described in Sect. 3.2.2.2), and gauge bosons carry an integer spin value. Particles with half-integer spins are called fermions, the second category of existing particles. All matter is composed of fermions, with a sub-categorization distinguishing leptons and quarks. See Fig. 4.1 on p. 109 for an overview of bosons and fermions. Gauge bosons are formally represented as vector potentials \(A^{\mu }\), discussed in (4.12), with corresponding field tensors \(F^{\mu \nu }\), constructed in (4.14) or (4.41a). Fermions find their formalization as spinor fields \(\psi \), entities which, unlike vectors, require 720\(^{\circ }\) to complete a full rotation (as explained in Sect. 3.2.2.1). The last ingredient is a scalar Higgs boson \(\phi \), required for the generation of mass terms for the bosons and fermions, which are missing in the Lagrangian. The mathematical trick necessary for this feat is called the Higgs mechanism (introduced in Sect. 4.2.1). Returning to the standard model Lagrangian, in a nutshell, one finds

$$\begin{aligned} \mathcal {L}_{\text {SM}} = \mathcal {L}_{\text {force}} + \mathcal {L}_{\text {matter}} + \mathcal {L}_{\text {Higgs}} + \mathcal {L}_{\text {coupling}}, \end{aligned}$$
(3.8)

where

$$\begin{aligned} \mathcal {L}_{\text {force}} \sim F_{\mu \nu } F^{\mu \nu }, \end{aligned}$$
(3.9)

describes the vector bosons,

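$$\begin{aligned} \mathcal {L}_{\text {matter}} \sim \bar{\psi } \not \! \! D \psi , \end{aligned}$$
(3.10)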

encodes the fermionic matter fields, where \({D_{\mu }}\) is a special derivative operator, expressed using Feynman’s slash notation, encoding the interactions with the bosons. The bar denotes the Hermitian conjugate which is associated with antiparticles. The next term is related to the Higgs field

$$\begin{aligned} \mathcal {L}_{\text {Higgs}} = \left| D_{\mu } \phi \right| ^2 - \mathcal {V} (\phi ), \end{aligned}$$
(3.11)

with \(\mathcal {V}\) describing the potential energy of the scalar field. Finally, the fermions couple to the Higgs scalars as specified by what is known as the Yukawa coupling

$$\begin{aligned} \mathcal {L}_{\text {coupling}} \sim (\bar{\psi }\phi ) \psi . \end{aligned}$$
(3.12)

These quantities are responsible for generating the mass terms in the Higgs mechanism. This shorthand notation of the standard model Lagrangian goes back to the physicist John Ellis and has been featured on T-shirts and mugs. However, accounting for every detail, the full-blown standard model Lagrangian comprises a myriad of terms, filling a whole page. See, for instance, Appendix E in Veltman (1994). General information on the standard model can be found in textbooks on quantum field theory (Kaku 1993; Peskin and Schroeder 1995; Ryder 1996), general theoretical physics (Collins et al. 1989; Lawrie 2013), and particle physics and symmetry (Cheng and Li 1996; Mohapatra 2003).

A final example of an important Lagrangian is related to Einstein’s theory of general relativity, the only successful theory to accurately describe gravitational forces. Here matter, radiation, and non-gravitational force fields are expressed through the four-dimensional stress-energy tensor \(T^{\mu \nu }\), which becomes the source of the gravitational field. The physical effect of gravitational pull is translated into the abstract notion of the curvature of space-time.\(^{5}\) The tools for quantifying such perturbations are found in the mathematics of differential geometry, rendering general relativity a geometric theory. The metric tensor \(g_{\mu \nu }\), formally a bilinear form defined on a manifold, captures the geometric structure of space-time. Note that \(g^{\mu \nu }g_{\nu \rho }=g_{\rho \nu }g^{\nu \mu }=\delta ^{\mu }_{\ \rho }\), where the Kronecker delta represents the identity matrix, and \(A_{\mu \nu }=g_{\mu \rho }g_{\nu \sigma }A^{\rho \sigma }\). The curvature of space-time can be measured in a series of higher degrees of abstractions, starting with the metric tensor. In detail

$$\begin{aligned} g_{\mu \nu } \rightarrow \varGamma ^\lambda {}_{\mu \nu } \rightarrow {R^\rho }_{\sigma \mu \nu } \rightarrow R_{\mu \nu }, R \rightarrow G_{\mu \nu }. \end{aligned}$$
(3.13)

Labeling the terms from right to left: the Einstein tensor is constructed from the curvature scalar and the Ricci tensor, which are contracted from the Riemann tensor, which depends on the Christoffel symbols, defined via the metric. Encoding all this information uncovers Einstein’s elegant field equations

$$\begin{aligned} G_{\mu \nu } \sim T_{\mu \nu }. \end{aligned}$$
(3.14)

An in-depth account can be found in Sects. 4.1, 4.3.1, and 10.1.2. Again, this shorthand notation does not convey the level of detail and technicality going on behind the scenes. Even simple gravitational problems can be very arduous to solve. Interestingly, it was not Einstein who derived the corresponding Lagrangian which yields the gravitational field equations by virtue of the Euler–Lagrange equations. On the 25th of November 1915, after a series of false starts and detours, Einstein presented the final version of his geometrodynamic law, in the form in which it is still used today (Einstein 1915). Five days earlier, David Hilbert had independently discovered the Lagrangian (or, equivalently, the Hamiltonian) approach from which Einstein’s theory can be derived (Hilbert 1915). The Lagrangian reads

$$\begin{aligned} \mathcal {L}_{\text {GR}} \sim \sqrt{- \det (g_{\mu \nu })} R. \end{aligned}$$
(3.15)

As a mathematician, Hilbert believed in the axiomatic foundations of physics. Indeed, he stated this as the sixth problem in his famous list of 23 mathematical problems (Hilbert 1900). This intuition allowed him to find such an elegant path to the field equations of gravity, in contrast to Einstein’s struggles. Hilbert tersely remarked (Sauer and Majer 2009, p. 403, translation mine):

If Einstein ends up with the same result [equation of motion] after his colossal detour [...], this can be viewed as a nice consistency check.

According to Kip Thorne, an expert on general relativity, the reason for Einstein’s priority over the geometrodynamic field equations, detailing how matter warps space-time, is the following (Thorne 1995, p. 117f.):

Quite naturally, and in accord with Hilbert’s view of things, the resulting law of warpage was quickly given the name the Einstein field equation rather than being named after Hilbert. Hilbert had carried out the last few mathematical steps to its discovery independently and almost simultaneously with Einstein, but Einstein was responsible for essentially everything that preceded those steps.

Thus ends the excursion sketching the prominence of the Lagrangian formalism in various fields of physics. General references for general relativity are Misner et al. (1973), Collins et al. (1989), and Lawrie (2013). The specific challenges of formulating quantum field theories in curved space are treated in Birrell and Davies (1994).

Returning to the geometric reformulations of Lagrangian and Hamiltonian mechanics, perhaps the most important aspect of this approach is that it allows the ideas of symmetry to be naturally incorporated. By extending the formal representation of an existing theory to incorporate new abstractions, novel and powerful insights into the fundamental workings of the physical world can be uncovered.

1.2 \(\dots \) To Symmetry

The mathematician Sophus Lie revolutionized the understanding of symmetry and greatly extended its scope of influence with his work on continuous symmetries, with the idea that such transformations should be understood as motions. In contrast, discrete symmetries are always associated with non-continuous changes in the system. Examples are permutations, reflections, or a square’s discrete rotational symmetry, whose rotations are always multiples of 90\(^{\circ }\). Lie’s notable achievement was the realization that continuous transformation groups, today known as Lie groups, could be best understood by “linearizing” them. In detail, he realized that it suffices to study the group elements in the local neighborhood of the identity element to understand the group’s global structure. In effect, Lie repeated for symmetries what Galois had achieved for algebraic structures: to classify them in terms of group theory. Indeed, initially Lie worked with Klein on the Erlanger Program. Years later, after Lie had suffered a mental breakdown and become increasingly paranoid, fearing people would steal his ideas, the friendship between him and Klein would turn sour. See Du Sautoy (2008).

Technically, a Lie group G is a differentiable manifold endowed with a group structure such that multiplication and the inverse transformation are differentiable maps. The tangent space \(TG_e\) of a Lie group at the identity element \(e \in G\) is a very special and useful mathematical structure called a Lie algebra. In the terminology of abstract algebra, a vector space over a field\(^{6}\) \(\mathbb {F}\) is a set V together with a binary operation for adding the elements of V (called vectors, \(u, v \in V \Rightarrow u+v \in V\)), and one linking the scalars, or the elements of \(\mathbb {F}\), with vectors, referred to as scalar multiplication (\(a \in \mathbb {F}, v \in V \Rightarrow a \cdot v \in V\)). Eight axioms specify the properties of a vector space. Generalizing this notion, an algebra \(\mathfrak {a}\) over a field \(\mathbb {F}\) is a vector space over \(\mathbb {F}\) equipped with an additional bilinear operation for multiplying elements in \(\mathfrak {a}\). For many Lie algebras \(\mathfrak {g}\), this operation, denoted by Lie brackets, is given by a skew-symmetric product. An example is given by the commutator

$$\begin{aligned} (X, Y ) \in \mathfrak {g} \times \mathfrak {g} \rightarrow [X, Y] := XY - Y X \in \mathfrak {g}. \end{aligned}$$
(3.16)

Generically, however, the Lie brackets need not take this form. They must merely be bilinear and skew-symmetric and obey an equation known as the Jacobi identity.

The relationship between Lie algebras and Lie groups is captured by a map

$$\begin{aligned} \exp : \mathfrak {g} \rightarrow G. \end{aligned}$$
(3.17)

Restricted to the multiples of a single element \(X \in \mathfrak {g}\), it is a homomorphism, or a structure-preserving map, taking addition to multiplication: \(\exp ((s+t)X) = \exp (sX) \exp (tX)\). For the many Lie groups that are comprised of matrices, the exponential map takes its usual form for any matrix A: \(\exp (A) := \sum _{i=0}^{\infty } \frac{1}{i!} A^i\). The existence of this map is one of the primary justifications for the study of Lie groups at the level of Lie algebras. In the general case, a continuous symmetry \(S(t) \in G\), parametrized by \(t \in \mathbb {R}\), can now be described as

$$\begin{aligned} S(t) = \exp (tX^a), \end{aligned}$$
(3.18)

where the vector \(X^a \in \mathfrak {g}\) is called a generator. The set of such generators, equipped with a Lie bracket, defines the Lie algebra. Given a basis \(X^a \in \mathfrak {g}\), (3.16) generalizes to

$$\begin{aligned}{}[X^a,X^b] = f^{a b }_{c} X^c, \end{aligned}$$
(3.19)

where all the information is coded into the \(f^{a b }_{c}\), the structure constants. Note the usage of Einstein’s summation convention.
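To see (3.16) and (3.19) at work, the following sketch assumes the Lie algebra of three-dimensional rotations, with generators given by the matrices \((X^a)_{ij} = -\varepsilon _{aij}\). For this choice, which is an assumption of the example and not taken from the text, the structure constants are the components of the Levi-Civita symbol. The code verifies the commutation relations and the Jacobi identity by brute force, using exact integer arithmetic. All helper names are hypothetical.

```python
from itertools import product

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def commutator(x, y):
    """Lie bracket [X, Y] = XY - YX for matrices."""
    return add(matmul(x, y), scale(-1, matmul(y, x)))

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2

# Assumed generators of rotations about the three coordinate axes.
generators = [
    [[-levi_civita(a, i, j) for j in range(3)] for i in range(3)] for a in range(3)
]
zero = [[0] * 3 for _ in range(3)]

def expected_bracket(a, b):
    """Right-hand side of (3.19) with structure constants f^{ab}_c = epsilon_{abc}."""
    total = zero
    for c in range(3):
        total = add(total, scale(levi_civita(a, b, c), generators[c]))
    return total

relations_hold = all(
    commutator(generators[a], generators[b]) == expected_bracket(a, b)
    for a, b in product(range(3), repeat=2)
)

def jacobi_sum(x, y, z):
    """[X,[Y,Z]] + [Y,[Z,X]] + [Z,[X,Y]], which must vanish."""
    return add(
        add(commutator(x, commutator(y, z)), commutator(y, commutator(z, x))),
        commutator(z, commutator(x, y)),
    )

jacobi_holds = all(
    jacobi_sum(generators[a], generators[b], generators[c]) == zero
    for a, b, c in product(range(3), repeat=3)
)

print(relations_hold, jacobi_holds)
```

For matrix commutators the Jacobi identity holds automatically, so the second check is a consistency test of the helper functions more than an independent result.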

To summarize, the knowledge of the structure constants, defining the Lie brackets of a Lie algebra, is sufficient to determine the local nature of the Lie group near the identity element. In effect, the Lie brackets can be understood as a linearized version of the group law, a powerful insight provided by Lie.

1.3 \(\dots \) And Back

The formal mathematical framework detailed above can be linked to the dynamics of physical systems. Again, the continued encoding of natural systems into formal representations yields novel insights into the structure of the physical world. Specifically, it is the richness of the abstract world, allowing formal structures to be viewed from seemingly unrelated points of view, that allows for the discovery of similarities between concepts not obvious from the outset.

As mentioned, every Lie group can also be understood as a manifold. It is special in the sense that it always has a family of diffeomorphisms, i.e., invertible functions between differentiable manifolds, such that: \(L_g : G \rightarrow G\), where \(L_g(h) = gh\), with \(g,h \in G\). This means that \(L_g\) acts as a translation.\(^{7}\) For any (differentiable) map on a manifold, \(f : G \rightarrow G\), the naturally corresponding differential map \(f_*\) can be defined as

$$\begin{aligned} f_*: TG_p \rightarrow TG_{f(p)}, \end{aligned}$$
(3.20)

meaning that a tangent vector X to G at the point \(p \in G\) is transformed into a tangent vector \(f_*X\) at f(p). The notion is that of a directional derivative along a curve c(t), with \(c(0)=p\). In the case of Lie groups, given a tangent vector \(X_e\) at the identity, the translation \(L_g\) of group elements has the derivative \(L_{g *}\) which maps the vector to any point in G, as \(X_g := L_{g *} X_e\). Finally, a vector field X on G is said to be (left) invariant if it is invariant under all (left) translations, that is \(L_{g *} X_h = X_{gh}\). In this new terminology, a Lie algebra \(\mathfrak {g}\) of G is the space of all (left) invariant vector fields on G. Moreover, the Lie bracket of two left-invariant vector fields is also left invariant.

There is still one piece of the puzzle missing, in order for the formal machinery to spit out novel insight into the workings of physical systems. This missing element is called a one-parameter subgroup of G. In general, it is a differentiable homomorphism\(^{8}\) \(\varphi : \mathbb {R} \rightarrow G\). It describes a path \(\varphi (t)\) in G and satisfies the condition \(\varphi (t+s) = \varphi (t) \varphi (s)\). For any Lie group G the one-parameter subgroup, whose generator at the identity e is the tangent vector X, is given by

$$\begin{aligned} \varphi (t) = \exp (t X). \end{aligned}$$
(3.21)

In other words, any continuous symmetry S(t) is equivalent to a one-parameter subgroup \(\varphi (t)\), and there exists an associated left invariant vector field \(X \in \mathfrak {g}\). This links the formal framework of one-parameter subgroups to the notion of symmetry, namely Lie groups and algebras. In a next step, the abstract ideas relating to such one-parameter subgroups can be re-expressed in terms of tangible physical concepts. If M is the phase-space of a physical process, then \(x \in M\) describes the state of the system at some initial time \(t_0\). A mapping \(g_t : M \rightarrow M\) takes this state to the state at the later instant t, \(g_tx\). These transformations \(g_t\) are also called the phase flow, as the phase space can be thought of as filled with a fluid, where a particle located at x flows to the point \(g_tx\) during the time t. It is required that evolving the state for a time t and then for a time s gives the same result as evolving it for the time \(t+s\) in one step. So \(x \mapsto g_{t+s}x\) and \(x \mapsto g_tx \mapsto g_s g_t x\) are identical. This condition, that \(g_{t+s} = g_s g_t\), reveals the phase flow to be a one-parameter group of transformations of M, \(g_t=\exp (tX)\).
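The statement that the phase flow is of the form \(g_t = \exp (tX)\) can be checked explicitly in a simple case. The sketch below assumes a harmonic oscillator with unit mass and unit spring constant, for which the state \(x = (q, p)\) obeys the linear equation \(\dot{x} = Xx\) with a constant matrix X. The matrix exponential is evaluated by truncating its power series after a fixed number of terms, which is adequate only for the small arguments used here. All names and parameter values are hypothetical.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(a, terms=30):
    """Truncated power series sum_i A^i / i! (an approximation for small matrices)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for i in range(1, terms):
        # After this update, term equals A^i / i!.
        term = [[entry / i for entry in row] for row in matmul(term, a)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

# Assumed generator of the flow: q_dot = p, p_dot = -q.
X = [[0.0, 1.0], [-1.0, 0.0]]

def flow(t):
    """Phase flow g_t = exp(t X)."""
    return mat_exp([[t * entry for entry in row] for row in X])

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

t, s = 0.7, 1.1

# Group property of the flow: g_{t+s} = g_s g_t.
group_law_error = max_abs_diff(flow(t + s), matmul(flow(s), flow(t)))

# For this generator the flow is a rotation of the phase plane.
closed_form = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
closed_form_error = max_abs_diff(flow(t), closed_form)

print(group_law_error, closed_form_error)
```

Both errors should be at the level of floating-point rounding, reflecting that the flow of this system is a one-parameter group of rotations of the phase plane.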

In a nutshell, each flow \(g_t\), or curve in phase space, can be associated with a velocity vector field in the tangent space. The converse result is “perhaps the most important theorem relating calculus to science” (Frankel 1999, p. 31): roughly speaking, to each vector field corresponds a flow which has this particular vector field as velocity field. Moreover, the exact form of the flow can be found by solving a system of ordinary differential equations associated with the dynamics of the system, similar to (2.1).

At this point it is not yet obvious what the gained benefit of this lengthy formal derivation is. Indeed, it could appear that one is going round in circles. General references to the above discussed topics are Frankel (1999), Nakahara (2003), Arnold (1989).

1.4 Noether’s Theorem: Digging Deeper

In more formal detail, Noether’s theorem states that whenever a system (described by a Lagrangian L or \(\mathcal {L}\) and obeying the Euler–Lagrange equations (3.1) and (3.6)) admits a one-parameter subgroup of diffeomorphisms (the Lagrangian is invariant under the action of a continuous symmetry group) there is a conserved quantity. For instance, if the Lagrangian is invariant under time translations, spatial translations or rotations about some axis, then the energy, momentum or angular momentum, respectively, is conserved in the system.
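The link between rotational invariance and angular momentum can be observed in a small simulation. The sketch below is only an illustration under stated assumptions: a unit-mass particle in the plane, a harmonic potential \(V = (x^2 + k_y y^2)/2\), and a leapfrog integrator; the names `evolve` and `ky` are hypothetical. For \(k_y = 1\) the potential is rotationally invariant and the angular momentum stays constant, whereas \(k_y = 4\) breaks the symmetry and the angular momentum changes.

```python
def angular_momentum(x, y, px, py):
    """Angular momentum about the origin for a particle in the plane."""
    return x * py - y * px


def energy(x, y, px, py, ky=1.0):
    """Kinetic plus potential energy for V = (x^2 + ky * y^2) / 2."""
    return 0.5 * (px * px + py * py) + 0.5 * (x * x + ky * y * y)


def evolve(state, dt, steps, ky=1.0):
    """Leapfrog integration of a unit-mass particle with force (-x, -ky * y)."""
    x, y, px, py = state
    for _ in range(steps):
        px -= 0.5 * dt * x          # half kick
        py -= 0.5 * dt * ky * y
        x += dt * px                # full drift
        y += dt * py
        px -= 0.5 * dt * x          # half kick
        py -= 0.5 * dt * ky * y
    return (x, y, px, py)


start = (1.0, 0.0, 0.3, 0.8)

# Rotationally invariant potential: angular momentum is conserved.
end_symmetric = evolve(start, dt=0.001, steps=10000, ky=1.0)
drift_symmetric = abs(angular_momentum(*end_symmetric) - angular_momentum(*start))
energy_drift = abs(energy(*end_symmetric) - energy(*start))

# Anisotropic potential: the rotational symmetry is broken.
end_broken = evolve(start, dt=0.001, steps=10000, ky=4.0)
drift_broken = abs(angular_momentum(*end_broken) - angular_momentum(*start))
```

In both runs the potential does not depend on time, so the energy remains (approximately) conserved; only the quantity tied to the broken symmetry is lost.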

Crucially, continuous symmetries are the cornerstone of Noether’s theorem, unveiling the deep connection between the conservation of physical quantities and the formal language of symmetries. However, the truly universal importance of these symmetries in understanding natural phenomena only started to become apparent with the further developments of quantum theory. Weyl, influenced by the work of Lie, was instrumental in helping foster the understanding of the symmetry structure of quantum mechanics, namely its group-theoretic basis (Weyl 1928, translated into English as Weyl 1950).

In a nutshell, the formal machinery related to symmetries is applied to Lagrangian densities describing quantum fields.Footnote 9 Digging deeper, by adding more analytical formalism to the mix, the insights gained in the abstract realm can be decoded back into the physical world, allowing for novel conserved quantities to emerge. Starting with a Lagrangian \(\mathcal {L}\) describing an arbitrary vector field \(\psi ^i\) (\(i=1, \ldots , N)\), the invariance under a symmetry group G is expressed as

$$\begin{aligned} \mathcal {L} ( \psi ^i) = \mathcal {L} (\psi ^{\prime i}), \end{aligned}$$
(3.22)

where \(\psi ^{\prime i}\) is the transformed field under the group action. The missing link required to specify how, in detail, a group element \(g \in G\) acts as an operator on \(\psi ^i\), is called representation theory, a sub-field of group theory.

In this framework, the abstract mathematical operators of a group are represented as linear transformations of a vector space V (over a field \(\mathbb {F}\)). This implies that a group element \(g \in G\) is transformed as \(g \rightarrow U(g)\), where now \(U(g) : V \rightarrow V\) is a linear mapping. In other words, U is an element of GL(V), the set of all linear transformations on V, called the general linear group. The advantage of this formal translation is gained from an important result of linear algebra. The structure of the transformation U can be encoded in a matrix Footnote 10 as

$$\begin{aligned} U(g) [v] = \text {U} (v), \end{aligned}$$
(3.23)

for \(v \in V\). As a result, GL(V) is isomorphic to \(GL(n, \mathbb {F})\), the set of all \(n \times n\) matrices \(\text {U}\) over the field \(\mathbb {F}\). In effect, by mapping g via U to \(\text {U}\), representation theory allows one to manipulate ever more concrete and tangible objects. The abstract quantity g reemerges as a matrix representation \(\text {U}\) with concrete physical connotation. Formally, \(\text {U}\) can be expanded via the basis vectors of V as \(\text {U}^{ij} \in \mathbb {F}\). Recalling that the group G is associated with a corresponding Lie algebra \(\mathfrak {g}\), Fig. 3.1 shows a diagrammatic representation of how all these concepts fit together.

Fig. 3.1

A commutative diagram showing the Lie algebra \(\mathfrak {g}\), the exponential mapping to its corresponding Lie group G—described in (3.17) and (3.21)—the representation of \(g \in G\) as an element of \(U(g) \in GL (V)\), the Lie algebra of GL(V), called \(\mathfrak {gl} (V)\), with the representational mapping \(\mathfrak {U}\) from \(\mathfrak {g}\), and, finally, the matrix representation given by the mapping \(\rho \) yielding the matrix U

Before it is possible to explain how representation theory transitions from pure mathematics to physics, a detour into a special aspect of the framework of quantum mechanics is required. To specify the effect of the group action g on the vector field \(\psi ^i\), i.e., to uncover the form of \(\psi ^{\prime i}\), a powerful procedure from quantum field theory is invoked, called second quantization. In quantum mechanics, the properties of a physical system are encoded into a state vector \(|\psi \rangle \), employing a notation introduced by Paul A. M. Dirac (1939), referred to as bra-ket notation (see also, for instance, Sakurai 1994). The state \(|\psi \rangle \) is a vector in an infinite-dimensional complex Hilbert space, an abstract vector space, generalizing the notion of Euclidean space, equipped with specific structures. Physical measurements are associated with linear operators on this space of quantum state vectors, called observables. From \(|\psi \rangle \) the wave function \(\psi (t, \mathbf {x})\) can be derived, which is interpreted as a probability amplitude, assigning \(|\psi (t, \mathbf {x})|^2\) the role of a probability density for locating the particle at \(\mathbf {x}\) at time t. This interpretation goes back to Max Born (Born 1926) and won him a Nobel Prize in 1954. The time evolution of the wave function is described by the Schrödinger equation (Schrödinger 1926a, b, c, d)

$$\begin{aligned} i \hbar \partial _t \psi (t, \mathbf {x}) = H \psi (t, \mathbf {x}), \end{aligned}$$
(3.24)

where H is the Hamiltonian operator, characterizing the energy of the system. This equation earned Erwin Schrödinger a Nobel Prize in 1933. General textbooks on quantum mechanics are, for instance, Feynman et al. (1965), Sakurai (1994), Messiah (2000), Schwabl (2007). In essence, a particle located at \((t, \mathbf {x})\) is described by the wave function \(\psi (t, \mathbf {x})\). This idea is referred to as first quantization. To extend this notion of quantization from objects with three degrees of (spatial) freedom to quantum fields with infinitely many degrees of freedom, a procedure called second quantization is called for. Essentially, the wave function is promoted from a vector to an operator in a Hilbert space, see, for instance, Schwabl (2008). There exist, however, various types of quantization schemes that have been proposed over the decades, each coming with its own merits and drawbacks (Kaku 1993). A prominent example is Feynman’s path integral formulation (Feynman 1942, 1948), see Sect. 9.1.

After this detour, in the context of second quantization, \(\psi ^i\) is now understood as being on par with the transformation operator U(g), and the effect of the group action g can be stated as

$$\begin{aligned} \psi ^{\prime i} = U (g) \psi ^i U^{-1}(g). \end{aligned}$$
(3.25)

By virtue of this equation, it has become possible to link quantum fields with group theory. So not only is quantum field theory quantum mechanics extended to infinitely many degrees of freedom, it is, roughly speaking, also the merger of quantum theory with group theory, in the sense that abstract group transformations are represented and thus realized as linear transformations on the vector spaces of quantum physics.

It should be noted that (3.25) can be understood in terms of linear algebra as a similarity transformation of matrices, where \(\psi ^i\) and \(\psi ^{\prime i}\) represent the same operator in two different bases. Using the power of representation theory (Tung 1993; Cornwell 1997), the effect of the transformation can be explicitly formulatedFootnote 11 as

$$\begin{aligned} U (g) \psi ^i U^{-1}(g) = \text {U}^i_{j} \psi ^j, \end{aligned}$$
(3.26)

employing Einstein’s summation convention. In a final step, the effect of the group action on the vector field can now be solely derived from the knowledge of the Lie algebra. As the Lie group G is a continuous transformation group, its elements can be naturally parametrized, for instance, by the set of scalars \(\theta _k\), as \(g=g(\theta _1, \ldots , \theta _n)\), with \(n=\text {dim}(G)\). These variables carry over into the matrix \(\text {U} = \text {U} (\theta _1, \ldots , \theta _n)\).

By virtue of (3.21), \(\text {U}\) can be associated with a one-parameter subgroup of G. Alternatively, via (3.18), \(\text {U}\) is understood as a continuous symmetry. In the end, the generators \(X^k \in \mathfrak {g}\) encode the information for the group action

$$\begin{aligned} \text {U} (\theta _1, \ldots , \theta _n) = \exp ({\theta _k \text {X}^k}), \end{aligned}$$
(3.27)

where the matrices \(\text {X}^k\) are a matrix representation of the Lie algebra generators \(X^k\), implying that they satisfy the commutation relations (3.19). In other words, the Lie brackets of the matrix representations must obey the Jacobi identity. As the structure constants \(f^{kij}\) satisfy a similar relation, the elements of the matrix representations can be defined as

$$\begin{aligned}{}[\text {X}^k]^{ij} := f^{kij}. \end{aligned}$$
(3.28)

By construction, this simple definition of the matrices \(\text {X}^k\) captures all the features of the Lie algebra, satisfying (3.19), and is called the adjoint representation.Footnote 12 All these manipulations culminate in the following equation, reducing the effect of the symmetry group action on the vector field to the structure constants of the Lie algebra. Expanding (3.25)–(3.27) in a Taylor series yields

$$\begin{aligned} \begin{aligned} \psi ^{\prime i}&= \exp \left( \theta _k [\text {X}^k]^i_{\ j} \right) \psi ^j = \exp \left( \theta _k f^{ki}_j \right) \psi ^j\\&= \psi ^i + \theta _k f^{ki}_j \psi ^j + \mathcal {O} ( \theta ^2 ) , \end{aligned} \end{aligned}$$
(3.29)

for group elements close to the identity element, i.e., for small parameters \(\theta _k\). The symbol \(\mathcal {O}\), also known as big-O notation, generally describes the asymptotic behavior of a function, or, in this case, encapsulates higher order terms. In a more compact notation, the Lagrangian of a vector field is invariant under a symmetry group G, transforming \(\psi ^i \rightarrow \psi ^{\prime i}\), if (3.22) holds. Then, for infinitesimal transformations

$$\begin{aligned} \psi ^{\prime i} = \psi ^i + \theta _k f^{ki}_{j} \psi ^j = \psi ^i + \delta \psi ^i. \end{aligned}$$
(3.30)
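The construction in (3.27)–(3.30) can be made concrete for a simple case. The sketch below assumes, purely for illustration, that the structure constants are those of the rotation algebra, \(f^{kij} = \varepsilon _{kij}\) (the Levi-Civita symbol); this choice and all function names are assumptions not made in the text. It builds the matrices \([\text {X}^k]^{ij} := f^{kij}\), checks that their commutators close with the same structure constants (the overall sign depends on the convention used for the commutation relations), and verifies that the first-order expansion differs from the full exponential only at order \(\theta ^2\).

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


# Adjoint-type matrices built from the structure constants: [X^k]^{ij} = f^{kij}.
X = [[[levi_civita(k, i, j) for j in range(3)] for i in range(3)] for k in range(3)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def closure_defect():
    """Largest violation of [X^a, X^b] = -eps_{abc} X^c (sign is a convention)."""
    worst = 0
    for a in range(3):
        for b in range(3):
            ab, ba = matmul(X[a], X[b]), matmul(X[b], X[a])
            for i in range(3):
                for j in range(3):
                    rhs = -sum(levi_civita(a, b, c) * X[c][i][j] for c in range(3))
                    worst = max(worst, abs(ab[i][j] - ba[i][j] - rhs))
    return worst


def mat_exp(a, terms=30):
    """Matrix exponential via its Taylor series (adequate for small matrices)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[v / m for v in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def first_order_error(theta):
    """Largest entry of exp(theta_k X^k) - (1 + theta_k X^k)."""
    a = [[sum(theta[k] * X[k][i][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    exact = mat_exp(a)
    return max(abs(exact[i][j] - (i == j) - a[i][j])
               for i in range(3) for j in range(3))


theta = (0.01, 0.02, -0.015)
commutator_defect = closure_defect()
error_full = first_order_error(theta)
error_half = first_order_error(tuple(0.5 * t for t in theta))
ratio = error_full / error_half   # close to 4 if the error is of order theta^2
```

Halving the parameters reduces the error by roughly a factor of four, which is the numerical signature of the \(\mathcal {O} ( \theta ^2 )\) remainder.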

Building on this procedure, Noether’s theorem can now easily be proved. The change in the quantum field \(\psi ^i\), induced by the symmetry transformation and encoded as \(\delta \psi ^i\), causes a corresponding perturbation in the Lagrangian

$$\begin{aligned} \delta \mathcal {L} = \frac{\delta \mathcal {L}}{\delta \psi ^i} \delta \psi ^i + \frac{\delta \mathcal {L}}{\delta (\partial _\mu \psi ^i)} \delta (\partial _\mu \psi ^i). \end{aligned}$$
(3.31)

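Utilizing the Euler–Lagrange equations for Lagrange densities, the field-theoretic version of (3.1), to replace \(\delta \mathcal {L} / \delta \psi ^i\) by \(\partial _\mu [ \delta \mathcal {L} / \delta (\partial _\mu \psi ^i) ]\), together with \(\delta (\partial _\mu \psi ^i) = \partial _\mu (\delta \psi ^i)\) and the form of \(\delta \psi ^i\) given in (3.30) for constant parameters \(\theta _k\), combines the two terms into a total derivative and yields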

$$\begin{aligned} \delta \mathcal {L} = \theta _k \partial _\mu \left[ \frac{\delta \mathcal {L}}{\delta (\partial _\mu \psi ^i)} f^{ki}_{j} \psi ^j \right] =: \theta _k \partial _\mu \mathcal {J}^{\mu k}. \end{aligned}$$
(3.32)

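The invariance requirement \(\delta \mathcal {L} = 0\) for arbitrary \(\theta _k\) leads to conserved currents \(\mathcal {J}^{\mu k}\), satisfying \(\partial _\mu \mathcal {J}^{\mu k} = 0\), one for each parameter of the symmetry group. See, for instance, Cheng and Li (1996).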

2 Symmetry Manifested

Symmetry is perhaps the profoundest concept ever to be discovered in theoretical physics, uncovering deep truths about the workings of reality. It lies at the heart of special relativity and quantum theory, as will be demonstrated in the following sections.

2.1 Causality and the Relation of Space and Time

The experimental discovery that light propagates at a constant speed c, regardless of the speed of any observer, posed a great challenge to physicists. The resolution would transform physics as it was known and reveal deep connections between different laws of physics.

In April and July of the year 1887, Albert A. Michelson and Edward Morley set up an experiment to verify the existence of the aether, a postulated substance that permeated all of space and acted as the medium for light to propagate in. The result was negative. Not only was there no aether to be detected, but more puzzlingly, velocities could not be simply added up linearly, as Galileo Galilei had envisioned in his theory of relativity.

One year after the Michelson–Morley experiment, George FitzGerald proposed the revolutionary idea that the Galilean transformation should be replaced with a transformation that mixes space and time coordinates in inertial frames (Faraoni 2013). This was the first postulation hinting at the malleability of space and time. Hendrik Lorentz, soon after, introduced a fully-fledged transformation rule, today named after him. Consider an inertial frame described by the space and time coordinates \(\{t,x,y,z\}\). An additional inertial frame \(\{t^\prime ,x^\prime ,y^\prime ,z^\prime \}\) is moving with relative velocity v in the direction of the x-axis. The following transformation rule, called a Lorentz boost, describes the mathematics behind moving from \(\{t,x,y,z\}\) to \(\{t^\prime ,x^\prime ,y^\prime ,z^\prime \}\):

$$\begin{aligned} \begin{aligned} t \rightarrow t^{\prime }&= \frac{t- \frac{vx}{c^2}}{\sqrt{1-\frac{v^2}{c^2}}},\\ x \rightarrow x^{\prime }&= \frac{x-vt}{\sqrt{1-\frac{v^2}{c^2}}},\\ y \rightarrow y^{\prime }&= y,\\ z \rightarrow z^{\prime }&= z.\\ \end{aligned} \end{aligned}$$
(3.33)

Historically, Lorentz derived his transformation rule employing the newly discovered invariance of the speed of light, the fact that all observers measure the same value for c in their reference frames. Consider a spherical pulse of electromagnetic radiation emitted at the origin of each inertial system at \(t = 0\). It propagates along the x and \(x^{\prime }\)-axis as follows

$$\begin{aligned} \begin{aligned} x&= ct,\\ x^{\prime }&=ct^{\prime }. \end{aligned} \end{aligned}$$
(3.34)

From this consistency requirement, the Lorentz transformation in (3.33) can be derived (Faraoni 2013).
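The consistency requirement can be checked numerically for the standard form of a boost along the x-axis, \(t^{\prime } = \gamma (t - vx/c^2)\) and \(x^{\prime } = \gamma (x - vt)\) with \(\gamma = 1/\sqrt{1 - v^2/c^2}\). The following sketch assumes units in which \(c = 1\) and uses hypothetical helper names (`boost`, `add_velocities`). It verifies that an event on the light pulse \(x = ct\) is mapped to an event with \(x^{\prime } = ct^{\prime }\), and that two successive boosts combine into a single boost whose velocity follows the relativistic addition rule rather than a linear sum.

```python
import math


def boost(event, v, c=1.0):
    """Lorentz boost of an event (t, x) along the x-axis with velocity v."""
    t, x = event
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (gamma * (t - v * x / c ** 2), gamma * (x - v * t))


def add_velocities(v1, v2, c=1.0):
    """Relativistic composition of two collinear velocities."""
    return (v1 + v2) / (1.0 + v1 * v2 / c ** 2)


c = 1.0  # assumption: units in which the invariant speed equals one

# An event on the light pulse x = c t stays on the light pulse after a boost.
t_emit = 2.0
t_prime, x_prime = boost((t_emit, c * t_emit), v=0.6)
light_cone_error = abs(x_prime - c * t_prime)

# Two successive boosts equal one boost with the composed velocity.
event = (1.3, -0.4)
two_steps = boost(boost(event, 0.5), 0.7)
combined_speed = add_velocities(0.5, 0.7)
one_step = boost(event, combined_speed)
composition_error = max(abs(a - b) for a, b in zip(two_steps, one_step))
```

Note that `combined_speed` stays below `c` even though the naive sum of the two velocities would exceed it.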

However, the Lorentz transformation reveals a far deeper truth. The constant c appearing in (3.33) denotes a fundamental velocity, which is a priori unrelated to the speed of light in a vacuum. Let us call it \(c_{\text {sc}}\), representing a space-time structure constant. If one postulates that reality should make sense, then the Lorentz transformation is the only possible solution. In other words, in a comprehensible universe, the laws of physics are the same in all reference frames, independent of position, orientation, and velocity. This consistency assumption translates into the following commonsensical requirements:

  (1) There exist no preferred reference frames.

  (2) It is possible to transform between observers in reference frames.

From these postulates alone, the Lorentz transformation can be derived, without any reference to c, the invariant speed of light (von Ignatowsky 1911; Pelissetto and Testa 2015). Now, the constant velocity \(c_{\text {sc}}\) appearing in (3.33) is interpreted as the speed of causality, the theoretical maximal velocity of information transmission in the universe (Landau and Lifshitz 1951). To understand this, the theory of special relativity, building on Lorentz’ insights, had to be formulated.

Einstein interpreted the meaning of the Lorentz transformation in his theory of special relativity, yielding the theory’s prominent predictions: time dilation, length contraction, and the equivalence of mass and energy \(E=mc^2\). He originally introduced special relativity in 1905 based on two postulates of symmetry (Einstein 1905):

  (1) The laws of physics are invariant in all inertial systems (i.e., non-accelerating frames of reference).

  (2) The speed of light in a vacuum is the same for all observers (regardless of the motion of the light source).

Enforcing Lorentz invariance for Postulate (2) results in the mixing of space and time. As a result, observers disagree on the chronological order of events—someone’s past is in someone else’s future. This dramatic turn of events threatened to render time and causality meaningless. Luckily, the universe conspires in a way to uphold a more general notion of causality than the temporal one we naively assumed to exist. If causality is expressed as a space-time interval, then it becomes a universal property all observers agree on. In mathematical terms

$$\begin{aligned} (\varDelta s)^2 := (c \varDelta t)^2 - (\varDelta x)^2 - (\varDelta y)^2 - (\varDelta z)^2. \end{aligned}$$
(3.35)

The space-time interval \((\varDelta s)^2\) encodes the separation of events in space-time and it holds that

$$\begin{aligned} (\varDelta s)^2 = (\varDelta s^{\prime })^2, \end{aligned}$$
(3.36)

for all reference frames. Due to the minus sign, \((\varDelta s)^2\) can be positive, zero, or negative. This means that the space-time interval between two distinct events can result in the events being separated by more time than space, or vice versa. In other words, the space-time interval between events A and B says something about if and how A can influence B. This is a description of causality, which is invariant and universally agreed upon by all observers. Having lost causality in time, we rediscover it in space-time. Mathematically, this is described by a Minkowski space, a four-dimensional reality that contains all past, present, and future events (see Sect. 3.2.2.1 below for a more formal definition). The notion of space-time invokes the analogy of a block universe, where the passage of time is an illusion. All observers in space-time move through the block and experience slices of it as their present. In this sense, the entire unchanging time-line of an observer represents their reality. So, why do our brains make us perceive space-time so vividly as a distinctly spatial entity evolving in time? No one knows, but this apparent atemporal reality underlying our illusion of the passage of time can be consoling. During Einstein’s early years, he worked in obscurity in the patent office in Bern with Michele Besso. When Besso died in 1955, Einstein wrote to the widow (Wuppuluri and Ghirardi 2017, p. 469):

Now he has departed from this strange world a little ahead of me. That signifies nothing. For those of us who believe in physics, the distinction between past, present and future is only a stubbornly persistent illusion.

Two weeks later, Einstein would also die.

The question remains how the speed of causality \(c_{\text {sc}}\) in the Lorentz transformation, derived solely from the relation between space and time, is related to the speed of light. Yet again, invariance gives the answer. Maxwell’s equations (2.4), encoding everything there is to know about electromagnetism, are only invariant under Lorentz transformations for a very specific value of \(c_{\text {sc}}\). The fundamental speed limit of causality and the contents of Maxwell’s equations have to interrelate, in order for invariance to be upheld.

It was known that a wave equation, describing the propagation of electromagnetic radiation, can be easily derived from Maxwell’s equations (Jackson 1998). The speed of these waves is derived from the two fundamental constants appearing in the equations: the permittivity (\(\varepsilon _0\)) and the permeability (\(\mu _0\)) of the vacuum. They combine to yield a theoretical definition of the velocity of electromagnetic radiation—in other words, the speed of light. It is found that

$$\begin{aligned} c :=\sqrt{\frac{1}{\varepsilon _0 \mu _0}}. \end{aligned}$$
(3.37)

This is the only speed massless particles can travel at and particles with mass can never reach this speed. Equipped with this knowledge, the final piece of the puzzle is found, where \(c_{\text {sc}} = c\).
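As a quick plausibility check of (3.37), the speed can be evaluated from the two vacuum constants. The numerical values below are the CODATA 2018 recommended values, quoted here as an assumption since the text itself does not state them; the variable names are illustrative.

```python
import math

# Assumed inputs: CODATA 2018 recommended values (rounded as published).
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m
MU_0 = 1.25663706212e-6        # vacuum permeability in N/A^2

# Defined value of the speed of light in the SI system, in m/s.
C_DEFINED = 299_792_458.0

c_from_maxwell = math.sqrt(1.0 / (EPSILON_0 * MU_0))
relative_difference = abs(c_from_maxwell - C_DEFINED) / C_DEFINED
```

The relative difference is limited only by the rounding of the two input constants.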

The Lorentz transformation is a manifestation of a deep symmetry in nature. Requiring physical theories to be Lorentz invariant results in the speed of causality being the constant speed of light. Moreover, the Lorentz transformation reveals the intimate interplay of space and time, setting the stage for special relativity.

2.2 Elementary Particles

Continuing with the story, another surprising and deep link between symmetry and the nature of elementary particles becomes apparent. This insight would also be awarded with a Nobel Prize.

2.2.1 The Lorentz Group

A prominent example of a Lie group G is the Lorentz group \(\mathscr {L}\). As a group of transformations it encodes fundamental symmetries of space-time. In detail, \(\mathscr {L}\) is the group of the isometries of space-time, i.e., distance-preserving maps between spaces endowed with a metric, which leave the origin fixed.

Formally, the merger of space and time is accomplished by the means of Minkowski space, a four-dimensional manifold. A vector in this space is written as \(x^\mu = (t, \mathbf {x})\), where natural unitsFootnote 13 are assumed. This is the abstract setting in which Einstein’s theory of special relativity is formulated. The metric tensor associated with flat Minkowski space-time is simply a diagonal matrix \(g^{\mu \nu } = \text {diag}(1, -1, -1, -1) = g_{\mu \nu }\). Hence \(x_\mu = g_{\mu \nu } x^\nu = (t,-\mathbf {x})\). Sometimes the notation \(\eta ^{\mu \nu }\) is used for flat space-time, reserving \(g^{\mu \nu }\) for the curved case. The Lorentz group can be represented as the generalized orthogonal group O(1, 3), the matrix Lie group which preserves the quadratic form \(ds^2 = g_{\mu \nu } dx^\mu dx^\nu = dt^2 -dx^2 -dy^2-dz^2\). Recall (3.35) defining the space-time interval.

As Maxwell’s field equations in the theory of electrodynamics, seen in (2.4), the Dirac equation,Footnote 14 and the kinematic laws of special relativity, given in (4.56), are all invariant under Lorentz transformations, the corresponding Lorentz group is understood as encoding the symmetries of fundamental laws of nature.

In detail, a general group element \(\varLambda \in \mathscr {L}\) induces the transformation

$$\begin{aligned} x^\mu \rightarrow x^{\prime \mu } =\varLambda ^\mu _{\ \nu } x^\nu . \end{aligned}$$
(3.38)
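The defining property of a Lorentz transformation, namely that it preserves the quadratic form \(dt^2 -dx^2 -dy^2-dz^2\), can be tested on explicit matrices. The sketch below assumes natural units (\(c = 1\)) and the metric \(\text {diag}(1,-1,-1,-1)\); the helper names are hypothetical. It composes a boost along the x-axis with a rotation about the z-axis and checks both the matrix condition \(\varLambda ^T g \varLambda = g\) and the invariance of the interval for a sample 4-vector.

```python
import math

METRIC = [[1.0, 0.0, 0.0, 0.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0],
          [0.0, 0.0, 0.0, -1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def boost_x(v):
    """Boost along the x-axis with velocity v (natural units, |v| < 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle):
    """Spatial rotation about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def apply(m, x):
    return [sum(m[i][j] * x[j] for j in range(4)) for i in range(4)]


def interval(x):
    """Quadratic form t^2 - x^2 - y^2 - z^2 of a 4-vector."""
    return x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2


def metric_defect(lam):
    """Largest entry of Lambda^T g Lambda - g."""
    lhs = matmul(transpose(lam), matmul(METRIC, lam))
    return max(abs(lhs[i][j] - METRIC[i][j]) for i in range(4) for j in range(4))


lam = matmul(boost_x(0.8), rotation_z(0.4))
x = [2.0, 1.0, -0.5, 0.3]
group_defect = metric_defect(lam)
interval_change = abs(interval(apply(lam, x)) - interval(x))
```

Because the product of two metric-preserving matrices again preserves the metric, the same check passes for any composition of boosts and rotations.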

A concrete example of \(\varLambda \) is the Lorentz boost seen in (3.33). In analogy to (3.26), the effect of \(\mathscr {L}\) on a generic quantum field \(\psi ^\rho \) is captured by the following expression

$$\begin{aligned} U (\varLambda ) \psi ^\rho (x^\mu ) U(\varLambda ^{-1}) = [\text {U}(\varLambda ^{-1})]^\rho _{\ \sigma } \psi ^{\sigma }(\varLambda x^\mu ), \end{aligned}$$
(3.39)

where U is the operator representing \(\varLambda \) on the Hilbert space where \(\psi ^\rho \) is defined, with the corresponding matrix representation on the right-hand side.

Fields transforming as (3.39) are called spinors. These objects, requiring 720\(^\circ \) to complete a full rotation, reflect the true rotational symmetry of space. As mentioned earlier, spinors represent matter particles, i.e., leptons and quarks, and are generally categorized as fermions, particles carrying half-integer spin. It is an interesting piece of history that it took a long time for physicists to understand these strange quantities existing in Minkowski space. Paul Ehrenfest, coining the term spinor, remarked in 1932 (adapted from a translated quote seen in Tomonaga 1997, p. 130):

By all measures, it is truly strange that absolutely no one, until the work of Pauli [...] and Dirac, which is twenty years after special relativity [...], suggested this eerie proposition, that a mysterious tribe by the name of the spinor family inhabits isotropic [three-dimensional] space or the Einstein-Minkowski world.

Until the full connection between the transformation properties of spinors and the Lorentz group was uncovered, spinors had raised their heads at various points in time. In their most general mathematical form they were discovered by Élie Cartan in 1913 (Cartan 1913, 1938, 1966), a mathematician who was involved in fundamental work on the theory of Lie groups and also their geometric applications. Then, due to efforts aiming at incorporating the notion of spin into the framework of quantum mechanics, in other words, by constructing a quantum theory of the electron, Wolfgang Pauli and Dirac found equations describing the behavior of spinors (Pauli 1927; Dirac 1928). As these entities were comprised of two and four elements, respectively, they were simply referred to as two-component or four-component quantities. In 1928, Dirac started to investigate how the Schrödinger equation (3.24) could be made consistent with the principles of special relativity, in effect marrying quantum mechanics and relativity. Formally, he was searching for a Lorentz invariant quantum wave equation, incorporating spinor fields \(\psi (x^\nu )\) with mass, describing electrons and quarks. This seemingly straightforward task would lead him deeper into the abstract world, as this feat could only be accomplished by introducing novel mathematical quantities. In order to sculpt yet another variation of the theme of derivatives, Dirac introduced a set of specific matrices \(\gamma ^\mu , \mu =0,\ldots ,3\). Today, they are known as Dirac matrices and the new derivative takes the form

$$\begin{aligned} \not \! \partial := \gamma ^\mu \partial _\mu , \end{aligned}$$
(3.40)

introducing Feynman’s slash notation. The Dirac equation for a free spin-1/2 particle with mass m reads

$$\begin{aligned} (i \not \! \partial - m) \psi = 0. \end{aligned}$$
(3.41)

In the presence of an electromagnetic field, encoded in the 4-vector potential \(A_\mu \) of (4.12), the equation takes on the form

$$\begin{aligned} (i \not \! \partial - e \not \! \! A - m) \psi = 0, \end{aligned}$$
(3.42)

where e is the elementary charge. The Dirac Lagrangian reads

$$\begin{aligned} \mathcal {L} = \bar{\psi } (i \not \! \partial - m) \psi , \end{aligned}$$
(3.43)

with the Dirac adjoint \(\bar{\psi } = \psi ^\dagger \gamma ^0\), built from the Hermitian conjugate of \(\psi \). As usual, the equations of motion, in this case the Dirac equation (3.41), can be derived from the Lagrangian utilizing the Euler–Lagrange equations (3.6). Dirac’s insights opened up a whole new section in the Book of Nature, see Collins et al. (1989), Kaku (1993), Peskin and Schroeder (1995), Ryder (1996). More on the history of spin can be found in the book of the Nobel laureate Sin-itiro Tomonaga (Tomonaga 1997).

The generators \(M^{\mu \nu }\) of the Lie algebra \(\mathfrak {o}(1,3)\) of the Lorentz group satisfy the specific commutation relations

$$\begin{aligned} \left[ M^{\mu \nu }, M^{\rho \sigma } \right] = i \left( g^{\nu \rho } M^{\mu \sigma } - g^{\mu \rho } M^{\nu \sigma } - g^{\nu \sigma } M^{\mu \rho } + g^{\mu \sigma } M^{\nu \rho } \right) , \end{aligned}$$
(3.44)

encoding the properties of \( \text {U}(\varLambda ) \in O(1,3)\). An explicit matrix representation is found to be

$$\begin{aligned}{}[\text {M}^{\mu \nu }]^\rho _{\ \sigma } = i (g^{\mu \rho } \delta ^\nu _{\ \sigma } - g^{\nu \rho } \delta ^\mu _{\ \sigma }), \end{aligned}$$
(3.45)

employing Kronecker’s delta. Parameterizing \(\varLambda = \varLambda (\omega )\), the operators, following (3.27), can be defined via these generators in the Lie algebra

$$\begin{aligned} \text {U}(\varLambda ) = \exp (i \omega _{\mu \nu } \text {M}^{\mu \nu }). \end{aligned}$$
(3.46)

A Lorentz transformation is now explicitly implemented on a 4-vector field \(V^\rho \), similarly to the generic case seen in (3.39), as

$$\begin{aligned} V^{\prime \rho } (x^\mu ) = V^\rho (\varLambda x^\mu ) + i \omega _{\mu \nu } [\text {M}^{\mu \nu }]^\rho _{\ \sigma } V^\sigma (\varLambda x^\mu ). \end{aligned}$$
(3.47)

Compare with (3.29). In summary, the representation of the Lorentz group given by \(M^{\mu \nu }\) yields the transformation rules for Lorentz vectors \(V^\rho \). There is, however, another important representation to be uncovered, which is related to spinor fields \(\psi ^\alpha (x^\nu )\). The representation matrices are

$$\begin{aligned} \Sigma ^{\mu \nu } := \frac{i}{4} [\gamma ^\mu , \gamma ^\nu ], \end{aligned}$$
(3.48)

defined via the Dirac matrices. By replacing \(V^\rho \rightarrow \psi ^\alpha \) and \([\text {M}^{\mu \nu }]^\rho _{\ \sigma } \rightarrow [\Sigma ^{\mu \nu }]^\alpha _{\ \beta }\) in (3.47), the transformation property of 4-component spinors under Lorentz transformations is discovered. The matrices \(\Sigma ^{\mu \nu }\) are the generators of the spinor representation of the Lorentz group, derived solely from the Dirac matrices. For more details, see, for instance, Peskin and Schroeder (1995). In Sect. 4.3.2, starting from (4.61), more layers of abstraction will be uncovered.
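That both sets of matrices represent the same Lie algebra can be confirmed by brute force. The following sketch assumes the metric \(\text {diag}(1,-1,-1,-1)\) and the Dirac (standard) representation of the \(\gamma \) matrices, a choice the text does not fix; all function names are illustrative. It checks the Clifford relation \(\{\gamma ^\mu , \gamma ^\nu \} = 2 g^{\mu \nu }\) and then tests the commutation relations (3.44) for the vector generators (3.45) and the spinor generators (3.48) over all index combinations.

```python
from itertools import product

METRIC = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
IDX = range(4)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in IDX) for j in IDX] for i in IDX]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in IDX] for i in IDX]


def block(a, b, c, d):
    """Assemble a 4x4 matrix from the 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


def neg(m):
    return [[-v for v in row] for row in m]


ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
# Dirac representation (an assumed, conventional choice of gamma matrices).
GAMMA = [block(ONE, ZERO, ZERO, neg(ONE))] + \
        [block(ZERO, s, neg(s), ZERO) for s in PAULI]


def vector_generator(mu, nu):
    """Matrix [M^{mu nu}]^rho_sigma of (3.45)."""
    return [[1j * (METRIC[mu][rho] * (nu == sigma) - METRIC[nu][rho] * (mu == sigma))
             for sigma in IDX] for rho in IDX]


def spinor_generator(mu, nu):
    """Matrix Sigma^{mu nu} = (i/4) [gamma^mu, gamma^nu] of (3.48)."""
    return [[0.25j * v for v in row] for row in commutator(GAMMA[mu], GAMMA[nu])]


def algebra_defect(generator):
    """Largest violation of the commutation relations (3.44)."""
    gen = {(m, n): generator(m, n) for m in IDX for n in IDX}
    worst = 0.0
    for mu, nu, rho, sigma in product(IDX, repeat=4):
        lhs = commutator(gen[mu, nu], gen[rho, sigma])
        terms = [(METRIC[nu][rho], gen[mu, sigma]), (-METRIC[mu][rho], gen[nu, sigma]),
                 (-METRIC[nu][sigma], gen[mu, rho]), (METRIC[mu][sigma], gen[nu, rho])]
        for i, j in product(IDX, repeat=2):
            rhs = 1j * sum(coeff * mat[i][j] for coeff, mat in terms)
            worst = max(worst, abs(lhs[i][j] - rhs))
    return worst


def clifford_defect():
    """Largest violation of {gamma^mu, gamma^nu} = 2 g^{mu nu}."""
    worst = 0.0
    for mu, nu in product(IDX, repeat=2):
        ab, ba = matmul(GAMMA[mu], GAMMA[nu]), matmul(GAMMA[nu], GAMMA[mu])
        for i, j in product(IDX, repeat=2):
            worst = max(worst, abs(ab[i][j] + ba[i][j] - 2 * METRIC[mu][nu] * (i == j)))
    return worst


clifford_error = clifford_defect()
vector_error = algebra_defect(vector_generator)
spinor_error = algebra_defect(spinor_generator)
```

Both defects vanish, so the vector and the spinor matrices are two different representations of one and the same algebra.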

In summary, quantum fieldsFootnote 15 can be understood by virtue of their transformation properties specified by representations of the Lorentz group. This feat does not only apply to spinors but can generally be extended to bosons: scalar spin-0 fields, vector spin-1 fields, and tensor spin-2 fields can all be characterized by specific representations of \(\mathscr {L}\). General references are Tung (1993), Schwabl (2008).

The Lorentz group encodes the symmetries of fundamental laws of nature: electromagnetism, special relativity, and the quantum behavior of the electron (via the Dirac equation), as all quantum fields transform as representations of the Lorentz group.

2.2.2 The Poincaré Group

The Poincaré group \(\mathscr {P}\) extends the Lorentz group by an additional transformation

$$\begin{aligned} x^\mu \rightarrow x^{\prime \mu } = x^\mu + a^\mu . \end{aligned}$$
(3.49)

This is simply a translation in space-time along the vector \(a^\mu \).

In essence, \(\mathscr {P}\) represents all isometries of Minkowski space by combining Lorentz transformations with translations:

$$\begin{aligned} x^\mu \rightarrow x^{\prime \mu } =\varLambda ^\mu _{\ \nu } x^\nu + a^\mu . \end{aligned}$$
(3.50)
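How Lorentz transformations and translations combine into one group can be sketched in a reduced setting. The example below assumes 1+1 space-time dimensions and natural units, and represents a Poincaré element as a hypothetical pair `(lam, a)` acting as \(x \rightarrow \varLambda x + a\). It checks that applying two elements in succession equals applying their composition \((\varLambda _1 \varLambda _2, \varLambda _1 a_2 + a_1)\), and that the order of composition matters.

```python
import math


def boost(v):
    """2x2 Lorentz boost acting on (t, x), natural units."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return ((g, -g * v), (-g * v, g))


def apply_matrix(m, x):
    return (m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1])


def matmul2(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def act(g, x):
    """Action x -> Lambda x + a of a Poincare element g = (Lambda, a)."""
    lam, a = g
    y = apply_matrix(lam, x)
    return (y[0] + a[0], y[1] + a[1])


def compose(g1, g2):
    """Element equivalent to applying g2 first and g1 afterwards."""
    lam1, a1 = g1
    lam2, a2 = g2
    shifted = apply_matrix(lam1, a2)
    return (matmul2(lam1, lam2), (shifted[0] + a1[0], shifted[1] + a1[1]))


def distance(p, q):
    return max(abs(u - v) for u, v in zip(p, q))


g1 = (boost(0.6), (1.0, 0.0))    # boost followed by a time translation
g2 = (boost(-0.3), (0.0, 2.0))   # boost followed by a space translation
x = (0.5, -1.5)

composition_error = distance(act(compose(g1, g2), x), act(g1, act(g2, x)))
order_gap = distance(act(compose(g1, g2), x), act(compose(g2, g1), x))
```

The non-zero `order_gap` reflects that translations and boosts do not commute, which is the group-level counterpart of non-vanishing commutators between the translation and Lorentz generators.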

The Poincaré group is generated by the Lorentz group generators \(M^{\mu \nu }\), recalling (3.46), and the additional generators \(P^\mu \), obeying specific commutation relations. It can be shown that \(P^\mu = i \partial ^\mu \) (Ryder 1996). This is in analogy with ordinary quantum mechanics, where classical quantities are replaced by operators. For instance, energy and momentum:

$$\begin{aligned} E \rightarrow i \partial _t, \quad \mathbf {p} \rightarrow \nabla / i, \end{aligned}$$
(3.51)

see, for instance, Schwabl (2007). In effect, a state in the Hilbert space representing a massive particle with 4-momentum \(p^\mu = (E, \mathbf {p})\), written in bra-ket notation (Sakurai 1994) as \(| p^\mu \rangle \), satisfies the eigenvalue equation \(P^\mu | p^\mu \rangle = p^\mu | p^\mu \rangle \).

The full commutation relations defining the Poincaré algebra are

$$\begin{aligned} \begin{aligned} \left[ P^{\mu }, P^{\nu } \right]&= 0, \\ \left[ P^\mu , M^{\nu \rho }\right]&= i \left( g^{\mu \nu } P^\rho - g^{\mu \rho } P^\nu \right) , \\ \left[ M^{\mu \nu }, M^{\rho \sigma } \right]&= i \left( g^{\nu \rho } M^{\mu \sigma } - g^{\mu \rho } M^{\nu \sigma } - g^{\nu \sigma } M^{\mu \rho } + g^{\mu \sigma } M^{\nu \rho } \right) , \end{aligned} \end{aligned}$$
(3.52)

where \(g^{\mu \nu } = \text {diag}(1, -1, -1, -1)\) represents the flat space-time metric as in the case of the Lorentz group.

Lie algebras contain special elements called Casimir operators. By definition, they commute with all generators in the Lie algebra. The representations of the group can always be labeled by the eigenvalues of the Casimir operators (Tung 1993; O’Raifeartaigh 1988). The Poincaré group has two Casimir operators, \(C_1 = P_\mu P^\mu \) and \(C_2 = W_\mu W^\mu \), where \(W^\mu \), called the Pauli–Lubanski pseudovector, is a function of \(P^\nu \) and \(M^{\rho \sigma }\).

Using this mathematical machinery, Eugene Wigner could demonstrate the following remarkable fact (Wigner 1939):

All known physical particle states transform as representations of the Poincaré group.

These insights would win him the Nobel Prize in 1963.

In detail, the eigenvaluesFootnote 16 of the Casimir operators are

$$\begin{aligned} C_1 = m^2, \quad C_2 = m^2 s(s+1), \end{aligned}$$
(3.53)

where m represents the particle’s mass and s its spin. The resulting representations are associated with the following particle states

$$\begin{aligned} \begin{aligned}&|m,s\rangle ; \quad \ s = 0, \frac{1}{2},1, \frac{3}{2},\ldots \\&|h\rangle ; \quad \ h = \pm s, \end{aligned} \end{aligned}$$
(3.54)

where h, called helicity, is the generalization of spin to massless states, and \(|m,s\rangle \) labels particle states distinguished by their mass and spin.

Wigner’s work also sheds light on the question of why particles have quantized spin. It establishes that spin is indeed associated with the group of rotations, which justifies and formalizes the otherwise vague notion of spin as an intrinsic quantum form of angular momentum. General references are Tung (1993), Kaku (1993), and Ryder (1996).

It should also be noted, however, that other states associated with further possible representations have not been observed in nature. As an example, Wigner’s classification also yields tachyons. These are particles with \(m^2<0\), implying imaginary mass. Using the equation for the total relativistic energy of a particle with rest mass m (Einstein 1956), and switching to the SI system of units

$$\begin{aligned} E = \frac{mc^2}{\sqrt{1 - \frac{v^2}{c^2}}}, \end{aligned}$$
(3.55)

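the condition \(v>c\), implying a speed faster than light, turns the square root into \(ib\) with the real number \(b = \sqrt{v^2/c^2 - 1}\). Denoting the mass parameter in this regime by \(m^\prime \), the energy becomes \(E = m^\prime c^2 / (ib)\). As the energy of a physical particle must be real, this forces \(m^\prime = im\) with real m, so that \(E = mc^2/b\). This establishes tachyons, defined by imaginary mass, as superluminal particles.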
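The argument can be illustrated numerically. The sketch below evaluates (3.55) with complex arithmetic. The function name `relativistic_energy` and the sample values (units with \(c = 1\), mass parameter 2 or 2i, speed 1.5c) are illustrative assumptions and are not taken from the references.

```python
import cmath


def relativistic_energy(mass, v, c=1.0):
    """Evaluate E = m c^2 / sqrt(1 - v^2/c^2), allowing complex values."""
    return mass * c**2 / cmath.sqrt(1 - v**2 / c**2)


# Superluminal speed with a real mass: the energy comes out purely imaginary.
e_real_mass = relativistic_energy(2.0, 1.5)

# Superluminal speed with an imaginary mass m' = i m: the energy is real.
e_imag_mass = relativistic_energy(2.0j, 1.5)

print(e_real_mass, e_imag_mass)
```

In this sketch, a real mass combined with \(v>c\) gives an imaginary energy, whereas an imaginary mass restores a real one. Evaluating the function at a larger speed also shows that the energy of such a state decreases as the speed grows.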

Finally, the groups \(\mathscr {L}\) and \(\mathscr {P}\), whose representations describe the transformation properties of quantum fields and physical particle states, respectively, are related as follows. Recall that, in bra-ket notation, an arbitrary state is described by \(| \psi \rangle \), and the wave function associated with this state is \(\psi (x^\mu ) = \langle x^\mu | \psi \rangle \). In momentum space, this wave function can be re-expressed as \(\varPsi (p^\mu ) = \langle p^\mu | \psi \rangle \), where the relationship between \(\psi \) and \(\varPsi \) is established by a Fourier transformation (Sakurai 1994). Both wave functions transform identically under Lorentz transformations. A relativistic wave equation for \(\varPsi \), for instance the Klein–Gordon or Dirac equation, allows the quantity to be expanded in terms of coefficients which transform as representations of the Poincaré group, i.e., identically to the particle states (Tung 1993, Section 10.5.3).
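The relation between the two wave functions can be written out by inserting a complete set of momentum eigenstates. As a sketch, assuming the plane-wave normalization \(\langle x^\mu | p^\mu \rangle = e^{-i p_\mu x^\mu }\) and the integration measure \(d^4 p / (2\pi )^4\), both of which are conventions that differ between textbooks:

$$\begin{aligned} \psi (x^\mu ) = \langle x^\mu | \psi \rangle = \int \frac{d^4 p}{(2\pi )^4} \, \langle x^\mu | p^\mu \rangle \langle p^\mu | \psi \rangle = \int \frac{d^4 p}{(2\pi )^4} \, e^{-i p_\mu x^\mu } \, \varPsi (p^\mu ). \end{aligned}$$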