In this chapter, we summarize some theoretical fundamentals. We assume that the reader is already familiar with these basic facts. The main purpose of this chapter is to introduce the notation that is used in this book and to provide a reference. Therefore, the explanations are brief, and no proofs are given.

2.1 Fourier Analysis and Application to Beam Signals

In this section, several formulas for Fourier series and the Fourier transform are summarized. However, we do not discuss the properties of a function that are necessary for the existence of the transformation. For those foundations, the reader should consult the references cited here.

2.1.1 Fourier Series

A real-valued periodic function f(t) with period T may be decomposed into Fourier components according to the Fourier series

$$\displaystyle{ \fbox{$f(t) =\sum _{ n=-\infty }^{\infty }c_{ n}\;e^{\mathit{jn}\omega t}\mbox{ with }\omega = \frac{2\pi } {T},$} }$$
(2.1)

where the complex coefficients cn are determined by

$$\displaystyle{\fbox{$c_{n} = \frac{1} {2\pi }\int _{-\pi }^{\pi }f(\varphi )\;e^{-\mathit{jn}\varphi }\mathrm{d}\varphi $}}$$

or by

$$\displaystyle{ \fbox{$c_{n} = \frac{1} {T}\int _{-T/2}^{T/2}f(t)\;e^{-\mathit{jn}\omega t}\mathrm{d}t,$} }$$
(2.2)

where we made the substitution \(\varphi =\omega t\).

With the substitution \(x = t + T\), we obtain

$$\displaystyle{\int _{-T/2}^{0}f(t)\;e^{-\mathit{jn}\omega t}\mathrm{d}t =\int _{ T/2}^{T}f(x - T)\;e^{-\mathit{jn}\omega x}e^{\mathit{jn}\omega T}\mathrm{d}x.}$$

Due to \(\omega = \frac{2\pi } {T}\), the last exponential function equals 1. Furthermore, we have \(f(x - T) = f(x)\), so that

$$\displaystyle{\int _{-T/2}^{0}f(t)\;e^{-\mathit{jn}\omega t}\mathrm{d}t =\int _{ T/2}^{T}f(x)\;e^{-\mathit{jn}\omega x}\mathrm{d}x =\int _{ T/2}^{T}f(t)\;e^{-\mathit{jn}\omega t}\mathrm{d}t}$$

holds. Therefore, we may use

$$\displaystyle{ c_{n} = \frac{1} {T}\int _{0}^{T}f(t)\;e^{-\mathit{jn}\omega t}\mathrm{d}t }$$
(2.3)

instead of Eq. (2.2).
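For illustration, the following short numerical sketch (assuming NumPy is available; the test signal is chosen only as an example) approximates the coefficients of Eq. (2.3) by a Riemann sum:

```python
import numpy as np

def fourier_coefficient(f, T, n, samples=10_000):
    """Approximate c_n = (1/T) * integral_0^T f(t) exp(-j*n*omega*t) dt, Eq. (2.3)."""
    t = np.linspace(0.0, T, samples, endpoint=False)
    omega = 2 * np.pi / T
    return np.mean(f(t) * np.exp(-1j * n * omega * t))  # Riemann sum divided by T

# Test signal with known coefficients: f(t) = 1 + cos(omega*t),
# so c_0 = 1, c_(+-1) = 1/2, and all other coefficients vanish.
T = 1e-6
f = lambda t: 1 + np.cos(2 * np.pi * t / T)
print([np.round(fourier_coefficient(f, T, n), 6) for n in range(3)])  # [1, 0.5, 0]
```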

2.1.2 Spectrum of a Dirac Comb

In this book, the Dirac delta distribution is used in a heuristic way without the foundations of distribution theory. Therefore, the reader should be aware that the results presented still have to be proven mathematically. For example, we use the formula

$$\displaystyle{\int _{-\infty }^{+\infty }f(x)\;\delta (x - x_{ 0})\;\mathrm{d}x = f(x_{0}),}$$

even though it does not have any meaning in the scope of classical analysis.

A strongly bunched beam may be approximated by a sum of Dirac delta pulses

$$\displaystyle{f(t) =\sum _{ k=-\infty }^{\infty }\delta (t - kT),}$$

which is called a Dirac comb. For this special sum of Dirac pulses, one obtains the following Fourier coefficients (only the Dirac pulse with k = 0 is located inside the interval \(-T/2 \leq t \leq +T/2\)):

$$\displaystyle{ c_{n} = \frac{1} {T}\int _{-T/2}^{T/2}\delta (t)\;e^{-\mathit{jn}\omega t}\mathrm{d}t = \frac{1} {T}. }$$
(2.4)

Hence, all coefficients are equal. According to Eq. (2.1), we get

$$\displaystyle{\sum _{k=-\infty }^{\infty }\delta (t - kT) = \frac{1} {T}\sum _{n=-\infty }^{\infty }e^{\mathit{jn}\omega t}.}$$

This can also be written as

$$\displaystyle\begin{array}{rcl} & \sum _{k=-\infty }^{\infty }\omega \;\delta (\omega t -\omega kT) = \frac{1} {T}\sum _{n=-\infty }^{\infty }e^{\mathit{jn}\omega t}& {}\\ & \Rightarrow 2\pi \sum _{k=-\infty }^{\infty }\delta (\varphi -2\pi k) =\sum _{ n=-\infty }^{\infty }e^{\mathit{jn}\varphi }.& {}\\ \end{array}$$

2.1.3 Different Representations of the Fourier Series

The general definition of the Fourier series shows that the cn are defined in such a way that both positive and negative frequencies occur. If only positive frequencies are to be allowed, one may write Eq. (2.1) as follows:

$$\displaystyle{ f(t) = c_{0} +\sum _{ n=1}^{\infty }\left (c_{ n}e^{\mathit{jn}\omega t} + c_{ -n}e^{-\mathit{jn}\omega t}\right ) = }$$
(2.5)
$$\displaystyle\begin{array}{rcl} & =& c_{0} +\sum _{ n=1}^{\infty }\left (c_{ n}\left [\cos (n\omega t) + j\;\sin (n\omega t)\right ]\right. \\ & & +\left.c_{-n}\left [\cos (n\omega t) - j\;\sin (n\omega t)\right ]\right ).\ \qquad {}\end{array}$$
(2.6)

We obtain the result

$$\displaystyle{f(t) = c_{0} +\sum _{ n=1}^{\infty }\left [(c_{ n} + c_{-n})\cos (n\omega t) + j(c_{n} - c_{-n})\sin (n\omega t)\right ].}$$

By means of the definition

$$\displaystyle{a_{n} = c_{n} + c_{-n}}$$

and

$$\displaystyle{b_{n} = j(c_{n} - c_{-n}),}$$

one obtains

$$\displaystyle{ \fbox{$f(t) = \frac{a_{0}} {2} +\sum _{ n=1}^{\infty }\left [a_{ n}\cos (n\omega t) + b_{n}\sin (n\omega t)\right ].$} }$$
(2.7)

Taking a0 = 2c0 and b0 = 0 into account, one may calculate the coefficients cn if an and bn are known:

$$\displaystyle{ c_{n} = \frac{a_{n} - jb_{n}} {2}. }$$
(2.8)

For the special case that \(c_{n} = 1/T\) holds for all n (Dirac comb; see Sect. 2.1.2), one obtains \(a_{n} = 2/T\) and bn = 0. According to Eq. (2.7), this means that the average, i.e., the DC component, of a strongly bunched beam is exactly one-half the fundamental harmonic:

$$\displaystyle{ \sum _{k=-\infty }^{\infty }\delta (t - kT) = \frac{1} {T} + \frac{2} {T}\sum _{n=1}^{\infty }\cos (n\omega t). }$$
(2.9)

Now we return to the general case. Instead of using an and bn, one may also use amplitudes and phases:

$$\displaystyle{ \fbox{$f(t) = \frac{a_{0}} {2} +\sum _{ n=1}^{\infty }d_{ n}\;\cos (n\omega t +\varphi _{n}).$} }$$
(2.10)

A comparison with Eq. (2.7) shows that

$$\displaystyle\begin{array}{rcl} & a_{n}\cos (n\omega t) + b_{n}\sin (n\omega t) = d_{n}\;\cos (n\omega t +\varphi _{n}) & {}\\ & \Rightarrow a_{n}\cos (n\omega t) + b_{n}\sin (n\omega t) = d_{n}\;\cos (n\omega t)\;\cos \varphi _{n} - d_{n}\;\sin (n\omega t)\;\sin \varphi _{n}.& {}\\ \end{array}$$

This leads to the following conditions:

$$\displaystyle{ a_{n} = d_{n}\;\cos \varphi _{n}, }$$
(2.11)
$$\displaystyle{ b_{n} = -d_{n}\;\sin \varphi _{n}. }$$
(2.12)

According to Eq. (2.8), we therefore have

$$\displaystyle{ \varphi _{n} = \measuredangle \;c_{n} }$$
(2.13)

and

$$\displaystyle{ d_{n} = \sqrt{a_{n }^{2 } + b_{n }^{2}}. }$$
(2.14)

Due to

$$\displaystyle{ \vert c_{n}\vert = \frac{1} {2}\sqrt{a_{n }^{2 } + b_{n }^{2}}, }$$
(2.15)

one obtains

$$\displaystyle{ d_{n} = 2\vert c_{n}\vert }$$
(2.16)

as the physical amplitudes (peak values). By inserting Eqs. (2.11) and (2.12) into Eq. (2.8), one gets

$$\displaystyle{c_{n} = \frac{d_{n}} {2} e^{j\varphi _{n} }.}$$

The same result is obtained by combining Eqs. (2.13)–(2.15).
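The following small sketch (assuming NumPy; the coefficient value is hypothetical) converts a complex coefficient cn into the amplitude/phase form according to Eqs. (2.13) and (2.16):

```python
import numpy as np

def amplitude_phase(c_n):
    """Return (d_n, phi_n) with d_n = 2|c_n| (Eq. (2.16)) and phi_n = angle(c_n) (Eq. (2.13)),
    so that c_n e^(j n w t) + c_n* e^(-j n w t) = d_n cos(n w t + phi_n)."""
    return 2 * np.abs(c_n), np.angle(c_n)

d_1, phi_1 = amplitude_phase(0.25 - 0.25j)   # hypothetical coefficient c_1
print(d_1, phi_1)                            # ~0.7071 and -pi/4
```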

2.1.4 Discrete Fourier Transform

The discrete Fourier transform is a powerful tool for spectral analysis of signals that are given in digital form, e.g., on a computer. Therefore, we briefly discuss some important features here.

2.1.4.1 Motivation of the Transformation Formula

Let us now assume that a real-valued periodic function f(t) with period \(T = \frac{2\pi } {\omega }\) is discretized according to

$$\displaystyle{f_{k} = f(k\Delta t),}$$

where k is an integer. The period T is divided into \(N \in \mathbb{N}\) time intervals

$$\displaystyle{\Delta t = \frac{T} {N}}$$

such that f0 = fN holds. Therefore, the N samples \(f_{0},f_{1},\ldots,f_{N-1}\) are sufficient to describe the function f(t), provided that N is large enough. We now replace the integral in Eq. (2.3) by the Riemann sum

$$\displaystyle{c_{n}\,\approx \,\frac{1} {T}\sum _{k=0}^{N-1}f(k\Delta t)e^{-\mathit{jn}\omega k\Delta t}\Delta t\,=\, \frac{1} {T}\sum _{k=0}^{N-1}f_{ k}e^{-j2\pi \mathit{nk}/N}\Delta t\,=\, \frac{1} {N}\sum _{k=0}^{N-1}f_{ k}e^{-j2\pi \mathit{nk}/N}.}$$

This formula is used to define the discrete Fourier transform (DFT)

$$\displaystyle{ \fbox{$X_{n} = \frac{1} {N}\sum _{k=0}^{N-1}x_{ k}\;e^{-j2\pi \mathit{nk}/N}.$} }$$
(2.17)

This obviously yields an approximation of the Fourier coefficients cn of the periodic function f(t), provided that the number N of samples \(x_{k} = f(k\Delta t)\) is large enough.
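A direct, unoptimized implementation of Eq. (2.17) may look as follows (a sketch assuming NumPy; note that numpy.fft.fft omits the factor 1/N):

```python
import numpy as np

def dft(x):
    """Discrete Fourier transform with the 1/N normalization of Eq. (2.17)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N).reshape(-1, 1)   # index of the spectral component X_n
    k = np.arange(N).reshape(1, -1)   # summation index of the samples x_k
    return (np.exp(-2j * np.pi * n * k / N) @ x) / N

# One period of f(t) = 1 + cos(omega*t), sampled at N points
N = 16
t = np.arange(N) / N
x = 1 + np.cos(2 * np.pi * t)
X = dft(x)
print(np.allclose(X, np.fft.fft(x) / N))   # True: identical up to the 1/N factor
print(np.round(X[:3].real, 6))             # X_0 = 1 (DC), X_1 = 0.5, X_2 = 0
```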

2.1.4.2 Symmetry Relations

Based on Eq. (2.17), we find that

$$\displaystyle{X_{n+N} = \frac{1} {N}\sum _{k=0}^{N-1}x_{ k}\;e^{-j2\pi \mathit{nk}/N}\;e^{-j2\pi Nk/N} = \frac{1} {N}\sum _{k=0}^{N-1}x_{ k}\;e^{-j2\pi \mathit{nk}/N} \cdot 1 = X_{ n}.}$$

Therefore, all Xn are known if those for 0 ≤ n ≤ N − 1 are specified. One sees that for a sample \((x_{0},x_{1},\ldots,x_{N-1})\), one obtains a sample \((X_{0},X_{1},\ldots,X_{N-1})\) as the spectrum.

Since we have assumed that the signal f(t) is real-valued and periodic, the same is true for the samples xk. Based on Eq. (2.17), it is then obvious that the symmetry relation

$$\displaystyle{X_{-n} = X_{n}^{{\ast}}}$$

holds. We may also combine these two symmetry relations to obtain

$$\displaystyle{X_{N-n} = X_{-(n-N)} = X_{n-N}^{{\ast}} = X_{ n}^{{\ast}}.}$$

Therefore, only about one-half of the coefficients Xn with 0 ≤ n ≤ N − 1 have to be calculated.

2.1.4.3 Interpretation of the Spectral Components

According to Eq. (2.1), the sample X0 belongs to the DC component of the signal. The sample X1 obviously belongs to the angular frequency

$$\displaystyle{1\cdot \omega = \frac{2\pi } {T}.}$$

Therefore, the spectrum \((X_{0},X_{1},\ldots,X_{N-1})\) has a frequency resolution of \(1/T\), where T is the total time that passes between the samples x0 and xN. It is obvious that XN−1 belongs to the frequency

$$\displaystyle{f_{\mathrm{max}} = \frac{N - 1} {T} = \frac{N - 1} {N} \frac{1} {\Delta t} \approx \frac{1} {\Delta t} = f_{\mathrm{sampl}}.}$$

This approximation is, of course, valid only for large sample sizes N ≫ 1. Hence we conclude that the frequency resolution is given by the inverse of the total time T, whereas the maximum frequency is determined by the sampling frequency \(f_{\mathrm{sampl}} = 1/\Delta t\). However, due to \(X_{N-n} = X_{n}^{{\ast}}\), only one-half of this frequency range between 0 and fmax actually contains information. In other words, and in compliance with the Nyquist–Shannon sampling theorem, sampling has to take place at a rate of at least twice the signal bandwidth.

These properties are visualized in Table 2.1.

Table 2.1 Overview of DFT components of real-valued signals

If one makes sure that the N equidistant samples xn of the periodic function represent an integer number of periods (so that duplicating \((x_{0},x_{1},\ldots,x_{N-1})\) does not introduce any severe discontinuities), one may obtain good results even without sophisticated windowing techniques.

For the interpretation of the spectrum, please note that the DC component is equal to

$$\displaystyle{f_{\mathrm{DC}} = \frac{a_{0}} {2} = c_{0} = X_{0},}$$

i.e., to the first value of the DFT.

According to Eq. (2.16), the amplitude (peak value) at the frequency p∕T is given by

$$\displaystyle{d_{p} = 2\vert c_{p}\vert = 2\vert X_{p}\vert.}$$
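As a sketch (assuming NumPy; the signal parameters are arbitrary example values), the DC component and the peak amplitudes may be read off from the DFT as follows:

```python
import numpy as np

T, N = 2e-6, 64                                    # assumed period and number of samples
t = np.arange(N) * T / N
x = 0.3 + 1.5 * np.cos(2 * np.pi * t / T + 0.4)    # DC component 0.3, amplitude 1.5

X = np.fft.fft(x) / N                              # DFT with the 1/N convention of Eq. (2.17)
freqs = np.arange(N) / T                           # bin p corresponds to the frequency p/T
f_dc = X[0].real                                   # f_DC = X_0
d = 2 * np.abs(X[1:N // 2])                        # d_p = 2|X_p| for 1 <= p < N/2
print(freqs[1], f_dc, d[0])                        # resolution 1/T, DC ~0.3, amplitude ~1.5
```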

The discussion above shows that the sample \((X_{0},X_{1},\ldots,X_{N-1})\) contains all the information about the spectrum, but that the DFT spectrum is infinite. It does not even decrease with increasing frequencies. At first glance, this looks strange, but in our introduction to the DFT, we assumed only that the integral over \(\Delta t\) may approximately be replaced by a product with \(\Delta t\). We made no assumption as to how the function f(t) varies in the interval \(\Delta t\). This explains the occurrence of the high-frequency components.

It should be clear from the Nyquist–Shannon sampling theorem that the spectrum for frequencies larger than fmax∕2 cannot contain any relevant information, since the sampling interval is fixed at \(\Delta t \approx 1/f_{\mathrm{max}}\).

Therefore, in the next section, we filter out those frequencies to obtain the inverse transform.

2.1.4.4 Inverse DFT

As mentioned above, the Nyquist–Shannon sampling theorem tells us that we should consider only frequencies fx with

$$\displaystyle{-\frac{f_{\mathrm{max}}} {2} \leq f_{x} \leq +\frac{f_{\mathrm{max}}} {2}.}$$

This corresponds to

$$\displaystyle{-\frac{N - 1} {2T} \leq f_{x} \leq +\frac{N - 1} {2T},}$$

or

$$\displaystyle{-\frac{N - 1} {2} \omega \leq \omega _{x} \leq +\frac{N - 1} {2} \omega.}$$

For the sake of simplicity, we assume that N ≥ 3 is an odd number. If we have a look at Eq. (2.1),

$$\displaystyle{f(t) =\sum _{ n=-\infty }^{\infty }c_{ n}\;e^{\mathit{jn}\omega t},}$$

it becomes clear that only those n with

$$\displaystyle{-\frac{N - 1} {2} \leq n \leq +\frac{N - 1} {2} }$$

lead to the aforementioned frequencies \(\omega _{x} = 2\pi f_{x} = n\omega\). Therefore, we expect to be able to reconstruct the signal based on

$$\displaystyle{f(t) =\sum _{ n=-(N-1)/2}^{+(N-1)/2}c_{ n}\;e^{\mathit{jn}\omega t}.}$$

We now apply the discretization

$$\displaystyle{ f_{k} = f(k\Delta t) =\sum _{ n=-(N-1)/2}^{+(N-1)/2}c_{ n}\;e^{\mathit{jn}\omega k\Delta t} =\sum _{ n=-(N-1)/2}^{+(N-1)/2}c_{ n}\;e^{j2\pi \mathit{nk}/N} }$$
(2.18)

and obtain

$$\displaystyle{\sum _{n=(N+1)/2}^{N-1}c_{ n}\;e^{j2\pi \mathit{nk}/N} =\sum _{ l=-N/2+1/2}^{-1}c_{ l+N}\;e^{j2\pi k\frac{l+N} {N} }.}$$

Here we introduced the new summation index \(l = n - N\). The last formula leads to

$$\displaystyle{\sum _{n=(N+1)/2}^{N-1}c_{ n}\;e^{j2\pi \mathit{nk}/N} =\sum _{ l=-(N-1)/2}^{-1}c_{ l}\;e^{j2\pi kl/N}.}$$

On the right-hand side, we may now rename l as n again. This shows that the sum from \(-(N - 1)/2\) to − 1 included in Eq. (2.18) may be replaced by the sum from \((N + 1)/2\) to N − 1:

$$\displaystyle{f_{k} =\sum _{ n=0}^{N-1}c_{ n}\;e^{j2\pi \mathit{nk}/N}.}$$

This defines the formula for the inverse DFT (not only for odd N):

$$\displaystyle{\fbox{$x_{k} =\sum _{ n=0}^{N-1}X_{ n}\;e^{j2\pi \mathit{nk}/N}.$}}$$

Please note that in the literature, the factor 1∕N is sometimes not included in the definition of the DFT, but it appears in that of the inverse DFT. Our choice was determined by the close relationship to the Fourier series coefficients discussed above. Apart from the factor 1∕N, the DFT and the inverse DFT differ only by the sign in the argument of the exponential function.
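The round trip between the two transforms can be checked with a short sketch (assuming NumPy; numpy's forward FFT is divided by N here to match the convention used above):

```python
import numpy as np

def idft(X):
    """Inverse DFT matching the 1/N-normalized forward transform: x_k = sum_n X_n e^(j 2 pi n k / N)."""
    X = np.asarray(X, dtype=complex)
    k = np.arange(len(X)).reshape(-1, 1)
    n = np.arange(len(X)).reshape(1, -1)
    return np.exp(2j * np.pi * n * k / len(X)) @ X

rng = np.random.default_rng(0)
x = rng.standard_normal(8)           # arbitrary real test samples
X = np.fft.fft(x) / 8                # forward DFT with the 1/N factor
print(np.allclose(idft(X).real, x))  # True: the round trip reproduces the samples
```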

2.1.4.5 Conclusion

We have summarized only a few basic facts that will help the reader to interpret the DFT correctly. There are many other properties that cannot be mentioned here.

For large sample sizes equal to a power of 2, the so-called fast Fourier transform (FFT) algorithm may be used, which is a dramatically less time-consuming implementation of the DFT.

2.1.5 Fourier Transform

The Fourier transform X(ω) of a real-valued function x(t) depending on the time variable t is given by

$$\displaystyle{ \fbox{$X(\omega ) =\int _{ -\infty }^{+\infty }x(t)\;e^{-j\omega t}\;\mathrm{d}t,$} }$$
(2.19)

the inverse transform by

$$\displaystyle{ \fbox{$x(t) = \frac{1} {2\pi }\int _{-\infty }^{+\infty }X(\omega )\;e^{j\omega t}\;\mathrm{d}\omega.$} }$$
(2.20)

This relation is visualized by the correspondence symbol

$$\displaystyle{x(t)\;\circ\!\!-\!\!\bullet\;X(\omega ).}$$

The Fourier transform is a linear transformation. It is used to determine the frequency spectrum of signals, i.e., it transforms the signal x(t) from the time domain into the frequency domain. It is possible to generalize the definition of the Fourier transform to generalized functions (i.e., distributions), which also include the Dirac function [1, 2].

Please note that various definitions for the Fourier transform and for its inverse transform exist in the literature. The factor \(\frac{1} {2\pi }\) may be distributed among the original transformation and the inverse transformation in a different way, and even the sign of the argument of the exponential function may be defined in the opposite way.

Some common Fourier transforms are summarized in Table A.3 on p. 417. Further relations can also be found using symmetry properties of the Fourier transform. Consider the Fourier transform

$$\displaystyle{x(t)\;\circ\!\!-\!\!\bullet\;X(\omega ).}$$

If the time t in x(t) is replaced by ω, and x(ω) is regarded as a Fourier transform, its inverse transform is given by

$$\displaystyle{\frac{1} {2\pi }\;X(-t)\;\circ\!\!-\!\!\bullet\;x(\omega ).}$$

In other words, up to the factor \(\frac{1} {2\pi }\), the inverse transform of x(ω) is obtained by replacing ω in the function X(ω) by − t.

2.1.5.1 Fourier Transform of a Single Cosine Pulse

Let

$$\displaystyle{ x(t) = \left \{\begin{array}{ll} 1 +\cos (\Omega t)&\mbox{ for }-\pi < \Omega t <\pi,\\ 0 &\mbox{ otherwise,} \end{array} \right. }$$
(2.21)

define a single cosine pulse. This leads to

$$\displaystyle\begin{array}{rcl} X(\omega )& =& \int _{-\infty }^{+\infty }x(t)\;e^{-j\omega t}\mathrm{d}t =\int _{ -\pi /\Omega }^{+\pi /\Omega }\left [1 +\cos (\Omega t)\right ]\;e^{-j\omega t}\mathrm{d}t = {}\\ & =& \int _{-\pi /\Omega }^{+\pi /\Omega }\left [e^{-j\omega t} + \frac{1} {2}\;e^{j(\Omega -\omega )t} + \frac{1} {2}\;e^{-j(\Omega +\omega )t}\right ]\;\mathrm{d}t = {}\\ & =& \left [\frac{e^{-j\omega t}} {-j\omega } + \frac{1} {2}\;\frac{e^{j(\Omega -\omega )t}} {j(\Omega -\omega )} + \frac{1} {2}\;\frac{e^{-j(\Omega +\omega )t}} {-j(\Omega +\omega )}\right ]_{-\pi /\Omega }^{+\pi /\Omega } = {}\\ & =& \sin \left (\pi \frac{\omega } {\Omega }\right )\left [\frac{2} {\omega } + \frac{1} {\Omega -\omega }- \frac{1} {\Omega +\omega }\right ] =\sin \left (\pi \frac{\omega } {\Omega }\right ) \frac{2\Omega ^{2}} {\omega (\Omega ^{2} -\omega ^{2})} {}\\ \end{array}$$
$$\displaystyle{ \Rightarrow X(\omega ) = \frac{2\pi } {\Omega }\; \frac{\mathrm{si}\left (\pi \frac{\omega }{\Omega }\right )} {1 -\left ( \frac{\omega }{\Omega }\right )^{2}}. }$$
(2.22)

In the last equation, we used the definition

$$\displaystyle{\mathrm{si}(x) = \left \{\begin{array}{ll} \frac{\sin \;x} {x} &\mbox{ for }x\neq 0,\\ 1 &\mbox{ for } x = 0. \end{array} \right.}$$

To avoid ambiguity, we call this function si(x) instead of sinc(x), since different definitions of sinc(x) are common in the literature.
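A small sketch of Eq. (2.22) (assuming NumPy; the removable singularity at ω = ±Ω is not treated here):

```python
import numpy as np

def si(x):
    """si(x) = sin(x)/x with si(0) = 1; note that numpy.sinc(x) is sin(pi*x)/(pi*x)."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)

def X_cosine_pulse(omega, Omega):
    """Fourier transform of the single cosine pulse, Eq. (2.22)."""
    r = omega / Omega
    return (2 * np.pi / Omega) * si(np.pi * r) / (1 - r**2)

Omega = 2 * np.pi                    # pulse length T = 2*pi/Omega = 1
print(X_cosine_pulse(0.0, Omega))    # X(0) = T = 1, the area under the pulse
```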

2.1.5.2 Convolution

The convolution is given by

$$\displaystyle{\fbox{$h(t) {\ast} x(t) =\int _{ -\infty }^{+\infty }h(\tau )\;x(t-\tau )\;\mathrm{d}\tau,$}}$$

and one obtains the correspondence

$$\displaystyle{h(t) {\ast} x(t)\;\circ\!\!-\!\!\bullet\;H(\omega )\;X(\omega ).}$$

We consider the special case that

$$\displaystyle{h(t) =\sum _{k}\delta (t - T_{k})}$$

is a sequence of Dirac pulses. This leads to

$$\displaystyle{h(t) {\ast} x(t) =\sum _{k}\int _{-\infty }^{+\infty }\delta (\tau -T_{ k})\;x(t-\tau )\;\mathrm{d}\tau =\sum _{k}x(t - T_{k}).}$$

Hence, by convolution with a sequence of Dirac pulses, we may produce a repetition of the function x(t) at the locations of the delta pulses.

2.1.5.3 Relation to the Fourier Series

We consider the special case

$$\displaystyle{X(\omega ) =\sum _{ k=-\infty }^{+\infty }p_{ k}\;\delta (\omega -k\omega _{0}).}$$

According to Eq. (2.20), this leads to

$$\displaystyle{x(t) = \frac{1} {2\pi }\sum _{k=-\infty }^{+\infty }\int _{ -\infty }^{+\infty }p_{ k}\;\delta (\omega -k\omega _{0})\;e^{j\omega t}\;\mathrm{d}\omega = \frac{1} {2\pi }\sum _{k=-\infty }^{+\infty }p_{ k}\;e^{jk\omega _{0}t}.}$$

If we set

$$\displaystyle{p_{k} = 2\pi c_{k},}$$

we obtain the correspondence

$$\displaystyle{x(t) =\sum _{ k=-\infty }^{+\infty }c_{ k}\;e^{jk\omega _{0}t}\;\circ\!\!-\!\!\bullet\;X(\omega ) = 2\pi \sum _{k=-\infty }^{+\infty }c_{ k}\;\delta (\omega -k\omega _{0}),}$$

which is an ordinary Fourier series, as Eq. (2.1) shows.

Hence, if we calculate the Fourier transform of a periodic function with period \(T_{0} = \frac{2\pi } {\omega _{0}}\), we get a sum of Dirac pulses that are multiplied by 2π and the Fourier coefficients. The factor 2π is obvious because of the correspondence

$$\displaystyle{1\;\circ\!\!-\!\!\bullet\;2\pi \;\delta (\omega ).}$$

2.1.6 Consequences for the Spectrum of the Beam Signal

We first model an idealized beam signal h(t) as a periodic sequence of Dirac pulses. Even if the bunches oscillate in the longitudinal direction, periodicity may be satisfied if the beam signal repeats itself after one synchrotron oscillation period. The sequence of delta pulses will be defined by

$$\displaystyle{h(t) =\sum _{k}\delta (t - T_{k})}$$

as above. Thus, we get a realistic beam signal by convolution with the time function x(t), which represents a single bunch:

$$\displaystyle{y(t) = h(t) {\ast} x(t).}$$

Since h(t) is to be periodic, it may be represented by a Fourier series. As shown in the previous section, this leads to the Fourier transform

$$\displaystyle{H(\omega ) = 2\pi \sum _{k=-\infty }^{+\infty }c_{ k}^{h(t)}\;\delta (\omega -k\omega _{ 0}).}$$

The function x(t) describes a single pulse and is therefore equal to zero outside a finite interval. Therefore, the spectrum X(ω) will be continuous. This shows that

$$\displaystyle\begin{array}{rcl} Y (\omega )& =& H(\omega )\;X(\omega ) {}\\ & =& 2\pi \sum _{k=-\infty }^{+\infty }c_{ k}^{h(t)}\;X(\omega )\;\delta (\omega -k\omega _{ 0}) {}\\ & =& 2\pi \sum _{k=-\infty }^{+\infty }c_{ k}^{h(t)}\;X(k\omega _{ 0})\;\delta (\omega -k\omega _{0}) {}\\ \end{array}$$

is a Fourier series whose Fourier coefficients are

$$\displaystyle{ c_{k}^{y(t)} = c_{ k}^{h(t)}\;X(k\omega _{ 0}). }$$
(2.23)

As an example and as a test of the results obtained so far, we analyze the convolution of a Dirac comb

$$\displaystyle{h(t) =\sum _{ k=-\infty }^{\infty }\delta (t - kT_{ 0})}$$

with a single cosine pulse. According to Eq. (2.4), the Fourier coefficients of the Dirac comb are

$$\displaystyle{c_{k}^{h(t)} = \frac{1} {T_{0}}.}$$

Here T0 denotes the time span between the pulses. For the single cosine pulse with time span \(T = \frac{2\pi } {\Omega }\) that was defined in Eq. (2.21), one obtains—based on Eq. (2.22)—the Fourier transform

$$\displaystyle{X(\omega ) = \frac{2\pi } {\Omega }\; \frac{\mathrm{si}\left (\pi \frac{\omega }{\Omega }\right )} {1 -\left ( \frac{\omega }{\Omega }\right )^{2}}.}$$

According to Eq. (2.23), the Fourier coefficients of the convolution function y(t) = h(t) ∗ x(t) are therefore

$$\displaystyle{ c_{k}^{y(t)} = \frac{1} {T_{0}}\; \frac{2\pi } {\Omega }\; \frac{\mathrm{si}\left (\pi k\; \frac{\omega _{0}} {\Omega }\right )} {1 -\left (k\; \frac{\omega _{0}} {\Omega }\right )^{2}} = \frac{\omega _{0}} {\Omega }\; \frac{\mathrm{si}\left (\pi k\; \frac{\omega _{0}} {\Omega }\right )} {1 -\left (k\; \frac{\omega _{0}} {\Omega }\right )^{2}}. }$$
(2.24)

We will now analyze this result for several special cases.

  • Constant beam current: In this first case, we assume that the different single-cosine pulses overlap according to T = 2T0, which is equivalent to \(\omega _{0} = 2\Omega \). In this case, we obtain \(c_{k}^{y(t)} = 0\) for k ≠ 0. For \(c_{0}^{y(t)}\), which corresponds to the DC component, one obtains

    $$\displaystyle{c_{0}^{y(t)} = \frac{1} {T_{0}}\; \frac{2\pi } {\Omega } = 2,}$$

    which is the expected result for a constant function that equals 2.

  • Continuous sine wave: In this case, we make use of the simplification \(\Omega =\omega _{0}\), so that y(t) corresponds to a simple cosine function that is shifted upward:

    $$\displaystyle{c_{k}^{y(t)} = \frac{\mathrm{si}(\pi k)} {1 - k^{2}}.}$$

    We obviously have

    $$\displaystyle{c_{0}^{y(t)} = 1.}$$

    For k = ±1, we may use l’Hôpital’s rule:

    $$\displaystyle{c_{\pm 1}^{y(t)} =\lim _{ k\rightarrow \pm 1} \frac{\mathrm{si}(\pi k)} {1 - k^{2}} =\lim _{k\rightarrow \pm 1} \frac{\sin (\pi k)} {\pi \;(k - k^{3})} =\lim _{k\rightarrow \pm 1} \frac{\pi \;\cos (\pi k)} {\pi \;(1 - 3k^{2})} = \frac{1} {2}.}$$

    All other coefficients are zero. Thus we obtain

    $$\displaystyle{y(t) =\sum _{ k=-\infty }^{+\infty }c_{ k}^{y(t)}e^{jk\omega _{0}t} = 1 + \frac{1} {2}\;e^{j\omega _{0}t} + \frac{1} {2}\;e^{-j\omega _{0}t} = 1 +\cos (\omega _{ 0}t),}$$

    which is in accordance with our expectation.

  • Dirac comb: For this last case, we first observe that the area under each single-cosine pulse defined in Eq. (2.21) is T. If we want to have an area of 1 instead, we have to divide the function y(t) by T:

    $$\displaystyle{\tilde{y}(t) = \frac{y(t)} {T}.}$$

    Hence, the Fourier coefficients in Eq. (2.24) also have to be divided by T:

    $$\displaystyle{c_{k}^{\tilde{y}(t)} = \frac{1} {T_{0}}\; \frac{\mathrm{si}\left (\pi k\; \frac{\omega _{0}} {\Omega }\right )} {1 -\left (k\; \frac{\omega _{0}} {\Omega }\right )^{2}}.}$$

    We now consider the case T → 0 while assuming a fixed value of T0. Hence \(\omega _{0}/\Omega \rightarrow 0\), and we obtain

    $$\displaystyle{c_{k}^{\tilde{y}(t)} = \frac{1} {T_{0}},}$$

    which is the expected result for a Dirac comb.

Finally, our simple beam signal model that was constructed by a combination of single-cosine pulses is able to describe all states between unbunched beams and strongly bunched beams. In the case of long bunches (continuous sine wave), the DC current equals the RF current amplitude. As the bunches become shorter (\(\omega _{0} < \Omega \)), Eq. (2.24) can be used to determine the ratio between RF current amplitude and DC current.
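The three special cases discussed above can be reproduced numerically from Eq. (2.24); the following sketch (assuming NumPy) treats the removable singularity at |kω0∕Ω| = 1 by a small offset:

```python
import numpy as np

def si(x):
    return np.sinc(np.asarray(x, dtype=float) / np.pi)      # sin(x)/x with si(0) = 1

def c_k(k, ratio):
    """Fourier coefficients of y(t) = h(t)*x(t), Eq. (2.24), with ratio = omega_0/Omega."""
    r = np.asarray(k, dtype=float) * ratio
    r = np.where(np.isclose(np.abs(r), 1.0), r * (1 + 1e-9), r)  # avoid 0/0; the limit is ratio/2
    return ratio * si(np.pi * r) / (1 - r**2)

k = np.arange(-3, 4)
print(np.round(c_k(k, 2.0), 6))    # constant beam current: only c_0 = 2 remains
print(np.round(c_k(k, 1.0), 6))    # continuous sine wave: c_0 = 1, c_(+-1) = 1/2
print(np.round(c_k(k, 0.01), 6))   # short bunches: coefficients approach omega_0/Omega
```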

2.2 Laplace Transform

The Laplace transform is one of the standard tools used to analyze closed-loop control systems. In the scope of the book at hand, we deal only with the one-sided Laplace transform [3, 4], which is useful because it allows one to describe processes in which signals are switched on at t = 0. Hence, the name "Laplace transform" will be used as a synonym for "one-sided Laplace transform." Such a one-sided Laplace transform of a function f(t) with f(t) = 0 for t < 0 is given by

$$\displaystyle{ \fbox{$F(s) =\int _{ 0}^{\infty }f(t)\;e^{-st}\;\mathrm{d}t.$} }$$
(2.25)

Here \(s =\sigma +j\omega\) is a complex parameter. It is obvious that the Laplace transform has a close relationship to the Fourier transform that is obtained for σ = 0 if only functions with f(t) = 0 for t < 0 are allowed. The real part of s is usually introduced to obtain convergence for a larger class of functions (please note that the Fourier transform of a sine or cosine function already leads to nonclassical Dirac pulses, as we saw in Sect. 2.1.5.3).

The Laplace transform F(s) of a function f(t) is an analytic function, and there is a unique correspondence between f(t) and F(s) if the classes of functions/distributions that are considered in the time domain and the Laplace domain are chosen accordingly [1, 4]. Since the integral in Eq. (2.25) exists only in some region of the complex plane, the Laplace transform is initially defined in only this region as well. If, however, a closed-form expression is obtained for the Laplace transform, e.g., a rational function, it is possible to extend the domain of definition by means of analytic continuation (cf. [5, Sect. 2.1]; [6, Sect. 10-9]; [7, Sect. 5.5.4]). Therefore, the Laplace transform F(s) should be defined as the analytic continuation of the function defined by Eq. (2.25). Apart from poles, a Laplace transform F(s) may thus be defined in the whole complex plane.

Like the Fourier transform, the Laplace transform is a linear transformation. If according to

$$\displaystyle{f(t)\;\circ\!\!-\!\!\bullet\;F(s)}$$

we use the correspondence symbol again, the Laplace transform has the following properties (n is a positive integer, and a is a real number):

  • Laplace transform of a derivative:

    $$\displaystyle{\dot{f}(t)\;\circ\!\!-\!\!\bullet\;s\;F(s) - f(0+).}$$

  • Derivative of a Laplace transform:

    $$\displaystyle{(-t)^{n}\;f(t)\;\circ\!\!-\!\!\bullet\;\frac{\mathrm{d}^{n}F(s)} {\mathrm{d}s^{n}}.}$$

  • Laplace transform of an integral:

    $$\displaystyle{\int _{0}^{t}f(\tau )\;\mathrm{d}\tau \;\circ\!\!-\!\!\bullet\;\frac{F(s)} {s}.}$$

  • Shift theorems:

    $$\displaystyle{ f(t - a)\;\circ\!\!-\!\!\bullet\;e^{-as}\;F(s)\quad (a > 0),\qquad e^{at}\;f(t)\;\circ\!\!-\!\!\bullet\;F(s - a). }$$
    (2.26)
  • Convolution:

    $$\displaystyle{ f_{1}(t) {\ast} f_{2}(t)\;\circ\!\!-\!\!\bullet\;F_{1}(s)\;F_{2}(s). }$$
    (2.27)
  • Scaling (a > 0):

    $$\displaystyle{f(at)\;\circ\!\!-\!\!\bullet\;\frac{1} {a}\;F\!\left (\frac{s} {a}\right ).}$$
  • Limits:

    $$\displaystyle{ f(0+) =\lim _{s\rightarrow \infty }\left (s\;F(s)\right ), }$$
    $$\displaystyle{ f(\infty ):=\lim _{t\rightarrow \infty }f(t) =\lim _{s\rightarrow 0}\left (s\;F(s)\right ). }$$
    (2.28)

    Here f and its derivative must satisfy further requirements [4]. Before using the final-value theorem (2.28), for example, one should verify that the function actually converges for t → ∞.
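As a worked check of these limit relations (a sketch assuming SymPy is available), consider f(t) = 1 − e^{−at} with a > 0, which does converge for t → ∞:

```python
import sympy as sp

t, s, a = sp.symbols('t s a', positive=True)

f = 1 - sp.exp(-a * t)                            # example signal, f(0+) = 0, f(oo) = 1
F = sp.laplace_transform(f, t, s, noconds=True)   # F(s) = a / (s*(s + a))

print(sp.limit(s * F, s, sp.oo))   # initial value: f(0+) = 0
print(sp.limit(s * F, s, 0))       # final value:   f(oo) = 1, Eq. (2.28)
```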

Like the Fourier transform, the Laplace transform may also be generalized in order to cover distributions (i.e., generalized functions) [1]. Some common Laplace transforms are summarized in Table A.4 on p. 417.

2.3 Transfer Functions

Some dynamical systems may be described by the equation

$$\displaystyle{Y (s) = H(s)\;X(s).}$$

In this case, X(s) and Y (s) are the Laplace transforms of the input signal x(t) and the output signal y(t), respectively. The Laplace transform H(s) is called the transfer function of the system. We discuss two specific input signals:

  • Let us assume that the input function x(t) is a Heaviside step function, which equals 1 for t > 0 and vanishes for t < 0, so that its Laplace transform is \(X(s) = 1/s\).

    In this case, the output is

    $$\displaystyle{Y (s) = \frac{H(s)} {s}.}$$

    If we now apply Eq. (2.28), we obtain

    $$\displaystyle{y(\infty ) =\lim _{s\rightarrow 0}H(s)}$$

    as the long-term (unit-)step response of the system.

  • If generalized functions are allowed, we may use x(t) = δ(t) as an input signal. In this case, the correspondence

    $$\displaystyle{\delta (t)\;\circ\!\!-\!\!\bullet\;1}$$

    leads to

    $$\displaystyle{Y (s) = H(s),}$$

    which means that the transfer function H(s) corresponds to the impulse response h(t) of the system. The final value of the response y(t) = h(t) is then given by

    $$\displaystyle{y(\infty ) =\lim _{s\rightarrow 0}\left (s\;H(s)\right ).}$$

Let us assume that a system component is specified by the transfer function H(s). If we calculate the phase response of this component according to \(\varphi (\omega ) = \measuredangle H(j\omega )\), the group delay can be defined by

$$\displaystyle{\tau _{\mathrm{g}} = -\frac{\mathrm{d}\varphi } {\mathrm{d}\omega }.}$$

Taking a dead-time element with \(H(s) = e^{-sT_{\mathrm{dead}}}\) (see shift theorem (2.26)) as an example, one obtains the frequency-independent, i.e., constant group delay

$$\displaystyle{\tau _{\mathrm{g}} = T_{\mathrm{dead}}.}$$

Hence, the dead-time element is an example of a device with linear phase response.
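The constant group delay of the dead-time element can also be checked numerically (a sketch assuming NumPy; the dead time is an arbitrary example value):

```python
import numpy as np

T_dead = 3e-6                              # assumed dead time of 3 us
omega = np.linspace(1e3, 1e6, 2000)        # angular frequency grid
H = np.exp(-1j * omega * T_dead)           # H(j*omega) of the dead-time element

phase = np.unwrap(np.angle(H))             # continuous phase response phi(omega)
tau_g = -np.gradient(phase, omega)         # group delay tau_g = -d(phi)/d(omega)
print(tau_g.min(), tau_g.max())            # both ~3e-6 s: constant group delay
```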

2.4 Mathematical Statistics

The results summarized in this chapter can be read in more detail in [8].

2.4.1 Gaussian Distribution

The Gaussian distribution (also called the normal distribution) is given by the probability density function

$$\displaystyle{ \fbox{$f(x) = \frac{1} {\sigma \;\sqrt{2\pi }}e^{-\frac{1} {2} \left (\frac{x-\mu } {\sigma } \right )^{2} },$} }$$
(2.29)

where \(\mu,\sigma \in \mathbb{R}\) with σ > 0 are specified. In order to ensure that f(x) is in fact a valid probability distribution, the equation

$$\displaystyle{\fbox{$\int _{-\infty }^{+\infty }f(x)\;\mathrm{d}x = 1$}}$$

must hold. We show this by substituting

$$\displaystyle{u = \frac{x-\mu } {\sigma },\qquad \frac{\mathrm{d}u} {\mathrm{d}x} = \frac{1} {\sigma }.}$$

This leads to

$$\displaystyle{\int _{-\infty }^{+\infty }f(x)\;\mathrm{d}x =\int _{ -\infty }^{+\infty } \frac{1} {\sqrt{2\pi }}\;e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u.}$$

By means of standard methods of mathematical analysis, one may show that

$$\displaystyle{\int _{0}^{\infty }e^{-a^{2}u^{2} }\;\mathrm{d}u = \frac{\sqrt{\pi }} {2a},}$$

which actually leads to the result

$$\displaystyle{ \int _{-\infty }^{+\infty }f(x)\;\mathrm{d}x = \frac{1} {\sqrt{2\pi }}\;\int _{-\infty }^{+\infty }e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u = 1. }$$
(2.30)

For a given measurement curve that has the shape of a Gaussian distribution, one may use curve-fitting techniques to determine the parameters μ and σ. A simpler method is to determine the FWHM (full width at half maximum) value. According to Eq. (2.29), one-half of the maximum value is obtained for

$$\displaystyle{e^{-\frac{1} {2} \left (\frac{x-\mu } {\sigma } \right )^{2} }\stackrel{!}{=}\frac{1} {2}\qquad \Rightarrow -\frac{1} {2}\left (\frac{x-\mu } {\sigma } \right )^{2} = -\ln \;2\qquad \Rightarrow \vert x -\mu \vert =\sigma \sqrt{2\;\ln \;2}.}$$

The FWHM value equals twice this distance (one to the left of the maximum and one to the right of the maximum):

$$\displaystyle{\mathrm{FWHM} =\sigma \; 2\;\sqrt{2\;\ln \;2} \approx 2.35482\;\sigma.}$$

This formula may, of course, lead to less-accurate results than those obtained by the curve-fitting concept if the zero line or the maximum cannot be clearly identified in the measurement data.

2.4.2 Probabilities

We now consider the area below the curve f(x) that is located to the left of \(x =\mu +\Delta x\), where \(\Delta x > 0\) holds. This area will be denoted by \(\Phi \):

$$\displaystyle{\Phi =\int _{ -\infty }^{\mu +\Delta x}f(x)\;\mathrm{d}x.}$$

It obviously specifies the probability that the random variableX is less than \(\mu +\Delta x\). By applying the same substitution as that mentioned above, one obtains

$$\displaystyle{\Phi =\int _{ -\infty }^{\Delta x/\sigma } \frac{1} {\sqrt{2\pi }}\;e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u.}$$

According to Fig. 2.1 we set

$$\displaystyle{\Delta u = \frac{\Delta x} {\sigma } }$$

and get

$$\displaystyle{\Phi (\Delta u) = \frac{1} {\sqrt{2\pi }}\;\int _{-\infty }^{\Delta u}e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u.}$$

The area D that is enclosed between \(\mu -\Delta x\) and \(\mu +\Delta x\) (see Fig. 2.2) can be calculated as follows:

$$\displaystyle{D(\Delta u) = \Phi (\Delta u) - \Phi (-\Delta u).}$$
Fig. 2.1 Gaussian distribution

Fig. 2.2 Gaussian distribution

Due to symmetry, we have

$$\displaystyle{\Phi (-\Delta u) = 1 - \Phi (\Delta u),}$$

which leads to

$$\displaystyle{D(\Delta u) = 2\;\Phi (\Delta u) - 1.}$$

Often, the area \(\Phi _{0}\) is considered, which is located between μ and \(\mu +\Delta x\):

$$\displaystyle{\Phi _{0}(\Delta u) = \Phi (\Delta u) -\frac{1} {2},\qquad \Phi _{0}(\Delta u) = \frac{1} {\sqrt{2\pi }}\;\int _{0}^{\Delta u}e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u.}$$

This shows that D may also be written in the form

$$\displaystyle{D(\Delta u) = 2\;\Phi _{0}(\Delta u).}$$

Some examples for these quantities are summarized in Table 2.2.

Table 2.2 Integrals of the Gaussian probability density function

As an example, the table shows that the random variable is located in the confidence interval between μ − 2σ and μ + 2σ with a probability of 95.45 %.
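The entries of Table 2.2 can be reproduced with the error function (a minimal sketch using Python's standard library):

```python
from math import erf, sqrt

def Phi(u):
    """Cumulative Gaussian probability Phi(Delta u) of the standardized variable."""
    return 0.5 * (1 + erf(u / sqrt(2)))

for k in (1, 2, 3):
    D = 2 * Phi(k) - 1             # probability of lying within mu +/- k*sigma
    print(k, round(100 * D, 2))    # 68.27, 95.45, 99.73 (in percent)
```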

2.4.3 Expected Value

Let X be a random variable with probability density function f(x). Then the expected value of the function g(X) is given by

$$\displaystyle{E(g(X)) =\int _{ -\infty }^{+\infty }g(x)\;f(x)\;\mathrm{d}x.}$$

It is obvious that the expected value is linear:

$$\displaystyle{E(a\;g_{1}(X) + b\;g_{2}(X)) = a\;E(g_{1}(X)) + b\;E(g_{2}(X)).}$$

For g(X) = Xk, one obtains the kth moment:

$$\displaystyle{E(X^{k}) =\int _{ -\infty }^{+\infty }x^{k}\;f(x)\;\mathrm{d}x.}$$

By definition, the first moment is the mean of the random variable X. For the Gaussian distribution, we obtain

$$\displaystyle{E(X) = \frac{1} {\sigma \;\sqrt{2\pi }}\int _{-\infty }^{+\infty }x\;e^{-\frac{1} {2} \left (\frac{x-\mu } {\sigma } \right )^{2} }\;\mathrm{d}x = \frac{1} {\sqrt{2\pi }}\int _{-\infty }^{+\infty }(\sigma u+\mu )\;e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u.}$$

The term σ u in the parentheses leads to an odd integrand, so that this part of the integral vanishes.

Using Eq. (2.30), one obtains the mean

$$\displaystyle{E(X) =\mu,}$$

which is geometrically obvious.

If we always (not only for the Gaussian distribution) denote the mean by μ, then the kth central moment is given by

$$\displaystyle{E((X-\mu )^{k}) =\int _{ -\infty }^{+\infty }(x-\mu )^{k}\;f(x)\;\mathrm{d}x.}$$

The second central moment is called the variance. For the Gaussian distribution, we obtain

$$\displaystyle{E((X-\mu )^{2})= \frac{1} {\sigma \;\sqrt{2\pi }}\int _{-\infty }^{+\infty }(x-\mu )^{2}\;e^{-\frac{1} {2} \left (\frac{x-\mu } {\sigma } \right )^{2} }\;\mathrm{d}x= \frac{1} {\sqrt{2\pi }}\int _{-\infty }^{+\infty }(\sigma u)^{2}\;e^{-\frac{1} {2} u^{2} }\;\mathrm{d}u.}$$

With

$$\displaystyle\begin{array}{rcl} & & a = u\;e^{-u^{2}/4 },\quad a^{{\prime}} = e^{-u^{2}/4 }\left (1 -\frac{u^{2}} {2} \right ), {}\\ & & b^{{\prime}} = u\;e^{-u^{2}/4 },\quad b = -2\;e^{-u^{2}/4 }, {}\\ \end{array}$$

an integration by parts yields

$$\displaystyle{\int _{-\infty }^{\infty }u^{2}\;e^{-u^{2}/2 }\;\mathrm{d}u = \left.-2u\;e^{-u^{2}/2 }\right \vert _{-\infty }^{\infty } + 2\int _{ -\infty }^{+\infty }e^{-u^{2}/2 }\;\left (1 -\frac{u^{2}} {2} \right )\;\mathrm{d}u.}$$

The first term on the right-hand side vanishes, and we get

$$\displaystyle{2\;\int _{-\infty }^{\infty }u^{2}\;e^{-u^{2}/2 }\;\mathrm{d}u = 2\int _{-\infty }^{+\infty }e^{-u^{2}/2 }\;\mathrm{d}u.}$$

The remaining integral is known from Eq. (2.30):

$$\displaystyle{\int _{-\infty }^{\infty }u^{2}\;e^{-u^{2}/2 }\;\mathrm{d}u = \sqrt{2\pi }.}$$

Hence we obtain

$$\displaystyle{E((X-\mu )^{2}) =\sigma ^{2}.}$$

The variance is generally denoted by σ2 (not only for the Gaussian distribution), and its square root, the value σ, is called the standard deviation.

For a random sample with m values x1, x2, …, xm, one defines the sample mean

$$\displaystyle{\bar{x} = \frac{1} {m}\sum _{k=1}^{m}x_{ k}}$$

and the sample variance

$$\displaystyle{s^{2} = \frac{1} {m - 1}\sum _{k=1}^{m}(x_{ k} -\bar{ x})^{2}.}$$

For large samples, this value does not deviate much from \(\Delta x_{\mathrm{rms}}^{2}\), where the root mean square (rms) is defined as

$$\displaystyle{\Delta x_{\mathrm{rms}} = \sqrt{ \frac{1} {m}\sum _{k=1}^{m}(x_{k} -\bar{ x})^{2}}.}$$
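For a concrete sample, the three quantities can be computed as follows (a sketch assuming NumPy; the measured values are hypothetical):

```python
import numpy as np

x = np.array([9.8, 10.1, 10.4, 9.7, 10.0, 10.3])   # hypothetical sample of m = 6 values

x_bar = x.mean()                                   # sample mean
s2 = x.var(ddof=1)                                 # sample variance, 1/(m-1) normalization
rms = np.sqrt(np.mean((x - x_bar) ** 2))           # rms deviation, 1/m normalization

print(x_bar, s2, rms)
```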

2.4.4 Unbiasedness

The individual values xk of a sample are the observed realizations of the random variables Xk that belong to the same distribution. Also,

$$\displaystyle{\bar{X} = \frac{1} {m}\sum _{k=1}^{m}X_{ k}}$$

is a random variable for which one may calculate the expected value. From E(Xk) = μ we obtain

$$\displaystyle{E(\bar{X}) = \frac{1} {m}\sum _{k=1}^{m}E(X_{ k}) =\mu,}$$

which means that \(\bar{X}\) is an unbiased estimator of the mean value μ of the population. We now check whether the sample variance

$$\displaystyle{S^{2} = \frac{1} {m - 1}\sum _{k=1}^{m}(X_{ k} -\bar{ X})^{2}}$$

is unbiased as well. We have

$$\displaystyle{ E(S^{2}) = \frac{1} {m - 1}\sum _{k=1}^{m}\left [E(X_{ k}^{2}) - 2E(X_{ k}\bar{X}) + E(\bar{X}^{2})\right ]. }$$
(2.33)

First of all, we need an expression for E(Xk2). For this purpose, we point out that all the random variables Xk belong to the same distribution, so that

$$\displaystyle{\sigma ^{2} = E((X_{ k}-\mu )^{2}) = E(X_{ k}^{2}) - 2\mu E(X_{ k}) +\mu ^{2}}$$

holds. From E(Xk) = μ, we obtain

$$\displaystyle{\sigma ^{2} = E(X_{ k}^{2}) -\mu ^{2}}$$

and

$$\displaystyle{ E(X_{k}^{2}) =\sigma ^{2} +\mu ^{2}. }$$
(2.34)

Now we analyze the second expression in Eq. (2.33), i.e., the expected value of

$$\displaystyle{X_{k}\bar{X} = \frac{1} {m}\sum _{l=1}^{m}X_{ k}X_{l}.}$$

For independent random variablesX and Y, we have the equation

$$\displaystyle{E(\mathit{XY }) = E(X)\;E(Y ).}$$

In our case, this is satisfied only for k ≠ l, which means for m − 1 terms. The term with k = l leads to the expected value E(Xk2) derived above. Therefore, we have

$$\displaystyle{ E(X_{k}\bar{X}) = \frac{1} {m}\sum _{l=1}^{m}E(X_{ k}X_{l}) = \frac{1} {m}\left [(m - 1)\mu ^{2} + (\sigma ^{2} +\mu ^{2})\right ] =\mu ^{2} + \frac{\sigma ^{2}} {m}. }$$
(2.35)

Finally, we calculate the expected value of

$$\displaystyle{\bar{X}^{2} = \frac{1} {m^{2}}\sum _{k=1}^{m}\sum _{ l=1}^{m}X_{ k}X_{l}}$$

in an analogous way, obtaining

$$\displaystyle{ E(\bar{X}^{2}) = \frac{1} {m^{2}}\left ((m^{2} - m)\mu ^{2} + m(\sigma ^{2} +\mu ^{2})\right ) =\mu ^{2} + \frac{\sigma ^{2}} {m}. }$$
(2.36)

The results (2.34)–(2.36) may now be used in Eq. (2.33):

$$\displaystyle\begin{array}{rcl} E(S^{2})& =& \frac{m} {m - 1}\left [(\sigma ^{2} +\mu ^{2}) - 2\left (\mu ^{2} + \frac{\sigma ^{2}} {m}\right ) + \left (\mu ^{2} + \frac{\sigma ^{2}} {m}\right )\right ] {}\\ & =& \frac{m} {m - 1}\left (\sigma ^{2} - \frac{\sigma ^{2}} {m}\right ) =\sigma ^{2}. {}\\ \end{array}$$

This shows that the sample variance is an unbiased estimator of the population variance. This is obviously not true for rms values. For large samples, however, this difference is no longer important.

We now calculate the variance of the sample mean \(\bar{X}\):

$$\displaystyle{E((\bar{X}-\mu )^{2}) = E(\bar{X}^{2}) - 2\mu E(\bar{X}) +\mu ^{2} = \left (\mu ^{2} + \frac{\sigma ^{2}} {m}\right ) -\mu ^{2} = \frac{\sigma ^{2}} {m}.}$$

This shows that an estimate of the population mean from the sample mean becomes better as the sample size becomes larger.

2.4.5 Uniform Distribution

According to

$$\displaystyle{f(x) = \left \{\begin{array}{ll} \frac{1} {2\Delta x} &\mbox{ for }\vert x -\mu \vert \leq \Delta x,\\ 0 &\mbox{ elsewhere,} \end{array} \right.\mbox{ with the constant }\Delta x > 0,}$$

we now calculate the variance of a uniform distribution:

$$\displaystyle{\sigma ^{2} = E((X-\mu )^{2}) =\int _{ -\infty }^{+\infty }(x-\mu )^{2}\;f(x)\;\mathrm{d}x =\int _{ -\infty }^{+\infty }u^{2}\;f(u+\mu )\;\mathrm{d}u.}$$

In the last step, we substituted \(u = x-\mu\) to obtain

$$\displaystyle\begin{array}{rcl} & \sigma ^{2} =\int _{ -\Delta x}^{+\Delta x}u^{2}\; \frac{1} {2\Delta x}\;\mathrm{d}u = 2\int _{0}^{\Delta x}u^{2}\; \frac{1} {2\Delta x}\;\mathrm{d}u = \frac{1} {\Delta x}\left.\frac{u^{3}} {3} \right \vert _{0}^{\Delta x} = \frac{\Delta x^{2}} {3} & {}\\ & \Rightarrow \sigma = \frac{1} {\sqrt{3}}\;\Delta x. & {}\\ \end{array}$$

For large samples, we get

$$\displaystyle{\Delta x_{\mathrm{rms}} \approx \frac{1} {\sqrt{3}}\;\Delta x.}$$
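A quick Monte Carlo check of this relation (a sketch assuming NumPy; the half-width is an arbitrary example value):

```python
import numpy as np

delta_x = 0.5                                        # half-width Delta x of the uniform distribution
rng = np.random.default_rng(1)
u = rng.uniform(-delta_x, delta_x, 1_000_000)        # samples centered at mu = 0

print(np.sqrt(np.mean(u**2)), delta_x / np.sqrt(3))  # rms deviation vs. Delta x / sqrt(3)
```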

2.5 Bunching Factor

Let us consider a beam signal Ibeam(t) of a bunched beam as shown, for example, in Fig. 1.3 on p. 7. The bunching factor is defined as

$$\displaystyle{ \fbox{$B_{\mathrm{f}} = \frac{\bar{I}_{\mathrm{beam}}} {I_{\mathrm{beam,max}}},$} }$$
(2.37)

i.e., it is the ratio of the average beam current to the maximum beam current (cf. Chao [9, Sect. 2.5.3.2, p. 131] or Reiser [10, Sect. 4.5.1, p. 263]). Obviously, the equation

$$\displaystyle{\bar{I}_{\mathrm{beam}} = \frac{1} {T_{\mathrm{RF}}}\int _{-T_{\mathrm{RF}}/2}^{T_{\mathrm{RF}}/2}I_{\mathrm{ beam}}(t)\;\mathrm{d}t}$$

holds, where TRF denotes the period.

Now one may replace the true shape of the beam current pulse by a rectangular one with the same maximum value. For \(-T_{\mathrm{RF}}/2 < t < T_{\mathrm{RF}}/2\), we then have

$$\displaystyle{I_{\mathrm{beam}}(t) = \left \{\begin{array}{ll} I_{\mathrm{beam,max}} & \mbox{ for }\vert t\vert \leq \tau /2, \\ 0 &\mbox{ elsewhere,} \end{array} \right.}$$

where we have assumed that the bunch is centered at t = 0. In this case, one has to choose a pulse width τ in such a way that the same average beam current is obtained:

$$\displaystyle{\bar{I}_{\mathrm{beam}} = \frac{\tau } {T_{\mathrm{RF}}}\;I_{\mathrm{beam,max}}.}$$

Under these conditions, we obtain the expression

$$\displaystyle{B_{\mathrm{f}} = \frac{\tau } {T_{\mathrm{RF}}}}$$

for the bunching factor.

We now assume that the beam current pulse has the shape of a Gaussian distribution. This is, of course, possible only if the pulses are significantly shorter than the period time TRF. Under this condition, the beam current will be close to zero before the next pulse starts.

Making use of Eq. (2.29), one may write Ibeam(t) in the form

$$\displaystyle{I_{\mathrm{beam}}(t) = K\; \frac{1} {\sigma \;\sqrt{2\pi }}e^{-\frac{1} {2} \left (\frac{t} {\sigma } \right )^{2} }\mbox{ for } - T_{\mathrm{RF}}/2 < t < T_{\mathrm{RF}}/2.}$$

We have

$$\displaystyle{\int _{-\infty }^{+\infty }I_{\mathrm{ beam}}(t)\;\mathrm{d}t = K.}$$

The average beam current is obtained using the above-mentioned approximation:

$$\displaystyle{\bar{I}_{\mathrm{beam}} = \frac{1} {T_{\mathrm{RF}}}\int _{-T_{\mathrm{RF}}/2}^{T_{\mathrm{RF}}/2}I_{\mathrm{ beam}}(t)\;\mathrm{d}t \approx \frac{K} {T_{\mathrm{RF}}}.}$$

For the maximum current, we obtain

$$\displaystyle{I_{\mathrm{beam,max}} = \frac{K} {\sigma \;\sqrt{2\pi }},}$$

so that the bunching factor

$$\displaystyle{B_{\mathrm{f}} \approx \frac{\sigma \;\sqrt{2\pi }} {T_{\mathrm{RF}}}}$$

is obtained. The equivalent length τ of a rectangular pulse is therefore

$$\displaystyle{\tau =\sigma \; \sqrt{2\pi } \approx 2.5\;\sigma.}$$

The two slopes of the rectangular pulse are therefore located at about ±1.25 σ. This leads to the conversion between the Gaussian bunch and the rectangular signal that is visualized in Fig. 2.3.

Fig. 2.3 Gaussian beam signal
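The approximation Bf ≈ σ√(2π)∕TRF can be verified numerically for a short Gaussian bunch (a sketch assuming NumPy; the values of TRF and σ are arbitrary examples):

```python
import numpy as np

T_rf = 1.0                                   # RF period (arbitrary units, assumed)
sigma = 0.05 * T_rf                          # short Gaussian bunch, sigma << T_RF
t = np.linspace(-T_rf / 2, T_rf / 2, 100001)

i_beam = np.exp(-0.5 * (t / sigma) ** 2)     # Gaussian beam current with peak value 1
B_f = i_beam.mean() / i_beam.max()           # Eq. (2.37): average current / maximum current

print(B_f, sigma * np.sqrt(2 * np.pi) / T_rf)   # numerical value vs. the approximation
```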

2.6 Electromagnetic Fields

We summarize in this section a few basic formulas that may be found in standard textbooks (cf. [11–17]). We begin with Maxwell's equations in their integral form.

In the following, A denotes a two-dimensional domain, and V a three-dimensional domain. For a domain D (two- or three-dimensional), ∂ D denotes its boundary (with mathematically positive orientation, if applicable).

Maxwell’s first equation (Ampère’s law) in the time domain is

$$\displaystyle{ \fbox{$\oint _{\partial A}\vec{H} \cdot \mathrm{ d}\vec{r} =\int _{A}\left (\vec{J} +\dot{\vec{ D}}\right ) \cdot \mathrm{ d}\vec{A},$} }$$
(2.38)

where \(\vec{H}\) is the magnetizing field, \(\vec{J}\) the current density, and \(\vec{D}\) the electric displacement field.

Maxwell’s second equation in the time domain (Faraday’s law) reads

$$\displaystyle{ \fbox{$\oint _{\partial A}\vec{E} \cdot \mathrm{ d}\vec{r} = -\int _{A}\dot{\vec{B}} \cdot \mathrm{ d}\vec{A}.$} }$$
(2.39)

Here \(\vec{E}\) is the electric field, and \(\vec{B}\) is the magnetic field.

Maxwell’s third equation states that no magnetic charge exists:

$$\displaystyle{ \fbox{$\oint _{\partial V }\vec{B} \cdot \mathrm{ d}\vec{A} = 0.$} }$$
(2.40)

The electric charge Q inside a three-dimensional domain V is determined by Maxwell’s fourth equation (Gauss’s law):

$$\displaystyle{ \fbox{$\oint _{\partial V }\vec{D} \cdot \mathrm{ d}\vec{A} =\int _{V }\rho _{q}\;\mathrm{d}V = Q.$} }$$
(2.41)

Here, ρq denotes the charge density.

The current through a certain region A is given by

$$\displaystyle{\fbox{$I =\int _{A}\vec{J} \cdot \mathrm{ d}\vec{A}$},}$$

and the voltage along a curve C is defined by

$$\displaystyle{\fbox{$V =\int _{C}\vec{E} \cdot \mathrm{ d}\vec{r}.$}}$$

Please note that we use the same symbol for voltage and for three-dimensional domains, but the context should prevent any confusion.

In material bodies, the simplest relationships (linear isotropic media with relaxation times that are much smaller than the minimum time intervals of interest) between the field vectors are

$$\displaystyle{\fbox{$\vec{D} =\epsilon \vec{ E}, \vec{B} =\mu \vec{ H}, \vec{J} =\kappa \vec{ E}.$}}$$

The material parameters are the permittivity ε, the permeability μ, and the conductivity κ. In vacuum, and approximately also in air, we have

$$\displaystyle{\epsilon =\epsilon _{0},\qquad \mu =\mu _{0}.}$$

At least for fixed nonmoving domains A, we can write Eq. (2.39) in the form

$$\displaystyle{\oint _{\partial A}\vec{E} \cdot \mathrm{ d}\vec{r} = -\frac{\mathrm{d}\Phi _{\mathrm{m}}} {\mathrm{d}t},}$$

where

$$\displaystyle{\Phi _{\mathrm{m}} =\int _{A}\vec{B} \cdot \mathrm{ d}\vec{A}}$$

is the magnetic flux through the domain A. This form is suitable for induction problems.

Based on the integral form of Maxwell's equations presented above, one may derive their differential form if integral theorems are used:

$$\displaystyle{ \mathrm{curl}\;\vec{H} =\vec{ J} +\dot{\vec{ D}}, }$$
(2.42)
$$\displaystyle{ \mathrm{curl}\;\vec{E} = -\dot{\vec{B}}, }$$
(2.43)
$$\displaystyle{ \mathrm{div}\;\vec{D} =\rho _{q}, }$$
(2.44)
$$\displaystyle{ \mathrm{div}\;\vec{B} = 0. }$$
(2.45)
Taking Eq. (2.44) into account, the divergence of Eq. (2.42) leads to the continuity equation

$$\displaystyle{ \fbox{$\mathrm{div}\vec{J} +\dot{\rho _{q}} = 0.$} }$$
(2.46)

We will discuss the physical meaning of this equation in Sect. 2.9.

In certain cases (here we assume that domains are filled homogeneously with linear isotropic material), Maxwell's equations may be solved by means of the vector potential \(\vec{A}\), defined by

$$\displaystyle{ \vec{B} =\mathrm{ curl}\;\vec{A}, }$$
(2.47)

and the scalar potential \(\Phi \), defined by

$$\displaystyle{ \vec{E} = -\dot{\vec{A}} -\mathrm{ grad}\;\Phi, }$$
(2.48)

both connected by the Lorenz gauge condition

$$\displaystyle{ \mathrm{div}\;\vec{A} = -\mu \epsilon \dot{\Phi }. }$$
(2.49)

Using these definitions, one obtains the wave equations

$$\displaystyle{ \Delta \vec{A} - \frac{1} {c^{2}}\ddot{\vec{A}} = -\mu \vec{ J}, }$$
(2.50)
$$\displaystyle{ \Delta \Phi - \frac{1} {c^{2}}\ddot{\Phi } = -\frac{\rho _{q}} {\epsilon }. }$$
(2.51)
Here

$$\displaystyle{ \fbox{$c = \frac{1} {\sqrt{\mu \epsilon }}$} }$$
(2.52)

denotes the speed of light in the material under consideration. The speed of light in vacuum is

$$\displaystyle{ \fbox{$c_{0} = \frac{1} {\sqrt{\mu _{0 } \epsilon _{0}}}.$} }$$
(2.53)
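Numerically, Eq. (2.53) gives the familiar value (a minimal sketch; the vacuum constants are the standard SI values):

```python
import numpy as np

mu_0 = 4e-7 * np.pi             # permeability of vacuum (H/m)
eps_0 = 8.8541878128e-12        # permittivity of vacuum (F/m)

print(1 / np.sqrt(mu_0 * eps_0))   # ~2.998e8 m/s, Eq. (2.53)
```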

For static problems, there is no time dependence of the fields, and according to Maxwell’s equations, electric and magnetic fields are therefore decoupled. In this case, the vector potential and the scalar potential also do not depend on time.

Equations (2.48) and (2.51) for homogeneous media thus reduce to

$$\displaystyle{\vec{E} = -\mathrm{grad}\;\Phi }$$

and the Poisson equation

$$\displaystyle{ \fbox{$\Delta \Phi = -\frac{\rho _{q}} {\epsilon },$} }$$
(2.54)

respectively. This equation has to be solved for electrostatic problems.

2.7 Special Relativity

The primary objective of this section is to introduce the nomenclature that is used in this book. This nomenclature is close to that of the introductory text [17] (in German). In any case, the reader should consult standard textbooks on special (and general) relativity (cf. [11, 13, 18–20] in English or [14, 21–28] in German) for an extensive introduction. However, the remainder of the book can also be understood if the formulas presented in this section are regarded as given.

The speed of light c0 in vacuum has the same value in every inertial frame. Therefore, the equation of the wave front

$$\displaystyle{ x^{2} + y^{2} + z^{2} = c_{ 0}^{2}t^{2} }$$
(2.55)

in one inertial frame S (e.g., light flash at t = 0 at the origin of S) is transformed into a wave front equation

$$\displaystyle{\bar{x}^{2} +\bar{ y}^{2} +\bar{ z}^{2} = c_{ 0}^{2}\bar{t}^{2}}$$

that has the same form in a different inertial frame \(\bar{S}\). Such transformation behavior is satisfied by the general Lorentz transformation. If one restricts generality in such a way that at t = 0, the origins of the two inertial frames are at the same position and that one frame \(\bar{S}\) moves with constant velocity v in the z-direction relative to the other frame S, then one obtains the special Lorentz transformation

$$\displaystyle{ \bar{x} = x, }$$
(2.56)
$$\displaystyle{ \bar{y} = y, }$$
(2.57)
$$\displaystyle{ \bar{z} = \frac{z - vt} {\sqrt{1 - \frac{v^{2 } } {c_{0}^{2}}} }, }$$
(2.58)
$$\displaystyle{ \bar{t} = \frac{t - \frac{v} {c_{0}^{2}} z} {\sqrt{1 - \frac{v^{2 } } {c_{0}^{2}}} }. }$$
(2.59)

The inverse transformation can be generated if the quantities with a bar (e.g., \(\bar{y}\)) are replaced by the same quantities without the bar (e.g., y) and vice versa. In that case, \(\bar{v} = -v\) has to be used (if \(\bar{S}\) moves with respect to S with velocity v in the positive z direction, S will move with respect to \(\bar{S}\) in the negative z direction), and c0 remains the same. This concept for generating inverse transformation formulas may also be applied to electromagnetic field quantities, whose transformation behavior is discussed below.

The square root in the denominator of Eqs. (2.58) and (2.59) is typical of expressions in special relativity. Therefore, the so-called Lorentz factors are defined:

$$\displaystyle\begin{array}{rcl} & \fbox{$\beta _{v} = \frac{v} {c_{0}},$} & {}\\ & \fbox{$\gamma _{v} = \frac{1} {\sqrt{1 -\beta _{ v }^{2}}}.$}& {}\\ \end{array}$$

Special relativity may be built up by defining so-called four-vectors and four-tensors. For example, the space coordinates are combined with the time “coordinate” in order to define the components of a four-vector that specifies the position in space-time:

$$\displaystyle{(\theta ^{i}) = (x,y,z,c_{ 0}t)^{\mathrm{T}}\mbox{ with }i \in \{ 1,2,3,4\}.}$$

Specific values of this four-vector can be interpreted as events. In combination with the special choice (signature)

$$\displaystyle{(g_{ik}) = (g^{ik}) = (\bar{g}_{ ik}) = (\bar{g}^{ik}) = \left (\begin{array}{cccc} 1&0&0& 0\\ 0 &1 &0 & 0 \\ 0&0&1& 0\\ 0 &0 &0 & -1 \end{array} \right )}$$

for the metric tensor (i, k ∈ { 1, 2, 3, 4}), one obtains the desired transformation behavior of the wave front equation, because

$$\displaystyle{\theta ^{i}\theta _{ i} = g_{ik}\theta ^{i}\theta ^{k} = 0,}$$

which reproduces Eq. (2.55), is a tensor equation with a tensor of rank 0 (scalar) on the right-hand side. Here we use the Ricci calculus and Einstein’s summation convention. The special Lorentz transformation given above can now be reproduced by

$$\displaystyle{\bar{\theta }^{i} =\bar{ a}_{ k}^{i}\theta ^{k},}$$

which corresponds to the matrix equation

$$\displaystyle{(\bar{\theta }^{i}) = (\bar{a}_{ k}^{i}) \cdot (\theta ^{i})}$$

if the transformation coefficients

$$\displaystyle{(\bar{a}_{k}^{i}) = \left (\begin{array}{cccc} 1&0& 0 & 0\\ 0 &1 & 0 & 0 \\ 0&0& \gamma _{v} & -\beta _{v}\gamma _{v} \\ 0&0& -\beta _{v}\gamma _{v}& \gamma _{v} \end{array} \right )}$$

are chosen (i = row, k = column).

Similarly to the construction of the position four-vector, the vector potential and the scalar potential in electromagnetic field theory may be combined to form the electromagnetic four-potential \(\mathcal{A}\) according to

$$\displaystyle{(\mathcal{A}^{i}) = (A_{ x},A_{y},A_{z},\Phi /c_{0})^{\mathrm{T}}.}$$

This, for example, allows one to write the Lorenz gauge condition (2.49) for free space in the form

$$\displaystyle{\mathcal{A}^{i}\vert _{ i} = 0}$$

of a tensor equation, where the vertical line indicates a covariant derivative, which—in special relativity—corresponds to the partial derivative because the metric coefficients are constant.

The four-current density\(\mathcal{J}\) is defined by

$$\displaystyle{(\mathcal{J}^{i}) = (J_{ x},J_{y},J_{z},\rho _{q}c_{0})^{\mathrm{T}},}$$

so that the tensor equation

$$\displaystyle{\mathcal{J}^{i}\vert _{ i} = 0}$$

represents the continuity equation (2.46). The transformation law obviously yields

$$\displaystyle{ \bar{J}_{x} = J_{x}, }$$
(2.60)
$$\displaystyle{ \bar{J}_{y} = J_{y}, }$$
(2.61)
$$\displaystyle{ \bar{J}_{z} =\gamma _{v}\left (J_{z} - v\rho _{q}\right ), }$$
(2.62)
$$\displaystyle{ \bar{\rho }_{q} =\gamma _{v}\left (\rho _{q} - \frac{v} {c_{0}^{2}}J_{z}\right ). }$$
(2.63)

With \(\vec{v} = v\vec{e}_{z}\) defining the parallel direction ∥ , this may be written in the generalized form

$$\displaystyle{ \vec{\bar{J}}_{\perp } =\vec{ J}_{\perp }, }$$
(2.64)
$$\displaystyle{ \vec{\bar{J}}_{\parallel } =\gamma _{v}\left (\vec{J}_{\parallel }-\vec{ v}\rho _{q}\right ), }$$
(2.65)
$$\displaystyle{ \bar{\rho }_{q} =\gamma _{v}\left (\rho _{q} -\frac{\vec{v} \cdot \vec{ J}} {c_{0}^{2}} \right ). }$$
(2.66)

The electromagnetic field tensor may be defined as

$$\displaystyle{(\mathcal{B}^{ik}) = \left (\begin{array}{cccc} 0 & B_{z} & - B_{y} & - E_{x}/c_{0} \\ - B_{z} & 0 & B_{x} & - E_{y}/c_{0} \\ B_{y} & - B_{x} & 0 & - E_{z}/c_{0} \\ E_{x}/c_{0} & E_{y}/c_{0} & E_{z}/c_{0} & 0 \end{array} \right ),}$$

while its counterpart for the other field components in Maxwell’s equations may be defined as

$$\displaystyle{(\mathcal{H}^{ik}) = \left (\begin{array}{cccc} 0 & H_{z} & - H_{y}& - c_{0}D_{x} \\ - H_{z}& 0 & H_{x} & - c_{0}D_{y} \\ H_{y} & - H_{x}& 0 & - c_{0}D_{z} \\ c_{0}D_{x} & c_{0}D_{y} & c_{0}D_{z} & 0 \end{array} \right ),}$$

where i specifies the row, and k the column. The introduction of these four-vectors and four-tensors allows one to write Maxwell's equations as

$$\displaystyle{ \mathcal{H}^{ik}\vert _{ i} = -\mathcal{J}^{k}, }$$
(2.67)
$$\displaystyle{ {\mathcal{B}^{{\ast}}}^{ik}\vert _{ i} = 0, }$$
(2.68)

so that their form remains the same if a Lorentz transformation from one inertial frame to a different one is performed. This form invariance of physical laws is called covariance. The covariance of Maxwell’s equations implies the constancy of c0 in different inertial frames, since c0 is a scalar quantity, a tensor of rank 0. Because \(\mathcal{B}^{ik}\) and \(\mathcal{H}^{ik}\) are tensors of rank 2, they are transformed according to the transformation rule

$$\displaystyle{\bar{\mathcal{B}}^{ik} =\bar{ a}_{ l}^{i}\bar{a}_{ m}^{k}\mathcal{B}^{lm},\qquad \bar{\mathcal{H}}^{ik} =\bar{ a}_{ l}^{i}\bar{a}_{ m}^{k}\mathcal{H}^{lm}.}$$

Taking the second transformation rule as an example, this may be translated into the matrix equation

$$\displaystyle{(\bar{\mathcal{H}}^{ik}) = (\bar{a}_{ k}^{i}) \cdot (\mathcal{H}^{ik}) \cdot (\bar{a}_{ k}^{i})^{\mathrm{T}}.}$$

A long but straightforward calculation then leads to the transformation laws for the corresponding field components:

$$\displaystyle{ \bar{H}_{x} =\gamma _{v}(H_{x} + vD_{y}), }$$
(2.69)
$$\displaystyle{ \bar{H}_{y} =\gamma _{v}(H_{y} - vD_{x}), }$$
(2.70)
$$\displaystyle{ \bar{H}_{z} = H_{z}, }$$
(2.71)
$$\displaystyle{ \bar{D}_{x} =\gamma _{v}\left (D_{x} - \frac{\beta _{v}} {c_{0}}H_{y}\right ), }$$
(2.72)
$$\displaystyle{ \bar{D}_{y} =\gamma _{v}\left (D_{y} + \frac{\beta _{v}} {c_{0}}H_{x}\right ), }$$
(2.73)
$$\displaystyle{ \bar{D}_{z} = D_{z}. }$$
(2.74)

The generalized form is

$$\displaystyle{ \vec{\bar{H}}_{\perp } =\gamma _{v}\left (\vec{H}_{\perp }-\vec{ v} \times \vec{ D}_{\perp }\right ), }$$
(2.75)
$$\displaystyle{ \vec{\bar{H}}_{\parallel } =\vec{ H}_{\parallel }, }$$
(2.76)
$$\displaystyle{ \vec{\bar{D}}_{\perp } =\gamma _{v}\left (\vec{D}_{\perp } + \frac{\vec{v} \times \vec{ H}_{\perp }} {c_{0}^{2}} \right ), }$$
(2.77)
$$\displaystyle{ \vec{\bar{D}}_{\parallel } =\vec{ D}_{\parallel }. }$$
(2.78)

The remaining transformation laws are obtained analogously:

$$\displaystyle{ \bar{B}_{x} =\gamma _{v}\left (B_{x} + \frac{v} {c_{0}^{2}}E_{y}\right ), }$$
(2.79)
$$\displaystyle{ \bar{B}_{y} =\gamma _{v}\left (B_{y} - \frac{v} {c_{0}^{2}}E_{x}\right ), }$$
(2.80)
$$\displaystyle{ \bar{B}_{z} = B_{z}, }$$
(2.81)
$$\displaystyle{ \bar{E}_{x} =\gamma _{v}(E_{x} -\beta _{v}c_{0}B_{y}), }$$
(2.82)
$$\displaystyle{ \bar{E}_{y} =\gamma _{v}(E_{y} +\beta _{v}c_{0}B_{x}), }$$
(2.83)
$$\displaystyle{ \bar{E}_{z} = E_{z}, }$$
(2.84)
$$\displaystyle{ \vec{\bar{B}}_{\perp } =\gamma _{v}\left (\vec{B}_{\perp }-\frac{\vec{v} \times \vec{ E}_{\perp }} {c_{0}^{2}} \right ), }$$
(2.85)
$$\displaystyle{ \vec{\bar{B}}_{\parallel } =\vec{ B}_{\parallel }, }$$
(2.86)
$$\displaystyle{ \vec{\bar{E}}_{\perp } =\gamma _{v}\left (\vec{E}_{\perp } +\vec{ v} \times \vec{ B}_{\perp }\right ), }$$
(2.87)
$$\displaystyle{ \vec{\bar{E}}_{\parallel } =\vec{ E}_{\parallel }. }$$
(2.88)
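As a quick numerical cross-check of the transformation laws (2.79)–(2.88), one may verify that the well-known Lorentz invariants \(\vec{E}\cdot\vec{B}\) and \(E^{2} - c_{0}^{2}B^{2}\) remain unchanged. The following minimal Python sketch does this for a boost along z; the field values and the relative velocity are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

c0 = 299792458.0                 # speed of light in vacuum (m/s)
v = 0.6 * c0                     # assumed velocity of the frame S-bar along z
beta = v / c0
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# Assumed field values in the frame S (arbitrary numbers for illustration)
E = np.array([1.0e5, -3.0e4, 7.0e4])      # V/m
B = np.array([2.0e-3, 5.0e-4, -1.0e-3])   # T

# Transformation laws (2.79)-(2.84) for a boost along z
E_bar = np.array([gamma * (E[0] - beta * c0 * B[1]),
                  gamma * (E[1] + beta * c0 * B[0]),
                  E[2]])
B_bar = np.array([gamma * (B[0] + v / c0**2 * E[1]),
                  gamma * (B[1] - v / c0**2 * E[0]),
                  B[2]])

# Both invariants agree in S and S-bar up to rounding errors
print(np.dot(E, B), np.dot(E_bar, B_bar))
print(np.dot(E, E) - c0**2 * np.dot(B, B),
      np.dot(E_bar, E_bar) - c0**2 * np.dot(B_bar, B_bar))
```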

In the scope of this book, there is no need to develop the theory further. Nor do we discuss such standard effects as time dilation, Lorentz contraction, and the transformation of velocities. However, we need some relativistic formulas for mechanics.

The definition of the Lorentz force (1.1) is valid in special relativity—it corresponds to a covariant equation (the charge Q is invariant; it is a scalar quantity).

Also, the equation

$$\displaystyle{\fbox{$\vec{F} = \frac{\mathrm{d}\vec{p}} {\mathrm{d}t}$}}$$

with the momentum definition

$$\displaystyle{\fbox{$\vec{p} = m\;\vec{u}$}}$$

based on the velocity

$$\displaystyle{\fbox{$\vec{u} = \frac{\mathrm{d}\vec{r}} {\mathrm{d}t}$}}$$

still holds. However, the mass m is not invariant. Only the rest mass m0 is a tensor of rank zero, i.e., a scalar:

$$\displaystyle{\fbox{$m = \frac{m_{0}} {\sqrt{1 - \frac{u^{2 } } {c_{0}^{2}}} } = m_{0}\gamma _{u}.$}}$$

Please note that we strictly distinguish between the velocities v and u and also between the related Lorentz factors. The velocity \(\vec{v} = v\;\vec{e}_{z}\) is defined as the relative velocity of the inertial frame \(\bar{S}\) with respect to the inertial frame S, i.e., the velocity between these two reference frames. The velocity \(\vec{u}\) is the velocity of a particle measured in the first inertial frame S. Consequently, \(\vec{\bar{u}}\) is the velocity of the same particle in the inertial frame \(\bar{S}\). If it is clear what is meant by a certain Lorentz factor, one may, of course, omit the subscript.

The total energy of a particle with velocity \(\vec{u}\) is given by

$$\displaystyle{ \fbox{$W_{\mathrm{tot}} = m\;c_{0}^{2} =\gamma _{ u}m_{0}c_{0}^{2}.$} }$$
(2.89)

Consequently, the rest energy is obtained for \(\vec{u} = 0\), which leads to γu = 1:

$$\displaystyle{\fbox{$W_{\mathrm{rest}} = m_{0}c_{0}^{2}.$}}$$

Therefore, the kinetic energy is

$$\displaystyle{\fbox{$W_{\mathrm{kin}} = W_{\mathrm{tot}} - W_{\mathrm{rest}} = m_{0}c_{0}^{2}(\gamma _{ u} - 1).$}}$$

Using the Lorentz factors, one may write the momentum in the form

$$\displaystyle{\vec{p} = m\vec{u} = m_{0}c_{0}\vec{\beta }_{u}\gamma _{u},}$$

leading to the absolute value

$$\displaystyle{ p = mu = m_{0}c_{0}\beta _{u}\gamma _{u}. }$$
(2.90)

Here we used the definition

$$\displaystyle{ \vec{\beta }_{u} = \frac{\vec{u}} {c_{0}}. }$$
(2.91)

If we have a look at Eqs. (2.89)–(2.91), we observe that γ is related to the energy, the product β γ to the momentum, and β to the velocity. It is often helpful to keep this correspondence in mind when complicated expressions containing a large number of Lorentz factors are evaluated. One should also keep in mind that as soon as one of the expressions β, γ, β γ is known, the others are automatically fixed as well.

This is why we can also convert expressions for relative deviations into each other. For example, we may calculate the time derivative of

$$\displaystyle{ \gamma = \frac{1} {\sqrt{1 -\beta ^{2}}} }$$
(2.92)

as follows:

$$\displaystyle\begin{array}{rcl} & \dot{\gamma }= - \frac{-2\beta \dot{\beta }} {2\left (1-\beta ^{2}\right )^{3/2}} =\beta \gamma ^{3}\dot{\beta }& {}\\ & \Rightarrow \frac{\dot{\gamma }}{\gamma } =\beta ^{2}\gamma ^{2}\frac{\dot{\beta }} {\beta }. & {}\\ \end{array}$$

Here we can use the relation

$$\displaystyle{ \gamma ^{2} -\beta ^{2}\gamma ^{2} = 1, }$$
(2.93)

which follows directly from Eq. (2.92):

$$\displaystyle{\frac{\dot{\gamma }} {\gamma } = (\gamma ^{2} - 1)\frac{\dot{\beta }} {\beta }.}$$

Expressions of this type are very helpful, because they can be translated as follows:

$$\displaystyle{\frac{\Delta W_{\mathrm{tot}}} {W_{\mathrm{tot}}} = (\gamma ^{2} - 1)\frac{\Delta u} {u}.}$$

This conversion is possible if the relative change in the quantities is sufficiently small. In the example presented here, one can see directly that a velocity deviation of 1% is transformed into an energy deviation of 3% if γ = 2 holds.

As a second example, we can calculate the time derivative of Eq. (2.93):

$$\displaystyle\begin{array}{rcl} & 2\gamma \dot{\gamma } - 2(\beta \gamma )\frac{\mathrm{d}(\beta \gamma )} {\mathrm{d}t} = 0& {}\\ & \Rightarrow \frac{\dot{\gamma }}{\gamma } =\beta ^{2} \frac{1} {(\beta \gamma )} \frac{\mathrm{d}(\beta \gamma )} {\mathrm{d}t}.& {}\\ \end{array}$$

This can be translated into

$$\displaystyle{\frac{\Delta W_{\mathrm{tot}}} {W_{\mathrm{tot}}} =\beta ^{2}\frac{\Delta p} {p}.}$$

Relations like these are summarized in Table 2.3.

Table 2.3 Conversion of relative deviations
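The two conversions derived above (and the further relations of this type collected in Table 2.3) are easily evaluated numerically. The following minimal Python sketch propagates an assumed relative velocity deviation and an assumed relative momentum deviation into the corresponding relative deviation of the total energy; all numerical values are assumptions chosen for illustration.

```python
import numpy as np

gamma = 2.0                         # assumed Lorentz factor
beta = np.sqrt(1.0 - 1.0 / gamma**2)

du_over_u = 0.01                    # assumed relative velocity deviation (1 %)
dp_over_p = 0.01                    # assumed relative momentum deviation (1 %)

# Delta W_tot / W_tot = (gamma^2 - 1) * Delta u / u
dW_from_velocity = (gamma**2 - 1.0) * du_over_u

# Delta W_tot / W_tot = beta^2 * Delta p / p
dW_from_momentum = beta**2 * dp_over_p

print(dW_from_velocity)   # 0.03, i.e., 3 % energy deviation as stated in the text
print(dW_from_momentum)   # 0.0075 for gamma = 2 (beta^2 = 0.75)
```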

In accelerator physics and engineering, specific units that contain the elementary charge e are often used to specify the energy of the beam. This is due to the fact that the energy that is gained by a charge \(Q = z_{q}e\) is given by formula (1.2),

$$\displaystyle{\Delta W = QV = z_{q}eV.}$$

An electron that passes a voltage of V = 1 kV will therefore lose or gain an energy of 1 keV, depending on the orientation of the voltage. We have only to insert the quantities into the formula without converting e into SI units. In order to convert an energy that is given in eV into SI units, one simply has to insert \(e = 1.6022 \cdot 10^{-19}\,\mathrm{C}\), so that \(1\,\mathrm{eV} = 1.6022 \cdot 10^{-19}\,\mathrm{J}\) holds.

Also, the rest energy of particles is often specified in eV. For example, the electron rest mass \(m_{\mathrm{e}} = 9.1094 \cdot 10^{-31}\,\mathrm{kg}\) corresponds to an energy of 510.999 keV.

As we saw above, the energy directly determines the Lorentz factors and the velocity. Therefore, it is desirable to specify the energy in a unit that directly corresponds to a certain velocity. Due to

$$\displaystyle{W_{\mathrm{kin}} = mc_{0}^{2} - m_{ 0}c_{0}^{2} = m_{ 0}c_{0}^{2}(\gamma -1),}$$

a kinetic energy of 1 MeV leads to different values for γ if different particle rest masses m0 are considered. This is why one introduces another energy unit for ions. An ion with mass number A has rest mass

$$\displaystyle{m_{0} = A_{\mathrm{r}}m_{\mathrm{u}},}$$

where \(m_{\mathrm{u}} = 1.66054 \cdot 10^{-27}\,\mathrm{kg}\) denotes the unified atomic mass unit (as mentioned below, Ar differs slightly from A). Therefore, one obtains

$$\displaystyle{W_{\mathrm{kin,u}} = \frac{W_{\mathrm{kin}}} {A_{\mathrm{r}}} = m_{\mathrm{u}}c_{0}^{2}(\gamma -1).}$$

If the value on the right-hand side is specified now, γ is determined in a unique way, since mu and c0 are global constants. As an example, an ion beam with a kinetic energy of 11.4 MeV∕u corresponds to γ = 1.0122386 and β = 0.15503. We do not need to specify the ion species.
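This conversion from an energy per nucleon to γ and β can be reproduced with a few lines of Python. The following sketch uses only the constants quoted in the text; the rest energy of one atomic mass unit (about 931.5 MeV) is computed from them, and the variable names are not taken from the text.

```python
import numpy as np

m_u = 1.66054e-27          # unified atomic mass unit (kg)
c0 = 299792458.0           # speed of light (m/s)
e = 1.6022e-19             # elementary charge (C), used to convert J to eV

W_kin_per_u_MeV = 11.4     # kinetic energy per nucleon (MeV/u), example from the text

m_u_c2_MeV = m_u * c0**2 / e / 1e6           # rest energy of one mass unit in MeV
gamma = 1.0 + W_kin_per_u_MeV / m_u_c2_MeV   # from W_kin,u = m_u c0^2 (gamma - 1)
beta = np.sqrt(1.0 - 1.0 / gamma**2)

print(m_u_c2_MeV)   # approx. 931.5
print(gamma)        # approx. 1.0122386
print(beta)         # approx. 0.15503
```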

Ions are usually specified by the notation

$$\displaystyle{\text{}_{\mathrm{Z}}^{\mathrm{A}}\mathrm{Element}^{\mathrm{z}_{\mathrm{q}}+}.}$$

Here A is the (integer) mass number, i.e., the number of nucleons (protons plus neutrons); Z is the atomic number, which equals the number of protons and identifies the element. For example,

$$\displaystyle{\text{}_{92}^{238}\mathrm{U}^{28+}}$$

indicates a uranium ion that has \(A - Z = 146\) neutrons. Different uranium isotopes exist with a different number of neutrons. The number of protons, however, is the same for all these isotopes. Therefore, Z is redundant information that is already included in the element name. In the last example, the uranium atom has obviously lost 28 of its 92 electrons, leading to the charge number zq = 28.

The unified atomic mass unit mu is defined as 1∕12 of the mass of the \(\text{}_{6}^{12}\mathrm{C}\) atom. For different ion species and isotopes, the mass is not exactly an integer multiple of mu (reasons: different masses of protons and neutrons, relativistic mass defect due to binding energy). For \(^{238}\mathrm{U}\), for example, one has Ar = 238.050786, which approximately equals A = 238.

2.8 Nonlinear Dynamics

A continuous dynamical system of first order may be described by the following first-order ordinary differential equation (ODE):

$$\displaystyle{\frac{\mathrm{d}x} {\mathrm{d}t} = v(x,t).}$$

The state of a dynamical system of order n is represented by the values of n variables x1, x2, …, xn, which may be combined into a vector \(\vec{r} = (x_{1},x_{2},\ldots,x_{n})\). Hence, a dynamical system of order n is described by the system of ordinary differential equations

$$\displaystyle{ \fbox{$\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r},t).$} }$$
(2.94)

One should note that the system of ODEs is still of order 1, but of dimension n. Such a system is called autonomous when \(\vec{v}(\vec{r},t)\) does not depend on the time t, i.e., when

$$\displaystyle{\vec{v}(\vec{r},t) =\vec{ v}(\vec{r})}$$

holds. The next sections will show that Eq. (2.94), which may look very simple at first sight, includes a huge variety of problems.

2.8.1 Equivalence of Differential Equations and Systems of Differential Equations

Let us consider the nth-order linear ordinary differential equation

$$\displaystyle{\frac{\mathrm{d}^{n}y} {\mathrm{d}t^{n}} + a_{n-1}(t)\frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}} + \cdots + a_{1}(t)\frac{\mathrm{d}y} {\mathrm{d}t} + a_{0}(t)y(t) = b(t)}$$

with dimension 1. One sees that by means of the definitions

$$\displaystyle\begin{array}{rcl} x_{1}& =& y, {}\\ x_{2}& =& \frac{\mathrm{d}y} {\mathrm{d}t}, {}\\ & \ldots & {}\\ x_{n}& =& \frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}}, {}\\ \end{array}$$

it may be converted into the form

$$\displaystyle\begin{array}{rcl} \dot{x}_{1}& =& x_{2}, {}\\ \dot{x}_{2}& =& x_{3}, {}\\ & \ldots & {}\\ \dot{x}_{n}& =& b(t) - a_{0}(t)x_{1} - a_{1}(t)x_{2} -\cdots - a_{n-1}(t)x_{n}, {}\\ \end{array}$$

which is equivalent to the standard form

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r},t).}$$

If b and all the ak do not depend on time (ODE of order n with constant coefficients), then \(\vec{v}\) will also not depend on time explicitly, so that an autonomous system is present.

Although the vector field \(\vec{v}\) is called a velocity function, it does not always correspond to a physical velocity. As already mentioned, the variable t is not necessarily the physical time. However, we will use this notation because the reader may always interpret these variables in terms of the mechanical analogy, which may help to understand the physical background.

The above-mentioned equivalence is also valid for nonlinear ODEs of the form

$$\displaystyle{\frac{\mathrm{d}^{n}y} {\mathrm{d}t^{n}} = F\left (t,y, \frac{\mathrm{d}y} {\mathrm{d}t},\ldots, \frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}} \right ),}$$

where F ∈ C1. Also here, we may use

$$\displaystyle\begin{array}{rcl} x_{1}& =& y, {}\\ x_{2}& =& \frac{\mathrm{d}y} {\mathrm{d}t}, {}\\ & \ldots & {}\\ x_{n}& =& \frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}}, {}\\ \end{array}$$

to obtain the standard form

$$\displaystyle\begin{array}{rcl} \dot{x}_{1}& =& x_{2}, {}\\ \dot{x}_{2}& =& x_{3}, {}\\ & \ldots & {}\\ \dot{x}_{n}& =& F(t,x_{1},x_{2},\ldots,x_{n}) {}\\ \end{array}$$
$$\displaystyle{\Leftrightarrow \frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r},t).}$$

An autonomous system results when F and \(\vec{v}\) do not explicitly depend on the time variable t.
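Numerical ODE solvers usually expect exactly this standard form as input. As a minimal sketch of the reduction (the damped oscillator used as the example is an assumption for illustration), the second-order equation \(\ddot{y} + a_{1}\dot{y} + a_{0}y = 0\) is rewritten with x1 = y, x2 = ẏ as a two-dimensional first-order system:

```python
import numpy as np

a0, a1 = 2.0, 0.5   # assumed constant coefficients of y'' + a1*y' + a0*y = 0

def v(r, t):
    """Velocity function of the equivalent first-order system dr/dt = v(r, t)."""
    x1, x2 = r                                  # x1 = y, x2 = dy/dt
    return np.array([x2, -a0 * x1 - a1 * x2])   # (x1', x2') = (y', y'')

r0 = np.array([1.0, 0.0])   # initial conditions y(0) = 1, y'(0) = 0
print(v(r0, 0.0))           # [ 0. -2.]
```

Since the coefficients are constant, v does not depend on t explicitly, i.e., the resulting system is autonomous.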

2.8.2 Autonomous Systems

Hereinafter, we will consider only autonomous systems if a time dependence is not stated explicitly.

2.8.2.1 Time Shift

An advantage of autonomous systems is the fact that if a solution y(t) of

$$\displaystyle{\frac{\mathrm{d}^{n}y} {\mathrm{d}t^{n}} = F\left (y, \frac{\mathrm{d}y} {\mathrm{d}t},\ldots, \frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}} \right )}$$

is known, then \(z(t) = y(t - T)\) will also be a solution if T is a constant time shift. This can be shown as follows:

The solution y(t) is the first component of the vector \(\vec{r}(t)\) that satisfies the differential equation

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}).}$$

Therefore, z(t) is the first component of the vector

$$\displaystyle{\vec{r}_{\mathrm{shift}}(t) =\vec{ r}(t - T).}$$

We obtain

$$\displaystyle{\frac{\mathrm{d}\vec{r}_{\mathrm{shift}}} {\mathrm{d}t} = \left.\frac{\mathrm{d}\vec{r}} {\mathrm{d}t}\right \vert _{t-T} = \left.\vec{v}(\vec{r})\right \vert _{t-T} =\vec{ v}(\vec{r}_{\mathrm{shift}}).}$$

One sees that \(\vec{r}_{\mathrm{shift}}(t)\) satisfies the system of ODEs in the same way as \(\vec{r}(t)\) does. Due to the equivalence with the differential equation of order n, z(t) will be a solution as well.

This explains, for instance, why sin(ω t) must be a solution of the homogeneous differential equation

$$\displaystyle{\ddot{y} +\omega ^{2}y = 0}$$

if one knows that cos(ω t) is a solution. This ODE is autonomous, and these two solutions differ only by a time shift.

2.8.2.2 Phase Space

The phase space may be defined as the continuous space of all possible states of a dynamical system. In our case, the dynamical system is described by an autonomous system of ordinary differential equations.

The graphs of the solutions \(\vec{r}(t)\) of the differential equation are the integral curves or solution curves in the n-dimensional phase space. Such an integral curve contains the dependence on the parameter t (which is usually but not necessarily the time). A different parameterization therefore leads to a different integral curve.

The set of all image points of the map \(t\mapsto \vec{r}(t)\) is called the orbit. An orbit does not contain dependence on the parameter t. A different parameterization therefore leads to the same orbit, since the same image points are obtained simply by a different value of the parameter t.

Different orbits of an autonomous system are often drawn in a phase portrait, which may be defined as the set of all orbits.

2.8.3 Existence and Uniqueness of the Solution of Initial Value Problems

The standard form

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r})}$$

has the advantage that it can be solved numerically according to the (explicit) Euler method:

$$\displaystyle\begin{array}{rcl} & \vec{r}_{k+1} =\vec{ r}_{k} + \Delta t \cdot \vec{ v}(\vec{r}_{k})& {}\\ & t_{k} = t_{0} + k\;\Delta t. & {}\\ \end{array}$$

It is obvious that by defining the initial condition

$$\displaystyle{\vec{r}_{0} =\vec{ r}(t_{0}),}$$

the states

$$\displaystyle{\vec{r}_{k} \approx \vec{ r}(t_{k}) =\vec{ r}(t_{0} + k\Delta t)}$$

of the system at different times can be derived iteratively for k > 0 (\(k \in \mathbb{N}\)). The states of the system may be calculated for both future times t > t0 and past times t < t0 in a unique way by selecting the sign of \(\Delta t\). However, this is possible only in a certain neighborhood around t0, as we will see in the next sections.

Defining \(\vec{r}_{0}\) obviously amounts to specifying the n scalar initial conditions that are required to make the solution unique.
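The Euler iteration written above translates into code almost verbatim. The following sketch is a minimal implementation; the velocity function, the step size, and the number of steps are assumptions for illustration, and in practice the step size must be chosen small enough for the required accuracy.

```python
import numpy as np

def v(r):
    """Autonomous velocity function; a damped oscillator is used as an assumed example."""
    return np.array([r[1], -2.0 * r[0] - 0.5 * r[1]])

def explicit_euler(v, r0, t0, dt, steps):
    """Iterate r_{k+1} = r_k + dt * v(r_k); a negative dt integrates into the past."""
    r, t = np.array(r0, dtype=float), t0
    trajectory = [(t, r.copy())]
    for _ in range(steps):
        r = r + dt * v(r)
        t = t + dt
        trajectory.append((t, r.copy()))
    return trajectory

traj = explicit_euler(v, r0=[1.0, 0.0], t0=0.0, dt=1e-3, steps=5000)
print(traj[-1])   # approximate state r(5) of the system
```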

2.8.3.1 Existence of a Local Solution

The existence of a solution is ensured by the following theorem:

Theorem 2.1 (Peano).

Consider an initial value problem

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r},t)\qquad \vec{r}(t_{0}) =\vec{ r}_{0}}$$

with a continuous \(\vec{v}: D \rightarrow \mathbb{R}^{n}\) on an open set \(D \subset \mathbb{R}^{n+1}\) . Then there exists \(\alpha (\vec{r}_{0},t_{0}) > 0\) such that the initial value problem has at least one solution in the interval \([t_{0}-\alpha,t_{0}+\alpha ]\) .

(See Aulbach [29, Theorem 2.2.3].)

Remark.

We may easily see that \(\vec{v}\) must be continuous. If we choose \(v = \Theta (t)\) (Heaviside step function) in the one-dimensional case, we immediately see that the derivative \(\frac{\mathrm{d}r} {\mathrm{d}t}\) is not defined at t = 0. Therefore, in the scope of classical analysis, we have to exclude functions that are not continuous. In the scope of distribution theory, the solution \(r = t\;\Theta (t)\) is obvious.

2.8.3.2 Uniqueness of a Local Solution

Uniqueness can be ensured if the vector field \(\vec{v}\) satisfies a Lipschitz condition or if it is continuously differentiable.

Definition 2.2.

The vector function \(\vec{v}(\vec{r},t): D \rightarrow \mathbb{R}^{n}\) (\(D \subset \mathbb{R}^{n+1}\) open) is said to satisfy a global Lipschitz condition on D with respect to \(\vec{r}\) if there is a constant K > 0 such that for all \((\vec{r}_{1},t),(\vec{r}_{2},t) \in D\), the condition

$$\displaystyle{\left \|\vec{v}(\vec{r}_{1},t) -\vec{ v}(\vec{r}_{2},t)\right \| \leq K\left \|\vec{r}_{1} -\vec{ r}_{2}\right \|}$$

holds. Instead of saying that a function satisfies a global Lipschitz condition, one also speaks of a function that is Lipschitz continuous.

(Cf. Aulbach [29, Definition 2.3.5] and Perko [30, p. 71, Definition 2].)

Definition 2.3.

The vector function \(\vec{v}(\vec{r},t): D \rightarrow \mathbb{R}^{n}\) (\(D \subset \mathbb{R}^{n+1}\) open) is said to satisfy a local Lipschitz condition on D with respect to \(\vec{r}\) if for each \((\vec{r}_{0},t_{0}) \in D\), there exist a neighborhood \(U_{(\vec{r}_{0},t_{0})} \subset D\) of \((\vec{r}_{0},t_{0})\) and a constant K > 0 such that for all \((\vec{r}_{1},t),(\vec{r}_{2},t) \in U_{(\vec{r}_{0},t_{0})}\), the condition

$$\displaystyle{\left \|\vec{v}(\vec{r}_{1},t) -\vec{ v}(\vec{r}_{2},t)\right \| \leq K\left \|\vec{r}_{1} -\vec{ r}_{2}\right \|}$$

holds. Instead of saying that a function satisfies a local Lipschitz condition, one also speaks of a function that is locally Lipschitz continuous.

(Cf. Aulbach [29, Definition 2.3.5], Wirsching [31, Definition 3.4], and Perko [30, p. 71, Definition 2].)

In other words, the function satisfies a local Lipschitz condition if for every point, we can find a neighborhood such that a “global” Lipschitz condition holds in that neighborhood.

Example.

The function \(f(x) = x^{2}\) is locally Lipschitz continuous, but it is not Lipschitz continuous.

Theorem 2.4 (Picard–Lindelöf).

Consider the initial value problem

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r},t)\qquad \vec{r}(t_{0}) =\vec{ r}_{0}}$$

with continuous \(\vec{v}: D \rightarrow \mathbb{R}^{n}\) ( \(D \subset \mathbb{R}^{n+1}\) open). Suppose that the vector function \(\vec{v}(\vec{r},t)\) is locally Lipschitz continuous with respect to \(\vec{r}\) . Then there exists \(\alpha (\vec{r}_{0},t_{0}) > 0\) such that the initial value problem has a unique solution in the interval \([t_{0}-\alpha,t_{0}+\alpha ]\) .

(See Aulbach [29, Theorem 2.3.7].)

Every locally Lipschitz continuous function is also continuous.

Every continuously differentiable function satisfies a local Lipschitz condition, i.e., is locally Lipschitz continuous (Aulbach [29, p. 77], Arnold [32, p. 279], Perko [30, lemma on p. 71]).

Therefore, the Picard–Lindelöf theorem may simply be rewritten for continuously differentiable functions instead of locally Lipschitz continuous functions (Perko [30, p. 74]: “The Fundamental Existence-Uniqueness Theorem,” Guckenheimer/Holmes [33, Theorem 1.0.1]).

2.8.3.3 Maximal Interval of Existence

One may try to make the solution interval larger by using the endpoint of the solution interval as a new initial condition. If this strategy is executed iteratively, one obtains the maximal interval of existence. It is an open interval (cf. [30, p. 89, Theorem 1]). The maximal interval of existence does not necessarily correspond to the full real time axis. Further requirements are necessary to ensure this.

2.8.3.4 Global Solution

A continuously differentiable vector field \(\vec{v}\) is called complete if it induces a global flow, i.e., if its integral curves are defined for all \(t \in \mathbb{R}\).

Every differentiable vector field with compact support is complete.

The following theorem shows that certain restrictions on the “velocity” \(\vec{v}(\vec{r})\) are sufficient for completeness:

Theorem 2.5.

Let the vector function \(\vec{v}(\vec{r})\) with \(\vec{v}: D \rightarrow \mathbb{R}^{n}\) ( \(D \subset \mathbb{R}^{n}\) open) be continuously differentiable and linearly bounded with K,L ≥ 0:

$$\displaystyle{\left \|\vec{v}(\vec{r})\right \| \leq K\left \|\vec{r}\right \| + L.}$$

Then the initial value problem

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r})\qquad \vec{r}(t_{0}) =\vec{ r}_{0}}$$

has a global flow.

(Cf. Zehnder [34, Proposition IV.3, p. 130], special form of Theorem 2.5.6, Aulbach [29].)

According to Amann [35, Theorem 7.8], the solution will then be bounded for finite time intervals.

Like many other authors, Perko [30, p. 188, Theorem 3] requires that \(\vec{v}(\vec{r})\) satisfy a global Lipschitz condition

$$\displaystyle{\left \|\vec{v}(\vec{r}_{1}) -\vec{ v}(\vec{r}_{2})\right \| \leq K\left \|\vec{r}_{1} -\vec{ r}_{2}\right \|}$$

for arbitrary \(\vec{r}_{1},\vec{r}_{2} \in \mathbb{R}^{n}\). For \(\vec{r}_{2} = 0\), this leads to linear boundedness, as one may show by means of the reverse triangle inequality, but it is a stronger condition.

Example.

The ODE

$$\displaystyle{\dot{y} = 1 + y^{2}}$$

is obviously satisfied for

$$\displaystyle{y =\tan \; t = \frac{\sin t} {\cos t}\qquad \dot{y} = \frac{\cos ^{2}t +\sin ^{2}t} {\cos ^{2}t} = 1 +\tan ^{2}t.}$$

This solution may be found by separation of variables. An arbitrary initial condition y(0) = y0 may be satisfied if the shifted solution

$$\displaystyle{y =\tan (t+\tau )}$$

is considered. In any case, however, the solution curve reaches infinity while t is still finite. The “vector” field \(v(y) = 1 + y^{2}\) is not complete, and it is obviously not linearly bounded.
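The finite escape time can also be observed numerically. The following sketch integrates ẏ = 1 + y² with the explicit Euler method starting from y(0) = 0 (so that the exact solution is y = tan t) and shows the rapid growth as t approaches π∕2; the step size is an assumption for illustration.

```python
import numpy as np

dt, y, t = 1e-5, 0.0, 0.0   # y(0) = 0, so the exact solution is y = tan(t)
while t < 1.5:              # pi/2 = 1.5708... is the finite escape time
    y = y + dt * (1.0 + y * y)
    t = t + dt

print(t, y, np.tan(t))      # y has already grown to order 10 shortly before t = pi/2
```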

If we simplify the results of this section, we may summarize them as follows:

  • The existence of a local solution is ensured by continuity of \(\vec{v}\).

  • Local Lipschitz continuity ensures uniqueness of the solution. If \(\vec{v}\) is continuously differentiable, uniqueness is also guaranteed.

  • If linear boundedness of \(\vec{v}\) is required in addition, a global solution/global flow exists.

For the sake of simplicity, we will consider only complete vector fields in the following.

2.8.3.5 Linear Systems of Ordinary Differential Equations

For linear systems of differential equations with

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} = A \cdot \vec{ r},\qquad \vec{r}(t_{0}) =\vec{ r}_{0},}$$

where A is a quadratic matrix with real constant elements, we may use the matrix norm:

$$\displaystyle{\left \|\vec{v}(\vec{r})\right \| = \left \|A \cdot \vec{ r}\right \| \leq \left \|A\right \|\left \|\vec{r}\right \| = K\left \|\vec{r}\right \|.}$$

Therefore, the conditions of Theorem 2.5 are satisfied, and a unique solution with a global flow exists. One may specifically use the Frobenius norm

$$\displaystyle{\left \|A\right \|_{\mathrm{F}} = \sqrt{\sum _{i=1 }^{n }\sum _{k=1 }^{n }\vert a_{ik } \vert ^{2}},}$$

which is compatible with the Euclidean norm

$$\displaystyle{\left \|\vec{r}\right \| = \sqrt{\sum _{i=1 }^{n }\vert r_{i } \vert ^{2}}}$$

of a vector, so that

$$\displaystyle{\left \|A \cdot \vec{ r}\right \| \leq \left \|A\right \|_{\mathrm{F}}\left \|\vec{r}\right \|}$$

holds.
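The compatibility of the Frobenius norm with the Euclidean norm, and hence the linear boundedness of \(\vec{v}(\vec{r}) = A \cdot \vec{r}\), can be spot-checked numerically; the matrix and the vector in the following sketch are random, i.e., pure assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))   # assumed real constant matrix
r = rng.normal(size=4)        # assumed state vector

lhs = np.linalg.norm(A @ r)                           # Euclidean norm of A.r
rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(r)    # Frobenius norm times Euclidean norm

print(lhs <= rhs)   # True: linear boundedness with K = ||A||_F and L = 0
```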

2.8.4 Orbits

Two distinct orbits of an autonomous system do not intersect. In order to prove this, we assume the contrary. Suppose that two distinct orbits defined by \(\vec{r}_{1}(t)\) and \(\vec{r}_{2}(t)\) intersect according to

$$\displaystyle{\vec{r}_{1}(t_{1}) =\vec{ r}_{2}(t_{2}).}$$

Please note that the intersection point may be reached for different values t1 and t2 of the parameter t, since we require only that the orbits (i.e., the images of the solution curves) intersect. As shown in Sect. 2.8.2.1,

$$\displaystyle{\vec{r}_{\mathrm{shift}}(t) =\vec{ r}_{1}(t + t_{1} - t_{2})}$$

is also a solution of the differential equation. Therefore, we have

$$\displaystyle{\vec{r}_{\mathrm{shift}}(t_{2}) =\vec{ r}_{1}(t_{1}) =\vec{ r}_{2}(t_{2}).}$$

Hence \(\vec{r}_{\mathrm{shift}}(t)\) and \(\vec{r}_{2}(t)\) satisfy the same initial conditions at the time t2. This means that the solution curves \(\vec{r}_{\mathrm{shift}}(t)\) and \(\vec{r}_{2}(t)\) are identical.

Since \(\vec{r}_{1}(t)\) is simply time-shifted with respect to \(\vec{r}_{\mathrm{shift}}(t) =\vec{ r}_{2}(t)\), the images, i.e., the orbits, will be identical. This means that two orbits are completely equal if they have one point in common.

In other words, each point of phase space is crossed by only one orbit.

2.8.5 Fixed Points and Stability

Vectors \(\vec{r} =\vec{ r}_{\mathrm{F}}\) for which

$$\displaystyle{\vec{v}(\vec{r}) = 0}$$

holds are called fixed points (or equilibrium points or stationary points or critical points) of the dynamical system given by

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}).}$$

This nomenclature is obvious, since a particle that is initially located at

$$\displaystyle{\vec{r}(t_{0}) =\vec{ r}_{\mathrm{F}}}$$

will stay there forever:

$$\displaystyle{\vec{r}(t) =\vec{ r}_{\mathrm{F}}\mbox{ for }t > t_{0}.}$$

Definition 2.6.

A fixed point of an autonomous dynamical system is called an isolated fixed point or a nondegenerate fixed point if a neighborhood of the fixed point exists that does not contain any other fixed points.

(Cf. Sastry [36, Definition 1.4, p. 13], Perko [30, Definition 2, p. 173].)

We now define the stability of fixed points according to Lyapunov.

Definition 2.7.

A fixed point is called stable if for every neighborhood U of \(\vec{r}_{\mathrm{F}}\), another neighborhood V ⊂ U of \(\vec{r}_{\mathrm{F}}\) exists such that a trajectory starting in V at t = t0 will remain in U for all t ≥ t0 (see Fig. 2.4). Otherwise, the fixed point is called unstable.

Fig. 2.4 Stability (left) and asymptotic stability (right) of a fixed point

Please note that it is usually necessary to choose V smaller than U, because the shape of the orbit may cause the trajectory to leave U for some starting points in U even if \(\vec{r}_{\mathrm{F}}\) is stable.

Definition 2.8.

A stable fixed point \(\vec{r}_{\mathrm{F}}\) is called asymptotically stable if a neighborhood U of \(\vec{r}_{\mathrm{F}}\) exists such that for every trajectory that starts at t = t0 in U, the following equation holds:

$$\displaystyle{\lim _{t\rightarrow \infty }\vec{r}(t) =\vec{ r}_{\mathrm{F}}.}$$

(See, e.g., Perko [30, Definition 1, p. 129].)

Definition 2.9.

A function \(L(\vec{r})\) with L ∈ C1 and \(L: U \rightarrow \mathbb{R}\) (\(U \subset \mathbb{R}^{n}\) open) is called a Lyapunov function for the fixed point \(\vec{r}_{\mathrm{F}}\) of the autonomous system

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r})\qquad \vec{v} \in C^{1}(D)\qquad D \subset \mathbb{R}^{n}\mbox{ open}}$$

if

$$\displaystyle{L(\vec{r}_{\mathrm{F}}) = 0}$$

and

$$\displaystyle\begin{array}{rcl} & L(\vec{r}) > 0\mbox{ for }\vec{r} \in U\setminus \{\vec{r}_{\mathrm{F}}\}, & {}\\ & \dot{L} =\vec{ v} \cdot \mathrm{ grad}\;L \leq 0\mbox{ for }\vec{r} \in U\setminus \{\vec{r}_{\mathrm{F}}\}& {}\\ \end{array}$$

hold in a neighborhood U ⊂ D of \(\vec{r}_{\mathrm{F}}\).

A Lyapunov function is called a strict Lyapunov function if

$$\displaystyle{\dot{L} =\vec{ v} \cdot \mathrm{ grad}\;L < 0\mbox{ for }\vec{r} \in U\setminus \{\vec{r}_{\mathrm{F}}\}}$$

holds.

(Cf. Perko [30, p. 131, Theorem 3], La Salle/Lefschetz [37, Sect. 8], Guckenheimer/Holmes [33, Theorem 1.0.2].)

Theorem 2.10.

If a Lyapunov function for a fixed point \(\vec{r}_{\mathrm{F}}\) of an autonomous system exists, then this fixed point \(\vec{r}_{\mathrm{F}}\) is stable. If a strict Lyapunov function exists, then this fixed point \(\vec{r}_{\mathrm{F}}\) is asymptotically stable.

(Cf. Perko [30, p. 131, Theorem 3].)

It is easy to see that this theorem is valid. For two-dimensional systems with the particle trajectory \(\vec{r}(t) = x(t)\;\vec{e}_{x} + y(t)\;\vec{e}_{y}\), we obtain, for example,

$$\displaystyle{\dot{L} = \frac{\mathrm{d}L} {\mathrm{d}t} = \frac{\partial L} {\partial x} \frac{\mathrm{d}x} {\mathrm{d}t} + \frac{\partial L} {\partial y} \frac{\mathrm{d}y} {\mathrm{d}t} =\vec{ v} \cdot \mathrm{ grad}\;L.}$$

If this expression is negative, the strict Lyapunov function will decrease while the particle continues on its path. Since the minimum of the Lyapunov function is obtained for \(\vec{r}_{\mathrm{F}}\), it is clear that the particle will move toward the fixed point.

Similar reasoning applies for a Lyapunov function that is not strict. In this case, the particle cannot move away from the fixed point, because the Lyapunov function does not increase. However, it will not necessarily get closer to the fixed point.
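As an elementary illustration of Definition 2.9 and Theorem 2.10, consider the damped oscillator ẋ = y, ẏ = −x − cy with c > 0 (an assumed example, not taken from the text). For L = (x² + y²)∕2 one has L(0) = 0, L > 0 elsewhere, and \(\dot{L} =\vec{v}\cdot\mathrm{grad}\,L = -cy^{2} \leq 0\), so the origin is a stable fixed point. The following sketch evaluates \(\dot{L}\) on a grid to confirm that it is nowhere positive.

```python
import numpy as np

c = 0.3   # assumed damping constant, c > 0

def v(r):
    """Vector field of the damped oscillator: x' = y, y' = -x - c*y."""
    x, y = r
    return np.array([y, -x - c * y])

def L_dot(r):
    """Time derivative of L = (x^2 + y^2)/2 along the flow: v . grad L = -c*y**2."""
    grad_L = np.array([r[0], r[1]])     # grad L = (x, y)
    return np.dot(v(r), grad_L)

grid = np.linspace(-2.0, 2.0, 41)
worst = max(L_dot(np.array([x, y])) for x in grid for y in grid)
print(worst)   # 0.0, attained on the line y = 0; L_dot is nowhere positive
```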

2.8.6 Flows of Linear Autonomous Systems

Having shown above that a linear autonomous system possesses a global flow, we shall now compute this flow. If an autonomous system of order n is linear, we may describe it by

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r})}$$

with

$$\displaystyle{\vec{v}(\vec{r}) = A \cdot \vec{ r},}$$

where A is a quadratic n × n matrix with real constant elements. The ansatz

$$\displaystyle{\vec{r} =\vec{ w}\;e^{\lambda t}}$$

leads to

$$\displaystyle{\lambda \vec{w} = A \cdot \vec{ w},}$$

or

$$\displaystyle{(A -\lambda I) \cdot \vec{ w} = 0.}$$

For nontrivial solutions \(\vec{w}\neq 0\), the condition

$$\displaystyle{\fbox{$\det (A -\lambda I) = 0$}}$$

is necessary, which determines the eigenvalues λ. For the sake of simplicity, we now assume that all n eigenvalues are distinct and that there is one eigenvector belonging to each eigenvalue (A is diagonalizable in this case). The overall solution of the homogeneous system of ODEs may then be written in the form

$$\displaystyle{ \vec{r}(t) =\sum _{ k=1}^{n}C_{ k}\;\vec{w}_{k}e^{\lambda _{k}t}, }$$
(2.95)

where \(\vec{w}_{k}\) denotes the eigenvector that belongs to the eigenvalue λ = λk and where the Ck are constants. For the initial condition at t = 0, we therefore have

$$\displaystyle{\vec{r}_{0} =\vec{ r}(0) =\sum _{ k=1}^{n}C_{ k}\;\vec{w}_{k}.}$$

According to Eq. (2.95), the solution is obviously asymptotically stable if and only if

$$\displaystyle{ \fbox{$\mathrm{Re}\{\lambda _{k}\} < 0$} }$$
(2.96)

holds for all \(k \in \{ 1,2,\ldots,n\}\), since only then does

$$\displaystyle{\lim _{t\rightarrow \infty }\vec{r}(t) = 0}$$

hold for arbitrary initial conditions. In this case, \(\vec{r} = 0\) is an asymptotically stable fixed point.

Now we raise the question whether further fixed points exist. This is the case for

$$\displaystyle{\vec{v}(\vec{r}) = A \cdot \vec{ r} = 0}$$

with

$$\displaystyle{\vec{r}\neq 0,}$$

i.e., only for

$$\displaystyle{\det \;A = 0.}$$

In Sect. 2.8.8, we will see that this is the condition for a degenerate, i.e., nonisolated, fixed point (see Definition 2.6, p. 59).

Let us now determine a map that transforms an initial value \(\vec{r}_{0}\) into a vector \(\vec{r}(t)\) that satisfies the linear autonomous system of ODEs

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}) = A \cdot \vec{ r}.}$$

The overall solution

$$\displaystyle{\vec{r}(t) =\sum _{ k=1}^{n}C_{ k}\;\vec{w}_{k}e^{\lambda _{k}t}}$$

with eigenvectors

$$\displaystyle{\vec{w}_{k} = \left (\begin{array}{c} w_{k1} \\ w_{k2} \\ \cdots \\ w_{ kn}\\ \end{array} \right )}$$

may, due to

$$\displaystyle{x_{i}(t) =\sum _{ k=1}^{n}C_{ k}\;w_{ki}e^{\lambda _{k}t},}$$

be written as the matrix equation

$$\displaystyle{\vec{r}(t) = \left (\begin{array}{cccc} w_{11} & w_{21} & \cdots & w_{n1} \\ w_{12} & w_{22} & \cdots & w_{n2} \\ \cdots & \cdots &\cdots & \cdots \\ w_{ 1n}&w_{2n}&\cdots &w_{nn} \end{array} \right )\cdot \left (\begin{array}{c} C_{1}e^{\lambda _{1}t} \\ C_{2}e^{\lambda _{2}t} \\ \cdots \\ C_{n}e^{\lambda _{n}t} \end{array} \right ).}$$

We define

$$\displaystyle{X_{A}(0) = \left (\begin{array}{cccc} w_{11} & w_{21} & \cdots & w_{n1} \\ w_{12} & w_{22} & \cdots & w_{n2} \\ \cdots & \cdots &\cdots & \cdots \\ w_{ 1n}&w_{2n}&\cdots &w_{nn} \end{array} \right )}$$

(matrix of the eigenvectors),

$$\displaystyle{\vec{c} = \left (\begin{array}{c} C_{1} \\ C_{2} \\ \cdots \\ C_{ n} \end{array} \right ),}$$

and

$$\displaystyle{D_{A}^{\lambda }(t) = \left (\begin{array}{cccc} e^{\lambda _{1}t}& 0 &\cdots & 0 \\ 0 &e^{\lambda _{2}t}&\cdots & 0\\ \cdots & \cdots &\cdots & \cdots \\ 0 & 0 &\cdots &e^{\lambda _{n}t} \end{array} \right ).}$$

Hence we have

$$\displaystyle{\vec{r}(t) = X_{A}(0) \cdot D_{A}^{\lambda }(t) \cdot \vec{ c}.}$$

For t = 0, we obtain

$$\displaystyle{\vec{r}(0) = X_{A}(0) \cdot D_{A}^{\lambda }(0) \cdot \vec{ c} = X_{ A}(0) \cdot I \cdot \vec{ c} = X_{A}(0) \cdot \vec{ c}.}$$

Due to

$$\displaystyle{D_{A}^{\lambda }(0) = I,}$$

the definition

$$\displaystyle{ X_{A}(t) = X_{A}(0) \cdot D_{A}^{\lambda }(t) }$$
(2.97)

makes sense. Finally, we obtain

$$\displaystyle{\vec{r}(t) = X_{A}(t)X_{A}(0)^{-1}\vec{r}(0).}$$

Using the matrix exponential function

$$\displaystyle{ e^{tA} = X_{ A}(t)X_{A}(0)^{-1}, }$$
(2.98)

one also writes

$$\displaystyle{ \vec{r}(t) = e^{tA}\;\vec{r}(0). }$$
(2.99)

This equation obviously determines the global flow (Guckenheimer [33, Eqn. (1.1.9), p. 9]) if the following definition is used:

Definition 2.11.

A (global) flow is a continuous map \(\Phi: \mathbb{R} \times D \rightarrow D\), which transforms each initial value \(\vec{r}(0) =\vec{ r}_{0} \in D\) (\(D \subset \mathbb{R}^{n}\) open) into a vector \(\vec{r}(t)\) (\(t \in \mathbb{R}\)) satisfying the following conditions:

$$\displaystyle\begin{array}{rcl} & \Phi _{0} = \mbox{ id, i.e., }\Phi _{0}(\vec{r}_{0})\,=\,\vec{r}_{0}\mbox{ for all }\vec{r}_{0} \in D, & {}\\ & \Phi _{t_{1}+t_{2}}\,=\,\Phi _{t_{1}} \circ \Phi _{t_{2}}\mbox{, i.e. }\Phi _{t_{1}+t_{2}}(\vec{r}_{0})\,=\,\Phi _{t_{1}}(\Phi _{t_{2}}(\vec{r}_{0}))\mbox{ for all }\vec{r}_{0} \in D,\quad t_{1},t_{2} \in \mathbb{R}.& {}\\ \end{array}$$

Here we have defined \(\Phi _{t}(\vec{r}):= \Phi (t,\vec{r})\).

(Cf. Wiggins [38, Proposition 7.4.3, p. 93], Wirsching [31, Definition 8.6], Amann [39, p. 123/124].)

The interpretation of this definition is simple: If no time passes, one remains at the same point. Instead of moving from a first point to a second one in the time span t2 and then from this second one to a third one in a time span t1, one may go directly from the first to the third in the time span t1 + t2.
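Equations (2.97)–(2.99) and the flow property of Definition 2.11 can be checked numerically for a small diagonalizable matrix. In the following sketch, the matrix is an arbitrary assumption with distinct eigenvalues; the eigendecomposition reproduces the matrix exponential provided by scipy, and \(\Phi _{t_{1}+t_{2}} = \Phi _{t_{1}} \circ \Phi _{t_{2}}\) holds up to rounding errors.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])       # assumed system matrix with distinct eigenvalues

lam, X0 = np.linalg.eig(A)         # eigenvalues lambda_k and eigenvector matrix X_A(0)

def flow(t, r0):
    """r(t) = X_A(0) . D_A(t) . X_A(0)^{-1} . r0, cf. Eqs. (2.97)-(2.99)."""
    D = np.diag(np.exp(lam * t))   # D_A^lambda(t)
    return (X0 @ D @ np.linalg.inv(X0) @ r0).real

r0 = np.array([1.0, 0.0])
t1, t2 = 0.7, 1.3

print(np.allclose(flow(t1 + t2, r0), flow(t1, flow(t2, r0))))   # flow property
print(np.allclose(flow(t1, r0), expm(t1 * A) @ r0))             # agrees with e^{tA} r0
```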

Remark.

 

  • A flow (also called a phase flow) is called a global flow if it is defined for all \(t \in \mathbb{R}\) (as in Definition 2.11), a semiflow if it is defined for all \(t \in \mathbb{R}^{+}\), and a local flow if it is defined for t ∈ I (open interval I with 0 ∈ I).

  • For semiflows and local flows, Definition 2.11 has to be modified.

  • In the modern mathematical literature, a dynamical system is defined as a flow. In our introduction, however, the dynamical system was initially described by an ODE, and the corresponding velocity vector field induced the flow.

Final remark: If the matrix A does not possess n linearly independent eigenvectors, then no diagonalization is possible; in this case, generalized eigenvectors may be used (cf. Guckenheimer [33, p. 9]). These are defined by

$$\displaystyle\begin{array}{rcl} & (A -\lambda I)^{p} \cdot \vec{ w} = 0,& {}\\ & (A -\lambda I)^{p-1} \cdot \vec{ w}\neq 0,& {}\\ \end{array}$$

and may be used to transform any quadratic matrix A into Jordan canonical form (cf. Burg et al. [40, vol. II, p. 293] or Perko [30, Sect. 1.8]). As the formula shows, eigenvectors are also generalized eigenvectors (for p = 1).

2.8.7 Topological Orbit Equivalence

In this section, we shall define what it means to say that two vector fields are topologically orbit equivalent. Of course, the term topological orbit equivalence will include the case that the two vector fields can be transformed into each other by a simple rotation.

In order to simplify the situation even further, we assume that the two vector fields are given by

$$\displaystyle{ \vec{v}_{1}(\vec{r}_{1}) = A \cdot \vec{ r}_{1} }$$
(2.100)

and

$$\displaystyle{ \vec{v}_{2}(\vec{r}_{2}) = B \cdot \vec{ r}_{2}, }$$
(2.101)

where A and B are quadratic matrices.

If one vector field can be obtained as a result of rotating the other one, there must be a rotation matrix M such that

$$\displaystyle{ \vec{v}_{2} = M \cdot \vec{ v}_{1} = M \cdot A \cdot \vec{ r}_{1} }$$
(2.102)

holds. Now \(\vec{v}_{2}\) depends on \(\vec{r}_{1}\). In order to make \(\vec{v}_{2}\) dependent on \(\vec{r}_{2}\), we must rotate the coordinates in the same way as the vector field (see Fig. 2.5):

$$\displaystyle{ \vec{r}_{2} = M \cdot \vec{ r}_{1}. }$$
(2.103)

Hence, we obtain

$$\displaystyle{\vec{v}_{2} = M \cdot A \cdot M^{-1} \cdot \vec{ r}_{ 2}.}$$

Since M is invertible as a rotation matrix, the matrix

$$\displaystyle{B = M \cdot A \cdot M^{-1}}$$

describes the well-known similarity transformation that may also be written in the form

$$\displaystyle{B \cdot M = M \cdot A.}$$

A similarity transformation, however, is usually written in the form

$$\displaystyle{B =\tilde{ M}^{-1} \cdot A \cdot \tilde{ M},}$$

so that we have to define \(\tilde{M} = M^{-1}\) here.

Fig. 2.5 Rotation of a vector field

Now we observe that two orbits can be identical even though the corresponding solution curves are parameterized in a different way.

A flow for Eq. (2.100) will be denoted by

$$\displaystyle{\vec{r}_{1}(t_{1},\vec{r}_{10}),}$$

and a flow for Eq. (2.101) by

$$\displaystyle{\vec{r}_{2}(t_{2},\vec{r}_{20}).}$$

In order to transform the orbits of these flows into each other, the starting points must be mapped first:

$$\displaystyle{\vec{r}_{20} = M \cdot \vec{ r}_{10}.}$$

Our requirement that different parameterizations be allowed for both solution curves may be translated as follows: For every time t1, there is a time t2 such that

$$\displaystyle{\vec{r}_{2}(t_{2},\vec{r}_{20}) = M \cdot \vec{ r}_{1}(t_{1},\vec{r}_{10}),}$$

and therefore

$$\displaystyle{\vec{r}_{2}(t_{2},M \cdot \vec{ r}_{10}) = M \cdot \vec{ r}_{1}(t_{1},\vec{r}_{10})}$$

holds. This formula must be included in the general definition of topological orbit equivalence if rotations are to be allowed as topologically equivalent transformations.

The previous considerations make the following definition transparent:

Definition 2.12.

Two C1 vector fields \(\vec{v}_{1}(\vec{r}_{1})\) and \(\vec{v}_{2}(\vec{r}_{2})\) are called topologically orbit equivalent if a homeomorphism h exists such that for every pair \(\vec{r}_{10}\), t1, there exists t2 such that

$$\displaystyle{ \Phi _{t_{2}}^{\vec{v}_{2} }(h(\vec{r}_{10})) = h(\Phi _{t_{1}}^{\vec{v}_{1} }(\vec{r}_{10})) }$$
(2.104)

holds. Here, the orientation of the orbits must be preserved. If in addition, the parameterization by time is preserved, the vector fields are called topologically conjugate. In this definition, \(\Phi _{t}^{\vec{v}}\) denotes the flow that is induced by the vector field \(\vec{v}\).

(Cf. Sastry [36, Definition 7.18, p. 303], Wiggins [38, Definition 19.12.1, p. 346], Guckenheimer [33, p. 38, Definition 1.7.3].)

Remark.

A homeomorphism is a continuous map whose inverse map exists and is also continuous. The fact that a homeomorphism is used as a generalization of the rotation matrix, used as an example above, leads to the following features:

  • Not only linear maps are allowed, but also nonlinear ones.

  • The requirement that the map be continuous guarantees that neighborhoods of a point are mapped to neighborhoods of its image point. Therefore, the orbits are deformed but not torn apart. Two examples for topological orbit equivalence are shown in Fig. 2.6.

    Fig. 2.6 Examples of topological orbit equivalence

The fact that the validity of Eq. (2.104) is required for each initial point ensures that all orbits are transformed into each other. Hence, the entire phase portraits will be equivalent.

The preservation of the orientation may be checked by means of a continuously differentiable function \(t_{2}(\vec{r}_{10},t_{1})\) with \(\frac{\partial t_{2}} {\partial t_{1}} > 0\) (cf. Perko [30, Sect. 3.1, Remark 2, p. 183/184]).

Please note that different authors use slightly different definitions. In cases of doubt, one should therefore check the relevant definitions thoroughly.

Let us now consider the case that a vector field \(\vec{v}(\vec{r}) = A \cdot \vec{ r}\) is given by a real n × n matrix A and that we want to check whether this vector field is topologically orbit equivalent to a simpler vector field. Often, diagonalization is possible. This case will be discussed in the following.

Remark.

 

  • In case diagonalization is not possible, it is always possible to transform the matrix into Jordan canonical form.

  • Diagonalization of an n × n matrix is possible if and only if for each eigenvalue, the algebraic multiplicity (multiplicity of the zeros of the characteristic polynomial) equals the geometric multiplicity (number of linearly independent eigenvectors).

  • Diagonalization of an n × n matrix is possible if and only if it possesses n linearly independent eigenvectors.

  • Diagonalization is possible for every symmetric matrix with real elements.

Consider a matrix A for which diagonalization is possible. We will show now that the diagonal matrix

$$\displaystyle{ B = X_{A}(0)^{-1} \cdot A \cdot X_{ A}(0) }$$
(2.105)

does in fact lead to a topologically orbit equivalent vector field. Here, \(\tilde{M} = M^{-1} = X_{A}(0)\) denotes the matrix of the n eigenvectors of A. These are linearly independent, since diagonalization of A is possible (cf. Burg/Haf/Wille [40, vol. II, p. 280, Theorem 3.52]).

According to Eqs. (2.98) and (2.99), the flows are given by

$$\displaystyle{\Phi _{t_{1}}^{\vec{v}_{1} }(\vec{r}_{10}) = X_{A}(t_{1})X_{A}(0)^{-1}\vec{r}_{ 10}}$$

for A and by

$$\displaystyle{\Phi _{t_{2}}^{\vec{v}_{2} }(\vec{r}_{20}) = X_{B}(t_{2})X_{B}(0)^{-1}\vec{r}_{ 20}}$$

for B. In our case, the homeomorphism h is given by the matrix M. We therefore obtain

$$\displaystyle{\Phi _{t_{2}}^{\vec{v}_{2} }(h(\vec{r}_{10})) = \Phi _{t_{2}}^{\vec{v}_{2} }(M \cdot \vec{ r}_{10}) = X_{B}(t_{2})X_{B}(0)^{-1}X_{ A}(0)^{-1} \cdot \vec{ r}_{ 10}.}$$

On the other hand,

$$\displaystyle{h(\Phi _{t_{1}}^{\vec{v}_{1} }(\vec{r}_{10})) = M \cdot \Phi _{t_{1}}^{\vec{v}_{1} }(\vec{r}_{10}) = X_{A}(0)^{-1}X_{ A}(t_{1})X_{A}(0)^{-1} \cdot \vec{ r}_{ 10}}$$

holds. We see that these expressions are equal for every initial vector \(\vec{r}_{10}\) if

$$\displaystyle{X_{B}(t_{2})X_{B}(0)^{-1}X_{ A}(0)^{-1} = X_{ A}(0)^{-1}X_{ A}(t_{1})X_{A}(0)^{-1},}$$

or

$$\displaystyle{X_{B}(t_{2})X_{B}(0)^{-1} = X_{ A}(0)^{-1}X_{ A}(t_{1}),}$$

is valid. Due to Eq. (2.97), we know that

$$\displaystyle{X_{A}(t_{1}) = X_{A}(0) \cdot D_{A}^{\lambda }(t_{ 1})}$$

and

$$\displaystyle{X_{B}(t_{2}) = X_{B}(0) \cdot D_{B}^{\lambda }(t_{ 2})}$$

hold, so that the equation

$$\displaystyle{X_{B}(0) \cdot D_{B}^{\lambda }(t_{ 2})X_{B}(0)^{-1} = X_{ A}(0)^{-1}X_{ A}(0) \cdot D_{A}^{\lambda }(t_{ 1}) = D_{A}^{\lambda }(t_{ 1})}$$

has to be verified. Since B is a diagonal matrix, the Cartesian unit vectors are eigenvectors, so that

$$\displaystyle{X_{B}(0) = I}$$

is valid. Therefore, we have only to check whether

$$\displaystyle{D_{B}^{\lambda }(t_{ 2}) = D_{A}^{\lambda }(t_{ 1})}$$

is true. Since the eigenvalues of A and B are equal due to diagonalization, we obtain

$$\displaystyle{D_{A}^{\lambda }(t_{ 1}) = D_{B}^{\lambda }(t_{ 1}).}$$

Here we had only to set t2 = t1. We have shown that diagonalization leads to topologically orbit equivalent vector fields.

2.8.8 Classification of Fixed Points of an Autonomous Linear System of Second Order

The considerations presented above indicate that a similarity transformation

$$\displaystyle{B =\tilde{ M}^{-1} \cdot A \cdot \tilde{ M}}$$

always leads to topologically orbit equivalent vector fields. Every similarity transformation leaves the eigenvalues unchanged (cf. Burg/Haf/Wille [40, vol. II, p. 272, 3.17]). This leads us to the assumption that the eigenvalues of a matrix at least influence the topological properties of the related vector field. Therefore, the eigenvalues are now used to characterize the fixed points.

We calculate the eigenvalues of a two-dimensional matrix

$$\displaystyle{A = \left (\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right )}$$

with constant real elements. This leads to

$$\displaystyle\begin{array}{rcl} & (a_{11}-\lambda )(a_{22}-\lambda ) - a_{12}a_{21} = 0 & {}\\ & \Rightarrow \lambda ^{2} -\lambda (a_{11} + a_{22}) + a_{11}a_{22} - a_{12}a_{21} = 0& {}\\ \end{array}$$
$$\displaystyle{ \Rightarrow \lambda ^{2} -\lambda \;\mathrm{ tr}A +\det A = 0. }$$
(2.106)

Hence, we obtain

$$\displaystyle{ \lambda = \frac{\mathrm{tr}A} {2} \pm \sqrt{\frac{(\mathrm{tr }A)^{2 } } {4} -\det \; A} = B \pm \sqrt{C} }$$
(2.107)

with

$$\displaystyle{ B = \frac{\mathrm{tr}A} {2},\qquad C = \frac{(\mathrm{tr}A)^{2}} {4} -\det A. }$$
(2.108)

We now try to distinguish as many cases as possible:

  1. Both eigenvalues are real (C ≥ 0).
     (a) Both are positive:
         (i) λ1 > λ2 > 0
         (ii) λ1 = λ2 > 0
     (b) Both are negative:
         (i) λ1 < λ2 < 0
         (ii) λ1 = λ2 < 0
     (c) One is positive, one is negative: λ1λ2 < 0
     (d) One equals 0 (λ1 = 0):
         (i) λ2 > 0
         (ii) λ2 < 0
         (iii) λ2 = 0
  2. Imaginary eigenvalues: C < 0, B = 0, \(\lambda _{2} = -\lambda _{1}\)
  3. Complex eigenvalues: C < 0, \(\lambda _{1} =\lambda _{2}^{{\ast}}\)
     (a) B = Re{λ} < 0
     (b) B = Re{λ} > 0

Hence we have found 11 distinct cases. If one eigenvalue is zero, then Eq. (2.106) leads to

$$\displaystyle{\det \;A = 0.}$$

As we will show now, this means in general that one row is a multiple of the other row, which contains the special case that one or both rows are zero.

If we now assume that the second row is a multiple of the first one,

$$\displaystyle{A = \left (\begin{array}{cc} a_{11} & a_{12} \\ ka_{11} & ka_{12} \end{array} \right ),}$$

we obtain the following condition for the eigenvalues:

$$\displaystyle\begin{array}{rcl} & (a_{11}-\lambda )(ka_{12}-\lambda ) - ka_{11}a_{12} = 0& {}\\ & \Rightarrow \lambda ^{2} -\lambda (a_{11} + ka_{12}) = 0. & {}\\ \end{array}$$

This shows that at least one eigenvalue is zero. The same can be shown for the case that the first row is a multiple of the second one. Since one eigenvalue is 0, the relation

$$\displaystyle{\det (A -\lambda I) = 0}$$

leads to

$$\displaystyle{\det \;A = 0.}$$

In conclusion, the following statements for our two-dimensional case are equivalent:

  • det A = 0.

  • One row of A is a multiple of the other row.

  • At least one eigenvalue is 0.

The following general theorem holds:

Theorem 2.13.

The following statements for a quadratic matrix A are equivalent:

  • The quadratic matrix A is regular.

  • All row vectors (or column vectors) of A are linearly independent.

  • det  A ≠ 0.

  • All eigenvalues of A are nonzero.

  • A is invertible.

In our two-dimensional case, the statement that for det A = 0, one row is a multiple of the other one means that the equation \(\vec{v}(\vec{r}) = A \cdot \vec{ r} = 0\) is satisfied along a line through the origin or even everywhere. Hence, we have an infinite number of fixed points that are not separated from each other. In this case, we speak of degenerate fixed points (see Definition 2.6 on p. 59). This case will henceforth be excluded (case 1.d), so that the number of relevant cases is reduced from 11 to 8. According to the behavior of the vector field in the vicinity of the fixed point \(\vec{r}_{\mathrm{F}} = 0\), the fixed points are named as follows:

  1. Both eigenvalues are real (C ≥ 0).
     (a) Both positive:
         (i) λ1 > λ2 > 0: unstable node
         (ii) λ1 = λ2 > 0: unstable improper node or unstable star
     (b) Both negative:
         (i) λ1 < λ2 < 0: stable node
         (ii) λ1 = λ2 < 0: stable improper node or stable star
     (c) One positive, one negative: λ1λ2 < 0: saddle point
  2. Imaginary eigenvalues: C < 0, B = 0, \(\lambda _{2} = -\lambda _{1}\): center or elliptic fixed point
  3. Complex eigenvalues: C < 0, \(\lambda _{1} =\lambda _{2}^{{\ast}}\)
     (a) Re{λ} < 0: stable spiral point or stable focus
     (b) Re{λ} > 0: unstable spiral point or unstable focus

The classification of fixed points is summarized in Table 2.4 on p. 74 (cf. [29, 42, 43]). As the table shows, the eigenvalues are not the only way to characterize the fixed points. In the column “Topology”, the numbers in parentheses denote the number of eigenvalues with positive real part. The column “Topology” also contains the so-called index in brackets. In order to calculate the index, one considers a closed path around the fixed point with mathematically positive orientation and checks how many revolutions the vectors of the vector field perform while “walking” along this path. If, for example, the vectors of the vector field also perform one revolution in the mathematically positive orientation, the index is +1; if the vector field rotates in the opposite direction, the index is −1.

Figures 2.7, 2.8, 2.9, 2.10, 2.11, and 2.12 on p. 75 and 76 show how orbits in the vicinity of the fixed point look in principle for each type of fixed point. If the fixed point is a stable node, star, or spiral point, the solution curves are oriented towards the fixed point in the middle; in the case of an unstable node, star, or spiral point, all solution curves are directed outwards. Each picture is just an example; in the specific case under consideration, the orbits may of course be deformed significantly.

Fig. 2.7 Center

Fig. 2.8 Node with two tangents

Fig. 2.9 Node with one tangent

Fig. 2.10 Star

Fig. 2.11 Spiral point

Fig. 2.12 Saddle point

Table 2.4 Classification of isolated fixed points in the plane (regular Jacobian matrix)
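Since the case distinction depends only on tr A and det A via Eqs. (2.107) and (2.108), the classification is easy to automate. The following sketch is a minimal implementation that mirrors the list above; the example matrix and the tolerance handling are assumptions for illustration, and degenerate cases with det A = 0 are simply reported as such.

```python
import numpy as np

def classify_fixed_point(A):
    """Classify the fixed point r = 0 of dr/dt = A.r for a real 2x2 matrix A."""
    tr, det = np.trace(A), np.linalg.det(A)
    B, C = tr / 2.0, tr**2 / 4.0 - det            # cf. Eq. (2.108)
    if np.isclose(det, 0.0):
        return "degenerate fixed point (det A = 0)"
    if C >= 0.0:                                  # real eigenvalues
        if det < 0.0:                             # lambda_1 * lambda_2 = det A < 0
            return "saddle point"
        if np.isclose(C, 0.0):                    # equal eigenvalues
            return ("stable" if B < 0 else "unstable") + " improper node or star"
        return ("stable" if B < 0 else "unstable") + " node"
    if np.isclose(B, 0.0):                        # purely imaginary eigenvalues
        return "center (elliptic fixed point)"
    return ("stable" if B < 0 else "unstable") + " spiral point"

A = np.array([[0.0, 1.0], [-2.0, -0.5]])          # assumed example matrix
print(classify_fixed_point(A))                    # stable spiral point
```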

2.8.9 Nonlinear Systems

Consider the nonlinear autonomous system

$$\displaystyle{ \frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}) }$$
(2.109)

with the initial condition

$$\displaystyle{\vec{r}(0) =\vec{ r}_{0},}$$

where \(\vec{v}(\vec{r}) \in C^{2}\) has a fixed point \(\vec{r}_{\mathrm{F}}\) with

$$\displaystyle{\vec{v}(\vec{r}_{\mathrm{F}}) = 0.}$$

As the results summarized in this section show, the linearization

$$\displaystyle{ \frac{\mathrm{d}\vec{r}} {\mathrm{d}t} = A \cdot \vec{ r} }$$
(2.110)

of the system (2.109), where \(A = D\vec{v}(\vec{r}_{\mathrm{F}})\) is the Jacobian matrix at \(\vec{r} =\vec{ r}_{\mathrm{F}}\), is a powerful tool for analyzing a nonlinear system in the vicinity of its fixed points. Please note that \(\vec{r} = 0\) in Eq. (2.110) corresponds to \(\vec{r} =\vec{ r}_{\mathrm{F}}\) in Eq. (2.109), i.e., the fixed point of the nonlinear system was shifted to the origin of the linearized system.

Theorem 2.14.

Consider the nonlinear system (2.109) with the linearization (2.110) at a fixed point \(\vec{r}_{\mathrm{F}}\). If A is nonsingular, then the fixed point \(\vec{r}_{\mathrm{F}}\) is isolated (i.e., nondegenerate).

(See Sastry [36, Proposition 1.5, p. 13], Perko [30, Definition 2, p. 173].)

If one or more eigenvalues of the Jacobian matrix are zero, the fixed point is a degenerate fixed point. This is the generalization of the linear case.

Definition 2.15.

A fixed point is called a hyperbolic fixed point if no eigenvalue of the Jacobian matrix has zero real part.

Theorem 2.16.

If the fixed point \(\vec{r}_{\mathrm{F}}\) is a hyperbolic fixed point, then there exist two neighborhoods U of \(\vec{r}_{\mathrm{F}}\) and V of \(\vec{r} = 0\) and a homeomorphism h: U → V, such that h transforms the orbits of Eq.  (2.109) into orbits of

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} = A \cdot \vec{ r}\mbox{ with }A = D\vec{v}(\vec{r}_{\mathrm{F}}).}$$

Orientation and parameterization by time are preserved.

(Cf. Guckenheimer [33, Theorem 1.3.1 (Hartman-Grobman), p. 13], Perko [30, Theorem Sect. 2.8, p. 120], and Bronstein [44, Sect. 11.3.2].)

In other words, we may state the following theorem.

Theorem 2.17 (Hartman–Grobman).

Let \(\vec{r}_{\mathrm{F}}\) be a hyperbolic fixed point. Then the nonlinear problem

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}),\qquad \vec{v} \in C^{1}(D),\qquad D \subset \mathbb{R}^{n}\mbox{ open},}$$

and the linearized problem

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} = A \cdot (\vec{r} -\vec{ r}_{\mathrm{F}})}$$

with

$$\displaystyle{A = D\vec{v}(\vec{r}_{\mathrm{F}})}$$

are topologically conjugate in a neighborhood of \(\vec{r}_{\mathrm{F}}\) .

(See Wiggins [38, Theorem 19.12.6, p. 350].)

If the fixed point is not hyperbolic, i.e., a center (elliptic fixed point), then the smallest nonlinearities are sufficient to create a stable or an unstable spiral point. This is why the theorem refers to hyperbolic fixed points only.

If the real parts of all eigenvalues of \(D\vec{v}(\vec{r}_{\mathrm{F}})\) are negative, then \(\vec{r}_{\mathrm{F}}\) is asymptotically stable. If the real part of at least one eigenvalue is positive, then \(\vec{r}_{\mathrm{F}}\) is unstable:

Theorem 2.18.

Let \(D \subset \mathbb{R}^{n}\) be an open set, \(\vec{v}(\vec{r})\) continuously differentiable on D, and \(\vec{r}_{\mathrm{F}}\) a fixed point of

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}).}$$

If the real parts of all eigenvalues of \(D\vec{v}(\vec{r}_{\mathrm{F}})\) are negative, then \(\vec{r}_{\mathrm{F}}\) is asymptotically stable. If \(\vec{r}_{\mathrm{F}}\) is stable, then no eigenvalue has positive real part.

(Cf. Bronstein [44, Sect. 11.3.1], Perko [30, Theorem 2, p. 130].)

A saddle point has the special property that two trajectories exist that approach the saddle point for \(t \rightarrow \infty \), whereas two different trajectories exist that approach the saddle point for \(t \rightarrow -\infty \) (cf. [30, Sect. 2.10, Definition 5]). These four trajectories define a separatrix. Loosely speaking, a separatrix is a trajectory that “meets” the saddle point.
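As a small illustration of the linearization procedure, the following sketch evaluates the Jacobian matrix of a damped pendulum (an assumed example, not taken from the text) at its two types of fixed points. At (0, 0), both eigenvalues have negative real part, so the fixed point is asymptotically stable; at (π, 0), one eigenvalue is positive and one is negative, i.e., we have a saddle point. Both fixed points are hyperbolic, so the Hartman–Grobman theorem applies.

```python
import numpy as np

c = 0.2   # assumed damping constant

def v(r):
    """Damped pendulum as a first-order system: x' = y, y' = -sin(x) - c*y."""
    x, y = r
    return np.array([y, -np.sin(x) - c * y])

def jacobian(r):
    """Jacobian matrix Dv(r) of the vector field above."""
    x, y = r
    return np.array([[0.0, 1.0],
                     [-np.cos(x), -c]])

for r_F in (np.array([0.0, 0.0]), np.array([np.pi, 0.0])):
    lam = np.linalg.eigvals(jacobian(r_F))
    print(r_F, lam)   # (0,0): complex pair with Re < 0; (pi,0): one positive, one negative
```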

2.8.10 Characteristic Equation

Consider the autonomous linear homogeneous nth-order ordinary differential equation

$$\displaystyle{ a_{n}\frac{\mathrm{d}^{n}y} {\mathrm{d}t^{n}} + a_{n-1}\frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}} + \cdots + a_{1}\frac{\mathrm{d}y} {\mathrm{d}t} + a_{0}y(t) = 0. }$$
(2.111)

As usual (see Sect. 2.8.1), we define the vector \(\vec{r} = (x_{1},x_{2},\ldots,x_{n})^{\mathrm{T}}\) by

$$\displaystyle\begin{array}{rcl} x_{1}& =& y, {}\\ x_{2}& =& \frac{\mathrm{d}y} {\mathrm{d}t}, {}\\ & \ldots & {}\\ x_{n}& =& \frac{\mathrm{d}^{n-1}y} {\mathrm{d}t^{n-1}}, {}\\ \end{array}$$

which leads to

$$\displaystyle\begin{array}{rcl} \dot{x}_{1}& =& x_{2}, {}\\ \dot{x}_{2}& =& x_{3}, {}\\ & \ldots & {}\\ \dot{x}_{n}& =& -\frac{a_{0}} {a_{n}}x_{1} - \frac{a_{1}} {a_{n}}x_{2} -\cdots -\frac{a_{n-1}} {a_{n}} x_{n}, {}\\ \end{array}$$

in order to obtain the standard form

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} =\vec{ v}(\vec{r}).}$$

This may be written as

$$\displaystyle{\frac{\mathrm{d}\vec{r}} {\mathrm{d}t} = A \cdot \vec{ r}}$$

if the following n × n matrix is defined:

$$\displaystyle{ A = \left (\begin{array}{ccccccc} 0 & 1 & 0 & 0 &\cdots & 0 & 0\\ 0 & 0 & 1 & 0 &\cdots & 0 & 0 \\ 0 & 0 & 0 & 1 &\cdots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & 0 &\mathop{\ldots } & 1 & 0 \\ 0 & 0 & 0 & 0 &\mathop{\ldots }& 0 & 1 \\ - \frac{a_{0}} {a_{n}} & - \frac{a_{1}} {a_{n}} & - \frac{a_{2}} {a_{n}} & - \frac{a_{3}} {a_{n}} & \mathop{\ldots }& -\frac{a_{n-2}} {a_{n}} & -\frac{a_{n-1}} {a_{n}} \end{array} \right ). }$$
(2.112)

According to Sect. 2.8.6, Eq. (2.96), we know that asymptotic stability is reached if all eigenvalues of this system matrix have negative real part. Therefore, we now describe how to find the eigenvalues based on the requirement that the determinant

$$\displaystyle{ D_{n}^{F} =\det (A -\lambda I) }$$
(2.113)

equal zero. Let us begin with n = 2 as an example:

$$\displaystyle\begin{array}{rcl} & A = \left (\begin{array}{cc} 0 & 1\\ - \frac{a_{0 } } {a_{2}} & -\frac{a_{1}} {a_{2}} \end{array} \right ) & {}\\ & \Rightarrow D_{2}^{F} =\det (A -\lambda I) = \left \vert \begin{array}{cc} -\lambda & 1 \\ -\frac{a_{0}} {a_{2}} & -\frac{a_{1}} {a_{2}} -\lambda \end{array} \right \vert =\lambda ^{2} + \frac{a_{1}} {a_{2}} \lambda + \frac{a_{0}} {a_{2}} \stackrel{!}{=}0.& {}\\ \end{array}$$

For n = 3, we obtain

$$\displaystyle\begin{array}{rcl} & A = \left (\begin{array}{ccc} 0 & 1 & 0\\ 0 & 0 & 1 \\ -\frac{a_{0}} {a_{3}} & -\frac{a_{1}} {a_{3}} & -\frac{a_{2}} {a_{3}} \end{array} \right ) & {}\\ & \Rightarrow D_{3}^{F} =\det (A -\lambda I) = \left \vert \begin{array}{ccc} -\lambda & 1 & 0\\ 0 & -\lambda & 1 \\ -\frac{a_{0}} {a_{3}} & -\frac{a_{1}} {a_{3}} & -\frac{a_{2}} {a_{3}} -\lambda \end{array} \right \vert = -\lambda ^{3} -\frac{a_{2}} {a_{3}} \lambda ^{2} -\frac{a_{1}} {a_{3}} \lambda -\frac{a_{0}} {a_{3}} \stackrel{!}{=}0.& {}\\ \end{array}$$

These two results lead us to the assumption that

$$\displaystyle\begin{array}{rcl} D_{n}^{F}& =& (-1)^{n}\left (\lambda ^{n} + \frac{a_{n-1}} {a_{n}} \lambda ^{n-1} + \frac{a_{n-2}} {a_{n}} \lambda ^{n-2} + \cdots + \frac{a_{2}} {a_{n}}\lambda ^{2} + \frac{a_{1}} {a_{n}}\lambda + \frac{a_{0}} {a_{n}}\right ) \\ & =& (-1)^{n}\sum _{ k=0}^{n}\frac{a_{k}} {a_{n}}\lambda ^{k} {}\end{array}$$
(2.114)

holds in general. In Appendix A.11, it is shown that this is indeed true. The requirement that the polynomial in Eq. (2.114) equal zero is called the characteristic equation of the ODE (2.111). One easily sees that the characteristic equation is also obtained if the Laplace transform is applied to the original ODE (2.111):

$$\displaystyle\begin{array}{rcl} & \left (a_{n}s^{n} + a_{n-1}s^{n-1} + \cdots + a_{1}s + a_{0}\right )Y (s) = 0 & {}\\ & \Rightarrow a_{n}s^{n} + a_{n-1}s^{n-1} + \cdots + a_{1}s + a_{0} = 0\mbox{ for }Y (s)\neq 0.& {}\\ \end{array}$$

The matrix A is called the Frobenius companion matrix of the polynomial. Please note that instead of finding the zeros (roots) of the polynomial, one may also determine the eigenvalues of the companion matrix A and vice versa.

Hence, asymptotic stability of the dynamical system defined by the ODE (2.111) is equivalently shown

  • if all zeros of the characteristic equation have negative real part.

  • if all eigenvalues of the system matrix have negative real part.

In the case of asymptotic stability, one also calls the system matrix strictly stable or negatively stable (cf. [45, Definition 2.4.2]), or a Hurwitz matrix.
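
The equivalence of both criteria can be illustrated numerically. The following sketch (in Python, with arbitrarily chosen coefficients \(a_{k}\)) assembles the companion matrix (2.112) for a third-order ODE and compares its eigenvalues with the zeros of the characteristic polynomial; either set can then be used to check for asymptotic stability.

```python
import numpy as np

# Companion matrix (2.112) for a_3 y''' + a_2 y'' + a_1 y' + a_0 y = 0.
# The coefficients below are illustrative; they yield the roots -1, -2, -3.
a = np.array([6.0, 11.0, 6.0, 1.0])         # a_0, a_1, a_2, a_3

n = len(a) - 1
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)                  # superdiagonal of ones
A[-1, :] = -a[:-1] / a[-1]                  # last row: -a_k / a_n

eigenvalues = np.linalg.eigvals(A)
roots = np.roots(a[::-1])                   # zeros of the characteristic polynomial
print(np.sort(eigenvalues), np.sort(roots)) # identical up to rounding errors
print("asymptotically stable:", np.all(eigenvalues.real < 0))
```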

2.9 Continuity Equation

Consider a particle density ρ in space and a velocity field \(\vec{v}\) that moves the particles. We will now calculate how the particle density in a fixed volume V changes due to the velocity field.

For this purpose, we consider a small volume element \(\Delta V\) at the surface of the three-dimensional domain V. As shown in Fig. 2.13, this contains in total

$$\displaystyle{\Delta n = \Delta V \;\rho = \Delta h\;\Delta A\;\rho }$$

particles. During the time interval \(\Delta t\), this quantity of

$$\displaystyle{\Delta n = v_{n}\Delta t\;\Delta A\;\rho =\rho \; \Delta t\;\vec{v} \cdot \Delta \vec{A}}$$

particles will leave the domain V, where \(v_{n} = \Delta h/\Delta t\) denotes the normal component of the velocity vector \(\vec{v}\) with respect to the surface of the domain V. Hence, we have

$$\displaystyle{\int _{V }\rho (t + \Delta t)\;\mathrm{d}V =\int _{V }\rho (t)\;\mathrm{d}V -\oint _{\partial V }\rho \Delta t\;\vec{v} \cdot \mathrm{ d}\vec{A}.}$$
Fig. 2.13 Volume element at the surface of a region

As a limit for \(\Delta t \rightarrow 0\), one therefore obtains

$$\displaystyle{\int _{V }\dot{\rho }\;\mathrm{d}V = -\oint _{\partial V }\rho \;\vec{v} \cdot \mathrm{ d}\vec{A}.}$$

According to Gauss’s theorem,

$$\displaystyle{\oint _{\partial V }\vec{V } \cdot \mathrm{ d}\vec{A} =\int _{V }\mathrm{div}\;\vec{V }\;\mathrm{d}V,}$$

one concludes by setting \(\vec{V } =\rho \;\vec{ v}\):

$$\displaystyle{\int _{V }\dot{\rho }\;\mathrm{d}V = -\int _{V }\mathrm{div}(\rho \;\vec{v})\;\mathrm{d}V.}$$

Since this equation must be valid for arbitrary choices of the domain V, one obtains

$$\displaystyle{ \fbox{$-\dot{\rho } =\mathrm{ div}(\rho \;\vec{v}).$} }$$
(2.115)

This is the continuity equation, for which we only assumed that no particles disappear and no particles are generated. Instead of the particle density, one could have considered different densities, such as the mass density, assuming mass conservation in that case. If we take the charge density as an example, charge conservation leads to

$$\displaystyle{\fbox{$ -\dot{\rho }_{q} =\mathrm{ div}(\rho _{q}\;\vec{v}) =\mathrm{ div}\;\vec{J},$}}$$

which we already know as Eq. (2.46) and where \(\vec{J} =\rho _{q}\vec{v}\) is the convection current density.

Remark.

If \(\dot{\rho }_{q} = 0\) holds, then the density will remain constant at every location; one obtains a stationary flow with

$$\displaystyle{\mathrm{div}(\rho _{q}\;\vec{v}) = 0\mbox{ or }\mathrm{div}\;\vec{J} = 0.}$$

This equation is known in electromagnetism for steady currents.
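
A short numerical sketch may illustrate the conservation property expressed by the continuity equation. It discretizes the one-dimensional analogue of Eq. (2.115) with a simple first-order upwind scheme on a periodic grid; the grid, the time step, the velocity field, and the initial density are arbitrary choices made only for this example. The total number of particles is conserved up to rounding errors.

```python
import numpy as np

# 1D continuity equation -drho/dt = d(rho*v)/dx on a periodic grid (upwind scheme).
N, Lx, dt = 200, 1.0, 1e-4
x = np.linspace(0.0, Lx, N, endpoint=False)
dx = Lx / N
v = 0.5 + 0.2 * np.sin(2.0 * np.pi * x)       # velocity field (v > 0 everywhere)
rho = np.exp(-100.0 * (x - 0.5)**2)           # initial particle density

total_0 = rho.sum() * dx
for _ in range(2000):
    flux = rho * v                            # convection current density J = rho*v
    rho = rho - dt / dx * (flux - np.roll(flux, 1))   # upwind difference for v > 0

print(total_0, rho.sum() * dx)                # the particle number stays the same
```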

2.10 Area Preservation in Phase Space

In this section, we discuss how an area or a volume that is defined by the contained particles is modified when the particles are moving.

2.10.1 Velocity Vector Fields

Consider an arbitrary domain A in \(\mathbb{R}^{2}\) at time t. Particles located inside the domain and on its boundary at time t will move a bit farther during the time span \(\Delta t\). This movement is determined by the velocity field \(\vec{v}(x,y)\).

Let us define a parameterization of the domain such that x(α, β) and y(α, β) are given depending on the parameters α and β. This leads to the area

$$\displaystyle{A(t) =\int _{A}\mathrm{d}A =\int _{ \beta _{\mathrm{min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} }\left \vert \frac{\partial (x,y)} {\partial (\alpha,\beta )} \right \vert \;\mathrm{d}\alpha \;\mathrm{d}\beta.}$$

The coordinates \(\vec{r} = (x,y)\) denote each point of A. Such a point \(\vec{r}\) will move to the new point

$$\displaystyle{\vec{r}\,^{{\prime}} =\vec{ r} +\vec{ v}(\vec{r})\Delta t}$$

after the time span \(\Delta t\). Since \(\vec{r}\) depends on α and β, it follows that \(\vec{r}\,^{{\prime}}\) will also depend on these parameters. For the area of the deformed domain at the time \(t + \Delta t\), we therefore get

$$\displaystyle{A(t + \Delta t) = A^{{\prime}} =\int _{ \beta _{\mathrm{ min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} }\left \vert \frac{\partial (x^{{\prime}},y^{{\prime}})} {\partial (\alpha,\beta )} \right \vert \;\mathrm{d}\alpha \;\mathrm{d}\beta }$$

with

$$\displaystyle\begin{array}{rcl} x^{{\prime}}& =& x + v_{ x}\Delta t, {}\\ y^{{\prime}}& =& y + v_{ y}\Delta t. {}\\ \end{array}$$

Using the abbreviation

$$\displaystyle{\xi = \frac{\partial (x,y)} {\partial (\alpha,\beta )} = \left \vert \begin{array}{ll} \frac{\partial x} {\partial \alpha } &\frac{\partial x} {\partial \beta } \\ \frac{\partial y} {\partial \alpha } &\frac{\partial y} {\partial \beta }\\ \end{array} \right \vert =\det (\vec{r}_{\alpha },\vec{r}_{\beta })}$$

or

$$\displaystyle{\xi ^{{\prime}} =\det (\vec{r}\,_{\alpha }^{{\prime}},\vec{r}\,_{\beta }^{{\prime}})}$$

leads to

$$\displaystyle\begin{array}{rcl} \xi ^{{\prime}}& =& \det (\vec{r}_{\alpha } +\vec{ v}_{\alpha }\Delta t,\vec{r}_{\beta } +\vec{ v}_{\beta }\Delta t) =\det (\vec{r}_{\alpha },\vec{r}_{\beta } +\vec{ v}_{\beta }\Delta t) +\det (\vec{v}_{\alpha }\Delta t,\vec{r}_{\beta } +\vec{ v}_{\beta }\Delta t) = {}\\ & =& \det (\vec{r}_{\alpha },\vec{r}_{\beta }) +\det (\vec{r}_{\alpha },\vec{v}_{\beta }\Delta t) +\det (\vec{v}_{\alpha }\Delta t,\vec{r}_{\beta }) +\det (\vec{v}_{\alpha }\Delta t,\vec{v}_{\beta }\Delta t) = {}\\ & =& \xi +\Delta t\left (\det (\vec{r}_{\alpha },\vec{v}_{\beta }) +\det (\vec{v}_{\alpha },\vec{r}_{\beta })\right ) + \Delta t^{2}\det (\vec{v}_{\alpha },\vec{v}_{\beta }). {}\\ \end{array}$$

One obtains

$$\displaystyle\begin{array}{rcl} \frac{\partial \xi } {\partial t}& =& \lim _{\Delta t\rightarrow 0} \frac{\xi ^{{\prime}}-\xi } {\Delta t} =\det (\vec{r}_{\alpha },\vec{v}_{\beta }) +\det (\vec{v}_{\alpha },\vec{r}_{\beta }) = {}\\ & =& \frac{\partial x} {\partial \alpha } \frac{\partial v_{y}} {\partial \beta } -\frac{\partial y} {\partial \alpha } \frac{\partial v_{x}} {\partial \beta } + \frac{\partial v_{x}} {\partial \alpha } \frac{\partial y} {\partial \beta } -\frac{\partial v_{y}} {\partial \alpha } \frac{\partial x} {\partial \beta } = {}\\ & =& \frac{\partial x} {\partial \alpha } \left (\frac{\partial v_{y}} {\partial x} \frac{\partial x} {\partial \beta } + \frac{\partial v_{y}} {\partial y} \frac{\partial y} {\partial \beta } \right ) -\frac{\partial y} {\partial \alpha } \left (\frac{\partial v_{x}} {\partial x} \frac{\partial x} {\partial \beta } + \frac{\partial v_{x}} {\partial y} \frac{\partial y} {\partial \beta } \right ) + {}\\ & +& \frac{\partial y} {\partial \beta } \left (\frac{\partial v_{x}} {\partial x} \frac{\partial x} {\partial \alpha } + \frac{\partial v_{x}} {\partial y} \frac{\partial y} {\partial \alpha } \right ) -\frac{\partial x} {\partial \beta } \left (\frac{\partial v_{y}} {\partial x} \frac{\partial x} {\partial \alpha } + \frac{\partial v_{y}} {\partial y} \frac{\partial y} {\partial \alpha } \right ) = {}\\ & =& \frac{\partial v_{y}} {\partial y} \left (\frac{\partial x} {\partial \alpha } \frac{\partial y} {\partial \beta } -\frac{\partial x} {\partial \beta } \frac{\partial y} {\partial \alpha } \right ) + \frac{\partial v_{x}} {\partial x} \left (\frac{\partial x} {\partial \alpha } \frac{\partial y} {\partial \beta } -\frac{\partial x} {\partial \beta } \frac{\partial y} {\partial \alpha } \right ) {}\\ \end{array}$$
$$\displaystyle{\Rightarrow \frac{\partial \xi } {\partial t} =\xi \;\mathrm{ div}\;\vec{v}.}$$

Since ξ ≠ 0 is valid (\(\xi = \vert \xi \vert \;\mathrm{sgn}\,\xi\), with sgn ξ constant), one gets

$$\displaystyle{\frac{\partial \vert \xi \vert } {\partial t} = \vert \xi \vert \;\mathrm{div}\;\vec{v}.}$$

Due to

$$\displaystyle{A(t) =\int _{ \beta _{\mathrm{min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} }\vert \xi \vert \;\mathrm{d}\alpha \;\mathrm{d}\beta,}$$

one obtains

$$\displaystyle{\frac{\mathrm{d}A} {\mathrm{d}t} =\int _{ \beta _{\mathrm{min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} } \frac{\partial \vert \xi \vert } {\partial t} \;\mathrm{d}\alpha \;\mathrm{d}\beta =\int _{ \beta _{\mathrm{min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} }\vert \xi \vert \;\mathrm{div}\;\vec{v}\;\mathrm{d}\alpha \;\mathrm{d}\beta.}$$

Now it is obvious that the area remains constant for \(\mathrm{div}\;\vec{v} = 0\). If we were talking here about a fluid, such a fluid would obviously be incompressible; were one to try to compress it, the shape would be modified, but the total area (or volume) occupied by the particles would remain the same.

2.10.2 Maps

Now we analyze in a more general way how an area

$$\displaystyle{A =\int _{ \beta _{\mathrm{min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} }\left \vert \frac{\partial (x,y)} {\partial (\alpha,\beta )} \right \vert \;\mathrm{d}\alpha \;\mathrm{d}\beta }$$

is modified by a map

$$\displaystyle{\vec{r}\,^{{\prime}} =\vec{ F}(\vec{r}),}$$

which transforms each vector \(\vec{r} = (x,y)\) into a vector \(\vec{r}\,^{{\prime}} = (x^{{\prime}},y^{{\prime}})\). The parameterization will remain the same. Each point of the domain moves to a new point, so that the shape of the domain will change in general. Hence, we have to calculate

$$\displaystyle{A^{{\prime}} =\int _{ \beta _{\mathrm{ min}}}^{\beta _{\mathrm{max}} }\int _{\alpha _{\mathrm{min}}}^{\alpha _{\mathrm{max}} }\left \vert \frac{\partial (x^{{\prime}},y^{{\prime}})} {\partial (\alpha,\beta )} \right \vert \;\mathrm{d}\alpha \;\mathrm{d}\beta.}$$

According to Appendix A.10, we have

$$\displaystyle{A^{{\prime}} = A}$$

if

$$\displaystyle{\left \vert \frac{\partial (x^{{\prime}},y^{{\prime}})} {\partial (x,y)} \right \vert = 1}$$

is satisfied; the Jacobian of area-preserving maps is obviously + 1 or − 1.

We now check this general formula for the situation discussed in the previous section, where a special map

$$\displaystyle{\vec{r}\,^{{\prime}} = \left (\begin{array}{c} x^{{\prime}} \\ y^{{\prime}}\end{array} \right ) =\vec{ r}+\vec{v}\;\Delta t = \left (\begin{array}{c} x\\ y \end{array} \right )+\left (\begin{array}{c} v_{x} \\ v_{y}\end{array} \right )\;\Delta t}$$

was given. The Jacobian is then

$$\displaystyle\begin{array}{rcl} & & \left \vert \begin{array}{cc} \frac{\partial x^{{\prime}}} {\partial x} &\frac{\partial x^{{\prime}}} {\partial y} \\ \frac{\partial y^{{\prime}}} {\partial x} &\frac{\partial y^{{\prime}}} {\partial y} \end{array} \right \vert = \left \vert \begin{array}{cc} 1 + \frac{\partial v_{x}} {\partial x} \;\Delta t& \frac{\partial v_{x}} {\partial y} \;\Delta t \\ \frac{\partial v_{y}} {\partial x} \;\Delta t &1 + \frac{\partial v_{y}} {\partial y} \;\Delta t \end{array} \right \vert = 1 + \Delta t\left (\frac{\partial v_{x}} {\partial x} + \frac{\partial v_{y}} {\partial y} \right ) {}\\ & & \qquad + \Delta t^{2}\left (\frac{\partial v_{x}} {\partial x} \;\frac{\partial v_{y}} {\partial y} -\frac{\partial v_{x}} {\partial y} \;\frac{\partial v_{y}} {\partial x} \right ). {}\\ \end{array}$$

If one now wants to calculate

$$\displaystyle{ \frac{\partial \xi } {\partial t} =\lim _{\Delta t\rightarrow 0} \frac{\xi ^{{\prime}}-\xi } {\Delta t},}$$

one obtains, due to

$$\displaystyle{\xi ^{{\prime}} = \frac{\partial (x^{{\prime}},y^{{\prime}})} {\partial (x,y)} \;\xi }$$

(see Appendix A.10), the relation

$$\displaystyle{ \frac{\partial \xi } {\partial t} =\xi \;\lim _{\Delta t\rightarrow 0}\frac{\frac{\partial (x^{{\prime}},y^{{\prime}})} {\partial (x,y)} - 1} {\Delta t} =\xi \; \left (\frac{\partial v_{x}} {\partial x} + \frac{\partial v_{y}} {\partial y} \right ) =\xi \;\mathrm{ div}\;\vec{v},}$$

as above.
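
Both results can be checked numerically for a concrete map. The following sketch takes the standard map \((q,p) \mapsto (q + p^{\prime},\,p^{\prime})\) with \(p^{\prime} = p + K\,\sin q\), a well-known area-preserving map that is not discussed further in this book, and evaluates its Jacobian determinant by central differences; the result is 1 at every point, so the map preserves the phase-space area.

```python
import numpy as np

K = 0.9                                        # arbitrary map parameter

def standard_map(q, p):
    p_new = p + K * np.sin(q)
    q_new = q + p_new
    return q_new, p_new

def jacobian_det(q, p, eps=1e-6):
    # central differences of the map with respect to q and p
    dq = (np.array(standard_map(q + eps, p)) - np.array(standard_map(q - eps, p))) / (2 * eps)
    dp = (np.array(standard_map(q, p + eps)) - np.array(standard_map(q, p - eps))) / (2 * eps)
    return dq[0] * dp[1] - dq[1] * dp[0]

for q, p in [(0.1, 0.0), (1.5, -0.4), (3.0, 2.0)]:
    print(q, p, jacobian_det(q, p))            # ~1 at every point
```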

2.10.3 Liouville’s Theorem

The statement derived above that the condition

$$\displaystyle{\fbox{$\mathrm{div}\;\vec{v} = 0$}}$$

leads to area preservation or—depending on the dimension—to volume preservation in phase space is called Liouville’s theorem. This equation is also given as the condition for incompressible flows. Please note that one can speak of area preservation only if the area is defined in a unique way. This is possible, for example, if a continuous particle density ρ with clear boundaries in phase space is assumed, but not for a discrete distribution of individual particles (or only approximately, if large numbers of particles are present). We will return to this problem later.

Liouville’s theorem (and therefore also area/volume preservation) is also valid if

$$\displaystyle{\vec{v}(\vec{r},t)}$$

depends explicitly on time (cf. Szebehely [46, p. 55], Fetter [47, p. 296], or Budó [48, p. 446]).

2.11 Hamiltonian Systems

Hamiltonian theory is usually developed within the scope of classical mechanics after the Lagrangian formulation has been introduced (cf. [19]). Here we choose a different approach by introducing Hamiltonian functions directly. This cannot, of course, replace a thorough study of Hamiltonian mechanics, but it is sufficient for understanding some basics that are relevant in the following chapters of this book.

2.11.1 Example for Motivation

Consider the system sketched in Fig. 2.14. The spring constant K and the mass m are known. The force balance leads to

$$\displaystyle{m\ddot{x} = -Kx}$$
$$\displaystyle{ \Leftrightarrow \ddot{ x} + \frac{K} {m}\;x = 0. }$$
(2.116)

The general solution is obtained using the following ansatz:

$$\displaystyle\begin{array}{rcl} x& =& A\;\cos (\omega t) + B\;\sin (\omega t), {}\\ \dot{x}& =& -A\omega \;\sin (\omega t) + B\omega \;\cos (\omega t), {}\\ \ddot{x}& =& -A\omega ^{2}\;\cos (\omega t) - B\omega ^{2}\;\sin (\omega t). {}\\ \end{array}$$

We obviously obtain

$$\displaystyle{\omega ^{2} = \frac{K} {m}.}$$

One can alternatively write x in the form

$$\displaystyle{ x = C\;\cos (\omega t-\varphi ) = C\;\cos (\omega t)\;\cos \varphi + C\;\sin (\omega t)\;\sin \varphi. }$$
(2.117)

This leads to:

$$\displaystyle\begin{array}{rcl} & A = C\;\cos \varphi,& {}\\ & B = C\;\sin \varphi.& {}\\ \end{array}$$

For \(\dot{x}\) and \(\ddot{x}\) one obtains

$$\displaystyle{ \dot{x} = -C\omega \;\sin (\omega t-\varphi ), }$$
(2.118)
$$\displaystyle{ \ddot{x} = -C\omega ^{2}\;\cos (\omega t-\varphi ) = -\omega ^{2}x. }$$
(2.119)

The result may be drawn as shown in Fig. 2.15.

Fig. 2.14 Spring–mass system

Fig. 2.15 Trajectory of the spring–mass system

The quantity \(\varphi\) obviously determines only the initial conditions, whereas C is the oscillation amplitude and thus characterizes the energy of the system.

The quantity C (regarded as a system property), as well as the energy W, remains constant on the trajectory. If, in general, we have an invariant H that depends on two variables q and p, then the trajectory (q(t), p(t)) will have the property that

$$\displaystyle{\frac{\mathrm{d}H} {\mathrm{d}t} = 0}$$

holds. One concludes that

$$\displaystyle\begin{array}{rcl} & \frac{\partial H} {\partial q} \frac{\mathrm{d}q} {\mathrm{d}t} + \frac{\partial H} {\partial p} \frac{\mathrm{d}p} {\mathrm{d}t} = 0 & {}\\ & \Leftrightarrow \frac{\partial H} {\partial q} \dot{q} + \frac{\partial H} {\partial p} \dot{p} = 0.& {}\\ \end{array}$$

This equation is obviously satisfied if the following system of equations is valid:

$$\displaystyle{ \frac{\partial H} {\partial p} =\dot{ q}, }$$
(2.120)
$$\displaystyle{ \frac{\partial H} {\partial q} = -\dot{p}. }$$
(2.121)

These equations are called Hamilton’s equations. The function H(q, p) is called the Hamiltonian. We will now check whether this system of equations is actually satisfied in our example.

It is clear that the total energy of the system remains constant:

$$\displaystyle{W(x,v) = \frac{K} {2} \;x^{2} + \frac{1} {2}mv^{2}.}$$

This can also be seen formally if the differential equation

$$\displaystyle{m\;\ddot{x} + K\;x = 0}$$

is multiplied by \(\dot{x}\):

$$\displaystyle\begin{array}{rcl} & m\;\dot{x}\;\ddot{x} + K\;\dot{x}\;x = 0 & {}\\ & \Leftrightarrow \frac{m} {2} \frac{\mathrm{d}} {\mathrm{d}t}(\dot{x}^{2}) + \frac{K} {2} \frac{\mathrm{d}} {\mathrm{d}t}(x^{2}) = 0& {}\\ & \Leftrightarrow \frac{\mathrm{d}W} {\mathrm{d}t} = 0. & {}\\ \end{array}$$

Here we obviously have

$$\displaystyle\begin{array}{rcl} & \frac{\partial W} {\partial x} = Kx, & {}\\ & \frac{\partial W} {\partial v} = mv, & {}\\ & \dot{x} = v, & {}\\ & \dot{v} =\ddot{ x} = -\frac{K} {m}x.& {}\\ \end{array}$$

As a result, one obtains

$$\displaystyle\begin{array}{rcl} & \frac{\partial W} {\partial x} = -m\dot{v},& {}\\ & \frac{\partial W} {\partial v} = m\dot{x}. & {}\\ \end{array}$$

These equations are not yet equivalent to the above-mentioned Hamilton equations. However, it is not a big step to work with p = mv instead of v:

$$\displaystyle\begin{array}{rcl} & \frac{\partial W} {\partial x} = -\dot{p}, & {}\\ & \frac{\partial W} {\partial p} = \frac{1} {m} \frac{\partial W} {\partial v} =\dot{ x}.& {}\\ \end{array}$$

Now the equations actually have the desired form; Hamilton’s equations are satisfied. The function W(x, p) is called a Hamiltonian, since it satisfies Hamilton’s equations.

As shown above, C is also constant along the trajectory. We can obviously determine C as follows, based on Eqs. (2.117) and (2.118):

$$\displaystyle{(\omega C)^{2} = (\omega x)^{2} +\dot{ x}^{2}.}$$

It seems to be useful to define the following quantities in order to get \(C = \sqrt{\bar{q}^{2 } +\bar{ p}^{2}}\):

$$\displaystyle{\bar{q} = x,\qquad \bar{p} = \frac{\dot{x}} {\omega }.}$$

Calculating the partial derivatives leads to

$$\displaystyle\begin{array}{rcl} & \frac{\partial C} {\partial \bar{p}} = \frac{1} {2C}\;2\bar{p},& {}\\ & \frac{\partial C} {\partial \bar{q}} = \frac{1} {2C}\;2\bar{q},& {}\\ & \dot{\bar{q}} =\dot{ x} =\omega \bar{ p}. & {}\\ \end{array}$$

With the help of Eq. (2.116), one obtains

$$\displaystyle{\dot{\bar{p}} = \frac{\ddot{x}} {\omega } = -\omega x = -\omega \bar{q}.}$$

We therefore get two coupled differential equations:

$$\displaystyle\begin{array}{rcl} \frac{\partial C} {\partial \bar{p}} & =& \frac{1} {\omega C}\;\dot{\bar{q}}\quad \Rightarrow \omega C\;\frac{\partial C} {\partial \bar{p}} =\dot{\bar{ q}}, {}\\ \frac{\partial C} {\partial \bar{q}} & =& -\frac{1} {\omega C}\;\dot{\bar{p}}\quad \Rightarrow \omega C\;\frac{\partial C} {\partial \bar{q}} = -\dot{\bar{p}}. {}\\ \end{array}$$

This is reminiscent of the product rule

$$\displaystyle{\frac{\partial (C^{2})} {\partial \bar{p}} = 2C\frac{\partial C} {\partial \bar{p}},\mbox{ or }\frac{\partial (C^{2})} {\partial \bar{q}} = 2C\frac{\partial C} {\partial \bar{q}}.}$$

If we therefore set

$$\displaystyle{H = C^{2} \frac{\omega } {2},}$$

we again obtain Hamilton’s equations:

$$\displaystyle\begin{array}{rcl} \frac{\partial H} {\partial \bar{p}} & =& \dot{\bar{q}}, {}\\ \frac{\partial H} {\partial \bar{q}} & =& -\dot{\bar{p}}. {}\\ & & {}\\ \end{array}$$

We conclude that on the trajectory, the Hamiltonian

$$\displaystyle{H(\bar{q},\bar{p}) = \frac{\omega } {2}\bar{q}^{2} + \frac{\omega } {2}\bar{p}^{2}}$$

is constant if in our example, the generalized coordinate

$$\displaystyle{\bar{q} = x}$$

and the generalized momentum

$$\displaystyle{\bar{p} =\dot{ x}/\omega }$$

are used. In our special case, \(\bar{q}\) is a physical coordinate, but \(\bar{p}\) is not the physical momentum. In general, \(\bar{q}\) also does not need to be a physical coordinate. This explains the terms “generalized coordinate” and “generalized momentum.” Here they formally play a similar mathematical role. We summarize:

  • H(q, p) is called a Hamiltonian if Hamilton’s equations (2.120) and (2.121) are satisfied.

  • The Hamiltonian describes a dynamical system.

  • The quantities q and p are called a generalized coordinate and generalized momentum, respectively. They do not necessarily have to be identical to the physical coordinates and momenta.

  • Different Hamiltonians may exist for the same dynamical system (in our example, \(\frac{\omega }{2}\;C^{2}\) and W), and also different definitions of q and p are possible.
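
These statements may be checked numerically for the spring–mass example. The following sketch evaluates both Hamiltonians along the analytic trajectory of Eqs. (2.117) and (2.118); the parameter values are arbitrary. Both quantities are constant up to rounding errors.

```python
import numpy as np

# Spring-mass example with illustrative parameters (arbitrary values).
K, m, C_amp, phi = 2.0, 0.5, 1.3, 0.4
w = np.sqrt(K / m)

t = np.linspace(0.0, 10.0, 1001)
x = C_amp * np.cos(w * t - phi)                 # Eq. (2.117)
xdot = -C_amp * w * np.sin(w * t - phi)         # Eq. (2.118)

W = 0.5 * K * x**2 + 0.5 * m * xdot**2          # energy W(x, v)
H = 0.5 * w * (x**2 + (xdot / w)**2)            # Hamiltonian (w/2)(qbar^2 + pbar^2)

print(np.ptp(W), np.ptp(H))                     # both spreads vanish up to roundoff
```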

2.11.2 Arbitrary Number of Variables

Our introductory example contained only one coordinate and one momentum variable. For an arbitrary number of coordinate variables, Hamilton’s equations are

$$\displaystyle{ \frac{\partial H} {\partial p_{i}} =\dot{ q}_{i}, }$$
(2.122)
$$\displaystyle{ \frac{\partial H} {\partial q_{i}} = -\dot{p}_{i}. }$$
(2.123)

In this case, the Hamiltonian

$$\displaystyle{H(q_{i},p_{i},t)}$$

depends on n generalized coordinates qi (1 ≤ i ≤ n), on n generalized momentum variables pi, and in general, explicitly on the time t. Its total derivative with respect to time is

$$\displaystyle{\frac{\mathrm{d}H} {\mathrm{d}t} =\sum _{ i=1}^{n}\left [\frac{\partial H} {\partial q_{i}} \dot{q}_{i} + \frac{\partial H} {\partial p_{i}}\dot{p}_{i}\right ] + \frac{\partial H} {\partial t}.}$$

By means of Eqs. (2.122) and (2.123), one obtains

$$\displaystyle{\fbox{$\frac{\mathrm{d}H} {\mathrm{d}t} = \frac{\partial H} {\partial t}.$}}$$

This shows that if the Hamiltonian does not explicitly depend on time (as in our introductory example), it is constant along the trajectory. In contrast to this case, an explicit time dependence directly determines the time dependence along the trajectory.

Autonomous Hamiltonian systems H(qi, pi) with no explicit time dependence are conservative systems, because H does not change with time (i.e., along the trajectory).

2.11.3 Flow in Phase Space

In general, we consider a system with n degrees of freedom. In this case, we have n generalized coordinates qi and n generalized momentum variables pi (\(i \in \{ 1,2,\ldots,n\}\)).

The 2n-dimensional space that is generated by these variables is called the phase space. If the 2n variables qi and pi are given at a time t0, the system state is determined completely, and qi(t), pi(t) can be calculated for arbitrary times t (in the maximal interval of existence; see Sects. 2.8.3.3 and 2.8.3.4).

In order to show this, we combine coordinate and momentum variables as follows:

$$\displaystyle{\vec{r} = \left (\begin{array}{c} q_{1} \\ q_{2}\\ \ldots \\ q_{n} \\ p_{1} \\ p_{2}\\ \ldots \\ p_{n}\end{array} \right ),\qquad \vec{v} =\dot{\vec{ r}} = \left (\begin{array}{c} \dot{q}_{1} \\ \dot{q}_{2}\\ \ldots \\ \dot{q}_{n} \\ \dot{p}_{1} \\ \dot{p}_{2}\\ \ldots \\ \dot{p}_{n}\end{array} \right ) = \left (\begin{array}{c} \frac{\partial H} {\partial p_{1}} \\ \frac{\partial H} {\partial p_{2}}\\ \ldots \\ \frac{\partial H} {\partial p_{n}} \\ -\frac{\partial H} {\partial q_{1}} \\ -\frac{\partial H} {\partial q_{2}}\\ \ldots \\ - \frac{\partial H} {\partial q_{n}} \end{array} \right ).}$$

In the last step, we used Hamilton’s equations (2.122) and (2.123). Based on this definition, the problem has the standard form (2.94) of a dynamical system (see p. 50). We obtain

$$\displaystyle{\fbox{$\mathrm{div}\;\vec{v} =\sum _{ k=1}^{n}\left ( \frac{\partial ^{2}H} {\partial p_{k}\partial q_{k}} - \frac{\partial ^{2}H} {\partial q_{k}\partial p_{k}}\right ) = 0.$}}$$

Therefore, the flow in phase space corresponds to an incompressible fluid. Thus, Liouville’s theorem is valid automatically, stating that the area/volume in phase space remains constant. We have assumed only the preservation of the number of particles and the validity of Hamilton’s equations.

Liouville’s theorem (and area/volume preservation) is also valid if the Hamiltonian

$$\displaystyle{H(q,p,t)}$$

explicitly depends on time (cf. Szebehely [46, p. 55], Lichtenberg [49, p. 13]).

2.11.4 Fixed Points of a Hamiltonian System in the Plane

For the fixed points of an autonomous Hamiltonian system with one degree of freedom, we have

$$\displaystyle{\vec{r} = \left (\begin{array}{c} q\\ p \end{array} \right ),\qquad \vec{v} = \left (\begin{array}{c} \dot{q}\\ \dot{p} \end{array} \right ) = \left (\begin{array}{c} \frac{\partial H} {\partial p} \\ -\frac{\partial H} {\partial q} \end{array} \right ) = 0.}$$

The Jacobian matrix is

$$\displaystyle{D\vec{v} = \left (\begin{array}{cc} \frac{\partial ^{2}H} {\partial p\partial q} & \frac{\partial ^{2}H} {\partial p^{2}} \\ -\frac{\partial ^{2}H} {\partial q^{2}} & - \frac{\partial ^{2}H} {\partial q\partial p} \end{array} \right ).}$$

If we calculate the eigenvalues of the matrix

$$\displaystyle{ A = D\vec{v}(\vec{r}_{\mathrm{F}}) = \left (\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right ) }$$
(2.124)

according to Eqs. (2.107) and (2.108), we obtain

$$\displaystyle{B = \frac{a_{11} + a_{22}} {2} = 0,\qquad C = -\det \;A,}$$

and therefore

$$\displaystyle{\lambda = \pm \sqrt{C} = \pm \sqrt{-\det \;A}.}$$

Hence, the two eigenvalues are either real with opposite sign or imaginary with opposite sign.

All fixed points of the linearized system are therefore either centers or saddle points. The linearized system cannot have any sources or sinks. This is consistent with the name “conservative system.”

We now regard the Hamiltonian H(q, p) as a function that describes a two-dimensional surface in three-dimensional space.

The fixed-point condition

$$\displaystyle{\left (\begin{array}{c} \frac{\partial H} {\partial p} \\ -\frac{\partial H} {\partial q} \end{array} \right ) = 0}$$

is necessary for the existence of a relative extremum (also called a local extremum) of \(H(\vec{r})\) at \(\vec{r} =\vec{ r}_{\mathrm{F}}\), because the gradient of H must be zero. A sufficient condition for a relative minimum is that the Hessian matrix

$$\displaystyle{\left (\begin{array}{cc} \frac{\partial ^{2}H} {\partial q^{2}} & \frac{\partial ^{2}H} {\partial q\partial p} \\ \frac{\partial ^{2}H} {\partial p\partial q} & \frac{\partial ^{2}H} {\partial p^{2}} \end{array} \right )}$$

of H be positive definite at \(\vec{r} =\vec{ r}_{\mathrm{F}}\) (all eigenvalues positive). If the Hessian matrix is negative definite (all eigenvalues negative), then a relative maximum is present. If the Hessian matrix is indefinite (both positive and negative eigenvalues), a saddle point is present.

Obviously, we find for the Hessian matrix

$$\displaystyle{\left (\begin{array}{cc} \frac{\partial ^{2}H} {\partial q^{2}} & \frac{\partial ^{2}H} {\partial q\partial p} \\ \frac{\partial ^{2}H} {\partial p\partial q} & \frac{\partial ^{2}H} {\partial p^{2}} \end{array} \right ) = \left (\begin{array}{cc} - a_{21} & a_{11}\\ a_{11 } & a_{12} \end{array} \right ).}$$

The eigenvalues λH of the Hessian matrix can be determined as follows:

$$\displaystyle\begin{array}{rcl} & (-a_{21} -\lambda _{\mathrm{H}})(a_{12} -\lambda _{\mathrm{H}}) - a_{11}^{2} = 0 & {}\\ & \Rightarrow \lambda _{\mathrm{H}}^{2} +\lambda _{\mathrm{H}}(a_{21} - a_{12}) - (a_{11}^{2} + a_{12}a_{21}) = 0& {}\\ \end{array}$$
$$\displaystyle{ \Rightarrow \lambda _{\mathrm{H}} = \frac{a_{12} - a_{21}} {2} \pm \sqrt{\frac{(a_{12 } - a_{21 } )^{2 } } {4} + a_{11}^{2} + a_{12}a_{21}} }$$
(2.125)
$$\displaystyle{\Rightarrow \lambda _{\mathrm{H}} = \frac{a_{12} - a_{21}} {2} \pm \sqrt{\frac{(a_{12 } + a_{21 } )^{2 } } {4} + a_{11}^{2}}.}$$

The argument of the square root is not negative, so that only real eigenvalues exist (symmetry of the Hessian matrix).

Hence, we have three possibilities for the value of the square root:

  • It is greater than the absolute value of the first fraction. In this case, it determines the sign of the eigenvalues. Therefore, a positive eigenvalue and a negative eigenvalue exist, and the Hessian matrix is indefinite. Hence, we have a (geometric) saddle point. In this case, due to Eq. (2.125), we have

    $$\displaystyle{a_{11}^{2} + a_{ 12}a_{21} > 0,}$$

    or with \(a_{11} = -a_{22}\) (see Eq. (2.124)),

    $$\displaystyle\begin{array}{rcl} & a_{11}a_{22} - a_{12}a_{21} < 0& {}\\ & \Leftrightarrow \det \; A < 0. & {}\\ \end{array}$$

    Due to the restriction

    $$\displaystyle{\lambda = \pm \sqrt{C} = \pm \sqrt{-\det \;A}}$$

    for the eigenvalues of the Jacobian matrix, the fixed point is also a saddle point.

  • It is less than the absolute value of the first fraction in Eq. (2.125), so that det A > 0 holds. Due to

    $$\displaystyle{\lambda = \pm \sqrt{-\det \;A},}$$

    the eigenvalues are imaginary, and the fixed point is a center. The first fraction in Eq. (2.125) decides which sign the eigenvalues of the Hessian matrix have. For a12 > a21, we have a relative minimum of the Hamiltonian, and for a12 < a21, one obtains a relative maximum.

  • It equals the absolute value of the first fraction in Eq. (2.125). Then, one eigenvalue is zero, and det A = 0 holds, which we have excluded (degenerate fixed point).

In conclusion, the Hamiltonian has a geometric saddle point if the corresponding fixed point is a saddle point. It has a relative minimum or maximum if the corresponding fixed point is a center.
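
The following sketch illustrates this classification for a double-well Hamiltonian \(H(q,p) = p^{2}/2 + q^{4}/4 - q^{2}/2\), which is an arbitrary example and not taken from the discussion above. The Jacobian matrix \(A = D\vec{v}(\vec{r}_{\mathrm{F}})\) is approximated by central differences, and the sign of det A distinguishes saddle points from centers.

```python
import numpy as np

def v(q, p):
    # Hamilton's equations for H = p**2/2 + q**4/4 - q**2/2
    dH_dq = q**3 - q
    dH_dp = p
    return np.array([dH_dp, -dH_dq])             # (q_dot, p_dot)

def jacobian(q, p, eps=1e-6):
    A = np.zeros((2, 2))
    for j, dr in enumerate(np.eye(2) * eps):
        A[:, j] = (v(q + dr[0], p + dr[1]) - v(q - dr[0], p - dr[1])) / (2 * eps)
    return A

for qF in (-1.0, 0.0, 1.0):                      # the three fixed points (p = 0)
    A = jacobian(qF, 0.0)
    kind = "saddle point" if np.linalg.det(A) < 0 else "center"
    print(qF, np.linalg.det(A), kind)            # q = 0: saddle, q = +-1: centers
```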

2.11.5 Hamiltonian as Lyapunov Function

As in the previous sections, let us consider an autonomous Hamiltonian system with only one degree of freedom.

In Sect. 2.11.4, we saw that the linearization of such a system may have only centers and saddle points as fixed points. Let us assume that the system is linearized at a specific fixed point and that the fixed point of the linearized system is a saddle point. According to Theorem 2.17 (p. 77), the fixed point of the original system must be a saddle point as well. Theorem 2.17 applies, because the saddle point is a hyperbolic fixed point.

These arguments cannot be adopted for a center as a fixed point, because centers are not hyperbolic fixed points. If we want to show that a center of the linearized system corresponds to a center of the original nonlinear system, we need a different approach, which is presented in the following.

In many cases, the Hamiltonian of an autonomous system is defined in such a way that H ≥ 0 holds and that for the fixed points, \(H(\vec{r}_{\mathrm{F}}) = 0\) is valid. If under these conditions, \(H(\vec{r})\) has a minimum at \(\vec{r} =\vec{ r}_{\mathrm{F}}\), then L: = H is a Lyapunov function, since one has

$$\displaystyle{\frac{\mathrm{d}L} {\mathrm{d}t} =\vec{ v}\cdot \mathrm{grad}\;L = \left (\begin{array}{c} \dot{q}\\ \dot{p} \end{array} \right )\cdot \mathrm{grad}\;H = \left (\begin{array}{c} \frac{\partial H} {\partial p} \\ -\frac{\partial H} {\partial q} \end{array} \right )\cdot \left (\begin{array}{c} \frac{\partial H} {\partial q} \\ \frac{\partial H} {\partial p} \end{array} \right ) = 0 \leq 0.}$$

Under these conditions, one is therefore able to show that the autonomous Hamiltonian system has a center.

Theorem 2.19.

Let \(H \in C^{2}(D)\) be a Hamiltonian (\(D \subset \mathbb{R}^{2n}\) open). If \(\vec{r}\) is an isolated minimum (strict minimum) of the Hamiltonian, then \(\vec{r}\) is a stable fixed point.

(See Amann [35, Sect. 18.11 b]; Walter [50, Sect. 30, Chap. XII d].)

Since for Hamiltonians, the question whether a minimum or a maximum exists is just a matter of the sign, one concludes the following general result:

Theorem 2.20.

Every nondegenerate fixed point \(\vec{r}_{\mathrm{F}}\) of a Hamiltonian system is a saddle point or a center. It is a saddle point if and only if the Hamiltonian has a saddle point with

$$\displaystyle{\det \left [D\vec{v}(\vec{r}_{\mathrm{F}})\right ] < 0.}$$

It is a center if and only if the Hamiltonian has a strict minimum or strict maximum with

$$\displaystyle{\det \left [D\vec{v}(\vec{r}_{\mathrm{F}})\right ] > 0.}$$

(See Perko [30, Sect. 2.14, Theorem 2].)

2.11.6 Canonical Transformations

We consider canonical transformations as transformations that preserve the phase space area and that transform one set of Hamilton’s equations (depending on q, p) into another set of Hamilton’s equations (depending on Q, P).

According to Appendix A.10, preservation of the phase space area means

$$\displaystyle{ \xi = \frac{\partial (Q,P)} {\partial (q,p)} = \left \vert \begin{array}{cc} \frac{\partial Q} {\partial q} &\frac{\partial Q} {\partial p} \\ \frac{\partial P} {\partial q} &\frac{\partial P} {\partial p} \end{array} \right \vert = \frac{\partial Q} {\partial q} \frac{\partial P} {\partial p} -\frac{\partial Q} {\partial p} \frac{\partial P} {\partial q} = 1. }$$
(2.126)

We consider only a very specific subset of canonical transformations for which the value of the Hamiltonian remains unchanged. In this case,

$$\displaystyle{\dot{q} = \frac{\partial H} {\partial p},\qquad \dot{p} = -\frac{\partial H} {\partial q},}$$

must be transformed into

$$\displaystyle{\dot{Q} = \frac{\partial H} {\partial P},\qquad \dot{P} = -\frac{\partial H} {\partial Q}.}$$

For all points in phase space we have

$$\displaystyle\begin{array}{rcl} & \frac{\partial H} {\partial p} = \frac{\partial H} {\partial Q} \frac{\partial Q} {\partial p} + \frac{\partial H} {\partial P} \frac{\partial P} {\partial p} \qquad \Rightarrow \dot{ q} = -\dot{P}\;\frac{\partial Q} {\partial p} +\dot{ Q}\;\frac{\partial P} {\partial p}, & {}\\ & \frac{\partial H} {\partial q} = \frac{\partial H} {\partial Q} \frac{\partial Q} {\partial q} + \frac{\partial H} {\partial P} \frac{\partial P} {\partial q} \qquad \Rightarrow -\dot{p} = -\dot{P}\;\frac{\partial Q} {\partial q} +\dot{ Q}\;\frac{\partial P} {\partial q}.& {}\\ \end{array}$$

Now we have to check whether these restricted transformations are actually canonical ones, i.e., whether ξ = 1 holds.

For this purpose, we eliminate all derivatives of P in Eq. (2.126) by means of the last two results:

$$\displaystyle{\xi = \frac{\partial Q} {\partial q} \left ( \frac{\dot{q}} {\dot{Q}} + \frac{\dot{P}} {\dot{Q}} \frac{\partial Q} {\partial p} \right ) -\frac{\partial Q} {\partial p} \left (-\frac{\dot{p}} {\dot{Q}} + \frac{\dot{P}} {\dot{Q}} \frac{\partial Q} {\partial q} \right ) = \frac{1} {\dot{Q}}\left (\frac{\partial Q} {\partial q} \dot{q} + \frac{\partial Q} {\partial p} \dot{p}\right ).}$$

The last expression in parentheses is equal to \(\dot{Q}\), so that ξ = 1 indeed holds.
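
A concrete example of such a transformation is a rotation of the phase plane by a fixed angle, which leaves the value of the Hamiltonian \(H = \frac{\omega}{2}(q^{2} + p^{2})\) of the introductory example unchanged. The following sketch (with arbitrary angle and sample point) confirms numerically that ξ = 1, i.e., that the rotation is canonical in the sense considered here.

```python
import numpy as np

theta = 0.7                                     # arbitrary rotation angle
w = 2.0 * np.pi                                 # arbitrary oscillator frequency

def transform(q, p):
    Q = np.cos(theta) * q + np.sin(theta) * p
    P = -np.sin(theta) * q + np.cos(theta) * p
    return Q, P

def xi(q, p, eps=1e-6):
    # Jacobian determinant of Eq. (2.126) by central differences
    dq = (np.array(transform(q + eps, p)) - np.array(transform(q - eps, p))) / (2 * eps)
    dp = (np.array(transform(q, p + eps)) - np.array(transform(q, p - eps))) / (2 * eps)
    return dq[0] * dp[1] - dq[1] * dp[0]

q, p = 0.8, -0.3
Q, P = transform(q, p)
print(xi(q, p))                                           # 1 -> area preserving
print(0.5 * w * (q**2 + p**2), 0.5 * w * (Q**2 + P**2))   # same value of H
```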

2.11.7 Action-Angle Variables

In this section, we will briefly discuss special coordinates for oscillatory Hamiltonian systems, the so-called action-angle variables. Again, the general theory is outside the scope of this book, but we will use some results of this theory to determine the oscillation frequency of nonlinear systems.

2.11.7.1 Introductory Example

As an introductory example, we consider a parallel LC circuit as shown in Fig. 2.16 for which

$$\displaystyle{V = L\dot{I}\qquad \Rightarrow \dot{ I} = \frac{V } {L}}$$

and

$$\displaystyle{-I = C\dot{V }\qquad \Rightarrow \dot{ V } = -\frac{I} {C}}$$

hold. For

$$\displaystyle{q = I,\qquad p = V,}$$

this may be transformed into Hamilton’s equations:

$$\displaystyle{\dot{q} = \frac{p} {L} = \frac{\partial H} {\partial p},\qquad \dot{p} = -\frac{q} {C} = -\frac{\partial H} {\partial q}.}$$
Fig. 2.16 Parallel LC circuit

By means of an integration, we obtain the Hamiltonian

$$\displaystyle{H = \frac{p^{2}} {2L} + \frac{q^{2}} {2C} = \frac{V ^{2}} {2L} + \frac{I^{2}} {2C}.}$$

For the initial conditions

$$\displaystyle{V = V _{\mathrm{max}},\qquad I = 0,}$$

one obtains

$$\displaystyle{H = \frac{V _{\mathrm{max}}^{2}} {2L}.}$$

This value of the Hamiltonian is preserved, so that

$$\displaystyle{I^{2} = 2CH -\frac{C} {L}V ^{2} = \frac{C} {L} (V _{\mathrm{max}}^{2} - V ^{2})}$$

is valid. Hence, the orbit in phase space is an ellipse with semiaxes Vmax and

$$\displaystyle{I_{\mathrm{max}} = \sqrt{\frac{C} {L}}V _{\mathrm{max}}.}$$

For the area enclosed by this orbit, one obtains

$$\displaystyle{A =\pi V _{\mathrm{max}}I_{\mathrm{max}} =\pi \sqrt{\frac{C} {L}}V _{\mathrm{max}}^{2} = 2\pi \sqrt{LC}H.}$$

Since we know that the resonant angular frequency of a parallel LC circuit is

$$\displaystyle{\omega =\omega _{\mathrm{res}}:= \frac{1} {\sqrt{LC}},}$$

we see at once that

$$\displaystyle{A = \frac{2\pi } {\omega } H = TH,}$$

where T is the period of the oscillation. One may therefore guess that the resonant frequency or period may be derived from the area enclosed by the orbit even if less-trivial examples are considered. If that works (and it does, as we will see soon), it will obviously not be necessary to actually solve the differential equation.
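
The relation A = TH can also be verified numerically. The following sketch samples the elliptical orbit for illustrative values of L, C, and Vmax and computes the enclosed area with the shoelace formula.

```python
import numpy as np

# Numerical check of A = T*H for the linear LC circuit (illustrative values).
L, C = 1e-6, 1e-9
Vmax = 1.0
H = Vmax**2 / (2.0 * L)
T = 2.0 * np.pi * np.sqrt(L * C)

# sample the phase-space orbit (V, I) over one period
t = np.linspace(0.0, T, 100000, endpoint=False)
V = Vmax * np.cos(t / np.sqrt(L * C))
I = np.sqrt(C / L) * Vmax * np.sin(t / np.sqrt(L * C))

# shoelace formula for the area enclosed by the (closed) orbit
A = 0.5 * np.abs(np.sum(V * np.roll(I, -1) - I * np.roll(V, -1)))

print(A, T * H)     # both are approximately 9.93e-2, i.e., A = T*H
```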

2.11.7.2 Basic Principle

Let us consider an autonomous Hamiltonian system with one degree of freedom. We assume that in the (q, p) phase space, a center exists such that closed orbits are present. We are now looking for a specific canonical transformation that introduces the new generalized coordinate/momentum pair (Q, P).

The idea of action-angle variables is to require that one of the transformed coordinates not depend on time:

$$\displaystyle{\frac{\mathrm{d}P} {\mathrm{d}t} =\dot{ P} = 0.}$$

Hamilton’s equations

$$\displaystyle\begin{array}{rcl} & \dot{q} = \frac{\partial H} {\partial p},\qquad \dot{p} = -\frac{\partial H} {\partial q}, & {}\\ & \dot{Q} = \frac{\partial H} {\partial P},\qquad \dot{P} = -\frac{\partial H} {\partial Q},& {}\\ \end{array}$$

then show that

$$\displaystyle{\frac{\partial H} {\partial Q} = 0}$$

is valid, so that H cannot depend on Q but only on P:

$$\displaystyle{H = H(P).}$$

Therefore, \(\frac{\partial H} {\partial P}\) also depends only on P. Furthermore, P is constant with respect to time, so that

$$\displaystyle{\frac{\partial H} {\partial P} = \frac{\mathrm{d}H} {\mathrm{d}P}}$$

cannot depend on time either. Due to Hamilton’s equation, one then obtains

$$\displaystyle{\dot{Q} = \frac{\mathrm{d}H} {\mathrm{d}P} = K(P),}$$

and therefore

$$\displaystyle{Q(t) = Q(0) + K(P)\;t.}$$

In the original phase space (q, p), one revolution lasted for time T. Hence, it is clear that in the transformed phase space (Q, P), the variable Q will increase by the amount

$$\displaystyle{ \Delta Q = K(P)\;T, }$$
(2.127)

while P remains constant. This is visualized in Fig. 2.17.

Fig. 2.17 Transition to action-angle variables

Since the area in phase space is kept constant by a canonical transformation, the (Q, P) phase space is a surface of a cylinder (cf. Percival/Richards [51, p. 105]). This indicates why generalized coordinates are called cyclic if the Hamiltonian does not depend on them. In our case, Q is a cyclic coordinate.

Heretofore, we required only that P not depend on time. This is satisfied, for example, if to every point (q, p), we assign the area A that is enclosed by the orbit that goes through the point (q, p):

$$\displaystyle{P = A =\iint _{A}\mathrm{d}q\;\mathrm{d}p.}$$

If this definition of P is used, then the shaded area in the (q, p) phase space in Fig. 2.17 is equal to P. The shaded area in the (Q, P) phase space is equal to \(P\;\Delta Q\) (see the right-hand diagram in Fig. 2.17). Since both areas must be equal, we obtain

$$\displaystyle{\Delta Q = 1,}$$

and therefore, based on Eq. (2.127),

$$\displaystyle\begin{array}{rcl} & K(P)\;T = 1 & {}\\ & \Leftrightarrow K(P) = \frac{\mathrm{d}H} {\mathrm{d}P} = \frac{1} {T}.& {}\\ \end{array}$$

In conclusion, we may use H to calculate the period of the oscillation directly without solving the differential equation explicitly.

Instead of taking the area A = P directly as a generalized coordinate, one defines the action variable

$$\displaystyle\begin{array}{rcl} & J = \frac{P} {2\pi } & {}\\ & \Leftrightarrow \fbox{$J = \frac{1} {2\pi }\iint _{A}\mathrm{d}q\;\mathrm{d}p$}& {}\\ \end{array}$$

and the angle variable

$$\displaystyle{\theta = 2\pi \;Q.}$$

As the name implies, the angle variable obviously increases by 2π during every period of the oscillation. Hence, one obtains

$$\displaystyle\begin{array}{rcl} & \frac{\mathrm{d}H} {\mathrm{d}J} = \frac{\mathrm{d}H} {\mathrm{d}P} \;2\pi = \frac{2\pi } {T} & {}\\ & \Rightarrow \fbox{$\frac{\mathrm{d}H} {\mathrm{d}J} = \frac{2\pi } {T} =\omega $}& {}\\ \end{array}$$

for Hamilton’s equations

$$\displaystyle{\fbox{$\dot{\theta } = \frac{\partial H} {\partial J}, \dot{J} = -\frac{\partial H} {\partial \theta } = 0.$}}$$

Please note that for these considerations, we assumed that the Hamiltonian does not depend on time and that the orbits are closed. Therefore, the action variable is defined in a unique way, and by Liouville’s theorem, it is obvious that the phase space area, and thus also the action variable, remains constant.

2.11.8 LC Circuit with Nonlinear Inductance

The characteristic curve B(H) of a magnetic material can be approximated by

$$\displaystyle{B = B_{\mathrm{max}}\frac{2} {\pi } \arctan \frac{H} {H_{0}} = B_{\mathrm{max}}\frac{2} {\pi } \arctan \frac{I} {I_{0}}.}$$

The magnetic material will be used to build an inductor with N windings. With the magnetic flux \(\Phi _{\mathrm{m}} = BA\), it follows that

$$\displaystyle{V = N\;\frac{\mathrm{d}\Phi _{\mathrm{m}}} {\mathrm{d}t} = N\;AB_{\mathrm{max}}\frac{2} {\pi } \frac{1} {1 + \left ( \frac{I} {I_{0}} \right )^{2}} \frac{1} {I_{0}} \frac{\mathrm{d}I} {\mathrm{d}t}.}$$

Therefore, from

$$\displaystyle{V = L\frac{\mathrm{d}I} {\mathrm{d}t},}$$

one obtains

$$\displaystyle{L(I) = \frac{L_{0}} {1 + \left ( \frac{I} {I_{0}} \right )^{2}}.}$$

The corresponding inductor in parallel with a capacitor can now be used to form an LC oscillator as shown in Fig. 2.16. For the capacitor of the oscillating circuit,

$$\displaystyle{-I = C\frac{\mathrm{d}V } {\mathrm{d}t}}$$

is valid. The magnetic energy is

$$\displaystyle\begin{array}{rcl} W_{\mathrm{magn}}& =& \int V \;I\;\mathrm{d}t =\int L(I)\;I\;\frac{\mathrm{d}I} {\mathrm{d}t} \;\mathrm{d}t = L_{0}\int \frac{I} {1 + \left ( \frac{I} {I_{0}} \right )^{2}}\;\frac{\mathrm{d}I} {\mathrm{d}t} \;\mathrm{d}t {}\\ & =& L_{0}I_{0}^{2}\int \frac{x} {1 + x^{2}}\;\frac{\mathrm{d}x} {\mathrm{d}t} \;\mathrm{d}t, {}\\ \end{array}$$

where we have used

$$\displaystyle{x = \frac{I} {I_{0}},\mbox{ or }I = x\;I_{0}.}$$

Because of

$$\displaystyle{\int \frac{x} {1 + x^{2}}\;\mathrm{d}x = \frac{1} {2}\;\ln \vert 1 + x^{2}\vert +\mathrm{ const},}$$

one obtains

$$\displaystyle{W_{\mathrm{magn}} = \frac{L_{0}I_{0}^{2}} {2} \;\ln \left (1 + \left ( \frac{I} {I_{0}}\right )^{2}\right ).}$$

Together with

$$\displaystyle{W_{\mathrm{el}} = \frac{1} {2}CV ^{2},}$$

this leads to

$$\displaystyle{W(V,I) = W_{\mathrm{el}} + W_{\mathrm{magn}} = \frac{1} {2}CV ^{2} + \frac{L_{0}I_{0}^{2}} {2} \;\ln \left (1 + \left ( \frac{I} {I_{0}}\right )^{2}\right ).}$$

If we define p = V, we obtain

$$\displaystyle{\frac{\partial W} {\partial p} = Cp = CV = CL(I)\frac{\mathrm{d}I} {\mathrm{d}t} = \frac{CL_{0}} {1 + \left ( \frac{I} {I_{0}} \right )^{2}} \frac{\mathrm{d}I} {\mathrm{d}t}.}$$

If this is one of the two Hamilton’s equations, the right-hand side must be equal to \(\dot{q}\), and one obtains

$$\displaystyle{q = CL_{0}I_{0}\;\arctan \frac{I} {I_{0}}.}$$

Therefore, the Hamiltonian is

$$\displaystyle\begin{array}{rcl} & H(q,p) = \frac{1} {2}Cp^{2} + \frac{L_{0}I_{0}^{2}} {2} \;\ln \left (1 +\tan ^{2} \frac{q} {CL_{0}I_{0}} \right )& {}\\ & \Rightarrow H(q,p) = \frac{1} {2}Cp^{2} - L_{ 0}I_{0}^{2}\;\ln \;\cos \frac{q} {CL_{0}I_{0}}. & {}\\ \end{array}$$

We still have to check the second of Hamilton’s equations. One obtains

$$\displaystyle{\frac{\partial H} {\partial q} = -L_{0}I_{0}^{2}\; \frac{1} {\cos \frac{q} {CL_{0}I_{0}} } \left (-\sin \frac{q} {CL_{0}I_{0}}\right ) \frac{1} {CL_{0}I_{0}} = \frac{I_{0}} {C} \;\tan \frac{q} {CL_{0}I_{0}} = \frac{I} {C}.}$$

In fact, the right-hand side equals \(-\frac{\mathrm{d}V } {\mathrm{d}t}\), which is equal to \(-\dot{p}\), and both Hamilton’s equations are satisfied.

In order to calculate the oscillation frequency, we compute the action:

$$\displaystyle\begin{array}{rcl} J(H)& = & \frac{1} {2\pi }\iint \mathrm{d}q\;\mathrm{d}p {}\\ & =& \frac{1} {\pi } \int _{q_{1}}^{q_{2}}p\;\mathrm{d}q {}\\ & = & \frac{1} {\pi } \sqrt{ \frac{2} {C}}\int _{q_{1}}^{q_{2} }\sqrt{H + L_{0 } I_{0 }^{2 }\;\ln \;\cos \frac{q} {CL_{0}I_{0}}}\;\mathrm{d}q. {}\\ \end{array}$$

The limits q1 and q2 are determined by the zeros of p where the trajectory crosses the q-axis. The substitution

$$\displaystyle{x = \frac{q} {CL_{0}I_{0}},\qquad \frac{\mathrm{d}x} {\mathrm{d}q} = \frac{1} {CL_{0}I_{0}},}$$

leads to

$$\displaystyle{ J(H) = \frac{1} {\pi } \sqrt{ \frac{2} {C}}CL_{0}I_{0}\int _{x_{1}}^{x_{2} }\sqrt{H + L_{0 } I_{0 }^{2 }\;\ln \;\cos x}\;\mathrm{d}x. }$$
(2.128)

Due to

$$\displaystyle{\ln \;\cos x \approx -\frac{x^{2}} {2} -\frac{x^{4}} {12} -\cdots }$$

the simplest approximation for I ≪ I0 is

$$\displaystyle\begin{array}{rcl} J(H)& =& \frac{1} {\pi } \sqrt{2C}L_{0}I_{0}\sqrt{\frac{L_{0 } } {2}} I_{0}\int _{x_{1}}^{x_{2} }\sqrt{H \frac{2} {L_{0}I_{0}^{2}} - x^{2}}\;\mathrm{d}x {}\\ & =& \frac{1} {\pi } \sqrt{L_{0 } C}L_{0}I_{0}^{2}\int _{ x_{1}}^{x_{2} }\sqrt{H \frac{2} {L_{0}I_{0}^{2}} - x^{2}}\;\mathrm{d}x. {}\\ \end{array}$$

The integral describes the area of a semicircle with radius

$$\displaystyle{\sqrt{ \frac{2H} {L_{0}I_{0}^{2}}},}$$

so that

$$\displaystyle{J(H) = \frac{1} {\pi } \sqrt{L_{0 } C}\;L_{0}I_{0}^{2}\; \frac{2H} {L_{0}I_{0}^{2}} \frac{\pi } {2} = \sqrt{L_{0 } C}\;H}$$

is obtained. As expected, one obtains

$$\displaystyle{\omega = \frac{\mathrm{d}H} {\mathrm{d}J} = \frac{1} {\sqrt{L_{0 } C}}.}$$

If the approximation is undesirable, one may directly calculate the derivative of Eq. (2.128). Then the integral may be evaluated numerically in order to calculate the amplitude-dependent oscillation frequency \(\omega (\hat{I})\). As mentioned above, no direct solution of the differential equation is required.
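
Such a numerical evaluation might look as follows. The sketch computes J(H) from Eq. (2.128) by quadrature and approximates ω = dH/dJ by a central difference; the parameters are normalized (C = L0 = I0 = 1), which is an arbitrary choice for illustration. The amplitude \(\hat{I}\) enters through the value of the Hamiltonian at the turning point p = V = 0. For small amplitudes, the result approaches \(1/\sqrt{L_{0}C}\), and the frequency increases with the amplitude, since L(I) decreases with increasing current.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative, normalized parameters (assumed values, not from the text).
C, L0, I0 = 1.0, 1.0, 1.0

def H_of_amplitude(I_hat):
    # value of H at the turning point p = V = 0, where I = I_hat
    return 0.5 * L0 * I0**2 * np.log(1.0 + (I_hat / I0)**2)

def J(H):
    # action J(H) of Eq. (2.128), evaluated by numerical quadrature
    x2 = np.arccos(np.exp(-H / (L0 * I0**2)))          # integration limit, where p = 0
    f = lambda x: np.sqrt(max(H + L0 * I0**2 * np.log(np.cos(x)), 0.0))
    val, _ = quad(f, -x2, x2, epsabs=1e-12, epsrel=1e-10)
    return np.sqrt(2.0 / C) * C * L0 * I0 * val / np.pi

def omega(H):
    # omega = dH/dJ, approximated by a central difference of J(H)
    dH = 1e-3 * H
    return 2.0 * dH / (J(H + dH) - J(H - dH))

for I_hat in (0.05, 0.5, 1.0, 2.0):                     # amplitudes in units of I0
    print(I_hat, omega(H_of_amplitude(I_hat)) * np.sqrt(L0 * C))
# small amplitudes reproduce omega ~ 1/sqrt(L0*C); larger amplitudes oscillate faster
```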

2.11.9 Mathematical Pendulum

Consider a mathematical pendulum with mass m that is suspended by means of a massless cord of length R (see Fig. 2.18). Suppose that initially, the mass m is at height x = h (corresponding to the angle \(\alpha =\hat{\alpha }\)) with zero velocity.

Fig. 2.18 Mathematical pendulum

2.11.9.1 Energy Balance

The sum of the potential energy and kinetic energy must remain constant:

$$\displaystyle\begin{array}{rcl} & W_{\mathrm{pot}} + W_{\mathrm{kin}} =\mathrm{ const},\mbox{ with }W_{\mathrm{pot}} = mgx\mbox{ and }W_{\mathrm{kin}} = \frac{1} {2}mu^{2}& {}\\ & \Leftrightarrow mgx + \frac{1} {2}mu^{2} =\mathrm{ const} & {}\\ & \Leftrightarrow g(R - R\;\cos \alpha ) + \frac{1} {2}R^{2}\dot{\alpha }^{2} =\mathrm{ const}. & {}\\ \end{array}$$

We now calculate the time derivative of this equation:

$$\displaystyle{gR\;\sin \alpha \;\dot{\alpha } + R^{2}\dot{\alpha }\ddot{\alpha } = 0.}$$

As a result, we obtain

$$\displaystyle{ \ddot{\alpha }+ \frac{g} {R}\;\sin \alpha = 0. }$$
(2.129)

2.11.9.2 Hamilton’s Equations

We now try to convert Eq. (2.129) into a pair of Hamilton’s equations using our standard approach

$$\displaystyle{q =\alpha,\qquad p =\dot{\alpha },}$$

which leads to

$$\displaystyle\begin{array}{rcl} \dot{q}& =& p, {}\\ \dot{p}& =& -\frac{g} {R}\;\sin q. {}\\ \end{array}$$

If this is to be in accord with Hamilton’s equations,

$$\displaystyle\begin{array}{rcl} \frac{\partial H} {\partial p} & =& \dot{q}, {}\\ \frac{\partial H} {\partial q} & =& -\dot{p}, {}\\ \end{array}$$

we obtain by integration

$$\displaystyle\begin{array}{rcl} & H = \frac{p^{2}} {2} + f(q), & {}\\ & H = -\frac{g} {R}\;\cos q + g(p).& {}\\ \end{array}$$

Putting both results together, one obtains

$$\displaystyle{ H(q,p) = \frac{p^{2}} {2} - \frac{g} {R}\;\cos q. }$$
(2.130)

If we additionally require H(0, 0) = 0, we may add a constant accordingly:

$$\displaystyle{ H(q,p) = \frac{p^{2}} {2} + \frac{g} {R}\;(1 -\cos q). }$$
(2.131)

2.11.9.3 Oscillation Period

In order to calculate the oscillation period, we first determine the action variable:

$$\displaystyle{J(H) = \frac{1} {2\pi }\iint \mathrm{d}q\;\mathrm{d}p = \frac{1} {\pi } \int _{q_{1}}^{q_{2} }p\;\mathrm{d}q.}$$

Equation (2.130) leads to

$$\displaystyle{p = \sqrt{2\left [H + \frac{g} {R}\;\cos q\right ]}}$$

for the upper part of the curve in phase space. By means of

$$\displaystyle\begin{array}{rcl} a& =& \frac{2} {\pi ^{2}} H, \\ b& =& \frac{2} {\pi ^{2}} \frac{g} {R},{}\end{array}$$
(2.132)
$$\displaystyle{ \frac{a} {b} = \frac{HR} {g} }$$
(2.133)

one obtains the following integral:

$$\displaystyle{ J(H) =\int _{ q_{1}}^{q_{2} }\sqrt{a + b\;\cos q}\;\mathrm{d}q. }$$
(2.134)

The limits q1 and q2 of integration are determined by the zeros of p. We obviously have p = 0 for

$$\displaystyle{q_{1,2} = \mp \arccos \frac{-a} {b}.}$$

Therefore, \(q_{1} = -q_{2}\) holds, so that we can use the symmetry of the integrand:

$$\displaystyle{J(H) = 2\int _{0}^{q_{2} }\sqrt{a + b\;\cos q}\;\mathrm{d}q.}$$

According to the first formula 2.576 in [3], for \(\vert a\vert \leq b\) and \(0 \leq q <\arccos (-a/b)\), the integral has the following value:

$$\displaystyle{ J(H) = 2\sqrt{\frac{2} {b}}\left [(a - b)\;\mathrm{F}\left (\gamma, \frac{1} {r}\right ) + 2b\;\mathrm{E}\left (\gamma, \frac{1} {r}\right )\right ]_{0}^{q_{2} } }$$
(2.135)

with

$$\displaystyle{r = \sqrt{ \frac{2b} {a + b}}\mbox{ and }\gamma =\arcsin \sqrt{\frac{b(1 -\cos q)} {a + b}}.}$$

For q = 0, one obviously has γ = 0. For q = q2,

$$\displaystyle{\cos q = \frac{-a} {b} \qquad \Rightarrow \gamma =\arcsin \sqrt{\frac{b\left (1 + \frac{a} {b}\right )} {a + b}} =\arcsin \; 1 = \frac{\pi } {2}}$$

is valid. The expression in square brackets in Eq. (2.135) is equal to zero at the lower integration limit γ = 0, since F(0, k) = 0 and E(0, k) = 0:

$$\displaystyle{(a - b)\mathrm{F}\left (0,k\right ) + 2b\;\mathrm{E}\left (0,k\right ) = 0.}$$

Here we set

$$\displaystyle{ k = \frac{1} {r} = \sqrt{\frac{1} {2} + \frac{a} {2b}}\mbox{ and }k^{{\prime}} = \sqrt{1 - k^{2}} = \sqrt{\frac{1} {2} - \frac{a} {2b}}. }$$
(2.136)

From \(\mathrm{F}(\pi /2,k) =\mathrm{ K}(k)\) and \(\mathrm{E}(\pi /2,k) =\mathrm{ E}(k)\), one concludes, based on Eq. (2.135), that

$$\displaystyle{ J(H) = 2\sqrt{\frac{2} {b}}\left [(a - b)\;\mathrm{K}(k) + 2b\;\mathrm{E}(k)\right ] = 4\sqrt{2b}\left [\mathrm{E}(k) - k^{{\prime}2}\mathrm{K}(k)\right ]. }$$
(2.137)

The angular frequency of the oscillation may be calculated according to

$$\displaystyle{\omega = \frac{\partial H} {\partial J}.}$$

Since by definition, H depends only on the action variable J but not on the angle variable θ, and since H does not depend on time in our case, the partial derivative is in fact a total derivative:

$$\displaystyle{\omega = \frac{\partial H} {\partial J} = \frac{\mathrm{d}H} {\mathrm{d}J} = \left ( \frac{\mathrm{d}J} {\mathrm{d}H}\right )^{-1}.}$$

We therefore need \(\frac{\mathrm{d}J} {\mathrm{d}H}\). From Eqs. (2.133), (2.136), and (2.137), we get

$$\displaystyle\begin{array}{rcl} \frac{\mathrm{d}J} {\mathrm{d}H}& =& \frac{\mathrm{d}J} {\mathrm{d}k} \frac{\mathrm{d}k} {\mathrm{d}H} = 4\sqrt{2b}\left [\frac{\mathrm{d}\mathrm{E}(k)} {\mathrm{d}k} - 2k^{{\prime}}\frac{-2k} {2k^{{\prime}}} \;\mathrm{K}(k) - (1 - k^{2})\frac{\mathrm{d}\mathrm{K}(k)} {\mathrm{d}k} \right ] \frac{1} {2k} \frac{R} {2g} = {}\\ & =& \frac{2} {\pi k}\sqrt{ \frac{R} {g}} \left [\frac{\mathrm{d}\mathrm{E}(k)} {\mathrm{d}k} + 2k\;\mathrm{K}(k) - (1 - k^{2})\frac{\mathrm{d}\mathrm{K}(k)} {\mathrm{d}k} \right ]. {}\\ \end{array}$$

In the last step, we made use of Eq. (2.132), which led to

$$\displaystyle{\sqrt{2b} = \frac{2} {\pi } \sqrt{ \frac{g} {R}}.}$$

With

$$\displaystyle{\fbox{$\frac{\mathrm{d}\mathrm{K}(k)} {\mathrm{d}k} = \frac{\mathrm{E}(k)} {kk^{{\prime}2}} -\frac{\mathrm{K}(k)} {k} $}}$$

and

$$\displaystyle{\fbox{$\frac{\mathrm{d}\mathrm{E}(k)} {\mathrm{d}k} = \frac{\mathrm{E}(k) -\mathrm{ K}(k)} {k},$}}$$

we obtain

$$\displaystyle{ \frac{\mathrm{d}J} {\mathrm{d}H} = \frac{2} {\pi k}\sqrt{\frac{R} {g}} \;\frac{\mathrm{E}(k) -\mathrm{ K}(k) + 2k^{2}\;\mathrm{K}(k) -\mathrm{ E}(k) + k^{{\prime}2}\mathrm{K}(k)} {k} = \frac{2} {\pi } \sqrt{\frac{R} {g}} \;\mathrm{K}(k).}$$

This finally leads to

$$\displaystyle{ \omega = \frac{\mathrm{d}H} {\mathrm{d}J} = \frac{\pi } {2\mathrm{K}(k)}\;\sqrt{ \frac{g} {R}}. }$$
(2.138)

The calculation presented here can be simplified significantly if the derivative with respect to H is determined before the integral is evaluated. This is done in Sect. 3.16 for an analogous problem.

Now our considerations are complete in principle. Only the geometric meaning of the modulus k remains to be clarified.

From Eq. (2.136), one obtains

$$\displaystyle{k = \sqrt{\frac{1} {2} + \frac{HR} {2g}}.}$$

Initially, the mass is momentarily at rest, so that we have \(p =\dot{\alpha }= 0\). Therefore, according to Eq. (2.130),

$$\displaystyle{H = -\frac{g} {R}\;\cos \hat{\alpha }}$$

is the value of the Hamiltonian (which remains constant). This leads to

$$\displaystyle{k = \sqrt{\frac{1-\cos \hat{\alpha }} {2}}.}$$

Since

$$\displaystyle{\cos \hat{\alpha }=\cos ^{2} \frac{\hat{\alpha }} {2} -\sin ^{2} \frac{\hat{\alpha }} {2} = 1 - 2\;\sin ^{2} \frac{\hat{\alpha }} {2},}$$

this may be written in the form

$$\displaystyle{k =\sin \frac{\hat{\alpha }} {2}.}$$
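
Equation (2.138) together with \(k =\sin \frac{\hat{\alpha }}{2}\) is easily evaluated numerically. The following sketch uses the complete elliptic integral K as provided by scipy.special.ellipk, which expects the parameter m = k²; the values of g and R are illustrative.

```python
import numpy as np
from scipy.special import ellipk          # complete elliptic integral K(m), m = k**2

g, R = 9.81, 1.0                           # illustrative values

def omega_pendulum(alpha_hat):
    # angular frequency according to Eq. (2.138) with modulus k = sin(alpha_hat/2)
    k = np.sin(alpha_hat / 2.0)
    return np.pi / (2.0 * ellipk(k**2)) * np.sqrt(g / R)

for alpha_deg in (1, 30, 90, 170):
    alpha_hat = np.radians(alpha_deg)
    print(alpha_deg, omega_pendulum(alpha_hat) / np.sqrt(g / R))
# the ratio tends to 1 for small deflections and decreases toward zero as
# alpha_hat approaches 180 degrees (the period diverges at the separatrix)
```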

2.11.10 Vlasov Equation

From the formula

$$\displaystyle{\mathrm{div}(\rho \;\vec{v}) =\vec{ v} \cdot \mathrm{ grad}\;\rho +\rho \;\mathrm{ div}\;\vec{v},}$$

which is known from vector analysis, the continuity equation (2.115) for incompressible flows leads to the differential equation

$$\displaystyle{-\dot{\rho } =\vec{ v} \cdot \mathrm{ grad}\;\rho.}$$

If we consider a Hamiltonian system with one degree of freedom that describes the incompressible flow, we have

$$\displaystyle{\vec{r} = \left (\begin{array}{cc} q\\ p \end{array} \right ),\qquad \vec{v} =\dot{\vec{ r}} = \left (\begin{array}{cc} \dot{q}\\ \dot{p} \end{array} \right ).}$$

Therefore, one obtains

$$\displaystyle\begin{array}{rcl} & \frac{\partial \rho } {\partial t} +\dot{ q}\; \frac{\partial \rho } {\partial q} +\dot{ p}\; \frac{\partial \rho } {\partial p} = 0 & {}\\ & \Rightarrow \frac{\partial \rho } {\partial t} + \frac{\partial H} {\partial p} \; \frac{\partial \rho } {\partial q} -\frac{\partial H} {\partial q} \; \frac{\partial \rho } {\partial p} = 0.& {}\\ \end{array}$$

This is the Vlasov equation. It describes how the particle density ρ at different locations changes with time.
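
Since ρ is constant along the trajectories (the characteristics), the Vlasov equation can be solved by tracing each phase-space point back along the flow. The following sketch does this for the Hamiltonian \(H = \frac{\omega}{2}(q^{2} + p^{2})\) of the introductory example, for which the flow is a rigid rotation of the phase plane; the initial density ρ0 and the frequency are arbitrary choices.

```python
import numpy as np

# Method-of-characteristics sketch for the Vlasov equation with H = (w/2)(q^2 + p^2):
# the Hamiltonian flow is a rotation, so rho(q, p, t) = rho0(back-rotated point).
w = 2.0 * np.pi

def rho0(q, p):
    # arbitrary initial density in phase space
    return np.exp(-((q - 1.0)**2 + p**2) / 0.1)

def rho(q, p, t):
    c, s = np.cos(w * t), np.sin(w * t)
    # trace the characteristic backwards: the point (q, p) at time t started at (q0, p0)
    q0 = c * q - s * p
    p0 = s * q + c * p
    return rho0(q0, p0)

# the density is simply transported; after a full period it returns to rho0
print(rho(1.0, 0.0, 0.0), rho(0.0, -1.0, 0.25), rho(1.0, 0.0, 1.0))
```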

2.11.11 Outlook

A dynamical system is called conservative if the total energy (or the area in phase space) remains constant.

Every autonomous Hamiltonian system is conservative. However, there exist non-Hamiltonian systems that are conservative.

A function I(qk, pk) that does not depend on t and that does not change its value on the trajectory is called a constant of the motion. Such a constant of the motion allows one to reduce the order of the problem by 1 (cf. Tabor [53, p. 2]), since one may express one variable in terms of the other variables by means of this function.

For a non-Hamiltonian system of order n, one therefore needs n − 1 constants of the motion in order to completely solve the differential equation by means of quadratures (cf. Tabor [53, p. 39]).

A Hamiltonian system is called integrable if the solution can be determined by quadratures (cf. Rebhan [22, vol. I, p. 287]). This is the case if the problem can be written in action-angle variables.

In contrast to non-Hamiltonian systems, one needs only n constants of the motion if a Hamiltonian system of order 2n with n degrees of freedom is considered (instead of 2n − 1, as in the general case).

Conservative Hamiltonian systems with one degree of freedom (order 2) are integrable (cf. Rebhan [22, vol. I, p. 359]). This is obvious, because the Hamiltonian itself is a constant of the motion.

Chaotic behavior is possible only in nonintegrable systems (cf. Rebhan [22, vol. I, pp. 335 and 359]). Therefore, chaos is not possible in autonomous Hamiltonian systems with one degree of freedom. However, if more degrees of freedom are present, chaotic behavior may also occur in autonomous Hamiltonian systems.