1 Introduction

1.1 Background and Examples

The term ‘photonic imaging’ describes an optical imaging setup where the available measurement data Y are counts of detected photons. The origin of these photons can be diverse in nature. In coherent X-ray imaging (see e.g. Chap. 2), photons emitted by an X-ray source (like a free-electron laser) are scattered (and/or absorbed) by a specimen. In fluorescence microscopy (see e.g. Chap. 1 or Chap. 7), marker molecules are excited by an excitation pulse and emit photons with a certain probability. These two examples are characteristic of the wide range of scenarios arising in photonic imaging: in coherent X-ray imaging we have on the one hand single-molecule diffraction data composed of only few photons [1], and on the other hand holographic experiments where millions of photons can be collected from one sample [2]. In fluorescence microscopy, the number of photons is intrinsically limited to a few hundred or thousand per marker due to bleaching effects, and in the case of temporally resolved measurements, only a handful of photons is available per time step [3]. Similar restrictions arise in related imaging modalities, including those based on Förster resonance energy transfer (FRET) or metal-induced energy transfer (MIET), see e.g. Chap. 8 or [4, 5] for a discussion. Although it lies outside the context of nanoscale imaging, astrophysical imaging is statistically closely related. Here, there is no a priori limit on the observation time and hence on the number of photons; in practice, however, the observation time is limited to several minutes to avoid severe motion blur, see e.g. [6, 7] for examples. We also mention positron emission tomography (PET), where the total number of emitted photons should be as small as possible to minimize the radiation dose for the patient [8]. In all of these applications, detected photons can also originate from undesired background contributions, whose nature strongly depends on the experimental setup, adding further noise to the observations.

1.2 Purpose of the Chapter

The aim of this chapter is to give an overview of prototypical approaches to model, from a statistical point of view, the data emerging in photonic imaging, based on the physical modeling of photon detection. A sketch of the typical imaging setup we consider is presented in Fig. 4.1.

Fig. 4.1

Sketch of the imaging process. A source emits photons that are mapped onto a (binned) detection interface \(\varOmega \) through the optical system. The underlying photon intensity \(\lambda (\cdot , t)\) at time t, which is determined by the physics of the specific imaging setup, is used for statistical modeling of the detected signal

We assume that the imaging process is described by an underlying photon intensity \(\lambda : \varOmega \times \left[ 0,T\right] \rightarrow \left[ 0,\infty \right) \) at the detector interface, where \(\varOmega \) is the spatial domain of observation (which can be two- or three-dimensional) and T is the total observation time. Let us enumerate the emitted photons by \(1,\ldots , N\) and denote their specific detection position and time by \(\left( \mathbf {x}_i, t_i\right) \in \varOmega \times \left[ 0,T\right] \). For a given (measurable) subset \(A \subset \varOmega \) and time interval \(I \subset \left[ 0,T\right] \) we write \(Y\left( A \times I\right) := \# \left\{ 1 \le i \le N~ \big |~ \mathbf {x}_i \in A, t_i \in I\right\} \) to denote the number of photons observed in A during I. The expected number of photons detected in \(A\times I\) is by definition of \(\lambda \) given by

$$\begin{aligned} \mathbb {E}\left[ Y \left( A \times I\right) \right] = \int \limits _I \int \limits _A \lambda \left( \mathbf {y},t\right) \,\mathrm d \mathbf {y}\,\mathrm d t\ . \end{aligned}$$
(4.1)

Note that this count comprises all detected photons, including background contributions. We will always assume \(\lambda \ge 0\), which ensures that the integral in (4.1) is well-defined (although it may be \(\infty \)).
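For a given intensity, the right-hand side of (4.1) can be evaluated numerically. The following minimal Python sketch (using NumPy) does this for a hypothetical one-dimensional toy intensity, a Gaussian spot that bleaches exponentially over time; all parameter values are illustrative choices, not taken from a real experiment.

```python
import numpy as np

# Hypothetical toy intensity on a one-dimensional Omega = [0, 1], t in [0, T]:
# a Gaussian spot at y = 0.5 that bleaches exponentially over time.
def lam(y, t):
    return 500.0 * np.exp(-(y - 0.5) ** 2 / (2 * 0.05 ** 2)) * np.exp(-t / 10.0)

A = (0.4, 0.6)   # measurable subset of Omega
I = (0.0, 2.0)   # time interval within [0, T]

# Expected count E[Y(A x I)] from (4.1), approximated by a Riemann sum.
ys, ts = np.linspace(*A, 400), np.linspace(*I, 400)
YY, TT = np.meshgrid(ys, ts, indexing="ij")
expected = lam(YY, TT).sum() * (ys[1] - ys[0]) * (ts[1] - ts[0])
print(f"E[Y(A x I)] is approximately {expected:.1f} photons")
```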

Throughout this manuscript, we will discuss statistical models for the distribution of the observations Y, depending on the physical measurement setup. We assume \(\lambda \) to be given, as deriving or estimating \(\lambda \) and/or other model parameters described (implicitly) by \(\lambda \) is the topic of other expositions (see e.g. Chap. 5 or Chap. 11).

1.3 Measurement Devices

Depending on the type of sensor used for photon detection, different models for photonic imaging settings have been proposed. One commonality of all measurement setups is that the spatial domain of observation \(\varOmega \) is discretized into detector regions, so-called bins. We will assume that the detectors on all bins have identical physical properties, and we denote the centers of such bins by \(\mathbf {x}\in \varXi \), with \(\varXi \) being the set of all bin centers. If a charge-coupled device (CCD) camera is used for detection, all bins (the pixels of the sensor) can be observed simultaneously. This is e.g. the case in most coherent X-ray experiments and in astrophysical imaging. PET requires a tomographic setup consisting of several photomultiplier tubes (PMTs) surrounding the patient (see e.g. [9]). In confocal fluorescence microscopy the most widely applied detectors are based on avalanche photodiodes, which can measure photons in only one bin at a time. Hence, the domain of observation \(\varOmega \) is typically scanned by physically moving the specimen (or detector) at a fast pace. Simultaneous measurement of photons in several bins is possible as well, but requires a different experimental setup (see e.g. [10]).

Most photon detectors rely on the photoelectric effect. With a certain probability (the quantum efficiency), incident photons release photoelectrons on the detector surface. Since single electrons cannot be detected reliably, the signal is typically amplified by a cascade of electron-multiplying systems. This introduces additional noise due to the stochastic nature of the multiplying steps. Another complication is the existence of dead times. The dead time of a detection device refers to the time interval (after activation) during which it is unable to record another event. Dead times can, for example, arise due to the necessity to recharge conductors between measurements, or due to time delays caused by analog-to-digital conversion and data storage. Details on the statistics of different detectors can be found in [11, Chap. 12].

1.4 Structure and Notation

For the remainder of this chapter we will develop and discuss models for the right part of Fig. 4.1 with different degrees of accuracy. The model choice mainly depends on the total number of detected photons and on the spatial and temporal dependency structure of the randomly generated photons. We will start with the Poisson model, which is well-known and most common for many applications. It can be derived immediately from (4.1) under the assumption of independence, which explains its wide use in photonic imaging (see e.g. the reviews [7, 12] and the references therein). However, if it is necessary to count photons on small time scales, or if independence is not given, a more refined modeling is needed. In these situations, we subsequently turn towards Bernoulli and Binomial models, and discuss to what extent they are compatible with the aforementioned Poisson model. Finally we turn to the case of large counting rates, which leads to Gaussian models based on asymptotic normality. We discuss differences and commonalities arising from the different base models and indicate which model should be used in which situation. This will be linked to different examples from this book, where we argue whether or not our assumptions are met.

Let us introduce the basic notation used in this chapter. We will always assume that any observation y is the realization of a random variable Y, and we will denote by \(\mathbb P\) probabilities w.r.t. this random object. By \(\mathbb E\) and \(\mathbb V\) we will denote the expectation and variance w.r.t. \(\mathbb P\), respectively. The letters \(\mathcal P, \mathcal B\) and \(\mathcal N\) will denote the Poisson, Binomial and normal distributions introduced below. Random variables will always be denoted by capital letters \(X, X_i, Z\) etc., and if we write i.i.d. for a sequence \(X_1, X_2,\ldots \) of random variables, this stands for independent and identically distributed.

2 Poisson Modeling

Suppose we have a perfect photon detector that registers the individual arrival times of all emitted photons reaching a bin without missing any. For the moment we will focus on describing a single bin to avoid notational difficulties. In this situation, the total number of collected photons can often be modeled as Poissonian. A random variable X follows a Poisson law with parameter (intensity) \(\mu \ge 0\), if

$$ \mathbb {P}\left[ X = j\right] = \frac{\mu ^j}{j!} \exp \left( -\mu \right) , \quad j\in \mathbb {N}_0. $$

We write \(X \sim \mathcal P \left( \mu \right) \). The following fundamental theorem about point processes explains why the Poisson distribution often comes into play when modeling photon counts:

Theorem 4.1

Suppose we observe a random number N of photons at random arrival times \(0 \le t_1< \cdots < t_N \le T\) such that

(a)

    for each choice of disjoint intervals \(I_1, \ldots , I_n \subset \left[ 0,T\right] \), the random variables \(\#\left\{ 1 \le k \le N ~\big |~ t_k \in I_i\right\} \), \(1 \le i \le n\), corresponding to the number of observed photons during \(I_i\) are independent, and

(b)

    there exists some integrable function \(\mu \) on \(\left[ 0,T\right] \) such that for any choice \(0 \le a < b\le T\) it holds

    $$ \mathbb {E}\left[ \#\left\{ 1 \le k \le N ~\big |~ a \le t_k \le b\right\} \right] = \int \limits _a^b \mu \left( t\right) \,\mathrm d t. $$

Then, for all \(0 \le a < b\le T\), the number of photons observed between time a and time b is Poisson distributed with parameter \(\int _a^b \mu \left( t\right) \,\mathrm d t\), i.e.

$$ \#\left\{ 1 \le k \le N ~\big |~ a \le t_k \le b\right\} \sim \mathcal P \left( \int _a^b \mu \left( t\right) \,\mathrm d t\right) . $$

For the proof we refer to [13, Theorem 1.11.8]. In terms of probability theory, this theorem implies that the point process \(X := \sum _{i=1}^N \delta _{t_i}\), with \(\delta _t\) denoting the Dirac measure at t, is a Poisson point process with intensity \(\mu \) if the stated assumptions are satisfied.
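Theorem 4.1 can also be checked by simulation. The sketch below samples arrival times of an inhomogeneous Poisson process by thinning a homogeneous proposal (the standard Lewis-Shedler algorithm, not discussed in the text) and compares the empirical mean and variance of the counts in an interval \(\left[ a,b\right] \) with the parameter predicted by the theorem; for a Poisson law, both must agree with \(\int _a^b \mu \left( t\right) \,\mathrm d t\). The intensity \(\mu \) and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_arrival_times(mu, mu_max, T):
    """Lewis-Shedler thinning: arrival times of an inhomogeneous Poisson
    process with intensity mu(t) <= mu_max on [0, T]."""
    n = rng.poisson(mu_max * T)                   # homogeneous proposal count
    t = np.sort(rng.uniform(0.0, T, size=n))      # proposal arrival times
    return t[rng.uniform(0.0, mu_max, size=n) < mu(t)]

mu = lambda t: 3.0 + 2.0 * np.sin(t)              # hypothetical intensity, bounded by 5
T, a, b = 10.0, 2.0, 5.0

counts = [np.sum((a <= s) & (s <= b))
          for s in (sample_arrival_times(mu, 5.0, T) for _ in range(20_000))]

# Theorem 4.1 predicts P(par) with par = int_a^b mu(t) dt; mean and variance agree.
par = 3.0 * (b - a) + 2.0 * (np.cos(a) - np.cos(b))
print(f"empirical mean {np.mean(counts):.2f}, var {np.var(counts):.2f}, predicted {par:.2f}")
```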

Let us discuss these assumptions. Condition (b) underlies our whole modeling procedure as described in (4.1) and is evidently satisfied in essentially all settings of interest. Temporal independence of the arrival times in (a) is more critical, but seems (at least approximately) reasonable in many imaging modalities where photons arise from a high-intensity source, including coherent X-ray imaging. However, if the photons arise from fluorescent markers, temporal independence can be violated due to hidden internal states of the fluorophores, energy transfer between different fluorophores on small time and spatial scales (e.g. FRET), or dead times of the detectors.

If temporal independence is given, then Theorem 4.1 states that the number \(Y_{\mathbf {x}, t}\) of collected photons within a bin \(B_{\mathbf {x}}\) until time \(t \in \left[ 0,T\right] \) can naturally be modeled by a Poissonian random variable with intensity \(\int _0^t \int _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \). This gives rise to the following model:

Poisson model

Let the spatial domain of observation \(\varOmega \) be discretized into bins \(B_{\mathbf {x}}\) with centers \(\mathbf {x}\in \varXi \). We assume that our observations are given by a field \(Y_t := \left( Y_{\mathbf {x}, t}\right) _{\mathbf {x}\in \varXi }\) of random variables such that

$$\begin{aligned} Y_{\mathbf {x}, t} \sim \mathcal P \left( \int \limits _0^t \int \limits _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \right) , \qquad \mathbf {x}\in \varXi , t \in \left[ 0,T\right] \end{aligned}$$
(4.2)

for some intensity function \(\lambda \ge 0\).

This is the basis of many popular models covering a variety of distinct applications. Examples include PET (see Vardi et al. [9]), astronomy and fluorescence microscopy (see Bertero et al. [7] or Hohage and Werner [12]), or a more subtle model for CCD cameras due to Snyder et al. [14, 15].

Note that so far we have assumed that all arriving photons are collected by the detector. Due to several physical limitations, however, this will never be the case in practice, see Fig. 4.2.

Fig. 4.2

Statistical photon thinning at the detector interface. Photons that reach the detection plane can remain undetected for various reasons. For example, they can fail to free a photoelectron, miss the sensitive regions of the detector bins, or arrive during the dead time caused by a previously recorded photon

The specific efficiency depends strongly on the setup and can vary considerably. In addition to the different quantum efficiencies of different detectors, it might also happen that the detector does not cover all of \(\varOmega \) or has some dead subregions (like the interfaces between individual elements). This causes a loss of measured photons and hence a statistical thinning of the random variable \(Y_{\mathbf {x}, t}\). In this case, the actually observed random variable \(\widetilde{Y}_{\mathbf {x},t}\) can be written as

$$\begin{aligned} \widetilde{Y}_{\mathbf {x},t} = \sum _{i=1}^{Y_{\mathbf {x}, t}} X_i \end{aligned}$$
(4.3)

with Bernoulli random variables \(X_i\) having success probabilities \(\eta _i \in \left[ 0,1\right] \), where each \(X_i\) indicates whether the ith photon has been detected. If, in addition, the thinning happens identically and independently for each photon, i.e. \(X_i {\mathop {\sim }\limits ^{\text {i.i.d.}}} \mathcal B \left( 1,\eta \right) \), only the parameter in the Poisson law (4.2) changes, but not its distributional structure. More precisely, in this case it follows (see the Appendix) that

$$ \widetilde{Y}_{\mathbf {x}, t} \sim \mathcal P \left( \eta \int \limits _0^t \int \limits _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \right) . $$

Consequently, the imperfection of a detector (as long as the induced thinning happens independently for each photon) can be seen as a scaling of the underlying photon intensity \(\lambda \) by an efficiency factor \(\eta \in \left( 0,1\right] \). In agreement with Fig. 4.1 we can hence assume in the following that all physical processes causing a thinning have already been accounted for when modeling \(\lambda \).
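The stability of the Poisson law under independent thinning is easy to verify empirically. The following sketch draws ideal counts \(Y \sim \mathcal P \left( \mu \right) \), thins each photon independently with probability \(\eta \) as in (4.3), and compares the result with direct draws from \(\mathcal P \left( \eta \mu \right) \); the values of \(\mu \) and \(\eta \) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, eta, reps = 40.0, 0.3, 100_000   # hypothetical intensity and efficiency

N = rng.poisson(mu, size=reps)                # ideal counts Y ~ P(mu)
thinned = rng.binomial(N, eta)                # keep each photon w.p. eta, cf. (4.3)
reference = rng.poisson(eta * mu, size=reps)  # direct draws from P(eta * mu)

print(f"thinned:   mean {thinned.mean():.2f}, var {thinned.var():.2f}")
print(f"P(eta*mu): mean {reference.mean():.2f}, var {reference.var():.2f}")
```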

Besides this kind of independent thinning, a further important issue in many imaging modalities is the dead time \(\varDelta t\) of the employed detector. Dead times can vary significantly depending on the type of detector, but are usually in the range of nanoseconds. If a photon arrives at time \(t \in \left[ 0,T\right) \), the detector will only be able to record the next photon arriving after \(t + \varDelta t\). Note that whenever \(\varDelta t > 0\), at most \(T/\varDelta t\) photons can be detected during the whole measurement, so that \(\mathbb {P}\left[ Y_{\mathbf {x}, T} >T/\varDelta t\right] = 0\) in this case, which contradicts (4.2), as a Poisson random variable exceeds any fixed threshold with positive probability. Such an upper limit on the total number of detected photons can crucially change the distribution, as can, e.g., be seen from the following fact proven in the appendix:

Theorem 4.2

Fix \(\mathbf {x}\in \varXi \) and let \(I_1, \ldots , I_m\) be a decomposition of \(\left[ 0,T\right] \) into disjoint intervals. Denote by \(X_i\) the number of photons observed during \(I_i\) in bin \(B_{\mathbf {x}}\). Assume model (4.2), and suppose that \(X_1,\ldots , X_m\) are independent. Then the conditional distribution of \((X_1, \ldots , X_m)\) given \(Y_{\mathbf {x}, T} = N\) is multinomial with parameter N and probability vector \(\left( p_1, \ldots , p_m\right) \), where

$$ p_i = \frac{\int \limits _{I_i} \int \limits _{B_{\mathbf {x}}}\lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau }{\int \limits _0^T \int \limits _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau }. $$

In other words, Theorem 4.2 states that, conditional on the total number of photons, the arrival times of individual photons behave like a Bernoulli process with intensity \(\tau \mapsto \int _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\). This implies that conditioning on the total number of photons introduces a dependency structure between the numbers of counts during different time intervals. Consequently, if \(\varDelta t\) cannot be neglected, temporal independence is no longer given, which invalidates the Poisson law, and different modeling approaches are needed.
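Theorem 4.2 can likewise be illustrated by simulation: the sketch below (with hypothetical Poisson parameters for three time intervals) draws independent Poisson counts, conditions on their total N, and compares the conditional means with the multinomial prediction \(N p_i\).

```python
import numpy as np

rng = np.random.default_rng(2)
mus = np.array([2.0, 5.0, 3.0])   # hypothetical Poisson parameters for I_1, I_2, I_3
N = 10                            # total photon count we condition on

# Independent Poisson counts; keep only realizations whose total equals N.
X = rng.poisson(mus, size=(200_000, 3))
cond = X[X.sum(axis=1) == N]

# Theorem 4.2 predicts a multinomial law with p_i proportional to the parameters.
p = mus / mus.sum()
print("conditional mean of (X_1, X_2, X_3):", cond.mean(axis=0))
print("multinomial prediction N * p_i:     ", N * p)
```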

3 Bernoulli Modeling

To resolve the temporal structure of the incoming photons, counting as described above is not sufficient. In such cases, photons are counted consecutively during (short) time frames. We suppose that the temporal discretization of the measurement process is so fine that the temporal aggregation underlying the Poisson model is no longer appropriate. This is described by (equidistant) time frames, i.e. consecutive intervals \(I_1, I_2, \ldots , I_n \subset \left[ 0,T\right] \) of equal length \(\delta >0\), chosen such that the probability to observe more than one photon in any bin \(B_{\mathbf {x}}\) during any interval is sufficiently close to 0, and separated by a waiting time \(\epsilon >0\), which allows us to ignore the dead time. In this situation, the following model is a reasonable approximation:

Bernoulli model

For \(\mathbf {x}\in \varXi \) and \(1 \le i \le n\), the random variable \(Y_{\mathbf {x}, i}\) indicating whether a photon arrives in bin \(B_{\mathbf {x}}\) during the time interval \(I_i\) follows a Bernoulli distribution,

$$\begin{aligned} Y_{\mathbf {x}, i} \sim \mathcal B \left( 1,p_{\mathbf {x}, i}\right) , \end{aligned}$$
(4.4)

with success probability

$$\begin{aligned} p_{\mathbf {x},i} \approx \int \limits _{I_i} \int \limits _{B_{\mathbf {x}}}\lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau . \end{aligned}$$
(4.5)

As mentioned before, the detector will hardly ever count all arriving photons, which causes a statistical thinning as in (4.3). If the thinning happens independently of the photon arrivals, we obtain \(\widetilde{Y}_{\mathbf {x}, i} \sim \mathcal B\left( 1,\eta \cdot p_{\mathbf {x}, i}\right) \) with the probability \(\eta \) that an incident photon is detected. This follows immediately from the fact that \(X \cdot Z \sim \mathcal B \left( 1,pp'\right) \) if \(X \sim \mathcal B \left( 1,p\right) \) is independent of \(Z \sim \mathcal B\left( 1,p'\right) \).
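This product rule is immediate to check numerically; in the following sketch, the per-frame arrival probability p and the detection probability \(\eta \) are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(3)
p, eta, reps = 0.05, 0.3, 1_000_000   # hypothetical frame and detection probabilities

X = rng.binomial(1, p, size=reps)     # photon arrives during the frame, cf. (4.4)
Z = rng.binomial(1, eta, size=reps)   # arriving photon triggers a detector event
print(f"P[X * Z = 1] ~ {np.mean(X * Z):.5f}   (exact: p * eta = {p * eta:.5f})")
```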

In many imaging setups it would be difficult to store the whole time series \(Y_{\mathbf {x}, i}\), for instance due to memory limitations. Examples include fluorescence microscopy setups like confocal, STED or 4Pi microscopy, as well as coherent X-ray imaging, where millions of photons are observed in short times, which would require an unreasonably fine time discretization. For other examples like SMS microscopy, however, the temporal structure can be important (e.g. for adjusting temporal drifts, see e.g. [16, 17]), and hence most of the data of the above model has to be used. If temporal dependencies are less important, it is sufficient to count photon arrivals in some interval \(I \subset \left[ 0,T\right] \) larger than \(\delta \), i.e. to consider \(Y_{\mathbf {x}, I} := \sum _{I_i \subset I} Y_{\mathbf {x}, i}\). The distribution of \(Y_{\mathbf {x}, I}\) depends strongly on the temporal dependency structure of the \(Y_{\mathbf {x}, i}\). In case they are independent and \(p_{\mathbf {x}, i} \equiv p_{\mathbf {x}}\) for all \(1 \le i \le n\), we obtain a Binomial model:

Binomial model

For \(\mathbf {x}\in \varXi \) and \(I \subset \left[ 0,T\right] \), the number of photons observed in the bin centered at \(\mathbf {x}\) during the time interval I is

$$\begin{aligned} Y_{\mathbf {x}, I} \sim \mathcal B\left( \# \left\{ I_i \subset I\right\} , p_{\mathbf {x}}\right) \end{aligned}$$
(4.6)

with \(p_{\mathbf {x}, i} \equiv p_{\mathbf {x}}\) for all \(1 \le i \le n\) and \(p_{\mathbf {x},i}\) as in (4.5).

Note that if we proceed similarly with the thinned observations \(\widetilde{Y}_{\mathbf {x}, i}\), we obtain \(\widetilde{Y}_{\mathbf {x}, I} \sim \mathcal B\left( \# \left\{ I_i \subset I\right\} , \eta p_{\mathbf {x}}\right) \), which is the canonical thinning of (4.6), see e.g. [18].
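As a consistency check for the Binomial model, the following sketch sums independent Bernoulli frames and compares the empirical mean and variance with those of (4.6); the number of frames and the per-frame probability are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_frames, p_x = 500, 0.02   # hypothetical number of frames in I and per-frame probability

# Summing independent Bernoulli frames, cf. (4.6) ...
Y_I = rng.binomial(1, p_x, size=(100_000, n_frames)).sum(axis=1)

# ... reproduces the moments of the Binomial law B(#frames, p_x).
print(f"empirical mean/var: {Y_I.mean():.2f} / {Y_I.var():.2f}")
print(f"Binomial mean/var:  {n_frames * p_x:.2f} / {n_frames * p_x * (1 - p_x):.2f}")
```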

Independence of the \(Y_{\mathbf {x}, i}\) is strongly connected to the photon source, as discussed above. If \(\epsilon \ge \varDelta t\), the dead times of the detectors no longer have any influence on the temporal dependency structure. The second assumption, \(p_{\mathbf {x}, i} \equiv p_{\mathbf {x}}\) for all \(1 \le i \le n\), is equivalent to stationarity of the underlying photon source, which again depends on the imaging modality. If, e.g., a freeze-dried sample is imaged sufficiently fast, this assumption is reasonable.

Besides temporal dependencies, the field of random variables can also have a spatial dependency structure. In many modalities the random variables are independent for different pixels or voxels \(\mathbf {x}\), but on sufficiently small scales some dependency can occur, e.g., due to energy transfer between molecules.

3.1 Law of Small Numbers

It is a fundamental and well-known fact that a Binomial distribution can, in certain situations, be approximated by a Poisson distribution. In this section we will discuss how this provides a link between the initial Poisson model (4.2) and the preceding Bernoulli and Binomial models (4.4) and (4.6). To this end, we recall the so-called law of small numbers, stated here in terms of Le Cam's theorem [19]. For the moment we suppress dependencies on \(\mathbf {x}\) and consider only a single Binomial random variable, corresponding to a fixed bin.

Theorem 4.3

(Law of small numbers) Let \(X_1, \ldots , X_m\) be independent and Bernoulli distributed with success probabilities \(q_1, \ldots , q_m\). Then the distribution of \(X := X_1 + \cdots + X_m\) can be approximated by \(\mathcal P \left( \lambda _m\right) \) with \(\lambda _m = -\sum _{i=1}^m \log \left( 1-q_i\right) \). More precisely it holds that

$$\begin{aligned} \sum \limits _{k=0}^\infty \left| \,\mathbb {P}\left[ X = k\right] - \frac{\lambda _m^k}{k!} \exp \left( -\lambda _m\right) \,\right| \le 2 \sum \limits _{i=1}^m \left( \log \left( 1-q_i\right) \right) ^2. \end{aligned}$$
(4.7)

For a textbook proof we refer to [20, Theorem 5.1]. Figure 4.3 visualizes the law of small numbers. Note that for small \(q_i\) the bound on the right-hand side of (4.7) behaves like \(2 \sum _{i=1}^m q_i^2\), as \(\log \left( 1-x\right) \approx -x\) for small x. We furthermore refer to [21, Propositions 4.3 and 4.4], where bounds on the supremum over k instead of the sum on the left-hand side of (4.7) are given. Note that Theorem 4.3 can be generalized to dependent Bernoulli random variables at the price of a worse upper bound, see e.g. [20, Theorem 5.5].
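For equal success probabilities, both sides of (4.7) can be computed exactly. The following sketch (using SciPy) does so in the regime \(q_i = 5/m\) of Fig. 4.3(b); the truncation of the infinite sum over k is a numerical convenience.

```python
import numpy as np
from scipy import stats

def le_cam_bound(m, q):
    """Left- and right-hand sides of (4.7) for equal success probabilities q."""
    lam = -m * np.log1p(-q)            # lambda_m = -m * log(1 - q)
    k = np.arange(0, m + 200)          # Poisson mass beyond this range is negligible
    lhs = np.abs(stats.binom.pmf(k, m, q) - stats.poisson.pmf(k, lam)).sum()
    rhs = 2 * m * np.log1p(-q) ** 2
    return lhs, rhs

# Mirrors the regime of Fig. 4.3(b): q_i = 5/m with growing m.
for m in (10, 100, 1000):
    lhs, rhs = le_cam_bound(m, 5.0 / m)
    print(f"m = {m:4d}: LHS = {lhs:.4f} <= RHS = {rhs:.4f}")
```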

Fig. 4.3

Law of small numbers. Figure (a) shows the probabilities for a sum of 50 Bernoulli variables with \(q_i = 0.1\) and the respective Poisson approximation with \(\lambda = -50\log (0.9)\approx 5\). Figure (b) depicts the left-hand side of (4.7) and the corresponding upper bound (right-hand side of (4.7)) from Theorem 4.3 for increasing m and \(q_i := 5/m\)

A classical example of this law is the situation where \(q_i \equiv q_m\) for all \(1 \le i \le m\) and \(q_m \cdot m\) converges to some \(\lambda >0\), i.e., \(q_m\sim 1/m\). In this case we may use \(\log \left( 1-x\right) \approx -x\) for small x to obtain \(\lambda _m \approx m q_m \rightarrow \lambda \) and \(2 \sum _{i=1}^m \left( \log \left( 1-q_i\right) \right) ^2 \approx 2 \sum _{i=1}^m q_i^2= 2 m q_m^2 \sim 1/m \rightarrow 0\) as \(m\rightarrow \infty \), i.e., the Binomial distribution of X converges rapidly to the Poisson distribution with parameter \(\lambda \).

On the other hand, if the success probabilities \(q_i \equiv q \in \left( 0,1\right) \) are fixed, the right-hand side of (4.7) diverges. This seems intuitive, as in this situation convergence towards a normal distribution has to be expected (cf. Sect. 4.4.1 below). This is in line with the observation that a Poisson distribution with growing parameter \(\lambda _m = -m \log \left( 1-q\right) \) converges towards a normal distribution (cf. Sect. 4.4.2 below).

Let us now compare the two Poisson laws arising from Theorem 4.3 and from (4.2). According to (4.4), our observations are Bernoulli random variables with success probability

$$ p_{\mathbf {x},i} \approx \int \limits _{I_i} \int \limits _{B_{\mathbf {x}}}\lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau , $$

where we used that the probability to observe more than one photon is close to 0. Hence, if we denote the largest time in \(I_m\) by \(t_m\) and again use \(\log \left( 1-x\right) \approx -x\), then the number of photons observed until time \(t_m\) is approximately Poisson distributed with parameter \(\lambda _{\mathbf {x},m} = p_{\mathbf {x},1} + \cdots + p_{\mathbf {x},m} \approx \int _0^{t_m} \int _{B_{\mathbf {x}}}\lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \) (ignoring the waiting times). This is in good agreement with (4.2). According to (4.7), the error of this approximation is bounded by

$$ 2 \sum \limits _{i=1}^m \left( \log \left( 1-p_{\mathbf {x},i}\right) \right) ^2 \le 2 \lambda _{\mathbf {x},m} \max \limits _{1 \le i \le m} p_{\mathbf {x},i} \le C \max \limits _{1 \le i \le m} \left| I_i\right| , $$

showing that the approximation is valid whenever the temporal discretization is sufficiently fine.

4 Gaussian Modeling

4.1 As Approximation of the Binomial Model

Besides the approximation by a Poisson distribution, it is well-known that a Binomial model can, under suitable circumstances, also be approximated by a Gaussian one. Let us start with the Bernoulli model (4.4) and suppose that all \(Y_{\mathbf {x},i}\) are independent with \(p_{\mathbf {x},i} \equiv p_{\mathbf {x}}\). If we are interested in the total number of counts \(Y_{\mathbf {x}}:= \sum _{i=1}^n Y_{\mathbf {x},i}\) in the bin centered at \(\mathbf {x}\), the de Moivre-Laplace theorem states that

$$\begin{aligned} \frac{Y_{\mathbf {x}} - n p_{\mathbf {x}}}{\sqrt{np_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) }} \longrightarrow Z\quad \text {as}\quad n \rightarrow \infty \end{aligned}$$
(4.8)

in distribution, where \(Z \sim \mathcal N \left( 0,1\right) \) follows a standard normal distribution. Note that \(\frac{Y_{\mathbf {x}} - n p_{\mathbf {x}}}{\sqrt{np_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) }}\) is just the centered and standardized version of the total number of counts \(Y_{\mathbf {x}}\). This implies that the distribution of \(Y_{\mathbf {x}}\) can be approximated by a Gaussian distribution with mean \(n p_{\mathbf {x}}\) and variance \(np_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) \) if n is sufficiently large. This gives rise to a first Gaussian model:

Gaussian model I

For each \(\mathbf {x}\in \varXi \), the number of photons observed in the bin centered at \(\mathbf {x}\) up to time T is

$$\begin{aligned} Y_{\mathbf {x}} \sim \mathcal N \left( n p_{\mathbf {x}}, np_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) \right) \end{aligned}$$
(4.9)

where \(n = n(T) \sim T/\delta \) and \(\delta \) is the length of the individual time frames.

The rate of convergence in (4.8) can be made more precise. For instance, a special case of the Berry-Esseen theorem states that

$$\begin{aligned} \sup \limits _{y \in \mathbb R} \left| \mathbb {P}\left[ \frac{Y_{\mathbf {x}} - n p_{\mathbf {x}}}{\sqrt{np_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) }} \le y\right] - \varPhi (y) \right| < \frac{\sqrt{10} + 3}{6 \sqrt{2\pi }} \frac{p_{\mathbf {x}}^2 + \left( 1-p_{\mathbf {x}}\right) ^2}{\sqrt{n p_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) }} \end{aligned}$$
(4.10)

where \(\varPhi \) denotes the distribution function of \(\mathcal N \left( 0, 1\right) \), i.e.,

$$\begin{aligned} \varPhi (y) = \frac{1}{\sqrt{2\pi }}\int _{-\infty }^y \exp \left( -\frac{x^2}{2}\right) \,\mathrm {d}x. \end{aligned}$$

In fact, the constant on the right-hand side of (4.10) cannot be improved [22]. An interpretation of this theorem is that the approximation leading to the model (4.9) is reasonable as soon as \(n p_{\mathbf {x}}\left( 1-p_{\mathbf {x}}\right) >9\), since then the right-hand side of (4.10) is bounded by \(\frac{\sqrt{10} + 3}{18 \sqrt{2\pi }} \approx 0.137\).
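Both sides of (4.10) can be computed exactly as well. The sketch below (with the hypothetical choice \(p_{\mathbf {x}} = 0.1\)) evaluates the Kolmogorov distance at the jump points of the Binomial distribution function and illustrates the \(n^{-1/2}\) decay.

```python
import numpy as np
from scipy import stats

def kolmogorov_vs_bound(n, p):
    """Kolmogorov distance in (4.8) versus the Berry-Esseen bound (4.10)."""
    k = np.arange(0, n + 1)
    y = (k - n * p) / np.sqrt(n * p * (1 - p))   # standardized jump points
    cdf, pmf = stats.binom.cdf(k, n, p), stats.binom.pmf(k, n, p)
    # The supremum over y is attained at the jumps of the Binomial cdf,
    # comparing Phi against the cdf from the right and from the left.
    gap = np.maximum(np.abs(cdf - stats.norm.cdf(y)),
                     np.abs(cdf - pmf - stats.norm.cdf(y))).max()
    bound = ((np.sqrt(10) + 3) / (6 * np.sqrt(2 * np.pi))
             * (p ** 2 + (1 - p) ** 2) / np.sqrt(n * p * (1 - p)))
    return gap, bound

for n in (10, 100, 1000):
    gap, bound = kolmogorov_vs_bound(n, p=0.1)
    print(f"n = {n:4d}: distance = {gap:.4f} <= bound = {bound:.4f}")
```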

If the success probabilities \(p_{\mathbf {x}, i}\) vary with i, the de Moivre-Laplace theorem (4.8) cannot be applied immediately. However, under certain conditions it is still possible to derive an approximate Gaussian model of the form (4.9) by applying the Lindeberg central limit theorem (see e.g. [23]). It states that the sum \(Y_{\mathbf {x}}\), after centering and standardization, still converges to \(\mathcal N \left( 0,1\right) \) in distribution even for non-identically distributed \(Y_{\mathbf {x}, i}\). This motivates a second Gaussian model:

Gaussian model II

For each \(\mathbf {x}\in \varXi \), the number of photons observed in the bin centered at \(\mathbf {x}\) up to time T is

$$\begin{aligned} Y_{\mathbf {x}} \sim \mathcal N \left( \sum \limits _{i=1}^n p_{\mathbf {x},i}, \sum \limits _{i=1}^n p_{\mathbf {x},i}\left( 1-p_{\mathbf {x},i}\right) \right) . \end{aligned}$$
(4.11)

Note that if the random variables \(Y_{\mathbf {x},i}\) are dependent, the type of dependency very much determines whether a central limit theorem is still valid (possibly with a different limiting variance), see e.g. [24] or [25,26,27] for mixing sequences and [28] for martingale difference sequences, to mention two large classes of examples.

4.2 As Approximation of the Poisson Model

The Poisson model (4.2) can also be approximated by a Gaussian one. This relies on the fact that the Poisson distribution is infinitely divisible, which means that whenever \(X \sim \mathcal P \left( \mu \right) \), then X can be represented as \(X = X_1 + \cdots + X_n\) for any \(n \in \mathbb N\) with i.i.d. random variables \(X_1, \ldots , X_n \sim \mathcal P \left( \mu /n\right) \). Consequently, the central limit theorem yields

$$\begin{aligned} \frac{X-\mu }{\sqrt{\mu }} \longrightarrow Z,\quad \text {as}\quad \mu \rightarrow \infty \end{aligned}$$

in distribution, where \(Z \sim \mathcal N \left( 0,1\right) \). The general Berry-Esseen theorem can also be used to bound the error of the approximation of \(\frac{X-\mu }{\sqrt{\mu }}\) by Z; namely, one obtains (see also [29])

$$\begin{aligned} \sup \limits _{y \in \mathbb R} \left| \mathbb {P}\left[ \frac{X-\mu }{\sqrt{\mu }} \le y\right] - \varPhi (y) \right| < \frac{5}{2} \frac{1}{\sqrt{\mu }}. \end{aligned}$$
(4.12)

Hence, if \(\mu \) is sufficiently large, the distribution of X can be approximated by a Gaussian distribution with mean and variance \(\mu \). If we suppose that \(Y_{\mathbf {x}, t}\) satisfies (4.2) and that \(\int _0^t \int _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \rightarrow \infty \) as \(t \rightarrow \infty \), then the above reasoning gives rise to another Gaussian model:

Gaussian model III

For each \(\mathbf {x}\in \varXi \), the number of photons observed in the bin centered at \(\mathbf {x}\) up to time t is

$$\begin{aligned} Y_{\mathbf {x}, t} \sim \mathcal N \left( \int \limits _0^t \int \limits _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau ,\int \limits _0^t \int \limits _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \right) . \end{aligned}$$
(4.13)
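As in the Binomial case, the quality of this normal approximation can be quantified exactly; the following sketch compares the Kolmogorov distance between the standardized Poisson law and \(\mathcal N \left( 0,1\right) \) with the bound (4.12) for a few illustrative values of \(\mu \).

```python
import numpy as np
from scipy import stats

def poisson_normal_gap(mu):
    """Kolmogorov distance between the standardized P(mu) and N(0,1), cf. (4.12)."""
    k = np.arange(0, int(mu + 20 * np.sqrt(mu) + 20))   # tail beyond is negligible
    y = (k - mu) / np.sqrt(mu)
    cdf, pmf = stats.poisson.cdf(k, mu), stats.poisson.pmf(k, mu)
    # Supremum attained at the jump points, from the right and from the left.
    return np.maximum(np.abs(cdf - stats.norm.cdf(y)),
                      np.abs(cdf - pmf - stats.norm.cdf(y))).max()

for mu in (1, 10, 100, 1000):
    print(f"mu = {mu:5d}: distance = {poisson_normal_gap(mu):.4f}"
          f" <= 5/(2*sqrt(mu)) = {2.5 / np.sqrt(mu):.4f}")
```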

4.3 Comparison

Let us briefly compare the Gaussian models I-III in (4.9), (4.11) and (4.13), respectively. It is clear that (4.11) generalizes (4.9) to the case of non-identical success probabilities \(p_{\mathbf {x},i}\), and both coincide if \(p_{\mathbf {x},i}\) is independent of i. To compare (4.11) with (4.13), we recall our previous computation that \(p_{\mathbf {x},1} + \cdots + p_{\mathbf {x},n} \approx \int _0^{t_n} \int _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \), where \(t_n\) is the largest time in the sub-interval \(I_n\). Consequently, (4.11) and (4.13) agree in the mean and differ in the variance only by the factors \(1-p_{\mathbf {x},i}\); since the \(p_{\mathbf {x},i}\) are small, this difference is negligible. Hence, all three Gaussian models are in good agreement, and the simplest of them, (4.13), should be preferred.

4.4 Thinning

Taking into account the detection efficiency \(\eta \in \left[ 0,1\right] \) as discussed before, we arrive at models similar to (4.9), (4.11) and (4.13), with the only difference being that \(p_{\mathbf {x}}\), \(p_{\mathbf {x},i}\) or \(\lambda \) is multiplied by \(\eta \). In this sense, the canonical thinning of the Poisson and Binomial models carries over to the Gaussian ones.

4.5 Variance Stabilization

Note that the variance in the Gaussian models I-III is always inhomogeneous, which hinders data analysis with standard methods and causes further difficulties. This can be overcome by variance stabilization. The most popular choice is the celebrated Anscombe transform, which is applied to the Poisson model (4.2) to obtain, asymptotically, a normal distribution with variance 1. It is based on the following result (see e.g. [30, Lemma 1]):

Lemma 4.1

(Anscombe’s transform) Let \(\mu >0\) and \(Y \sim \mathcal P \left( \mu \right) \) be a Poisson distributed random variable. Then it holds for all \(c \ge 0\) that

$$\begin{aligned} \mathbb {E}\left[ 2 \sqrt{Y + c}\right]&= 2 \sqrt{\mu } + \frac{4c-1}{4 \sqrt{\mu }} + \mathcal O \left( \frac{1}{\mu ^{\frac{3}{2}}}\right) ,\\ \mathbb {V}\left[ 2 \sqrt{Y + c}\right]&= 1 + \frac{3-8c}{8 \mu } + \mathcal O \left( \frac{1}{\mu ^2}\right) . \end{aligned}$$

From this we can conclude that the choice \(c = 3/8\) ensures that the variance of \(2 \sqrt{Y + c}\) no longer depends on the parameter \(\mu \) up to second order. To reduce the bias, \(c = 1/4\) is the best choice. Furthermore, applying this result with \(c = 3/8\) to the Poisson model (4.2) gives rise to a fourth Gaussian model:

Gaussian model IV

For each \(\mathbf {x}\in \varXi \), denote the number of photons observed in the bin centered at \(\mathbf {x}\) up to time t by \(Y_{\mathbf {x}, t}\). Then we assume

$$\begin{aligned} 2\sqrt{Y_{\mathbf {x}, t} + \frac{3}{8}} \sim \mathcal N \left( 2 \left( \int \limits _0^t \int \limits _{B_{\mathbf {x}}} \lambda \left( \mathbf {y}, \tau \right) \,\mathrm d \mathbf {y}\,\mathrm d \tau \right) ^{1/2},1\right) \end{aligned}$$
(4.14)

for each \(\mathbf {x}\in \varXi \).

We emphasize the importance of the model (4.14) in statistics, as it turns out to be equivalent in a strict sense to the previously discussed Poisson model (4.2) as the total number of photons (and hence the parameter t) tends to \(\infty \) (see e.g. [31,32,33]).
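The stabilizing effect of the Anscombe transform underlying (4.14) can be observed directly; the following sketch compares the raw and transformed sample variances for a few illustrative values of \(\mu \), standing in for the integrated intensity in (4.2).

```python
import numpy as np

rng = np.random.default_rng(5)
for mu in (2.0, 5.0, 20.0, 100.0):    # illustrative integrated intensities
    Y = rng.poisson(mu, size=1_000_000)
    var_raw = Y.var()                                # grows linearly with mu ...
    var_stab = (2.0 * np.sqrt(Y + 3.0 / 8.0)).var()  # ... stabilized to roughly 1
    print(f"mu = {mu:6.1f}: Var[Y] = {var_raw:7.2f}, "
          f"Var[2 sqrt(Y + 3/8)] = {var_stab:.3f}")
```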

5 Conclusion

In this chapter we introduced models for photonic imaging setups with different degrees of accuracy. The most common and basic Poisson model (4.2) is accurate as soon as the temporal dependency can be neglected and the detector has no significant dead time. If, furthermore, the number of observed photons is sufficiently large in each bin, then the Gaussian model (4.13) can be used. In case of significant temporal dependency, the Bernoulli model (4.4) with time-resolved individual photon arrivals or the resulting Binomial model (4.6) should be considered instead.

An overview of appropriate model choices for the various imaging techniques discussed previously is provided in Fig. 4.4.

Fig. 4.4

Overview of viable model choices for different imaging methods. Projects of the SFB 755 associated with the respective methods are marked in gray

In fluorescence microscopy, STED-based methods, which scan the sample pixelwise, record about 10–100 photons per fluorescent marker. Due to low temporal dependencies, we are thus in the scope of the Binomial or Poisson models [3]. Even though a Gaussian approximation seems questionable, as in regions of low intensity only a few photons per bin can be collected, it has been applied successfully employing variance-stabilizing techniques [34]. For the analysis of STORM/PALM data, the full range of modeling approaches is applied. Individual frames contain spots with single or several photons and weak temporal dependency, calling for Bernoulli, Binomial, or Poisson models, while Gaussian approximations are used successfully for drift and rotational corrections [17]. FRET/MIET-based imaging heavily relies on the interactions of fluorescent markers, so that the assumption of temporal independence is violated. This makes the Bernoulli model the model of choice; if more photons are counted, the Binomial model can also be applied [4, 5].

Another example within the scope of the Bernoulli model is the 3-photon correlation technique (see e.g. Chap. 16), where molecular structures are probed by femtosecond X-ray pulses. This leads to a large number of images, each consisting of only a few photons, out of which only triples are used. Inference based on this sequence of images is additionally complicated by rotations of the single target molecules [1].

X-ray diffraction imaging also allows for a whole range of models. At first glance it seems that a Gaussian model is sufficient, as millions of photons are collected in total. However, depending on the specific setup, the photon intensity \(\lambda \) may vary strongly over the detection region. If imaging is performed in the near-field regime, as e.g. in many X-ray microscopy setups, the number of photons in the low-intensity regions is only about one order of magnitude lower than in the high-intensity regions, allowing for a Gaussian model. In contrast, in far-field methods \(10^4\) photons can be collected in high-intensity bins, while only a handful of photons arrives in low-intensity regions, making a Binomial and/or Poisson model more suitable [12].