1 Introduction

From the Lumigraph (Lippmann 1911) to commercial plenoptic cameras (Ng et al. 2005; Perwaß et al. 2012), several designs have been proposed to capture information that cannot be captured by conventional cameras. Conventional cameras capture only one point of view of a scene, whereas a plenoptic camera retrieves spatial as well as angular information: the same scene point is projected into multiple observations on the sensor. This redundant information can be used, for instance, for digital refocusing and rendering (Bishop and Favaro 2012) or for depth estimation (Johannsen et al. 2017).

This paper focuses on plenoptic cameras based on a micro-lenses array (MLA) placed between a main lens and a sensor, as illustrated in Fig. 3. The specific design of such a camera makes it possible to multiplex both types of information onto the sensor in the form of a micro-images array (MIA), as shown in Fig. 1, but implies a trade-off between the angular and spatial resolutions (Georgiev et al. 2006; Levin et al. 2008; Georgiev and Lumsdaine 2009). This trade-off is balanced according to the MLA position with respect to the main lens focal plane and the sensor plane, corresponding to the unfocused (Ng et al. 2005) or focused (Perwaß et al. 2012; Georgiev and Lumsdaine 2012) configurations.

To further extend the depth of field (DoF) of the plenoptic camera, a multi-focus configuration has been proposed by Perwaß et al. (2012); Georgiev and Lumsdaine (2012). In this setup, the MLA is composed of several micro-lenses with different focal lengths. The same part of a scene will be more or less focused depending on the micro-lens type. Usually, only the micro-images with the smallest amount of blur are used. Alternatively, specific patterns are used to exploit the information (Palmieri and Koch 2017). If one were able to relate the camera parameters to the amount of blur in the image, all information could be used simultaneously, without distinction between types of micro-lenses. As a first step in that direction, we propose a calibration method that takes advantage of blur information.

Fig. 1
figure 1

The Raytrix R12 multi-focus plenoptic camera used in our experimental setup (a), along with a raw image of a checkerboard calibration target (b). The image is composed of several micro-images with different amounts of blur, arranged in a hexagonal grid. In each micro-image, our Blur Aware Plenoptic (BAP) feature is illustrated by its center and its blur circle (c).

Calibration is an initial step for applications using plenoptic imaging. Conventional cameras are usually modeled as a pinhole or a thin lens. Due to the complexity of plenoptic camera designs, the developed models are generally high-dimensional. Specific calibration methods have to be proposed to retrieve the intrinsic parameters of these models.

1.1 Related Work

Unfocused plenoptic camera calibration: In the unfocused configuration, the main lens is focused on the MLA plane and the sensor plane is placed at the MLA focal plane. The MLA is therefore focused at infinity, hence the name unfocused. The calibration of such plenoptic cameras (Ng et al. 2005) has been widely studied in the literature. Most approaches rely on a thin-lens model for the main lens and an array of pinholes for the micro-lenses. Dansereau et al. (2013) introduced a model to decode the pixels into rays, drawing inspiration from Grossberg et al. (2005), for the Lytro plenoptic camera (Ng et al. 2005). Their model is not directly associated with physical parameters and is based on corner detection in reconstructed sub-aperture images (SAIs). Zhou et al. (2019) proposed a practical two-step calibration method for unfocused plenoptic cameras. Their model describes the camera's physical parameters but still requires feature points extracted from reconstructed SAIs. Yunsu et al. (2014) formulated a geometric projection model to estimate intrinsic and extrinsic parameters using raw images directly, avoiding errors introduced by reconstruction steps. Their method includes an analytical solution and a non-linear optimization of the reprojection error of a novel line feature, designed to overcome the difficulty of finding checkerboard corners. Shi et al. (2016) proposed a detailed model of a plenoptic camera in the context of particle image velocimetry (PIV). Based on linear optics, they derived a ray-tracing model: contrary to previous methods, they modeled the main lens and each micro-lens as thin-lenses. Hahne et al. (2018) developed a ray model by ray-tracing from the sensor side to the object space. They consider only the chief ray, connecting micro-image centers (MICs) to the exit pupil center. O'Brien et al. (2018) introduced a projection model and a calibration method suited to both unfocused and focused plenoptic cameras. They present a new feature called the plenoptic disc, similar in nature to the circle of confusion (CoC) and defined by its center and its radius. Their feature parametrization is three-dimensional and in one-to-one correspondence with point positions in the camera frame, as it is detected in reconstructed images. Zhao et al. (2020) recently presented a metric calibration method for the unfocused plenoptic camera only, also based on the plenoptic disc but operating directly on raw images.

In summary, most of the above methods require reconstructed images (SAIs) to extract features, and limit their models to the unfocused configuration, i.e., setting the sensor plane at the micro-lens focal plane. Therefore, those models cannot be directly extended to the focused or multi-focus plenoptic camera.

Focused plenoptic camera calibration: With the arrival of commercial focused plenoptic cameras (Lumsdaine and Georgiev 2009; Perwaß et al. 2012), new calibration methods have been proposed. In this configuration, the micro-lenses focus on an intermediate image plane. Johannsen et al. (2013) formulated a general reprojection model in terms of the physical parameters of a Raytrix camera (Perwaß et al. 2012). They proposed a metric calibration and distortion correction using a grid of circular patterns. This work considered a relatively simple model of lens distortion and required careful initialization of the optimization to converge, due to its high sensitivity to local minima. Heinze et al. (2016) improved the previous model by considering more sophisticated models of the main lens distortions. They introduced new parameters, including the tilt and shift of the main lens. They are able to distinguish each micro-lens type, calibrating the distance between the MLA and the sensor for each type, but in separate calibration processes. The projection model and the metric calibration procedure are incorporated in the RxLive software of Raytrix GmbH. Strobl and Lingenauber (2016) presented a step-wise calibration approach to overcome the fragile initialization which hinders the final optimization. They first determined the main lens parameters, then estimated the MLA parameters. However, their calibration framework relied on reconstructed total focus images. Zeller et al. (2014) introduced two new methods to calibrate a focused plenoptic camera and the depth images obtained from it. In further work (Zeller et al. 2016), they improved the camera projection model by modeling the main lens as a thin lens instead of a pinhole. The calibration process uses the reconstructed total focus image and virtual depth map to compute 3D observations.

All previous methods rely on reconstructed images (SAIs), which can introduce errors both in the reconstruction step and in the calibration process. Usually, computing reconstructed images requires camera parameters and/or depth information to avoid artifacts and reconstruction errors. To overcome this chicken-and-egg problem, several calibration methods use only raw plenoptic images. Zhang et al. (2016) proposed a calibration method based directly on observations from raw images. They used a parallel bi-planar checkerboard to obtain a depth-scale prior. They considered a detailed model of the MLA geometry that accounts for the non-planarity of the array. Zhang et al. (2018) presented a multi-projection-center model based on the two-plane parametrization (Levoy and Hanrahan 1996). They derived a calibration algorithm based on this model and projective transformations, suitable for both unfocused and focused plenoptic cameras. Noury et al. (2017) presented a more complete geometrical model than the previous works. This model relates 3D points to their corresponding image projections, working directly with raw images. They developed a new detector to find checkerboard corners with sub-pixel accuracy in each micro-image. They introduced a new cost function based on reprojection errors of both checkerboard corners and micro-lens centers in raw image space. This enforces projected micro-lens centers to get closer to their corresponding MICs, and makes their method robust to wrong parameter initialization, especially concerning the MLA parameters. However, their method does not consider different types of micro-lenses and forces them to act as pinholes.

Several methods can account for the multi-focus setting. Bok et al. (2017) extended their previous model (Yunsu et al. 2014) to work with the focused plenoptic camera. They did not explicitly model the micro-lens focal lengths but introduced two additional intrinsic parameters that account for the MLA setting. Each setting, one per micro-lens type, models a different distance between the MLA and the sensor. Their method can retrieve different intrinsics by running the optimization for each type separately. Nousias et al. (2017) considered the geometric calibration of multi-focus plenoptic cameras. Their method identifies the micro-lens types and their spatial arrangement. It operates on checkerboard corners retrieved by a custom micro-image corner detector. They then applied their method to each type of micro-lens independently to retrieve specific intrinsic and extrinsic parameters for each configuration. These later works (Bok et al. 2017; Nousias et al. 2017; Noury et al. 2017) have achieved improved performance through automation and accurate identification of feature correspondences in raw images. More recently, Wang et al. (2018) proposed a geometric calibration method for focused plenoptic cameras based on virtual image points, establishing the mapping from object points behind the main lens and the MLA to image points on the sensor. Their method can be extended to calibrate multi-focus cameras by considering each type of micro-lens individually.

In conclusion, most of these methods rely on simplified models of the optical elements: the MLA misalignment is not considered, and the micro-lenses are modeled as pinholes, thus ignoring their apertures. Some do not consider distortions of the main lens or restrict themselves to the focused case. Finally, a few have considered the multi-focus case (Heinze et al. 2016; Bok et al. 2017; Nousias et al. 2017; Wang et al. 2018) but dealt with it in separate processes, leading to intrinsic and extrinsic parameters that vary depending on the type of micro-lens.

1.2 Contributions

We present a new calibration method for plenoptic cameras. To the best of our knowledge, it is the first to calibrate the multi-focus plenoptic camera within a single process taking into account all types of micro-lenses simultaneously. To exploit all available information, we propose to explicitly include the defocus blur in a new camera model. Thus, we introduce a new Blur Aware Plenoptic (BAP) feature, defined in raw image space, that enables us to handle the multi-focus case. We present a new pre-calibration step using BAP features from white images to provide a robust initial estimate of the camera parameters. We use our BAP features in a single optimization process that retrieves the intrinsic and extrinsic parameters of a multi-focus plenoptic camera directly from raw plenoptic images of a checkerboard target.

Fig. 2
figure 2

Overview of our proposed method: first, the pre-calibration step retrieves initial camera parameters from white raw images taken at different apertures; then, BAP features are detected and used both in the camera calibration process and in the relative blur calibration; finally, once the camera is calibrated, it can be used, as addressed here, for profiling the camera, i.e., to characterize its working range. Other applications can be considered, such as metric depth estimation.

Fig. 3
figure 3

Focused plenoptic camera model in Galilean configuration with the notations used in this paper. Object points are projected by the main lens behind the MLA into a virtual intermediate space, and then re-imaged by each micro-lens onto the sensor.

This paper extends our previous work (Labussière et al. 2020). In addition to our former contributions, we present here an ablation study of the camera parameters and add further comparisons with state-of-the-art calibration methods. A new camera setup has also been tested to validate the generalization of our method, and a simulation setup is proposed to evaluate our method on a Lytro-like configuration. Moreover, we take advantage of our BAP features to develop a new relative blur calibration process that links the geometric blur to the physical blur, i.e., the circle of confusion (CoC) to the point-spread function (PSF). This enables us to fully take advantage of blur in image space. Finally, we propose to use the blur to profile the plenoptic camera in terms of depth of field (DoF).

1.3 Paper Organization

An overview of our method is given in Fig. 2. The remainder of this paper is organized as follows. First, we present the camera model and how we model blur with our BAP feature in Sect. 2. Second, we explain in Sect. 3 how we leverage raw white images in the proposed pre-calibration step to initialize camera parameters. Then, we detail the feature detection in Sect. 4 and the calibration processes in Sect. 5, i.e., the camera calibration and the relative blur calibration. Our experimental setup is presented in Sect. 6. Finally, our results are given and discussed in Sect. 7. The notations used in this paper are shown in Fig. 3. Pixel counterparts of metric values are denoted in lower-case Greek letters. Bold font denotes vectors and matrices.

2 Camera and Blur Models

2.1 The (Multi-Focus) Plenoptic Camera

We consider the focused plenoptic camera, especially the multi-focus case as described by Georgiev and Lumsdaine (2012); Perwaß et al. (2012). The camera is composed of a main lens and a photosensitive sensor, with a micro-lenses array (MLA) in between, as illustrated in Fig. 3. The multi-focus configuration implies that the MLA consists of \(I\) different types of lenses. The setup corresponds to the multi-focus system described by Perwaß et al. (2012) with \(I=3\). Note that our model applies to the single-focus plenoptic camera as well, corresponding to the case \(I= 1\). Finally, the unfocused configuration is a special case of our model where the micro-lens focal length is equal to the distance between the MLA and the sensor, i.e., \(f= d\).

2.1.1 Main Lens

The main lens is modeled as a thin-lens and maps an object point to a virtual point in an intermediate space called the virtual space. An object at distance \(a\) is then projected at a distance \(b\) given the focal length \(F\) according to the thin-lens equation

$$\begin{aligned} \frac{1}{F} = \frac{1}{a} + \frac{1}{b}\text {.}\end{aligned}$$
(1)

The main lens principal point is expressed as \(\begin{bmatrix}u_0&v_0\end{bmatrix}^\top \) in image space. We model the main lens as parallel to the sensor plane. Deviations from this hypothesis will be compensated for by tangential distortion parameters. Furthermore, we define our camera reference frame as the main lens frame, with \(\varvec{O}\) being the origin, the z-axis coinciding with the optical axis and pointing outside the camera, and the y-axis pointing downwards. Distances are signed according to the following convention: \(F\) is positive when the lens is convergent; distances are positive when the point is real, and negative when virtual.
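As a minimal illustration of eq. 1, the following sketch (in Python, with illustrative values only) computes the image distance \(b\) of an object at distance \(a\):

```python
def thin_lens_image_distance(a: float, F: float) -> float:
    """Distance b at which an object at distance a is imaged by a thin lens of
    focal length F, following 1/F = 1/a + 1/b (eq. 1)."""
    return 1.0 / (1.0 / F - 1.0 / a)

# Example: with F = 50 mm, an object at a = 1000 mm is imaged at b ~ 52.6 mm.
b = thin_lens_image_distance(a=1000.0, F=50.0)
```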

2.1.2 Distortions

We consider distortions of the main lens. Distortions represent deviations from the theoretical thin lens projection model. To correct those errors, we model the radial and tangential components of the lateral distortions using the model of Brown-Conrady (Brown 1966; Conrady 1919).

Depth distortions have also been studied by Heinze et al. (2016); Zeller et al. (2016), but Zeller et al. (2017); Noury (2019) both empirically observed that the effects of depth distortions, for large focal lengths and large object distances, can be neglected compared to the stochastic noise of the depth estimation process. Therefore, we do not include depth distortion in our model.

A distorted point \(\varvec{p}= \begin{bmatrix}x&y&z&1\end{bmatrix}^\top \) expressed in the main lens frame after projection (i.e., in the virtual intermediate space) is thus transformed into \(\mathbf {p}_u = \phi \!\left( \mathbf {p}\right) = \begin{bmatrix}x_u&y_u&z&1\end{bmatrix}^\top \) and is computed as

$$\begin{aligned} \left\{ \begin{aligned} x_u=&~ x \left( 1 + Q_1 \varsigma ^2 + Q_2 \varsigma ^4 + Q_3 \varsigma ^6 \right)&\text {[radial]}\\&+ P_1 \left( \varsigma ^2 + 2 x{}^2\right) + 2 P_2 x y&\text {[tangential]}\\ y_u=&~ y \left( 1 + Q_1 \varsigma ^2 + Q_2 \varsigma ^4 + Q_3 \varsigma ^6\right)&\text {[radial]}\\&+ P_2 \left( \varsigma ^2 + 2 y{}^2\right) + 2 P_1 x y&\text {[tangential]}\\ \end{aligned} \right. \end{aligned}$$
(2)

where \(\varsigma ^2 = {x^2+y^2}\). The three coefficients for the radial component are given by \(\left\{ Q_1, Q_2, Q_3\right\} \), and the two coefficients for the tangential by \(\left\{ P_1, P_2\right\} \).
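For illustration, eq. 2 can be implemented as the following sketch; the function name is a placeholder and not part of our implementation:

```python
def distort_lateral(x, y, Q, P):
    """Apply the radial (Q1, Q2, Q3) and tangential (P1, P2) lateral
    distortions of eq. 2 to a point (x, y) expressed in the main lens frame;
    the z coordinate is left unchanged."""
    Q1, Q2, Q3 = Q
    P1, P2 = P
    s2 = x * x + y * y  # varsigma^2
    radial = 1.0 + Q1 * s2 + Q2 * s2**2 + Q3 * s2**3
    x_u = x * radial + P1 * (s2 + 2.0 * x * x) + 2.0 * P2 * x * y
    y_u = y * radial + P2 * (s2 + 2.0 * y * y) + 2.0 * P1 * x * y
    return x_u, y_u
```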

2.1.3 Micro-Lenses Array

We also model the micro-lenses as thin-lenses, which allows taking blur in the micro-images into account. The MLA then consists of \(I\) different lens types with focal lengths \(f{}^{\left( i\right) }\), where \(i \in [ 1\mathrel {{.}\,{.}}I]\), which are focused on \(I\) different planes.

We make the hypothesis that all micro-lenses lie on the same plane. The MLA is approximately centered around the optical axis. We define the farthest micro-lens along the \((-x)\)-axis and the \((-y)\)-axis as the origin of the MLA frame, i.e., the center of the upper-left micro-lens. The coordinate axes are oriented the same way as those of the main lens. The structural organization of the lenses can be an orthogonal or hexagonal arrangement. The MLA origin is at a distance \(D\) from the main lens and at a distance \(d\) from the sensor. Theoretically, the distances \(\mathrm {d}\!\left( k,l\right) \) and \(\mathrm {D}\!\left( k,l\right) \) are different for each micro-lens \(\left( k,l\right) \) due to the MLA tilt. However, using the approximation of a constant distance \(d\) provides adequate results thanks to the small magnitude of the MLA rotation angles, as shown in Sect. 7, while allowing a simpler derivation of the equations.

Furthermore, a detected micro-image center (MIC) usually does not coincide with the optical center of the considered micro-lens. We take this deviation into account, in contrast to the orthographic projection of MICs, which causes inaccuracies in the decoded light field. Therefore, the principal point \(\varvec{c}{}^{k,l}_0\) of the micro-lens indexed by \(\left( k,l\right) \) is given by

$$\begin{aligned} \varvec{c}{}^{k,l}_0= \begin{bmatrix}u_0^{k,l}\\ v_0^{k,l}\end{bmatrix}= \frac{d}{D+d}\left( \begin{bmatrix}u_0\\ v_0\end{bmatrix}- \varvec{c}_{k,l}\right) + \varvec{c}_{k,l}, \end{aligned}$$
(3)

where \(\varvec{c}_{k,l}\) is the center of the micro-image \(\left( k,l\right) \) expressed in pixels, as illustrated in Fig. 3.

2.1.4 Micro-Images Array

Finally, each micro-lens produces a micro-image (MI) onto the sensor. The set of these micro-images has the same structural organization as the MLA. The data can therefore be interpreted as an array of micro-images, called by analogy the micro-images array (MIA). The MIA coordinates are expressed in image space. Let \(\delta _i\) be the pixel distance between two arbitrary consecutive micro-image centers \(\varvec{c}_{k,l}\). With \(s\) the metric size of a pixel, let \(\varDelta _i= s\delta _i\) be its metric value, and \(\varDelta _\mu \) be the metric distance between the two corresponding micro-lens centers \(\varvec{C}_{k,l}\). From similar triangles, the ratio \(\lambda \) between them is given by

$$\begin{aligned} \lambda \triangleq \frac{D}{d+D} = \frac{\varDelta _\mu }{\varDelta _i} \Longleftrightarrow \varDelta _\mu = \lambda \varDelta _i= \frac{D}{d+D} \cdot \varDelta _i\text {.}\end{aligned}$$
(4)

We make the hypothesis that \(\varDelta _\mu \) is equal to the micro-lens aperture.
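The following sketch illustrates eqs. 3 and 4 under the constant-distance approximation discussed above; the numerical values are placeholders, not calibrated parameters:

```python
import numpy as np

def mla_ratio(D: float, d: float) -> float:
    """Ratio lambda = D / (d + D) between micro-lens and micro-image pitches (eq. 4)."""
    return D / (d + D)

def micro_lens_principal_point(c_kl, c0_main, D, d):
    """Principal point of micro-lens (k, l) from its micro-image center c_kl and
    the main lens principal point c0_main = (u0, v0), following eq. 3."""
    c_kl = np.asarray(c_kl, dtype=float)
    c0_main = np.asarray(c0_main, dtype=float)
    return d / (D + d) * (c0_main - c_kl) + c_kl

# Example with placeholder values (not calibrated parameters):
D, d, s = 60.0, 0.35, 0.0055            # mm
delta_i = 23.0                           # micro-image pitch, in pixels
Delta_i = s * delta_i                    # metric micro-image pitch
Delta_mu = mla_ratio(D, d) * Delta_i     # metric micro-lens pitch (eq. 4)
```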

2.1.5 Camera Configuration

When the camera is in the unfocused configuration, the distance separating the sensor and the MLA is equal to the focal length of the micro-lenses, i.e., \(d= f\). For the focused plenoptic camera, we usually consider two possible configurations as presented by Georgiev and Lumsdaine (2009): 1) Galilean, when objects are projected behind the image sensor; and 2) Keplerian, when objects are projected in front of the image sensor. When considering micro-lenses as thin-lenses, we have to take their focal lengths into account to configure the camera. In practice, considering an object projected at distance \(b\) by the main lens, four cases are possible but only two are able to produce an exploitable image, i.e., with an acceptable amount of blur, onto the sensor: \(b< D\) and \(f< d\) in Keplerian; and \(b> D\) and \(f> d\) in Galilean. The condition \(b> D\) can be achieved both when \(F> D\) and when \(F< D\). The mode of operation is then constrained by the focal length of the micro-lenses, as suggested by Mignard-Debise et al. (2017).

We then introduce the definition of the internal configuration according to the micro-lens focal length as

$$\begin{aligned} \text {internal configuration} = \left\{ \begin{aligned}&\text {Keplerian},&\text {if } f{}^{\left( i\right) } < d\\&\text {Galilean},&\text {if } f{}^{\left( i\right) } > d\text {.}\end{aligned} \right. \end{aligned}$$
(5)

2.2 Modeling Blur Within the Plenoptic Camera

From geometrical optics, the image of a point from a circular lens not focused on the sensor can be modeled by the circle of confusion (CoC). With a circular aperture, the blurred image is also circular in shape and is called the blur circle. From similar triangles and the thin-lens equation (eq. 1), the signed blur radius of the image of a point at a distance \(a\) from the lens is expressed as

$$\begin{aligned} \left\{ \begin{aligned}&r&= A\frac{d}{2}\left( \frac{1}{f} - \frac{1}{a} - \frac{1}{d}\right)&\text {[metric]}\\&\rho&= {r}/{s}&\text {[pixel]}\\ \end{aligned} \right. \end{aligned}$$
(6)

with \(s\) being the size of a pixel, and \(A\) the aperture of this lens. In the continuous domain, the response of an imaging system to an out-of-focus point, i.e., the blur, can be expressed by the point-spread function (PSF). Let \(I\!\left( x,y\right) \) be the observed blurred image of an object at a constant distance. The image can be computed as the convolution of the PSF, noted \(\mathrm {h}\!\left( x,y\right) \), with the in-focus image \({I}^*\!\left( x,y\right) \), such as

$$\begin{aligned} {I}\!\left( x,y\right) = \mathrm {h}\!\left( x,y\right) * {I}^*\!\left( x,y\right) , \end{aligned}$$
(7)

where * denotes the convolution operator. If the lens aperture is circular and the level of blur low, the PSF \(\mathrm {h}\!\left( x,y\right) \) can be efficiently modeled by a two-dimensional Gaussian given by

$$\begin{aligned} \mathrm {h}\!\left( x,y\right) = \frac{1}{2\pi \sigma ^2}\exp \left( -\frac{x^2+y^2}{2\sigma ^2}\right) , \end{aligned}$$
(8)

where the spread parameter \(\sigma \) is proportional to the blur circle radius \(\rho \). Therefore, we can write

$$\begin{aligned} \sigma \propto \rho \Leftrightarrow \sigma = \kappa \cdot \rho \end{aligned}$$
(9)

where \(\kappa \) is a camera constant that should be determined by calibration (Pentland 1987; Subbarao 1989). Note that the spatially-variant spread parameter \(\sigma \) thus depends on the object distance \(a\).
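Eqs. 6, 8 and 9 can be summarized by the following sketch (illustrative only; the truncation window of the sampled PSF is an assumption):

```python
import numpy as np

def blur_radius(a, f, d, A, s):
    """Signed blur radius of a point at distance a from a thin lens of focal
    length f and aperture A, with the sensor at distance d (eq. 6).
    Returns the metric radius r and the pixel radius rho."""
    r = A * d / 2.0 * (1.0 / f - 1.0 / a - 1.0 / d)
    return r, r / s

def gaussian_psf(size, sigma):
    """Discrete 2D Gaussian PSF of eq. 8, sampled on a (size x size) grid and
    renormalized after truncation."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return h / h.sum()

# The spread parameter is proportional to the pixel blur radius (eq. 9):
# sigma = kappa * rho, with kappa estimated by the relative blur calibration (Sect. 5.3).
```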

The blur radius \(\rho \) appears at several levels within the camera projection: in the blur introduced by the thin-lens model of the micro-lenses and in the formation of the micro-images while taking a white image. Each micro-lens \(\left( k,l\right) \) projects virtual points onto the sensor at a position \(\left( u,v\right) \), with a blur radius \(\rho \) depending on the distance to the point and the micro-lens type.

2.3 BAP Features and Projection Model

To leverage this blur information, we introduce a new Blur Aware Plenoptic (BAP) feature characterized by its center and its radius, noted \(\varvec{p}= \begin{bmatrix}u&v&\rho&1\end{bmatrix}^\top \). BAP features are visualized in Fig. 1. Our complete plenoptic camera model thus links a scene point \(\varvec{p}_w= \begin{bmatrix}x&y&z&1\end{bmatrix}^\top \) to our new BAP feature \(\varvec{p}\) in homogeneous coordinates through each micro-lens \(\left( k,l\right) \) such as

$$\begin{aligned} \begin{bmatrix}u \\ v \\ \rho \\ 1\end{bmatrix}\propto P\left( i,k,l\right) \cdot \varvec{T}_\mu \!\left( k,l\right) \cdot \phi \left( \varvec{K}\!\left( F\right) \cdot \varvec{T}_c \cdot \varvec{p}_w \right) , \end{aligned}$$
(10)

where \(P\left( i,k,l\right) \) is the blur aware plenoptic projection matrix through the micro-lens \(\left( k,l\right) \) of type i, and computed as

$$\begin{aligned}&P\left( i,k,l\right) = \varvec{P}\!\left( k,l\right) \cdot \varvec{K}\!\left( f{}^{\left( i\right) }\right) \nonumber \\ =&\begin{bmatrix}{d}/{s} &{} 0 &{} u_0^{k,l} &{} 0 \\ 0 &{} {d}/{s} &{} v_0^{k,l} &{} 0 \\ 0 &{} 0 &{} \frac{\varDelta _\mu }{2s} &{} -\frac{\varDelta _\mu }{2s}d\\ 0 &{} 0 &{} -1 &{} 0 \end{bmatrix}\begin{bmatrix}1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0\\ 0 &{} 0 &{} -1/f{}^{\left( i\right) } &{} 1 \end{bmatrix}\text {.}\end{aligned}$$
(11)

\(\varvec{P}\left( k,l\right) \) is a matrix that projects the 3D virtual point onto the sensor while taking the blur radius into account. \(\varvec{K}\!\left( f\right) \) is the thin-lens projection matrix for the given focal length. \(\varvec{T}_c\) is the pose of the main lens with respect to the world frame, and \(\varvec{T}_\mu \!\left( k,l\right) \) is the pose of the micro-lens \(\left( k,l\right) \) expressed in the camera frame. The function \(\phi \!\left( \cdot \right) \) models the lateral distortions.

Finally, the projection model from eq. 10 consists of a set \(\Xi \) of \(\left( 16+I\right) \) intrinsic parameters to be optimized, including: the main lens focal length \(F\), expressed in \(\varvec{K}\!\left( F\right) \), and its five lateral distortion coefficients \(Q_1\), \(Q_2\), \(Q_3\), \(P_1\), and \(P_2\), expressed in \(\phi \!\left( \cdot \right) \); the sensor translations, encoded in \(d\) and \(\left( u_0, v_0\right) \) through eq. 3, from \(\varvec{P}\!\left( k,l\right) \); the MLA pose, including its three rotations \(\left( \theta _x, \theta _y, \theta _z\right) \) and three translations \(\left( t_x, t_y, D\right) \), and the micro-lens pitch \(\varDelta _\mu \), expressed in \(\varvec{T}_\mu \!\left( k,l\right) \); and, the \(I\) micro-lens focal lengths \(f{}^{\left( i\right) }\), in \(\varvec{K}\!(f{}^{\left( i\right) })\).
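As an illustration of eq. 11, the following sketch builds the two matrices and projects a point already expressed in the micro-lens frame; it deliberately omits the distortion \(\phi \) and the poses \(\varvec{T}_c\) and \(\varvec{T}_\mu \) of eq. 10, and the function names are illustrative:

```python
import numpy as np

def thin_lens_matrix(f):
    """Thin-lens projection matrix K(f) of eq. 11 (homogeneous coordinates)."""
    K = np.eye(4)
    K[3, 2] = -1.0 / f
    return K

def blur_aware_projection_matrix(d, s, u0_kl, v0_kl, Delta_mu):
    """Matrix P(k, l) of eq. 11, projecting a 3D virtual point expressed in the
    micro-lens frame onto the sensor together with its blur radius."""
    return np.array([
        [d / s, 0.0,   u0_kl,               0.0],
        [0.0,   d / s, v0_kl,               0.0],
        [0.0,   0.0,   Delta_mu / (2 * s), -Delta_mu / (2 * s) * d],
        [0.0,   0.0,  -1.0,                 0.0],
    ])

def project_bap(p_mu, d, s, u0_kl, v0_kl, Delta_mu, f_i):
    """BAP feature (u, v, rho) of a homogeneous point p_mu = (x, y, z, 1)
    expressed in the micro-lens frame, using P(i,k,l) = P(k,l) . K(f^(i))."""
    P = blur_aware_projection_matrix(d, s, u0_kl, v0_kl, Delta_mu) @ thin_lens_matrix(f_i)
    q = P @ np.asarray(p_mu, dtype=float)
    return q[:3] / q[3]  # homogeneous normalization
```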

2.4 Profiling the Depth of Field of the Plenoptic Camera

From the calibrated camera parameters, we can compute the depth of field (DoF) of each micro-lens type and the blur profile – the blur radii as a function of the object distance – in order to profile the plenoptic camera. The analysis can be done with respect to the MLA pose, and then extended to object space by back-projection. A point at a distance \(a\) from the MLA is projected back into object space at a distance \(a'\) through the main lens according to the thin-lens equation, such as

$$\begin{aligned} a' = \frac{\left( D- a\right) \cdot F}{\left( D- a\right) - F}\text {.}\end{aligned}$$
(12)

Let \(r_0\) be the minimal acceptable radius of the CoC. The smallest diffraction-limited spot resolved by a lens in wave optics, i.e., the radius of the first null of the Airy disc, is \(r^*= 1.22 \cdot \nu \cdot N^{*}\), where \(\nu \) is the considered light wavelength, and \(N^{*}= d/ A\) is the working f-number of the lens. The minimal acceptable radius is the maximum between this limit and half the size of a pixel, such as \(r_0 = \max \left( r^*, {s}/2\right) \). For a micro-lens of type (i), the focus plane distance is given by

$$\begin{aligned} a_0^{(i)} = \left( \frac{1}{f^{(i)}} - \frac{1}{d} \right) ^{-1} = \frac{df^{(i)}}{d-f^{(i)}}\text {.}\end{aligned}$$
(13)

Let \(A\) be the micro-lens aperture; we then derive the far \(a_+\) and near \(a_-\) focus plane distances:

$$\begin{aligned} \left\{ \begin{aligned} a_+^{(i)}&= \frac{dA \cdot a_0^{(i)}}{Af{}^{\left( i\right) } - 2r_0\left( a_0^{(i)} - f{}^{\left( i\right) }\right) }&\text {~~~~[far]}\\ a_-^{(i)}&= \frac{dA \cdot a_0^{(i)}}{Af{}^{\left( i\right) } + 2r_0\left( a_0^{(i)} - f{}^{\left( i\right) }\right) }&\text {~~~~[near]}\text {.}\\ \end{aligned} \right. \end{aligned}$$
(14)

The DoF of a micro-lens of type (i) is computed as the distance between the near and far focus planes, such as

$$\begin{aligned} \mathrm {D\!O\!F}^{(i)} = \left| {a_+^{(i)}}\right| - \left| {a_-^{(i)}}\right| = \frac{Af{}^{\left( i\right) } \cdot a_0^{(i)} \cdot 2r_0\left( a_0^{(i)}-f{}^{\left( i\right) }\right) }{\left( Af{}^{\left( i\right) }\right) ^2 - 4r_0^2\left( a_0^{(i)}-f{}^{\left( i\right) }\right) ^2}. \end{aligned}$$
(15)

Note that to fully exploit the combined extended DoF without gaps, the micro-lenses' DoF should either just touch or slightly overlap (Perwaß et al. 2012). Finally, under this consideration, the total DoF of the plenoptic camera in MLA space is computed using the micro-lenses' DoF as

$$\begin{aligned} \mathrm {D\!O\!F}= \max _i \left| {a_+^{(i)}}\right| - \min _i \left| {a_-^{(i)}}\right| \text {.}\end{aligned}$$
(16)

We can finally plot the blur profile of the camera, along with the focal planes and the total DoF, as illustrated in Fig. 10.
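The blur profile computation can be sketched as follows; rather than evaluating the closed forms of eqs. 14 and 15 directly, the sketch solves eq. 6 for a blur radius of magnitude \(r_0\), which yields the focus plane of eq. 13 and the near and far limits numerically (the default wavelength is an assumption):

```python
def microlens_dof(f_i, d, A, s, wavelength=550e-6):
    """Focus plane, near/far limits and DoF of a micro-lens of type (i) in MLA
    space, for a focused configuration (f_i != d).  The near/far limits are the
    distances at which the blur radius of eq. 6 reaches the minimal acceptable
    CoC radius r0.  Lengths are in mm; wavelength defaults to 550 nm."""
    N_star = d / A                           # working f-number
    r_star = 1.22 * wavelength * N_star      # diffraction-limited spot radius (Airy)
    r0 = max(r_star, s / 2.0)                # minimal acceptable CoC radius
    a0 = d * f_i / (d - f_i)                 # in-focus plane distance (eq. 13)
    # |r| = r0 in eq. 6  <=>  |1/a - 1/a0| = 2 r0 / (A d)
    a_far = 1.0 / (1.0 / a0 - 2.0 * r0 / (A * d))
    a_near = 1.0 / (1.0 / a0 + 2.0 * r0 / (A * d))
    return a0, a_near, a_far, abs(a_far) - abs(a_near)
```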

3 Pre-calibration Using Raw White Images

The goal of the pre-calibration step is to provide a strong initial estimate of the camera parameters. Inspired by depth from defocus theory (Subbarao and Surya 1994), we leverage blur information to estimate our blur radius by varying the main lens aperture and using the different micro-lens focal lengths, in combination with parameters from the image space. This is achieved using raw white images acquired with a light diffuser mounted on the main objective, taken at different apertures. We then show how the blur radii are linked to the camera parameters, thus enabling their initialization.

3.1 Micro-Images Array Calibration

First, the micro-images array (MIA) is calibrated using raw white images. We compute the micro-image centers \(\left\{ \varvec{c}_{k,l}\right\} \) by the intensity centroid method with sub-pixel accuracy (Thomason et al. 2014; Noury et al. 2017; Suligaet al. 2018). The distance between two micro-image centers \(\delta _i\) is then computed as the optimized edge-length of a fitted 2D regular grid mesh. The optimization is conducted by non-linear minimization of the distances between the grid vertices and the corresponding detected MICs. The pixel translation offset in image coordinates, \((\tau _x,\tau _y)\), and the rotation around the \(\left( -z\right) \)-axis, \(\vartheta _z\), are also determined during the optimization process.

Fig. 4
figure 4

Formation of a micro-image with its radius \(R\) through a micro-lens while taking a white image using a light diffuser, at an aperture \(A\), in Keplerian internal configuration. The point V is the vertex of the cone passing through the main lens and the considered micro-lens. \(V'\) is the image of V by the micro-lens and \(R\) is the radius of its blur circle.

3.2 Deriving the Micro-Image Radius

In white images taken with a light diffuser and a controlled aperture, each type of micro-lens produces a micro-image (MI) with a specific size and intensity. This provides a means to distinguish between them (Fig. 5). For the micro-lenses, the process of capturing a white image is equivalent to imaging a white uniform object of diameter \(A\) at a distance \(D\). The imaging process is schematized in Fig. 4. Using geometrical optics, the image of this object, i.e., the resulting MI, corresponds to the image of an imaginary point V constructed as the vertex of the cone passing through the main lens and the considered micro-lens. Let \(a\) be the signed distance of this point from the MLA plane, expressed from similar triangles and eq. 4 as

$$\begin{aligned} a= -D\frac{\varDelta _\mu }{A-\varDelta _\mu } = - {D}\left( A\left( \frac{d+D}{D}\cdot \frac{1}{\varDelta _i}\right) - 1\right) ^{-1}, \end{aligned}$$
(17)

with \(A\) being the main lens aperture. Note the minus sign is added because the vertex is always formed behind the MLA plane, and thus considered as a virtual object for the micro-lenses. Geometrically, the MI formed is the blur circle of this imaginary point V. Therefore, injecting the latter expression in eq. 6, the metric MI radius \(R\) is given by

$$\begin{aligned} R&= \frac{\varDelta _\mu }{2}d\left( \frac{1}{f} - \frac{1}{a} - \frac{1}{d}\right) \nonumber \\&= \left( \frac{\varDelta _i\cdot D}{d+D}\right) \cdot \frac{d}{2}\cdot \left( \frac{1}{f} + \left( A\left( \frac{d+D}{D}\cdot \frac{1}{\varDelta _i}\right) - 1\right) \frac{1}{D} - \frac{1}{d}\right) \nonumber \\&= A\cdot \frac{d}{2D} + \left( \frac{\varDelta _i\cdot D}{d+D}\right) \cdot \frac{d}{2} \cdot \left( \frac{1}{f} - \frac{1}{D} - \frac{1}{d}\right) \text {.}\end{aligned}$$
(18)

From the above equation, the MI radius \(R\) depends linearly on the aperture of the main lens. However, the main lens aperture cannot be measured directly whereas we have access to the \(f\)-number value. Recall that the \(f\)-number of an optical system is the ratio of the system’s focal length \(F\) to the aperture, \(A\), given by \(N= {F}/{A}\). Finally, we can express the MI radius for each micro-lens focal length type i as

$$\begin{aligned} \mathrm {R_i}\!\left( N^{-1}\right) = m \cdot N^{-1} + q_i \end{aligned}$$
(19)

with

$$\begin{aligned} m = \frac{dF}{2D} \text {~~~and~~~} q_i = \frac{1}{f{}^{\left( i\right) }} \cdot \left( \frac{\varDelta _i\cdot D}{d+D}\right) \cdot \frac{d}{2} - \frac{\varDelta _i}{2} \text {.} \end{aligned}$$
(20)

We thus relate the MI radius to the plenoptic camera parameters. It is a function of fixed parameters (\(d, D, F\)), measured parameters (\(\varDelta _i= s\cdot \delta _i\)) and variable parameters (\(N\) and \(f{}^{\left( i\right) }\) with \(i \in [ 1\mathrel {{.}\,{.}}I]\)).

Fig. 5
figure 5

a Micro-image radii as function of the inverse f-number (in magenta), with their distributions represented by the violin-boxes, for our camera consisting of \(I= 3\) different types. b Each type of micro-lens is identified by its color (type (1) in red, type (2) in green, and type (3) in blue) with its computed radius.

Let \(\varOmega \) be the set of parameters \(\left\{ m, q'_1, \dots , q'_I\right\} \), where \(q_i'\) is the value obtained by

$$\begin{aligned} q_i' = \frac{1}{f{}^{\left( i\right) }} \cdot \left( \frac{\varDelta _i\cdot D}{d+D}\right) \cdot \frac{d}{2} = q_i + \frac{\varDelta _i}{2}\text {.}\end{aligned}$$
(21)

These parameters are used to compute the radius part of the BAP feature and to initialize the camera parameters.

Micro-image radii estimation: From raw white images, we measure each MI radius \(\varrho = \left| R\right| /s\) in pixels based on image moment fitting. We use the second-order central moments of the micro-image to construct a covariance matrix. The radius \(\varrho \) is proportional to the computed standard deviation \(\sigma \). Recall that the raw moments and the centroid are given by

$$\begin{aligned} M_{{ij}}=\sum _{x,y}x^{i}y^{j}{I}\left( x,y\right)&\text { and }&{\displaystyle \{{\bar{x}},\ {\bar{y}}\}=\left\{ {\frac{M_{10}}{M_{00}}},{\frac{M_{01}}{M_{00}}}\right\} }, \end{aligned}$$

and the central moments by

$$\begin{aligned} \mu _{{pq}}=\sum _{{x, y}}(x-{\bar{x}})^{p}(y-{\bar{y}})^{q}{I}\left( x,y\right) \text {.}\end{aligned}$$
(22)

The covariance matrix is then computed as

$$\begin{aligned} \mathrm {cov}\left[ {I}\left( x,y\right) \right] = \frac{1}{\mu _{{00}}} \begin{bmatrix}\mu _{{20}} &{} \mu _{{11}} \\ \mu _{{11}} &{} \mu _{{02}}\end{bmatrix}= \begin{bmatrix}\sigma _{{xx}} &{} \sigma _{{xy}} \\ \sigma _{{yx}} &{} \sigma _{{yy}}\end{bmatrix}\text {.}\end{aligned}$$
(23)

We define \(\sigma \) as the square root of the greatest eigenvalue of the covariance matrix, i.e.,

$$\begin{aligned} \sigma ^2 = \frac{\sigma _{{xx}} + \sigma _{{yy}}}{2} + \frac{\sqrt{{4\sigma _{{xy}}^{2}} + \left( \sigma _{{xx}} - \sigma _{{yy}}\right) ^{2}}}{2} \text {.}\end{aligned}$$
(24)

The estimation is robust to noise, works under asymmetrical distributions and is easy to use, but requires a parameter \(\alpha \) to convert the standard deviation \(\sigma \) into a pixel radius \(\varrho = \alpha \cdot \sigma \). The parameter \(\alpha \) is determined so that at least \(98\%\) of the distribution is taken into account. According to the standard normal distribution Z-score table, \(\alpha \) is chosen in \(\left[ 2.33, 2.37\right] \). In our experiments, we set \(\alpha = 2.357\) as it best fits our measurements.
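The moment-based radius estimation (eqs. 22-24) can be sketched as follows; the sketch assumes the intensity window contains a single devignetted micro-image:

```python
import numpy as np

def micro_image_radius(I, alpha=2.357):
    """Pixel radius of a devignetted micro-image I, estimated from its
    second-order central moments (eqs. 22-24): alpha times the square root of
    the largest eigenvalue of the intensity covariance matrix."""
    I = np.asarray(I, dtype=float)
    y, x = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    m00 = I.sum()
    xb, yb = (x * I).sum() / m00, (y * I).sum() / m00     # centroid
    mu20 = ((x - xb) ** 2 * I).sum() / m00                 # sigma_xx
    mu02 = ((y - yb) ** 2 * I).sum() / m00                 # sigma_yy
    mu11 = ((x - xb) * (y - yb) * I).sum() / m00           # sigma_xy
    sigma2 = (mu20 + mu02) / 2.0 + np.sqrt(4.0 * mu11**2 + (mu20 - mu02) ** 2) / 2.0
    return alpha * np.sqrt(sigma2)
```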

Recall that the pixel MI radius is given by \(\varrho = \left| R\right| /s\). The metric radius is either positive if formed after the rays inversion, as in Fig. 4, or negative if before, and thus depends on the internal configuration such as

$$\begin{aligned} R= \left\{ \begin{aligned} \varrho \cdot s&\text {[Keplerian \textit{internal} configuration]},\\ - \varrho \cdot s&\text {[Galilean \textit{internal} configuration]}\text {.}\end{aligned} \right. \end{aligned}$$
(25)

Coefficients estimation: Given several raw white images taken at different apertures, we estimate the parameters \(\varOmega \), i.e., the coefficients of eq. 19, for each type of micro-image. Note that the standard full-stop \(f\)-number conventionally indicated on the lens differs from the real f-number. We therefore use the f-number calculated from the aperture value \(\mathrm {A\!V}\) by \(N= \sqrt{2^{\mathrm {A\!V}}}\). The coefficient m is a function of fixed physical parameters independent of the micro-lens focal lengths and the main lens aperture. Therefore, we obtain a set of linear equations sharing the same slope but with different y-intercepts. With \(\varvec{X} = \begin{bmatrix}m&q_1&\dots&q_I\end{bmatrix}^\top \), the set of equations can be linearly rewritten as

$$\begin{aligned} \varvec{A}\varvec{X} = \varvec{B}, \text { and then } \varvec{X} = \left( \varvec{A}^\top \varvec{A}\right) ^{-1}\varvec{A}^\top \varvec{B} \end{aligned}$$

where the matrix \(\varvec{A}\), containing the inverse f-numbers and a selector of the corresponding y-intercept coefficient, and the vector \(\varvec{B}\), containing the radii measurements, are constructed by arranging the terms according to the focal length at which they have been measured. Finally, we compute \(\varvec{X}\) with a least-squares estimation. Figure 5 shows an example of radii distributions from our experiments, computed from white images taken at several f-numbers, together with the estimated linear functions. In practice, at least two aperture configurations are required. More can be used to improve the estimation, provided that the radii measurement distributions are distinguishable from each other, with small overlap.
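A possible implementation of this least-squares estimation is sketched below; variable names and the data layout are illustrative:

```python
import numpy as np

def estimate_radius_coefficients(inv_N, types, radii, I=3):
    """Least-squares fit of the shared slope m and of the per-type intercepts
    q_i of eq. 19.  `radii` are the signed metric MI radii measured at inverse
    f-numbers `inv_N`, for micro-lens types `types` (integers in 0..I-1)."""
    inv_N = np.asarray(inv_N, dtype=float)
    radii = np.asarray(radii, dtype=float)
    types = np.asarray(types, dtype=int)
    A = np.zeros((len(radii), 1 + I))
    A[:, 0] = inv_N                          # shared slope m
    A[np.arange(len(radii)), 1 + types] = 1  # selector of the y-intercept q_i
    X, *_ = np.linalg.lstsq(A, radii, rcond=None)
    return X[0], X[1:]                       # m, (q_1, ..., q_I)
```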

3.3 Camera Parameters Initialization

First, the pixel size \(s\) is set according to the manufacturer values. The main lens focal length \(F\) is also initialized from them. Given the parameters \(\varOmega \) and the focus distance \(h\), the parameters \(d\) and \(D\) are initialized as

$$\begin{aligned} d\longleftarrow \frac{2mH}{F + \xi \cdot 4m} \text {~~~~and~~~~} D\longleftarrow H- \xi \cdot 2d, \end{aligned}$$
(26)

with \(\xi = 1\) (resp.,  \(\xi = -1\)) in Galilean (resp., Keplerian) internal configuration, and where \(H\) is given by Eq. (17) of (Perwaß et al. 2012),

$$\begin{aligned} H= \left| \frac{h}{2}\left( 1-\sqrt{1 - 4\frac{F}{h}}\right) \right| \text {.}\end{aligned}$$
(27)

For completeness, note that the unfocused configuration can be initialized with \(d\leftarrow 2m\) and \(D\leftarrow F\).

In a second step, all distortion coefficients are set to zero. The principal point is set as the center of the image. The sensor plane is thus set parallel to the main lens plane, with no rotation, at a distance \(- \left( D+d\right) \). Similarly, the MLA plane is initially set parallel to the main lens plane at a distance \(-D\). From the pre-computed MIA parameters, the MLA translation takes into account the (x, y)-offsets \(\left( -s\tau _x, -s\tau _y\right) \), and the rotation around the z-axis is initialized with \(-\vartheta _z\). The micro-lens pitch \(\varDelta _\mu \) is set according to eq. 4, where the ratio \(\lambda \) is computed using eq. 26 such as

$$\begin{aligned} \varDelta _\mu \longleftarrow \lambda \cdot \varDelta _i= \frac{D}{d+D}\cdot \varDelta _i\text {.}\end{aligned}$$
(28)

Finally, the initial micro-lenses’ focal lengths are also computed from the parameters \(\varOmega \) as follows

$$\begin{aligned} f{}^{\left( i\right) } \longleftarrow \frac{d}{2 \cdot q_i'}\cdot \varDelta _\mu \text {.}\end{aligned}$$
(29)

Experiments will show that the initial model is close to the optimized model.
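The initialization of eqs. 26-29 can be sketched as follows (illustrative function and variable names; the focus distance \(h\) is assumed finite, and \(q'_i\) are the values of eq. 21):

```python
import numpy as np

def initialize_camera(m, q_prime, Delta_i, F, h, xi=+1):
    """Initial estimates of d, D, the micro-lens pitch and the micro-lens focal
    lengths from the pre-calibration parameters (eqs. 26-29), for a finite
    focus distance h.  xi = +1 for Galilean, -1 for Keplerian internal
    configuration; q_prime holds the per-type values q'_i of eq. 21."""
    H = abs(h / 2.0 * (1.0 - np.sqrt(1.0 - 4.0 * F / h)))  # eq. 27
    d = 2.0 * m * H / (F + xi * 4.0 * m)                   # eq. 26
    D = H - xi * 2.0 * d                                   # eq. 26
    Delta_mu = D / (d + D) * Delta_i                       # eq. 4 / eq. 28
    f = d * Delta_mu / (2.0 * np.asarray(q_prime))         # eq. 29
    return d, D, Delta_mu, f
```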

4 BAP Features Detection in Raw Images

At this point, the MIA is calibrated and the micro-image centers are extracted. The raw images are devignetted by dividing them by a white raw image taken with the same aperture. Our method is based on a checkerboard calibration pattern. The detection process is divided into two steps: 1) checkerboard images are processed to extract corners at positions \(\left( u,v\right) \); and 2) with the set of parameters \(\varOmega \) and the associated virtual depth estimate for each corner, the corresponding BAP feature is computed in image space.

4.1 Computing Blur Radius Through Micro-Lens

To respect the f-number matching principle (Perwaß et al. 2012), we configure the main lens \(f\)-number such that the micro-images fully tile the sensor without overlap. In this configuration the working \(f\)-number of the main imaging system and the micro-lens imaging system should match. We consider the general case of measuring an object \(\varvec{p}\) at a distance \(a\) from the main lens. First, \(\varvec{p}\) is projected through the main lens according to the thin lens equation, \({1}/{F} = {1}/{a}+{1}/{b}\), resulting in a point \(\varvec{p}'\) at a distance \(b\) behind the main lens, i.e., at a distance \(a' = D- b\) from the MLA. From eq. 6, the metric radius of the blur circle \(r\) of a point \(\varvec{p}'\) at distance \(a'\) through a micro-lens of type (i) is expressed as

$$\begin{aligned} r&= \left( \frac{\varDelta _iD}{d+D}\right) \cdot \frac{d}{2} \cdot \left( \frac{1}{f{}^{\left( i\right) }} - \frac{1}{a'} - \frac{1}{d} \right) \nonumber \\&= \underbrace{ \frac{\varDelta _i\cdot D}{d+D} \cdot \frac{d}{2} \cdot \frac{1}{f{}^{\left( i\right) }} }_{= q'_i [21]} - \underbrace{ \frac{\varDelta _i\cdot D}{d+D} \cdot \frac{d}{2} \cdot \frac{1}{d} }_{= \varDelta _\mu /2 [4]} - \underbrace{ \frac{\varDelta _i\cdot D}{d+D} }_{= \varDelta _\mu [4]} \cdot \frac{d}{2} \cdot \frac{1}{a'}\nonumber \\&= \left( -\varDelta _\mu \cdot \frac{d}{2}\right) \cdot \frac{1}{a'} + \left( q_i' - \frac{\varDelta _\mu }{2}\right) \text {.}\end{aligned}$$
(30)

In practice, \(a'\) and \(d\) cannot be measured in raw image space, but the virtual depth can, as will be shown in the next subsection. The virtual depth refers to the relative depth value obtained from disparity. It is defined as the ratio between the signed object distance \(a'\) and the sensor distance \(d\):

$$\begin{aligned} \upsilon = -\frac{a'}{d}\text {.}\end{aligned}$$
(31)

The sign convention is reversed for virtual depth computation: distances are negative in front of the MLA plane. If we re-inject the virtual depth into eq. 30, taking care of the sign, and using eq. 4, we can derive the radius of the blur circle of a point \(\varvec{p}'\) at a distance \(a'\) from the MLA by

$$\begin{aligned} r= \frac{\lambda \varDelta _i}{2}\cdot \upsilon ^{-1} + \left( q_i' - \frac{\lambda \varDelta _i}{2}\right) \text {.}\end{aligned}$$
(32)

This equation expresses the pixel radius of the blur circle \(\rho = r/s\) associated with each point having a virtual depth, directly in image space, without explicitly evaluating the physical parameters \(A, D, d, F\text { and }f{}^{\left( i\right) }\) of the camera.
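For illustration, eq. 32 reduces to the following one-line sketch (names are illustrative):

```python
def bap_blur_radius(v, lam, Delta_i, q_prime_i, s):
    """Pixel blur radius rho of a BAP feature from the virtual depth v of its
    cluster (eq. 32); lam * Delta_i is the metric micro-lens pitch and
    q_prime_i the pre-calibration intercept of the micro-lens type (eq. 21)."""
    r = lam * Delta_i / 2.0 / v + (q_prime_i - lam * Delta_i / 2.0)  # metric (eq. 32)
    return r / s                                                     # in pixels
```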

4.2 Features Extraction

First, we detect corners in raw images using the detector introduced by Noury et al. (2017) with sub-pixel accuracy in each micro-image. With a plenoptic camera, contrary to a classic camera, the same point in object space is projected into multiple observations on the sensor. The checkerboard is designed and positioned so that the sets of observations are sufficiently far from each other to be clustered. We use the DBSCAN algorithm (Ester et al. 1996) to identify the clusters. We then associate each point with its cluster of observations.

Secondly, once each cluster is identified, we compute the virtual depth \(\upsilon \) from the disparity. Let \(\varDelta \!\varvec{C}_{1-2}\) be the distance between the centers of the micro-lenses \(\varvec{C}_1\) and \(\varvec{C}_2\), i.e., the baseline. Let \(\varDelta \!\varvec{p}= \left| \varvec{p}_1 - \varvec{p}_2\right| \) be the Euclidean distance between images of the same point in corresponding micro-images. The virtual depth \(\upsilon \) is calculated with the intercept theorem:

$$\begin{aligned} \upsilon = \frac{\varDelta \!\varvec{C}_{1-2}}{\varDelta \!\varvec{C}_{1-2} - \varDelta \!\varvec{p}} = \frac{\eta \cdot \varDelta _\mu }{\eta \cdot \varDelta _\mu - \varDelta \!\varvec{p}} = \frac{\eta \cdot \lambda \varDelta _i}{\eta \cdot \lambda \varDelta _i- \varDelta \!\varvec{p}}\text {.}\end{aligned}$$
(33)

If we consider two adjacent micro-lenses, the baseline \(\varDelta \!\varvec{C}_{1-2}\) is just the diameter of a micro-lens, i.e., \(\varDelta _\mu = \lambda \varDelta _i\) and \(\eta = 1\). For further apart micro-lenses the baseline is a multiple of that diameter, where \(\eta \) is not necessarily an integer. To handle noise in corner detection, we use a median estimator to compute the virtual depth of the cluster, taking into account all combinations of point pairs in the disparity estimation.
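A possible sketch of the clustering and of the median virtual-depth estimation is given below; the DBSCAN parameters and the data layout are illustrative, not those of our implementation:

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def cluster_corners(corners, eps, min_samples=3):
    """Group raw-image corner observations of the same scene point with DBSCAN."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(corners)

def virtual_depth_of_cluster(corners, centers):
    """Median virtual depth of one cluster of corner observations (eq. 33).
    `corners` and `centers` are (n, 2) arrays of corner positions and of the
    corresponding micro-lens centers, expressed in the same units."""
    corners = np.asarray(corners, dtype=float)
    centers = np.asarray(centers, dtype=float)
    estimates = []
    for a, b in combinations(range(len(corners)), 2):
        baseline = np.linalg.norm(centers[a] - centers[b])   # eta * lambda * Delta_i
        disparity = np.linalg.norm(corners[a] - corners[b])  # Delta p
        if baseline > 0.0:
            estimates.append(baseline / (baseline - disparity))
    return np.median(estimates)
```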

Finally, we compute the BAP features from eq. 32, using the set of parameters \(\varOmega \) and the available virtual depth \(\upsilon \). In each frame n, for each micro-image \(\left( k,l\right) \) of type (i) containing a corner at position \(\left( u,v\right) \) in the image, the feature \(\varvec{p}{}^{n}_{k,l}\) is given by

$$\begin{aligned} \varvec{p}{}^{n}_{k,l}= \begin{bmatrix}u&v&\rho&1\end{bmatrix}^\top , \text { with }\rho = r/ s\text {.}\end{aligned}$$
(34)

In the end, our observations are composed of a set of micro-image centers \(\left\{ \varvec{c}_{k,l}\right\} \) and a set of BAP features \(\left\{ \varvec{p}{}^{n}_{k,l}\right\} \), allowing us to introduce two reprojection error functions, one for each set of features, as explained in the next section.

5 Camera and Relative Blur Calibration

To retrieve the parameters of our camera model (eq. 10), we use a calibration process based on non-linear minimization of reprojection errors. The camera calibration process is divided into three phases: 1) the initial intrinsics are provided by the pre-calibration step; 2) the initial extrinsics are estimated from the raw checkerboard images; and 3) the parameters are refined with a non-linear optimization leveraging our new BAP features. In parallel, using our BAP features, the blur proportionality coefficient of eq. 9 is calibrated, by minimizing the relative blur in a new reprojection error with a non-linear optimization.

5.1 Camera Model Initialization

Iterative optimization of non-linear cost functions is sensitive to the initial parameter setting. To ensure convergence and to avoid falling into local minima during the process, the parameters must be carefully initialized close to the solution. Our pre-calibration step provides a strong initial solution for the optimization. Intrinsic parameters are initialized as explained in Sect. 3.3 using only raw white images.

The camera poses \(\left\{ \varvec{T}_c^n\right\} \), i.e., the extrinsic parameters, are initialized using the same method as Noury et al. (2017). For each cluster of observations, the barycenter is computed, as illustrated in Fig. 6. These barycenters can be seen as the projections of the checkerboard corners through the main lens using a standard pinhole model. For each frame, the pose is then estimated using the PnP algorithm (Kneip et al. 2011), as in a classic pinhole imaging system. To associate 3D-2D correspondences, we reproject the checkerboard corners in image space based on the estimated pose and link them to their nearest cluster of observations.

Fig. 6
figure 6

Checkerboard raw image with: a clusters of observations; b their barycenter used as approximation for extrinsics initialization.

5.2 Optimizing the Camera Parameters

By introducing blur in our model, we can optimize all parameters within one single optimization process. We propose a new cost function \(\varTheta \) taking into account the blur information of our new BAP feature. The cost is composed of two main terms both expressing errors in the image space: 1) the blur aware plenoptic reprojection error and 2) the main lens center reprojection error.

In the first term, for each frame n, each checkerboard corner \(\varvec{p}{}^{n}_w\) is reprojected into the image space through each micro-lens \(\left( k,l\right) \) of type (i) according to the projection model of eq. 10 and compared to its observations \(\varvec{p}{}^{n}_{k,l}\). In the second term, the main lens center \(\varvec{O}\) is reprojected according to a pinhole model in the image space through each micro-lens \(\left( k,l\right) \) and compared to its detected micro-image center \(\varvec{c}_{k,l}\). Let \({S} = \left\{ \Xi , \left\{ \varvec{T}_c^n\right\} \right\} \) be the set of intrinsic \(\Xi \) and extrinsic \(\left\{ \varvec{T}_c^n\right\} \) parameters to be optimized. The cost function \(\varTheta ({S})\) is expressed as

$$\begin{aligned} \varTheta ({{S}})= \sum \left\| \varvec{p}_{k,l}^{n}-\Pi _{k,l}\left( \varvec{p}_{w}^{n}\right) \right\| ^2+\sum \left\| \varvec{c}_{k,l}-\Pi _{k,l}(\varvec{O})\right\| ^2. \end{aligned}$$
(35)

The optimization is conducted using the Levenberg-Marquardt algorithm.
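The structure of this optimization can be sketched as follows; the projection functions are left abstract since they implement the full model of eq. 10, the data layout is illustrative, and SciPy's Levenberg-Marquardt implementation is used as a stand-in:

```python
import numpy as np
from scipy.optimize import least_squares

def calibrate(params0, observations, project_feature, project_center):
    """Skeleton of the single-step optimization of eq. 35.  `observations` is a
    list of tuples (n, k, l, i, p_obs, c_obs, p_w): frame index, micro-lens
    indices and type, observed BAP feature (u, v, rho), detected MIC and 3D
    checkerboard corner.  `project_feature` and `project_center` implement the
    projections of eq. 10 and of the main lens center through micro-lens (k, l)."""
    def residuals(params):
        res = []
        for n, k, l, i, p_obs, c_obs, p_w in observations:
            res.extend(np.asarray(p_obs) - project_feature(params, n, k, l, i, p_w))
            res.extend(np.asarray(c_obs) - project_center(params, k, l))
        return np.asarray(res)

    # Levenberg-Marquardt, as in the paper (here via SciPy's 'lm' method).
    return least_squares(residuals, params0, method="lm")
```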

5.3 Relative Blur Calibration Using BAP Features

Relative blur estimation has been studied by Ens and Lawrence (1993); Mannan and Langer (2016). To the best of our knowledge, it has never been studied in the context of plenoptic cameras. As a new contribution, we leverage the relative blur between different micro-images and our BAP features to calibrate the blur proportionality coefficient \(\kappa \) of eq. 9.

Relative blur model: A point imaged by two different micro-lenses of type (i) and (j) will have different blur amounts, i.e., the resulting images will have different spread parameters for the PSF model, such as

$$\begin{aligned} {I}_{(i)}\!\left( x,y\right) = \mathrm {h}_{(i)}\!\left( x,y\right) * {I}^*\!\left( x,y\right) \text {~~and~~} {I}_{(j)}\!\left( x,y\right) = \mathrm {h}_{(j)}\!\left( x,y\right) * {I}^*\!\left( x,y\right) , \end{aligned}$$
(36)

where \({I}^*\!\left( x,y\right) \) is the latent in-focus image. We approximate the PSF with a 2D Gaussian as in eq. 8, where the diameter of the blur kernel \(\mathrm {h}_{(i)}\) is \(\sigma _{(i)}\). To compare two views with different amount of blur, we use the relative blur model in spatial domain (Pentland 1987; Subbarao 1988; Subbarao and Surya 1994; Ens and Lawrence 1993). As stated by Mannan and Langer (2016), the Gaussian relative blur approximation works well mainly for small relative blurs (up to \(\rho \approx 5\) pixels) and when the aperture has a simple shape, which is the case with the plenoptic camera. We then use the equally-defocused representation by applying additional blur to the relatively in-focus micro-image, hence,

$$\begin{aligned} {I}_{(i)}\!\left( x,y\right) * \mathrm {h}_{r}\!\left( x,y\right) \simeq {I}_{(j)}\!\left( x,y\right) \text {.}\end{aligned}$$
(37)

Note that \(\mathrm {h}_{r}\) is the relative blur kernel applied to either one of the views such that both views are equally-defocused. The diameter of the relative blur kernel \(\mathrm {h}_{r}\) is approximated as

$$\begin{aligned} \sigma _r(i,j) \simeq \sqrt{|{\sigma _{(i)}^2 - \sigma _{(j)}^2}|}\text {.}\end{aligned}$$
(38)

This approximation is exact when the PSF is a Gaussian. Since the radius of the relative blur kernel \(\sigma _r\) cannot indicate whether the (i) or the (j) view is more in-focus than the other, we define the relative blur similarly to Chen et al. (2015), as

$$\begin{aligned} \varDelta \!\sigma ^2(i,j) \triangleq {\sigma _{(i)}^2 - \sigma _{(j)}^2 }, \end{aligned}$$
(39)

where \(\varDelta \!\sigma ^2(i,j) > 0\) indicates that a pixel in the (j)-micro-image is more in-focus than its corresponding pixel in the (i)-micro-image. Symmetrically, \(\varDelta \!\sigma ^2(i,j) < 0\) indicates that the (i)-micro-image is more in-focus. In a similar fashion, we define the relative blur radius as

$$\begin{aligned} \rho _r(i,j) \simeq \sqrt{\left| \varDelta \!\rho ^2(i,j)\right| } = \sqrt{|{\rho _{(i)}^2 - \rho _{(j)}^2}|} \end{aligned}$$
(40)

with \(\sigma _r = \kappa \cdot {\rho _r}\), and where \(\rho _{(i)}, \rho _{(j)}\) are the blur radii of the BAP features through a micro-lens of type (i) and (j).

Blur proportionality coefficient calibration: To calibrate \(\kappa \), we use our BAP features and the relative blur model applied on micro-images of different types. BAP features \(\left\{ \varvec{p}_i\right\} \) from a same cluster \(\textsf {C}\) represent the same point \(\varvec{p}{}_w\) in object space. We extract two windows \(\textsf {W}\) around the BAP features \(\varvec{p}{}_{i}, \varvec{p}{}_{j} \in \textsf {C}\!\left( \varvec{p}{}_w\right) \) of different types, and express them using the equally-defocused representation (eq. 37). As the relative blur radius does not exceed 2.5 pixels, windows \(\textsf {W}\) of size \(9 \times 9\) are extracted at \(\left( u, v\right) \) with sub-pixel precision, and therefore represent the same part of the scene in both micro-images. The additional blur is applied using a Gaussian kernel of spread parameter \(\sigma _r\). The spread parameter is computed from the \(\rho \) part of the BAP features and the parameter \(\kappa \) to be optimized, with initial value \({\kappa } = 1\). Let \(\varTheta (\kappa )\) be the cost function to be minimized. It is expressed as

$$\begin{aligned} \varTheta (\kappa ) = \sum _{\textsf {W}} \left\| {I}_{(i)}\!\left( x,y\right) * \mathrm {h}_{r}\!\left( x,y\right) - {I}_{(j)}\!\left( x,y\right) \right\| ^2, \end{aligned}$$
(41)

given \(\left| \rho _{(i)}\right| < \left| \rho _{(j)}\right| \) and where \(\mathrm {h}_{r}\) is the PSF with spread parameter \(\sigma _r = \kappa \cdot \sqrt{|{\rho _{(i)}^2 - \rho _{(j)}^2}|}\). The optimization is conducted using the Levenberg-Marquardt algorithm.
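A possible sketch of this optimization is given below, using a Gaussian kernel for \(\mathrm {h}_{r}\) as in eq. 8; the window pairing and the data layout are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import least_squares

def relative_blur_residuals(kappa, pairs):
    """Residuals of the relative blur cost of eq. 41.  `pairs` is a list of
    tuples (W_i, W_j, rho_i, rho_j): two same-size windows extracted around BAP
    features of the same cluster, ordered so that |rho_i| < |rho_j| (W_i is the
    relatively in-focus view)."""
    res = []
    for W_i, W_j, rho_i, rho_j in pairs:
        sigma_r = kappa[0] * np.sqrt(abs(rho_i**2 - rho_j**2))  # eqs. 9 and 40
        res.append(gaussian_filter(W_i, sigma_r) - W_j)         # equally-defocused (eq. 37)
    return np.concatenate([r.ravel() for r in res])

def calibrate_kappa(pairs):
    """Estimate kappa with Levenberg-Marquardt, starting from kappa = 1."""
    return least_squares(relative_blur_residuals, x0=[1.0], args=(pairs,), method="lm").x[0]
```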

Fig. 7
figure 7

Examples of calibration target images acquired at distances ranging from 775 to 400 mm from the checkerboard, used in the dataset R12-B, and their respective poses in 3D.

6 Experimental Setup

To validate our camera model, we evaluate our method on real-world data obtained with a multi-focus plenoptic camera in a controlled environment. Our experimental setup is illustrated in Fig. 1. The camera is mounted on a linear motion table with micro-metric precision. The target plane is orthogonal to the translation axis, and the camera optical axis is aligned with this axis. The approximate absolute distances at which the images have been taken with the corresponding step lengths are reported in Table 1.

Table 1 Summary of R12-A,B,C,D, and UPC-S datasets contents

6.1 Hardware Environment

For our experiments, we used a Raytrix R12 color 3D-light-field camera, with an MLA of F/2.4 aperture. The camera is in Galilean internal configuration. We used two different mounted lenses: a Nikon AF Nikkor F/1.8D with a 50 mm focal length for comparison with the state of the art, and a Nikon AF DC-Nikkor F/2D with a 135 mm focal length to validate the generalization of our model. The MLA organization is hexagonal row-aligned, and composed of \(176\times 152\) (width \(\times \) height) micro-lenses with \(I=3\) different types. The sensor is a Basler beA4000-62KC with a pixel size of \(s= 0.0055\) mm. The raw image resolution is \(4080\times 3068\) pixels. We calibrate our camera for four focus distance configurations, with \(h\in \left\{ 450, 1000, \infty \right\} \) mm for the 50 mm lens, and with \(h= 1500\) mm for the 135 mm lens. Note that when changing the focus setting, the main lens moves with respect to the MLA-sensor block.

6.2 Software Environment

All images have been acquired using the MultiCamStudio free software (v6.15.1.3573) of the Euresys company. We set the shutter speed to 5 ms. While taking white images for the pre-calibration step, we set the gain to its maximum value. For Raytrix data, we use their proprietary software RxLive (v4.0.50.2) to calibrate the camera, and compute the depth maps used in the evaluation. Our source code has been made publicly available: https://github.com/comsee-research/libpleno, and https://github.com/comsee-research/compote.

6.3 Datasets

We built four datasets with different focus distances \(h\): for the 50 mm lens, R12-A for \(h= 450\) mm, R12-B for \(h= 1000\) mm, and R12-C for \(h= \infty \); for the 135 mm lens, R12-D for \(h= 1500\) mm. Each dataset is composed of:

  • white raw plenoptic images acquired at different apertures (\(N\in \left\{ 4, 5.66, 8, 11.31, 16\right\} \)) using a light diffuser mounted on the main objective for pre-calibration,

  • free-hand calibration target images acquired at various poses (in distance and orientation), separated into two subsets, one for the calibration process (16 images) and the other for reprojection error evaluation (15 images),

  • a white raw plenoptic image acquired in the same luminosity conditions and with the same aperture as the calibration target acquisitions, used for devignetting,

  • and, calibration targets acquired with a controlled translation motion for quantitative evaluation, along with the depth maps computed by the RxLive software.

Examples of calibration targets acquired for the R12-B dataset are given in Fig. 7 along with their 3D poses. A summary for each dataset is given in Table 1, indicating checkerboard information and the distances at which the targets have been acquired for calibration and for the controlled evaluation. Our datasets have been made publicly available, and can be downloaded from our public repository at https://github.com/comsee-research/plenoptic-datasets.

6.4 Simulation Environment

In order to validate our model on a Lytro-like plenoptic camera configuration, i.e., an unfocused plenoptic camera (UPC), we additionally evaluate it in a simulation environment. We built our own raytracing-based simulator to generate images. Similarly to the real-world datasets, we generated a dataset, named UPC-S, composed of several white images taken at different apertures (with \(N \in \left\{ 2, 4, 5.6\right\} \)), various checkerboard poses for calibration and validation, and, for evaluation, checkerboard images with known translation along the z-axis. Details are also given in Table 1. We used the Lytro Illum intrinsic parameters reported in Table 4 of Bok et al. (2017) as baseline for the simulation. They have been converted into our parameters and reported in Table 3. The MLA organization is hexagonal row-aligned, and composed of \(541\times 434\) (width \(\times \) height) micro-lenses of the same type (\(I= 1\)). The raw image resolution is \(7728\times 5368\) pixels, with a pixel size of \(s= 0.0014\) mm and micro-images of radius 7.172 pixels.

7 Results and Discussions

Our evaluation process follows the steps given in the overview (Fig. 2). First, we present the pre-calibration results, where white raw plenoptic images are used to compute the micro-image centers and to estimate the initial camera parameters. Second, from the set of devignetted calibration target images, BAP features are extracted, and the camera intrinsic and extrinsic parameters are then computed using our non-linear optimization process. In parallel, the same BAP features are also used to calibrate the relative blur proportionality coefficient. Third, we evaluate our model quantitatively, first using the reprojection error as a metric, and then using the relative translation error in a controlled environment. We then propose an ablation study of the camera parameters. Finally, we illustrate how to characterize the extended DoF of the plenoptic camera using the blur profile.

7.1 Pre-calibration

To estimate the parameters \(\varOmega \), we set \(\alpha = 2.357\), and since the camera is in Galilean internal configuration, we use \(R= - \varrho \cdot s\), following eq. 25. Fig. 5 shows the micro-image radii as a function of the inverse f-number with the estimated lines for dataset R12-B. Their distributions are represented by the violin-boxes. For \(N=5.66\), we can see that the radii distributions overlap, and that the radii values are slightly overestimated as they do not exactly fit the borders of the micro-images. In practice, we only use white images that present distinguishable radii distributions in the estimation process, usually corresponding to small apertures. In the case of R12-B, only white images at \(N=11.31\) and \(N=8\) are used. The corresponding coefficients for all datasets are summarized in Table 2. As expected, the parameter \(m\) is different for each dataset, since \(D\) and \(\varDelta _i\) vary with the focus distance \(h\), whereas the \(q'_i\) values are close for all datasets, even for a different camera setup (R12-D).
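As an illustration of this fitting step, the sketch below estimates the line parameters by linear least squares; the assumed model \(r = m \cdot (1/N) + q_i\) (one shared slope and one intercept per micro-lens type), the function name, and the use of a single representative radius per white image are simplifications made for the example, not our exact implementation.

```python
import numpy as np

def fit_radius_lines(inv_f_numbers, radii_by_type):
    """Joint least-squares fit of micro-image radii versus inverse f-number.

    Assumes the linear model  r = m * (1/N) + q_i  with a slope m shared by all
    micro-lens types and one intercept q_i per type i. `radii_by_type[i]` holds
    one representative radius (e.g., the median over all type-i micro-images)
    per white image, aligned with `inv_f_numbers`.
    """
    I = len(radii_by_type)                 # number of micro-lens types
    rows, rhs = [], []
    for i, radii in enumerate(radii_by_type):
        for inv_N, r in zip(inv_f_numbers, radii):
            row = np.zeros(1 + I)
            row[0] = inv_N                 # coefficient of the shared slope m
            row[1 + i] = 1.0               # coefficient of the intercept q_i
            rows.append(row)
            rhs.append(r)
    params, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    return params[0], params[1:]           # m, (q_1, ..., q_I)

# For R12-B, only the white images at N = 8 and N = 11.31 would be used:
# m, q = fit_radius_lines([1 / 8, 1 / 11.31], radii_by_type)
```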

Table 2 Set of parameters \(\varOmega \) (in [\(\mu \hbox {m}\)]) computed during the pre-calibration step for each dataset, along with the calibrated relative blur proportionality coefficient.
Table 3 Initial intrinsic parameters for each dataset along with the optimized parameters obtained by our method (BAP) and with the methods of Noury et al. (2017) (NOUR), of Nousias et al. (2017) for each micro-lens type (NOUS1, NOUS2, NOUS3) and the parameters obtained from RxLive software (RTRX)

7.2 Free-Hand Camera Calibration

Comparison with state-of-the-art: Since our model is close to the one of Noury et al. (2017), we compare our intrinsics with the ones obtained under their pinhole assumption, using only the corner reprojection error and the same initial parameters. In addition, we evaluate against the method of Nousias et al. (2017), which provides a set of intrinsics and extrinsics for each micro-lens type. The equivalence between our parameters and theirs is given by

$$\begin{aligned} F&= \frac{f_x + f_y}{2} \cdot s,&D&= -F\cdot \left( \frac{K_1}{K_2}\cdot F+ 1\right) ^{-1},\\ d{}^{\left( i\right) }&= D- \frac{K_2 D}{D+ K_2},&u_0&= c_x~~\text { and }~~v_0= c_y, \end{aligned}$$
(42)

where \(K_1\) and \(K_2\) are the two additional intrinsic parameters that account for the MLA setting in their model. The equivalence also stands for the parameters of Bok et al. (2017). The detector provided by Nousias et al. (2017) was not able to detect corner observations on our datasets. Therefore, we used the same observations for our method (denoted BAP in Table 3), for the method of Noury et al. (2017) (NOUR), and for the method of Nousias et al. (2017) for each type (NOUS1, NOUS2, and NOUS3), which allowed us to focus the comparison on the camera models only. Finally, we provide the calibration parameters obtained from the RxLive software (RTRX), corresponding to the model of Heinze et al. (2016), and compare our depth measurements to their depth maps.
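For convenience, eq. 42 can be transcribed directly into a small helper, as sketched below; the function name is hypothetical, and the unit conventions in the comments (pixels for the sensor quantities, mm for the metric distances via the pixel size \(s\)) follow those used in this section.

```python
def convert_nousias_intrinsics(f_x, f_y, c_x, c_y, K_1, K_2, s):
    """Direct transcription of the parameter equivalence of eq. 42.

    f_x, f_y, c_x, c_y are given in pixels; s is the pixel size in mm,
    so the returned distances are metric (mm).
    """
    F = 0.5 * (f_x + f_y) * s              # main lens focal length
    D = -F / (K_1 / K_2 * F + 1.0)         # main lens to MLA distance
    d_i = D - K_2 * D / (D + K_2)          # distance d^(i) for the considered type
    u0, v0 = c_x, c_y                      # principal point (in pixels)
    return F, D, d_i, u0, v0
```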

Initialization: We initialize \(\lambda \) from eq. 28. Its value for each dataset is reported in Table 2. The difference between the initial value of \(\lambda \) and its value computed from the optimized camera parameters is less than 0.024%, which validates the use of the initial value from eq. 28 when computing our BAP features. The initial camera parameters reported in Table 3 are computed using the methodology presented in Sect. 3.3. They are used for the BAP and NOUR methods. The camera internal configuration is set to Galilean. When \(h\) decreases, \(D\) increases. Yet when the main lens focus distance is set at infinity, the main lens should focus on the plane \(\upsilon = 2\), which implies that \(D\) tends to its lower bound \(F- 2d\), as \(H\) tends to \(F\). In most cases (here, for R12-A,B,D), we will still have \(F< D\), which would usually describe a camera in Keplerian configuration. In Keplerian internal configuration, the condition \(F< D\) holds regardless of the focus distance, as the lower bound of \(D\) is \(F+ 2d\).

When using the linear initialization from NOUS, the initial parameters of some configurations corresponded to physically impossible setups or were too far from the solution, hindering the convergence of the optimization. Therefore, in order to continue the comparison, we manually set the initial parameters close enough to a solution. In contrast, the optimized parameters for BAP and NOUR are close to their initial values, which shows that our pre-calibration step provides a strong initial solution for the optimization process.

Intrinsic camera parameters: Optimized intrinsic parameters are reported for each dataset and for all the evaluated methods in Table 3. First, BAP, NOUR and NOUS all verify the condition \(F\approx D+ 2d\) when the focus is set at infinity (R12-C). Second, the focal lengths obtained from NOUR, NOUS and RTRX change significantly with the focus distance, and the ones obtained from NOUS even vary according to the micro-lens type. In contrast, only BAP shows stable parameters across all three R12-A,B,C datasets. Shared parameters across datasets (i.e., the focal lengths and the distance between the MLA and the sensor) are close enough to indicate that our model successfully generalizes to different focus configurations. Furthermore, the parameters obtained by our method with another main lens, i.e., for R12-D, are consistent with the previously obtained parameters, stressing that our model can be applied to a different camera setting. Finally, our method is the only one providing the micro-lens focal lengths in a single unified model. The other methods calibrate either several MLA-sensor distances (RTRX), or several models, one for each type (NOUS).

Note that distortion coefficients and MLA rotations are close to zero. The influence of these parameters will be analyzed in the proposed ablation study of the camera model in Sect. 7.4.

On simulated data: First, pre-calibration has been performed using the white raw images. The resulting parameters \(\varOmega \) are coherent with the simulation parameters. With parameters \(m = -23.639\mu \hbox {m}\) and \(q = -0.146\mu \hbox {m}\), we have \(d\approx f\), which describes the unfocused configuration. Reference and initial intrinsic parameters are reported in Table 3, along with the optimized parameters. Second, calibration has been performed. The obtained intrinsic parameters are close enough to the reference parameters, indicating that our method is able to generalize to the unfocused plenoptic camera.

For completeness, we also quantitatively evaluated the optimized parameters by estimating the relative displacements between checkerboards with known motion along the z-axis. This results in a translation error of \(\varepsilon _z = 1.64\%\), which validates the model.

7.3 Quantitative Evaluations of the Camera Model

Reprojection error: In the absence of ground truth, we first evaluated the computed intrinsic parameters through the reprojection error. We consider only free-hand calibration target images which are not used in the calibration process. We use the RMSE as a metric to evaluate the reprojection error on the corner part of the features, for each dataset. For the BAP method, the corner reprojection part is reported in Table 4, along with the radius reprojection part within parentheses. Regarding the NOUS methods, the original error is expressed using the mean reprojection error (MRE). We converted the final error to the RMSE metric for comparison. Note that the latter method operates separately on each type of micro-lens, meaning that the number of features is not the same as with NOUR and BAP. First, the reprojection error is less than 1 pixel for all methods and for each dataset, demonstrating that the computed intrinsics lead to an accurate reprojection model and generalize to images which are not from the calibration set. Second, even though the NOUS method provides the lowest RMSE, it shows a significant discrepancy according to the considered type. The error obtained by our method is slightly higher than the error from NOUR, but this can be explained by the fact that our optimization does not aim at minimizing only the corner reprojection error, but the radius reprojection error as well. Note that the positional error \(\varepsilon _{u,v}\) predominates in the total cost by two orders of magnitude compared to the blur radius error \(\varepsilon _\rho \), but the latter still helps to constrain our model, as shown by the relatively close intrinsics between the datasets.

Table 4 Corner reprojection error for each evaluation dataset (i.e., free-hand calibration target images not part of the calibration dataset) using the RMSE metric

Controlled environment poses evaluation: With our experimental setup, we acquired several images with known relative translation between each frame. We compare the estimated displacements along the z-axis from the extrinsic parameters to the ground truth. The extrinsics are computed with the models estimated from the free-hand calibration. In the case of the RTRX method, we use the filtered depth maps obtained with the proprietary software RxLive to estimate the displacements.

Fig. 8
figure 8

Translation error along the z-axis with respect to the ground truth displacement from the closest frame, for datasets R12-A (a), R12-B (b) and R12-C (c). The error \(\varepsilon _z\) is expressed in percentage of the estimated distances, and truncated to 7% to ease readability and comparison. The mean error with its confidence interval across all datasets for our method (BAP), the method of Noury et al. (2017) (NOUR), the method of Nousias et al. (2017) for each type (NOUS1, NOUS2, NOUS3), and the proprietary software RxLive (RTRX) are reported in (d). Please refer to the color version for better visualization.

The translation errors along the z-axis with respect to the ground truth displacement from the closest frame are reported in Fig. 8 for datasets R12-A (a), R12-B (b) and R12-C (c). The relative error \(\varepsilon _z\) for a known displacement \(\delta _z\) is computed as the mean absolute relative difference between the estimated displacement \(\hat{\delta _z}\) and the ground truth, for each pair of frames \(\left( \varvec{T}_i, \varvec{T}_{j}\right) \) separated by a distance \(\delta _z\), i.e.,

$$\begin{aligned} \varepsilon _z = \frac{1}{\eta } \sum _{\left( \varvec{T}_i, \varvec{T}_{j}\right) } \frac{\left| \hat{\delta _z} - \delta _z\right| }{\delta _z} \times 100\,\% \end{aligned}$$
(43)

where \(\hat{\delta _z} = \hat{z_i} - \hat{z_j}\), the sum runs over all pairs of frames separated by \(\delta _z\), and \(\eta \) is a normalization constant corresponding to the number of frame pairs.
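A minimal sketch of this computation is given below, assuming the estimated and ground-truth z-coordinates of the frames are available as arrays; the function name and data layout are hypothetical and only illustrate eq. 43.

```python
import numpy as np

def relative_translation_error(z_hat, z_gt, delta_z):
    """Relative translation error of eq. 43 for a known displacement delta_z.

    z_hat are the estimated z-coordinates of the frames (from the extrinsics)
    and z_gt the ground-truth positions from the motion table, both in mm and
    indexed consistently.
    """
    pairs = [(i, j) for i in range(len(z_gt)) for j in range(len(z_gt))
             if np.isclose(z_gt[i] - z_gt[j], delta_z)]
    eta = len(pairs)                        # number of frame pairs at distance delta_z
    errors = [abs((z_hat[i] - z_hat[j]) - delta_z) / delta_z for i, j in pairs]
    return 100.0 * sum(errors) / eta        # expressed in percent
```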

The mean error with its standard deviation across all datasets for BAP, NOUR, NOUS, and RTRX is reported in Fig. 8 (d).

Firstly, the mean errors across the R12-A,B,C datasets are of the same order for all evaluated methods, around 3%: for BAP, \(\varepsilon _z = 2.92 \pm 0.73 \%\); for NOUR, \(\varepsilon _z = 3.50 \pm 3.08 \%\); for NOUS1, \(\varepsilon _z = 1.68 \pm 1.53 \%\); for NOUS2, \(\varepsilon _z = 3.40 \pm 2.19 \%\); for NOUS3, \(\varepsilon _z = 3.30 \pm 3.35 \%\); and, for RTRX, \(\varepsilon _z = 4.96 \pm 4.44 \%\). This is also the case for the dataset R12-D, where our model has a mean translation error of \(\varepsilon _z = 3.37 \%\). Note that all evaluated methods outperform RTRX, as the depth map computation might not be as precise as the optimization of extrinsic parameters. Our method ranks second in terms of relative mean error. Even though the lowest error is obtained by the NOUS method for type (1), it presents a large standard deviation, and the errors for the other two types are significantly higher. In a real application context, there is no way to know in advance which type will produce the smallest error. Nousias et al. (2017) suggest that when the extrinsics are sufficiently close, representative extrinsics can be computed by averaging the extrinsics from the individual types. Our results do not match this observation, as the estimated extrinsics are significantly different for each type. As shown, only the first type gives satisfactory results, whereas the other two present a larger error with a significant standard deviation. Averaging the extrinsics from all types will therefore minimize the difference between poses, but will not provide the best possible estimation.

Secondly, the standard deviation can be seen as an indicator of the estimation precision across the datasets, and thus indicates whether the model can generalize to several configurations or not. Our model presents the lowest standard deviation as illustrated in Fig. 8 (d). This indicates a low discrepancy between datasets and thus that the model is precise and consistent for all configurations.

Table 5 Ablation study of some camera parameters. For each dataset, the reprojection error \({\varepsilon _{all}}\), computed using the RMSE, and the relative translation error \({\varepsilon _z}\), expressed in %, are reported

Thirdly, we analyze the behavior of each method for each dataset across different distances. None of the methods suffered from a constant bias, as we do not observe a decreasing relative error as the distance increases. BAP and NOUR present a stable relative error for all distances, i.e., with a standard deviation of approximately 0.3%. This indicates that the estimation suffered only from a scale error. One could thus re-scale the poses to provide a precise and accurate estimation. We cannot draw any conclusion for the other methods, since the variations do not follow any obvious pattern.

Finally, our model differs from the model of Noury et al. (2017) by modeling the micro-lens focal lengths. Comparing those two models, both the mean error and the standard deviation are smaller with our method. The inclusion of the micro-lens focal lengths in the camera model improves the estimation precision and accuracy, and enables generalization to several configurations. Dealing with different intrinsics that produce different extrinsics is not satisfactory when using the multi-focus plenoptic camera. In contrast, our model is able to handle all micro-lens types simultaneously, and proves to be stable across various configurations and working distances.

7.4 Ablation Study of Camera Parameters

To evaluate the influence of each parameter of the camera model, we present an ablation study of some of them. We focus the analysis on the distortion coefficients (\(Q_1\), \(Q_2\), \(Q_3\), \(P_1\), and \(P_2\)), on some degrees of freedom of the MLA, especially its tilt with respect to the sensor (\(\theta _x, \theta _y\)), and on the pitch between micro-lenses (\(\varDelta _\mu \)). All combinations of these parameter groups have been tested, resulting in eight configurations. For each configuration and on each dataset of R12-A,B,C: first, we calibrate the camera intrinsic parameters; second, we evaluate the model using the RMSE of the reprojection error; and finally, we quantitatively estimate the relative translation error on the evaluation dataset. Each configuration has been initialized with the same intrinsic parameters, and uses the same observations for all processes. Results are reported in Table 5. The first column is the configuration number. The Tilt column indicates whether we keep (\(\checkmark \)) or remove (\(\times \)) the parameters \(\theta _x\) and \(\theta _y\). The Pitch column stands for the parameter \(\varDelta _\mu \), and the column Dist for the distortion parameters \(Q_1\), \(Q_2\), \(Q_3\), \(P_1\), and \(P_2\).
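The eight configurations simply enumerate the combinations of the three parameter groups, as sketched below; the flag names and the driver calls in the comments are hypothetical and only illustrate the evaluation protocol.

```python
from itertools import product

# The eight ablation configurations: each flag indicates whether the
# corresponding parameter group (MLA tilt, micro-lens pitch, main lens
# distortions) is kept in the optimization (True) or removed/frozen (False).
CONFIGURATIONS = [
    {"tilt": tilt, "pitch": pitch, "distortions": dist}
    for tilt, pitch, dist in product([True, False], repeat=3)
]

# Hypothetical driver loop for one dataset:
# for cfg in CONFIGURATIONS:
#     intrinsics = calibrate(initial_params, observations, enabled=cfg)
#     rmse = reprojection_error(intrinsics, evaluation_images)
#     eps_z = translation_error(intrinsics, controlled_images)
```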

Configuration 1 is our reference, corresponding to the complete model. The optimized parameters are close to the ones from Table 3, i.e., with less than 1% variation, for all converging configurations and for all datasets.

First, the distortions do not impact the reprojection error of the model. Considering the pairs of configurations \(\left( 1,2\right) \), \(\left( 3,4\right) \), and \(\left( 5,6\right) \), the errors are similar with or without distortions, indicating that our camera does not suffer from lateral distortions. This is due to the relatively large main lens focal length. Nevertheless, distortions may have a role to play in the case of shorter focal lengths.

Second, removing the rotations of the MLA neither improves nor worsens the reprojection error and the pose estimation. When keeping the tilt but freezing the pitch, the model is still able to converge. The tilt, in combination with other factors (such as a slight decrease of the main lens focal length), compensates for the error introduced by the approximate value of the pitch. In contrast, configurations 7 and 8 do not converge to a solution, showing that when both the tilt and the pitch of the MLA are removed, the model is not constrained enough, and the reprojection error cannot be minimized, resulting in a failure.

Finally, when freezing the pitch to its initial value, the positional part of the reprojection error increases. This is especially the case for dataset R12-A, where the reported errors in Table 5 are the highest of all configurations. This confirms our previous observation that the deviation between the micro-image centers and their optical centers does not satisfy an orthographic projection between the MIA and the MLA. The pitch should be taken into account, on the one hand to improve the precision of the model, and on the other hand to avoid hindering the optimization process.

7.5 Relative Blur Calibration

We calibrate the blur proportionality coefficient \(\kappa \) for the three datasets using our BAP features. Figure 9 presents two windows extracted around BAP features of different types from the same cluster, showing different amounts of blur. The target image to be equally-defocused according to our model is shown before, (b), and after, (c), blur addition. The estimated PSF of the relative blur is given in (d).

Fig. 9
figure 9

a Reference image with highest amount of blur. b Target image to be equally-defocused. c Target image with additional blur. d Estimated PSF.

The optimized blur proportionality coefficients \(\kappa \) are reported in Table 2. Theoretically, the parameter should be the same for all three datasets. Empirically, this observation is validated for R12-A and R12-B, whereas the estimated \(\kappa \) for R12-C is lower. This is because the micro-lens focal lengths estimated in R12-C are slightly shorter than in R12-A and R12-B. Analytically, this difference generates a higher amount of relative blur, and thus a smaller estimate of \(\kappa \) to match the observed blur in image space. In other words, \(\kappa \) compensates for the slight differences in the \(f{}^{\left( i\right) }\) estimates. Therefore, \(\kappa \) should be calibrated for each dataset.

7.6 Profiling the Plenoptic Camera

Using the parameters from our calibration process, we plot the blur profile of the camera, i.e., the evolution of the blur radius with respect to depth for each micro-lens type, along with its corresponding DoF. Figure 10 shows the blur profiles obtained for our three focus distance configurations, with their DoFs expressed in mm. The blur radius is expressed in pixels and is given for each type, in red for type (1), in green for type (2) and in blue for type (3). Distances are given in object space in mm, with their corresponding virtual depths on a secondary x-axis, spanning from \(\upsilon =1\) to 15, except for the configuration \(h= \infty \) where we crop just after the farthest focal plane. In MLA space, the profiles have the same behavior for all focus distances, as they only depend on the MLA parameters.

Fig. 10
figure 10

Blur profile, including each micro-lens type, in object space, at different focus distances: a \(h= 450\) mm; b \(h= 1000\) mm; and c \(h= \infty \). Focal planes and depths of field are illustrated for each type. The blur radius is expressed in pixels as a function of the object distance to the camera in mm. The corresponding virtual depth is reported on the secondary x-axis.

First, the horizontal dashed line represents the radius of the minimal acceptable circle of confusion \(r_0\). In our case, at a wavelength of 750 nm, the radius of the smallest diffraction-limited spot is \(r^* = 2.4\mu \hbox {m}\), which is less than half the pixel size. We therefore choose \(r_0 = s/2\). Although not illustrated in the figure, the blur radius grows exponentially when getting closer to the plane \(\upsilon = 0\). Once this limit is exceeded, the blur decreases and converges to a constant value of approximately 6 pixels. This happens for more distant objects, when points are projected in front of the MLA, implying a negative virtual depth. This is the case for \(h=450\) and \(h=1000\) mm, but not for \(h=\infty \), as the points are never projected closer than \(\upsilon =2\). In the working distance range, the blur does not exceed 5 pixels and grows when points get closer to the camera.

Secondly, we can use the DoF to select the range of working distances where the blur is not noticeable. The DoF in object space increases as the focus distance increases. As reported in the figures: for R12-A, the DoF is 14.44 mm; for R12-B, 120 mm; and finally, for R12-C, the total DoF is 223 m. In MLA space, the total DoF is constant and spans from \(\upsilon = 2.15\) to 3.45. As expected, the DoFs overlap. In particular, the DoF of the type (3) micro-lenses is entirely included in the other two, whereas the DoFs of the type (1) and (2) micro-lenses just touch. Within the total DoF, a point can thus be seen in focus in two micro-images of different types simultaneously, which eases the matching problem between views.

Finally, we can easily identify the distance limits at which a point will no longer be within the DoF nor be projected onto multiple micro-images, i.e., corresponding to virtual depths \(\left| \upsilon \right| < 2\). At these distances, disparity cannot be computed in image space, and no depth estimation can be performed. Such an estimation can also be hindered by the resolution in virtual space compared to the resolution in object space, as disparity is inversely proportional to virtual depth. For instance, for close objects, points will be projected onto more micro-images but with a low disparity. The profiles can therefore be used to efficiently characterize the range of distances suited to the desired application. Furthermore, once the MLA parameters are available, we can simulate an approximate blur profile for the desired focus distance \(h\) and the desired main lens focal length \(F\) by updating the value of \(D\) using eq. 26 and eq. 27.

8 Conclusion

To calibrate a plenoptic camera, state-of-the-art methods rely on simplifying hypotheses or on reconstructed data, or require separate calibration processes to take into account the multi-focus configuration. Taking advantage of blur information, we propose: 1) a more complete plenoptic camera model with the introduction of a new BAP feature that explicitly models the defocus blur; this new feature is exploited in our calibration process based on non-linear optimization of reprojection errors; 2) a new relative blur calibration to fill the gap between the physical and geometric blur, which enables us to fully exploit blur in image space; and 3) a way to profile the plenoptic camera and its extended depth of field (DoF).

Our camera model is applicable to the multi-focus plenoptic camera (both in Galilean and Keplerian configurations), as well as to the single-focus and unfocused plenoptic cameras. In the case of the Raytrix multi-focus camera, our ablation study shows that main lens distortions and MLA tilt can be omitted without hindering the calibration process or the pose estimation. The study also indicates that explicitly including the pitch of the micro-lenses in the model improves the results. In addition, our calibration methods are validated by quantitative evaluations on real-world data in a controlled environment. Our method provides strong initial intrinsics during the pre-calibration step, and coherent optimized camera parameters for all evaluated configurations. It shows a low and stable relative translation error across all the datasets.

In the future, we plan to use blur information in complement to disparity to improve metric depth estimation.