
1 Introduction

Plenoptic cameras are able to discriminate the contribution of each of the light rays that emanate from a given point in the scene. In a conventional camera, the contributions of the several rays are not distinguishable since they are collected at the same pixel. In plenoptic cameras, this discrimination is made possible by placing a microlens array between the main lens and the image sensor.

Plenoptic cameras sample the lightfield [8, 9], which is a 4D slice of the plenoptic function [1]. Several optical setups are able to acquire the lightfield, such as camera arrays [15]. Here, we focus on compact and portable setups like the lenticular array based plenoptic cameras. More specifically, we focus on the standard plenoptic camera (SPC) [13], which has a higher directional resolution and produces images with lower spatial resolution [7] when compared to the focused plenoptic camera (FPC) introduced by Lumsdaine and Georgiev [10, 14].

The camera models proposed for the SPC [3, 4] are approximations of the real setup that consider the main lens as a thin lens and the microlenses as pinholes. More complex models could describe the real setup. The SPC manufacturer provides metadata regarding the camera optical settings that help describe the camera. Namely, the metadata provided includes the main lens focal length, which is considered in [3, 4] to model the refraction of the rays by the main lens. The metadata also includes the distance at which a point is always in focus by the microlenses. Nonetheless, the assumption of pinhole-like microlenses does not allow this additional information to be incorporated directly into the camera models [3, 4].

The calibration procedures for SPCs [3, 4] do not consider the information provided by the camera manufacturer as metadata and therefore rely completely on acquiring a dataset with a calibration pattern for the specific zoom and focus settings in order to estimate the camera model parameters. Thus, in this work, we identify the relationships among the optical parameters provided as metadata, as well as the relationships between these optical parameters and the entries of the camera model [4], for different zoom and focus settings of the camera. The relationships obtained are used to represent the camera model parameters based on the metadata parameters for a specific zoom and focus setting without having to acquire a new calibration dataset.

In terms of structure, Sect. 2 presents a brief review of the camera models proposed for the SPC. In Sect. 3, the camera model [4] that describes the SPC by a \(5 \times 5\) matrix mapping rays in the image space to rays in the object space is summarized. In Sect. 4, one identifies the relationships among the parameters provided as metadata, and the relationships between the camera model entries and the metadata provided with the raw images. The results of estimating the camera model based on the metadata for a given zoom and focus setting are presented in Sect. 5. The major conclusions are presented in Sect. 6.

Notation: Throughout this work, non-italic letters correspond to functions, italic letters to scalars, lower case bold letters to vectors, and upper case bold letters to matrices.

Fig. 1. (a) Image captured on the sensor of an SPC. (b) Magnification of the red box A in (a), depicting the hexagonal tiling of the microlens images formed on the sensor. (c) Microlens images considered on the virtual plenoptic camera after the decoding process [4]. (Color figure online)

2 Related Work

The SPC allows several types of images to be defined by reorganizing the pixels captured by the camera on the 2D raw image (Fig. 1a) [13]. The raw image displays the images formed by each microlens in the microlens array (Fig. 1b). Another arrangement of pixels commonly used with SPCs is the viewpoint or sub-aperture image. These images are obtained by selecting, for each microlens, the pixel at the same position relative to the microlens center [13]. The microlens and viewpoint images exhibit different features due to the position of the microlens array on the focal plane of the main lens (Fig. 2). Thus, for these cameras, there are mainly two calibration procedures: one based on viewpoint images [4] and the other based on microlens images [3]. Both consider camera models in which the main lens is modeled as a thin lens and the microlenses as pinholes.

The calibration based on viewpoint images [4] considers corner points as features and assumes a decoding process that transforms the hexagonal tiling of the microlenses into a rectangular tiling (Fig. 1). This is done by interpolating the pixels of adjacent microlenses to obtain the missing ray information [5]. In fact, this calibration procedure therefore calibrates a virtual SPC. An extension of this work [16] considers a better initialization for the camera model parameters. One of the disadvantages pointed out for this procedure is that viewpoint images are created before a camera model is estimated.

On the other hand, the work of Bok et al. [3] allows the SPC to be calibrated directly from raw images using line features. This procedure requires that line features appear in the microlens images, which cannot be ensured when the calibration pattern is near the world focal plane of the main lens [3]. In this region, each microlens image exhibits only very small deviations in the intensity values since its pixels correspond to projections of the same point in the scene [12] (Fig. 2).

The calibration procedures [3, 4] assume that no information is known and therefore each of the parameters must be estimated by acquiring a dataset with a calibration pattern for specific zoom and focus settings. An SPC provides metadata describing the optical settings along with the images acquired. Monteiro et al. [12] identified a relationship between the zoom and focus steps provided in the metadata and the world focal plane of the main lens, but did not pursue this line of research. Here, we go a step further and identify the relationships of the metadata parameters among themselves and with the camera model parameters [4]. These relationships allow a representation of the camera model to be obtained for arbitrary zoom and focus settings, based on the parameters provided by the manufacturer as metadata of the acquired images and without acquiring a calibration dataset.

Fig. 2. Geometry of an SPC. The lightfield in the image space is parameterized using pixel and microlens indices while the lightfield in the object space is parameterized using a point and a direction. The lightfield in the object space is parameterized on plane \(\varPi \) regardless of the original plane \(\varOmega \) in focus.

3 Standard Plenoptic Camera Model

Let us consider a plenoptic camera that acquires a lightfield in the image space \(L\left( \mathbf {\Phi }\right) \) with the plane \(\varOmega \) in focus, i.e. with the world focal plane of the main lens corresponding to the plane \(\varOmega \) (Fig. 2). The rays of the lightfield in the image space \(\mathbf {\Phi } = \left[ i,j,k,l\right] ^T\) are mapped to the rays of the lightfield in the object space \(\mathbf {\Psi } = \left[ s,t,u,v\right] ^T\) by a \(5 \times 5\) matrix proposed by Dansereau et al. [4], the lightfield intrinsics matrix (LFIM) \(\mathbf {H}\):

$$\begin{aligned} \tilde{\mathbf {\Psi }} = \mathbf {H} \tilde{\mathbf {\Phi }} \end{aligned}$$
(1)

where \(\tilde{\left( \cdot \right) }\) denotes the vector \(\left( \cdot \right) \) in homogeneous coordinates. The rays in the image space are parameterized by pixel \(\left( i,j\right) \) and microlens \(\left( k,l\right) \) indices, while the rays in the object space are parameterized on a plane \(\varPi \) by a position \(\left( s,t\right) \) and a direction \(\left( u,v\right) \) in metric units [12]. Removing the redundancies of the LFIM with the translational components of the extrinsic parameters [2, 4], one defines an LFIM with 8 free intrinsic parameters

$$\begin{aligned} \mathbf {H} = \begin{bmatrix} h_{si} & 0 & 0 & 0 & 0 \\ 0 & h_{tj} & 0 & 0 & 0 \\ h_{ui} & 0 & h_{uk} & 0 & h_u \\ 0 & h_{vj} & 0 & h_{vl} & h_v \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}\quad . \end{aligned}$$
(2)
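For illustration, the following is a minimal Python sketch of the mapping (1) using the LFIM structure (2). All numeric entries are illustrative placeholders, not calibrated values.

```python
import numpy as np

# Illustrative (not calibrated) values for the 8 free intrinsic parameters.
h_si, h_tj = 1e-4, 1e-4      # baseline-related entries
h_ui, h_vj = -2e-3, -2e-3    # direction change per pixel index
h_uk, h_vl = 1e-3, 1e-3      # direction change per microlens index
h_u,  h_v  = -0.19, -0.19    # direction offsets

H = np.array([
    [h_si, 0,    0,    0,    0  ],
    [0,    h_tj, 0,    0,    0  ],
    [h_ui, 0,    h_uk, 0,    h_u],
    [0,    h_vj, 0,    h_vl, h_v],
    [0,    0,    0,    0,    1  ],
])

# Image-space ray: pixel indices (i, j) and microlens indices (k, l),
# in homogeneous coordinates.
phi = np.array([5.0, 5.0, 190.0, 190.0, 1.0])

psi = H @ phi    # object-space ray [s, t, u, v, 1]^T
s, t, u, v = psi[:4]
print(f"position on plane Pi: ({s:.4f}, {t:.4f}) m, direction: ({u:.4f}, {v:.4f})")
```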

This matrix does not provide a direct connection with the common intrinsic parameters defined within a pinhole projection matrix. The closest connection to the pinhole projection matrix is the one provided by Marto et al. [11] for the representation of a camera array composed of identical co-planar cameras. In this setup, the LFIM can be represented as

$$\begin{aligned} \mathbf {H} = \begin{bmatrix} h_{si} & 0 & \mathbf {0}_{2 \times 3}\\ 0 & h_{tj} & \\ \mathbf {0}_{3 \times 2} & & \mathbf {K}^{-1} \end{bmatrix} \quad \mathrm {with} \quad \mathbf {K} = \begin{bmatrix} \frac{1}{h_{uk}} & 0 & -\frac{h_u}{h_{uk}} \\ 0 & \frac{1}{h_{vl}} & -\frac{h_v}{h_{vl}} \\ 0 & 0 & 1 \end{bmatrix} \end{aligned}$$
(3)

where \(\mathbf {0}_{n \times m}\) is the \(n \times m\) null matrix, \(\left[ h_{si},h_{tj}\right] ^T\) corresponds to the baseline between consecutive cameras, and \(\mathbf {K}\) corresponds to the intrinsics matrix that represents the cameras of the camera array, defined using the entries of the LFIM (2).
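A short sketch of this decomposition, again with illustrative entries, verifies that the intrinsics matrix \(\mathbf {K}\) of (3) inverts back to the direction block of the LFIM:

```python
import numpy as np

# LFIM direction entries (illustrative values, as in the sketch above).
h_uk, h_vl = 1e-3, 1e-3
h_u,  h_v  = -0.19, -0.19

# Pinhole intrinsics of the virtual camera array, per (3).
K = np.array([
    [1.0 / h_uk, 0.0,        -h_u / h_uk],
    [0.0,        1.0 / h_vl, -h_v / h_vl],
    [0.0,        0.0,         1.0       ],
])

# Sanity check: K^{-1} reproduces the direction block of the LFIM (2).
H_dir = np.array([[h_uk, 0.0, h_u], [0.0, h_vl, h_v], [0.0, 0.0, 1.0]])
assert np.allclose(np.linalg.inv(K), H_dir)
```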

The LFIM introduced by Dansereau et al. [4] describes a virtual plenoptic camera whose microlenses define a rectangular tiling (Fig. 1c) instead of the actual hexagonal tiling of a plenoptic camera (Fig. 1b). The rectangular tiling is a result of a decoding process [4] that corrects the misalignment between the image sensor and the microlens array, and removes the hexagonal sampling by interpolating the missing microlens information from the pixels of the neighbouring microlenses [5].
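As a rough illustration of the resampling step only, and not of the actual decoding pipeline of [4, 5], the sketch below linearly interpolates the offset microlens rows of a hexagonally sampled lightfield onto a rectangular grid. The half-pitch offset on odd rows and the omission of the sensor-to-array rotation correction are simplifying assumptions.

```python
import numpy as np

def hex_to_rect(lf):
    """Resample a hexagonally tiled lightfield L[i, j, k, l] to a
    rectangular microlens grid by linear interpolation along k.
    Simplified sketch: assumes odd microlens rows l are shifted by half
    a pitch and ignores the sensor-to-array rotation correction."""
    out = lf.astype(float).copy()
    # On offset rows, the rectangular-grid sample lies midway between
    # two recorded microlenses: average horizontal neighbours.
    out[:, :, 1:, 1::2] = 0.5 * (lf[:, :, 1:, 1::2] + lf[:, :, :-1, 1::2])
    return out
```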

4 Calibration on a Range of Zoom and Focus Levels

The metadata parameters (meta-parameters), provided by the camera manufacturer with the acquired images, are retrieved from the camera hardware. Here, we focus on the information that refers to the image sensor, main lens, and microlens array. More specifically, we focus on the meta-parameters that change with the zoom and focus settings of the camera, i.e. with the main lens world focal plane [12].

Fig. 3. Representation of an SPC based on the meta-parameters provided in the images metadata. In step A, the affine functions \(\mathrm {a}\left( f\right) \), \(\mathrm {b}\left( f\right) \), \(\mathrm {c}\left( f\right) \), \(\mathrm {d}\left( f\right) \), \(\mathrm {e}\left( \lambda _\infty \right) \), and \(\mathrm {g}\left( \lambda _\infty \right) \) are estimated using several calibration datasets with different zoom and focus settings. These datasets are used to relate the entries of the LFIM \(\mathbf {H}_{\left( \cdot \right) }\) (Sect. 3) and the meta-parameters \(\vartheta _{\left( \cdot \right) }\) (Sect. 4). In step B, the LFIM \(\mathbf {H}_i\) is estimated for arbitrary zoom and focus settings using only the meta-parameters \(\vartheta _i\) of a given image and without acquiring a calibration dataset for those specific zoom and focus settings (Sect. 5).

Fig. 4. Meta-parameters vs. target depth. (a) represents the target object at depth 1.5 m for the different zoom steps. (b) represents the focus step with the depth of a target object for a selection of zoom steps. (c) represents the infinity lambda with the depth of a target object for a selection of zoom steps (or equivalently, focal lengths).

4.1 Camera Metadata Parameters

In [12], the influence of two meta-parameters on the definition of the main lens world focal plane was analyzed. Monteiro et al. [12] identified that the world focal plane is mainly determined by a combination of the zoom and focus steps (Fig. 4b). Nonetheless, there are more parameters in the metadata of the acquired images that can determine the main lens world focal plane and that were not analyzed in [12]: for example, the main lens focal length, which can be associated with changes in the zoom level, or the infinity lambda, which can be associated with the focus settings of the microlenses. Namely, the infinity lambda corresponds to the distance in front of the microlens array that is in focus at infinity. However, the optical settings of the microlenses are fixed; the optical settings are changed by modifying the main lens or the complex of lenses that composes the main lens. Thus, the infinity lambda describes the combined optical setup of the microlenses and the main lens. In fact, plotting the focal length, infinity lambda, and target object depth (Fig. 4c), one finds a behavior similar to the one depicted in Fig. 4b. This shows that the world focal plane can also be defined by a combination of the focal length and the infinity lambda parameters.

In order to identify and analyze the camera parameters that depend on the zoom and focus settings, we follow the same experimental approach defined in [12] and compute the Pearson correlation coefficient among the different meta-parameters [6]. In this experimental analysis, one identifies five parameters that vary with the main lens world focal plane: zoom step (zoom-stepper motor position), focus step (focus-stepper motor position), focal length, infinity lambda, and f-number. The first two parameters represent, up to an affine transformation, optical parameter information. Namely, the zoom step is related with the focal length of the main lens (Fig. 5a) (correlation of \(93.16\%\)), and the focus step for a fixed zoom is related with the infinity lambda parameter (Fig. 5b) (correlation of \(99.54\%\)). On the other hand, the f-number is not used in the definition of the intrinsic parameters of a camera and is normally described as the ratio f/D, where f is the focal length and D is the diameter of the entrance pupil. This reduces the relevant metadata parameters to two: the focal length and the infinity lambda.
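As an illustration of this screening step, a minimal sketch that computes the Pearson correlation coefficient for every pair of meta-parameters. The arrays and their values are hypothetical stand-ins, not actual Lytro metadata fields:

```python
import numpy as np

# Hypothetical meta-parameter readings over a series of acquisitions;
# real values come from the image metadata for each zoom/focus setting.
meta = {
    "zoom_step":    np.array([336, 600, 830, 1000, 1200], dtype=float),
    "focal_length": np.array([0.0064, 0.0110, 0.0160, 0.0210, 0.0256]),
    "focus_step":   np.array([210, 380, 560, 700, 880], dtype=float),
    "inf_lambda":   np.array([4.1, 6.3, 8.6, 10.4, 12.9]),
}

# Pearson correlation coefficient for every pair of meta-parameters.
names = list(meta)
for a_idx, a in enumerate(names):
    for b in names[a_idx + 1:]:
        r = np.corrcoef(meta[a], meta[b])[0, 1]
        print(f"corr({a}, {b}) = {100 * r:.2f}%")
```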

Fig. 5. Relationships among camera parameters provided in the images metadata. The camera parameters were obtained experimentally by fixing the zoom number and autofocusing the camera on a target object placed at different depths. (a) The zoom step is related with the focal length of the main lens. (b) The focus step is related with the infinity lambda parameter. The zoom number corresponds to the number that appears on the interface of the camera.

4.2 Metadata Parameters vs. LFIM

The LFIM depends on the optical settings of the camera. Let us now evaluate how the focal length and infinity lambda are related with the parameters of the LFIM described in Sect. 3. The derivation of Dansereau et al. [4] indicates how the LFIM parameters change with the focal length included in the images metadata. However, the assumption of microlenses as pinholes does not allow the concept of focus at infinity to be introduced as a parameter of the LFIM. Thus, one wants to provide a relationship between the LFIM parameters and the camera parameters provided in the images metadata.

In order to evaluate these relationships, one needs multiple calibration datasets acquired under different zoom and focus settings. The datasets [12] were collected using a \(1^\mathrm{st}\) generation Lytro camera and are summarized in Table 1. For establishing the relationships, we used 10 poses randomly selected from the acquired calibration pattern poses to estimate the LFIM [4] and repeated this procedure 15 times to obtain the mean and standard deviation values. Representing the entries of the LFIM and computing their Pearson correlation coefficients [6] against the focal length and infinity lambda, we found that the entries \(h_{si}\) and \(h_{tj}\), which are related to the baseline, exhibit an affine relationship with the focal length (Fig. 6a–b) with correlation coefficients of \(99.97\%\) and \(99.98\%\), respectively. The entries \(h_{uk}\) and \(h_{vl}\), which are related with the scale factors, exhibit a nonlinear relationship with the focal length (Fig. 6d–e) with correlation coefficients of \(84.94\%\) and \(84.75\%\), respectively. Furthermore, the remaining entries do not exhibit a correlation with any of the metadata parameters provided.
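A sketch of this resampling protocol is given below; `calibrate_lfim` is a hypothetical stand-in for the calibration procedure of [4], here replaced by a dummy estimate so the sketch runs:

```python
import numpy as np

rng = np.random.default_rng(0)

def calibrate_lfim(poses):
    """Hypothetical stand-in for the calibration procedure of Dansereau
    et al. [4]; returns a noisy dummy 5x5 LFIM so the sketch runs."""
    return np.eye(5) + 1e-3 * rng.standard_normal((5, 5))

def lfim_statistics(all_poses, n_poses=10, n_trials=15):
    """Mean and standard deviation of the LFIM entries over repeated
    calibrations with random pose subsets, as done in Sect. 4.2."""
    estimates = []
    for _ in range(n_trials):
        idx = rng.choice(len(all_poses), size=n_poses, replace=False)
        estimates.append(calibrate_lfim([all_poses[i] for i in idx]))
    estimates = np.stack(estimates)          # shape: (n_trials, 5, 5)
    return estimates.mean(axis=0), estimates.std(axis=0)

H_mean, H_std = lfim_statistics(list(range(30)))   # 30 dummy poses
```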

Fig. 6. Relationships of the LFIM entries with the focal length. The entries related with the baseline (a)–(b) and with the scale factor (d)–(e) are represented against the focal length. The target object is depicted at 1 m for different focal lengths (0.0064 (c) and 0.0256 (f)).

Table 1. Information on the datasets [12] acquired under different zoom and focus settings. The meta-parameters are identified with the symbol *.
Fig. 7. Intrinsics matrix entries vs. focal length and infinity lambda. The entries related with the scale factor are represented against the focal length (a)–(b). The entries \(h_{ui}/h_{uk}\) and \(h_{vj}/h_{vl}\) are represented against the infinity lambda (c)–(d).

If we consider the entries of the intrinsics matrix \(\mathbf {K}\) (3), \(1/h_{uk}\) and \(1/h_{vl}\) exhibit an affine relationship with the focal length (Fig. 7a–b) with correlation coefficients of \(99.82\%\) and \(99.81\%\), respectively. On the other hand, the ratios \(h_{ui}/h_{uk}\) and \(h_{vj}/h_{vl}\) have an affine relationship with the infinity lambda (Fig. 7c–d) with correlation coefficients of \(99.55\%\) and \(99.83\%\), respectively. The principal point \(\left[ h_{u}/h_{uk},h_{v}/h_{vl}\right] ^T\) still does not exhibit any relationship with the metadata parameters. The transformation to a pinhole-like representation thus simplifies the relationships with the parameters provided by the manufacturer in the metadata of the acquired images.
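Each of these affine relationships can be recovered with an ordinary least-squares line fit of the transformed LFIM entry against the corresponding meta-parameter. A minimal sketch with hypothetical values:

```python
import numpy as np

# Focal lengths of the calibration datasets and the corresponding mean
# inverse scale entries 1/h_uk (hypothetical values for illustration).
f = np.array([0.0064, 0.0110, 0.0160, 0.0210, 0.0256])   # metres
inv_huk = np.array([1.02e5, 1.71e5, 2.50e5, 3.28e5, 4.01e5])

# Affine fit 1/h_uk ~ m*f + b, i.e. the mapping c(f) of (4).
m, b = np.polyfit(f, inv_huk, deg=1)
print(f"1/h_uk = c(f) ~ {m:.4g} * f + {b:.4g}")
```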

In summary, denoting by f the focal length (see sample values in Table 1, column 4), by \(\lambda _\infty \) the infinity lambda (sample values in Table 1, column 5), and by \(\left[ c_u,c_v\right] ^T\) the principal point, one has

$$\begin{aligned} \mathbf {H} = \begin{bmatrix} \mathrm {a}\left( f\right) & 0 & 0 & 0 & 0 \\ 0 & \mathrm {b}\left( f\right) & 0 & 0 & 0 \\ \frac{\mathrm {e}\left( \lambda _\infty \right) }{\mathrm {c}\left( f\right) } & 0 & \frac{1}{\mathrm {c}\left( f\right) } & 0 & \frac{c_u}{\mathrm {c}\left( f\right) } \\ 0 & \frac{\mathrm {g}\left( \lambda _\infty \right) }{\mathrm {d}\left( f\right) } & 0 & \frac{1}{\mathrm {d}\left( f\right) } & \frac{c_v}{\mathrm {d}\left( f\right) } \\ 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix} \end{aligned}$$
(4)

where \(\mathrm {a}\left( f\right) \), \(\mathrm {b}\left( f\right) \), \(\mathrm {c}\left( f\right) \), \(\mathrm {d}\left( f\right) \), \(\mathrm {e}\left( \lambda _\infty \right) \), and \(\mathrm {g}\left( \lambda _\infty \right) \) are the affine mappings identified earlier. In the next section, we detail the procedure followed to estimate the affine mappings and show numerical results for the datasets [12].
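To make the construction concrete, a minimal sketch that assembles the LFIM (4) from the metadata of a single image. The affine coefficients below are placeholders standing in for the fitted values of Table 2:

```python
import numpy as np

def affine(m, b):
    """Affine mapping x -> m*x + b."""
    return lambda x: m * x + b

# Placeholder coefficients; in practice these are the fitted line
# parameters reported in Table 2.
a = affine(1.0e-2, 1.0e-5)       # h_si       = a(f)
b = affine(1.0e-2, 1.0e-5)       # h_tj       = b(f)
c = affine(1.5e7, 5.0e3)         # 1/h_uk     = c(f)
d = affine(1.5e7, 5.0e3)         # 1/h_vl     = d(f)
e = affine(-2.0e-1, 1.0e-2)      # h_ui/h_uk  = e(lambda_inf)
g = affine(-2.0e-1, 1.0e-2)      # h_vj/h_vl  = g(lambda_inf)

def lfim_from_metadata(f, lam, c_u, c_v):
    """Assemble the LFIM (4) from the focal length f, the infinity
    lambda lam, and an assumed principal point (c_u, c_v)."""
    cf, df = c(f), d(f)
    return np.array([
        [a(f),        0,           0,      0,      0       ],
        [0,           b(f),        0,      0,      0       ],
        [e(lam) / cf, 0,           1 / cf, 0,      c_u / cf],
        [0,           g(lam) / df, 0,      1 / df, c_v / df],
        [0,           0,           0,      0,      1       ],
    ])

H_B = lfim_from_metadata(f=0.0110, lam=6.3, c_u=190.5, c_v=190.5)
```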

5 Experimental Results

In this section, we use the relationships established between the LFIM entries and the metadata parameters (Sect. 4) to obtain a representation of the parameters that describe the camera for specific zoom and focus settings.

The relationships \(\mathrm {a}\left( f\right) \), \(\mathrm {b}\left( f\right) \), \(\mathrm {c}\left( f\right) \), \(\mathrm {d}\left( f\right) \), \(\mathrm {e}\left( \lambda _\infty \right) \), and \(\mathrm {g}\left( \lambda _\infty \right) \), in Eq. (4), are estimated using the datasets in Table 1, except Dataset B. As in Sect. 4, one considered for each dataset 10 poses randomly selected from the acquired calibration pattern poses to estimate the camera model parameters [4] and repeated this procedure 15 times to obtain the mean values. The parameters of the affine mappings obtained using the mean values of the LFIM are summarized in Table 2.

Table 2. Line parameters estimated for the relationships between the LFIM entries and the focal length or the infinity lambda identified in (4).

Dataset B is not included in the previous analysis so that it can be used to evaluate the accuracy of the camera representation (4) based on the focal length and the infinity lambda meta-parameters. The LFIM entries are obtained by applying the affine mappings identified in Table 2. These entries are compared with the mean values obtained by repeating the calibration procedure [4] 15 times using 10 randomly selected poses of Dataset B; the comparison is summarized in Table 3. The principal point \(\left[ c_u,c_v\right] ^T\) is assumed to be the center of the viewpoint image since no relationship was found with the metadata parameters. Table 3 shows that the entries obtained from calibration are similar to the ones obtained from the metadata: the maximum deviation is \(7.8\%\) and occurs for the ratio \(h_{ui}/h_{uk}\).

Table 3. LFIM entries estimated from the focal length and infinity lambda using the line parameters in Table 2, and from the calibration procedure [4], for Dataset B.

Additionally, one considered a set of 10 randomly selected images to evaluate the re-projection, ray re-projection [4], and reconstruction errors using the LFIM obtained from applying the calibration procedure [4] and the LFIM obtained from the metadata provided with the acquired images using the representation (4). The errors are summarized in Table 4 and give a more practical view of the difference between the two approaches. The errors presented are significant, but it is important to note that the extrinsic parameters are not tuned for the LFIM. The re-projection and ray re-projection errors are similar, being greater for the LFIM obtained from the metadata by 0.34 pixels and 0.14 mm, respectively. On the other hand, the reconstruction error for the metadata based estimation is significantly greater than the one obtained from calibration [4], but still lower than 65 mm. However, note that the LFIM representation using the focal length and the infinity lambda is based on a statistical analysis between the metadata parameters provided by the camera manufacturer and parameters estimated from a calibration procedure, which are affected by noise.

Table 4. Calibration errors associated with the estimation of the LFIM \(\mathbf {H}\) from metadata and from the calibration procedure [4].
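For reference, a minimal sketch of how a ray re-projection error can be computed as a point-to-ray distance. The two-plane ray parameterization used here (origin on \(\varPi \), slope direction) is an illustrative assumption, not the exact formulation of [4]:

```python
import numpy as np

def point_to_ray_distance(p, origin, direction):
    """Distance from 3D point p to the ray through `origin` along
    `direction`: the norm of the component of (p - origin) that is
    orthogonal to the ray."""
    d = direction / np.linalg.norm(direction)
    w = p - origin
    return np.linalg.norm(w - (w @ d) * d)

# A decoded ray Psi = [s, t, u, v]^T is assumed here to pass through
# (s, t, 0) on plane Pi with direction proportional to (u, v, 1).
psi = np.array([0.010, -0.020, 0.050, 0.030])   # hypothetical ray
p = np.array([0.085, 0.025, 1.50])              # known 3D corner point (m)
err = point_to_ray_distance(p, np.array([psi[0], psi[1], 0.0]),
                            np.array([psi[2], psi[3], 1.0]))
print(f"ray re-projection error: {1000 * err:.2f} mm")
```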

6 Conclusions

The different zoom and focus settings of the camera change the LFIM \(\mathbf {H}\) used to describe the camera, so we proposed a representation based on the metadata parameters provided with the acquired images. We found that the main lens world focal plane can be determined by the focal length and the infinity lambda parameters. This allows the LFIM entries to be estimated without requiring the acquisition of a calibration dataset for specific zoom and focus settings.