
1 Introduction

Imaging spectroscopy devices can capture an information-rich representation of the scene comprising tens or hundreds of wavelength-indexed bands. In contrast with their trichromatic (colour) counterparts, these images are composed of as many channels as there are bands, each corresponding to a particular narrow-band segment of the electromagnetic spectrum [1]. Thus, imaging spectroscopy has numerous applications in areas such as remote sensing [2, 3], disease diagnosis and image-guided surgery [4], food monitoring and safety [5], agriculture [6], archaeological conservation [7], astronomy [8] and face recognition [9].

Recent advances in imaging spectroscopy have seen the development of sensors where the spectral filters are fully integrated into the complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) detectors. These are multispectral imaging devices which are single-shot and offer numerous advantages in terms of speed of acquisition and form-factor [10, 11]. However, one of the main drawbacks of these multispectral systems is the low raw spatial resolution per wavelength-indexed band in the image. Hence, super-resolving spectral images is crucial to achieving a much improved spatial resolution in these devices.

In recent years, there has been a steady improvement in the performance of example-based single image super-resolution (SR) methods [12,13,14,15]. This is partly due to the wide availability of benchmark datasets for development and comparison. For example, the datasets introduced by Timofte et al. [16,17,18,19,20,21], Urban100 [22] and DIV2K [23] are all widely available.

As with RGB or grey-scale super-resolution, example-based techniques for spectral image super-resolution have recently started to appear in the literature [24]. However, in contrast to their RGB and grey-scale counterparts, multispectral/hyperspectral datasets suitable for the development of single image super-resolution are not as abundant or easily accessible. For example, the CNN-based method in [24] was developed by putting together three different hyperspectral datasets. The first of these, the CAVE dataset [25], consists of only 35 hyperspectral and RGB pairs gathered in a laboratory setting under controlled lighting using a camera with tunable liquid crystal filters. Similarly, the second dataset, from Harvard [26], contains fifty hyperspectral images captured with a time-multiplexed 31-channel camera with an integrated liquid crystal tunable filter. The third dataset is that in [27], which includes 25 hyperspectral images of outdoor urban and rural scenes, also captured using a tunable liquid-crystal filter. Probably the largest spectral dataset to date, with more than 250 31-channel spectral images, is the one introduced with the NTIRE 2018 challenge on spectral reconstruction from RGB images [28].

Moreover, while the topic of spectral image super-resolution utilizing colour images, i.e., pan-sharpening, has been extensively studied [29,30,31], the stereo registered colour-spectral datasets needed to develop efficient example-based super-resolution methods are limited to a small number of hyperspectral images. One of the very few examples is the dataset in [32], where the authors introduced a stereo RGB and near infrared (NIR) dataset of 477 images and proposed a multispectral SIFT (MSIFT) method to register the images. However, the dataset is promoted in the context of scene recognition. In addition, the NIR images comprise only one wavelength-indexed band. Similarly, in [33], the authors introduce an RGB-NIR image dataset of approximately 13 h of video with only one band dedicated to NIR images. The dataset was gathered in an urban setting by mounting the cameras on a vehicle.

In this paper we introduce a novel dataset of colour-multispectral images which we name StereoMSI. Unlike the above two RGB-NIR datasets, the dataset was primarily developed for the PIRM2018 spectral SR challenge [34] and comprises 350 registered stereo RGB-spectral image pairs. The StereoMSI dataset is hence large enough to help develop deep learning spectral super-resolution methods. Moreover, it is, to the best of our knowledge, the first of its kind. The paper is organised as follows. We commence by introducing the dataset. We then present a number of image quality metrics over the dataset and the proposed splits for training, validation and testing. Then we present a brief review of the challenge and elaborate upon the results obtained by its participants. Finally, we discuss other potential applications of the dataset and conclude on the developments presented here.

2 StereoMSI Dataset

As mentioned above, here we propose the StereoMSI dataset. The dataset is a novel RGB-spectral stereo image dataset for benchmarking example-based single spectral image and example-based RGB-guided spectral image super-resolution methods. The dataset is developed for research purposes only (Fig. 1).

Fig. 1. A sample image from the StereoMSI dataset. Here we show the RGB image and the 14 wavelength channels of the multispectral camera indicated by \(\lambda _i\), \(i=\{1,2,\ldots ,14\}\). All wavelengths are in nm and, for the sake of better visualisation, we have gamma-corrected the 14 channels by setting \(\gamma =0.75\).

Fig. 2. Validation images for Track 1 of the PIRM2018 challenge. Each of the panels corresponds to the normalised spectral power of one of the validation images, i.e. the norm of the spectra per-pixel normalised to unit maximum over the image.

Fig. 3. Validation images of Track 2 of the PIRM2018 challenge. In the first and third columns we show the normalised spectral power of the spectral imagery, whereas the second and fourth columns show their registered RGB image pairs.

Fig. 4. Illustration of raw pixels for RGB and spectral cameras. The RGB camera used to acquire the images of our dataset has \(\times 4\) the resolution of the spectral camera, i.e. twice the resolution along each axis. Here we show the RGGB Bayer pattern of the colour camera and the actual wavelengths of the multispectral sensor in our MQ022HG-IM-SM4x4 camera. The two invalid filters on the array are crossed out in the panel. The wavelengths for the remaining 14 channels from the top left to the bottom right across the \(4\times 4\) spectral filter array are 553.3 nm, 599.9 nm, 510.9 nm, 477.2 nm, 562.5 nm, 612.9 nm, 523.2 nm, 500.3 nm, 590.6 nm, 548.9 nm, 489.5 nm, 577.3 nm, 617.5 nm, and 537.9 nm.

2.1 Diversity and Resolution

The 350 stereo image pairs were collected from a diverse range of scenery in the city of Canberra, the capital of Australia. The nature of the images ranges from open industrial to office environments and from deserts to rainforests. In Figs. 2 and 3 we display validation images for Track 1 and Track 2 of the PIRM2018 challenge, respectively.
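The normalised spectral power displayed in Figs. 2 and 3 can be sketched in a few lines of NumPy. The snippet below is a minimal illustration, not the code used to produce the figures: the function name, the (bands, rows, cols) array layout and the random toy data are assumptions, and the optional gamma step assumes a plain power law of the form value ** gamma.

```python
import numpy as np


def normalised_spectral_power(cube, gamma=None):
    """Per-pixel L2 norm of the spectra, scaled to unit maximum over the image.

    cube  : array of shape (bands, rows, cols), e.g. 14 bands.
    gamma : optional display exponent (assumption: output = value ** gamma).
    """
    cube = np.asarray(cube, dtype=np.float64)
    power = np.linalg.norm(cube, axis=0)  # norm of each pixel's spectrum
    peak = power.max()
    if peak > 0:
        power = power / peak              # unit maximum over the image
    if gamma is not None:
        power = power ** gamma            # display-only gamma correction
    return power


# Toy data standing in for a 14-band image of 240 rows by 480 columns.
rng = np.random.default_rng(0)
toy_cube = rng.uniform(0, 65535, size=(14, 240, 480))
toy_power = normalised_spectral_power(toy_cube, gamma=0.75)
```

The result is a single-channel image in the range [0, 1] that can be passed directly to any plotting routine.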

It is worth noting that, during acquisition time, we paid particular attention to the exposure time and image quality as the stereo pairs were captured using different cameras. One is an RGB XiQ camera model MQ022CG-CM and the other is a XiQ multispectral camera model MQ022HG-IM-SM4x4 covering the interval [\(470-620\) nm] in the visible spectral range.

The original spectral images were processed and cropped to the resolution \(480\times 240\) so as to allow the stereo RGB images to be resized to a resolution 2 times larger in each axis, that is \(960\times 480\). This is due to the fact that, in practice, the RGB camera used, based upon a CMOS image sensor, has a \(2\times 2\) Bayer RGGB pattern whereas the IMEC spectral sensors have a \(4\times 4\) pattern delivering 16 wavelength bands. Hence, the resolution of the RGB images in each axis is twice that of the spectral images. Figure 4 illustrates this resolution relationship between the two filter arrays on both cameras. When processing the images, no gamma correction was applied.
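The resolution relationship between the two filter arrays can be checked with a small sketch. Assuming, for illustration only, that both mosaics cover a raw frame of the same size, splitting the frame by filter position yields planes whose size along each axis is the frame size divided by the mosaic period. The function name and the tiny 16 by 32 toy frame are hypothetical.

```python
import numpy as np


def split_mosaic(raw, period):
    """Split a raw mosaic frame into period * period planes, one per filter.

    Plane k = r * period + c holds the pixels at offset (r, c) inside each
    period x period tile, so planes follow the tile in row-major order.
    """
    rows, cols = raw.shape
    if rows % period or cols % period:
        raise ValueError("frame size must be a multiple of the mosaic period")
    return np.stack([raw[r::period, c::period]
                     for r in range(period)
                     for c in range(period)])


# Toy frame read through a 2 x 2 Bayer mosaic and a 4 x 4 spectral mosaic.
raw = np.arange(16 * 32).reshape(16, 32)
bayer_planes = split_mosaic(raw, 2)     # 4 planes (R, G, G, B positions)
spectral_planes = split_mosaic(raw, 4)  # 16 narrow-band planes

print(bayer_planes.shape)     # (4, 8, 16)
print(spectral_planes.shape)  # (16, 4, 8)
```

Each Bayer plane is twice as large along each axis as each spectral plane, which is the factor of 2 per axis described above. The invalid planes could afterwards be dropped with np.delete along the first axis; their positions in the tile are not stated in the text, so no indices are hard-coded here.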

2.2 Structure and Splits

After collecting the 350 StereoMSI images, the two invalid wavelength-indexed bands on the IMEC sensor were removed. We then registered the images using Flownet2.0 [35] and used MATLAB’s imresize function to obtain lower resolution versions of each image by downscaling them by factors of \(\times 2\) and \(\times 3\) with nearest neighbour interpolation.
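The downscaling step can be illustrated with a short NumPy sketch. This is not a re-implementation of MATLAB's imresize: it assumes integer factors, a (bands, rows, cols) layout, and that each output pixel takes the input pixel closest to the centre of the block it covers, rounding up on ties. The exact pixel selected by imresize may differ, so outputs should not be expected to match the released LR images bit for bit.

```python
import numpy as np


def downscale_nearest(cube, factor):
    """Nearest-neighbour downscaling of a (bands, rows, cols) cube.

    Assumption: output pixel i samples input index i * factor + factor // 2,
    i.e. the pixel nearest the centre of each factor x factor block.
    """
    bands, rows, cols = cube.shape
    out_rows, out_cols = rows // factor, cols // factor
    r_idx = np.arange(out_rows) * factor + factor // 2
    c_idx = np.arange(out_cols) * factor + factor // 2
    # Index rows and columns jointly; the band axis is kept untouched.
    return cube[:, r_idx[:, None], c_idx[None, :]]


# Toy HR cube with 14 bands, 240 rows and 480 columns.
hr = np.arange(14 * 240 * 480).reshape(14, 240, 480)
lr2 = downscale_nearest(hr, 2)
lr3 = downscale_nearest(hr, 3)

print(lr2.shape)  # (14, 120, 240)
print(lr3.shape)  # (14, 80, 160)
```

Because nearest neighbour interpolation only selects existing samples, the LR images contain no new spectral values, only a subset of the HR ones.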

Table 1. Summarised dataset and camera properties

The dataset for Track 1 (single image super-resolution) consists of 240 different spectral images. The 240 images have been split into 200 for training, 20 for validation and 20 for testing, with high resolution (HR) and low resolution (LR) images in self-explanatorily named directories. The dataset for Track 2 (colour-guided spectral image super-resolution) consists of 120 randomly selected image stereo pairs, where one view is captured by the spectral imager and the other one by the colour camera. The images have been split into 100 pairs for training, 10 for validation and 10 for testing, with HR and LR images in self-explanatorily named directories. All the images, for both tracks, are in band-sequential, 16 bit, ENVI standard file format. Table 1 summarises the dataset and camera properties explained above.
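For readers unfamiliar with the band-sequential (BSQ) layout, the following sketch reads a raw 16 bit BSQ cube with NumPy. It is an illustration under stated assumptions: the function name is hypothetical, the samples are assumed to be unsigned and little-endian, and the dimensions are passed in by hand. In practice these values would be taken from the accompanying ENVI header file (its samples, lines, bands, data type, byte order and header offset fields).

```python
import os
import tempfile

import numpy as np


def read_bsq_uint16(path, bands, rows, cols, byte_order="<", header_offset=0):
    """Read a raw band-sequential cube of 16 bit unsigned samples.

    In BSQ order the file holds band 0 in full (row by row), then band 1,
    and so on, so the flat data reshapes directly to (bands, rows, cols).
    """
    dtype = np.dtype(byte_order + "u2")
    count = bands * rows * cols
    data = np.fromfile(path, dtype=dtype, count=count, offset=header_offset)
    if data.size != count:
        raise ValueError("file is smaller than the requested cube")
    return data.reshape(bands, rows, cols)


# Round trip on a toy cube written to a temporary file.
cube = np.arange(14 * 4 * 6, dtype="<u2").reshape(14, 4, 6)
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.raw")
    cube.tofile(toy_path)  # C-order write of (bands, rows, cols) is BSQ
    restored = read_bsq_uint16(toy_path, bands=14, rows=4, cols=6)
```

Keeping the band axis first matches the on-disk order, so selecting a single wavelength-indexed band is a contiguous slice.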

2.3 Bicubic Upsampling Metrics

To quantitatively assess our StereoMSI dataset, and to provide a baseline for future benchmarking, we have performed image upsampling by applying a bicubic kernel. Python’s imresize function from the scikit-image toolbox was used to perform bicubic upsampling. With the upsampled images in hand, we have then computed a number of image quality metrics so as to compare the performance of current and future example-based spectral super-resolution algorithms. To this end, we up-sampled the lower-resolution images in the dataset by \(\times 2\) and \(\times 3\) and compared against their HR reference counterparts.
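To make the baseline concrete, the sketch below upsamples a cube with a separable cubic convolution kernel. It is an illustration only and is not the library routine used above: it assumes the Keys kernel with \(a=-0.5\), integer factors, edge replication at the borders and a (bands, rows, cols) layout, and all function names are hypothetical. Library implementations can differ in kernel parameter and border handling, so the numbers it yields need not match those in the tables.

```python
import numpy as np


def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an assumption)."""
    x = np.abs(x)
    inner = (a + 2) * x**3 - (a + 3) * x**2 + 1
    outer = a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return np.where(x <= 1, inner, np.where(x < 2, outer, 0.0))


def upsample_matrix(n_in, factor):
    """Matrix mapping n_in samples to n_in * factor samples along one axis."""
    n_out = n_in * factor
    weights = np.zeros((n_out, n_in))
    # Position of each output pixel centre in input pixel coordinates.
    x = (np.arange(n_out) + 0.5) / factor - 0.5
    base = np.floor(x).astype(int)
    out_idx = np.arange(n_out)
    for tap in range(-1, 3):                  # four taps around each position
        idx = base + tap
        w = cubic_kernel(x - idx)
        clipped = np.clip(idx, 0, n_in - 1)   # replicate the border samples
        np.add.at(weights, (out_idx, clipped), w)
    return weights


def upsample_bicubic(cube, factor):
    """Apply the 1-D operator along rows and columns of every band."""
    cube = np.asarray(cube, dtype=np.float64)
    _, rows, cols = cube.shape
    m_rows = upsample_matrix(rows, factor)
    m_cols = upsample_matrix(cols, factor)
    return np.einsum("ir,brc,jc->bij", m_rows, cube, m_cols)


# Toy check: a constant cube stays constant after upsampling by 3.
flat = np.ones((2, 4, 5))
up = upsample_bicubic(flat, 3)
print(up.shape)  # (2, 12, 15)
```

Since the four kernel weights at any position sum to one, flat regions are preserved exactly, while sharp edges may show the slight overshoot typical of cubic kernels.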

For the sake of consistency, here we use the same metrics as those applied in the PIRM2018 spectral super-resolution challenge [34]. These are the mean relative absolute error (MRAE) (introduced in [28]), the Spectral Information Divergence (SID), the per-band Mean Squared Error (MSE), the Average Per-Pixel Spectral Angle (APPSA), the average per-image Structural Similarity index (SSIM) and the mean per-image Peak Signal-to-Noise Ratio (PSNR). For more information on these metrics refer to the PIRM2018 spectral image super-resolution challenge report [34].
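As a rough guide to what these quantities measure, the sketch below implements commonly used textbook forms of MRAE, MSE, PSNR, APPSA and SID for a pair of (bands, rows, cols) cubes. These are assumptions for illustration: the exact definitions, normalisations and averaging used in the challenge are those given in [34], the small constant EPS is introduced here only to guard divisions and logarithms, and SSIM is omitted for brevity.

```python
import numpy as np

EPS = 1e-12  # assumption: guard against division by zero and log(0)


def _to_float(gt, est):
    # Convert first so that unsigned 16 bit inputs cannot wrap on subtraction.
    return np.asarray(gt, dtype=np.float64), np.asarray(est, dtype=np.float64)


def mrae(gt, est):
    gt, est = _to_float(gt, est)
    return float(np.mean(np.abs(gt - est) / (gt + EPS)))


def mse(gt, est):
    gt, est = _to_float(gt, est)
    return float(np.mean((gt - est) ** 2))


def psnr(gt, est, peak):
    err = mse(gt, est)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak**2 / err))


def appsa(gt, est):
    """Mean angle (radians) between reference and estimated pixel spectra."""
    gt, est = _to_float(gt, est)
    dot = np.sum(gt * est, axis=0)
    norms = np.linalg.norm(gt, axis=0) * np.linalg.norm(est, axis=0)
    cos = np.clip(dot / (norms + EPS), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))


def sid(gt, est):
    """Symmetric KL divergence between spectra normalised to unit sum."""
    gt, est = _to_float(gt, est)
    p = gt / (gt.sum(axis=0, keepdims=True) + EPS) + EPS
    q = est / (est.sum(axis=0, keepdims=True) + EPS) + EPS
    per_pixel = np.sum(p * np.log(p / q) + q * np.log(q / p), axis=0)
    return float(np.mean(per_pixel))


# Toy example: the estimate is the reference scaled by 10 percent.
rng = np.random.default_rng(0)
gt = rng.uniform(0.1, 1.0, size=(14, 8, 8))
est = 1.1 * gt
scores = {"MRAE": mrae(gt, est), "MSE": mse(gt, est),
          "PSNR": psnr(gt, est, peak=1.0),
          "APPSA": appsa(gt, est), "SID": sid(gt, est)}
```

In this toy case MRAE is close to 0.1, whereas APPSA and SID are close to zero. A pure intensity scaling changes the magnitude of each spectrum but not its shape, which is why magnitude-based and shape-based metrics are reported together.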

In Table 2, we show the image metric results for all 350 images comprising the StereoMSI dataset and for the testing images. We have included the testing split in the table since the testing imagery for both tracks is the same. Tables 3 and 4 show the results for the full dataset and the training and validation splits of Track 1 and Track 2, respectively.

Table 2. Mean and standard deviation (in parentheses) for the evaluation metrics under consideration for each of the two down sampling factors, i.e., \(\times 2\) and \(\times 3\), for the whole dataset and the testing split used for both tracks of the PIRM2018 Example-based Spectral Image Super-resolution challenge.
Table 3. Mean and standard deviation (in parentheses) for the evaluation metrics under consideration for each of the two down sampling factors, i.e., \(\times 2\) and \(\times 3\), for the training and validation splits used in Track 1 of the PIRM2018 Example-based Spectral Image Super-resolution challenge and the full set of images (the testing, training and validation splits combined).
Table 4. Mean and standard deviation (in parentheses) for the evaluation metrics under consideration for each of the two down sampling factors, i.e., \(\times 2\) and \(\times 3\), for the training and validation splits used in Track 2 of the PIRM2018 Example-based Spectral Image Super-resolution challenge and the full set of images (the testing, training and validation splits combined).

3 PIRM2018 Spectral Image Super-Resolution Challenge

The PIRM2018 challenge has a twofold motivation. The first is the notion that, by using machine learning techniques, single image SR systems can be trained to obtain reliable multispectral super-resolved images at test time. The second is that, by exploiting the higher resolution of the RGB images registered onto the spectral images, the performance of the algorithms can be further improved.

Track 1 focuses on the problem of spatially super-resolving spectral images given training image pairs, whereby one image is LR and the other is HR, i.e. the ground truth reference image. The aim is hence to obtain \(\times 3\) spatially super-resolved spectral images making use of training imagery. Track 2, on the other hand, aims at obtaining \(\times 3\) spatially super-resolved spectral images making use of spectral-RGB stereo image pairs.

Each of the participating teams is expected to submit HR testing images which are to be evaluated with respect to several quantitative criteria concerning the fidelity of the reconstruction of the spectra in the super-resolved spectral images. The quantitative assessment of the fidelity of the images consists of the comparison of the restored multispectral images with their corresponding ground truth. For this, the challenge used the MRAE, the SID, the MSE, the APPSA, the SSIM and the mean PSNR. However, only MRAE and SID were used for ranking.

Table 5. Mean and standard deviation (in parentheses) for the evaluation metrics under consideration for the winners (IVRL_Prime [36], and VIDAR [37]) of both tracks of the PIRM2018 Example-based Spectral Image Super-resolution challenge. For the sake of reference, we also show the results yielded by up-sampling the LR (\(\times 3\)) testing images using a bicubic kernel.

Fig. 5. Performance of the IVRL_Prime and VIDAR teams on image 124 from the Track 2 testing split, compared to bicubic upsampled LR\(\times 2\) and LR\(\times 3\) images. Note that, for IVRL_Prime, inputs are LR\(\times 2\) and LR\(\times 3\) images, and for VIDAR the input is only the LR\(\times 3\) image. For the sake of comparison we also show an up-sampled LR image (factor \(\times 3\)) obtained using a bicubic kernel. All the imagery in the panels corresponds to the normalized spectral power image, and for the sake of better visualization, we have gamma-corrected the 14 channels by setting \(\gamma =0.75\).

In Table 5, we present the fidelity measurements for the testing images submitted by the challenge winners. Additionally, in Fig. 5 we show sample super-resolved results for the two winners of the competition. For more details regarding the challenge, the super-resolution results obtained by other participants and the networks and algorithms used at the challenge, we would like to refer the interested reader to [34].

4 Discussion and Conclusions

In this paper, we have introduced the StereoMSI dataset comprising 350 stereo spectral-colour image pairs. The dataset is a novel one which is specifically structured for multispectral super-resolution benchmarking. Although it was acquired with spectral image super-resolution in mind, it is quite general in nature. Having a ColorChecker present in every image, it can also be used for a number of other learning-based applications. Moreover, it also provides lower-resolution imagery and training, validation, and testing splits for both colour-guided and example-based learning applications. We have also presented a set of image quality metrics applied to the images when up-sampled using a bicubic kernel and, in doing so, provided a baseline based upon an image resizing approach widely used in the community. We have also provided a summary of both tracks in the PIRM2018 spectral image super-resolution challenge and shown the results obtained by the respective winners.