Abstract
Robustness is an important concern in machine learning and pattern recognition, and has attracted considerable attention from both technical and scientific viewpoints. Robustness models the capacity of a computerized approach to resist perturbing phenomena and data uncertainties, which commonly arise when designing algorithms. However, this question has not been studied in comparable depth for image processing tasks. In this article, we propose a novel definition of robustness dedicated to image processing algorithms. By considering a generalized model of image data uncertainty, we encompass the classic additive Gaussian noise alteration, which we study through the evaluation of image denoising algorithms, but also more complex phenomena such as shape variability, which we consider for liver volume segmentation from medical images. Furthermore, we refine our evaluation of robustness w.r.t. our previous work by introducing a novel quality-scale definition. To do so, we calculate the worst loss of quality of a given algorithm over a set of uncertainty scales, together with the scale where this drop appears. This new approach reveals an algorithm's weaknesses, and the kind of corrupted data for which they may appear.
1 Introduction
Reproducibility and robustness are important concerns in image processing and pattern recognition tasks, and for various applications such as medical image analysis [18, 26]. While the first refers to the replicable reuse of a method (and generally a code) by associating input image data with the method's outputs [17], the second is generally understood as the ability of an algorithm to resist uncontrolled phenomena and data uncertainties, such as image noise [29]. This article focuses on the evaluation of this robustness, which is a crucial matter in machine learning and computer vision [2, 21], increasingly so with the emergence of deep learning algorithms [3, 6] and big data [22, 25]. However, in the field of image processing, the definition of robustness and its evaluation have not been studied in comparable depth. The definition we proposed at RRPR 2016 [28] (called \(\alpha \)-robustness) was a first attempt at measuring robustness over multiple scales of noise, applied to two tasks: still image denoising and background subtraction in videos. In that previous work, image data was supposed to be altered by an additive Gaussian (or equivalent) noise, which is a common hypothesis when we refer to noisy image content. This robustness measurement consisted of calculating the worst quality loss (the \(\alpha \) value) of a given algorithm over a set of noise scales (e.g. the increasing standard deviation of a Gaussian noise).
In the present article, we introduce in Sect. 2 a novel quality-scale definition of robustness, still dedicated to image processing algorithms, based on a generalized model of the perturbing phenomenon under consideration. Instead of representing only additive Gaussian noise, we can consider more complex image data uncertainties. To evaluate robustness, we only need to measure data uncertainty by a monotonically increasing function. Moreover, together with the \(\alpha \) value presented earlier, we also calculate the scale of uncertainty (\(\sigma \)) that generated an algorithm's worst loss of quality. We then apply this definition (called \((\alpha ,\sigma )\)-robustness), first by revisiting the topic of image enhancement and denoising with the parallel concern of representing noise in a multi-scale manner (Sect. 3), as we did in [28]; in this context, the uncertainty is modeled as a classic Gaussian noise. Second, we study the impact of shape variability on liver volume segmentation from medical images (Sect. 4); here, we also measure the uncertainty (liver variability) by a monotonic function, thus adapted to our test of robustness. In Sect. 5, we describe the code, publicly downloadable at [24], that reproduces the results of this paper and lets any reader evaluate the robustness of image processing methods. We conclude and broaden the scope of this paper by proposing future directions for this research in Sect. 6.
2 A Novel Definition of Robustness for Image Processing
We first consider that an algorithm designed for image processing may be perturbed by input data altered with a given uncertainty. By extending notations from the works [20, 28], we pose:

\(\widehat{y}_i = y^0_i \odot \delta y_i, \quad i=1,\dots ,n,\)
which will be shortened by \(\mathbf {\widehat{Y}}=\mathbf {Y^0}\odot \delta \mathbf {Y}\) when the context allows it, i.e. when the subscripts are not necessary. The measurement \(\mathbf {\widehat{Y}}\) is obtained from a perfect value \(\mathbf {Y^0}\), corrupted by the alteration \(\delta \mathbf {Y}\). Classically, \(\delta \mathbf {Y}\) may be considered as a Gaussian noise by supposing that \(\delta y_i\simeq GI(0, \sigma ^2C_y)\) where \(\sigma ^2C_y\) is the covariance of the errors at a known noise scale \(\sigma \) (e.g. standard deviation or std.). This noise is generally added to the input data so that \(\mathbf {\widehat{Y}}=\mathbf {Y^0}+\delta \mathbf {Y}\). Section 3 explores this classic scenario of additive noise modeling.
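The additive special case of this corruption model can be sketched in a few lines of Python with NumPy; `add_gaussian_noise` and the toy signal below are illustrative helpers of ours, not part of the released code [24]:

```python
import numpy as np

def add_gaussian_noise(y0, sigma, seed=None):
    """Return y_hat = y0 + delta_y with delta_y ~ N(0, sigma^2) i.i.d.,
    the additive special case of the general model y_hat = y0 (.) delta_y."""
    rng = np.random.default_rng(seed)
    return y0 + rng.normal(0.0, sigma, size=np.shape(y0))

# A perfect signal corrupted at increasing noise scales: the empirical
# standard deviation of the corruption tracks the scale sigma.
y0 = np.zeros(10000)
for sigma in (5, 10, 15, 20, 25):
    y_hat = add_gaussian_noise(y0, sigma, seed=0)
    print(sigma, round(float(np.std(y_hat - y0)), 1))
```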
In this article, we also consider more complex phenomena that do not refer to this model. In such difficult situations, alteration \(\delta \mathbf {Y}\) and operator \(\odot \) cannot be modeled theoretically or numerically evaluated, and we only know the measures \(\mathbf {\widehat{Y}}\) and the perfect case \(\mathbf {Y^0}\). A way to model the uncertainty is to define a variability scale \(\sigma \) between a given sample \(\mathbf {\widehat{Y}}\) and the perfect, standard case \(\mathbf {Y^0}\). In Sect. 4, we propose to study shape variability through this viewpoint.
Let A be an algorithm dedicated to image processing, producing an output \(\mathbf {X}=\{x_i\}_{i=1,n}\) (in general, the image resulting from the algorithm). Let N be an uncertainty specific to the target application of this algorithm, and \(\{\sigma _k\}_{k=1,m}\) the scales of N. The outputs of A for every scale of N are \(\mathbb {X}=\{\mathbf {X_k}\}_{k=1,m}\), and the ground truth is denoted by \(\mathbb {Y}^0=\big \{\mathbf {Y_k^0}\big \}_{k=1,m}\). Let \(Q({\mathbf {X_k}},\mathbf {Y_k^0})\) be a quality measure of A at scale k of N; its parameters are the result of A and the ground truth for that scale. An example is the F-measure, which combines true and false positive and negative detections for a binary decision (e.g. binary segmentation). Our new definition of robustness can be formalized as follows:
Definition 1
(\((\alpha ,\sigma )\)-robustness). Algorithm A is considered robust if the difference between the output \(\mathbb {X}\) and the ground truth \(\mathbb {Y}^0\) is bounded by a Lipschitz continuity of the Q function:

\(\big |Q(\mathbf {X_{k+1}},\mathbf {Y_{k+1}^0})-Q(\mathbf {X_k},\mathbf {Y_k^0})\big |\le \alpha \,d_X(\sigma _{k+1},\sigma _k),\quad 1\le k<m,\)

where

\(d_X(\sigma _{k+1},\sigma _k)=|\sigma _{k+1}-\sigma _k|.\)

We calculate the robustness measure \((\alpha ,\sigma )\) of A as the \(\alpha \) value obtained and the scale \(\sigma =\sigma _k\) where this value is reached.
In other words, \(\alpha \) measures the worst drop in quality across the scales of uncertainty \(\{\sigma _{k}\}\), and \(\sigma \) records the uncertainty scale leading to this value. The most robust algorithm should have a low \(\alpha \) value and a very high \(\sigma \) value. Figure 1 is a synthetic example of the evaluation of two algorithms with this definition. It illustrates the better robustness of Algorithm 2, since its \(\alpha \) value is smaller than that of Algorithm 1. Moreover, the \(\sigma \) value shows that this worst quality drop occurs at a larger scale of uncertainty.
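The computation of \((\alpha ,\sigma )\) from a list of quality values can be sketched as follows. This is a minimal illustration with synthetic values, not the released code [24]; as one possible reading of the convention above, we report the upper endpoint \(\sigma _{k+1}\) of the worst interval as \(\sigma \):

```python
def robustness(qualities, scales):
    """Compute the (alpha, sigma)-robustness of one algorithm: alpha is the
    worst quality variation per unit of uncertainty between consecutive
    scales, sigma the scale at which this worst drop is reached."""
    assert len(qualities) == len(scales) and len(scales) >= 2
    alpha, sigma = 0.0, scales[0]
    for k in range(len(scales) - 1):
        slope = abs(qualities[k + 1] - qualities[k]) / (scales[k + 1] - scales[k])
        if slope > alpha:
            alpha, sigma = slope, scales[k + 1]
    return alpha, sigma

scales = [5, 10, 15, 20, 25]
algo1 = [0.90, 0.85, 0.60, 0.55, 0.50]  # sharp drop between scales 10 and 15
algo2 = [0.88, 0.84, 0.80, 0.74, 0.66]  # gentler, later degradation
print(robustness(algo1, scales))  # worst drop reached at a small scale
print(robustness(algo2, scales))  # algo2 is more robust: smaller alpha, larger sigma
```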
3 Application to Image Enhancement and Denoising
Image denoising has been addressed by a wide range of methodologies, surveyed in a general manner in [16] for instance. The shock filter [23] is a PDE scheme that applies morphological operators depending on the sign of the Laplacian operator. The original algorithm is not able to reduce image noise accurately, but several authors have improved it for this purpose. As summarized in Fig. 2-b, our test of robustness covers these approaches based on the shock scheme [1, 7, 30]; another PDE-based algorithm named coherence filtering [32]; the classic median [12] and bilateral [27] filters; and an improved version of the median filter [14]. We use 13 well-known test images (Barbara, Airplane, etc.), altered by additive white Gaussian noise with increasing standard deviation, by considering the scales of noise \(\{\sigma _k\}=\{5,10,15,20,25\}\). The quality measure is the SSIM (structural similarity) originally introduced by [31].
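The benchmark loop can be emulated with a synthetic stand-in image, a basic \(3\times 3\) median filter, and a simplified single-window SSIM; everything below is an illustrative approximation (the reference measure [31] averages the statistic over local windows), not the protocol of the paper:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def global_ssim(x, y, data_range=255.0):
    # Simplified single-window SSIM: the reference measure [31] averages the
    # same statistic over local windows; this global variant keeps the sketch
    # dependency-free.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (x.var() + y.var() + c2)
    return num / den

def median3x3(img):
    # 3x3 median filter with edge padding, a basic stand-in denoiser
    padded = np.pad(img, 1, mode="edge")
    return np.median(sliding_window_view(padded, (3, 3)), axis=(2, 3))

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))  # synthetic gradient image
for sigma in (5, 10, 15, 20, 25):
    noisy = clean + rng.normal(0.0, sigma, clean.shape)
    print(sigma, round(float(global_ssim(clean, median3x3(noisy))), 3))
```

The printed quality decreases as the noise scale grows, matching the monotone behavior observed for additive noise in the text.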
Thanks to Definition 1, we are able to evaluate the robustness of the various algorithms (Fig. 2), either visually from the graph in Fig. 2(a), or numerically from the \((\alpha ,\sigma )\) values in (b).
Since we consider an additive noise (\(\mathbf {\widehat{Y}}=\mathbf {Y^0}+\delta \mathbf {Y}\) with our notations), the quality functions decrease monotonically over the set of noise scales, revealing that the tested algorithms progressively lose their efficiency. We can appreciate the good behavior of the algorithms SmoothedMedian and SmoothedShock, with a lower \(\alpha \) value and a larger \(\sigma \) scale than the other approaches, which means that their worst quality decrease is only observed when an aggressive Gaussian noise is applied to the images.
Figure 3 presents the outputs obtained for all algorithms of our test. This confirms the good image enhancement achieved by the most robust methods, SmoothedMedian and SmoothedShock.
4 Application to Liver Volume Segmentation
Liver segmentation has been addressed by various approaches in the literature [11], mostly oriented towards the CT (computerized tomography) modality (see e.g. [10]). We propose to compare two liver extraction approaches in this test of robustness. The automatic model-based algorithm presented in [15] (named hereafter MultiVarSeg) relies on a prior 3-D representation of any patient's liver, built by accumulating images from diverse public datasets. We compare MultiVarSeg with a freely available semi-automatic segmentation tool called SmartPaint [19], which allows a fully interactive segmentation of medical volume images based on region growing.
To compare these methods, we employ the datasets provided by the Research Institute against Digestive Cancer (IRCAD) [13] and by the SLIVER benchmark [11]. We propose to study the uncertainty of liver shape variability, reflecting this organ's complex and variable geometry. First, we construct a bounding box (BB) with standard dimensions of the liver, certified by an expert and computed from the mean values of our database. We then measure the liver variability of a given binary image (object of interest -the liver- vs. background) by the following function:

\(v(L)=\dfrac{\#(L\setminus BB)}{\#(L)},\)
where L is the set of pixels that belong to the liver in a binary segmentation. \(L\setminus BB\) represents pixels that belong to the liver measured outside the standard box BB. The operator \(\#(.)\) stands for the cardinality of sets. To compare the tested algorithms, we use the Dice coefficient, which is a very common way to measure the accuracy of any segmentation method.
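A toy version of this variability measure, together with the Dice coefficient, can be written as follows; the normalization of the variability function is our assumption about Eq. 4, and the masks below are synthetic:

```python
import numpy as np

def variability(liver_mask, bb_mask):
    # v(L) = #(L \ BB) / #(L): fraction of liver pixels outside the standard
    # bounding box BB (the normalization is our assumption about Eq. 4).
    liver = liver_mask.astype(bool)
    outside = liver & ~bb_mask.astype(bool)
    return outside.sum() / liver.sum()

def dice(seg, gt):
    # Dice coefficient between a segmentation and the ground truth
    seg, gt = seg.astype(bool), gt.astype(bool)
    return 2.0 * (seg & gt).sum() / (seg.sum() + gt.sum())

# Toy masks: a 6x8 "liver" whose last column spills out of the bounding box
liver = np.zeros((10, 10), bool); liver[2:8, 1:9] = True
bb = np.zeros((10, 10), bool); bb[:, 0:8] = True
print(variability(liver, bb))  # 6 of 48 liver pixels lie outside BB -> 0.125
print(dice(liver, liver))      # 1.0 for a perfect segmentation
```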
In Fig. 4, we present the results of the test of robustness, considering scales of variability following Eq. 4.
Here we consider a more complex phenomenon producing uncertainty in the image data (general formalism \(\mathbf {\widehat{Y}}=\mathbf {Y^0}\odot \delta \mathbf {Y}\)), measured by a variability function. It produces non-linear quality functions for both algorithms; however, our definition of robustness still enables the assessment in this case. We can thus observe the robustness of MultiVarSeg compared to SmartPaint, with a lower \(\alpha \) and a larger \(\sigma \) value.
Figure 5 depicts the segmentation results obtained for each tested method. This visual inspection confirms the accuracy of the model-based approach MultiVarSeg.
5 Reproducibility
We have developed a Python code, provided publicly in [24], which permits the visual and numerical assessment of the robustness of image processing techniques. The reader can thus reproduce the plots and tables of Figs. 1, 2 and 4 of this paper. These elements are automatically created from input data files structured as in Fig. 6.
Such files are composed of: the quality measure on the first line; the name of the noise or uncertainty under study, followed by the scale values, on the second line; then one line per tested algorithm, with its name in first position followed by its quality values, until the end of the file.
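A reader for this layout can be sketched as follows; this is a hypothetical parser with illustrative quality values, and the exact format handled by the code in [24] may differ in details:

```python
import os, tempfile

def parse_robustness_file(path):
    """Parse the input layout described above:
    line 1: quality measure; line 2: uncertainty name then scale values;
    remaining lines: algorithm name then its quality value at each scale."""
    with open(path) as f:
        rows = [line.split() for line in f if line.strip()]
    quality = rows[0][0]
    noise, scales = rows[1][0], [float(v) for v in rows[1][1:]]
    algorithms = {row[0]: [float(v) for v in row[1:]] for row in rows[2:]}
    return quality, noise, scales, algorithms

sample = """SSIM
Gaussian 5 10 15 20 25
SmoothedShock 0.92 0.90 0.88 0.85 0.81
Bilateral 0.91 0.84 0.72 0.60 0.50
"""
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
    path = f.name
print(parse_robustness_file(path))
os.remove(path)
```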
Once any user runs:
for instance, our program will display a plot and save it as ‘fig_rob.pdf’; generate a LaTeX file named ‘tab_rob.tex’ containing the table of \((\alpha ,\sigma )\)-robustness values in decreasing order of \(\alpha \); and print these values in the console (see Fig. 6-d).
To obtain these measures, our program first calculates \(\alpha \) according to Definition 1. To do so, we can rewrite Eq. 2 to determine \(\alpha \) as:

\(\alpha \ge \dfrac{\big |Q(\mathbf {X_{k+1}},\mathbf {Y_{k+1}^0})-Q(\mathbf {X_k},\mathbf {Y_k^0})\big |}{d_X(\sigma _{k+1},\sigma _k)},\quad 1\le k<m.\)
The denominator \(d_X(\sigma _{k+1},\sigma _k)\) is never zero; this is easily ensured by always considering distinct scales of uncertainty, i.e. by assuming w.l.o.g. that \(\sigma _{k+1}>\sigma _k,\ 1\le k<m\). We could select any value of \(\alpha \) satisfying this inequality; however, we prefer a reproducible strategy and compute the maximal value:

\(\alpha =\max _{1\le k<m}\dfrac{\big |Q(\mathbf {X_{k+1}},\mathbf {Y_{k+1}^0})-Q(\mathbf {X_k},\mathbf {Y_k^0})\big |}{d_X(\sigma _{k+1},\sigma _k)}.\)
During this process, we also store the uncertainty scale \(\sigma \) where this \(\alpha \) value has been reached.
6 Discussion
In this paper, we have introduced a novel approach to measuring the robustness of image processing algorithms. We have first proposed a model of image uncertainty that encompasses the classic additive Gaussian noise alteration. Second, we have refined the factors we calculate for a given algorithm: beside the worst quality loss obtained by considering Lipschitz continuity over the scales of uncertainty, we also keep the scale where this worst decrease appears. This permits studying the weaknesses of a method, and the kind of image data for which they may appear in a concrete application.
As future work, we would like to compare our measure with other approaches, such as calculating the area under the curve, or summing the successive quality variations. For both image enhancement and segmentation, we have conducted our study with datasets of limited size, and we will have to confirm our results on larger image collections. We also hope that the code freely downloadable at [24] will help researchers and engineers address this problem of robustness for image processing more easily in their activity.
Furthermore, it would be interesting to study noises inherent to acquisition machines from a multi-scale point of view, such as Rician noise in MRI (magnetic resonance imaging) [5, 9]. Drawing a relation between organ shape variability and robust image processing is another important question that has not been studied in this way in the literature. Our first measure of variability can obviously be applied to organs other than the liver, and should be enhanced by further research. More precisely, we could increase the number of parameters used to represent complex organic shapes by using more sophisticated models, such as [4] for instance. Robustness could thus be studied at a (slightly) higher dimension, to better understand the variation of image processing outcomes.
Whatever the uncertainty studied, it is necessary to acquire a voluminous amount of data, and to annotate it in order to determine algorithms' robustness. To complete such a database, we could use simulation, such as the VIP (Virtual Imaging Platform), which generates images with various parameters related to the acquisition machine and the target organ's anatomy [8] (see Fig. 7). To do so, we would have to feed this simulator with data from the target modality (CT, MRI, ultrasound) and from the organ localization (e.g. binary masks of the liver volume).
References
Alvarez, L., Mazorra, L.: Signal and image restoration using shock filters and anisotropic diffusion. SIAM J. Numer. Anal. 31(2), 590–605 (1994)
Bishop, C.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: IEEE Security and Privacy (2017)
Cartade, C., Mercat, C., Malgouyres, R., Samir, C.: Mesh parameterization with generalized discrete conformal maps. J. Math. Imaging Vis. 46(1), 1–11 (2013)
Coupé, P., Manjón, J., Gedamu, E., Arnold, D., Robles, M., Collins, D.: Robust rician noise estimation for MR images. Med. Image Anal. 14(4), 483–493 (2010)
Fawzi, A., Moosavi-Dezfooli, S.M., Frossard, P.: The robustness of deep networks: a geometrical perspective. IEEE Signal Process. Mag. 34, 50–62 (2017)
Gilboa, G., Sochen, N.A., Zeevi, Y.Y.: Regularized shock filters and complex diffusion. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 399–413. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47969-4_27
Glatard, T., et al.: A virtual imaging platform for multi-modality medical image simulation. IEEE Trans. Med. Imaging 32(1), 110–118 (2013)
Gudbjartsson, H., Patz, S.: The rician distribution of noisy MRI data. Magn. Reson. Med. 34(6), 910–914 (1995)
He, B., et al.: Fast automatic 3D liver segmentation based on a three-level AdaBoost-guided active shape model. Med. Phys. 43(5), 2421–2434 (2016)
Heimann, T., et al.: Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Trans. Med. Imaging 28(8), 1251–1265 (2009)
Huang, T., Yang, G., Tang, G.: A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust. Speech Signal Process. 27(1), 13–18 (1979)
3D-IRCADb 01 (2018). http://www.ircad.fr/research/3d-ircadb-01/
Kass, M., Solomon, J.: Smoothed local histogram filters. ACM Trans. Graph. 29(4), 100:1–100:10 (2010)
Lebre, M.-A., et al.: Medical image processing and numerical simulation for digital hepatic parenchymal blood flow. In: Tsaftaris, S.A., Gooya, A., Frangi, A.F., Prince, J.L. (eds.) SASHIMI 2017. LNCS, vol. 10557, pp. 99–108. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68127-6_11
Lebrun, M., Colom, M., Buades, A., Morel, J.: Secrets of image denoising cuisine. Acta Numer. 21, 475–576 (2012)
Limare, N.: Reproducible research, software quality, online interfaces and publishing for image processing. Ph.D. thesis, École normale supérieure de Cachan, France (2012)
Lu, D., Weng, Q.: A survey of image classification methods and techniques for improving classification performance. Int. J. Remote. Sens. 28(5), 823–870 (2007)
Malmberg, F., Nordenskjöld, R., Strand, R., Kullberg, J.: Smartpaint: a tool for interactive segmentation of medical volume images. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 5(1), 36–44 (2014)
Meer, P.: From a robust hierarchy to a hierarchy of robustness. In: Davis, L.S. (ed.) Foundations of Image Understanding. The Springer International Series in Engineering and Computer Science, vol. 628, pp. 323–347. Springer, Boston (2001). https://doi.org/10.1007/978-1-4615-1529-6_11
Meer, P.: Robust techniques for computer vision. In: Emerging Topics in Computer Vision, pp. 107–190. Prentice Hall (2004)
Menze, B., et al. (eds.): Medical Computer Vision: Algorithms for Big Data. LNCS, vol. 8848. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-13972-2
Osher, S., Rudin, L.: Feature-oriented image enhancement using shock filters. SIAM J. Numer. Anal. 27, 919–940 (1990)
Robust image processing (2019). https://github.com/antoinevacavant/robustimageprocessing
Thomas, R., McSharry, P.: Big Data Revolution: What Farmers, Doctors and Insurance Agents Teach Us About Discovering Big Data Patterns. Wiley, Hoboken (2015)
Toennies, K.D.: Guide to Medical Image Analysis. Springer, London (2012). https://doi.org/10.1007/978-1-4471-2751-2
Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: IEEE International Conference on Computer Vision, Bombay, India (1998)
Vacavant, A.: A novel definition of robustness for image processing algorithms. In: Kerautret, B., Colom, M., Monasse, P. (eds.) RRPR 2016. LNCS, vol. 10214, pp. 75–87. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56414-2_6
Vacavant, A.: Robust image processing: Definition, algorithms and evaluation. Université Clermont Auvergne, France, Habilitation (2018)
Vacavant, A., Albouy-Kissi, A., Menguy, P., Solomon, J.: Fast smoothed shock filtering. In: IEEE International Conference on Pattern Recognition, Tsukuba, Japan (2012)
Wang, Z.: Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
Weickert, J.: Coherence-enhancing shock filters. In: Michaelis, B., Krell, G. (eds.) DAGM 2003. LNCS, vol. 2781, pp. 1–8. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45243-0_1
Cite this paper
Vacavant, A., Lebre, MA., Rositi, H., Grand-Brochier, M., Strand, R. (2019). New Definition of Quality-Scale Robustness for Image Processing Algorithms, with Generalized Uncertainty Modeling, Applied to Denoising and Segmentation. In: Kerautret, B., Colom, M., Lopresti, D., Monasse, P., Talbot, H. (eds) Reproducible Research in Pattern Recognition. RRPR 2018. Lecture Notes in Computer Science(), vol 11455. Springer, Cham. https://doi.org/10.1007/978-3-030-23987-9_13