1 Introduction and Related Work

Spatial and spatio-temporal model checking have gained increasing interest in recent years in various application domains, ranging from Collective Adaptive Systems [11, 15, 18] and networked systems [27] to signals [32] and digital images [14, 26]. Research in this field has its origin in the topological approach to spatial logics, dating back to the work of Alfred Tarski (see [9] for a thorough introduction). More recently, these early theoretical foundations have been extended to encompass reasoning about discrete spatial structures, such as graphs and images, extending the theoretical framework of topology to (quasi-discrete) closure spaces (see for instance [1, 23, 24]). That framework has subsequently been taken further in recent work by Ciancia et al. [13, 14, 17], resulting in the definition of the Spatial Logic for Closure Spaces (SLCS), temporal extensions (see [12, 32, 36]), and related model checking algorithms and tools.

The main idea of spatial (and spatio-temporal) model checking is to use specifications written in a suitable logical language to describe spatial properties and to automatically identify patterns and structures of interest in a variety of domains (see e.g., [5, 16, 18]). In this paper we focus on one such domain, namely medical imaging for radiotherapy, and brain tumour segmentation in particular, which is an important and currently very active research domain in its own right. One of the technical challenges in the development of automated (brain) tumour segmentation is that lesion areas are only defined through differences in intensity (luminosity) in the (black and white) images, relative to the intensity of the surrounding normal tissue. A further complication is that even (laborious and time-consuming) manual segmentation by experts shows significant variation when intensity gradients between adjacent tissue structures are smooth or partially obscured [31]. Moreover, there is considerable variation across images from different patients and images obtained with different Magnetic Resonance Imaging (MRI) scanners. Several automatic and semi-automatic methods have been proposed in this very active research area (see e.g., [20,21,22, 29, 34, 37]).

This paper continues the research line of [3, 7, 8], introducing the free and open source tool VoxLogicA (Voxel-based Logical Analyser)Footnote 1, which caters for a novel approach to image segmentation, namely a rapid-development, declarative, logic-based method supported by spatial model checking. This approach is particularly suitable for reasoning at the “macro-level”, by exploiting the relative spatial relations between tissues or organs at risk. VoxLogicA is similar to the spatio-temporal model checker topocheckerFootnote 2 in the logical language it accepts and in functionality, but it is specifically designed for the analysis of (possibly multi-dimensional, e.g. 3D) digital images as a specialised image analysis tool. It is tailored to usability and efficiency by employing state-of-the-art algorithms and open source libraries, borrowed from computational image processing, in combination with efficient spatial model checking algorithms.

We show the application of VoxLogicA on BraTS 2017Footnote 3 [2, 31, 35], a publicly available set of benchmark MRI images for brain tumour segmentation, linked to a yearly challenge. For each image, a manual segmentation of the tumour by domain experts is available, enabling rigorous and objective quantitative comparisons via established similarity indexes. We propose a simple, yet effective, high-level specification for glioblastoma segmentation. The procedure, partly derived from the one presented in [3], directly competes in accuracy with the state-of-the-art techniques submitted to the BraTS 2017 challenge, most of which are based on machine learning. Our approach to segmentation has the unique advantage of explainability, and is easy to replicate; in fact, the structure of a logically specified procedure can be explained to domain experts, and improved to encompass new observations. A mathematically formalised, unambiguous semantics permits results to be replicated not only by executing them in the multi-platform, open source tool that has been provided, but also by computing them via different implementations.

2 The Spatial Logic Framework

In this section, we briefly recall the logical language ImgQL (Image Query Language) proposed in [3], which is based on the Spatial Logic for Closure Spaces SLCS [13, 14] and which forms the kernel of the framework we propose in the present paper. In Sect. 4 we will see how the resulting logic can be used for actual analysis via spatial model checking.

2.1 Foundations: Spatial Logics for Closure Spaces

The logic for closure spaces we use in the present paper is closely related to SLCS  [13, 14] and, in particular, to the SLCS extension with distance-based operators presented in [3]. As in [3], the resulting logic constitutes the kernel of a solid logical framework for reasoning about texture features of digital images, when interpreted as closure spaces. In the context of our work, a digital image is not only a 2-dimensional grid of pixels, but, more generally, a multi-dimensional (very often, 3-dimensional) grid of hyper-rectangular elements that are called voxels (“volumetric picture elements”). When voxels are not hypercubes, images are said to be anisotropic; this is usually the case in medical imaging. Furthermore, a digital image may contain information about its “real world” spatial dimensions, position (origin) and rotation, permitting one to compute the real-world coordinates of the centre and edges of each voxel. In medical imaging, such information is typically encapsulated into data by machines such as MRI scanners. In the remainder of the paper, we make no dimensionality assumptions. From now on, we refer to picture elements either as voxels or simply as points.

Definition 1

A closure space is a pair \((X,\mathcal {C})\) where X is a non-empty set (of points) and \(\mathcal {C}: 2^X \rightarrow 2^X\) is a function satisfying the following axioms: \(\mathcal {C}(\emptyset )=\emptyset \); \(Y \subseteq \mathcal {C}(Y)\) for all \(Y \subseteq X\); \(\mathcal {C}(Y_1 \cup Y_2) = \mathcal {C}(Y_1) \cup \mathcal {C}(Y_2)\) for all \(Y_1,Y_2\subseteq X\). \(\bullet \)

Given any relation \(R\subseteq X \times X\), function \(\mathcal {C}_{R}:2^X \rightarrow 2^X\) with \(\mathcal {C}_{R}(Y) \triangleq Y\cup \{x \mid \exists y \in Y. y \, R\,x\}\) satisfies the axioms of Definition 1, thus making \((X,\mathcal {C}_{R})\) a closure space. Whenever a closure space is generated by a relation as above, it is called a quasi-discrete closure space. A quasi-discrete closure space \((X,\mathcal {C}_{R})\) can be used as the basis for a mathematical model of a digital image. X represents the finite set of voxels and \(R\) is the reflexive and symmetric adjacency relation between voxels [25]. A closure space \((X,\mathcal {C})\) can be enriched with a notion of distance, i.e. a function \(d: X \times X \rightarrow \mathbb {R}_{\ge 0} \cup \{\infty \}\) such that \(d(x,y)=0\) iff \(x=y\), leading to the distance closure space \(((X,\mathcal {C}),d)\).Footnote 4
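To make the construction concrete, the following Python sketch (ours, not part of VoxLogicA; function names are our choice) implements the closure operator \(\mathcal {C}_{R}\) generated by the reflexive, symmetric 4-adjacency relation of a 2D pixel grid:

```python
# Python sketch (illustrative only): the closure operator
# C_R(Y) = Y ∪ {x | ∃y ∈ Y. y R x} generated by a relation R,
# here the reflexive, symmetric 4-adjacency of an unbounded 2D pixel grid.
def closure(Y, neighbours):
    result = set(Y)
    for y in Y:
        result.update(neighbours(y))   # add every x such that y R x
    return result

def grid_neighbours(p):
    x, y = p
    # reflexive (contains p itself) and symmetric by construction
    return {(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

# The axioms of Definition 1 can be checked on examples:
print(closure(set(), grid_neighbours))             # C(∅) = ∅
print(closure({(0, 0), (1, 0)}, grid_neighbours))  # a superset of Y itself
```

Since `closure` is computed pointwise, it also distributes over unions, as required by the third axiom of Definition 1.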

It is sometimes convenient to equip the points of a closure space with attributes; for instance, in the case of images, such attributes could be the color or intensity of voxels. We assume sets A and V of attribute names and values, and an attribute valuation function \(\mathcal{A}\) such that \(\mathcal{A}(x,a) \in V\) is the value of attribute a of point x. Attributes can be used in assertions \(\alpha \), i.e. boolean expressions, with standard syntax and semantics. Consequently, we abstract from related details here and assume function \(\mathcal{A}\) extended in the obvious way; for instance, \(\mathcal{A}(x,a \le c) = \mathcal{A}(x,a) \le c\), for appropriate constant c.

A (quasi-discrete) path \(\pi \) in \((X,\mathcal {C}_{R})\) is a function \(\pi : \mathbb {N}\rightarrow X\), such that for all \(Y \subseteq \mathbb {N}\), \(\pi (\mathcal {C}_{Succ}(Y)) \subseteq \mathcal {C}_{R}(\pi (Y))\), where \((\mathbb {N}, \mathcal {C}_{Succ})\) is the closure space of natural numbers with the successor relation: \((n,m) \in Succ \Leftrightarrow m=n+1\). Intuitively: the ordering in the path imposed by \(\mathbb {N}\) is compatible with relation \(R\), i.e. \(\pi (i) \, R\, \pi (i+1)\). For a given set \(P\) of atomic predicates p and an interval I of \(\mathbb {R}\), the syntax of the logic we use in this paper is given below:

\(\varPhi \; {:}{:}{=}\; p \mid \lnot \, \varPhi \mid \varPhi _1 \wedge \varPhi _2 \mid \mathcal {N}\, \varPhi \mid \rho \; \varPhi _1[\varPhi _2] \mid \mathcal {D}^{I}\, \varPhi \)    (1)

We assume that space is modelled by the set of points of a distance closure space; each atomic predicate \(p \in P\) models a specific feature of points and is thus associated with the points that have this featureFootnote 5. A point x satisfies \(\mathcal {N}\, \varPhi \) if a point satisfying \(\varPhi \) can be reached from x in at most one (closure) step, i.e. if x is near (or close) to a point satisfying \(\varPhi \); x satisfies \(\rho \;\varPhi _1[\varPhi _2]\) if x may reach a point satisfying \(\varPhi _1\) via a path passing only by points satisfying \(\varPhi _2\); it satisfies \(\mathcal {D}^{I} \, \varPhi \) if its distance from the set of points satisfying \(\varPhi \) falls in interval I. The logic includes logical negation (\(\lnot \)) and conjunction (\(\wedge \)). In the following we formalise the semantics of the logic. A distance closure model \(\mathcal {M}\) is a tuple \(\mathcal {M}=(((X,\mathcal {C}),d), \mathcal{A}, \mathcal{V})\), where \(((X,\mathcal {C}),d)\) is a distance closure space, \(\mathcal{A}: X \times A\rightarrow V\) an attribute valuation, and \(\mathcal{V}: P\rightarrow 2^X\) is a valuation of atomic propositions.

Definition 2

Satisfaction of a formula \(\varPhi \) at point \(x \in X\) in model \(\mathcal {M}= (((X,\mathcal {C}),d), \mathcal{A}, \mathcal{V})\) is defined by induction on the structure of formulas:

\(\mathcal {M},x \models p \Leftrightarrow x \in \mathcal{V}(p)\)
\(\mathcal {M},x \models \lnot \, \varPhi \Leftrightarrow \mathcal {M},x \models \varPhi \) does not hold
\(\mathcal {M},x \models \varPhi _1 \wedge \varPhi _2 \Leftrightarrow \mathcal {M},x \models \varPhi _1\) and \(\mathcal {M},x \models \varPhi _2\)
\(\mathcal {M},x \models \mathcal {N}\, \varPhi \Leftrightarrow x \in \mathcal {C}(\{y \mid \mathcal {M},y \models \varPhi \})\)
\(\mathcal {M},x \models \rho \;\varPhi _1[\varPhi _2] \Leftrightarrow \) there are a path \(\pi \) and an index \(\ell \) such that \(\pi (0)=x\), \(\mathcal {M},\pi (\ell ) \models \varPhi _1\), and \(\mathcal {M},\pi (j) \models \varPhi _2\) for all j with \(0< j < \ell \)
\(\mathcal {M},x \models \mathcal {D}^{I}\, \varPhi \Leftrightarrow d(x, \{y \mid \mathcal {M},y \models \varPhi \}) \in I\)

where, when \(p:=\alpha \) is a definition for p, we let \(x\in \mathcal{V}(p)\) iff \(\mathcal{A}(x,\alpha )\) is true. \(\bullet \)

In the logic proposed in [13, 14], the “may reach” operator is not present, and the surrounded operator \(\mathcal {S}\) was defined as a basic operator as follows: x satisfies \(\varPhi _1 \, \mathcal {S}\, \varPhi _2\) if and only if x belongs to an area of points satisfying \(\varPhi _1\) and one cannot “escape” from such an area without hitting a point satisfying \(\varPhi _2\). Several types of reachability predicates can be derived from \(\mathcal {S}\). However, reachability is in turn a widespread, more basic primitive, implemented in various forms (e.g., flooding, connected components) in programming libraries. Thus, in this work, we prefer to use reachability as a basic predicate of the logic, as in [4], which is dedicated to extending the Spatial Signal Temporal Logic of [32]. In the sequel we show that \(\mathcal {S}\) can be derived from the operators defined above, employing a definition patterned after the model-checking algorithm of [13]. This change simplifies the definition of several derived connectives, including that of \( touch \) (see below), and resulted in notably faster execution times for analyses using such derived connectives. We recall the definition of \(\mathcal {S}\) from [14]: \(\mathcal {M},x \models \varPhi _1 \, \mathcal {S}\, \varPhi _2\) if and only if \(\mathcal {M},x \models \varPhi _1\) and for all paths \(\pi \) and indexes \(\ell \) we have: if \(\pi (0)=x\) and \(\mathcal {M},\pi (\ell ) \not \models \varPhi _1\), then there is j such that \(0 < j \le \ell \) and \(\mathcal {M},\pi (j) \models \varPhi _2\).

Proposition 1

For all closure models \(\mathcal {M}=((X,\mathcal {C}), \mathcal{A}, \mathcal{V})\) and all formulas \(\varPhi _1\), \(\varPhi _2\) the following holds: \( \varPhi _1 \, \mathcal {S}\, \varPhi _2 \equiv \varPhi _1 \, \wedge \lnot (\rho \;\lnot (\varPhi _1 \vee \, \varPhi _2)[\lnot \varPhi _2]) \) \(\diamond \)
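Proposition 1 can be illustrated with a toy Python model (helper names are our choosing; this is not the tool's implementation): `may_reach` computes \(\rho \;\varPhi _1[\varPhi _2]\) by flooding the \(\varPhi _1\)-points through \(\varPhi _2\)-points, and `surrounded` is then derived exactly as in the proposition:

```python
# Toy illustration of Proposition 1. Sets of points stand for the
# satisfaction sets of formulas; adj(x) is the set of points adjacent to x.
def may_reach(phi1, phi2, adj, points):
    """x satisfies rho phi1[phi2]: some phi1-point is reachable from x
    via a path whose intermediate points all satisfy phi2."""
    visited = set(phi1)
    frontier = list(phi1)
    while frontier:
        y = frontier.pop()
        for x in adj(y) & phi2:
            if x not in visited:
                visited.add(x)
                frontier.append(x)
    # the starting point itself is unconstrained, so one adjacency step suffices
    return {x for x in points if x in visited or adj(x) & visited}

def surrounded(phi1, phi2, adj, points):
    """phi1 S phi2  =  phi1 and not( rho not(phi1 or phi2) [not phi2] )."""
    escape = may_reach(points - (phi1 | phi2), points - phi2, adj, points)
    return phi1 - escape

# 1D "image" 0..6: point 3 is fenced in by {2, 4}, hence surrounded.
line = set(range(7))
adj = lambda x: {x - 1, x + 1} & line
print(surrounded({3}, {2, 4}, adj, line))  # {3}
print(surrounded({3}, {2}, adj, line))     # set(): one can escape to the right
```

The flooding in `may_reach` mirrors the role played by flood-fill and connected-component primitives in the actual implementation (Sect. 3.2).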

Definition 3

We define some derived operators that are of particular use in medical image analysis: \( touch (\varPhi _1, \varPhi _2) \triangleq \varPhi _1 \wedge \rho \;\varPhi _2[\varPhi _1] ; grow (\varPhi _1 , \varPhi _2) \triangleq \varPhi _1 \vee touch ( \varPhi _2 , \varPhi _1) ; flt (r, \varPhi _1) \triangleq \mathcal {D}^{< r} (\mathcal {D}^{\ge r} \lnot \varPhi _1) \) \(\bullet \)

The formula \( touch (\varPhi _1, \varPhi _2)\) is satisfied by points that satisfy \(\varPhi _1\) and that are on a path of points satisfying \(\varPhi _1\) that reaches a point satisfying \(\varPhi _2\). The formula \( grow (\varPhi _1, \varPhi _2)\) is satisfied by points that satisfy \(\varPhi _1\) and by points that satisfy \(\varPhi _2\) and are on a path of points satisfying \(\varPhi _2\) that reaches a point satisfying \(\varPhi _1\). The formula \( flt (r,\varPhi _1)\) is satisfied by points that are at a distance of less than r from a point that is at least at distance r from points that do not satisfy \(\varPhi _1\). This operator works as a filter: only contiguous areas satisfying \(\varPhi _1\) that have a minimal diameter of at least 2r are preserved; these are also smoothed if they have an irregular shape (e.g. protrusions of less than the indicated distance).
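The filtering behaviour of \( flt \) is easy to see on a small 1D example. The following Python sketch (ours) uses the plain point-to-set distance; in VoxLogicA the distance operators are instead computed via a distance transform (Sect. 3.2):

```python
# 1D sketch of flt(r, phi) = D^{<r}( D^{>=r} not-phi ).
def dist(x, Y):
    # point-to-set distance; infinity for the empty set
    return min((abs(x - y) for y in Y), default=float("inf"))

def flt(r, phi, points):
    eroded = {x for x in points if dist(x, points - phi) >= r}  # D^{>=r} ¬phi
    return {x for x in points if dist(x, eroded) < r}           # D^{<r} of that

points = set(range(20))
phi = {1, 2} | set(range(5, 13))       # a 2-wide blob and an 8-wide blob
print(sorted(flt(2, phi, points)))     # only the wide blob survives
```

With r = 2, the blob {1, 2} (diameter below 2r) is erased, while the blob {5, …, 12} is recovered in full.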

Example 1

In Fig. 1, the top row shows four pictures using colours blue and red, interpreted as atomic propositions. Each picture in the bottom row shows in white the points that satisfy a given formula. In particular: Fig. 1e is \(blue\, \mathcal {S}\, red\) of (a); Fig. 1f is \(\text{ touch }(red,blue)\) of (b); Fig. 1g is \(\text{ grow }(red, blue)\) of (c); Fig. 1h is \(red \, \mathcal {S}\; ( \mathcal {D}^{\le 11} blue)\) of (d). For more details the reader is referred to [6].

Fig. 1.

Some examples of ImgQL operators (see Example 1). (Color figure online)

2.2 Region Similarity via Statistical Cross-correlation

In the sequel, we provide some details on a logical operator, first defined in [3], that we use in the context of Texture Analysis (see for example [10, 19, 28, 30]) for defining a notion of statistical similarity between image regions. The statistical distribution of an area Y of a black and white image is approximated by the histogram of the grey levels of points (voxels) belonging to Y, limiting the representation to those levels lying in a certain interval [m, M], the latter being split into k bins. In the case of images modelled as closure models, where each point may have several attributes, the histogram can be defined for different attributes. Given a closure model \(\mathcal {M}= ((X,\mathcal {C}), \mathcal{A}, \mathcal{V})\), define function \(\mathcal{H}: A \times 2^X \times \mathbb {R}\times \mathbb {R}\times \mathbb {N}\rightarrow (\mathbb {N}\rightarrow \mathbb {N})\) such that for all \(m<M\), \(k>0\) and \(i\in \{1,\ldots ,k\}\), \( \mathcal{H}(a,Y,m,M,k)(i) = \Big |\{y\in Y \mid (i-1) \cdot \varDelta \le \mathcal{A}(y,a) - m < i \cdot \varDelta \}\Big | \) where \(\varDelta =\frac{M-m}{k}\). We call \(\mathcal{H}(a,Y,m,M,k)\) the histogram of Y (for attribute a), with k bins and minimum and maximum values m and M, respectively. The mean \(\overline{h}\) of a histogram h with k bins is the quantity \(\frac{1}{k}\sum _{i=1}^k h(i)\). The cross correlation between two histograms \(h_1, h_2\) with the same number k of bins is defined as follows: \( \mathbf {r}(h_1,h_2) = \frac{\sum _{i=1}^k\left( h_1(i) - \overline{h_1} \right) \left( h_2(i) - \overline{h_2} \right) }{ \sqrt{ \sum _{i=1}^k \left( h_1(i) - \overline{h_1} \right) ^2 } \sqrt{ \sum _{i=1}^k \left( h_2(i) - \overline{h_2} \right) ^2 } } \). 
The value of \(\mathbf {r}\) is normalised so that \(-1\le \mathbf {r}\le 1\); \(\mathbf {r}(h_1,h_2)=1\) indicates that \(h_1\) and \(h_2\) are perfectly correlated (that is, \(h_1 = ah_2+b\), with \(a>0\)); \(\mathbf {r}(h_1,h_2) =-1\) indicates perfect anti-correlation (that is, \(h_1=ah_2+b\), with \(a<0\)). On the other hand, \(\mathbf {r}(h_1,h_2) = 0\) indicates no correlation.
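Both definitions translate directly into code. The following Python sketch (ours; bins are 0-based here, whereas they are 1-based in the text) computes \(\mathcal{H}\) and \(\mathbf {r}\):

```python
# Sketch of the histogram H(a, Y, m, M, k) and cross-correlation r(h1, h2).
from math import sqrt

def histogram(values, m, M, k):
    delta = (M - m) / k
    h = [0] * k
    for v in values:
        i = int((v - m) // delta)   # bin with i*delta <= v - m < (i+1)*delta
        if 0 <= i < k:              # values outside [m, M) are discarded
            h[i] += 1
    return h

def cross_correlation(h1, h2):
    k = len(h1)
    m1, m2 = sum(h1) / k, sum(h2) / k
    num = sum((h1[i] - m1) * (h2[i] - m2) for i in range(k))
    den = sqrt(sum((v - m1) ** 2 for v in h1)) * \
          sqrt(sum((v - m2) ** 2 for v in h2))
    return num / den                # in [-1, 1]; undefined for flat histograms

h = histogram([1, 2, 6, 7, 8], 0, 10, 5)
print(h)                            # [1, 1, 0, 2, 1]
print(cross_correlation(h, h))      # ≈ 1.0: perfectly correlated with itself
```

As the last comment notes, \(\mathbf {r}\) is undefined when one of the histograms has zero variance; an implementation has to treat that case separately.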

We embed statistical similarity \(\triangle \!\!\!{\scriptstyle \triangle }_{\bowtie c} {\tiny \left[ \begin{array}{ccc} m & M & k\\ r & a & b \end{array} \right] }\) in the logic by adding it to the grammar defined by (1) and extending the definition of the satisfaction relation (Definition 2) with the following equation, for m, M, k as above:

\(\mathcal {M},x \models \triangle \!\!\!{\scriptstyle \triangle }_{\bowtie c} {\tiny \left[ \begin{array}{ccc} m & M & k\\ r & a & b \end{array} \right] } \varPhi \Leftrightarrow \mathbf {r}(h_a,h_b) \bowtie c\)

where \(h_a=\mathcal{H}(a,S(x,r),m,M,k)\), \(h_b=\mathcal{H}(b,\{y \in X \mid \mathcal {M},y \models \varPhi \},m,M,k)\), c is a constant in \([-1,1]\), \(\bowtie \, \in \{<, \le , =, \ge , >\}\) and \(S(x,r)=\{y\in X \mid d(x,y) \le r\}\) is the sphere of radius r centred in x. Note that, unlike in topochecker, which was used in [3], in VoxLogicA, for efficiency reasons, S(x, r) is actually the hypercube with edge size 2r, which, for anisotropic images, becomes a hyperrectangle. So \(\triangle \!\!\!{\scriptstyle \triangle }_{\bowtie c} {\tiny \left[ \begin{array}{ccc} m & M & k\\ r & a & b \end{array} \right] } \varPhi \) compares the region of the image constituted by the sphere (hypercube) of radius r centred in x against the region characterised by \(\varPhi \). The comparison is based on the cross correlation of the histograms of the chosen attributes of (the points of) the two regions, namely a and b; both histograms share the same range ([m, M]) and the same number k of bins. In summary, the operator allows one to check to which extent the sphere (hypercube) around the point of interest is statistically similar to a given region (specified by) \(\varPhi \).

3 The Tool VoxLogicA

VoxLogicA is a framework for image analysis that embeds the logic ImgQL into a user-oriented expression language to manipulate images. More precisely, the VoxLogicA type system distinguishes between boolean-valued images, which can be arguments or results of the application of ImgQL operators, and number-valued images, resulting from imaging primitives. Underlying this expression language is a global model checker, that is, the set of points satisfying a logic formula is computed at once; this is done implicitly when an expression corresponding to a logic formula is saved to an image. Functionality-wise, VoxLogicA specialises topochecker to the case of spatial analysis of multi-dimensional images. It interprets a specification written in the ImgQL language, using a set of multi-dimensional imagesFootnote 6 as models of the spatial logic, and produces as output a set of multi-dimensional images representing the valuation of user-specified expressions. For logical operators, such images are Boolean-valued, that is, regions of interest in medical imaging terminology, which may be loaded as overlays in medical image viewers. Non-logical operators may generate number-valued images. VoxLogicA augments ImgQL with file loading and saving primitives, and a set of additional commodity operators, specifically aimed at image analysis, that is destined to grow along with future developments of the tool. The main execution modality of VoxLogicA is batch execution. A (currently experimental) graphical user interface is under development.

Implementation-wise, the tool achieves a two-orders-of-magnitude speedup with respect to topochecker. This speedup has permitted the rapid development of a novel procedure for automatic segmentation of glioblastoma that, besides being competitive with the state-of-the-art in the field (see Sect. 4), is also easily replicable and explainable to humans, and therefore amenable to improvement by the community of medical imaging practitioners.

3.1 Functionality

We provide an overview of the tool functionality, starting from its syntax. For space reasons, we omit details on parsing rules (delegated to the tool documentation). In the following, f, x1,..., xN, x are identifiers, "s" is a string, and e1, ..., eN, e are expressions (to be detailed later). A VoxLogicA specification consists of a text file containing a sequence of commands (see Specification 1 in Sect. 4 as an example). Five commands are currently implemented:

  • let f(x1,...,xN) = e is used for function declaration, also in the form let f = e (constant declaration), and with special syntactic provisions to define infix operators. After execution of the command, name f is bound to a function or constant that evaluates to e with the appropriate substitutions of parameters;

  • load x = "s" loads an image from file "s" and binds it to x for subsequent usage;

  • save "s" e stores the image resulting from evaluation of expression e to file "s";

  • print "s" e prints to the log the string s followed by the numeric, or boolean, result of computing e;

  • import "s" imports a library of declarations from file "s"; subsequent import declarations for the same file are not processed; furthermore, such imported files can only contain let or import commands.

VoxLogicA comes equipped with a set of built-in functions, such as arithmetic operators, logic primitives as described in Sect. 2, and imaging operators, for instance for computing the gray-scale intensity of a colour image, or its colour components, or the percentiles of its values (see Sect. 4.1). An exhaustive list of the available built-ins is provided in the user manualFootnote 7. Furthermore, a “standard library” is provided containing short-hands for commonly used functions, and for derived operators. An expression may be a numeric literal (no distinction is made between floating point and integer constants), an identifier (e.g. x), a function application (e.g. f(x1,x2)), an infix operator application (e.g. x1 + x2), or a parenthesized (sub-)expression (e.g. (x1 + x2)).

The language features strong dynamic typing, that is, types of expressions are unambiguously checked and errors are precisely reported, but such checks are only performed at “run time”, that is, when evaluating closed-form expressions with no free variables. The type system has currently been kept lightweight (the only typing rules regard constants and function application), in order to leave the design space open to future improvements. For instance, a planned development is function and operator overloading, as well as some form of static typing not interfering with the usability of the tool.

A type error, however, cannot waste a long-running analysis: type checking occurs after loading and parsing, but before the analysis is run. Actual program execution after parsing is divided into two phases. First (usually, in a negligible amount of time), all the “save” and “print” instructions are examined to determine which expressions actually need to be computed; in this phase, name binding is resolved, all constant and function applications are substituted with closed expressions, types are checked, and the environment binding expressions to names is discarded. The set of closed expressions to be evaluated is then transformed into a set of tasks to be executed, possibly in parallel, together with the dependencies among them. After this phase, no further syntax processing or name resolution is needed, and it is guaranteed that the program is free from type errors. The second phase simply runs each task – in an order compliant with dependencies – parallelising execution on multiple CPU cores.

Each built-in logical operator has an associated type of its input parameters and output result. The available types are inductively defined as Number, Bool, String, Model, and Valuation(t), where t is in turn a type. The type Model is the type assigned to x in load x = "f"; operations such as the extraction of RGB components take this type as input, and return as output the only parametric type: Valuation(t), which is the type of a multi-dimensional image in which each voxel contains a value of type t. For instance, the red component of a loaded model has type Valuation(Number), whereas the result of evaluating a logic formula has type Valuation(Bool)Footnote 8.

An important aspect of the execution semantics of VoxLogicA specifications is memoization, constituting the core of its execution engine, and used to achieve maximal sharing of subformulas. In VoxLogicA, no expression is ever computed twice, freeing the user from worrying about how many times a given function is called, and making execution of complex macros and logical operators feasible.

3.2 Implementation Details

VoxLogicA is implemented in the functional, object-oriented programming language FSharp, using the .NET Core implementation of the .NET specificationFootnote 9. This permits a single code base with minimal environment-dependent setup to be cross-compiled and deployed as a standalone executable for the major desktop operating systems, namely Linux, macOS, and Windows. Although .NET code is compiled for an intermediate machine, this does not mean that the efficiency of VoxLogicA is somehow “non-native”. There are quite a number of measures in place to maximise efficiency. First and foremost, the execution time is heavily dominated by the time spent in native libraries (more details below), and VoxLogicA acts as a higher-level, declarative front-end for such libraries, adding a logical language, memoization, parallel execution, and abstraction from a plethora of technical details that a state-of-the-art imaging library necessarily exposes. In our experiments, parsing, memoization, and preparation of the tasks to be run may take a fraction of a second; the rest of the execution time (usually, several seconds, unless the analysis is extremely simple) is spent in foreign function calls. The major performance boosters in VoxLogicA are: a state-of-the-art computational imaging library (ITK); the optimised implementation of the may reach operator; a new algorithm for statistical cross-correlation; an efficient memoizing execution engine; and parallel evaluation of independent tasks, exploiting modern multi-core CPUs. Moreover, special care has been put into making all performance-critical loops allocationless: all memory used along the loops is pre-allocated, avoiding the risk of triggering garbage collection during computation. We will address each of these briefly in the following.

ITK Library. VoxLogicA uses the state-of-the-art imaging library ITK, via the SimpleITK glue libraryFootnote 10. Most of the operators of VoxLogicA are implemented directly by a library call. Notably, this includes the Maurer distance transform, used to efficiently implement the distance operators of ImgQL.

Novel Algorithms. The two most relevant operators that do not have a direct implementation in ITK are mayReach and crossCorrelation, implementing, respectively, the logical operator \(\rho \) and the statistical comparison described in Sect. 2.2. The computation of the voxels satisfying \(\rho \;\phi _1[\phi _2]\) can be implemented either using the (classical, in computer graphics) flood-fill primitive, or by exploiting the connected components of \(\phi _2\) as a reachability primitive; both solutions are available in SimpleITK. In our experiments, connected components perform better when using this library from FSharp, for large input seeds. Several critical logical connectives (e.g. surrounded and touch) are defined in terms of mayReach; therefore, an optimised algorithm for mayReach is a key performance improvement. The crossCorrelation operation is resource-intensive, as it computes the histogram of a multi-dimensional hyperrectangle at each voxel. Pre-computation methods, such as the integral histogram [33], would not yield the expected benefits, because cross-correlation is called only a few times on the same image. In this work, we designed a parallel algorithm exploiting the additivity of histograms. Given two sets of values \(P_1\), \(P_2\), let \(h_1\), \(h_2\) be their respective histograms, and let \(h'_1\), \(h'_2\) be the histograms of \(P_1{\setminus }P_2\) and \(P_2{\setminus }P_1\). For i a bin, we have \(h_2(i) = h_1(i) - h'_1(i) + h'_2(i)\). This property leads to a particularly efficient algorithm when \(P_1\) and \(P_2\) are two hyperrectangles centred over adjacent voxels, as \(P_1{\setminus }P_2\) and \(P_2{\setminus }P_1\) are hyperfaces, having one dimension less than hyperrectangles. Our algorithm divides the image into as many partitions as the number of available processors, and then computes a Hamiltonian path for each partition, passing by each of its voxels exactly once. 
All partitions are visited in parallel, in the order imposed by such Hamiltonian paths; the histogram is computed incrementally as described above; finally cross-correlation is also computed and stored in the resulting image. The asymptotic algorithmic complexity of the implementation of ImgQL primitives in VoxLogicA is linear in the number of voxels, with the exception of crossCorrelation, which, by the above explanation, has complexity \(O(k\cdot n)\), where n is the number of voxels, and k is the size of the largest hyperface of the considered hypercube.
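The incremental update \(h_2(i) = h_1(i) - h'_1(i) + h'_2(i)\) is easiest to see in one dimension, where the “hyperfaces” degenerate to the single values entering and leaving a sliding window. The following Python sketch (ours, not the tool's code) computes the histogram of every window of width w while touching only two bins per step:

```python
# 1D sketch of the incremental histogram update: when the window slides by
# one position, only the leaving and entering values touch the histogram.
def bin_of(v, m, M, k):
    return min(int((v - m) * k / (M - m)), k - 1)

def sliding_histograms(values, w, m, M, k):
    h = [0] * k
    for v in values[:w]:                         # full histogram only once
        h[bin_of(v, m, M, k)] += 1
    out = [list(h)]
    for j in range(w, len(values)):
        h[bin_of(values[j - w], m, M, k)] -= 1   # value leaving the window
        h[bin_of(values[j], m, M, k)] += 1       # value entering the window
        out.append(list(h))
    return out

print(sliding_histograms([1, 3, 5, 7, 9], 3, 0, 10, 5))
```

In d dimensions the same reasoning applies to hyperfaces, yielding the \(O(k\cdot n)\) bound discussed below, with k the size of a hyperface rather than of the whole hyperrectangle.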

Memoizing Execution Semantics. Sub-expressions in VoxLogicA are by construction identified up to syntactic equality and assigned a number, representing a unique identifier (UID). UIDs start from 0 and are contiguous, therefore admitting an array of all existing sub-formulas, in which pre-computed valuations of expressions can be stored and retrieved without further hashing.

3.3 Design and Data Structures

The design of VoxLogicA defines three implementation layers. The core execution engine implements the concurrent, memoizing semantics of the tool. The interpreter is responsible for translating source code into core library invocations. These two layers only include some basic arithmetic and boolean primitives. Operators can be added by inheriting from the abstract base class Model. The third implementation layer is the instantiation of the core layer to define the operators of ImgQL, and the loading and saving of graphical models, using the ITK library. We provide some more detail on the design of the core layer, which is the most critical part of VoxLogicA. At the time of writing, the core consists of just 350 lines of FSharp code, which have been carefully engineered not only for performance, but also for ease of maintenance and future extensions.

The essential classes are ModelChecker, FormulaFactory, Formula, and Operator, of which Constant is a subclass. Class Operator describes the available operators and their evaluation method. Class Formula is a symbolic representation of a syntactic sub-expression. Each instance of Formula has a unique numeric id (UID), an instance of Operator, and (inductively) a list of Formula instances, denoting its arguments. The UID of a formula is determined by the operator name (which is unique across the application) and the list of parameter UIDs. Therefore, by construction, it is not possible to build two different instances of Formula that are syntactically equal. UIDs are contiguous and start from 0; hence, all created formulas can be stored in an array. Furthermore, UIDs are allocated in such a way that the natural number order is a topological sort of the dependency graph between subformulas (that is, if \(f_1\) is a parameter of \(f_2\), the UID of \(f_1\) is smaller than the UID of \(f_2\)). This is exploited in class ModelChecker; internally, the class uses an array to store the results of evaluating each Formula instance, implementing memoization. The class ModelChecker turns each formula into a task to be executed. Whenever a formula with UID i is a parameter of the formula with UID j, a dependency is noted between the associated tasks. The high-level, lightweight concurrent programming library HopacFootnote 11 and its abstractions are used to evaluate the resulting task graph, in order to maximise CPU usage on multi-core machines.
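The interplay of hash-consing, UIDs, and memoization can be reconstructed in a few lines. The following is a toy Python model (class names are borrowed from the text; all other details are hypothetical simplifications of the FSharp implementation):

```python
# Toy model of hash-consing with contiguous UIDs: syntactically equal
# sub-expressions yield the same Formula instance, and evaluation
# memoizes results in a plain array indexed by UID.
class Formula:
    def __init__(self, uid, operator, args):
        self.uid, self.operator, self.args = uid, operator, args

class FormulaFactory:
    def __init__(self):
        self._table = {}     # (operator, argument UIDs) -> Formula
        self.formulas = []   # index = UID

    def make(self, operator, *args):
        key = (operator, tuple(f.uid for f in args))
        if key not in self._table:
            f = Formula(len(self.formulas), operator, args)
            self._table[key] = f
            self.formulas.append(f)
        return self._table[key]

OPS = {"one": lambda: 1, "neg": lambda v: -v, "add": lambda a, b: a + b}

def evaluate(factory):
    # Arguments always have smaller UIDs than the formulas using them,
    # so visiting formulas in UID order is a valid topological order.
    results = [None] * len(factory.formulas)
    for f in factory.formulas:
        results[f.uid] = OPS[f.operator](*(results[a.uid] for a in f.args))
    return results

ff = FormulaFactory()
c = ff.make("one")
e1 = ff.make("add", c, ff.make("neg", c))
e2 = ff.make("add", c, ff.make("neg", c))   # maximal sharing: same instance
print(e1 is e2, evaluate(ff)[e1.uid])
```

In the real tool, each array slot is evaluated by a task scheduled through Hopac, so independent subformulas run in parallel rather than sequentially as here.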

4 Experimental Evaluation

The performance of VoxLogicA has been evaluated on the Brain Tumor Image Segmentation Benchmark (BraTS) of 2017 [2, 31], containing 210 multi-contrast MRI scans of high-grade glioma patients that have been obtained from multiple institutions and were acquired with different clinical protocols and various scanners. All the imaging data sets provided by BraTS 2017 have been segmented manually and approved by experienced neuro-radiologists. In our evaluation we used the T2 Fluid Attenuated Inversion Recovery (FLAIR) type of scans, which is one of the four modalities provided in the benchmark. Use of other modalities is planned for future work. For training, the numeric parameters of the VoxLogicA specification presented in Sect. 4.1 were manually calibrated against a subset of 20 cases. Validation of the method was conducted as follows. A priori, 17 of the 210 cases can be excluded because the current procedure is not suitable for these images. This is because of the presence of multi-focal tumours (different tumours in different areas of the brain), or due to clearly distinguishable artifacts in the FLAIR acquisition, or because the hyperintense area is too large and clearly not significant (possibly due to incorrect acquisition). Such cases require further investigation. For instance, the current procedure may be improved to identify specific types of artefacts, whereas multi-modal analysis can be used to complement the information provided by the FLAIR image in cases where FLAIR hyperintensity is not informative enough. In Sect. 4.2, we present the results both for the full dataset (210 cases) and for the subset without these problematic cases (193 cases). We considered both the gross tumour volume (GTV), corresponding to what can actually be seen on an image, and the clinical target volume (CTV), which is an extension of the GTV. For glioblastomas this margin is a 2–2.5 cm isotropic expansion of the GTV within the brain.

4.1 ImgQL Segmentation Procedure

Specification 1 shows the tumour segmentation procedure that we used for the evaluation. The syntax is that of VoxLogicA, namely: |,&,! are boolean or, and, not; distlt(c,phi) is the set of points whose distance from the set of points satisfying phi is less than c (similarly, distgeq for distance greater than or equal to c; distances are in millimetres); crossCorrelation(r,a,b,phi,m,M,k) yields a cross-correlation coefficient for each voxel, to which a predicate c may be applied to obtain the statistical similarity function of Sect. 2.2; the > operator performs thresholding of an image; border is true on voxels that lie at the border of the image. Operator percentiles(img,mask), where img is a number-valued image and mask is boolean-valued, considers the points identified by mask and assigns to each such point x the fraction of those points that have an intensity below that of x in img. Other operators are explained in Definition 3 (see also Fig. 1). Figure 2 shows the intermediate phases of the procedure, for the axial view of one specific 2D slice of an example 3D MRI scan from the BraTS 2017 data set.
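One plausible reading of the percentiles operator described above is sketched below. This is an illustrative Python sketch only, not VoxLogicA's implementation: images are flattened to 1D lists, and whether ties count as "below" (strict comparison) is our assumption.

```python
# Illustrative sketch of percentiles(img, mask): each point selected by
# `mask` gets the fraction of selected points with intensity strictly
# below its own. 1D flattening and strict comparison are assumptions.
import bisect

def percentiles(img, mask):
    selected = sorted(v for v, m in zip(img, mask) if m)
    n = len(selected)
    # bisect_left counts the selected intensities strictly below v.
    # Points outside the mask are given the arbitrary value 0.0 here.
    return [
        bisect.bisect_left(selected, v) / n if m else 0.0
        for v, m in zip(img, mask)
    ]
```

For instance, `percentiles([5, 1, 3, 9, 7], [True]*5)` yields `[0.4, 0.0, 0.2, 0.8, 0.6]`: the point of intensity 5 has two of the five selected intensities below it. Because the result depends only on ranks, not on absolute intensities, thresholds on percentiles are insensitive to the intensity scale of a particular scanner.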

We briefly discuss the specification (see [6] for more details). Lines 1–8 merely define utility functions and load the image, calling it flair. Lines 9–10 define the background as all voxels with intensity less than 0.1 that are in an area touching the border of the image, and the brain as the complement of the background. The application of percentiles in line 11 assigns to each point of the brain the percentile rank of its intensity among those of the brain points. Based on these percentiles, hyper-intense and very-intense points are identified, satisfying hI and vI, respectively (lines 12–13). Hyper-intense points are very likely to belong to tumour tissue; very-intense points are likely to belong to the tumour as well, or to the oedema that usually surrounds the tumour. However, not all hyper-intense and very-intense points are part of a tumour; the idea is to identify the actual tumour using further spatial information. In lines 14–15 the hyper-intense and very-intense points are filtered, removing noise and retaining only areas of a relevant size. The points that satisfy hyperIntense and veryIntense are shown in red in Fig. 2a and Fig. 2b, respectively. In line 16 the areas of hyper-intense points are extended, via the grow operator, with those very-intense areas (possibly belonging to the oedema) that touch the hyper-intense ones. The points that satisfy growTum are shown in red in Fig. 2c. In line 17 the previously defined (line 8) similarity operator is used to assign to each voxel a texture-similarity score with respect to growTum. In line 18 this score is used to find those voxels that have a high cross-correlation coefficient and are thus likely part of the tumour. The result is shown in Fig. 2d. Finally (line 19), the voxels identified as part of the whole tumour are those of growTum, extended via the grow operator with the voxels that are statistically similar to it.
Points that satisfy tumFinal are shown in red in Fig. 2e and points identified by manual segmentation are shown for comparison in blue in the same figure (overlapping areas are purple).
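The intuition behind the grow operator used in lines 16 and 19, namely extending a seed set with the connected components of a candidate set that touch it, can be sketched as a flood fill over an adjacency relation. The precise SLCS semantics is given in Definition 3; the Python below is only an illustration on an abstract neighbourhood graph.

```python
# Illustrative flood-fill sketch of a grow-like operator: the result is the
# seed set plus every connected component of `candidate` that touches it.
# The function signature and graph encoding are assumptions for exposition.
from collections import deque

def grow(seed, candidate, neighbours):
    """seed, candidate: sets of points; neighbours: point -> iterable of points."""
    result = set(seed)
    frontier = deque(seed)
    while frontier:
        p = frontier.popleft()
        for q in neighbours(p):
            # Absorb adjacent candidate points; each point enters the
            # frontier at most once, so this terminates on finite graphs.
            if q in candidate and q not in result:
                result.add(q)
                frontier.append(q)
    return result
```

On a 1D line with seed {0} and candidate {1, 2, 4}, the result is {0, 1, 2}: the component {1, 2} touches the seed, whereas the isolated point 4 does not.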

figure b

Noteworthy aspects of the ImgQL specification are its relative simplicity and level of abstraction, which fits that of neuro-radiologists; its explainability; its time-efficient verification, which admits a rapid development cycle; and its independence of intensity-normalisation procedures, achieved by using percentiles rather than absolute intensity values for the voxels.

4.2 Validation Results

Results of tumour segmentation are evaluated based on a number of indexes commonly used to compare the quality of different techniques (see [31]). These indexes are based on the true positive (TP) voxels (those identified as part of a tumour by both the manual and the VoxLogicA segmentation), true negative (TN) voxels (those identified as part of a tumour by neither segmentation), false positive (FP) voxels (those identified as part of a tumour by VoxLogicA but not by manual segmentation) and false negative (FN) voxels (those identified as part of a tumour by manual segmentation but not by VoxLogicA). Based on these four counts the following indexes are defined: sensitivity: \(\text{ TP } / (\text{ TP } + \text{ FN })\); specificity: \(\text{ TN } / (\text{ TN } + \text{ FP })\); Dice: \(2*\text{ TP } / (2*\text{ TP } + \text{ FN } +\text{ FP })\). Sensitivity measures the fraction of tumour voxels that are correctly identified as such. Specificity measures the fraction of non-tumour voxels that are correctly identified as such. The Dice similarity coefficient provides a measure of the overall similarity of the two segmentations. Table 1 shows the mean values of the above indexes, for both the GTV and the CTV volumes, for Specification 1 applied to the BraTS 2017 training phase collection. The top-scoring methods of the BraTS 2017 Challenge [35] can be considered a good sample of the state of the art in this domain. Among those, in order to collect significant statistics, we selected the 18 techniques that were applied to at least 100 cases of the dataset. The median and range of values of the sensitivity, specificity and Dice indexes for the GTV segmentation of the whole tumour are, respectively, 0.88 (ranging from 0.55 to 0.97), 0.99 (0.98 to 0.999) and 0.88 (0.64 to 0.96). The 3D images used in this experiment have size \(240\times 240\times 155\) (about 9 million voxels).
The evaluation of each case study takes about 10 s on a desktop computer equipped with an Intel Core i7-7700 processor (with 8 cores) and 16 GB of RAM.
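The three indexes above follow directly from the four counts; for flat boolean masks they can be computed as in the following straightforward sketch (illustrative code, not the benchmark's evaluation scripts):

```python
# Sensitivity, specificity and Dice from two flat boolean masks
# (predicted vs. manual segmentation), as defined in the text.

def scores(pred, truth):
    tp = sum(p and t for p, t in zip(pred, truth))          # both tumour
    tn = sum(not p and not t for p, t in zip(pred, truth))  # both background
    fp = sum(p and not t for p, t in zip(pred, truth))      # predicted only
    fn = sum(not p and t for p, t in zip(pred, truth))      # manual only
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "dice": 2 * tp / (2 * tp + fn + fp),
    }
```

Note that the Dice coefficient ignores the true negatives: on brain images, where background voxels vastly outnumber tumour voxels, specificity alone would be uninformatively close to 1.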

Fig. 2.
figure 2

Tumour segmentation of image Brats17_2013_2_1, FLAIR, axial 2D slice at X = 155, Y = 117 and Z = 97. (a) hyperIntense (b) veryIntense (c) growTum (d) tumStatCC (e) tumFinal (red) and manual (blue, overlapping area is purple). (Color figure online)

Table 1. VoxLogicA evaluation on the BraTS 2017 benchmark.

4.3 Comparison with topochecker

The evaluation of VoxLogicA presented in this section uses features that are available in VoxLogicA but not in topochecker. On the other hand, the example specification in [3], and its variant aimed at 3D images, are quite similar to the one we presented, and can readily be used to compare the performance of VoxLogicA and topochecker. The specifications consist of two human-authored text files of about 30 lines each. They were run on a desktop computer equipped with an Intel Core i7-7700 processor (with 8 cores) and 16 GB of RAM. In the 2D case (image size: \(512 \times 512\)), topochecker took 52 s to complete the analysis, whereas VoxLogicA took 750 ms. In the 3D case (image size: \(512 \times 512 \times 24\)), topochecker took about 30 min, whereas VoxLogicA took 15 s. As mentioned before, this huge improvement is due to the combination of a specialised imaging library, new algorithms (e.g., for statistical similarity of regions), parallel execution and other optimisations. More details could be obtained by designing a specialised set of benchmarks, some of which could also be run using topochecker; however, for the purposes of the current paper, the performance difference is so large that we do not deem such a detailed comparison necessary.

5 Conclusions and Future Work

We presented VoxLogicA, a spatial model checker designed and optimised for the analysis of multi-dimensional digital images. The tool has been successfully evaluated on 193 cases of an international brain tumour 3D MRI segmentation benchmark. The results obtained are well-positioned with respect to state-of-the-art segmentation techniques, both efficiency-wise and accuracy-wise. Future research based on the tool will focus on further benchmarking (e.g., on other types of tumours and of tumour tissue, such as necrotic and non-enhancing parts) and on clinical application. On the development side, planned future work includes a graphical (web) interface for interactive parameter calibration (for that, execution times will need to be further improved, possibly employing GPU computing); improvements in the type system (e.g., operator overloading); and turning the core design layer into a reusable library available for other projects. Finally, the (currently small, albeit useful) library of logical and imaging-related primitives will be enhanced, based on input from case studies. Calibration of the numerical parameters of our glioblastoma segmentation was done manually; future work will explore different possibilities for human-computer interaction in designing such procedures (e.g., via ad-hoc graphical interfaces), to improve user-friendliness for domain experts. Combining machine-learning methods with the logic-based approach of VoxLogicA is also worth exploring in this respect.