1 Introduction

One of the most common surgical complications is inadvertent damage to blood vessels. Avoiding vascular structures is particularly challenging in minimally invasive surgery (MIS) and robotic MIS (RMIS), where the sense of touch is inhibited and cannot be used to detect pulsatile motion. Vessels can be detected using interventional imaging modalities such as fluorescence or ultrasound (US), but these do not always produce a sufficient signal or are difficult to use in practice [1]. Using video information directly is appealing because it is inherently available, but processing is required to reveal vessel information that is hidden within the video and not apparent to the surgeon, as can be seen in the right image of Fig. 1.

Fig. 1.

(Left): The vessel distension-displacement from the pulse wave, with its higher order derivatives and annotation of the corresponding cardio-physiological stages, downsampled to 30 data points to reflect the endoscope frame rate (1-D virtual model of arterial behaviour [2]). (Right): Endoscopic video image stack. The blue box surrounds an artery with no perceivable motion, as shown by the vertical white line in the cross section.

The cardiovascular system creates a pressure wave that propagates through the entire body and causes an equivalent distension-displacement profile in the arteries and veins [3]. This periodic motion has intricate characteristics, shown in Fig. 1 (left), that can be highlighted by differentiating the distension-displacement signal. The second order derivative outlines where the systolic uptake is located, whilst the third derivative highlights the end diastolic phase and the dicrotic notch. This information can be present as spatio-temporal variation between image frames and can be amplified using Eulerian video magnification (EVM). EVM could be applied to endoscopic video for vessel localisation by using an adaptation of an EVM algorithm and showing the output video directly to the surgeon [4]. Similarly, EVM can aid vessel segmentation for registration and overlay of pre-operative data [5], because the raw video produced by existing linear magnification can be abstract and noisy to use directly within a dynamic scene. Magnifying the underlying video motion can also exacerbate unwanted artifacts and unsought motions; in surgical video, these are motions that do not originate from the blood vessels but from respiration, endoscope motion or other physiological movement within the scene.

In this paper, we propose to utilise features that are apparent in the cardiac pulse wave, particularly the non-linear motion components that are emphasised by the third order of displacement, known as jerk (green plot in Fig. 1, left). We devise a custom temporal filter and use an existing technique, complex steerable pyramids, for spatial decomposition [6]. The result is a more coherent magnified video compared to existing lower order of motion approaches [7, 8]: high magnitudes of jerk are predominantly exclusive to the pulse wave in the surgical scene, so our method avoids amplifying residual motions due to respiration or other periodic scene activities. Quantitative results are difficult to obtain for such approaches, but we report a comparison to previous work using Structural Similarity (SSIM) [9] and Peak Signal to Noise Ratio (PSNR) on three robotic assisted surgical videos at separate optical zoom levels. We provide a qualitative example of how our method achieves isolation of two cardio-physiological features over existing methods. A supplementary video of the magnifications is provided that further illustrates the results.

2 Methods

Building on previous work in video motion magnification [7, 8, 10], we set out to highlight the third order motion characteristics created by the cardiac cycle. In an Eulerian frame of reference, the input image signal function is taken as \(I(\mathbf {x},t)\) at position \(\mathbf {x}\) (\(\mathbf {x} = (x,y)\)) and at time t [10]. With the linear magnification methods, \(\delta (t)\) is taken as a displacement function with respect to time, giving the expression \(I(\mathbf {x},t) = f(\mathbf {x} + \delta (t))\), which is approximated to first order by the Taylor expansion:

$$\begin{aligned} I(\mathbf {x},t) \thickapprox f(\mathbf {x})+\delta (t)\frac{\partial f(\mathbf {x})}{\partial \mathbf {x}} \end{aligned}$$
(1)

This Taylor series approximation can be continued into higher orders of motion, as shown in [8]. Taking it to the third order gives the following, where \(\hat{I}(\mathbf {x},t)\) is the magnified pixel at point \(\mathbf {x}\) and time t in the video:

$$\begin{aligned} \hat{I}(\mathbf {x},t) \thickapprox f(\mathbf {x})+(1+\beta )\delta (t)\frac{\partial f(\mathbf {x})}{\partial \mathbf {x}} +(1+\beta )^{2}\delta (t)^{2}\frac{1}{2}\frac{\partial ^{2}f(\mathbf {x})}{\partial \mathbf {x}^{2}} +(1+\beta )^{3}\delta (t)^{3}\frac{1}{6}\frac{\partial ^{3}f(\mathbf {x})}{\partial \mathbf {x}^{3}} \end{aligned}$$
(2)

In a similar vein to [8], we equate a component of the expansion to an order of motion and isolate these by subtraction of the lower orders

$$\begin{aligned} I(\mathbf {x},t) - I(\mathbf {x},t)_{non-linear(2^{nd}order)} - I(\mathbf {x},t)_{linear}\thickapprox (1+\beta )^{3}\delta (t)^{3}\frac{1}{6}\frac{\partial ^{3}f(\mathbf {x})}{\partial \mathbf {x}^{3}} \end{aligned}$$
(3)

assuming \((1+\beta )^{3} = \alpha \), \(\alpha > 0\).

$$\begin{aligned} D(\mathbf {x},t) = \delta (t)^{3}\frac{1}{6}\frac{\partial ^{3}f(\mathbf {x})}{\partial \mathbf {x}^{3}} \end{aligned}$$
(4)
$$\begin{aligned} \hat{I}_{non-linear(3^{rd}order)}(\mathbf {x},t) = I(\mathbf {x},t) + \alpha D(\mathbf {x},t) \end{aligned}$$
(5)

This produces an approximation for the input signal and a term that can be attenuated in order to present an augmented reality (AR) view of the original video.
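
As a quick numerical check of the reasoning behind Eqs. (2) to (5), the following minimal Python sketch assumes an illustrative profile f(x) = sin(x) and arbitrary values for x, \(\delta \) and \(\beta \); none of these come from the paper, and the helper name `taylor_terms` is hypothetical. It subtracts the zeroth, first and second order terms from the exactly displaced signal and compares what remains with the third order term.

```python
import math

def taylor_terms(x, delta, beta):
    """Terms of the third-order expansion for the assumed profile f(x) = sin(x)."""
    s = (1.0 + beta) * delta               # magnified displacement (1 + beta) * delta
    f0 = math.sin(x)                       # f(x)
    linear = s * math.cos(x)               # first-order term, f'(x) = cos(x)
    second = -(s**2 / 2.0) * math.sin(x)   # second-order term, f''(x) = -sin(x)
    third = -(s**3 / 6.0) * math.cos(x)    # third-order term, f'''(x) = -cos(x)
    return f0, linear, second, third

# Illustrative values only (assumptions, not taken from the paper)
x, delta, beta = 0.5, 0.01, 4.0
f0, linear, second, third = taylor_terms(x, delta, beta)

magnified = math.sin(x + (1.0 + beta) * delta)   # exactly displaced signal
residual = magnified - second - linear - f0      # lower orders removed
print(f"residual = {residual:.3e}, third-order term = {third:.3e}")
```

For small displacements the residual and the third order term agree up to a fourth order remainder, which is the property that the subtraction of lower orders relies on.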

2.1 Temporal Filtering

As jerk is the third temporal derivative of the signal \(\hat{I}(\mathbf {x},t)\), a filter has to be derived to reflect this. To achieve acceleration magnification, the Difference of Gaussian (DoG) filter was used in [8]. This allowed a temporal bandpass to be assigned by subtracting two Gaussian filters, using \(\sigma = \frac{r}{4\omega \sqrt{2}}\) [11] to calculate the standard deviation of each, where r is the frame rate of the video and \(\omega \) is the frequency under investigation. Taking the derivative of the second order DoG, we create an approximation of the third order, which follows the Hermite polynomials [12]. Due to the linearity of the operators, the relationship between the jerk in the signal and the third order DoG can be written as:

$$\begin{aligned} \frac{\partial ^{3}I(\mathbf {x},t)}{\partial t^{3}}\otimes G_{\sigma }(t) = I(\mathbf {x},t) \otimes \frac{\partial ^{3}G_{\sigma }(t)}{\partial t^{3}} \end{aligned}$$
(6)
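
The right-hand side of Eq. (6) can be sketched in a few lines of Python. This is a minimal illustration, not the filter used in the paper: it samples the analytic third derivative of a single Gaussian (rather than differentiating a DoG), works in units of frames, and truncates the kernel at six standard deviations, which is an assumption. The function names `jerk_kernel` and `convolve_valid` are hypothetical. A cubic test signal is used because its third derivative is the constant 6, and a unit-area Gaussian leaves a constant unchanged.

```python
import math

def jerk_kernel(frame_rate, freq_hz, radius_sigmas=6.0):
    """Sampled third derivative of a Gaussian, with time measured in frames."""
    sigma = frame_rate / (4.0 * freq_hz * math.sqrt(2.0))  # sigma = r / (4 * omega * sqrt(2))
    radius = math.ceil(radius_sigmas * sigma)              # truncation radius (assumed)
    kernel = []
    for t in range(-radius, radius + 1):
        g = math.exp(-t * t / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))
        # d^3 G / dt^3 = (3 t / sigma^4 - t^3 / sigma^6) * G(t)
        kernel.append((3.0 * t / sigma**4 - t**3 / sigma**6) * g)
    return kernel

def convolve_valid(signal, kernel):
    """True convolution, keeping only samples where the kernel fully overlaps."""
    k = len(kernel)
    return [sum(kernel[j] * signal[n + k - 1 - j] for j in range(k))
            for n in range(len(signal) - k + 1)]

# 30 Hz video and a 1 Hz pulse, as in the experiments; the cubic is a test signal
kernel = jerk_kernel(frame_rate=30.0, freq_hz=1.0)
centre = 50
cubic = [float((i - centre) ** 3) for i in range(101)]
response = convolve_valid(cubic, kernel)
print(len(kernel), min(response), max(response))
```

The response stays close to 6 across the valid region, which is what Eq. (6) predicts for a cubic input.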

2.2 Phase-Based Magnification

In the classical EVM approach, the intensity change over time is used in a pixel-wise manner [10], where a second order IIR filter detects the intensity change caused by the human pulse. An extension of this uses the difference in phase with respect to spatial frequency [7] for linear motion, as subtle differences in phase can be detected between frames where minute motion is present. Recently, phase-based acceleration magnification has been proposed [8]. It is this methodology that we utilise and amend for jerk magnification. By describing motion as a phase shift, the signal \(f(\mathbf {x})\) with displacement \(\delta (t)\) at time t can be decomposed into a sum over all frequencies (\(\omega \)):

$$\begin{aligned} f(\mathbf {x}+\delta (t))=\sum _{\omega =-\infty }^{\infty }A_{\omega }e^{i\omega (\mathbf {x}+\delta (t))} \end{aligned}$$
(7)

where the global phase for frequency \(\omega \) for displacement \(\delta (t)\) is \(\phi _{\omega } = \omega (\mathbf {x} + \delta (t))\).

It has been shown that spatially localised phase information of a series of images over time is related to local motion [13] and has been leveraged for linear magnification [7]. This is performed by using complex steerable pyramids [14] to separate the image signal into multi-frequency bands and orientations. These pyramids contain a set of filters \(\varPsi _{\omega _{s},\theta }\) at multiple scales \(\omega _{s}\) and orientations \(\theta \). The local phase information of a single 2D image \(I(\mathbf {x})\) is

$$\begin{aligned} (I(\mathbf {x}))\otimes \varPsi _{\omega _{s},\theta }(\mathbf {x}) = A_{\omega ,\theta }(\mathbf {x})e^{i\phi _{\omega _{s},\theta }(\mathbf {x})} \end{aligned}$$
(8)

where \(A_{\omega ,\theta }(\mathbf {x})\) is the amplitude at frequency \(\omega \) and orientation \(\theta \), and where \(\phi _{\omega _{s},\theta }\) is the corresponding phase at scale (pyramid level) \(\omega _{s}\). The phase information \(\phi _{\omega _{s},\theta }(\mathbf {x},t)\) is extracted at a given frequency \(\omega \), orientation \(\theta \) and frame t. The jerk component of the motion is isolated with our third order Gaussian filter and can then be magnified and reinstated into the video as \(\hat{\phi }_{\omega ,\theta }(\mathbf {x},t)\) to accentuate the desired state changes in the cardiac cycle, such as the dicrotic notch and end diastolic point, shown in Fig. 1 (left).

$$\begin{aligned} D_{\sigma }(\phi _{\omega ,\theta }(\mathbf {x},t)) = \phi _{\omega ,\theta }(\mathbf {x},t)\otimes \frac{\partial ^{3}G_{\sigma }(t)}{\partial t^{3}} \end{aligned}$$
(9)
$$\begin{aligned} \hat{\phi }_{\omega ,\theta }(\mathbf {x},t) = \phi _{\omega ,\theta }(\mathbf {x},t) +\alpha D_{\sigma }(\phi _{\omega ,\theta }(\mathbf {x},t)) \end{aligned}$$
(10)

Phase unwrapping is applied as with the acceleration methodology in order to create the full composite signal [8, 15].
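
The phase manipulation in Eqs. (9) and (10) can be illustrated for a single pixel of a single sub-band. The Python sketch below is a simplified stand-in under stated assumptions: the complex steerable pyramid is replaced by a synthetic series of unit-amplitude complex coefficients whose phase is a cubic plus a linear drift, the temporal filter is the sampled third derivative of a single Gaussian, and the names `jerk_kernel`, `unwrap` and `magnify_phase` are hypothetical.

```python
import cmath
import math

def jerk_kernel(sigma, radius):
    """Sampled third derivative of a Gaussian (time in frames)."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return [(3.0 * t / sigma**4 - t**3 / sigma**6)
            * math.exp(-t * t / (2.0 * sigma**2)) / norm
            for t in range(-radius, radius + 1)]

def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2.0 * math.pi * round(d / (2.0 * math.pi))
        out.append(out[-1] + d)
    return out

def magnify_phase(coeffs, kernel, alpha):
    """Apply Eqs. (9) and (10) to one pixel's complex coefficients over time."""
    radius = len(kernel) // 2
    phase = unwrap([cmath.phase(c) for c in coeffs])
    out = []
    for n in range(radius, len(coeffs) - radius):
        # Eq. (9): convolve the phase with the third-derivative kernel
        d = sum(kernel[j] * phase[n + radius - j] for j in range(len(kernel)))
        # Eq. (10): add the amplified jerk component back onto the phase
        out.append(abs(coeffs[n]) * cmath.exp(1j * (phase[n] + alpha * d)))
    return out

# 30 Hz video, 1 Hz pulse, x10 magnification; the phase profile is synthetic
frame_rate, pulse_hz, alpha = 30.0, 1.0, 10.0
sigma = frame_rate / (4.0 * pulse_hz * math.sqrt(2.0))
kernel = jerk_kernel(sigma, radius=math.ceil(6.0 * sigma))
c = 1e-5                                    # cubic coefficient (assumed)
phases = [c * (t - 50) ** 3 + 0.2 * t for t in range(101)]
coeffs = [cmath.exp(1j * p) for p in phases]
magnified = magnify_phase(coeffs, kernel, alpha)
```

In this synthetic case the linear drift contributes no jerk, so each magnified coefficient is rotated by roughly \(6c\alpha \) radians relative to the input, while the amplitude is left unchanged.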

3 Results

To demonstrate the proposed approach, endoscopic video was captured from a robotic prostatectomy using the da Vinci surgical system (Intuitive Surgical Inc, CA), in which a partially occluded obturator artery could be seen. Despite being identified by the surgical team, the vessel produced little perceivable motion in the video. This footage was captured at 1080p resolution at 30 Hz. For ease of processing, the video was cropped to a third of the original width, which contained the motion of interest while retaining the spatial resolution of the endoscope. For comparison, the video was motion magnified offline using the phase-based complex steerable pyramid technique described in [7] for first order motion and the video acceleration magnification described in [8]. Our method extended the video acceleration magnification method. All processes used a four level pyramid and the half octave pyramid type. For the temporal processing, a bandpass was set at 1 Hz ± 0.1 Hz to account for a pulse of around 54 to 66 bpm. From the patient’s ECG reading, their pulse was stable at 60 bpm during video acquisition. Magnification was performed at three magnification factors (x2, x5, x10). Spatio-temporal slices were then taken at a site along the obturator artery for visual comparison of each temporal filter type. For a quantitative comparison, the Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) index [9] were calculated on a hundred-frame sample, comparing the magnified videos to their original equivalent frames.

Fig. 2.

Volumetric image stacks of an endoscopic scene under different types of magnification.

Fig. 3.

Motion magnification of the obturator artery (x10). (a) Unmagnified spatio-temporal slice (STS); (b) Linear magnification [7]; (c) Acceleration magnification [8]; (d) Jerk magnification (our proposal); (e),(g) Comparative STS, blue box from (d) (jerk) in green, with (b) in magenta in (e) and (c) in magenta in (g); (f) Sample site (zoomed); (h) Overview of the surgical scene.

Fig. 4.

1D distension-displacement pulse wave signal amplification, using virtual data [2]. The jerk magnification shown in green creates two distinct peaks that are not present in the other two methods of lower order.

Figure 2 shows an overview of our video magnification investigation. The pulse from the external iliac artery can be seen in the right corner and the obturator artery on the front face. Large distortion and blur can be observed in the linear magnification example, particularly in the front right corner, whereas this is not present in the non-linear example: only changes in velocity are exaggerated in the non-linear case, whereas any velocity is exaggerated in the linear case. Figure 3 displays a comparison of spatio-temporal slices taken from the three aforementioned magnification methods. Panels (e) and (g) of this figure demonstrate the improvement in pulse wave motion granularity that jerk provides in temporal processing compared to the lower orders. The magenta in (e) shows a periodic sawtooth wave, with no discernible features relating to the underlying pulse wave signal. The magenta in (g), which depicts the use of acceleration, shows a more bipolar triangle wave. The green in both (e) and (g) shows a consistent periodic twin peak, with the second peak being more diminished, which suggests that our hypothesis that a jerk temporal filter can detect the dicrotic notch is correct and comparable to our model analysis shown in Fig. 4. Table 1 shows a comparison of a surgical scene at three separate working distances. This was arranged to diminish the spatial resolution with the same objective in the endoscope. All three aforementioned magnification algorithms were applied to each at three different motion magnification (\(\alpha \)) factors (x2, x5, x10).

Table 1. Results from SSIM analysis and PSNR for our surgical videos at three levels of magnification across the different temporal processing approaches.

SSIM and PSNR are used as quantitative comparative metrics, with PSNR being based on a mathematical model and SSIM taking into account characteristics of the human visual system [9]. Both allow objective comparison of a processed image to a reference source. Whilst a magnified video is expected to be altered, the residual noise generated by the process can be captured by these metrics. PSNR is measured in decibels (dB), where a higher number indicates better quality. SSIM is an index, with 1 being the best possible correspondence to the reference frame. For all surgical scenes, our proposed temporal process using jerk outperforms the other lower order motion magnification methods across all magnifications for SSIM and equals or outperforms the acceleration technique, particularly at \(\alpha {\,=\,10}\).
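
To make the two metrics concrete, the following Python sketch computes PSNR and a simplified SSIM between a reference frame and a processed frame, each given as a flat list of 8-bit intensities. It is a minimal illustration under stated assumptions rather than the evaluation code behind Table 1: the SSIM here treats the whole frame as a single window, whereas [9] averages the index over local windows, the constants 0.01 and 0.03 are the conventional defaults, and the function names and pixel values are hypothetical.

```python
import math

def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    mse = sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak**2 / mse)

def global_ssim(reference, processed, peak=255.0):
    """SSIM with the whole frame as one window (a simplification of [9])."""
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(processed) / n
    var_x = sum((r - mu_x) ** 2 for r in reference) / n
    var_y = sum((p - mu_y) ** 2 for p in processed) / n
    cov = sum((r - mu_x) * (p - mu_y) for r, p in zip(reference, processed)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # stabilising constants
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

# Toy frames (made-up values): the processed frame is one grey level brighter
reference = [10, 50, 90, 130, 170, 210]
processed = [v + 1 for v in reference]
print(f"PSNR = {psnr(reference, processed):.2f} dB")
print(f"SSIM = {global_ssim(reference, processed):.5f}")
```

Identical frames give an infinite PSNR and an SSIM of 1, so lower values of either metric indicate a larger departure from the reference frame.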

4 Conclusion

We have demonstrated that the use of higher order motion magnification can bring out subtle motion features that are exclusive to the pulse wave in arteries. This limits the amplification of residual signals present in surgical scenes. Our method particularly relies on the definitive cardiovascular signature characterised by the twin peaks of the end diastolic point and the dicrotic notch. Additionally, we have shown objective evidence that less noise is generated when the method is used within laparoscopic surgery compared to other magnification techniques; however, a wider sample and case-specific examples would be needed to verify this claim. Further work will look at a real-time implementation of this approach as well as methods of both ground truth validation and subjective comparison within a clinical setting. Practical clinical use cases are also needed to verify the validity of using such techniques in practice and to identify the bottlenecks to translation.