1 Introduction

The cochlea is a small organ in the inner part of the ear. It has a crucial role in hearing as it filters and transfers auditory signals to the brain. Recently, Cochlear Implants (CI) have become increasingly popular as a treatment option for patients with severe to profound Sensori-Neural Hearing Loss (SNHL) [16]. These implants result in significant improvement in post-operative speech recognition. This is mostly underpinned by the adequate match between CI electrode frequency bands and their exact location inside the cochlea, as each audible frequency has a specific position inside the cochlea [11]. Cochlear Duct Length (CDL) can also have a significant impact on the process of pre-operative electrode selection. If the electrode has a length that is not appropriate to the cochlear length, this can result in incomplete insertion, cochlear trauma or poor cochlear coverage with poor matching between the electrodes and the cochlea [8].

Surgical robots are gaining popularity, primarily driven by improvements in nanotechnology and artificial intelligence. These robots need reliable real-time computer vision algorithms to detect and analyze the target organ. For instance, during robotic cochlear surgery [17], a reliable real-time estimate of the length and size of the cochlea is needed, e.g. to select a suitable CI for a specific patient. This study proposes a fast method for estimating cochlear length and volume that may benefit future surgical cochlear robots.

1.1 Related Work

A review of the literature reveals major variations in radiological human CDL measurements depending on the methodology used. The primary methodologies are three-dimensional (3D) processing and spiral coefficient equations, which are relatively time-consuming and require deeper radiology expertise.

Escude et al. [5] introduced a spiral coefficient equation that requires only one measurement. This measurement, known as the A value, is defined as the largest distance from the round window to the opposite cochlear lateral wall. The equation was later modified into a linear equation by Alexiades et al. [2], followed by Koch et al. [9]. Most recently, Iyaniwura et al. [8], using conventional and micro-Computed Tomographic (\(\mu CT\)) images of cadaveric cochlear specimens, proposed an automated method for measuring the A value, with a significantly lower mean error than the manual method.

Other studies used 3D reconstructions of cross-sectional imaging. Weurfel et al. [18] measured the cochlear length on cone beam computed tomography images of temporal bones. The measurement started at the distal bony rim of the round window, and a 3D curve was then traced along the outer edge of the bony cochlea up to the helicotrema [18].

This was followed by Meng et al., who measured the CDL and the relation between the basal turn lengths and the CDL using 3D multiplanar reconstructed CT images [10].

More recently, Rivas et al. [15] compared the automatic measurement of the A value and the automatic CDL measurement by 3D reconstruction with the A value and CDL measured manually by two fellowship-trained neurotologists. They concluded that the automatically measured values were more reproducible and less time-consuming to obtain than the manual ones.

3D reconstruction is considered the most accurate method for obtaining CDL measurements [10]. It allows the entire complex 3D shape of the cochlea to be evaluated and is less prone to cutting and viewing angle errors. However, the processing time for this technique is considerably long [10].

A more consistent, faster, and reproducible method for determining the CDL, free of inter-observer variability, is still needed.

1.2 Cochlea Segmentation

Automatic cochlear analysis requires an efficient automatic segmentation algorithm. Segmentation is the process of extracting an object from an input image. Some researchers have proposed automatic cochlea segmentation methods, but these are not practical: they are either time-consuming or do not cover the cochlear details. Hence, they are not suitable for integration into a surgical robot [6, 12,13,14]. Atlas-based segmentation methods attempt to align an atlas to the input image. The atlas is usually a well-defined histological image or a high-resolution \(\mu CT\) image. Model-based segmentation methods try to fit a statistical shape model to the input image [3, 4]. This statistical shape model is generated from many aligned and manually segmented cochlear images.

2 Materials and Methods

The objective of this paper is to propose and evaluate a new, relatively easy and fast methodology that uses simple computer hardware and software to calculate the cochlear duct length and cochlear duct size. The proposal can be easily integrated into a surgical robot due to its high accuracy and fast performance. This section briefly describes the dataset, the experimental design, and the proposed method.

2.1 Datasets

A part of the standard, public Human Cochlea Dataset (HCD) was used. The dataset is described in detail in [1]. It contains Computed Tomography (CT), Magnetic Resonance (MR) and Cone Beam Computed Tomography (CBCT) modalities of patients from different age and gender groups. Each patient has 3D images of the cochlea before and after the cochlear implant surgery. The part used in the experiments contains 71 3D images of 21 patients, see Fig. 2 for a sample of these images.

2.2 Atlas-Model-Based Segmentation

The proposed method combines model-based and atlas-based segmentation. A high-resolution \(\mu CT\) image was used as a model and its manual segmentation was used as an atlas. The atlas was aligned to the input image using an image registration technique. The model was obtained from a public, standard \(\mu CT\) cochlear dataset [7].

The original \(\mu CT\) image was too large to process on a standard PC, hence a re-sampling procedure was necessary. The source \(\mu CT\) image was re-sampled from [0.008, 0.008, 0.008] mm spacing to [0.032, 0.032, 0.008] mm spacing, which reduced the image size from 13.4 GB to 806 MB. All DICOM slice files were divided into groups of 100 files, and each group was then loaded separately into the 3D Slicer software for re-sampling. After that, all re-sampled parts were combined to form the final re-sampled image.
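The size reduction can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes an in-plane target spacing of 0.032 mm on both x and y with the z spacing left at 0.008 mm, decimal units (1 GB = 1000 MB), and a file size proportional to the voxel count. The function name `voxel_reduction_factor` is hypothetical and not part of the published plug-in.

```python
def voxel_reduction_factor(old_spacing, new_spacing):
    """Return how many times fewer voxels an image has after re-sampling.

    Coarsening the spacing along an axis by a factor k divides the number
    of voxels along that axis by roughly k.
    """
    factor = 1.0
    for old, new in zip(old_spacing, new_spacing):
        factor *= new / old
    return factor


old_spacing = (0.008, 0.008, 0.008)  # mm, source spacing from the text
new_spacing = (0.032, 0.032, 0.008)  # mm, assumed target spacing

factor = voxel_reduction_factor(old_spacing, new_spacing)
estimated_mb = 13.4 * 1000 / factor  # assumes size scales with voxel count

print(f"reduction factor: {factor:.1f}")
print(f"estimated size:   {estimated_mb:.0f} MB")
```

Under these assumptions the estimate lands in the same range as the reported 806 MB; the remaining gap may come from unit conventions, file headers, or rounding of the slice counts.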

Next, the image was cropped to the cochlear part only. This gave a smaller image of 103.2 MB with 243\(\,\times \,\)202\(\,\times \,\)1191 voxels (the original was 437\(\,\times \,\)412\(\,\times \,\)2349 voxels). Following that, the two main cochlear scalae, i.e. the scala vestibuli and scala tympani, were segmented manually, see Fig. 1. The model was then transformed manually to obtain a direction matrix corresponding to a left-side cochlea. The transformed model was automatically aligned to one of the clinical CBCT images using ACIR [1], a registration method proposed for multi-modal cochlear images. Registration is the process of aligning two or more images. The segmentation was aligned in the same way. Registration relies on interpolation, which finds the locations of pixels when the images have different sizes; because of this, the segmentation had to be corrected manually to fit the CBCT image before it could be used as an atlas. After this correction, the CBCT left-side cochlear atlas and its segmentation were ready to segment any CBCT left-side cochlear image automatically.

By repeating the process above, the models and segmentations for CT and MRI images were obtained. The right-side cochlear atlases were generated by changing the x-direction of the direction matrix of the previous atlases and their segmentations.
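One way to picture the left-to-right conversion is as a sign flip of the x-axis in the image direction matrix. The sketch below is a minimal illustration, not the plug-in's code: it assumes a 3x3 direction matrix stored as nested lists whose first column holds the x-axis direction (the column convention used by ITK-style images), and the helper name `flip_x_direction` is hypothetical.

```python
def flip_x_direction(direction):
    """Return a copy of a 3x3 direction matrix with its x-axis negated.

    Assumption: the first column of the matrix is the direction of the
    image x-axis, so negating it mirrors the image left to right.
    """
    return [[-row[0], row[1], row[2]] for row in direction]


# Hypothetical left-side atlas with an identity direction matrix.
left_direction = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

right_direction = flip_x_direction(left_direction)
for row in right_direction:
    print(row)
```

Applying the flip twice restores the original matrix, which is a convenient check that the operation is a pure mirror. The same flip would have to be applied to the segmentation so that atlas and label stay aligned.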

A user-friendly interface for the atlas-based segmentation method was developed as a Slicer plug-in. A summary of the segmentation process is presented in Fig. 1 left. The user inputs the cochlear image, selects any point inside the cochlea then obtains the segmentation result automatically in less than 5 s.

Fig. 1. Left: cochlear analysis general pipeline. Middle: cochlea high-resolution \(\mu CT\) model. Right: points-set model.

2.3 Cochlea Points Model

Skeletonization is the process of converting an object in an image to a thinner object (a skeleton), e.g. converting a rectangle to a line. It is usually used to generate the curves and lines of an object in order to obtain object-related measurements, e.g. the object's length.

Standard skeletonization methods did not succeed on the resulting cochlear segmentation, due to the irregular shape of the cochlear scalae. To address this problem, a points-set model was created from a sorted points set, see Fig. 1 right. This points-set model contains 55 points representing the center of the scala tympani. After applying the atlas-segmentation transformation to this points set, the length of the scala tympani can be calculated by summing the distances between consecutive points using the standard 3D distance equation:

$$\begin{aligned} Length = \sum _{i=1}^{n-1} \sqrt{(x_{i+1}-x_i)^{2} + (y_{i+1}-y_i)^{2} + (z_{i+1}-z_i)^{2} } \end{aligned} \tag{1}$$

where n is the number of points, and x, y and z are the 3D point coordinates. This approach has two benefits:

  A. It is faster than skeletonization as it includes only one simple matrix multiplication.

  B. The points can be corrected or modified later to produce different useful measurements (e.g. measuring the inner length or the outer length of a scala).
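The points-set approach can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the registration result is taken to be a 4x4 homogeneous matrix (the text only mentions a single matrix multiplication), the three sample points are hypothetical rather than the 55-point model, and the helper names `apply_affine` and `polyline_length` are invented for this sketch.

```python
import math


def apply_affine(matrix, point):
    """Apply a 4x4 homogeneous transform (nested lists) to a 3D point."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3]
        for row in matrix[:3]
    )


def polyline_length(points):
    """Sum the Euclidean distances between consecutive sorted points (Eq. 1)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical sorted centerline points in mm (the real model has 55).
model_points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

# Hypothetical transform: uniform scaling by 2 plus a translation.
transform = [
    [2.0, 0.0, 0.0, 10.0],
    [0.0, 2.0, 0.0, -5.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]

patient_points = [apply_affine(transform, p) for p in model_points]
print("model length:  ", polyline_length(model_points))
print("patient length:", polyline_length(patient_points))
```

Because the points are already sorted along the scala, no curve extraction is needed at run time; moving the model points toward the inner or outer wall would yield the corresponding inner or outer length with the same two functions.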

3 Experiments and Results

A few important factors that may affect the accuracy of the estimated measurements were studied in the experiments.

The only manual step in the proposed methodology is locating the cochlea, in which the user provides a point inside the cochlear region (step 1 in Fig. 1 left). This point was used in the cropping phase, where the region around it was cropped. The cropped image was then used in the registration phase (step 3 in Fig. 1 left). The point was taken as the center of the crop, generating a cube with a side length of 10 mm and a 3D cropped image of about 80\(\,\times \,\)80\(\,\times \,\)80 voxels with 0.125 mm isotropic spacing. These parameters were selected based on several experiments, as the cochlea was located inside this cropped region in all tested images.
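The relation between the crop parameters can be checked with a small sketch. It works in physical coordinates (mm) only and assumes an axis-aligned cube centered on the user's point; the function names `crop_bounds` and `crop_shape` and the sample point are hypothetical.

```python
def crop_bounds(center_mm, cube_mm=10.0):
    """Return (min, max) bounds in mm along each axis of a cube around a point."""
    half = cube_mm / 2.0
    return [(c - half, c + half) for c in center_mm]


def crop_shape(cube_mm=10.0, spacing_mm=0.125):
    """Return the voxel dimensions of the cropped cube at isotropic spacing."""
    n = round(cube_mm / spacing_mm)
    return (n, n, n)


# Hypothetical user-selected point inside the cochlea, in mm.
cochlea_point = (12.0, -3.5, 40.0)

print("bounds (mm):", crop_bounds(cochlea_point))
print("shape (voxels):", crop_shape())
```

With a 10 mm cube and 0.125 mm spacing, the shape works out to 80 voxels per axis, in line with the figure quoted in the text.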

The selected cropping point can differ from one user to another, and even when the same user locates it multiple times. The segmentation result changes slightly when the cochlear location point is different. To quantify this effect, the cochlear point was located 10 different times for each input image, and the average and the standard deviation of the segmentation result were recorded. Fig. 2 and Table 1 show only three cochlear location points of one patient, with their quantitative and visual results.

Table 1. Sample of cochlear location points and related results for patient 1

Fig. 2. Cochlear location change effect. From left to right: results of point a, point b and point c. The point locations are listed in Table 1.

In Table 1, the points-set detection refers to the number of points that were transformed to the new input image, see Sect. 2.3. Notice that the quantitative results differ slightly, while the visual results look almost the same.

To estimate the error resulting from image artifacts, the actual length of the CI was used as a reference. The electrode array was segmented out of the datasets and its length was computed. The estimated error was then obtained by comparing the computed length of the electrode array to its actual length.

Across different implant types, the average estimated error of the CI length was 0.62 mm with a standard deviation of 0.27 mm. The minimum and maximum errors were 0.05 mm and 1.1 mm. An error bound of 1.5 mm is therefore assumed: the maximum observed error of 1.1 mm plus an additional margin of 0.4 mm.
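The implant-based error estimate can be sketched as follows. This is an illustration only: the length values are placeholders rather than data from the study, the sample standard deviation is an assumption (the text does not say which estimator was used), and the function names are hypothetical.

```python
import statistics


def length_errors(computed_mm, actual_mm):
    """Absolute difference between computed and known electrode array lengths."""
    return [abs(c - a) for c, a in zip(computed_mm, actual_mm)]


def summarize(errors):
    """Mean, sample standard deviation, minimum and maximum of the errors."""
    return {
        "mean": statistics.mean(errors),
        "std": statistics.stdev(errors),
        "min": min(errors),
        "max": max(errors),
    }


# Placeholder values in mm, NOT measurements from the paper.
computed = [24.5, 27.0, 30.25, 19.0]
actual = [24.0, 28.0, 31.5, 19.25]

summary = summarize(length_errors(computed, actual))
for name, value in summary.items():
    print(f"{name}: {value:.2f} mm")
```

Since the physical length of each implant is known from its specification, this comparison gives an error estimate without requiring a manually annotated ground truth.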

The average estimated scala tympani length and size were computed over all 71 volumes using the proposed method.

The average estimated scala tympani length was 29.54 mm with a standard deviation of 0.27 mm. The minimum and maximum values were 28.39 mm and 30.71 mm. The estimated average difference between pre- and postoperative CBCT images was 0.31 mm with a standard deviation of 0.25 mm. Similarly, the estimated average difference between preoperative MRI and CBCT images was 0.30 mm with a standard deviation of 0.22 mm. The estimated average difference between preoperative MRI and postoperative CBCT images was 0.32 mm with a standard deviation of 0.30 mm.

The average estimated scala tympani size was 41.56 mm\(^{3}\) with a standard deviation of 0.19 mm\(^{3}\). The minimum and maximum values were 40.52 mm\(^{3}\) and 41.79 mm\(^{3}\). The estimated average difference between pre- and postoperative CBCT images was 0.25 mm\(^{3}\) with a standard deviation of 0.27 mm\(^{3}\). Similarly, the estimated average difference between preoperative CBCT and MRI images was 0.16 mm\(^{3}\) with a standard deviation of 0.15 mm\(^{3}\). The estimated average difference between postoperative CBCT and MRI images was 0.26 mm\(^{3}\) with a standard deviation of 0.31 mm\(^{3}\).

On an ASUS ROG G751-JT notebook (Intel Core i7-4720HQ @2.6 GHz x 8, Geforce GTX970m and 32 GB memory), the average computation time per image was 4.01 s with a standard deviation of 0.79 s. The computation time covers computing the transformation, generating the segmentation, generating the cochlear 3D model, and computing the cochlear measurements, i.e. length and size.

The results above suggest that the CDL is long enough to accommodate a longer CI than those currently available. Assuming an error of 1.5 mm and a length range of 28.39 mm to 30.71 mm, a CI active length ranging from 26.8 mm to 29.2 mm can be suggested. However, more clinical datasets from different locations are needed to verify this suggestion.
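The suggested range appears to follow from subtracting the assumed error from both ends of the measured length range. The short sketch below reproduces that arithmetic; the function name `suggested_active_length` is hypothetical, and treating the error as a plain subtraction is an interpretation of the text rather than a stated procedure.

```python
def suggested_active_length(cdl_min_mm, cdl_max_mm, error_mm):
    """Shrink the measured length range by the assumed estimation error."""
    return (cdl_min_mm - error_mm, cdl_max_mm - error_mm)


low, high = suggested_active_length(28.39, 30.71, 1.5)
print(f"{low:.2f} mm to {high:.2f} mm")
```

This gives 26.89 mm to 29.21 mm, close to the 26.8 mm to 29.2 mm range quoted in the text.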

Finally, Fig. 3 shows different visual results of the segmentation and the generated 3D models from different modalities.

Fig. 3. Segmentation results and the generated 3D models. Left: CBCT. Middle: MR. Right: CT.

4 Conclusion

A real-time automatic method for cochlear image analysis is proposed. The proposed method can be integrated into a surgical cochlear robot due to its high accuracy and fast performance. Due to the absence of a public ground truth, the error is estimated using the known length of the cochlear implant in the images.

To the best of the authors' knowledge, this is the first method that uses such an error estimation. The experiments showed a small length estimation error ranging from 0.15 mm to 1.1 mm. The proposed method is implemented as a public open-source plug-in for the 3D Slicer software. Future work includes further improvements in speed and accuracy. Using a better histological model to obtain a segmentation of the three cochlear scalae should also be considered in future research.