Introduction

The cornea subspecialty has continuously been a pioneer in the use of technology to aid the diagnosis and treatment of ophthalmic diseases. The first corneal surface characterisation dates back to the early seventeenth century, when Scheiner performed his first experiments comparing the image reflected on calibrated glass spheres with that on the cornea [1]. During the 1800s, the keratometer and the keratoscope were developed and used separately, until we had the computational power to effectively combine the objective and subjective approaches of both methods in the 1980s [2]. This made it possible to diagnose early forms of progressive corneal diseases even before vision was affected. The slow advances of the preceding four centuries have given way to a rapid transformation: in a few decades, several new devices and technologies arose. We can now evaluate the corneal curvature and elevation of both surfaces along with the pachymetric map [3]. It is possible to study the corneal tissue anatomy by layers and to evaluate its histology and biomechanical properties in vivo [4,5,6,7,8].

All these new technologies provide us with an unprecedented amount of information, which on the one hand is vital to early disease detection but on the other, in excess, can disrupt sensible decision-making. Information and knowledge are not synonyms; in fact, since the 1960s it has been proposed that information overload can be a barrier to the formation of knowledge [9]. There are several strategies to deal with this information overload, but whether the goal is to keep up to date in the speciality or to extract meaningful information from a complementary exam, the use of machines seems to be the most efficient approach [10]. Artificial neural networks, deep learning, and other machine learning (ML) techniques have become useful tools in the clinician's arsenal, helping to deliver the best quality of care to patients.

One of the earliest examples of its application in corneal imaging was proposed by Maeda et al. in 1994, with the use of a classification tree combined with a linear discriminant function to distinguish between keratoconic and non-keratoconic patterns [11]. Throughout the 1990s, other techniques such as neural networks were proposed to identify the keratoconus pattern based on corneal topography [12,13,14]. But the greatest push to improve disease susceptibility detection came with the first reported cases of post-LASIK ectasia [15,16,17,18]. Currently, refractive surgery screening is the most prolific field for ML development in corneal disease, yet research in other fields is growing, even though promising ML techniques are full of dangerous pitfalls. A meticulous model development process should be followed in order to obtain reliable information from them.

Overview of ML Techniques

In brief, we will discuss some of the ML techniques most commonly used in corneal diagnosis, their benefits and limitations, and how to avoid the common problems and misinterpretations inherent to them.

Artificial intelligence (AI), introduced in the 1950s and 1960s, comprises algorithms that, through the ability to learn, enable computers to behave intelligently. Learning is defined as the ability to update the parameters and coefficients of an algorithm as data become available. Historically, machine learning is built on three fundamental branches: symbolic learning [19], statistics [20], and neural networks [21]. These led to the development of advanced approaches that include pattern and statistical recognition (k-nearest neighbours, Bayesian classifiers, and Fisher's linear discriminants), symbolic learning (decision trees, logical programmes, and decision rules), and artificial neural networks (ANN) (deep learning, recurrent networks, and convolutional neural networks) [22, 23].

There are various types of machine learning, and they can mainly be categorised into unsupervised (data are not labelled), supervised (data are labelled), and reinforcement learning [24]. For both supervised and unsupervised learning, a continuous outcome demands a regression analysis, whereas a discrete outcome requires a classification method. Regression analysis is a statistical method that reveals the relationships between two or more variables; an example is correlating visual acuity improvement after a surgical intervention with the preoperative status of the patient. On the other hand, an example of a classification problem is disease diagnosis using data from complementary exams. Reinforcement learning is a newer type of machine learning in which rewards are provided for the actions taken by the machine; a good example is software that learns to play computer games. It has three components: an agent (the decision maker that functions in an environment), the environment (which leads to making a decision), and actions (what the agent can do based on the decision made).
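As a minimal illustration of the regression/classification distinction, the sketch below fits a regression model to a continuous outcome and a classifier to a discrete one; the scikit-learn library, the synthetic data, and the variable names are our own assumptions for demonstration, not material from any cited study.

```python
# Minimal sketch of the regression vs. classification distinction,
# using scikit-learn and synthetic (made-up) data for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Regression: continuous outcome, e.g. postoperative visual acuity
# improvement predicted from a hypothetical preoperative measurement.
preop = rng.uniform(0.1, 1.0, size=(100, 1))                 # preoperative status
improvement = 0.5 * preop[:, 0] + rng.normal(0, 0.05, 100)   # continuous outcome
reg = LinearRegression().fit(preop, improvement)
print("Predicted improvement:", reg.predict([[0.6]]))

# Classification: discrete outcome, e.g. disease present (1) or absent (0).
label = (preop[:, 0] > 0.5).astype(int)                      # hypothetical diagnosis
clf = LogisticRegression().fit(preop, label)
print("Predicted class:", clf.predict([[0.6]]))
```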

Artificial neural networks are designed on the concept of biological neurons. A single perceptron, the building block of a neural network, can be mathematically described as shown in Fig. 1. Features (individual measurable characteristics or properties) of the input provide information about the problem under consideration. A randomly allocated weight is multiplied by each input and passed into the activation function. The activation function is a simple equation that defines the output; for instance, it may print 1 if the input is positive and 0 otherwise. A problem arises when the inputs are zero, which would make the output zero no matter what the activation function is. To overcome this problem, a bias term is included in each perceptron's calculation.

Fig. 1

A demonstration of the perceptron model showing two inputs and a bias term that are multiplied by random weights and fed into an activation function, which numerically predicts the output

This process can be described mathematically using Eq. 1:

$$ \sum_{i=0}^{n} w_i\, x_i + b $$
(1)

where n is the number of inputs, w is the weight of the input, x is the input, and b is the bias term. Multiple perceptrons connecting the input to the outputs form what is called the middle or hidden layer. Any neural network is constructed from three types of layers: input, hidden, and output. If the number of hidden layers is more than three, the network is called deep learning, which enables it to gain a deeper understanding of the relation between inputs and outputs. Hence, in a deep neural network, the function above is applied through multiple layers, as below:

$$ \begin{aligned}
\text{Hidden layer 1:}\quad & Z^{(1)} = w^{0} x_i + b^{0} \\
\text{Hidden layer 2:}\quad & Z^{(2)} = w^{1} Z^{(1)} + b^{1} \\
\text{Hidden layer 3:}\quad & Z^{(3)} = w^{2} Z^{(2)} + b^{2} \\
& \;\;\vdots \\
\text{Hidden layer } n\text{:}\quad & Z^{(n)} = w^{n-1} Z^{(n-1)} + b^{n-1}
\end{aligned} $$
(2)

where Z is the output of each layer. A neural network is always built through the process shown in Fig. 2. The mathematical complexity of this process is out of the scope of this paper and will not be discussed here; however, readers are encouraged to read about it in [25].
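To make Eqs. 1 and 2 concrete, the short sketch below implements the weighted sum of a single perceptron and chains it through several hidden layers; the layer sizes, random weights, and step activation are illustrative assumptions rather than a trained network.

```python
# Illustrative forward pass through the perceptron equation (Eq. 1)
# and its layer-wise repetition (Eq. 2); weights are random, not trained.
import numpy as np

rng = np.random.default_rng(42)

def perceptron(x, w, b):
    """Eq. 1: weighted sum of inputs plus bias, passed to a step activation."""
    z = np.dot(w, x) + b
    return np.where(z > 0, 1.0, 0.0)  # simple step activation from the text

# Eq. 2: the same operation applied layer by layer (assumed sizes 3 -> 4 -> 4 -> 1).
layer_sizes = [3, 4, 4, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=m) for m in layer_sizes[1:]]

z = np.array([0.2, 0.7, 0.1])  # example feature vector
for w, b in zip(weights, biases):
    z = perceptron(z, w, b)    # Z^(k) = w^(k-1) Z^(k-1) + b^(k-1), then activation
print("Network output:", z)
```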

Fig. 2

The process of building machine learning software starts with data acquisition and cleaning. The dataset is then split into a training and a test set. This is followed by training the model, evaluating the results, and adjusting the parameters through an iterative process. Finally, the model can be deployed to make predictions on new inputs

There are a few important points regarding the training of any ANN that enable the supervision of AI-related projects. These include the following:

  • Cleaning of input data in an organised manner

  • Choosing an appropriate cross-validation method (listed below from lower to higher generalisation accuracy and computational cost; a minimal code sketch comparing them follows this list)

    • Hold-out: Splitting the data into a randomly selected training set (normally 70–90% of the data) and a test set

    • K-fold: Splitting the data into k folds (normally 10), separating k − 1 folds for training and 1 for testing. Repeating the process k times yields the average accuracy and its standard deviation.

    • Leave-one-out: Using the whole dataset but one case to train the model and testing the accuracy on this remaining case. Repeating the process n times (the size of the dataset) yields a robust estimate of the generalisation ability of the model.

  • The normalisation of training data to achieve consistency

  • Optimising the architecture of the network and its pre-set parameters, and selecting a suitable NN through trial and error

  • Evaluating the performance of the network through the cost function (minimising the error between predictions and expected data)
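As promised above, here is a minimal sketch comparing the three cross-validation strategies, assuming the scikit-learn library and a synthetic dataset chosen purely for illustration:

```python
# Minimal comparison of hold-out, k-fold, and leave-one-out cross-validation
# using scikit-learn; the dataset and classifier are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (train_test_split, cross_val_score,
                                     KFold, LeaveOneOut)

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression()

# Hold-out: a single random split (here 80% training / 20% test).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

# K-fold: 10 folds, train on 9, test on 1, repeated 10 times.
kfold_scores = cross_val_score(model, X, y,
                               cv=KFold(n_splits=10, shuffle=True, random_state=0))

# Leave-one-out: n repetitions, each testing on a single held-out case.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

print(f"Hold-out accuracy: {holdout_acc:.2f}")
print(f"10-fold accuracy:  {kfold_scores.mean():.2f} ± {kfold_scores.std():.2f}")
print(f"Leave-one-out:     {loo_scores.mean():.2f}")
```

Note the trade-off the list describes: the hold-out estimate comes from a single split, while k-fold and leave-one-out average over many splits at a growing computational cost.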

The advantage of using ANNs is that the algorithm is able to deal with noisy and missing clinical data and to capture complex patterns in the data in a way that is not possible with linear and non-linear equations [26].

The problem with ANNs is that they require extremely large clinical datasets for training. To achieve globally accepted performance, these clinical data should be collected from different ethnicities, which is not always easy considering that patient data are often expensive, highly regulated, and time-consuming to collect in the desired manner. On the other hand, what goes on inside the network is more like a black box, as it does not provide the clinician with any explanation of why a decision has been made. Hence, abnormalities may be misdiagnosed, which is a major concern. This problem can be mitigated by refining the algorithm over time, comparing its decisions with those of clinicians.

Other commonly used techniques that share some of the advantages and limitations of the artificial neural networks described above are random forests and support vector machines [27, 28]. The random forest uses the concept of decision tree (DT) models. In these models, a flowchart-like structure is built: at each node, a test on one of the independent variables splits the data into two mutually exclusive subgroups (branches). This process is repeated several times until the final decision of class assignment (leaves) is reached.

The random forest combines several trees. Each DT in this method is built on a random subset of the data generated by a bootstrap resampling technique, and for each split, the best variable is chosen from a subset composed of a pre-defined number of randomly selected variables. Each tree then gets a "vote" on the final decision; the mode is used in classification problems while the mean is used in regression. Figure 3 exemplifies this process. Some advantages of the method are that it can model nonlinear class boundaries and can provide variable importance. On the downside, it is a slow method and it is hard to gain insight into the decision rules.

Fig. 3

Example of a random forest model. a The patients included in the training and test sets; for illustration purposes, the characteristics height, age, and weight are graphically represented by geometrical shapes (triangle, square, and circle, respectively). b A very simple random forest model composed of 3 trees trained on the three characteristics to separate healthy (H) from diseased (D) patients, and the classification path (in green) of a new patient
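A minimal sketch of the voting process described above, loosely mirroring the Fig. 3 example; the scikit-learn calls, the invented height/age/weight data, and the disease rule are assumptions for illustration only.

```python
# Minimal random forest sketch mirroring the Fig. 3 example: three features
# (height, age, weight) and a healthy/diseased label; the data are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(loc=[170, 40, 70], scale=[10, 12, 15], size=(200, 3))  # height, age, weight
y = (X[:, 2] / (X[:, 0] / 100) ** 2 > 25).astype(int)  # hypothetical disease rule

# Each tree is grown on a bootstrap resample of the data (bootstrap=True).
forest = RandomForestClassifier(n_estimators=3, bootstrap=True, random_state=1)
forest.fit(X, y)

new_patient = [[165, 55, 80]]
votes = [tree.predict(new_patient)[0] for tree in forest.estimators_]
print("Individual tree votes:", votes)           # each tree gets a "vote"
print("Forest decision (mode):", forest.predict(new_patient)[0])
print("Variable importance:", forest.feature_importances_)
```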

In the support vector machine (SVM), a previously classified dataset (supervised learning) is used to train the model. The algorithm searches for an n-dimensional hyperplane able to separate the groups with the largest margin. In many cases in ophthalmology, a linear solution (two-dimensional) is not possible, so the solution must be found in a higher dimensional space, as observed in Fig. 4. The SVM looks for these solutions at a relatively low computational cost using the kernel trick, a function used to obtain nonlinear variants of a selected algorithm that can be cast in terms of dot products [29]. However, choosing a suitable kernel is not straightforward, and an inappropriate kernel can lead to overfitting [30].

Fig. 4

a Linearly separable data with good margins. b Data that cannot be linearly separated (2 dimensions). c Separation obtained with a hyperplane in a higher dimensional space (3 dimensions)
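A minimal sketch of the Fig. 4 scenario, assuming scikit-learn and synthetic concentric-circle data: a linear kernel fails on data that cannot be separated by a straight line, while the RBF kernel implicitly maps them to a higher dimensional space where a separating hyperplane exists.

```python
# Minimal kernel-trick sketch for the Fig. 4 scenario: data that are not
# linearly separable in 2D become separable with an RBF kernel.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric classes: impossible to separate with a straight line (Fig. 4b).
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_tr, y_tr)
rbf_svm = SVC(kernel="rbf").fit(X_tr, y_tr)  # kernel trick: implicit higher dimension

print("Linear kernel accuracy:", linear_svm.score(X_te, y_te))  # near chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X_te, y_te))     # near perfect
```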

The applications of these methods will be discussed below in clinical scenarios.

Keratoconus Diagnosis and Refractive Surgery Screening

The progressive character of ectatic corneal disease has always stimulated the search for means of early diagnosis. With the introduction of corneal cross-linking, disease progression can be halted, and with an accurate early diagnosis, the vision of patients can be preserved. But the increasing number of refractive surgeries and the reports of iatrogenic keratectasia in cases without anterior surface alterations pushed the need to screen for susceptible cases [31]: that is, to identify those cases that could experience a biomechanical failure after the procedure even before any mild alteration is present in the anterior corneal surface. To accomplish this task, a series of methods including risk scores, linear models, and more recently artificial intelligence models have been proposed, with progressively increasing accuracy, evaluating data from different devices [32, 33••]. Data from 4 different tomographers were analysed with 4 different artificial intelligence techniques, making it possible to identify these very early forms of the disease with high accuracy [33••, 34,35,36].

The Pentacam random forest index (PRFI) is a random forest model built using data from the Pentacam HR tomographer (Oculus, Wetzlar, Germany). It was the only model trained with the preoperative exams of patients who went on to develop ectasia, and it used a large dataset of patients from 3 different continents to better capture normal patient variability. While the index already available on the device (BAD-D) presented a sensitivity of 55.3%, the PRFI correctly identified 80% of the cases. In the external validation set, 85% accuracy was found in detecting the normal topographic eye of very asymmetric cases (VAE-NT), maintaining a specificity of 96.6% [33••].

A single decision tree method was proposed based on data from a different tomographer, the Galilei Dual Scheimpflug Analyzer (Ziemer Ophthalmic Systems AG, Port, Switzerland). This index achieved a sensitivity of 90% with a specificity of 86% in detecting early forms of the disease [34]. Analysing data from the Sirius tomographer (CSO, Firenze, Italy), an SVM model identified cases with first signs of the disease (a stage slightly later than VAE-NT) with a sensitivity of 92% and a specificity of 97.7% [35]. Linear discriminant models were also successfully used to analyse Orbscan II data (Technolas, Munich, Germany), with 92% sensitivity and 96% specificity in a first validation set [36] and 70.8% sensitivity and 98.1% specificity in a population with a different ethnic background [37].
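For reference, the sensitivity and specificity figures quoted throughout this section derive from the confusion matrix of a classifier's predictions; the short sketch below computes them from made-up labels (our assumption, not data from the cited studies).

```python
# How the sensitivity/specificity figures quoted above are computed
# from a confusion matrix; the labels below are invented for illustration.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = ectatic/keratoconic, 0 = normal
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # hypothetical model output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # proportion of diseased eyes correctly flagged
specificity = tn / (tn + fp)  # proportion of normal eyes correctly cleared
print(f"Sensitivity: {sensitivity:.1%}, Specificity: {specificity:.1%}")
```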

The in vivo assessment of corneal biomechanics has also provided a torrent of new and hard-to-interpret information. Neural networks have been used to evaluate the waveform signal of the Ocular Response Analyzer (Reichert Ophthalmic Instruments, Buffalo, USA), with high accuracy on the study validation sample composed of early forms of keratoconus (AUC 0.978) [38]; however, these results have not been validated in independent samples. Data from the other commercially available device, the Corvis ST (Oculus Optikgeräte GmbH, Wetzlar, Germany), have also been analysed individually with logistic regression, and high accuracy was found in keratoconus detection, with 98.8% of the cases correctly classified [39]. The relatively low accuracy of the device used independently in detecting initial cases was overcome by integration with tomographic data using AI: the random forest model named the tomographic and biomechanical index (TBI) achieved 90.3% sensitivity in detecting VAE-NT with 96% specificity [40••]. The combined tomographic and biomechanical parameter showed itself superior to both methods used alone [41].

One of the main advantages of AI-derived models is their ability to bridge the gap between research and clinical application by providing a simple output parameter that can be directly used as a risk profiling tool. Different corneal imaging devices (tomographers and biomechanical analysers) have already implemented these AI-based indices in their software, making them available in daily clinical practice as objective screening parameters.

In Vivo Corneal Morphology Exams

The in vivo evaluation of corneal morphology has also benefitted from AI models. Neural networks have been used to automatically identify the healthy corneal layers on confocal microscopy exams, with significant improvement over previous methods, eliminating the need for the binarisation image processing step, which is cumbersome and can often lead to information loss [42]. By evaluating ultra-high-resolution OCT images with convolutional neural networks, it was also possible to precisely identify the corneal layers in keratoconic eyes. In these cases, a non-uniform thickness was observed in at least one layer, a result also proposed for use in diagnosis [43, 44].

Going further into more specific histological features of the corneal tissue, it is possible to study in vivo the characteristics of the endothelial cells and the subbasal nerve plexus. Endothelial cell number and shape are calculated by means of specular or corneal confocal microscopy exams, and the acquired images are manually annotated to estimate cell density, pleomorphism, and polymegethism. Although identifying cell borders seems a trivial task, automating it is often difficult owing to frequently low image quality with blurred regions, low contrast, and artefacts. To fully automate the process, increasing its speed and avoiding the imprecision of manual appraisal, several methods to delineate the cell borders have been proposed, and machine learning models have proven to be a faster and more accurate means of characterising the endothelium [45, 46•].

The corneal subbasal nerve plexus is composed of short nerve fibres that can be noninvasively studied with confocal microscopy. It has gained special interest for evaluating diabetic sensorimotor polyneuropathy (DSPN), a common long-term complication of diabetes affecting up to 50% of patients [47]. The promising results in identifying DSPN in cross-sectional studies face considerable difficulties in longitudinal evaluation, since the images are manually analysed in an imprecise and time-consuming process [48]. This gap was filled with the introduction of neural network and random forest models that fully automate the nerve segmentation and morphology study, allowing the development of an objective and precise method for early characterisation of the disease [49•, 50].

Hyphae detection in corneal fungal infections and even the image segmentation of corneal ulcers, both difficult tasks for subjective characterisation, have also had their accuracy improved with the aid of artificial intelligence models [51, 52].

Corneal Surgery

Another growing field where machine learning techniques are being used is corneal surgery. Automated objective quantification of haze and of the demarcation line after cross-linking surgery has been achieved with a support vector machine model. This method provides the clinician with haze statistics along with a visual demarcation on the OCT images of the shape and location of the haze and the demarcation line [53, 54•].

In posterior lamellar corneal transplant surgery, graft detachment is one of the main complications, especially during the initial part of the learning curve [55]. Intraoperative corneal OCT can be used to evaluate the fluid in the graft-host interface [56], and the identification of a larger residual interface fluid volume by an automated graph searching approach is associated with early graft dislocation [57]. In the postoperative period, graft dislocation can also be objectively quantified using a convolutional neural network [58].

Current Limitations and Future Perspectives

AI techniques have already shown their value in enhancing clinical decisions for patients with corneal conditions, and their application keeps growing. However, there are some considerations and important limitations that preclude an even more widespread use. One big limitation of deep learning is that it requires large training sets, on the order of tens of thousands of examples, to be accurately trained and to generalise its results. Very large datasets are also required to deal accurately with the high amount of noise inherent in biological data: successfully trained deep neural networks for classifying retinal funduscopic images utilised datasets of more than 100,000 images [59, 60].

In corneal imaging, there are a few hurdles that make building a very large dataset an extremely hard task, if not an impossible one. The high cost of devices such as tomographers makes their availability relatively lower than that of retinographers. The technical difficulty of acquiring images with devices such as the confocal microscope, which requires highly trained operators, is also a challenge. Another limitation lies in the differences between devices, even those that use the same technology: with Scheimpflug imaging, for instance, scans obtained from the same patients with different devices are not easily interchangeable, which usually restricts a dataset to a single device type. One change in this scenario is the clever adaptation of smartphone cameras, which allows self-imaging of the cornea and the anterior segment, and even high-quality imaging at sub-cellular resolution [61, 62]. With the relatively low cost and wide spread of these portable devices, bigger corneal imaging datasets are more likely to be acquired, and the AI applications are numerous, with a high potential to promote healthcare in remote areas [63].

With big datasets, there is also the need for high computational power to evaluate the features in a reasonable time, which calls for the participation of big high-tech companies to make these models applicable in real life. AI models also need to be continuously trained and exposed to new data to be able to identify more subtle variations of the patterns and normal ethnic differences. Considering patient data confidentiality, an efficient data sharing system under strict privacy rules needs to be implemented to facilitate AI advancements.

The participation of high-tech companies to provide the needed computational power, and multicentric collaborations to gather big datasets along with efficient data sharing systems to constantly train the models, are vital steps towards improving the accuracy of AI models applied to medicine in general and to corneal diseases in particular.

Conclusion

In conclusion, machines have been used to augment the clinical ability of ophthalmologists, either by revealing characteristics initially imperceptible to our senses in the typical clinical exam or by aiding in the interpretation of the wealth of information that they themselves produce. Diagnostic indices from AI models are already available and widely used by clinicians in refractive surgery screening, and several other applications are under fast development.