1 Introduction

Nearly 8% of the population currently lives with a genetic disorder caused by gene mutations, which can affect any part of the body and its functionality. Approximately a third of people with genetic disorders present more serious symptoms that compromise their physical and mental well-being. About 3 to 6% of babies are born with a genetic disease or a birth defect. In addition, 1 to 3% of people worldwide have an intellectual disability, and more than 20% of infant deaths are caused by genetic conditions or congenital defects. Genetic disorders can therefore be lethal or require major medical care, and they affect people of all ages, genders and ethnic groups [6].

Furthermore, 30 to 40% of genetic syndromes present facial and cranial abnormalities, which help physicians diagnose certain disorders, such as Angelman syndrome, Down syndrome and Williams syndrome, among others. Although more than 6,000 genetic disorders are known, only a few people with a suspected syndrome receive a clinical diagnosis [6]. In this work, we propose and evaluate a strategy for recognizing patterns of facial abnormalities associated with different genetic disorders.

Several studies in the literature have investigated facial abnormalities in images of patients with genetic disorders. Loos et al. [16] presented a computer-based recognition of dysmorphic faces to describe facial patterns among five syndromes, extracting features through Gabor wavelet transformations. Similarly, Boehringer et al. [1] applied a set of Gabor wavelet filters at facial landmarks to identify facial abnormalities of ten syndromes. Vollmar et al. [19] analyzed the impact on recognition accuracy of increasing the number of syndromes, and described the accuracy gains obtained from geometric features and from their combination with texture features. Ferry et al. [12] proposed an approach that extracts phenotype information from a combination of shape and texture features to recognize eight syndromes, performing syndrome recognition through both supervised and unsupervised learning methods.

The remainder of the paper is organized as follows. Section 2 describes the methodology proposed in this work, composed of preprocessing, feature extraction, feature reduction and classification stages. Section 3 presents and evaluates the experimental results. Section 4 concludes the paper with final remarks.

2 Methodology

The proposed methodology for genetic syndrome recognition is composed of four main stages: preprocessing, feature extraction, feature reduction, and classification. These steps are illustrated in Fig. 1 and explained as follows.

Fig. 1. Main steps of the genetic syndrome recognition methodology.

2.1 Preprocessing

The image preprocessing procedure is crucial for the genetic syndrome recognition task, since its primary purpose is to provide aligned and cropped faces. This stage consists of five steps: (i) automatic landmark detection with the Dlib library [15]; (ii) extraction of the eye coordinates; (iii) face alignment based on the eye coordinates; (iv) cropping of the face region with a fitted bounding rectangle; and (v) scaling of the images to \(224 \times 224\) pixels.
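For concreteness, the sketch below illustrates these five steps with Dlib and OpenCV. It assumes Dlib's standard 68-point shape predictor; the predictor file name and the eye landmark indices 36–47 are assumptions of this sketch, not specifications from our pipeline.

```python
import cv2
import dlib
import numpy as np

# Assumed predictor file and landmark indices (Dlib's 68-point model,
# where indices 36-41 and 42-47 are the two eyes).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def preprocess(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    face = detector(gray)[0]                      # (i) detect face and landmarks
    pts = np.array([(p.x, p.y) for p in predictor(gray, face).parts()], float)

    left_eye, right_eye = pts[36:42].mean(0), pts[42:48].mean(0)  # (ii) eye centers

    # (iii) rotate so that the line between the eyes becomes horizontal
    angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                                  right_eye[0] - left_eye[0]))
    cx, cy = (left_eye + right_eye) / 2.0
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    aligned = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

    # (iv) crop a bounding rectangle fitted to the rotated landmarks
    rot = cv2.transform(pts[None], M)[0].astype(np.int32)
    x, y, w, h = cv2.boundingRect(rot)
    crop = aligned[y:y + h, x:x + w]

    return cv2.resize(crop, (224, 224))           # (v) rescale
```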

2.2 Feature Extraction

Three feature types – Deep Features, a Geometric Representation and Oriented FAST and Rotated BRIEF (ORB) [18] – were extracted and fused for genetic syndrome recognition.

We employed a Deep Convolutional Neural Network (CNN) based on the very deep architecture with the triplet loss function [17], trained on 2.6 million images of 2622 identities. The model receives \(224 \times 224\)-pixel images as input and consists of \(3\times 3\) convolution kernels with stride 1, each followed by a rectified linear unit (ReLU), and three fully connected layers; it does not use local contrast normalization. The deep features are extracted from the cropped facial images through this CNN, forming a feature vector of length 2622.
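As an illustration, this descriptor can be extracted with OpenCV's DNN module, assuming the publicly released Caffe version of the VGG-Face model; the file names and the mean BGR values below are assumptions of this sketch.

```python
import cv2

# Assumed file names for the pretrained VGG-Face Caffe model.
net = cv2.dnn.readNetFromCaffe("VGG_FACE_deploy.prototxt", "VGG_FACE.caffemodel")

def deep_features(face_224):
    """Return the 2622-dimensional output of the network's last layer."""
    # Mean BGR values commonly reported for VGG-Face (an assumption here).
    blob = cv2.dnn.blobFromImage(face_224, 1.0, (224, 224),
                                 mean=(93.594, 104.762, 129.186))
    net.setInput(blob)
    return net.forward().flatten()
```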

Fig. 2. Feature extraction: (a)–(b) the twenty-seven facial fiducial points detected; (c) the twenty-one distances extracted from the twenty-seven landmarks.

A geometric representation is created from the detected facial fiducial points. The geometric descriptor employs twenty-seven 2D facial landmarks: two points for the outer corners of the eyebrows (\(P_{1}\), \(P_{11}\)), two points for the inner corners of the eyebrows (\(P_{5}\), \(P_{7}\)), six points for the middle of the eyebrows (\(P_{2}\), \(P_{3}\), \(P_{4}\), \(P_{8}\), \(P_{9}\), \(P_{10}\)), one point for the glabella (\(P_{6}\)), four points for the inner and outer corners of the eyes (\(P_{12}\), \(P_{13}\), \(P_{15}\), \(P_{16}\)), one point for the root of the nose (\(P_{14}\)), one point for the supratip (\(P_{17}\)), two points for the alar sidewalls (\(P_{18}\), \(P_{19}\)), one point for the subnasale (\(P_{20}\)), two points for the mouth corners (\(P_{23}\), \(P_{24}\)), two points for the top and bottom of the upper lip (\(P_{21}\), \(P_{22}\)), two points for the top and bottom of the lower lip (\(P_{25}\), \(P_{26}\)), and one point for the gnathion (bottom of the chin) (\(P_{27}\)). Figure 2(a)–(b) shows the localization of the twenty-seven facial landmarks.

Thirty-one geometric features are extracted from the twenty-seven fiducial points. Of these, twenty-one are distances, illustrated in Fig. 2(c), which are normalized by the face width so that the features are scale invariant. The eleven distances \(d_{2}\), \(d_{3}\), \(d_{4}\), \(d_{5}\), \(d_{6}\), \(d_{9}\), \(d_{10}\), \(d_{13}\), \(d_{14}\), \(d_{15}\) and \(d_{21}\) represent the averages of the two mirrored distances on the left and right sides of the face. The distance \(d_{14}\) is computed using the intersection of the line between the points on the top of the upper lip and the bottom of the lower lip and the line between the left and right corners of the mouth. From the upper lip thickness \(d_{16}\) and the lower lip thickness \(d_{18}\), we calculate their ratio (\(R_{\textit{lips}}=d_{16}/d_{18}\)).
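A minimal sketch of the distance features is shown below. The specific landmark pairs used for each \(d_{i}\) follow Fig. 2(c), so the pairings here are illustrative assumptions rather than the exact definitions.

```python
import numpy as np

def dist(p, q):
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def distance_features(P, face_width):
    """P maps landmark indices 1..27 to (x, y); the pairs are illustrative."""
    d7  = dist(P[13], P[15]) / face_width      # intercanthal distance
    d12 = dist(P[20], P[21]) / face_width      # philtrum length
    d16 = dist(P[21], P[22]) / face_width      # upper lip thickness
    d18 = dist(P[25], P[26]) / face_width      # lower lip thickness
    # Mirrored distances are averaged over both sides of the face,
    # e.g. the two palpebral fissure lengths for d6:
    d6  = (dist(P[12], P[13]) + dist(P[15], P[16])) / (2.0 * face_width)
    r_lips = d16 / d18                         # lip thickness ratio
    return d6, d7, d12, d16, d18, r_lips
```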

We also computed the curvature of both eyebrows from their fiducial points (\(P_{1}\), \(P_{2}\), \(P_{3}\), \(P_{4}\), \(P_{5}\), \(P_{7}\), \(P_{8}\), \(P_{9}\), \(P_{10}\), \(P_{11}\)). For each eyebrow, we obtained the three coefficients and the discriminant of the fitted curve, yielding eight geometric features in total. Moreover, we calculated the subnasale angle \(\alpha _{\textit{subnasale}}\) defined by the three points (\(P_{18}\), \(P_{20}\), \(P_{19}\)), where \(P_{20}\) is the vertex and \(P_{18}\) and \(P_{19}\) are the extreme points of the angle. Together, these form a geometric feature vector of total length 31.
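The sketch below shows one plausible implementation, assuming a quadratic fit per eyebrow (three coefficients plus the discriminant \(b^2-4ac\)) and the angle at \(P_{20}\) between the vectors to \(P_{18}\) and \(P_{19}\).

```python
import numpy as np

def eyebrow_curvature(points):
    """Fit y = ax^2 + bx + c to one eyebrow's landmarks and return
    the three coefficients plus the discriminant b^2 - 4ac."""
    xs, ys = zip(*points)
    a, b, c = np.polyfit(xs, ys, 2)
    return a, b, c, b * b - 4 * a * c

def subnasale_angle(p18, p20, p19):
    """Angle (degrees) at the subnasale P20 subtended by the alar points."""
    u = np.asarray(p18, float) - np.asarray(p20, float)
    v = np.asarray(p19, float) - np.asarray(p20, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```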

Individuals with Cornelia de Lange syndrome frequently present a long philtrum, that is, an increased distance between the subnasale and the upper lip; the distance \(d_{12}\) represents this feature. They also have thin lips, described by the distance \(d_{16}\) and the ratio \(R_{\textit{lips}}\) [8], and usually exhibit a short upturned nose, captured by the angle \(\alpha _{\textit{subnasale}}\) and the nose distances \(d_{9}\) to \(d_{11}\). People with Progeria syndrome, on the other hand, show a beak-shaped nose, which is also described by the aforementioned nose features. Another common symptom of Progeria syndrome is a shrunken chin, represented by the distance \(d_{19}\) [14].

People with Down syndrome often present telecanthus, an increased intercanthal distance, that is, the space between the inner corners of the eyes; the distance \(d_{7}\) represents this symptom. Another frequent sign is a flattened nose, covered by the subnasale angle and the nose distances. A small mouth is also common and is described by the mouth distances \(d_{13}\), \(d_{14}\), \(d_{15}\), \(d_{17}\) and \(d_{20}\). The presence of upslanting palpebral fissures, the openings between the lateral and medial canthi of the eyes, is also a symptom captured by the distance \(d_{6}\) [5, 10].

For Treacher Collins syndrome recognition, it is essential to consider the chin length \(d_{19}\), since the most common clinical feature is a small lower jaw and chin. Another typical characteristic is a small upper jaw, represented by the philtrum distance \(d_{12}\). Moreover, like the palpebral fissure anomaly in Down syndrome, Treacher Collins syndrome presents short and down-slanting palpebral fissures (\(d_{6}\)) [3]. This symptom also characterizes Apert syndrome, whose individuals further present a broad, short nose with a bulbous tip, described by the subnasale angle and nose distances [4].

People with Angelman syndrome have a prominent chin, a feature captured by the distance \(d_{19}\). They also have a wide mouth, characterized by the mouth distances [2]. People with Williams syndrome also present a wide mouth but show a small chin. Another Williams syndrome feature is a short nose with a broad nasal tip, which can be represented by the nose distances. As in Cornelia de Lange syndrome, Williams syndrome features a long philtrum (\(d_{12}\)) [11]. Finally, people with Fragile X syndrome have a prominent forehead, a long and thin face, and a large jaw [13], which are covered by several facial distances.

ORB is a feature descriptor that combines the Features from Accelerated Segment Test (FAST) keypoint detector and the Binary Robust Independent Elementary Features (BRIEF) descriptor, proposed as a fast and efficient alternative to the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) in terms of computational cost and matching performance. For generating ORB features, we selected sixteen previously detected facial keypoints (numbered here independently of the twenty-seven geometric landmarks): two points for the middle of the eyebrows (\(P_{1}\), \(P_{2}\)), four points for the inner and outer corners of the eyes (\(P_{3}\), \(P_{4}\), \(P_{6}\), \(P_{7}\)), one point for the root of the nose (\(P_{5}\)), one point for the supratip (\(P_{8}\)), two points for the alar sidewalls (\(P_{9}\), \(P_{10}\)), one point for the columella (\(P_{11}\)), two points for the mouth corners (\(P_{13}\), \(P_{14}\)), two points for the top of the upper lip and the bottom of the lower lip (\(P_{12}\), \(P_{15}\)), and one point for the bottom of the chin (\(P_{16}\)). Figure 2(a)–(b) shows the localization of the sixteen facial points. An ORB feature vector is then computed for each keypoint, and the resulting ORB descriptor is formed by concatenating the vectors of all keypoints.
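A sketch of this step with OpenCV is given below; the keypoint patch size is an assumed parameter.

```python
import cv2

orb = cv2.ORB_create()

def orb_features(gray, landmarks, patch_size=31.0):
    """Compute one 32-byte ORB descriptor per landmark and concatenate them.
    `patch_size` (the keypoint diameter) is an assumed parameter."""
    kps = [cv2.KeyPoint(float(x), float(y), patch_size) for x, y in landmarks]
    kps, desc = orb.compute(gray, kps)
    return desc.flatten()
```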

For several computational problems, it has been demonstrated that recognition accuracy can be enhanced by fusing feature descriptors [20]. After extracting the deep, geometric and ORB features, they are combined into a fused feature vector of length 3069, which is then scaled.
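Concretely, the fusion step reduces to concatenation followed by standardization, as sketched below. The 416-dimensional ORB part is inferred from the stated total (3069 - 2622 - 31), and the random data is a placeholder.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def fuse(deep, geometric, orb):
    """Concatenate the three descriptors into one fused vector."""
    return np.concatenate([np.ravel(deep), np.ravel(geometric), np.ravel(orb)])

# Placeholder data; the ORB length 416 is inferred from 3069 - 2622 - 31.
rng = np.random.default_rng(0)
fused = np.vstack([fuse(rng.random(2622), rng.random(31), rng.random(416))
                   for _ in range(10)])
X = StandardScaler().fit_transform(fused)     # scale the fused vectors
```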

2.3 Feature Reduction and Classification

We followed two approaches for feature reduction: PCA and PCA+LDA. In the first, Principal Component Analysis (PCA) is applied to the fused feature vectors, yielding the principal feature vectors. In the second, Linear Discriminant Analysis (LDA) is additionally applied to the PCA-reduced vectors, forming a new, further reduced feature space.
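In scikit-learn terms, the two reduction approaches look as follows; the number of PCA components and the placeholder data are assumptions of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X_train = rng.standard_normal((800, 3069))    # placeholder fused features
y_train = rng.integers(0, 9, 800)             # placeholder class labels

# First approach: PCA alone (n_components is an assumed parameter).
pca = PCA(n_components=100).fit(X_train)
X_pca = pca.transform(X_train)

# Second approach: LDA on the PCA-reduced vectors, which yields at most
# n_classes - 1 discriminant directions.
lda = LinearDiscriminantAnalysis().fit(X_pca, y_train)
X_pca_lda = lda.transform(X_pca)
```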

For the classification stage, we compared Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), Logistic Regression (LR) and Gaussian Naïve Bayes (GNB) classifiers on the genetic syndrome recognition task. After dimensionality reduction, the classifiers are trained on the reduced feature vectors. We applied the same validation protocol as the one used with the syndrome dataset of Ferry et al. [12], splitting the data into training and testing sets at a 4:1 ratio and averaging classification accuracy over 10 repeats.
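A sketch of this protocol is shown below; the default classifier hyperparameters and the placeholder data are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.standard_normal((1686, 100))          # placeholder reduced features
y = rng.integers(0, 9, 1686)                  # placeholder class labels

classifiers = {"SVM": SVC(), "K-NN": KNeighborsClassifier(),
               "LR": LogisticRegression(max_iter=1000), "GNB": GaussianNB()}

# Ten random stratified 4:1 splits, averaging test accuracy per classifier.
splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
for name, clf in classifiers.items():
    accs = [clf.fit(X[tr], y[tr]).score(X[te], y[te])
            for tr, te in splitter.split(X, y)]
    print(f"{name}: {np.mean(accs):.3f}")
```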

3 Experimental Results

The proposed methodology was tested on the Diagnostically Relevant Facial Gestalt Information from Ordinary Photos database, collected by Ferry et al. [12]. This dataset comprises 1499 ordinary, spontaneous photographs of patients diagnosed with one of eight genetic syndromes: Angelman (205), Apert (200), Cornelia de Lange (250), Down (197), Fragile X (163), Progeria (150), Treacher Collins (103), and Williams-Beuren (231). The photographs exhibit facial pose variations, illumination deviations, low resolution, different backgrounds, and occlusions such as glasses, hair, scarves and hand gestures. The subjects are of different ethnicities, genders, and ages, including children, adolescents, and adults.

To obtain images of healthy controls, we employed the Dartmouth Database of Children’s Faces [7]. This dataset contains 40 male and 40 female children between 6 and 16 years of age, photographed from different angles and under different lighting conditions while performing eight facial expressions: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. From this collection, we randomly selected 187 images, that is, the average number of images per genetic syndrome class, considering females and males with distinct pose deviations, facial expressions, and illumination variations. We made this choice because children present the same facial musculature as adults [9].

Our dataset is therefore composed of \(1686 (=1499+187)\) images in total, on which we verify our recognition method. We conducted experiments using our geometric representation, deep features, ORB features and their fusion through the following approaches: PCA+K-NN, PCA+LDA+K-NN, PCA+SVM, PCA+LDA+SVM, PCA+LR, PCA+LDA+LR, PCA+GNB, and PCA+LDA+GNB.

We validated our methodology following the same protocol used with the Diagnostically Relevant Facial Gestalt Information from Ordinary Photos database [12]: we randomly selected 80% of the samples of each class for the training set and the remaining 20% for the testing set. The results are reported in Table 1 as average recognition accuracy rates over ten repeats.

Table 1. Average accuracy (%) using our geometric, ORB and deep features, and the fusion between them on the evaluated dataset.

From our experiments, we observe that fusing the deep features with the geometric and ORB representations provides the highest accuracy for genetic syndrome recognition. The deep features alone reach an accuracy of about 85%, which is far superior to using the geometric or ORB representation independently. Following a PCA+LDA approach further increases the recognition rates. Table 2 shows the best accuracy rates achieved with the fused feature set proposed in our approach and with the methodology developed by Ferry et al. [12], both obtained using the validation protocol of Ferry et al. [12]. The proposed approach reaches competitive results.

Table 2. Comparison of average accuracy rates (%) for genetic syndrome recognition.

4 Conclusions

Experimental results demonstrated that the geometric and ORB representations enhance the discriminative power of the deep features. Our approach also proved robust for recognizing genetic syndromes in ordinary photographs in the presence of variations such as facial expressions, glasses, facial pose deviations and occlusions. Although neither the geometric nor the ORB representation provides a high recognition rate individually, their fusion with the deep features achieved a higher accuracy rate. Furthermore, the PCA and PCA+LDA approaches, by reducing the feature space and selecting discriminative features, significantly increased the recognition rates.