1 Introduction

Chromosome karyotyping is an important task in cytogenetics. Cytogeneticists perform it by segmenting and classifying images of individual human chromosomes obtained during the metaphase stage of cell division. A healthy human cell contains 22 pairs of autosomes and a single pair of sex chromosomes (XX in females, XY in males), giving a total of 23 pairs of chromosomes. Doctors examine the individual chromosomes and assign each one to one of 24 chromosome classes (the 22 autosomes plus X and Y) on the basis of differentiating characteristics such as banding pattern, centromere position and chromosome length. An example of a karyotyped image is shown in Fig. 1.

Fig. 1. An example of a karyotyped chromosome image.

In clinical labs, karyotypes provide doctors with diagnostic information about specific birth defects, genetic disorders and cancers, which mainly arise from structural changes such as chromosomal deletions, duplications, translocations or inversions. Manually segmenting and analyzing every chromosome for diagnosis consumes a considerable amount of time and depends heavily on expert knowledge. This motivates us to automate, or semi-automate, the karyotyping process so as to assist doctors and reduce their cognitive load by expediting the task. In this paper, we automate the classification stage of karyotyping, assuming that segmented and straightened individual chromosomes are available.

In recent years, researchers have shown interest in automating the karyotyping process and have proposed various machine learning and deep learning techniques [22, 23, 27, 31] with encouraging results. The main problem with these methods is that classifier performance deteriorates when the resolution of the chromosome images is very low. The unavailability of high-resolution images, together with the need for very high classification accuracy, led us to explore existing super-resolution techniques [11, 16, 30, 37] for low-resolution image classification.

Although image analysis would ideally use high-quality images, this is not always possible in practice because high-resolution images are often unavailable. Since classification becomes easier as image resolution increases, we take cues from the established literature on low-resolution image classification and propose an end-to-end deep learning framework for classifying low-resolution chromosome images. The proposed network integrates deep super-resolution layers with a standard classification network (e.g. the Xception network [8]); it is trained end to end, and we name it the Super-Xception network. Before feeding an individual chromosome image to the Super-Xception model, we apply a length-normalization pre-processing step to preserve the important distinguishing characteristics of the chromosome. The main contributions of this paper are as follows:

Fig. 2. Proposed architecture of the Super-Xception network for chromosome classification. A low-resolution image of size \(50 \times 50\) is given as input to the network and is resized to an image \(I^{LR}\) of size \(227 \times 227\) via bicubic interpolation. \(I^{LR}\) is then passed through convolutional super-resolution layers, which produce a high-resolution image \(I^{HR}\). Finally, \(I^{HR}\) is fed to a convolutional classification network (Xception [8]), which outputs a chromosome class label in the range 0–23.

  1. To the best of our knowledge, the proposed work is the first attempt to automate the classification of low-resolution chromosome images.

  2. We propose an end-to-end trainable Super-Xception network for automatic classification of low-resolution chromosome images. The architecture of the network is shown in Fig. 2.

  3. We experimentally verify that, for automatic classification of low-resolution chromosome images, the proposed Super-Xception network outperforms state-of-the-art networks such as the Deep Convolutional Neural Network (DCNN) [31], ResNet-50 [17] and Xception [8] on the publicly available Bioimage Chromosome Classification dataset [1, 29].

The remainder of the paper is organized as follows. Sect. 2 gives an overview of related work on chromosome karyotyping and image super-resolution. Section 3 describes the proposed methodology for automatic chromosome classification, followed by brief descriptions of the deep super-resolution layers and the Xception network in Sects. 3.1 and 3.2, respectively. In Sect. 4, we explain the architecture of the proposed Super-Xception network. Section 5 gives details of the dataset and the training setup and discusses the results obtained. Finally, Sect. 6 concludes the paper and discusses future directions.

2 Related Work

Cytogeneticists spend a considerable amount of manual effort and time on the karyotyping process, which involves segmenting individual chromosomes from metaphase cell-spread images and classifying the resulting chromosome segments into 24 classes. To reduce this cognitive load, aid doctors in chromosome analysis and accelerate karyotyping, the research community has developed many computational algorithms [5, 6, 27]. Much work has been carried out on automatic segmentation of overlapping chromosomes [3, 28] and on chromosome classification [13, 23, 26, 31], with encouraging results. Several techniques have also been developed for straightening bent chromosomes [21, 22] to improve classifier performance. However, to the best of our knowledge, no existing work addresses chromosome classification when the images are of low resolution. High-resolution chromosome images are generally difficult to obtain from hospitals and labs, which results in poor classifier performance. This motivated us to automate chromosome classification in scenarios where the chromosome images are of inferior quality. We assume that segmented and straightened individual chromosomes are provided.

Numerous super-resolution (SR) techniques exist for converting low-resolution (LR) images into high-resolution (HR) images. Existing SR algorithms can be grouped into four categories: image statistical methods [19], example-based methods [10, 11, 15, 16, 18, 30, 35, 37], prediction models [20] and edge-based methods [14, 34]. With the advancement of deep learning, researchers have also started employing convolutional neural networks (CNNs) for SR, and these outperform traditional state-of-the-art methods. Dong et al. [11] proposed the first convolutional neural network for image super-resolution (SRCNN), which learns a deep mapping between low- and high-resolution patches, and several variants of deep SR networks followed. To avoid up-scaling the input patches beforehand, [12] extends SRCNN [11] with a deconvolutional layer, which accelerates both training and testing. A very deep convolutional network is proposed in [24] that learns the mapping between the LR image and the residual between the HR and LR images, which speeds up the training of very deep networks. Kim et al. [25] use a deeply recursive layer to deepen the network without adding weight layers, so that the number of network parameters does not increase. In this paper, we borrow the idea of [7] to incorporate super-resolution layers into a convolutional classification network (i.e. the Xception network [8]). Experiments have shown that such SR-specific convolutional layers improve classification performance by recovering texture details from low-resolution images.

While there is a huge corpus of deep networks for image classification [17, 32, 33, 36], we chose the state-of-the-art Xception network [8] for chromosome classification. Traditional convolutional layers in CNNs learn feature maps in a 3D space (two spatial dimensions and one channel dimension): each convolutional layer maps spatial and cross-channel correlations simultaneously. In contrast, the depthwise separable convolutional layers used in the Xception network explicitly split this task into a sequence of operations that look at cross-channel and spatial correlations independently. This lets the network learn robust feature representations with fewer parameters.
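The parameter saving can be made concrete by counting the weights of a single layer. The sketch below is illustrative only: the layer shape (a \(3 \times 3\) kernel mapping 128 to 256 channels) is a hypothetical example rather than a setting taken from this paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution that mixes all input channels (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution (one filter per input channel)
    followed by a pointwise 1 x 1 convolution (biases ignored)."""
    depthwise = k * k * c_in   # spatial filtering, channel by channel
    pointwise = c_in * c_out   # cross-channel mixing at each pixel
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)
sep = separable_conv_params(3, 128, 256)
print(std, sep, round(std / sep, 1))  # prints: 294912 33920 8.7
```

For this shape the separable layer needs roughly 8.7 times fewer weights; in general the ratio is \(1/c_{out} + 1/k^2\), so the saving grows with the kernel size and the number of output channels.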

3 Proposed Methodology

This section gives an overview of the proposed method for automatic classification of low-resolution chromosome images. Because low-resolution images hinder the performance of image classifiers, our method improves the resolution of each image before classifying it: the higher the image resolution, the easier the classification and the better the classifier's performance. However, high-resolution chromosome images cannot always be obtained, which makes it difficult to automate chromosome classification. To alleviate this issue, we propose a network that first converts a low-resolution image into a higher-resolution version using convolutional super-resolution layers [7] and then passes the resulting high-resolution image to a convolutional classification network, such as the Xception network [8], to produce a chromosome class label. We name the proposed network Super-Xception; its architecture is shown in Fig. 2. The following subsections describe the convolutional super-resolution layers and the Xception network in more detail.

3.1 Convolutional Super-Resolution Network

The major difference between the proposed Super-Xception network and the conventional Xception network [8] is the addition of three convolutional super-resolution layers, following [7]. As a result, the Super-Xception network is deeper and has more parameters with which to capture information about the images. The main purpose of these super-resolution layers is to improve the resolution and recover the texture details of low-resolution images. The last layer of the super-resolution block produces a residual image, i.e. the difference between the high-resolution (HR) and low-resolution (LR) images. These layers learn effectively because the HR and LR images are largely similar, so predicting only the residual removes most of this shared content from the learning target [7]. CNNs can learn detailed information from residual images more easily than CNNs that map LR images directly to HR images [11, 12].
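The residual formulation can be illustrated on a toy patch. In the sketch below, the \(3 \times 3\) pixel values (8-bit grayscale integers) are hypothetical: the "LR" patch stands in for a bicubic-interpolated, blurred copy of the "HR" patch. The residual that the SR layers would have to predict is zero wherever the two images agree, and adding it back to the LR patch recovers the HR patch exactly.

```python
def residual_target(hr, lr):
    """Training target of the SR layers: the pixel-wise difference HR - LR."""
    return [[h - l for h, l in zip(hr_row, lr_row)]
            for hr_row, lr_row in zip(hr, lr)]


def reconstruct(lr, residual):
    """Recover the HR estimate by adding the (predicted) residual back to LR."""
    return [[l + r for l, r in zip(lr_row, res_row)]
            for lr_row, res_row in zip(lr, residual)]


# Hypothetical 3 x 3 patches: a sharp vertical edge (HR) and a blurred version (LR).
hr = [[50, 50, 230],
      [50, 50, 230],
      [50, 50, 230]]
lr = [[50, 90, 190],
      [50, 90, 190],
      [50, 90, 190]]

res = residual_target(hr, lr)
print(res[0])                      # [0, -40, 40]: non-zero only around the edge
print(reconstruct(lr, res) == hr)  # True
```

Because the flat region contributes nothing to the residual, the network's effort is concentrated on the edge and texture detail that interpolation failed to recover.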

3.2 Xception Network

The Xception network [8] is made up of depthwise separable convolutional layers, each consisting of a depthwise convolution (a spatial convolution performed independently for each channel) followed by a pointwise convolution (a 1 \(\times \) 1 convolution across channels). It is based on the hypothesis that the mapping of cross-channel correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled. This can be viewed as first looking for correlations in a 2D space and then looking for correlations in a 1D space; intuitively, this 2D + 1D mapping is easier to learn than a full 3D mapping. The Xception architecture [8] has 36 convolutional layers that act as the feature extractor, followed by a softmax layer for image classification. These 36 convolutional layers are structured into 14 modules, all of which have linear residual connections around them except for the first and last modules. In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections.
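The depthwise-then-pointwise decomposition can be sketched directly. The pure-Python forward pass below is a minimal illustration, not the Xception implementation: it uses "valid" padding and stride 1 for brevity (Xception itself pads its separable convolutions), omits biases and activations, and the input, kernels and mixing weights are hypothetical values.

```python
def depthwise_conv(x, kernels):
    """x: list of C channels, each an H x W list of lists.
    kernels: one k x k kernel per channel; channels are never mixed.
    'Valid' padding, stride 1, computed as cross-correlation like CNN libraries."""
    out = []
    for ch, ker in zip(x, kernels):
        k = len(ker)
        h, w = len(ch), len(ch[0])
        out.append([[sum(ch[i + a][j + b] * ker[a][b]
                         for a in range(k) for b in range(k))
                     for j in range(w - k + 1)]
                    for i in range(h - k + 1)])
    return out


def pointwise_conv(x, weights):
    """1 x 1 convolution: weights[o][c] mixes input channel c into output
    channel o, independently at every spatial position."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for wo in weights]


def separable_conv(x, depth_kernels, point_weights):
    # Spatial (2D) correlations per channel first, then cross-channel (1D) mixing.
    return pointwise_conv(depthwise_conv(x, depth_kernels), point_weights)


# Hypothetical 2-channel 3 x 3 input, 2 x 2 depthwise kernels, 3 output channels.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
depth_kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
point_weights = [[1, 0], [0, 1], [1, -1]]
y = separable_conv(x, depth_kernels, point_weights)
print(y)  # [[[6, 8], [12, 14]], [[2, 2], [2, 2]], [[4, 6], [10, 12]]]
```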

4 Architecture of Super-Xception Network

The proposed Super-Xception network, shown in Fig. 2, consists of two sub-networks: a super-resolution network and a classification network. The convolutional super-resolution layers, shown on the left of Fig. 2, recover the texture details of the low-resolution image before it is fed to the subsequent convolutional classification layers. The classification network then assigns a class label to the image. Because the network augments Xception with deep super-resolution layers, we name it the Super-Xception network.

The convolutional super-resolution layers take a bicubic-interpolated low-resolution image \({\varvec{I}}^{LR}\) (of the desired size) and learn a mapping \(\mathbf{g}({\varvec{I}}^{LR})\) from the LR image \({\varvec{I}}^{LR}\) to the residual image \({\varvec{I}}^{HR}- {\varvec{I}}^{LR}\), where \({\varvec{I}}^{HR}\) is the high-resolution version of the image. We use three stacked convolutional-ReLU layers as the super-resolution layers of the Super-Xception network. Following [7], their empirically chosen basic setting is \(f_1 = 9\times 9\), \(f_2 = 5\times 5\), \(f_3 = 5\times 5\), \(n_1 = 64\), \(n_2 = 32\) and \(n_3 = 1\), where \(f_m\) and \(n_m\) denote the size and number of the filters of the \(m^{th}\) layer, respectively. The output of the last convolutional layer of the super-resolution network is added to the interpolated low-resolution image \({\varvec{I}}^{LR}\) to construct the full super-resolution image \({\varvec{I}}^{HR} = {\varvec{I}}^{LR} + \mathbf{g}({\varvec{I}}^{LR})\), which is then fed into the remaining classification layers of the Super-Xception network.
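As a quick size check of this setting, the sketch below counts the parameters of the three SR layers on a single-channel (grayscale) input. Two details are assumptions not stated in the text: every layer has a bias per filter, and every layer uses "same" zero padding of \((f_m - 1)/2\) pixels per side, which keeps the output at \(227 \times 227\) so that it can be added to \({\varvec{I}}^{LR}\).

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution (one bias per filter assumed)."""
    return k * k * c_in * c_out + c_out


def sr_block_params(layers, c_in=1):
    """layers: (f_m, n_m) pairs, i.e. kernel size and number of filters, in order."""
    total = 0
    for k, n in layers:
        total += conv_params(k, c_in, n)
        c_in = n  # the next layer receives n_m channels
    return total


# Setting from the text: f = 9, 5, 5 and n = 64, 32, 1 on a grayscale input.
sr_layers = [(9, 64), (5, 32), (5, 1)]
print(sr_block_params(sr_layers))             # 57281
print([(k - 1) // 2 for k, _ in sr_layers])   # [4, 2, 2] pixels of 'same' padding
```

Under these assumptions the SR block adds about 57 thousand parameters (5248, 51232 and 801 for the three layers), most of them in the second layer, which is small compared with the Xception classification layers it precedes.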

We use the layers of the Xception network [8] as the classification layers of the Super-Xception network. The high-resolution image \({\varvec{I}}^{HR}\) is passed through the Xception layers, which learn a feature representation of the image, and a final softmax layer assigns a label in the range 0–23 based on the learnt feature representation.
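The final labelling step can be sketched as a softmax over 24 scores followed by an argmax. The logits below are hypothetical placeholders rather than outputs of the trained network, and the mapping from indices 0–23 to specific chromosomes is not specified in the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (class index in 0..23 with the highest probability, probabilities)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical scores for the 24 chromosome classes: class 6 scores highest.
logits = [0.0] * 24
logits[6] = 3.0
label, probs = predict_label(logits)
print(label)  # 6
```

Subtracting the maximum logit does not change the probabilities but prevents overflow in `math.exp` when scores are large.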

5 Experiments

This section is organized as follows. Sect. 5.1 describes the publicly available Bioimage Chromosome Classification dataset [1, 29]. Sect. 5.2 gives the training details of our experiments, and Sect. 5.3 discusses the results obtained and compares them with the baseline models.

5.1 Dataset

We conduct our experiments on the publicly available Bioimage Chromosome Classification dataset [1, 29]. This dataset contains a total of 5256 chromosome images of healthy patients, manually segmented and labeled by an expert cytogeneticist. We split these 5256 images into training, validation and test sets of 4176, 360 and 720 images, respectively. In our experiments, we set the resolution of the grayscale chromosome images to \(50\,\times \,50\) and interpolate them to the desired size of \(227\,\times \,227\). In addition, we apply length normalization [31] to every chromosome image in the dataset as a pre-processing step.
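The \(50 \times 50\) to \(227 \times 227\) up-sampling can be sketched with separable bicubic interpolation. The paper does not say which implementation it used, so several details here are assumptions: the Keys cubic-convolution kernel with \(a = -0.5\), pixel-centre alignment, and border pixels replicated at the image edges. Libraries differ in these choices, so their outputs may differ slightly from this sketch.

```python
import math


def cubic_weight(x, a=-0.5):
    """Keys cubic-convolution kernel; a = -0.5 is an assumed parameter."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0


def resize_1d(values, out_len, a=-0.5):
    """Resample a 1D signal to out_len samples using the 4 nearest inputs."""
    in_len = len(values)
    scale = in_len / out_len
    out = []
    for j in range(out_len):
        src = (j + 0.5) * scale - 0.5          # align pixel centres
        base = math.floor(src)
        acc = 0.0
        for k in range(base - 1, base + 3):
            idx = min(max(k, 0), in_len - 1)   # replicate border pixels
            acc += values[idx] * cubic_weight(src - k, a)
        out.append(acc)
    return out


def bicubic_resize(image, out_h, out_w, a=-0.5):
    """Separable bicubic resize of a 2D list: resample rows, then columns."""
    rows = [resize_1d(r, out_w, a) for r in image]
    cols = [resize_1d([r[j] for r in rows], out_h, a) for j in range(out_w)]
    return [[cols[j][i] for j in range(out_w)] for i in range(out_h)]


# A hypothetical 50 x 50 grayscale ramp upsampled to the 227 x 227 input size.
low_res = [[i + j for j in range(50)] for i in range(50)]
i_lr = bicubic_resize(low_res, 227, 227)
print(len(i_lr), len(i_lr[0]))  # 227 227
```

Because the four kernel weights always sum to one, flat regions keep their intensity; interpolation alone cannot, however, restore band detail lost at \(50 \times 50\), which is the gap the super-resolution layers are meant to fill.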

5.2 Training Details

We compared the performance of our model with baseline networks built on a traditional deep CNN [31], AlexNet [7], ResNet-50 [17] and Xception [8]. The deep CNN was trained with the Adam optimizer using a learning rate of \(10^{-4}\), with all other parameters set to their default values. For ResNet-50, we used stochastic gradient descent with a learning rate of \(10^{-3}\), momentum of \(10^{-6}\), decay parameter of 0.9 and the Nesterov flag set to true. The Xception and proposed Super-Xception networks were trained with the Adam optimizer using a learning rate of \(10^{-4}\), with all other parameters set to their default values. The deep CNN, ResNet-50, Super-AlexNet, Xception and Super-Xception models were trained for 150, 30, 100, 50 and 80 epochs, respectively. To obtain the best trained model, we evaluated on the validation set after every epoch and kept the model parameters with the lowest validation loss. The deep CNN, AlexNet and ResNet-50 networks were implemented using Theano [4] and Keras [9], while the Xception and Super-Xception models were implemented in TensorFlow [2] and Keras [9].
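The model-selection rule (keep the parameters from the epoch with the lowest validation loss) amounts to the loop below. The loss curve is a hypothetical example, not measured values from our experiments. In Keras, a comparable effect is usually obtained with `tf.keras.callbacks.ModelCheckpoint(filepath, monitor="val_loss", save_best_only=True)`; whether the original experiments used that callback is not stated.

```python
def best_epoch(val_losses):
    """Return (epoch, loss) with the lowest validation loss; epochs are 1-based.
    Ties keep the earliest epoch, as a checkpoint overwritten only on strict
    improvement would."""
    best_e, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:  # strict improvement: save a new checkpoint
            best_e, best_loss = epoch, loss
    return best_e, best_loss


# Hypothetical validation-loss curve that starts to overfit after epoch 4.
losses = [1.9, 1.2, 0.8, 0.65, 0.7, 0.65, 0.9]
print(best_epoch(losses))  # (4, 0.65)
```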

5.3 Results and Discussion

Table 1 shows the results of the experiments performed to evaluate our proposed network and the baseline networks. Row 1 of Table 1 shows the accuracy of a traditional deep CNN comprising six convolutional layers with 16, 16, 32, 64, 128 and 256 filters, respectively. Each convolutional layer uses Rectified Linear Units (ReLU) and is followed by a max-pooling layer of size \(2 \times 2\). The last convolutional layer is followed by two fully connected layers with 1024 and 512 hidden units and sigmoid activations. The output layer is a fully connected softmax layer with 24 units, each representing one of the 24 chromosome classes.
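To give a sense of this baseline's size, the sketch below traces feature-map sizes and counts parameters. Only the filter counts, the \(2 \times 2\) pooling, the hidden-layer widths and the 24 outputs come from the text. The rest is assumed for illustration: a \(227 \times 227\) grayscale input, \(3 \times 3\) "same"-padded convolutions with biases, and pooling with stride 2 that floors odd sizes.

```python
def trace_deep_cnn(input_size=227, kernel=3, filters=(16, 16, 32, 64, 128, 256),
                   hidden=(1024, 512), n_classes=24, in_channels=1):
    """Return (spatial size after each conv + pool stage, total parameter count).
    Assumes 'same'-padded k x k convolutions with biases and 2 x 2 max pooling
    with stride 2 (odd sizes are floored)."""
    size, c_in, params, sizes = input_size, in_channels, 0, []
    for n in filters:
        params += kernel * kernel * c_in * n + n  # conv weights + biases
        c_in = n
        size //= 2                                # 2 x 2 max pooling
        sizes.append(size)
    units = size * size * c_in                    # flattened feature vector
    for h in list(hidden) + [n_classes]:
        params += units * h + h                   # dense weights + biases
        units = h
    return sizes, params


sizes, params = trace_deep_cnn()
print(sizes)   # [113, 56, 28, 14, 7, 3]
print(params)  # 3292072
```

Under these assumptions, about 2.9 million of the roughly 3.3 million parameters sit in the fully connected layers rather than the convolutions.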

Table 1. Comparison of the classification accuracy of the proposed Super-Xception network with that of baseline deep networks for chromosome classification.

Row 2 of Table 1 shows the performance of ResNet-50, which is a minor improvement over the traditional deep CNN. Next, we perform classification with the Super-ResNet model, which augments ResNet-50 with the convolutional super-resolution layers. This network improves accuracy by \(\mathbf{2.91\% }\) over ResNet-50 (row 4 of Table 1). This improvement results from placing the convolutional super-resolution layers in front of the traditional classification network, which suggests that the poorer performance of the other baseline models is largely due to the low resolution of the chromosome images. This motivates us to use convolutional super-resolution layers before the classification network for chromosome classification.

Similarly, we implemented Super-AlexNet [7], proposed by Cai et al., which places convolutional super-resolution layers in front of the AlexNet model. Its performance is shown in row 3 of Table 1.

Next, row 5 of Table 1 gives the classification accuracy of the Xception network [8], which is a considerable improvement over the traditional deep CNN and ResNet-50. This encouraged us to combine the Xception network with the convolutional super-resolution layers in our proposed Super-Xception network.

Finally, row 6 of Table 1 shows the performance of our proposed Super-Xception network, which achieves the highest classification accuracy of 92.36% and outperforms all the baseline networks for automatic chromosome classification.
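As a consistency check on the reported figure, we can ask how many correct predictions a given accuracy corresponds to. This assumes, as the text implies but does not state explicitly, that accuracy is measured on the 720-image test set from Sect. 5.1. The helper below is hypothetical and simply searches for a count whose percentage rounds to the reported value.

```python
def count_for_accuracy(accuracy_pct, n_test):
    """Smallest number of correct predictions whose accuracy, in percent,
    rounds to accuracy_pct at two decimal places (None if no count does)."""
    for correct in range(n_test + 1):
        if abs(100 * correct / n_test - accuracy_pct) < 0.005:
            return correct
    return None


print(count_for_accuracy(92.36, 720))  # 665
```

On 720 test images each correct prediction is worth about 0.14 percentage points, so 92.36% corresponds to 665 correctly classified chromosomes (665 / 720 ≈ 92.361%).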

6 Conclusion

We began by explaining the need to automate chromosome classification in order to assist cytogeneticists in analyzing chromosome images and to save their valuable time. We then considered situations in which high-resolution chromosome images are unavailable, which reduces classifier accuracy. We therefore explored the use of convolutional super-resolution layers before feeding low-resolution chromosome images to a convolutional classifier. We showed experimentally that the super-resolution layers enhance image resolution and thereby improve classifier performance. We then proposed using the Xception network to classify chromosome images after the convolutional super-resolution layers. We evaluated the proposed architecture on the publicly available Bioimage Chromosome Classification dataset of healthy humans and compared its performance against several baseline classification networks; our network outperformed all the baselines evaluated for automatic chromosome classification. In future work, we plan to explore techniques for detecting structural abnormalities such as deletions, inversions and translocations in the chromosomes of patients, in order to diagnose birth defects and genetic disorders.