1 Introduction

Coronaviruses are a large family of viruses known to cause illnesses that range from the common cold to more severe diseases, such as Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS). A new coronavirus emerged in 2019 in Wuhan, China. This virus is a new strain that had not been previously identified in humans [1]. The infection is transmitted by droplets when a person is in close contact (within a distance of 1 m) with someone who has respiratory symptoms (such as coughing or sneezing), which puts this person at risk of exposing the mucous membranes (mouth and nose) or conjunctiva (eyes) to potentially infective respiratory droplets. The infection may also be transmitted through contaminated objects in the environment surrounding the infected person [8]. Therefore, the virus that causes COVID-19 can be transmitted either by direct contact with infected individuals or by indirect contact with surfaces in the surrounding environment or tools used by the infected person (such as a stethoscope or thermometer). According to Medical News Today [9], as the COVID-19 pandemic continues to claim victims around the world, the need for early diagnosis of infections is increasing. Moreover, image data sources with expert-labeled data for COVID-19 detection are limited and insufficient. In addition, manual detection of COVID-19 is time-consuming. Medical image processing techniques have been involved in several applications in recent years [2,3,4,5,6,7]. Therefore, many computer-aided diagnostic (CAD) systems that depend on automated artificial intelligence (AI) methods have been designed to detect and differentiate COVID-19 [2, 10,11,12,13,14,15].

At present, recent advances in machine learning, particularly data-driven deep learning (DL) methods based on CNNs and ConvLSTMs, have shown promising performance in identifying, classifying, and quantifying disease patterns in medical images [3, 16,17,18,19,20,21], especially in the detection of COVID-19 [11, 12]. For instance, Rajaraman et al. [10] presented a deep learning algorithm using a CNN for COVID-19 detection. They evaluated their algorithm on chest X-ray (CXR) images and obtained accuracies of 0.5555 and 0.6536 in classifying the Twitter and Montreal COVID-19 CXR test data, respectively. Asif and Wenhui [11] developed a method for automatic prediction of COVID-19 using a deep convolutional neural network (DCNN) based on the Inception V3 model and chest X-ray images. They obtained more than 96% accuracy for the detection of COVID-19. Moura et al. [12] presented a deep learning method for the classification of COVID-19, pneumonia, and healthy chest X-ray radiographs. They obtained a best average validation accuracy of 0.9725. He et al. [22] presented a deep transfer learning method for COVID-19 diagnosis based on CT scans. They achieved an F1 score of 0.85 and an AUC of 0.94 in diagnosing COVID-19 from CT scans. Jelodar et al. [23] introduced a method for automated detection of COVID-19 using an LSTM recurrent neural network approach. Also, Yan et al. [24] presented an LSTM-based prediction model for confirmed COVID-19 cases. However, most of the previous deep learning approaches suffer from data irregularity, which usually results in mis-calibration between classes in the dataset. Also, most of these studies worked on a limited number of images [7, 25], which leads to low accuracy in the detection of COVID-19. In addition, these studies suffer from overfitting and regularization errors. The work in [26] presented a deep learning approach for COVID-19 detection based on a CNN and a ConvLSTM. The presented deep learning models were tested on both X-ray and CT biomedical images. The main idea in that work was to investigate the effect of performing several data augmentation techniques, such as rotation, flipping and resizing, as well as data augmentation based on conditional generative adversarial networks (CGANs). That work achieved an accuracy of 99% for X-ray images and 96% for CT images.

In this paper, we present an effective and robust solution for the detection of COVID-19 cases from small data based on CNN and ConvLSTM. Moreover, the proposed method reduces the overfitting and regularization errors through the utilization of an effective augmentation technique for COVID-19 recognition. The main contributions of this paper are as follows:

  • We propose novel and efficient modalities for the detection of COVID-19 based on CNN and ConvLSTM.

  • We introduce a novel dataset, which is a combination of CT and X-ray images in normal and COVID-19 cases.

  • We get a high accuracy of classification on small data based on CNN and ConvLSTM.

  • We present an effective augmentation technique for COVID-19 detection to reduce overfitting and regularization errors.

The rest of this paper is structured as follows. Section 2 describes in detail the proposed modalities. Section 3 presents experimentation, dataset and performance evaluation of the proposed modalities. Section 4 gives a discussion of the results. Section 5 gives the concluding remarks and the future work.

2 Deep learning modalities for augmented detection of COVID-19

This paper presents a diagnosis system for COVID-19 disease with deep learning. The proposed deep learning modalities are based on CNN and ConvLSTM. The objective is to design deep learning modalities that can extract feature maps from the input images and enroll these feature maps into a classification network to discriminate between the normal and abnormal states. The performance of these modalities is evaluated by their capability to discriminate between the normal and abnormal states with low false-alarm rates. So, the main contribution is the design of an efficient deep learning architecture. This architecture consists of a hierarchy of convolutional layers, pooling layers, and ConvLSTM layers. In addition, the classification network handles the feature maps generated from the deep learning architecture to decide whether the input images are normal or not.

The first proposed deep learning modality is based on CNN. This modality consists of five convolutional (CNV) layers followed by five pooling (PL) layers. This hierarchy is implemented to extract features from the input images and generate feature maps that are enrolled into the classification network. Each CNV layer in the deep learning architecture generates a feature map whose depth equals the number of digital filters included in this layer. In addition, the PL layer is implemented for feature reduction. It can be implemented with two methodologies (max pooling and mean pooling). The feature map is segmented into square windows of a certain size. The max pooling technique extracts the maximum value from each window, while the mean pooling technique extracts the mean value of each window. Furthermore, the classification network consists of two layers: a fully-connected layer and a classification layer. The fully-connected layer handles the feature map generated from the hierarchy of CNV and PL layers. This layer converts the 3D feature map into a feature vector that is enrolled into the classification layer, which determines whether the input image belongs to the normal or abnormal class. Figure 1 shows the proposed CNN model.

Fig. 1
figure 1

Architecture of proposed CNN modality
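For illustration, a minimal Keras sketch of this modality is given below. The paper does not report the filter counts, kernel sizes, pooling variant, or input resolution, so the values used here (16–256 filters, 3×3 kernels, 2×2 max pooling, 224×224 RGB inputs) and the alternation of CNV and PL layers are assumptions for the sketch only.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 3)):
    """Sketch of the CNN modality: five CNV/PL stages plus the classification network."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    for filters in (16, 32, 64, 128, 256):            # filter counts are assumed
        model.add(layers.Conv2D(filters, (3, 3), padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))        # max-pooling variant of the PL layer
    model.add(layers.Flatten())                       # 3D feature map -> feature vector
    model.add(layers.Dense(64, activation='relu'))    # fully-connected layer
    model.add(layers.Dense(1, activation='sigmoid'))  # classification layer: normal vs. COVID-19
    return model
```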

Another deep learning modality is proposed in this paper. It is a hybrid one that contains both ConvLSTM and CNN components. The ConvLSTM can be considered as the 2D version of the LSTM. The main idea of the LSTM is to remember the previous states when constructing the current state. This modality is a double-edged technique, because the current state depends on all previous states, so a drop in one state degrades the performance of the following states. Hence, this type of deep learning modality needs to be handled carefully and monitored in the training phase to correct any disorder that may occur. This deep learning modality consists of ten layers. The ConvLSTM layer is followed by a PL layer. Then, a set of three CNV layers followed by three PL layers is used. The classification network is the same as in the first deep learning modality. It can be observed that this model consists of fewer layers than the first deep learning modality based on CNNs. This modality is designed to reduce the complexity of the deep learning structure (Fig. 2).

Fig. 2
figure 2

Architecture of hybrid ConvLSTM-CNN modality
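Under the same assumptions as the previous sketch, the hybrid modality could look as follows; a singleton time axis is added so that the Keras ConvLSTM2D layer can process one image per sample, and the filter counts are again assumptions.

```python
from tensorflow.keras import layers, models

def build_convlstm_cnn(input_shape=(224, 224, 3)):
    """Sketch of the hybrid modality: ConvLSTM + PL, three CNV/PL stages, then the classifier."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Reshape((1,) + input_shape),             # add a time axis of length 1
        layers.ConvLSTM2D(16, (3, 3), padding='same'),  # returns the last hidden state
        layers.MaxPooling2D((2, 2)),
    ])
    for filters in (32, 64, 128):                       # filter counts are assumed
        model.add(layers.Conv2D(filters, (3, 3), padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))      # fully-connected layer
    model.add(layers.Dense(1, activation='sigmoid'))    # classification layer
    return model
```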

Both the first and second proposed modalities involve three phases (training, validation, and testing). The training phase needs an optimization methodology to update the weight values. The Adam optimizer is used to minimize the cross-entropy loss function [27,28,29,30].

The backpropagation algorithm is implemented to minimize the error between the real and estimated targets. This error is minimized based on the cross-entropy loss function, which penalizes the negative logarithm of the probability that the model assigns to the real target, as shown in Eq. 1:

$$\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\big[y_n \log(\hat{y}_n) + (1 - y_n)\log(1 - \hat{y}_n)\big] \qquad (1)$$

where $y_n$ is the real target of the $n$-th sample, $\hat{y}_n$ is the estimated target, and $N$ is the number of training samples.
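As an illustration, a hedged sketch of how this training setup could be expressed in Keras is given below; the learning rate is an assumption, since it is not reported in the paper, and `build_cnn`/`build_convlstm_cnn` refer to the sketches above.

```python
import tensorflow as tf

model = build_cnn()   # or build_convlstm_cnn(), from the sketches above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # learning rate is assumed
    loss='binary_crossentropy',                              # the cross-entropy loss of Eq. 1
    metrics=['accuracy'],
)
```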

3 Experimentation, dataset and performance evaluation

This section describes the datasets used in our study and discusses the experimentation and performance evaluation of the proposed modalities on these datasets. The proposed deep learning modalities are implemented on both CT and X-ray images. In addition, they are implemented on an augmented dataset. The proposed modalities are evaluated by the accuracy, F1 score and Matthews correlation coefficient (MCC). In addition, the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) are considered in the evaluation process. These metrics indicate whether a true infection case is estimated as an infection or misclassified as a normal case. These metrics may be useful for technicians.
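For concreteness, all of these metrics can be computed directly from the binary confusion-matrix counts (true positives TP, false positives FP, true negatives TN, false negatives FN), as in the following sketch.

```python
import numpy as np

def diagnostic_metrics(tp, fp, tn, fn):
    """Evaluation metrics derived from the binary confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    ppv         = tp / (tp + fp)              # positive predictive value (precision)
    npv         = tn / (tn + fn)              # negative predictive value
    f1          = 2 * ppv * sensitivity / (ppv + sensitivity)
    mcc         = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {'accuracy': accuracy, 'sensitivity': sensitivity, 'specificity': specificity,
            'ppv': ppv, 'npv': npv, 'f1': f1, 'mcc': mcc}
```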

3.1 Dataset description

The proposed deep learning modalities are tested on three datasets. The first dataset includes CT images [31]. This dataset consists of 288 COVID-19 images and 288 normal images. This dataset is augmented by several rotations and scaling operations. This augmentation process generates 2880 COVID-19 and 2880 normal images. The second dataset includes X-ray images [32]. This dataset consists of two different augmented subsets. Each subset consists of 304 COVID-19 images and 304 normal images.
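A possible implementation of this rotation- and scaling-based augmentation uses the Keras ImageDataGenerator; the rotation range, zoom range, normalization, and directory layout below are assumptions, since the paper does not specify them.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each original image is expanded on the fly into rotated / rescaled variants.
augmenter = ImageDataGenerator(
    rotation_range=20,     # range of random rotation angles in degrees (assumed)
    zoom_range=0.1,        # stands in for the scaling operations (assumed)
    rescale=1.0 / 255,     # normalize pixel intensities
)

# Assumed directory layout: one sub-folder per class (COVID-19 / normal).
train_generator = augmenter.flow_from_directory(
    'ct_dataset/train',
    target_size=(224, 224),
    batch_size=20,
    class_mode='binary',
)
```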

The third dataset is the COVID-19 radiography dataset [33]. This dataset includes X-ray images. In addition, the COVID-19 radiography dataset consists of three categories: COVID-19, normal and viral pneumonia. The proposed modalities are tested on this dataset to discriminate between the cases of COVID-19 and viral pneumonia. Furthermore, the proposed modalities are tested on a combined dataset [34] including the first and second datasets in order to combine both CT and X-ray images in normal and COVID-19 cases. Table 1 shows a description of each dataset.

Table 1 Brief description of the datasets

3.2 Results on CT dataset

The proposed deep learning modalities are tested on CT images. The dataset is split into 70% for training and 30% for testing. In addition, in the training phase, the dataset is fragmented into batches, where the batch size is 20. Furthermore, the number of training epochs is 40.
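A sketch of this experimental setup is given below, assuming the images and labels have already been loaded as arrays and that `model` is one of the networks sketched in Sect. 2; the validation fraction and random seed are assumptions.

```python
from sklearn.model_selection import train_test_split

# images: array of shape (N, 224, 224, 3); labels: array of shape (N,) with 0/1 targets.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.30, stratify=labels, random_state=42)

history = model.fit(
    X_train, y_train,
    validation_split=0.1,   # held-out fraction used for the validation curves (assumed)
    batch_size=20,          # batch size used in the paper
    epochs=40,              # number of training epochs used in the paper
)
test_loss, test_accuracy = model.evaluate(X_test, y_test)
```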

3.2.1 CNN modality

The proposed CNN deep learning modality is tested on the CT images for COVID-19 detection. The simulation results are evaluated by accuracy, F1 score and MCC. Figure 3 shows the evaluation metrics of the proposed CNN modality. In addition, Fig. 4a and b show the accuracy and loss curves of both training and validation phases. Furthermore, Fig. 4c and d show the confusion matrix and the ROC curve for the testing phase.

Fig. 3
figure 3

Evaluation metrics of the proposed CNN modality on CT images

Fig. 4
figure 4

Accuracy, loss, confusion matrix and ROC curve for the proposed CNN modality on CT images

3.2.2 ConvLSTM modality

The proposed ConvLSTM deep learning modality is tested on the CT dataset. Like the CNN modality, it is evaluated with accuracy of detection, F1 score and MCC. Figure 5 shows the values of evaluation metrics at different numbers of epochs. The proposed modality achieves an accuracy of 100%. This is attributed to the nature of the CT images and the efficient design of the deep learning modality for sufficient feature extraction. In addition, Fig. 6a and b show the accuracy and loss curves of the training and validation data, while Fig. 6c and d show the confusion matrix and ROC curve of the testing data.

Fig. 5
figure 5

Evaluation metrics of the proposed ConvLSTM modality on CT images

Fig. 6
figure 6

Accuracy, loss, confusion matrix and ROC curve for the proposed ConvLSTM modality on CT images

3.3 Results on augmented datasets

Both of the proposed deep learning modalities are tested on the augmented COVID-19 dataset. This dataset includes chest X-ray images. The augmentation process is performed through rotations with several rotation angles. This dataset includes two categories: COVID-19 and normal. The target of the proposed modalities is to discriminate between COVID-19 and normal cases.

3.3.1 Results on augmented dataset A

The proposed deep learning modalities are tested on the X-ray images included in the first augmented dataset (Dataset A). Figures 7 and 9 show the evaluation metrics for the CNN and ConvLSTM modalities, respectively, while Figs. 8 and 10 show the corresponding accuracy and loss curves, confusion matrices, and ROC curves.

Fig. 7
figure 7

Evaluation metrics of the proposed CNN modality on augmented dataset A

3.3.1.1 CNN modality

The proposed CNN modality is tested on the first X-ray augmented dataset. Figure 7 shows the evaluation metrics of the proposed CNN modality with different numbers of epochs. The simulation results reveal that the proposed CNN modality achieves an accuracy of 99.18%, an F1 score of 99.18%, and an MCC of 98.37%. Figure 8a and b show the accuracy and loss curves of the training and validation data. In addition, Fig. 8c and d show the confusion matrix and ROC curve for the testing data. The testing accuracy is 99%.

Fig. 8
figure 8

Accuracy, loss, confusion matrix and ROC curve for the proposed CNN modality on augmented dataset A

3.3.1.2 ConvLSTM modality

The proposed ConvLSTM modality is tested on the X-ray dataset. Figure 9 shows the values of the evaluation metrics with different numbers of epochs. The proposed ConvLSTM modality achieves an accuracy of 93.23% with 40 epochs. In addition, the proposed modality achieves an F1 score of 93.68% and an MCC of 87.27%. Figure 10a and b show the accuracy and loss curves of the training and validation data. Furthermore, Fig. 10c and d illustrate the confusion matrix and ROC curve of the proposed modality.

Fig. 9
figure 9

Evaluation metrics of the proposed ConvLSTM modality for augmented dataset A

Fig. 10
figure 10

Accuracy, loss, confusion matrix and ROC curve for the proposed ConvLSTM modality for augmented dataset A

3.3.2 Results on augmented dataset B

The proposed deep learning modalities are tested on the second augmented dataset (Augmented Dataset B), which includes X-ray images. Both the CNN and ConvLSTM modalities are tested on 608 X-ray images. This dataset consists of 304 COVID-19 images and 304 normal images.

3.3.2.1 CNN modality

The proposed CNN modality is tested on the second augmented dataset. Figure 11 shows the values of the evaluation metrics of the proposed CNN modality with different numbers of epochs. The simulation results reveal that the proposed CNN modality achieves an accuracy of 100% with 20, 30 and 40 epochs. Figure 12a and b show the accuracy and loss curves of the training and validation data. In addition, Fig. 12c and d show the confusion matrix and ROC curve of the testing data. The testing accuracy of the proposed modality is 100%.

Fig. 11
figure 11

Evaluation metrics of the proposed CNN modality for augmented dataset B

Fig. 12
figure 12

Accuracy, loss, confusion matrix and ROC curve for the proposed CNN modality for augmented dataset B

3.3.2.2 ConvLSTM modality

The proposed ConvLSTM modality is tested on the second augmented dataset. Figure 13 shows the simulation results with different numbers of epochs. The simulation results reveal that the proposed ConvLSTM modality achieves accuracies of 97.7%, 99.67%, 99.84% and 98.34% with 10, 20, 30 and 40 epochs, respectively. In addition, Fig. 14a and b show the accuracy and loss curves of the training and validation data. Furthermore, Fig. 14c and d show the confusion matrix and ROC curve of the testing data. The proposed ConvLSTM modality achieves a testing accuracy of 99%.

Fig. 13
figure 13

Evaluation metrics of the proposed ConvLSTM modality on the augmented dataset B

Fig. 14
figure 14

Accuracy, loss, confusion matrix and ROC curve for the proposed ConvLSTM modality on the augmented dataset B

3.3.3 Results of the combined dataset

Another scenario proposed in this paper is based on CT and X-ray images. The proposed deep learning modalities are tested on a combined dataset including both CT and X-ray images.

3.3.3.1 CNN modality

The proposed CNN modality is tested on the combined dataset. Figure 15 shows the simulation results of the proposed CNN modality with different numbers of epochs. The simulation results reveal that the proposed CNN modality achieves an accuracy of 98.97% with 40 epochs, while it achieves an F1 score of 98.65%. In addition, the proposed modality achieves an MCC of 97.76%. Figure 16a and b show the accuracy and loss curves of the training and validation data, while Fig. 16c and d show the confusion matrix and ROC curve of the proposed CNN modality. The proposed CNN modality achieves a testing accuracy of 99%.

Fig. 15
figure 15

Evaluation metrics of the proposed CNN modality on the combined dataset

Fig. 16
figure 16

Accuracy, loss, confusion matrix and ROC curve for the proposed CNN modality on the combined dataset

3.3.3.2 ConvLSTM modality

The proposed ConvLSTM modality is tested on the combined dataset. Figure 17 shows the simulation results of the proposed ConvLSTM modality with different numbers of epochs. The simulation results reveal that the proposed ConvLSTM modality achieves an accuracy of 98.45% with 40 epochs, while it achieves an F1 score of 98.07%. In addition, the proposed modality achieves an MCC of 96.81%. Figure 18a and b show the accuracy and loss curves of the training and validation data, while Fig. 18c and d show the confusion matrix and ROC curve of the proposed ConvLSTM modality. The proposed ConvLSTM modality achieves a testing accuracy of 98%.

Fig. 17
figure 17

Evaluation metrics of the proposed ConvLSTM modality on the combined dataset

Fig. 18
figure 18

Accuracy, loss, confusion matrix and ROC curve for the proposed ConvLSTM modality on the combined dataset

3.3.4 Results on COVID-19 radiography dataset

This paper presents a different scenario of COVID-19 detection. The detection process is based on discrimination between COVID-19 and pneumonia. This scenario is challenging, because it requires discrimination between two diseases with highly similar features rather than between normal and abnormal states as in the previous scenarios. Both of the proposed modalities are tested on the radiography dataset to discriminate between viral pneumonia and COVID-19 X-ray images.

3.3.4.1 CNN modality

The proposed CNN modality is tested on the radiography dataset in order to discriminate between viral pneumonia and COVID-19 states. Figure 19 shows the simulation results of the proposed modality with 10, 20, 30, and 40 epochs. The simulation results show that the proposed CNN modality achieves accuracies of 97.74%, 97.98%, 97.62%, and 94.41% with 10, 20, 30, and 40 epochs, respectively. In addition, the proposed modality achieves F1 scores of 97.59%, 97.82%, 97.48%, and 94.33% with 10, 20, 30, and 40 epochs, respectively. Furthermore, it achieves MCC values of 95.46%, 95.46%, 95.96% and 89.27% with 10, 20, 30 and 40 epochs, respectively. Figure 20a and b show the accuracy and loss curves of the training and validation data. In addition, Fig. 20c and d show the confusion matrix and ROC curve of the testing data. The proposed modality achieves a testing accuracy of 95%.

Fig. 19
figure 19

Evaluation metrics of the proposed CNN modality on radiography dataset

Fig. 20
figure 20

Accuracy, loss, confusion matrix and ROC curve for the proposed CNN modality on radiography dataset

3.3.4.2 ConvLSTM modality

The proposed ConvLSTM modality is tested on the radiography dataset in order to discriminate between viral pneumonia and COVID-19 states. Figure 21 shows the simulation results of the proposed modality with 10, 20, 30, and 40 epochs. The simulation results show that the proposed ConvLSTM modality achieves accuracies of 95.84%, 95.96%, 93.72%, and 85.02% with 10, 20, 30 and 40 epochs, respectively. In addition, the proposed modality achieves F1 scores of 95.55%, 95.64%, 93.91%, and 86.18% with 10, 20, 30, and 40 epochs, respectively. Furthermore, it achieves MCC values of 91.64%, 91.89%, 88.11% and 73.65% with 10, 20, 30 and 40 epochs, respectively. Figure 22a and b show the accuracy and loss curves of the training and validation data, while Fig. 22c and d show the confusion matrix and ROC curve of the testing data. The proposed modality achieves a testing accuracy of 88%.

Fig. 21
figure 21

Evaluation metrics of the proposed ConvLSTM modality on radiography dataset

Fig. 22
figure 22

Accuracy, loss, confusion matrix and ROC curve for the proposed ConvLSTM modality on the radiography dataset

4 Results and discussion

This paper includes a study of a COVID-19 diagnosis system using deep learning modalities. The proposed deep learning modalities are based on CNN and ConvLSTM. In addition, the proposed modalities are tested on both CT and X-ray images of COVID-19 and normal cases. The proposed CNN modality achieves accuracies of 100%, 99.18%, 100%, 98.91%, and 94.33% for CT images, augmented dataset (A), augmented dataset (B), the combined dataset, and the COVID-19 radiography dataset, respectively. On the other hand, the ConvLSTM modality achieves accuracies of 100%, 93.23%, 100%, 98.45%, and 85.02% for CT images, augmented dataset (A), augmented dataset (B), the combined dataset and the COVID-19 radiography dataset, respectively. Furthermore, the F1 score and MCC evaluation metrics are calculated in order to present a fair evaluation of the proposed deep learning modalities. Although the proposed modalities achieve high performance in some cases, they achieve poor performance in others, especially for the ConvLSTM modality. The reason for this poor performance is the dependency on previous states in the structure of the ConvLSTM modality. Generally, the proposed modalities reveal high performance in the detection of COVID-19. So, the proposed modalities can be considered in an efficient diagnosis system for COVID-19. Figure 23 shows the evaluation metrics of the proposed CNN modality for all datasets, while Fig. 24 shows the evaluation metrics of the proposed ConvLSTM modality.

Fig. 23
figure 23

Evaluation metrics of the training phase for CNN model

Fig. 24
figure 24

Evaluation metrics of the training phase for ConvLSTM modality

In addition, the proposed deep learning modalities are evaluated in the testing stage using the area under the ROC curve (AUC). The proposed CNN modality achieves testing accuracies of 100%, 99%, 100%, 99% and 95% for CT images, augmented dataset (A), augmented dataset (B), the combined dataset and the COVID-19 radiography dataset, respectively. On the other hand, the ConvLSTM modality achieves testing accuracies of 100%, 93%, 99%, 98% and 88% for CT images, augmented dataset (A), augmented dataset (B), the combined dataset and the COVID-19 radiography dataset, respectively.
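As an illustration, the ROC curve and AUC for the testing data can be computed with scikit-learn as sketched below, assuming `y_test` holds the true labels and `model` and `X_test` are as in the earlier sketches.

```python
from sklearn.metrics import roc_curve, auc

# Predicted probabilities for the positive (COVID-19) class on the test set.
y_score = model.predict(X_test).ravel()

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
print(f"Testing AUC: {roc_auc:.3f}")
```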

Table 2 shows a brief comparison between the proposed modalities and the works presented in the literature. This comparison is performed in two directions. The first direction considers the works that were tested on X-ray images. In this direction, the proposed deep learning modalities achieve an accuracy of 99%, while the previous works achieved accuracies in the range of 95 to 98%. The second direction considers the comparison between the proposed modalities and the previous works that were tested on CT images. The works in the literature achieved accuracies of 83% to 90.1%, while the proposed modalities achieve an accuracy of 99%. So, the proposed deep learning modalities can be considered in an efficient diagnosis system for the detection of COVID-19.

Table 2 Brief comparison between the proposed modalities and the traditional works

5 Conclusion and future work

This paper presented an effective and robust solution for the detection of COVID-19 cases based on two efficient deep learning modalities: CNN and ConvLSTM. Two different augmentation techniques have been utilized for COVID-19 detection in order to reduce overfitting and regularization errors. We introduced a novel dataset, which is a combination of CT and X-ray images with normal and COVID-19 cases. The proposed modalities have been tested on both CT and X-ray images of COVID-19 and normal cases. The proposed CNN modality achieved a high classification performance with accuracies of 100%, 99.18%, 100%, 98.91% and 94.33% for CT images, augmented dataset (A), augmented dataset (B), the combined dataset and the COVID-19 radiography dataset, respectively. On the other hand, the ConvLSTM modality achieved accuracies of 100%, 93.23%, 100%, 98.45% and 85.02% for CT images, augmented dataset (A), augmented dataset (B), the combined dataset and the COVID-19 radiography dataset, respectively. Hence, the proposed modalities can be considered in an efficient diagnosis system for the detection of COVID-19 and other relevant infections.