Background

In November 2019 a respiratory disease emerged that was later identified as being caused by a new member of the coronavirus family, a single-stranded RNA virus named SARS-CoV-2 by the International Committee on Taxonomy of Viruses [1, 2]. Since then, the virus that causes COVID-19 has spread to every country in the world [3]. The World Health Organization declared the outbreak a pandemic in March 2020 [4, 5], making it crucial to break the chain of transmission. As SARS-CoV-2 replicates, it can undergo mutations that lead to the emergence of different strains [6]. It is well established that the epidemiological and clinical features of COVID-19 can vary with the predominant strain [7]. In the early days of the outbreak, the disease was characterized mostly by cough, fever and dyspnea [8,9,10], but as the infection spread to more people in different regions worldwide, other symptoms became significant in diagnosis; less common symptoms, such as changes in smell and taste, can complicate the diagnosis of the disease [11]. Six main strains were identified in Iran: the initial strain, B.1.36, B.1.1.413, Alpha, Delta and Omicron. Each variant has presented with different characteristics; for example, Omicron has shown higher transmissibility than other variants [12]. Investigating the evolution of symptom patterns can help researchers understand the virus's behavior and track its progression. Several studies worldwide have examined changes in symptom patterns, some of which are referenced in this study [13,14,15,16]. The current research, conducted within Iranian society with a substantial number of participants, can provide valuable insights into the evolving symptom patterns in this geographical region. In this study, we aimed to determine the symptoms that predict a positive RT-PCR test result in each variant of COVID-19.
We considered settings in which testing equipment is limited and the use of costly tests must be reduced. We therefore included only characteristics observed at admission and did not consider laboratory or other paraclinical findings, so that the most probable cases could be identified promptly. Our study covered all variants of COVID-19 that caused an outbreak in Iran, including three Variants of Concern [6]. Our goal was to examine how clinical features varied between the periods in which different variants were dominant, using machine-learning techniques, namely logistic regression and artificial neural networks.

Methods

Study population and data sources

In this study, 46,747 cases were extracted from the hospital information system of 17 hospitals in South Khorasan province, Iran, covering the period from the first case identified in the region on 22 February 2020 to 23 February 2022. Over this period, six different variants, including three Variants of Concern (VOC) [6], caused outbreaks in the country. The initial outbreak was caused by the original strain of the novel coronavirus (designated herein as "the Initial variant"), which dominated from the beginning until 3 May 2020 and included 2933 patients. The B.1.36 variant then dominated until 5 September 2020, and B.1.1.413 from 6 September 2020 until 24 January 2021; 5548 and 10,563 suspected patients were admitted to the hospitals during these periods, respectively. The Alpha (B.1.1.7) VOC prevailed from 25 January 2021 to 9 June 2021, the Delta (B.1.617.2) VOC from 10 June 2021 until 9 January 2022, and finally Omicron (B.1.1.529) from 10 January 2022 [17, 18] until the date we carried out this study; these periods comprised 7540, 16,286 and 3877 patients, respectively. Diagnoses were confirmed by reverse transcription-polymerase chain reaction (RT-PCR) performed on viral RNA specimens obtained from throat and nasopharyngeal swabs collected from the respiratory tract of symptomatic individuals. Based on the PCR results, individuals were classified into two groups: SARS-CoV-2-positive patients and SARS-CoV-2-negative subjects.
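Because no strain-specific assay was available, each admission is attributed to a variant purely by its date. A minimal sketch of that date-to-variant mapping follows, assuming the period boundaries described in this section. The Alpha start date (25 January 2021) and the Delta start date (10 June 2021) are inferred from the end of the preceding period, and the names `VARIANT_PERIODS` and `assign_variant` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Start date of each dominance period, in chronological order.
# Assumption: Alpha and Delta start the day after the preceding period ends.
VARIANT_PERIODS = [
    (date(2020, 2, 22), "Initial"),
    (date(2020, 5, 4), "B.1.36"),
    (date(2020, 9, 6), "B.1.1.413"),
    (date(2021, 1, 25), "Alpha"),
    (date(2021, 6, 10), "Delta"),
    (date(2022, 1, 10), "Omicron"),
]
STUDY_END = date(2022, 2, 23)  # last day covered by the extracted records
_PERIOD_STARTS = [start for start, _ in VARIANT_PERIODS]


def assign_variant(admission_date: date) -> str:
    """Return the variant assumed to be dominant on the given admission date."""
    if not _PERIOD_STARTS[0] <= admission_date <= STUDY_END:
        raise ValueError(f"{admission_date} is outside the study period")
    # bisect_right locates the last period whose start is <= admission_date.
    index = bisect_right(_PERIOD_STARTS, admission_date) - 1
    return VARIANT_PERIODS[index][1]


print(assign_variant(date(2020, 5, 3)))   # Initial
print(assign_variant(date(2021, 6, 10)))  # Delta
```

Dominance periods are a proxy: any admission near a boundary may belong to either neighbouring strain, which is one reason the limitation on strain identification matters.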

Data were recorded from the history taken on admission. The recorded characteristics comprised demographic information (sex and age), a number of symptoms (cough, dyspnea, taste or smell disorder, fever, chills, headache, myalgia, sore throat and diarrhoea) and the most prevalent comorbidities (diabetes, cardiovascular diseases, chronic lung diseases, renal diseases, chronic liver diseases, malignancies and immunodeficiency diseases). In this study, fever was defined as a temperature of ≥ 38°C [19], measured with non-contact (infrared) thermometers at admission.
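Sex, age, the nine symptoms and the seven comorbidities listed here add up to eighteen admission variables. A minimal sketch of turning one admission record into such a feature vector follows. The record layout, field names and 0/1 coding are hypothetical, since the hospital information system's schema is not described. Deriving fever from the measured temperature rather than from self-report is our reading of the ≥ 38°C definition.

```python
# Hypothetical field names; the actual record schema is not described in the paper.
SYMPTOMS = [
    "cough", "dyspnea", "taste_smell_disorder", "fever", "chills",
    "headache", "myalgia", "sore_throat", "diarrhoea",
]
COMORBIDITIES = [
    "diabetes", "cardiovascular", "chronic_lung", "renal",
    "chronic_liver", "malignancy", "immunodeficiency",
]
FEVER_THRESHOLD_C = 38.0  # fever: temperature >= 38 °C at admission


def encode_admission(record: dict) -> list:
    """Return 18 inputs: sex, age, 9 symptom flags and 7 comorbidity flags."""
    reported = set(record.get("symptoms", ()))
    conditions = set(record.get("comorbidities", ()))
    features = [1 if record["sex"] == "male" else 0, float(record["age"])]
    for symptom in SYMPTOMS:
        if symptom == "fever":
            # Fever comes from the measured temperature, not from self-report.
            features.append(1 if record["temperature_c"] >= FEVER_THRESHOLD_C else 0)
        else:
            features.append(1 if symptom in reported else 0)
    features.extend(1 if c in conditions else 0 for c in COMORBIDITIES)
    return features


example = {"sex": "female", "age": 54, "temperature_c": 38.0,
           "symptoms": ["cough", "myalgia"], "comorbidities": ["diabetes"]}
print(encode_admission(example))
# [0, 54.0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```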

Statistical analysis

Quantitative variables were described using mean and standard deviation, and qualitative variables using frequency and percentage. The chi-square test was used to compare proportions between groups, the independent t-test to compare age between cases with positive and negative test results, and analysis of variance (ANOVA) to compare means across groups. To determine the adjusted associations between reported symptoms and a positive test result in each variant period, we used multiple logistic regression models with backward elimination. Logistic regression has been presented as a machine-learning solution to classification problems [20]. The results were further validated using a neural network.
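The text does not state the removal rule used in the backward procedure. The sketch below assumes a common one: fit the model, drop the predictor with the largest Wald p-value if it is at or above 0.05, and refit until every remaining predictor is significant. It fits logistic regression by Newton-Raphson in plain Python. It is not the Rattle/R implementation the authors used, and all function names and the toy data are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def _score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for a in range(p):
            grad[a] += (yi - mu) * xi[a]
            for c in range(p):
                info[a][c] += w * xi[a] * xi[c]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns coefficients, standard errors, Wald p-values."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = _score_and_information(X, y, beta)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(X, y, beta)
    p = len(beta)
    # Columns of the inverse information matrix give the covariance of beta.
    cov_cols = [_solve(info, [1.0 if r == c else 0.0 for r in range(p)]) for c in range(p)]
    se = [math.sqrt(cov_cols[j][j]) for j in range(p)]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    return beta, se, pvals


def backward_logistic(features, y, alpha=0.05):
    """Backward elimination on Wald p-values; index 0 of beta is the intercept."""
    kept = list(features)
    while True:
        X = [[1.0] + [features[name][i] for name in kept] for i in range(len(y))]
        beta, _, pvals = fit_logistic(X, y)
        if not kept:
            return kept, beta, pvals
        worst = max(range(len(kept)), key=lambda j: pvals[j + 1])
        if pvals[worst + 1] < alpha:
            return kept, beta, pvals
        kept.pop(worst)


# Toy data (not study data): positivity depends on x1 only; x2 is pure noise.
x1, x2, y = [], [], []
for a in (0, 1):
    for b in (0, 1):
        n_positive = 30 if a == 1 else 10  # 60% vs 20% positive among 50 per cell
        for k in range(50):
            x1.append(a)
            x2.append(b)
            y.append(1 if k < n_positive else 0)

kept, beta, pvals = backward_logistic({"x1": x1, "x2": x2}, y)
print(kept, round(math.exp(beta[1]), 3))  # ['x1'] 6.0
```

The exponentiated coefficient of a kept predictor is its adjusted odds ratio, which is the quantity reported in Table 3.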

We employed multilayer, feed-forward, supervised neural networks trained by backpropagation for prediction and modeling [21]. The network structure was chosen after testing various combinations of activation functions (hyperbolic tangent or sigmoid between the input and hidden layers; linear, hyperbolic tangent or sigmoid between the hidden and output layers), numbers of hidden layers (one or two), and numbers of neurons (2 to 10) in each hidden layer. We used 70% of the data for training and the remaining 30% for model evaluation [22]. The neural network analysis was used to support the regression results by estimating the importance of each factor for predicting a positive test in each variant. In both approaches, we fitted a separate model for each variant, numbered chronologically from 1 to 6. Statistical analyses were conducted using the Rattle package of the R statistical software (version 4.1.2) [23]. A p-value below 0.05 was considered statistically significant in all analyses.
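To make the training setup concrete, here is a minimal sketch of one configuration from the search space described above: a single hidden layer with tanh units, a sigmoid output, cross-entropy loss, per-sample backpropagation and a 70/30 train/evaluation split. It is written in plain Python rather than through Rattle. The hidden-layer size, learning rate, number of epochs, function names and toy data are all assumptions for illustration.

```python
import math
import random


def _forward(model, x):
    """Return hidden activations (tanh) and output probability (sigmoid)."""
    W1, b1, W2, b2 = model
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, n_hidden=4, lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer feed-forward network by per-sample backpropagation."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            h, out = _forward((W1, b1, W2, b2), X[i])
            d_out = out - y[i]  # cross-entropy gradient w.r.t. the output pre-activation
            d_hid = [d_out * W2[j] * (1.0 - h[j] ** 2) for j in range(n_hidden)]
            for j in range(n_hidden):
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for k in range(n_in):
                    W1[j][k] -= lr * d_hid[j] * X[i][k]
            b2 -= lr * d_out
    return W1, b1, W2, b2


def split_70_30(X, y, seed=0):
    """Random 70% training / 30% evaluation split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 7 // 10
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(model, X, y):
    """Share of cases whose thresholded output (0.5) matches the PCR label."""
    hits = sum((_forward(model, x)[1] >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(y)


# Toy data (not study data): the label is 1 if either binary feature is present.
X = [[i % 2, (i // 2) % 2] for i in range(100)]
y = [1 if a or b else 0 for a, b in X]
X_tr, y_tr, X_te, y_te = split_70_30(X, y)
model = train_mlp(X_tr, y_tr)
print(f"held-out accuracy: {accuracy(model, X_te, y_te):.2f}")
```

Looping `n_hidden` over 2 to 10 and keeping the configuration with the best held-out accuracy would resemble the structure search described, although choosing the structure on the evaluation split makes the reported accuracy somewhat optimistic.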

Results

Study participants and demographic results

Of the 46,747 individuals, 23,239 (49.7%) were men and 23,508 (50.3%) were women; the PCR test for COVID-19 was positive in 15,626 (33.4%). Among PCR-positive patients, 49.5% were male and 50.5% female. PCR-positive patients had a mean age of 51.48 ± 21.41 years (across all variants), whereas SARS-CoV-2-negative patients had a mean age of 46.39 ± 26.08 years. Of the SARS-CoV-2-positive group, 8% died, compared with only 1.1% of the negative group. As shown in Table 1, the mean age of patients in the Omicron and Delta periods was significantly lower than in the other variants. All studied characteristics except malignancy and immunodeficiency differed significantly between strains (P < 0.05).

Table 1 Between variants comparison of demographic characteristics and symptoms in people with positive test results in different variants

Symptom presentation and univariate analysis across variants

In the univariate analysis within each variant, age was significantly higher in cases with a positive test result in all variants. Fever was common among individuals both with and without a positive test result. However, in the B.1.1.413 period the odds of fever favoured patients with positive test results, whereas in the Omicron period patients with negative test results were more likely to have a fever (Table 2). Cough was an important predictor of COVID-19 in all variants except Omicron. Myalgia was the only symptom significantly associated with a positive test in every variant (p < 0.001). Dyspnea was more common in PCR-positive than in PCR-negative patients in the B.1.36, B.1.1.413 and Alpha periods. In the Alpha, Delta and Omicron periods, smell or taste disorder was significantly more frequent in PCR-positive patients. Chills, myalgia and smell or taste disorder were also more prevalent among patients in the Delta and Omicron periods (P < 0.001). Diarrhoea was not associated with a positive test in any variant.
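Each within-variant comparison of a binary symptom against the PCR result is a 2×2 table. The sketch below shows a Pearson chi-square test (without continuity correction) for such a table, together with the crude odds ratio and a Woolf log-scale 95% confidence interval. The function names and counts are hypothetical, and the paper does not state whether a continuity correction was applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    a: PCR-positive with symptom, b: PCR-positive without symptom,
    c: PCR-negative with symptom, d: PCR-negative without symptom.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    expected = [rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio ad/bc with a Woolf (log-scale) confidence interval."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical counts: 10 of 30 positives and 30 of 70 negatives report the symptom.
chi2, p = chi_square_2x2(10, 20, 30, 40)
or_, lo, hi = crude_odds_ratio(10, 20, 30, 40)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, OR = {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

An odds ratio below 1 corresponds to a symptom that is more frequent among PCR-negative patients, as with fever in the Omicron period.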

Table 2 Within variants comparison of demographic characteristics and symptoms

With regard to underlying conditions, diabetes, chronic lung disease, and cardiovascular, renal and chronic liver diseases were more prevalent in the early variants. Chronic lung and heart diseases were significantly more frequent among cases with a negative test result in most variants. Diabetes was a significant risk factor in the earlier variants (B.1.36, B.1.1.413 and Alpha), whereas it did not differ significantly between the positive and negative groups in the Delta and Omicron periods (Table 2).

Multivariate analysis by machine learning logistic regression approach

Based on the logistic regression models, the most prominent positive associations were myalgia (OR: 2.03; 95% CI, 1.6–2.57; p < 0.001, Model 1 (Initial); OR: 2.04; 95% CI, 1.76–2.36; p < 0.001, Model 2 (B.1.36)), cough (OR: 1.93; 95% CI, 1.68–2.22; p < 0.001, Model 2; OR: 1.81; 95% CI, 1.62–2.03; p < 0.001, Model 4 (Alpha)), taste or smell disorder (OR: 2.62; 95% CI, 2.1–3.28; p < 0.001, Model 5 (Delta)), headache (OR: 1.51; 95% CI, 1.26–1.79; p < 0.001, Model 2), chills (OR: 1.7; 95% CI, 1.43–2; p < 0.001, Model 4; OR: 1.71; 95% CI, 1.43–2.04; p < 0.001, Model 6 (Omicron)) and sore throat (OR: 1.6; 95% CI, 1.37–1.86; p < 0.001, Model 6). There were also a few significant negative associations: sore throat (OR: 0.6; 95% CI, 0.43–0.84; p = 0.003, Model 1), diarrhoea (OR: 0.48; 95% CI, 0.32–0.73; p = 0.001, Model 6; OR: 0.62; 95% CI, 0.47–0.81; p = 0.001, Model 4) and, surprisingly, fever in the Omicron period (OR: 0.78; 95% CI, 0.65–0.94; p = 0.009). Among demographic factors, each 10-year increase in age raised the odds of a positive test by 6–12% across the six periods (Table 3).
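Two small calculations help read these estimates. A Wald interval is symmetric on the log scale, so the standard error of log(OR) can be recovered from a reported 95% CI, and the OR is approximately the geometric mid-point of that interval. An odds ratio for a 10-year age difference is exp(10β), where β is the per-year coefficient, so a 6–12% increase in odds corresponds to per-decade ORs of about 1.06–1.12. The sketch below shows both calculations. The per-year coefficient and its standard error in the usage lines are hypothetical, and the function names are our own.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(OR) implied by a symmetric (log-scale) 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def rescale_or(beta_per_unit, se_per_unit, units, z=Z_95):
    """Odds ratio and 95% CI for a change of `units` in a continuous predictor."""
    b, s = units * beta_per_unit, units * se_per_unit
    return math.exp(b), math.exp(b - z * s), math.exp(b + z * s)


# Myalgia, Model 1: OR 2.03 (95% CI, 1.6–2.57).
print(round(math.sqrt(1.6 * 2.57), 2))     # 2.03, the geometric mid-point of the CI
print(round(se_from_ci(1.6, 2.57), 3))     # 0.121, implied SE of log(OR)

# Hypothetical per-year age coefficient: odds rise 1% per year of age.
or10, lo10, hi10 = rescale_or(math.log(1.01), 0.002, units=10)
print(round(or10, 3))                      # 1.105, i.e. about 10% higher odds per decade
```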

Table 3 Estimation of the simultaneous odds ratio of predictive symptoms for different COVID-19 strains based on a multiple regression model

Determining the importance of predictive symptoms by Artificial Neural Network (ANN)

Eighteen independent variables from all 46,747 patients were used to build the ANN, with a positive or negative RT-PCR result as the output class in each variant. In the model for the Initial strain (accuracy 81.7%), age, sore throat, myalgia, cough and diarrhoea were the most important factors (Table 4). The importance of age increased in subsequent variants, and age remained the top predictor. Besides age, the most prominent predictors were myalgia in B.1.36 (model accuracy: 77.3%); cough, myalgia and fever in B.1.1.413 (59.6%); diarrhoea and, to a lesser extent, taste/smell disorder in Alpha (75.7%); taste or smell disorder, cough and diarrhoea in Delta (69.2%); and chills and diarrhoea in Omicron (62.5%) (Table 4; for more details, see the Supplementary material).
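The section does not state how the importance values in Table 4 were computed. One common, model-agnostic option is permutation importance: shuffle one input column and measure how much the held-out accuracy drops. The sketch below implements that technique as a swapped-in illustration, not necessarily what the authors' software reported. The function names, the toy model and the toy data are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Share of cases where the predicted class matches the label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each input column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example (not study data): the model uses feature 0 and ignores feature 1.
X = [[i % 2, (i // 2) % 2] for i in range(200)]
y = [row[0] for row in X]
imp = permutation_importance(lambda x: x[0], X, y)
print([round(v, 2) for v in imp])
```

A feature the model ignores scores exactly zero, whereas shuffling a feature it relies on drives accuracy towards chance. Unlike regression odds ratios, this score carries no direction, which is why a negative predictor such as diarrhoea can still rank as highly important.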

Table 4 Estimation of the importance level of symptoms as predictors of disease for each strain based on a neural network model

Discussion

The study analyzed a large amount of data from inpatient populations who exhibited signs and symptoms indicative of COVID-19 upon admission. The objective was to identify the most predictive characteristics associated with each SARS-CoV-2 variant responsible for outbreaks in Iran and South Khorasan province over a two-year period. During this time, there were six outbreaks attributed to six distinct strains of SARS-CoV-2.

Of particular significance, our analysis found that fever (defined as a temperature of ≥ 38°C), which has commonly been included in a triad of symptoms (alongside dyspnea and cough) used to diagnose COVID-19 [8, 9, 24], was positively associated with a positive test only in the B.1.1.413 variant. Conversely, we found a significant negative association between fever and the Omicron variant. In a study by Mousavi et al., normal body temperature was observed in patients during the period corresponding to the first five waves of our study [25]. The absence of fever in such patients may, however, reflect the use of antipyretics before seeking medical attention.

We included underlying diseases only in the univariate analyses. Contrary to expectation, most comorbidities, especially chronic lung disease, were more common among SARS-CoV-2 PCR-negative patients, suggesting an inverse relationship between chronic lung disease and the likelihood of testing positive for COVID-19. Godbout et al. showed that people with underlying diseases maintained fewer contacts because they perceived themselves to be at risk of COVID-19 complications [26, 27]. Guntur et al. found that chronic respiratory diseases (asthma, interstitial lung disease (ILD) and chronic obstructive pulmonary disease (COPD)) were not associated with a positive SARS-CoV-2 PCR test, and Roland et al. also found chronic lung disease to be less common among COVID-19-positive patients [28,29,30,31]. Diabetes was a significant risk factor for the earlier variants of SARS-CoV-2, but it does not appear to be a major risk factor for the Delta and Omicron variants.

Myalgia, age (with risk increasing for every ten years of age) and cough were consistently identified as significant characteristics across all variants. Notably, the association between myalgia and a positive PCR result weakened as SARS-CoV-2 evolved: it was strongest in the initial and B.1.36 variants and weaker in later variants. In contrast, the association between age and a positive test result mostly increased over time. Although cough had a higher predictive value based on adjusted odds ratios, age was consistently the most important characteristic across all six variants in the neural network analyses. Myalgia and cough, together with older age, have frequently been reported as signs of COVID-19 in various studies [32,33,34,35,36,37,38,39]. Notably, dyspnea was not a significant factor even at the onset of the pandemic. This may be because our data were limited to the time of admission, whereas dyspnea typically develops later in the course of the disease.

Multiple studies have reported loss of smell or taste as one of the most specific symptoms associated with SARS-CoV-2 positivity [8, 11, 28, 40,41,42,43,44]. In our study, when sufficient data were available, as in the Alpha and Delta periods, taste or smell disorder showed strong associations in our machine-learning models, which were further supported by the neural network analyses. Vihta et al. also reported a strong association between loss of taste/smell and COVID-19; they found that loss of taste/smell was reported most often during the Delta period, followed by the wild-type and Alpha periods [43]. Owing to a lack of public awareness, taste/smell disorders were not recorded during the first three waves of our study. Hawkes has suggested that the actual prevalence of taste/smell impairment may be much higher than that reported by patients [11]. Other studies have reported that loss or change of taste/smell was indicative of the Alpha strain [45, 46] and have confirmed its lower predictive strength compared with the Delta strain [44].

Sore throat and fever were the only symptoms in our study that showed inconsistent associations across variants, each being significant in only two variants. In our neural network modeling, sore throat was the second most important factor after age in the Initial variant. Notably, whereas sore throat was statistically significant for predicting Omicron, taste/smell impairment showed no such relationship for Omicron after adjustment for other factors in the machine-learning models. This finding is consistent with studies reporting an increase in sore throat and a reduction in taste/smell disorder in Omicron cases [44, 47,48,49].

In this study, headache was significant during the second surge of the disease (B.1.36), but its predictive strength decreased as the virus evolved, and it was no longer significant in Omicron. In the Omicron period, headache, like taste/smell disorder, differed significantly between COVID-19 and non-COVID-19 patients in the univariate analysis, but after adjustment for other factors in the multivariable model it did not remain a predictor for Omicron. This is consistent with other studies that have reported headache as a less prominent symptom in Omicron [44, 50]. For instance, Ekroth et al. found that although crude proportions suggested similar rates of headache in Delta and Omicron infections, after adjustment headache was more strongly associated with Delta [48].

Diarrhoea, a gastrointestinal symptom present from the early weeks of the pandemic, was initially reported to be more common among PCR-negative individuals [8, 9, 51]. In our study, it emerged as a significant characteristic only in the last three variants, where it acted as a negative predictor of COVID-19, consistent with other reports [47, 48, 52].

Based on our two approaches, machine-learning logistic regression and neural network modeling, we identified significant predictors for each variant evaluated in this study. For the initial variant, older age (per 10 years), sore throat (negatively), myalgia and cough were significant predictors. For the B.1.36 variant, age, myalgia, cough and headache were significant predictors. In the B.1.1.413 variant, age, cough, myalgia, fever and headache were significant predictors. For Alpha, age, diarrhoea (negatively), taste or smell impairment, cough, myalgia, headache and chills were significant predictors. For the Delta strain, age, taste or smell disorder, cough and diarrhoea (negatively) were significant predictors, while for the Omicron variant, age, chills, diarrhoea (negatively) and sore throat were significant predictors.

Overall, age, myalgia, cough and taste or smell disorder were the most reliable factors for predicting a positive test result in the first five variants. For the Omicron variant, however, chills and the absence of diarrhoea were more decisive than cough or myalgia. Notably, the Omicron variant, which is still circulating as a VOC, is atypical compared with previous variants [6]. The three supposedly common symptoms of COVID-19, namely cough, fever and dyspnea, did not emerge as positive predictors for the Omicron variant. This change in symptoms in Omicron [49, 53] may be linked to the effect of vaccination [47], although we did not consider vaccination status in our study.

In conclusion, our study provides valuable insights into the symptomatology of different COVID-19 variants and could be useful for early diagnosis and management of the disease.

To the best of our knowledge, this paper is one of the few studies to date to have examined multiple COVID-19 variants and their clinical characteristics using multivariable modeling. In addition to the modeling approaches, the key strengths of this study include its large cohort and the fact that RT-PCR tests were performed for everyone with suspected COVID-19 or respiratory symptoms, regardless of their chief complaint. However, our study had some limitations. First, the symptoms collected were mostly self-reported. Second, no strain-specific RT-PCR testing was available, so variants were distinguished by the periods in which each was dominant. Third, owing to the limited sensitivity of the tests, some subjects with false-negative PCR results were treated as COVID-19 patients on the basis of other diagnostic methods, such as lung CT scans and clinicians' judgment; in this study, however, they were classified as non-COVID patients.

Conclusion

The present study investigated the clinical symptoms, comorbidities and demographics of all symptomatic patients suspected of COVID-19 in South Khorasan province from the emergence of the disease to the time this study was undertaken. The results indicate that older age, myalgia, cough and taste or smell disorder are better predictors of COVID-19 than dyspnea or high body temperature. As the disease evolves, symptoms such as chills and diarrhoea gain predictive strength, as in the case of Omicron. These findings can be used to stratify next steps, such as isolation or hospitalization, according to patients' clinical condition, and to trace their contacts. Care is needed not to miss patients by relying only on classic signs and symptoms, and this study could help both healthcare professionals and the general public recognize COVID-19 infection early.