1 Introduction

Ethnopharmacology is the study of traditionally used drugs. Some approved drugs have been identified from ethnopharmacological data, e.g. the sodium salt of cromoglicic acid, a mast cell stabilizer [1]. Mast cells are master players in the type I hypersensitivity reaction to allergens, so they serve as a target site for the treatment of allergy [2]. After exposure to an allergen, sensitized mast cells initiate different intracellular signaling pathways. Finally, mediators such as histamine are released from the mast cells on degranulation and act on surrounding cells to produce the symptoms of allergy. Mast cell stabilizers are natural, semi-synthetic and synthetic compounds that can prevent the release of mediators from mast cells, but their mechanisms of action remain unknown.

Disodium cromoglycate, a chromone complex, was the first orally active antiallergic drug discovered to act as a mast cell stabilizer. This most commonly prescribed antiallergic drug acts prophylactically by inhibiting the release of mediators (histamine, leukotrienes) from mast cells [3, 4]. It probably interferes with the antigen-mediated calcium ion influx into mast cells, as it acts on calcium-activated potassium channel subunit alpha-1 [5, 6, 7]. However, the drug also binds other human targets, such as protein S100-P [8], which plays an important role in cellular calcium signaling; this binding may cause adverse effects.

Mastocytosis is an adverse reaction reported during administration of cromolyn sodium, as shown in drug databases such as DrugbindingDB. New oral, natural mast cell stabilizers that are cheaper and have longer half-lives are therefore urgently needed.

Several natural constituents obtained from different herbs have been identified as mast cell stabilizers, such as those of holy basil (Ocimum tenuiflorum) [9] and chamomile (Matricaria recutita) [10], flavonoids in peppermint [11], the rhizome of ginger [12] and polyphenols of apple [13]. Different classes of chemical compounds have also been analyzed in earlier work as natural mast cell stabilizers, including flavonoids [14], coumarins [15], phenols [16], terpenoids [17], tetrazoles [18] and amino acids [19].

In 1989, Unangst et al. [18] describe the inhibition of histamine release from human basophils by a group of Indolecarboxamidotetrazole compounds. They synthesize 31 derivatives. The percent inhibition (PI) of basophil histamine release for these 31 derivatives, along with two well-known inhibitors, nedocromil (CHEMBL 746) and cromolyn sodium (CHEMBL 74), is determined at a screening concentration of 10 µM. Among them, 16 compounds with PI values less than 50 are selected as the training set for our QSAR study. For this study, log (PI/(100 − PI)) is used as the biological activity.
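The activity transform above can be illustrated with a minimal Python sketch. It assumes that the logarithm is base 10 and that the expression is read as log (PI/(100 − PI)); neither is stated explicitly in the text, and the function names are hypothetical.

```python
import math


def logit_activity(pi_percent: float) -> float:
    """Return log10(PI / (100 - PI)) for a percent inhibition in (0, 100).

    Assumption: base-10 logarithm; the source does not state the base.
    """
    if not 0.0 < pi_percent < 100.0:
        raise ValueError("PI must lie strictly between 0 and 100")
    return math.log10(pi_percent / (100.0 - pi_percent))


def percent_inhibition(activity: float) -> float:
    """Invert the transform: recover PI (in percent) from the activity value."""
    return 100.0 / (1.0 + 10.0 ** (-activity))


if __name__ == "__main__":
    # Illustrative PI values only; these are not taken from Table 1.
    for pi in (10.0, 25.0, 49.0):
        print(pi, round(logit_activity(pi), 4))
```

Under this reading, compounds with PI below 50 have negative activity values, and the transform stretches the scale near 0 and 100 so that a linear model is not forced to predict a bounded percentage directly.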

Quantitative structure–activity relationship (QSAR) analysis is used to identify the mast cell stabilizing activity of different types of compounds [20, 21, 22]. This work is important because experimental methods are costly and time consuming. In this QSAR study of Indolecarboxamidotetrazole compounds, parameters of selected chemical properties are correlated with biological activity by mathematical equations. Quantitative parameters for the physicochemical, biological or toxicological properties of a molecule, derived by computational or experimental methods, are known as descriptors. Depending on the type of algorithm and the dimensionality of the descriptors, QSAR analysis can be classified into 2D-QSAR, 3D-QSAR and so on. Among them, 2D-QSAR analysis is the least time consuming, because its descriptors are simple and two dimensional, and it can be carried out with direct mathematical algorithms [23].

In models developed using the multiple linear regression (MLR) technique, the cross-validated coefficient (CV) defines the goodness of prediction, whereas the non-cross-validated conventional correlation coefficient (r²) defines the goodness of fit of the QSAR model [24].

As stated earlier, in 1989 Unangst et al. [18] synthesize a series of novel Indolecarboxamidotetrazole compounds and analyze their antiallergic potential. The thirty-one compounds in this series are derivatives of the compound shown in Fig. 1, with different substituents at the R1, R2 and R3 positions. Bioactivity data for 21 inhibitors of the IgE receptor α subunit in human basophils are collected [18] and subjected to descriptor determination [25]. First, various topological, electronic, geometrical and constitutional descriptors are calculated, and five of them are finally selected by the systematic search method [26]. Finally, a linear regression model is hypothesized from the selected descriptors, which will be useful for finding a structurally optimum inhibitor for mast cell stabilization.

Fig. 1. Chemical structure of Indolecarboxamidotetrazole.

Carboxyamidotriazole binds to and inhibits non-voltage-operated calcium channels, blocking both Ca²⁺ influx into cells and Ca²⁺ release from intracellular stores, resulting in the disruption of calcium channel-mediated signal transduction.

2 Material and Method

QSAR model development involves several steps. First, a dataset of similar chemical compounds with the same biological activity is identified. Then different types of descriptors are calculated, and the most suitable descriptors are identified by systematic search. Finally, QSAR models are constructed from the selected descriptors.

2.1 Data

A data set of twenty-one substituted Indolecarboxamidotetrazole compounds, which inhibit histamine release mediated by the IgE Fc receptor alpha-subunit in cells from allergic donors, is obtained from the ChEMBL database and the literature [27]. Following Unangst et al. [18], the chemicals are identified by compound keys, e.g. 8l, 3g, 13m. The chemical IDs, standard inhibition (PI) values and log (PI/(100 − PI)) values are presented in Table 1. The first sixteen compounds form the training set and the last five compounds form the test set for the QSAR study.

Table 1. Compound id and standard inhibition and their logarithmic values for twenty one inhibitors

2.2 Descriptor Generation and Calculation

In this study, the SMILES structures of all sixteen molecules, as obtained from ChEMBL [27], are used as inputs to calculate 44 different types of molecular descriptors, which fall into topological, electronic, constitutional and geometric classes for QSAR analysis. For our 2D QSAR study, we choose all 123 descriptors from the Chemistry Development Kit (CDK v 1.03), an open source Java library for chemoinformatics and bioinformatics [25]. In total, 1,968 descriptor values (123 descriptors × 16 compounds) are generated with CDK for our experiment. The structures of the inhibitors are shown in Fig. 2.

Fig. 2. Chemical structures of twenty-one inhibitors.

2.3 Descriptor Selection

Variable selection is carried out in the BuildQSAR software [28] by the systematic search method, with the controlling parameters set as follows. log (PI/(100 − PI)) is chosen as the biological activity, the correlation coefficient threshold is set to R > 0.84, and the results are cross-validated by the leave-one-out method. The number of descriptors per model is set to 3, because the ratio of the number of compounds to the number of descriptors should be ≥ 5 (16/3 ≈ 5.3). The twenty-one compounds are randomly divided into a training set, containing the first sixteen molecules, and a validation set, containing the last five.
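A systematic search of this kind can be sketched in Python as an exhaustive loop over descriptor subsets. This is only an illustration of the idea, not the BuildQSAR implementation: the descriptor matrix is synthetic, the names `multiple_r` and `systematic_search` are hypothetical, and the subset size of three is taken from the three-descriptor equations reported for the models.

```python
from itertools import combinations

import numpy as np


def multiple_r(X, y):
    """Multiple correlation coefficient R of an ordinary least-squares fit with intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return max(0.0, 1.0 - ss_res / ss_tot) ** 0.5


def systematic_search(D, y, names, k=3, r_min=0.84):
    """Fit every k-descriptor subset and keep those with R above r_min, best first."""
    hits = []
    for cols in combinations(range(D.shape[1]), k):
        r = multiple_r(D[:, list(cols)], y)
        if r > r_min:
            hits.append((r, tuple(names[c] for c in cols)))
    return sorted(hits, reverse=True)


# Synthetic stand-in data: 16 "compounds" and 6 made-up descriptors (d0..d5).
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 6))
names = [f"d{i}" for i in range(6)]
y = D[:, 0] - 0.5 * D[:, 2] + 0.2 * D[:, 4]  # activity built from d0, d2 and d4

hits = systematic_search(D, y, names)
print(hits[0])
```

With 123 descriptors the number of three-descriptor subsets is large but still enumerable, which is why an exhaustive search is feasible for a small model size.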

From the summary table of CDK descriptors (wiki.qspr.thesaurus.eu), the selected descriptors are described as follows. SPC-6 is a Chi path cluster descriptor, which belongs to the class of topological descriptors. This type of descriptor reflects the molecular connectivity of a compound without its geometry information; SPC-6 is the Kier & Hall Chi path cluster index of order 6. The second descriptor, MDEC-12, is a molecular distance edge descriptor for carbon and is also topological. BCUTw-1h is another topological descriptor; it is eigenvalue-based and noted for its utility in chemical diversity. SPC-5 is the Kier & Hall Chi path cluster index of order 5. XLogP is a constitutional descriptor that predicts logP by the atom-type method.

2.4 Model Development

Three models, each using three descriptors, are generated by the multiple linear regression method in the BuildQSAR software [28]. Each QSAR model is represented as a QSAR equation, with a coefficient calculated for each descriptor used in the regression model. The built models are evaluated for their power to predict activity as a mast cell stabilizer by plotting the experimental activity (Yexp) against the predicted activity (Ypred). The three linear models and their selected descriptors are shown in Table 2.

Table 2. Table for selected descriptors with statistical parameters for different models.

For the first model, the linear regression equation is

$$\begin{aligned} \log\left(\frac{PI}{100 - PI}\right) = {} & + 0.1798\,(\pm 0.0689)\,\text{SPC-6} - 0.2126\,(\pm 0.1115)\,\text{MDEC-12} \\ & - 0.0152\,(\pm 0.0062)\,\text{BCUTw-1h} - 1.5559\,(\pm 0.6567) \end{aligned}$$
$$(n = 16;\ R = 0.887;\ s = 0.159;\ F = 14.692;\ p = 0.0003;\ Q^{2} = 0.643;\ \text{SPress} = 0.205;\ \text{SDEP} = 0.184)$$

This equation is obtained from three topological descriptors: a Chi path cluster descriptor, a molecular distance edge descriptor and an eigenvalue-based descriptor. Here n is the number of molecules under analysis, R is the correlation coefficient, r² is the squared correlation coefficient, s is the standard deviation and F is the F-statistic. The cross-validated squared correlation coefficient, Q², is 0.643, and SPress, the standard deviation of the sum of squared differences between predicted and observed values, is 0.205.
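How a cross-validated Q² and SDEP arise from leave-one-out prediction can be sketched as follows. The sketch uses the common textbook definitions Q² = 1 − PRESS/SS_tot and SDEP = sqrt(PRESS/n); whether BuildQSAR uses exactly these conventions is an assumption, and the data and function names are synthetic placeholders rather than the values of Table 1.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X plus an intercept (intercept is the last entry)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def loo_q2(X, y):
    """Leave-one-out cross-validation: return (Q2, SDEP) for a linear model."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # drop compound i from the fit
        coef = fit_ols(X[keep], y[keep])
        preds[i] = np.append(X[i], 1.0) @ coef   # predict the left-out compound
    press = float(((y - preds) ** 2).sum())      # predictive residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - press / ss_tot, (press / n) ** 0.5


# Synthetic example: 16 compounds, 3 made-up descriptors, noisy linear activity.
rng = np.random.default_rng(1)
X = rng.normal(size=(16, 3))
true_coef = np.array([0.18, -0.21, -0.015])
y = X @ true_coef - 1.56 + rng.normal(scale=0.15, size=16)

q2, sdep = loo_q2(X, y)
print(f"Q2 = {q2:.3f}, SDEP = {sdep:.3f}")
```

Because each compound is predicted by a model that never saw it, Q² is always at most the fitted r² and can be negative for a model with no predictive value.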

The second QSAR equation is

$$\begin{aligned} \log\left(\frac{PI}{100 - PI}\right) = {} & + 0.3223\,(\pm 0.1372)\,\text{SPC-5} - 0.2040\,(\pm 0.1193)\,\text{MDEC-12} \\ & - 0.0135\,(\pm 0.0064)\,\text{BCUTw-1h} - 1.9672\,(\pm 0.8760) \end{aligned}$$
$$(n = 16;\ R = 0.867;\ s = 0.171;\ F = 12.104;\ p = 0.0006;\ Q^{2} = 0.570;\ \text{SPress} = 0.225;\ \text{SDEP} = 0.202)$$

This equation is also obtained from three topological descriptors. The cross-validated squared correlation coefficient, Q², is 0.570 and SPress is 0.225.

The third model is generated from two topological descriptors and one constitutional descriptor. Here the values of Q² and SPress are 0.564 and 0.227, respectively.

$$\begin{aligned} \log\left(\frac{PI}{100 - PI}\right) = {} & - 0.1717\,(\pm 0.1189)\,\text{MDEC-12} - 0.0161\,(\pm 0.0072)\,\text{BCUTw-1h} \\ & + 0.1774\,(\pm 0.0800)\,\text{XLogP} - 0.6188\,(\pm 0.4605) \end{aligned}$$
$$(n = 16;\ R = 0.855;\ s = 0.178;\ F = 10.910;\ p = 0.0010;\ Q^{2} = 0.564;\ \text{SPress} = 0.227;\ \text{SDEP} = 0.203)$$
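As an illustration of how the three equations would be applied, the following Python sketch stores the reported coefficients and evaluates a model for one compound. The descriptor values in the example are invented for demonstration and do not correspond to any compound in the data set; the names `MODELS` and `predict_activity` are hypothetical.

```python
# Coefficients and intercepts copied from the three regression equations above.
MODELS = {
    1: ({"SPC-6": 0.1798, "MDEC-12": -0.2126, "BCUTw-1h": -0.0152}, -1.5559),
    2: ({"SPC-5": 0.3223, "MDEC-12": -0.2040, "BCUTw-1h": -0.0135}, -1.9672),
    3: ({"MDEC-12": -0.1717, "BCUTw-1h": -0.0161, "XLogP": 0.1774}, -0.6188),
}


def predict_activity(model_id, descriptors):
    """Return the predicted log(PI/(100 - PI)) for one compound.

    `descriptors` maps descriptor names to values; extra keys are ignored.
    """
    coefs, intercept = MODELS[model_id]
    return intercept + sum(c * descriptors[name] for name, c in coefs.items())


if __name__ == "__main__":
    # Hypothetical descriptor values, chosen only to show the call.
    example = {"SPC-6": 5.0, "SPC-5": 3.0, "MDEC-12": 2.0,
               "BCUTw-1h": 12.0, "XLogP": 2.5}
    for model_id in MODELS:
        print(model_id, round(predict_activity(model_id, example), 4))
```

Keeping the coefficients in a dictionary keyed by descriptor name makes the sign of each contribution easy to inspect: only the Chi path cluster and XLogP terms enter with positive coefficients.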

3 Evaluating the Model

As stated earlier, in models developed using the multiple linear regression technique, the cross-validated coefficient (CV) defines the goodness of prediction, whereas the non-cross-validated conventional correlation coefficient (r²) defines the goodness of fit of the QSAR model [24].

3.1 On the Basis of Goodness of Fit

Among the three models, model 1 gives the equation of highest statistical quality (n = 16; R = 0.887; r² = 0.7860; adjusted R² = 0.7325; s = 0.159; F = 14.692; p = 0.0003; Q² = 0.643; SPress = 0.205; SDEP = 0.184). Models 2 and 3 have R values of 0.867 and 0.855 respectively, which are lower than that of the first model. Model 1 also has the lowest standard deviation (s = 0.159) of the three models. Its F value, the calculated value of the F-ratio test, is 14.692, which is the highest among all models (Table 3). So, on the basis of the R², s and F values, model 1 can be considered the best of the three models.
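The adjusted R² and F values quoted for model 1 can be cross-checked from r², n and the number of descriptors using the standard regression formulas. The short Python sketch below assumes three descriptors per model, as in the reported equations; the function names are illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k descriptors fitted to n compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """F-ratio of the regression: explained variance per descriptor
    over residual variance per remaining degree of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


if __name__ == "__main__":
    # Model 1: r2 = 0.7860, n = 16 compounds, k = 3 descriptors.
    print(round(adjusted_r2(0.7860, 16, 3), 4))
    print(round(f_statistic(0.7860, 16, 3), 3))
```

Both values agree with those reported for model 1 to within rounding, which supports reading each model as a three-descriptor regression on sixteen compounds.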

Table 3. MLR output for regression coefficient.

3.2 On the Basis of Predictive Power

Ranked by predictive power from best to worst, that is, by Q² in descending order, the models are 1, 2 and 3. Model 1 also has the minimum SDEP value, so it is again the best model on the basis of predictive power (Tables 4 and 5).

Table 4. Calculated and observed activity for model 1
Table 5. Residual table for model 1

4 Graphical Analysis

Graphical analysis is performed, and the graphs are shown in Figs. 3, 4, 5 and 6. The predicted log (PI/(100 − PI)) values are plotted against the observed values in Fig. 3. The predicted activity, log (PI/(100 − PI))pred, shows a linear relationship with the observed activity, log (PI/(100 − PI))obs, because the data fit the regression line well. The higher the value of r², the less likely it is that the relationship is due to chance.

Fig. 3. Predicted activity vs. observed activity.

Fig. 4. Observed activity vs. residual activity.

Fig. 5. Calculated activity vs. residual activity.

Fig. 6. Observed activity vs. calculated activity for validation set.

This QSAR investigation indicates that the descriptors SPC-6, MDEC-12 and BCUTw-1h contribute substantially to the biological activity of the set of Indolecarboxamidotetrazole inhibitors studied.

Graphs are also plotted for observed activity versus residual (Fig. 4) and predicted activity versus residual (Fig. 5). All the descriptors finalized in model 1 are topological descriptors.

5 External Validation

All three models are externally validated with a validation set of five compounds. On the basis of predictive power, model 1 is selected by internal validation. A validation set is constructed from the other five molecules, and external validation is done using the descriptors of model 1. The graph of observed activity versus calculated activity for the validation set shows that all compounds of this set are located symmetrically around the best-fit line (Fig. 6). Thus model 1 is externally validated with five compounds, with values of 0.980 and 0.9606 for the correlation coefficient R and the squared correlation coefficient r², respectively.

$$\begin{aligned} \log\left(\frac{PI}{100 - PI}\right) = {} & - 0.3700\,(\pm 2.0994)\,\text{SPC-6} + 0.8626\,(\pm 4.2198)\,\text{MDEC-12} \\ & + 123.0171\,(\pm 1003.2860)\,\text{BCUTw-1h} - 1967.2691\,(\pm 16043.2591) \end{aligned}$$
$$(n = 5;\ R = 0.980;\ s = 0.043;\ F = 8.120;\ p = 0.2512;\ Q^{2} = \text{Not Pred.};\ \text{SPress} = \text{Not Pred.};\ \text{SDEP} = \text{Not Pred.})$$

6 Conclusion

Innumerable QSAR models have been built in the last 50 years for the drug design of antimycobacterial agents [30], antituberculosis agents [31], acetylcholinesterase inhibitors [31] and estrogen receptor agonists and antagonists [32]. Mast cell stabilizers can act as inhibitors on human basophil cells, and thus they are potent antiallergic drugs. In earlier work, Unangst et al. in 1989 conclude that the N-phenyl analogue of indolecarboxamidotetrazole inhibits histamine release from human leukocytes after stimulation with anti-IgE antibody [18], compared to the N-H and N-methyl substituted compounds. These compounds are marked as compounds no. 7, 11 and 16 in our dataset respectively, and their activity calculated with the QSAR model correlates with their observed activity. In this QSAR model, developed by multiple linear regression (MLR) analysis, the cross-validated values of maximum Q² and minimum SDEP reflect the goodness of prediction, whereas the non-cross-validated conventional correlation coefficient (R²) defines the goodness of fit of the model. Based on QSAR model 1, the most predictive model, it can be inferred that inhibitory activity will be decreased by the following substituents: halogen substitution at the 5th position (R1) of the indole ring, and OEt or OCH(Me)2 substitution at R3 of the indole ring. The model is also valid for the other five compounds, which are carboxamidotetrazole derivatives of furan, thiophene, naphthalene and benzothiophene. So, it can be concluded that, irrespective of the nature of the substituents, the basic structure of these twenty-one compounds is responsible for their action as mast cell stabilizers.

We hope that, in our future work, the derived models and the effects of substituents can be used to search for further potential mast cell stabilizers from natural resources prior to experimental evaluation.