QSAR and Molecular docking studies of 4-anilinoquinoline-triazine hybrids as pf-DHFR inhibitors

A quantitative structure-activity relationship (QSAR) investigation was performed on 41 4-anilinoquinoline-triazine hybrids as potential antimalarial agents. The study was carried out using multiple linear regression (MLR) with descendant selection and artificial neural networks (ANN). Quantum chemical descriptors were calculated using the DFT-B3LYP method with the 6-31G basis set. The correlation coefficients of 0.87 and 0.92 obtained by MLR and ANN, respectively, indicate a good predictive quality of the established model. In addition, the model was confirmed by several validation methods: leave-one-out (LOO) cross-validation, Y-randomization, and external validation. The observed activity and the structural features of the studied molecules were further examined by a molecular docking study on both the wild type and the quadruple mutant of the pf-DHFR protein, which was used to study the binding modes and the key protein-ligand interactions. This methodology will be used to design new antimalarial drugs.


Introduction
Despite the scientific advances made in medicinal chemistry, malaria continues to cause the death of huge numbers of human beings. According to the World malaria report 2018, which draws on data from 90 countries and areas, there were an estimated 219 million cases and 435 000 related deaths in 2017 1,2. Malaria is caused by protozoa of the genus Plasmodium 3. Five species infect humans: P. falciparum, P. vivax, P. malariae, P. ovale, and P. knowlesi 4. Among these five species, Plasmodium falciparum is the most severe and deadly 5. For this reason, the development of new drugs able to fight malaria is still of great interest. Indeed, 1,3,5-triazine derivatives such as cycloguanil, chlorcycloguanil, clociguanil, and WR99210 are already recognized as effective dihydrofolate reductase (DHFR) inhibitors. They selectively inhibit the biochemical processes that are vital for parasite growth 6. However, the development of resistance to antimalarial drugs such as chloroquine, amodiaquine, artemisinin, and antifolates has become a serious health concern 7. To overcome this problem, the concept of hybrid molecules has been introduced as one of the most widely used solutions, in which two or more pharmacophores are linked together and act by simultaneously inhibiting two conventional targets 8. Herein, we report the molecular modeling of 4-anilinoquinoline-triazine derivatives 9, in continuation of our ongoing research on the molecular modeling of antimalarial activity 10. We aim in this study to develop a predictive QSAR model, which will be used to analyze the antimalarial activity of a series of 41 hybrid 4-anilinoquinoline-triazine derivatives (Table 1). The proposed quantitative models rely on the multivariate statistical analysis of experimental results published previously 11. The MLR and ANN methods served to establish a predictive model of antimalarial activity. Moreover, internal and external validations, as well as Y-randomization, were used to test the reliability of the built model. In addition, molecular docking was used to analyze the interactions of the hybrid systems with the active sites of the protein. Thus, we performed the docking of two isomers, para 11 and meta 38, against Plasmodium falciparum dihydrofolate reductase (Pf-DHFR) in its two forms, wild type and quadruple mutant 12.

Experimental Data
The experimental data for the 4-anilinoquinoline-triazine derivatives were collected from previously reported work 9 and are listed in Table 1. A total of 41 derivatives of 4-anilinoquinoline-triazine were studied to correlate the antimalarial activity with the structure of the target molecules. The observed activity (IC50, nM) was converted to a logarithmic scale, logIC50 (Table 1). The studied set was divided into two groups, numbered from 1 to 33 and from 34 to 41, in which the triazine framework is linked to the 4-anilinoquinoline moiety via a nitrogen atom in the para and meta position, respectively. The general structure of the 4-anilinoquinoline-triazine hybrids is represented in Figure 1.
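The conversion of IC50 to logIC50 can be illustrated with a minimal sketch. It assumes a decimal (base-10) logarithm, which the text does not state explicitly, and the IC50 values below are hypothetical, not taken from Table 1.

```python
import math

def log_ic50(ic50_nm: float) -> float:
    """Return the base-10 logarithm of an IC50 value given in nM (assumed base)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return math.log10(ic50_nm)

# Hypothetical IC50 values in nM, for illustration only.
for ic50 in (10.0, 250.0, 1000.0):
    print(f"IC50 = {ic50:7.1f} nM  ->  logIC50 = {log_ic50(ic50):.2f}")
```

On this scale a lower logIC50 corresponds to a more potent compound.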

Molecular descriptors calculation
In this work, DFT was used to optimize the structures of all the studied compounds. Electronic descriptors were calculated from the DFT-optimized structure of each molecule 13 using Becke's three-parameter hybrid functional (B3LYP) 14 with the 6-31G basis set. All calculations were performed with the Gaussian 03 quantum chemistry package 15. The topological, constitutional, lipophilic and steric descriptors were computed with the ACD/ChemSketch 16 and ChemBioDraw 17 software. All descriptors used in this work are summarized in Table 2.

Multiple Linear Regression
Multiple linear regression (MLR) analysis with descendant selection was used to study the relationship between one dependent variable (biological activity) and several independent variables (molecular descriptors). The independent variables were individually added to or deleted from the model at each step of the regression based on three criteria: the determination coefficient R², the Fisher ratio value (F) and the root mean squared error (RMSE). This procedure is a mathematical technique that minimizes the differences between observed and predicted values. The MLR model of antimalarial activity was generated using the XLSTAT software 18. It also served to select the descriptors used as input parameters for the artificial neural network (ANN).
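The idea of descendant (backward) selection can be sketched as follows. This is only an illustrative sketch under stated assumptions: it uses the loss in R² as the single removal criterion, with a hypothetical threshold `max_r2_loss`, whereas the study relied on the XLSTAT implementation with R², F and RMSE. The data set is synthetic.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(X, y, cols):
    """Fit y on the chosen descriptor columns (plus intercept); return R^2."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def backward_elimination(X, y, max_r2_loss=0.01):
    """Drop descriptors one at a time while R^2 falls by at most max_r2_loss."""
    cols = list(range(len(X[0])))
    current = r_squared(X, y, cols)
    while len(cols) > 1:
        # Try removing each descriptor; keep the removal that hurts R^2 least.
        best_r2, drop_col = max(
            (r_squared(X, y, [c for c in cols if c != d]), d) for d in cols
        )
        if current - best_r2 > max_r2_loss:
            break
        cols.remove(drop_col)
        current = best_r2
    return cols, current

# Synthetic data: y depends on columns 0 and 1 only; column 2 is irrelevant.
X = [[1, 2, 0.3], [2, 1, 0.1], [3, 4, 0.4], [4, 3, 0.2],
     [5, 6, 0.5], [6, 5, 0.9], [7, 8, 0.6], [8, 7, 0.7]]
y = [2 * a - b + 0.5 for a, b, _ in X]
kept, r2 = backward_elimination(X, y)
print("kept descriptor columns:", kept, "R^2 =", round(r2, 4))
```

In this toy case the irrelevant third column is discarded and the two informative descriptors are retained.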

Artificial Neural Networks
Artificial neural networks are computational systems that simulate the learning process of neurons in the human brain. The ANN analysis was performed using the Matlab software 19. In such a network, the neurons are arranged in layers: an input layer, a hidden layer, and an output layer. Neurons in the same layer are not connected to each other. In this work, the input layer contains six neurons representing the relevant descriptors obtained with the MLR technique. The output layer represents the calculated activity values logIC50. The size of the hidden layer was determined using the parameter ρ = (number of compounds) / (number of connections). It is recommended to keep ρ in the interval 1 < ρ < 3 [20][21]. Hence, with 32 compounds and 6 descriptors, the ρ value is 1.88 when the hidden layer is composed of two neurons 22. As a result, the final ANN architecture is (6-2-1).
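The choice of the hidden layer size can be checked with a short calculation. The sketch below assumes that each bias term counts as one connection; the text does not specify the counting convention, but this one reproduces the reported ρ = 1.88 for the 6-2-1 network. The function names are illustrative.

```python
def count_connections(n_inputs, n_hidden, n_outputs=1, bias=True):
    """Count adjustable weights of a network with one hidden layer.

    Assumption: each neuron in the hidden and output layers has one bias,
    and biases are counted as connections.
    """
    extra = 1 if bias else 0
    return (n_inputs + extra) * n_hidden + (n_hidden + extra) * n_outputs

def rho(n_compounds, n_inputs, n_hidden, n_outputs=1):
    """Ratio of training compounds to network connections."""
    return n_compounds / count_connections(n_inputs, n_hidden, n_outputs)

# 32 training compounds and 6 descriptors, as in the text.
for hidden in range(1, 5):
    value = rho(32, 6, hidden)
    status = "within 1 < rho < 3" if 1 < value < 3 else "outside the interval"
    print(f"hidden neurons = {hidden}: rho = {value:.2f} ({status})")
```

With two hidden neurons the network has 17 connections, giving ρ = 32/17 ≈ 1.88.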

Cross-validation
Cross-validation was performed by the leave-one-out (LOO) procedure 23,24, which successively removes one molecule from the training set of 32 molecules, rebuilds the model on the remaining molecules, and predicts the activity of the removed one. The procedure is repeated 32 times so that every molecule is predicted once. The outcome of this test is the LOO cross-validation determination coefficient $R^2_{cv}$, which is calculated according to equation 1 (Eq. 1). A high value of $R^2_{cv}$ has been used by several authors as an indicator of the robustness and predictive ability of a model 25.
$$R^2_{cv} = 1 - \frac{\sum (Y_{exp} - Y_{pred})^2}{\sum (Y_{exp} - \bar{Y})^2} \qquad \text{(Eq. 1)}$$
where $Y_{exp}$, $Y_{pred}$ and $\bar{Y}$ are, respectively, the measured, predicted and averaged values of the dependent variable.
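The leave-one-out procedure can be sketched in a few lines. For brevity, this sketch uses a one-descriptor least-squares line as the model (an assumption made only to keep the example short; the study used a six-descriptor MLR model), and it computes the cross-validated coefficient as one minus the ratio of the squared prediction errors to the total sum of squares. The data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return lambda x: a + b * x

def loo_r2cv(xs, ys, fit=fit_line):
    """Leave-one-out cross-validated R^2 for any fit(xs, ys) -> predictor."""
    preds = []
    for i in range(len(xs)):
        # Rebuild the model without molecule i, then predict molecule i.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot

# Hypothetical descriptor and activity values, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
print("R2cv =", round(loo_r2cv(xs, ys), 3))
```

Because `loo_r2cv` only needs a `fit` callable, the same loop applies unchanged to a multi-descriptor model.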

Y-randomization test
This test was used to ensure the robustness of the built QSAR model. Validation is performed by permuting the response values (Y) while the descriptor matrix (X) is kept unchanged 26. The Y-randomization test ensures that the correlation coefficient of the obtained model is not found by chance. For an acceptable QSAR model, the average correlation coefficient (Rr) of the randomized models should be lower than the correlation coefficient (R) of the non-randomized model 27. The purpose of this method is to test the validity of the original QSAR model and to ensure that the selected descriptors are appropriate.
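A minimal sketch of the Y-randomization test is given below. It assumes a one-descriptor linear model, for which R² equals the squared Pearson correlation, and uses hypothetical data, a fixed random seed and an arbitrary number of trials; none of these settings come from the study.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def y_randomization(xs, ys, n_trials=100, seed=0):
    """Return (R^2 of the true model, mean R^2 over models with shuffled Y)."""
    rng = random.Random(seed)
    r2_true = r_squared(xs, ys)
    r2_random = []
    for _ in range(n_trials):
        shuffled = ys[:]          # X is left unchanged, only Y is permuted
        rng.shuffle(shuffled)
        r2_random.append(r_squared(xs, shuffled))
    return r2_true, sum(r2_random) / n_trials

# Hypothetical data with a clear linear trend.
xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
r2_true, r2_rand = y_randomization(xs, ys)
print(f"R2 (true) = {r2_true:.3f}, mean R2 (randomized) = {r2_rand:.3f}")
```

A model is regarded as not being a chance correlation when the randomized average is clearly lower than the value of the original model.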

External validation
Validation strategies are recognized as among the most important methods, as they provide a quantitative assessment of QSAR model robustness, predictive power and applicability domain based on new data. According to Golbraikh and Tropsha's study of validation methods, internal validation alone is not a good estimate of the predictive capability of QSAR models 25. External validation was therefore used to estimate the accuracy of the QSAR model on the test set, which determines its true predictive power. The predictive power of the built QSAR model was estimated by an external $Q^2_{ext}$ defined as follows (Eq. 2) 25:
$$Q^2_{ext} = 1 - \frac{\sum_{i} \left(y_{i(test)} - \hat{y}_{i(test)}\right)^2}{\sum_{i} \left(y_{i(test)} - \bar{y}_{tr}\right)^2} \qquad \text{(Eq. 2)}$$
where $y_{i(test)}$ and $\hat{y}_{i(test)}$ are, respectively, the measured and the predicted values of the dependent variable for the test set, and $\bar{y}_{tr}$ is the average value of the dependent variable for the training set.

$Q^2_{ext}$ is an important indicator of the reliability of the proposed model; it must be greater than 0.5. Golbraikh and Tropsha also proposed several other criteria for analyzing the external predictive ability of the developed QSAR model, which must be satisfied 25.
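The external predictivity coefficient can be computed as in the following sketch, which takes the measured and predicted test-set values together with the mean activity of the training set. The numerical values are hypothetical and serve only to show the arithmetic.

```python
def q2_ext(y_test, y_pred, y_train_mean):
    """External Q^2: 1 - sum((y - y_hat)^2) / sum((y - mean_train)^2)."""
    if len(y_test) != len(y_pred):
        raise ValueError("y_test and y_pred must have the same length")
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss_ref = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - ss_res / ss_ref

# Hypothetical test-set activities and predictions (not from Table 5).
measured = [1.0, 2.0, 3.0]
predicted = [1.1, 1.9, 3.2]
training_mean = 2.0

value = q2_ext(measured, predicted, training_mean)
print(f"Q2ext = {value:.2f}, acceptable: {value > 0.5}")
```

Note that the reference term uses the training-set mean, so a model that predicts no better than that mean gives a value near zero or below.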

Molecular Docking
Molecular docking has recently become an essential tool in drug discovery because of its ability to predict the conformation and the binding mode of a ligand within the target binding site. This study was performed on the dihydrofolate reductase (pf-DHFR) protein, which is the main target in the development of antimalarial drugs. The crystal structures of two forms of pf-DHFR, wild type (coded as 1J3I.pdb) and quadruple mutant (coded as 1J3K.pdb), were obtained from the RCSB Protein Data Bank 28. Both contain the third-generation Pf-DHFR inhibitor WR99210 bound to the active site in the presence of NADPH 11. The minimized protein structures were defined as the receptor. The receptor was first prepared with the Discovery Studio software by removing all water molecules, ligands and other non-protein components. The binding site was defined as the volume occupied by the ligand in the receptor, and an input site sphere with a radius of 5 Å was defined over the binding site 29,30. Molecular docking of compounds 11 and 38 into the 1J3I.pdb and 1J3K.pdb proteins was performed using the AutoDock 4.2 software 31, which was also used to analyze the interactions between the ligands and the receptor. The 3D grid was created by the AUTOGRID algorithm 32. The grid maps were constructed using 60 × 60 × 60 points in the x, y and z directions, with a grid point spacing of 0.375 Å. The coordinates of the grid box center were set to 28.09 Å, 5.76 Å, 52.59 Å, according to the ligand location in the complex.
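The spatial extent of the grid box follows from the reported settings. The sketch below is plain arithmetic, not an AutoDock call, and it assumes that each side of the box spans (number of points) × (spacing); the function name `grid_box` is illustrative.

```python
def grid_box(center, npts, spacing):
    """Return the lower and upper corners (in Å) of an axis-aligned grid box.

    Assumption: each edge length equals npts * spacing.
    """
    half = [n * spacing / 2.0 for n in npts]
    lower = tuple(c - h for c, h in zip(center, half))
    upper = tuple(c + h for c, h in zip(center, half))
    return lower, upper

center = (28.09, 5.76, 52.59)   # grid box center reported in the text (Å)
npts = (60, 60, 60)             # grid points along x, y, z
spacing = 0.375                 # grid point spacing (Å)

lower, upper = grid_box(center, npts, spacing)
print("edge length (Å):", npts[0] * spacing)
print("lower corner:", tuple(round(v, 2) for v in lower))
print("upper corner:", tuple(round(v, 2) for v in upper))
```

Under this assumption the box is a cube of 22.5 Å per side centered on the co-crystallized ligand.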

Results and Discussion
The present paper is devoted to the QSAR and molecular docking studies of 4-anilinoquinoline-triazine derivatives, which have shown significant antimalarial activity. The experimental activity was collected from the literature 9. The dataset was randomly divided into two sets: a training set of 32 compounds and a test set of 9 compounds. The selected descriptors and the activity values predicted for the training set by the MLR and ANN methods are presented in Table 3.

Multiple Linear Regression (MLR)
The training set was used to build a QSAR model using the MLR method. The obtained QSAR model is represented by equation 3 (Eq. 3).

Artificial Neural networks (ANN)
In order to improve the characterization of the studied compounds, an artificial neural network (ANN) was used as a non-linear method to generate a predictive model relating the set of molecular descriptors selected by the MLR method to the observed activity. The correlation between the observed and the predicted activities using the ANN method is illustrated in Figure 4, which shows a good correlation between observed and predicted (ANN) activities. Thus, this model has a significant statistical quality and an excellent prediction ability (R = 0.97, R² = 0.95 and RMSE = 0.09).

Cross-validation (CV)
In this study, we validated our model through leave-one-out (LOO) cross-validation. The outcomes achieved ($R_{cv}$ = 0.95, $R^2_{cv}$ = 0.91 and RMSE = 0.12) reveal the robustness and the predictive ability of the built QSAR model. Further tests were used to ensure the applicability of the obtained model.

Y-randomization
This technique is widely used to ensure the robustness of a QSAR model. The results of the Y-randomization test (Table 4) showed a lower average determination coefficient for the randomized models ($R_r^2$ = 0.69) than that obtained by the non-randomized model (R² = 0.78). This result implies that the obtained QSAR model is robust and has a good predictive ability.

External Validation
External validation was used to estimate the true predictive power of the proposed QSAR model. For this purpose, nine compounds were randomly removed from the original set, and the model was built with the remaining 32 compounds through the MLR and ANN methods. We then tested the applicability of the built model on the nine compounds (Table 5). In addition, Golbraikh and Tropsha proposed a set of parameters for determining the external predictability of a QSAR model (Table 6) 25. Based on the results obtained with the external test set (Table 5), we can conclude that the established model has a very good predictive power. All the Golbraikh and Tropsha conditions listed in Table 6 are satisfied. The established model is therefore considered a satisfactory QSAR model. The most important result of this investigation is that the in vitro antimalarial activity of this series could be predicted using QSAR methods.

Docking studies
The molecular docking study was performed on the Plasmodium falciparum dihydrofolate reductase (Pf-DHFR) protein, an essential enzyme in the biosynthesis of folate and the main target in the development of antimalarial drugs. In an attempt to understand the high antimalarial potency shown by certain compounds and the lack of activity observed with others, we performed molecular docking into the binding sites of both the wild type and the quadruple mutant of Pf-DHFR for the most active compound (compound 11) and the least active compound (compound 38). The two compounds have the same radicals R1 and R2 but belong to different structural categories, as detailed in the Experimental Data section. The study reported by Yuvaniyama et al. 33,10 identified the binding modes and localized the active sites of both the wild type and the mutant type of the protein (Pf-DHFR). That study was performed with a potent inhibitor, the 1,3,5-triazine derivative WR99210, which is a preclinical molecule. It found that the important sites in the wild type are ILE14, ALA16, MET55, ASP54, SER108, ILE164 and TYR170, and that the important sites in the quadruple mutant are ALA16, CYS50, ASN51, CYS59, ASN108, LEU164 and TYR170. The interactions obtained for the two compounds are illustrated in Figure 5.
The molecular docking of compound 11 into the wild type shows three hydrogen bonds between the three nitrogen atoms linked to the triazine moiety and the amino acids SER111, SER108 and ILE164, at 2.51 Å, 2.24 Å and 1.84 Å, respectively. In addition, a π-π and a π-sigma interaction are observed between a phenyl group of the quinoline framework and the amino acids PHE116 (4.65 Å) and MET55 (3.99 Å), respectively. In contrast, compound 38 forms only two hydrogen bonds with two less important binding sites, SER111 and GLY44, which are not cited as active sites for antimalarial activity 33. In the case of the quadruple mutant, compound 11 forms three hydrogen bonds with ASN108, SER111 and LYS49, as well as two π-sigma interactions through two phenyl groups, which belong to the two anilines attached, respectively, to the anilinoquinoline moiety and the triazine moiety. Compound 38 forms only one hydrogen bond and one π-sigma interaction, with LEU146 and LEU46.
Analysis of these results shows that the residues with which compound 11 interacts are among the most important binding sites for antimalarial activity, according to a previous study on the characterization of the antimalarial protein 33, which might be a plausible reason for the differences observed in activity against the wild-type and mutant strains. Since compounds 11 and 38 are two positional isomers (para and meta), we can note how the position of the substituents can change the potency through changes in the interactions with the receptor. Furthermore, the 3D visualization of the compound 11-protein complex (wild type) (Figure 6) showed that the planes of the phenyl rings that form the π-π interaction (at a distance of 4.65 Å) are almost parallel, which was considered a strong interaction in a previous study 34.

Conclusion
The present work shows how the antimalarial activities of 41 4-anilinoquinoline-triazine hybrids may be treated statistically to uncover the molecular characteristics that are essential for high activity. The generated models were analyzed and validated for their statistical significance and external predictive power. The QSAR analysis revealed a set of important descriptors that influence the activity: HOMO energy, LUMO energy, Density, Sum of Valence Degrees and Balaban Index. The molecular docking studies performed with compounds 11 (para isomer) and 38 (meta isomer) show that the most active compound, 11, forms important interactions with the active sites, as was found with the known inhibitor WR99210, in the wild-type and quadruple mutant forms of Pf-DHFR. Therefore, the docked poses provide details of the predicted binding modes and the key molecular interactions, which might provide opportunities for medicinal chemists to develop new antimalarial drugs.

$$\log IC_{50} = -54.5 + 4.4\,E_{HOMO} - 14.9\,E_{LUMO} + 34\,D + 0.38\,MR - 8.2\times10^{-7}\,BIndx - 0.36\,Svde \qquad \text{(Eq. 3)}$$
where N is the number of compounds, R is the correlation coefficient, R² is the determination coefficient, RMSE is the root mean square error, and F is the Fisher F-statistic. The relevant descriptors involved in the MLR model of the training set are HOMO energy (EHOMO), LUMO energy (ELUMO), Density (D), Molar refractivity (MR), Sum of Valence Degrees (Svde) and Balaban Index (BIndx). The corresponding normalized descriptor coefficients and the correlation between the observed and predicted activities obtained by the MLR method are presented, respectively, in Figure 2 and Figure 3.
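Equation 3 can be applied to a new compound as in the sketch below. The coefficients are copied from the equation; the dictionary keys are labels chosen here, the descriptor values are hypothetical, and the units are assumed to match those used when the model was fitted (Table 3), which this section does not restate.

```python
# Coefficients of the MLR model (equation 3).
INTERCEPT = -54.5
COEFFS = {
    "EHOMO": 4.4,      # HOMO energy
    "ELUMO": -14.9,    # LUMO energy
    "D": 34.0,         # density
    "MR": 0.38,        # molar refractivity
    "BIndx": -8.2e-7,  # Balaban index
    "Svde": -0.36,     # sum of valence degrees
}

def predict_log_ic50(descriptors):
    """Evaluate equation 3 for one compound given a dict of descriptor values."""
    missing = set(COEFFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)

# Hypothetical descriptor values, for illustration only (not from Table 3).
compound = {"EHOMO": -5.5, "ELUMO": -1.5, "D": 1.3,
            "MR": 150.0, "BIndx": 2.0e6, "Svde": 180.0}
print("predicted logIC50 =", round(predict_log_ic50(compound), 2))
```

The signs of the coefficients indicate the direction in which each descriptor shifts the predicted logIC50 when the others are held fixed.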

Figure 2. Modeling characterization by the normalized coefficients.

Figure 3. Correlation between observed and predicted activities calculated using the MLR model.

The correlation between the experimental and calculated activities based on the MLR model for the series of 32 compounds was quite significant, as indicated by the correlation coefficient (R = 0.88) and the RMSE value (0.23). The RMSE value shows that the model has good prediction precision.

Figure 4. Correlation between observed and predicted activities using the ANN method.

Figure 5. Binding interactions from the docking simulation of compounds 11 and 38 in the active site of the wild type and the quadruple mutant of pf-DHFR.

Table 1. The observed activity logIC50 of the studied compounds.

Table 2. Descriptors forming the database.

Table 3. Values of the selected descriptors and the observed/predicted logIC50 values.

Table 4. Comparison between observed and predicted activities obtained using the Y-randomization method.

Table 5. Comparison between experimental and predicted logIC50 values of an external test set for the MLR and ANN models based on the descriptors of equation 3.