denguedrug

Machine Learning and Molecular Docking Prediction of Potential Inhibitors against Dengue Virus.

Overview

This project aims at building an in silico pipeline to identify novel Dengue Virus inhibitors. We will incorporate Machine Learning (ML) and Molecular Modeling techniques into the pipeline.

Dengue virus (DENV) is a member of the Flaviviridae family and is responsible for the most prevalent mosquito-borne viral hemorrhagic fever. Transmission to humans occurs primarily through the bites of mosquito species such as Aedes aegypti and Aedes albopictus, which are widespread in tropical and subtropical climates, in both urban and rural regions. Some people infected with DENV develop the severe and sometimes fatal diseases known as Dengue hemorrhagic fever (DHF) and Dengue shock syndrome (DSS). The spread of dengue fever has resulted in several medical emergencies and deaths, yet no drug is currently available and treatment remains symptomatic. The structural information available for DENV presents an opportunity to discover potent antiviral agents capable of disrupting the early stages of DENV infection.

Our approach seeks to train different Machine Learning models on the Anti-Dengue dataset from PubChem to discriminate potential anti-Dengue compounds from non-anti-Dengue compounds. Subsequently, we will screen the predicted compounds against a Dengue protein target for downstream analysis. Details of the pipeline can be found in the "Description" section. Regions with a high prevalence of Dengue virus infection in Africa are highlighted in the figure below.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": 1,
      "properties": {
        "ID": 0
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
              [25,33],
              [31,31],
              [38,20],
              [44,12],
              [52,11.5],
              [47,2],
              [39,-5],
              [41.5,-15],
              [35,-20],
              [30,-23],
              [21,-18],
              [25,33]
          ]
        ]
      }
    }
  ]
}

East Africa Coordinates Mapping of High Dengue prevalence regions.

Please cite and star the repository if you utilize the pipeline for research or commercial purposes.

Objectives

Description

This figure illustrates the proposed DengueDrug pipeline to be utilized to identify potent Dengue Virus Inhibitors.

Fig. Proposed Dengue Drug Identification Pipeline.

Step 1: Identification of Dengue Virus inhibitors database for ML training

The ligand database was obtained from PubChem BioAssay ID: 651640. It was generated experimentally by the Broad Institute using the "in vivo" DENV2 CPE-Based HTS method in a Cell-Based and Microorganism Combination System. A total of 347,136 compounds were tested for Dengue Virus inhibition, and 5,946 actives and 324,845 non-actives were identified. An active is defined as a compound that exhibits an ATP activity level above 20% at 10 $\mu M$.

Step 2: Preprocessing

The unprocessed database can be found here.
The molecular descriptors of the actives and inactives were calculated with PaDEL-Descriptor, using the Descriptor Calculator Python script.
The actives and inactives databases were combined and all missing descriptor values were filled with 0. Next, dimensionality reduction was conducted using a variance filter (scikit-learn VarianceThreshold class).
The data was then standardized using the mean and standard deviation of each descriptor.
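The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the project's actual script: the toy descriptor values and the names `descriptor_names`, `actives`, and `inactives` are hypothetical, and the 0.1 variance threshold is the value reported in the Results section.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

# Hypothetical toy descriptor matrices (rows = compounds, columns = descriptors).
# Real inputs would be the PaDEL descriptor tables for actives and inactives.
descriptor_names = ["ALogP", "nAtom", "constant_flag"]
actives = np.array([[1.2, 20.0, 1.0], [2.5, 35.0, 1.0], [np.nan, 28.0, 1.0]])
inactives = np.array([[0.3, 18.0, 1.0], [4.1, np.nan, 1.0], [3.3, 40.0, 1.0]])

# Combine both classes and keep the labels (1 = active, 0 = inactive).
X = np.vstack([actives, inactives])
y = np.array([1] * len(actives) + [0] * len(inactives))

# Fill missing descriptor values with 0, as described in the text.
X = np.nan_to_num(X, nan=0.0)

# Drop descriptors whose variance is at or below the threshold.
selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)
kept_names = [n for n, keep in zip(descriptor_names, selector.get_support()) if keep]

# Standardize each remaining descriptor to zero mean and unit variance.
# In a full workflow the scaler would be fitted on the training split only.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_reduced)

print(kept_names)
```

In this toy example the constant column is removed by the variance filter, which is the same mechanism that reduces the descriptor count on the real data.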

Step 3: Model construction

The data was split into training, test, and external datasets. The training dataset comprised 70% (14,875 compounds) of the data, and the test and external datasets comprised 15% (~3,188 compounds) each. The training dataset contained 3,105 actives and 11,770 inactives.
The ML models were constructed using the LazyPredict Python package. The models that exhibited the greatest Accuracy, F1-score, Balanced Accuracy, and ROC AUC metrics were selected for validation.
The models chosen for further validation were K-Nearest Neighbours, Gaussian Naïve Bayes, Support Vector Machine, Random Forest, and Logistic Regression. Using K-fold splitting of the training data, the models were cross-validated and each model's suitability was evaluated using Accuracy, F1-score, Precision, Recall, Specificity, and the false and true positive and negative rates as selection metrics.
The models' prediction ability was assessed using the test data. The model's prediction accuracy was determined using Accuracy, F1-score, Precision, and Recall evaluation metrics.
The logistic regression (LR) model exhibited the greatest results on the test dataset and therefore was evaluated on the external dataset. The LR model obtained an 82% active and 98% inactive accuracy.
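As a rough illustration of the split and cross-validation described above, the sketch below uses scikit-learn on synthetic data. The synthetic dataset, the random seeds, the choice of five folds, and the use of stratified folds are all assumptions made for the example; the source only states that K-fold splitting was used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic stand-in for the descriptor matrix, with roughly 21% actives
# to mimic the class imbalance of the training data.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.79, 0.21], random_state=0
)

# 70% training, then split the remaining 30% evenly into test and external sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)
X_test, X_ext, y_test, y_ext = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=0
)

# Cross-validate one candidate model on the training data only.
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "f1", "precision", "recall"]
scores = cross_validate(model, X_train, y_train, cv=cv, scoring=metrics)

for name in metrics:
    print(name, round(scores[f"test_{name}"].mean(), 3))
```

The same loop could be repeated for each candidate model, with the test and external sets kept untouched until the final evaluation.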

Step 4: Prediction

The LR model was employed to screen the Northern African Natural Products Database (NANPD), East African Natural Products Database (EANPD), AfroDB, and Traditional Chinese Medicine (TCM) database.
The natural compounds' chemical structures were prepared similarly to the training dataset and ~43,000 compounds were screened using the LR model.
7,722 compounds were predicted to be active and subsequently utilized for molecular docking.

Step 5: Molecular Docking

The crystal structure of the Dengue 2 virus NS2B/NS3 protease (PDB: 2FOM) was identified for structure-based virtual screening.
AutoDock Vina was utilized to dock the 7,722 compounds into the Dengue 2 virus NS2B/NS3 protease.
The potential hits were selected using the following criteria:
- AutoDock Vina binding score.
- Presence of binding interactions between important binding site residues and the ligand (LigPlot+ v1.4.5).

Step 6: ADMET prediction

The ADMET properties of the identified hits will be predicted using SwissADME, applying Veber's rule and Lipinski's Rule of Five (Ro5).
Hits with predicted pharmacokinetic or toxicity liabilities will be removed.
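To make the two filters concrete, here is a small sketch that counts rule violations from precomputed properties. The Veber cut-offs (TPSA ≤ 140, rotatable bonds ≤ 10) are those quoted in the Results section, and the Lipinski cut-offs are the classic ones (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors). The `Properties` class and the two example compounds are hypothetical, and a given tool may apply slightly different cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed physicochemical properties of one compound (hypothetical container)."""
    mol_weight: float      # g/mol
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    tpsa: float            # topological polar surface area, in square angstroms
    rotatable_bonds: int


def lipinski_violations(p: Properties) -> int:
    """Count violations of Lipinski's Rule of Five (classic cut-offs)."""
    return sum([p.mol_weight > 500, p.logp > 5, p.h_donors > 5, p.h_acceptors > 10])


def veber_violations(p: Properties) -> int:
    """Count violations of Veber's rule (TPSA <= 140, rotatable bonds <= 10)."""
    return sum([p.tpsa > 140, p.rotatable_bonds > 10])


# Two made-up compounds, for illustration only.
compound_a = Properties(342.4, 3.1, 2, 5, 86.9, 4)
compound_b = Properties(612.7, 5.8, 6, 11, 165.2, 12)

for name, props in [("compound_a", compound_a), ("compound_b", compound_b)]:
    print(name, lipinski_violations(props), veber_violations(props))
```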

Step 7: Molecular Dynamics (MD) Simulations

The stability of the hits' binding modes will be assessed through 100-nanosecond (ns) MD simulations using GROMACS.
Stability will be assessed with metrics such as root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of gyration (Rg), plotted using Xmgrace.
The retention of the compounds' binding interactions with important residues throughout the MD simulations will be assessed with the ProLIF Python library.
The compounds' binding free energies over the MD simulations will be calculated using the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) method.

Manuscript

Important

When using the pipeline or findings for research or commercial purposes, please cite our research and star the repository.

Results

From the various analyses and computations performed throughout this in silico exploration, the following results were obtained and observations made.

Data Acquisition and Processing

The bioactive dataset obtained from PubChem was imbalanced: about 1/3 were active compounds, and inactives dominated the dataset, as seen in Figure 1 below. PaDEL was used to generate 1,444 molecular descriptors, which provide a mathematical representation of the compounds for QSAR modeling by converting chemical information into numerical values. The dataset of 21,250 compounds was split into a training set of 14,875, a test set of 3,187, and an externally held set of 3,188. Applying a variance filter with a threshold of 0.1 reduced the descriptors from 1,444 to 684, filtering out those with minimal variance so that only the most informative features were retained for subsequent modeling.

Figure 1. Three-dimensional plot of the correlation between active and inactive compounds in the processed dataset according to ALogP, XLogP, and Zagreb.

Model Development and Evaluation

Five machine learning algorithms (k-NN, Gaussian Naïve Bayes, SVM, Random Forest, and Logistic Regression) were employed to build robust predictive models, each assessed on several statistical parameters including accuracy, precision, recall, and F1 score. LR produced the best results across most metrics, followed by the SVM model. With 81% accuracy and an F1 score of 0.51, Gaussian Naïve Bayes was the weakest performer; its low precision (0.55) and recall (0.47) show that it struggled to balance identifying true positives against minimizing false positives. Given its robustness across the various evaluation metrics, Logistic Regression emerged as the most dependable model for predicting potential inhibitors (Table 1).

Table 1. Evaluation of ML models performance on retained datasets.

Model	Accuracy	Precision	Recall	F1 Score
LR	0.94	0.91	0.76	0.83
SVM	0.93	0.94	0.71	0.81
KNN	0.92	0.89	0.68	0.77
RF	0.91	0.94	0.60	0.73
NB	0.81	0.55	0.47	0.51

Figure 2. Bar plot of performance metrics for each Machine Learning model.
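The F1 scores in Table 1 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this with the values copied from Table 1; the dictionary name `table_1` is just a label chosen for the example.

```python
# (precision, recall, reported F1) per model, copied from Table 1.
table_1 = {
    "LR": (0.91, 0.76, 0.83),
    "SVM": (0.94, 0.71, 0.81),
    "KNN": (0.89, 0.68, 0.77),
    "RF": (0.94, 0.60, 0.73),
    "NB": (0.55, 0.47, 0.51),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each model and compare with the reported value.
recomputed = {model: f1_score(p, r) for model, (p, r, _) in table_1.items()}
for model, (_, _, reported) in table_1.items():
    print(f"{model}: recomputed F1 = {recomputed[model]:.3f}, reported = {reported:.2f}")
```

The recomputed values agree with the table to two decimal places, which suggests the reported metrics are internally consistent.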

Prediction of Inhibitors and New Compounds

Eighteen (18) known Dengue Virus inhibitors sourced from the literature were used for initial testing and to further ascertain model performance. The Logistic Regression model classified 11 of these inhibitors as active (Table 2), surpassing the other ML models. These compounds underwent the same preprocessing as the training data to ensure consistency in descriptor calculation and transformation. Notable compounds predicted as active by the LR model included Pentoxifylline, Prochlorperazine, Balapiravir, Celgosivir, and Bortezomib. This validation suggested that the model could extrapolate to compounds with the same mode of action. The LR model was then used to predict activity for 812 and 1,871 compounds from the ZINC and EANPDB databases, respectively. Of the 2,683 evaluated compounds, 933 were predicted as active and suitable for further investigation. This approach highlighted the importance of careful descriptor selection and data preprocessing in QSAR modeling for properly addressing imbalanced data in anti-Dengue drug discovery.

Table 2. Prediction results for known Dengue Virus inhibitors using LR.

N°	Inhibitors	Prediction	Mechanism of Action	References
1	Pentoxifylline	1	Immune modulation	Salgado et al., 2012
2	4-hydroxyphenyl retinamide	0	Inhibits viral replication	Carocci et al., 2015; Fraser et al., 2014
3	Prochlorperazine	1	Inhibits viral binding and viral entry	Simanjuntak et al., 2015
4	Balapiravir	1	Inhibits viral replication	Nguyen et al., 2013
5	Bortezomib	1	Inhibits viral replication	Ci et al., 2023
6	Leflunomide	1	Immunosuppressive effects	Wu et al., 2011
7	SKI-417616	1	Inhibition of D4R suppressed DENV infection	Smith et al., 2014
8	Celgosivir	1	Inhibits viral replication	Tian et al., 2018
9	UV-4B	1	Inhibits viral replication	Franco et al., 2021
10	2-C-methylcytidine	0	Inhibits viral replication	Lee et al., 2015
11	Ketotifen	1	Vascular leakage	Lai et al., 2017
12	Chloroquine	1	Inhibits viral replication	Lai et al., 2017
13	Dasatinib	0	RNA replication inhibition	de Wispelaere et al., 2013
14	Lovastatin	0	Inhibits viral replication	Whitehorn et al., 2016
15	ST-148	0	Inhibits viral replication	Byrd et al., 2013
16	Dexamethasone	0	Inhibits viral replication	Kularatne et al., 2009
17	Prednisolone	1	Inhibits viral replication	Lai et al., 2017
18	Ivermectin	0	Helicase inhibition	Xu et al., 2018

0 = Inactive, 1 = Active

Target Selection and Molecular Docking of Predicted Compounds

This research focused on the NS2B/NS3 protease, chosen as the primary target from the seven nonstructural proteins of the Dengue Virus to validate the Logistic Regression (LR) model predictions. As a crucial enzyme in viral replication and assembly, it is a significant target for antiviral drug development. Potential inhibition sites for the DENV protease include the active site of the NS3 protease and its binding site for the cofactor NS2B. The active site, which is the main target for intervention, comprises a conserved catalytic triad consisting of His51, Asp75, and Ser135. A search of the Protein Data Bank revealed several solved structures of the Dengue Virus NS2B/NS3 complex, including PDB IDs 4M9T, 2FOM, 4M9M, and 4M9I, with resolutions of 1.74, 1.50, 1.53, and 2.40 $Å$, and R-values of 0.215, 0.176, 0.203, and 0.215, respectively. For this study, 2FOM was selected for its favorable resolution and R-value, allowing visualization of its 3D structure with a ligand docked at the active site (which also contains residues such as Leu128, Pro132, Ser131, and Tyr161), as represented below (Figure 3).

Figure 3. NS2B/NS3 protease structure as visualized in PyMOL, highlighting ligand docking. [A. Pale-yellow cartoon structure representation of the protein structure; B. Light-green surface representation of the protein with a ligand (blue) positioned in the active site].

A total of 853 compounds, curated from the compounds predicted as active by the LR model, were docked into the active site of the NS2B/NS3 protease using AutoDock Vina. Among these, anhydrophlegmacin exhibited the highest binding affinity of -9.2 $kcal/mol$, surpassing all other docked ligands. The binding affinities of the compounds ranged from -9.2 to -3.6 $kcal/mol$, supporting the predictive capability of the Logistic Regression model. With a threshold set at -8.0 $kcal/mol$, 59 compounds with equivalent or better binding energies were selected for further investigation. This threshold is more stringent than the conventional -7.0 $kcal/mol$ standard for classifying compounds as active against a given target (Kwofie et al., 2022). The binding affinity reflects the strength of interaction between a ligand and the target protein. Protein-ligand complexes were inspected visually in PyMOL to identify the most promising docked compounds. The known inhibitors Leflunomide and Prednisolone were included as controls, with binding affinities of -7.1 and -7.0 $kcal/mol$, respectively. The binding affinities for all predicted compounds and inhibitors were systematically recorded and stored here.
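The threshold-based selection can be illustrated with a short filter. This is only a sketch: the dictionary holds a handful of binding affinities quoted in this README (three docked compounds and the two controls), and the names `docking_scores` and `VINA_THRESHOLD` are chosen for the example. Because more negative scores indicate stronger predicted binding, "equivalent or better" translates to "less than or equal to the threshold".

```python
# Binding affinities in kcal/mol, as quoted in this README.
docking_scores = {
    "anhydrophlegmacin": -9.2,
    "chryslandicin": -9.0,
    "phaseollidin": -8.4,
    "Leflunomide": -7.1,    # known inhibitor, used as a control
    "Prednisolone": -7.0,   # known inhibitor, used as a control
}

VINA_THRESHOLD = -8.0  # selection cut-off stated in the text

# Keep compounds at or below the threshold, strongest binders first.
hits = sorted(
    (name for name, score in docking_scores.items() if score <= VINA_THRESHOLD),
    key=docking_scores.get,
)

print(hits)
```

Both controls fall outside the cut-off, which is consistent with the text's observation that the selected compounds score better than the known inhibitors.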

Mechanism of Binding Characterization of selected compounds

Building on the structure-based molecular docking approach, interactions between the predicted compounds and the identified binding pocket were analyzed. Biomolecular interactions between the NS2B/NS3 protease and the docked compounds were illustrated using LigPlot to help identify key residues within the protein's active site, a crucial step in identifying promising lead compounds. Hydrogen bonding and hydrophobic interactions between the selected compounds and active site residues were investigated to identify potent inhibitors of the NS2B/NS3 protease.

Ligands docked into the protease's active site showed interactions with essential residues such as His51, Ser135, Leu128, Pro132, Ser131, Tyr161, and Asp75, as detailed in Table 3 and the accompanying Supplementary file 1. The two compounds with the highest binding affinities, anhydrophlegmacin and anhydrophlegmacin-9,10-quinones_B2, interacted with the same residues, including His51, Asp75, Gly151, Leu128, Pro132, and Gly153. These ligands formed hydrogen bonds with Asp75, Ser135, and His51, with bond lengths of 2.57, 3.06, and 2.86 $Å$, respectively. Prednisolone established hydrogen bonds with Gly151, Asp75, His51, and Gly153 (Table 3). Furthermore, ZINC14441502 formed hydrogen bonds with Gly151 and Ser135 at bond lengths of 2.86 and 2.99 $Å$, respectively, and engaged in hydrophobic contacts with Leu128, Gly153, Asn152, Val72, Asp75, His151, and Phe130 (Supplementary file 1). A total of 39 of the 56 compounds, which docked effectively and showed strong interactions, were chosen for subsequent analysis.

Table 3. Top 15 Protein-ligand interactions between selected hits and NS2B/NS3 post-docking, including two known inhibitors.

N°	Compound names	Binding Affinity ($kcal/mol$)	Hydrogen bonding with bond length ($Å$)	Hydrophobic contacts
1	anhydrophlegmacin	-9.2	Asn152 (2.76), Gly153 (2.88), Ser135 (3.06), Gly151 (2.86)	Val72, Asp75, His51, Pro132, Tyr150, Leu128
2	anhydrophlegmacin-9,10-quinones_B2	-9.2	Val72 (2.96), Asp75 (2.57), His51 (2.86), Lys73 (2.94)	Leu128, Pro132, Gly151, Gly153, Tyr161, Trp50
3	ZINC000035941652	-9.1	Leu149 (3.06)	Trp83, Asn152, Ala164, Ile165, Lys73, Asn167, Thr120, Ile123, Ala166, Lys74, Gly148, Leu76
4	chryslandicin	-9.0	Val72 (2.74)	Gly153, Trp50, His51, Tyr161, Leu128, Pro132, Gly151, Asn152, Asp75
5	ZINC000085594516	-8.8	Ser135 (3.09)	Leu128, Tyr150, Pro132, Phe130, Gly151, His51, Asn152, Gly153, Asp75
6	6a,12a-dehydromillettone	-8.7	None	His151, Asp75, Gly151, Gly153, Tyr150, Phe130, Pro132, Leu128
7	ZINC000028462577	-8.6	Ser135 (2.67), Val72 (2.94)	Trp50, Gly151, Leu128, Phe130, His51, Gly153, Pro132, Tyr150
8	anhydrophlegmacin-9',10'-quinone	-8.6	Asn152 (2.88), Gly153 (2.84), Ser135 (2.94)	Asp75, Val154, Val72, Trp50, His51, Pro132, Leu128, Gly151
9	2',4'-dihydroxychalcone-(4-O-5''')-4'',2''',4'''-trihydroxychalcone	-8.6	Leu149 (2.99), Thr120 (3.26)	Val154, Lys73, Val72, Asn152, His51, Asp75, Gly148, Leu76, Gly153, Trp83, Lys74, Ile165, Ala166, Ala164, Asn167, Ile123
10	ZINC000095485910	-8.6	Phe130 (2.71)	Ser135, Gly151, Leu128, His51, Asp75, Gly153, Pro132, Tyr150
11	ZINC000095485955	-8.6	Trp83 (2.84), Leu149 (3.20), Asn152 (2.80)	Gly87, Val146, Met149, Leu76, Ala164, Asn167, Ile165, Ala166, Gly148, Leu85, Val147
12	ZINC000095486025	-8.5	Leu128 (3.34), Gly153 (2.87)	Val72, His51, Asp75, Ser135, Gly151, Phe130, Pro132, Tyr150, Tyr161, Val54, Lys73, Asn152
13	ZINC000038628344	-8.5	His51 (2.89), Ser135 (2.68), Asp75 (2.57), Phe130 (3.06), Tyr150 (3.10)	Pro132, Ser131, Leu128, Tyr161, Gly153, Gly151
14	ZINC000095486053	-8.4	Gly151 (2.99)	His51, Pro132, Tyr150, Ser135, Phe130, Leu128
15	phaseollidin	-8.4	Gly87 (2.83), Val146 (2.98)	Leu85, Trp83, Gly148, Leu149, Ala164, Leu76, Asn167, Asn152, Lys74, Ile165, Trp89, Ala166, Glu88, Glu86, Val147

Figure 4 shows the interactions between ZINC38628344 and the NS2B/NS3 protease. With a binding affinity of -8.5 $kcal/mol$, this ligand established hydrogen bonds with His51 (2.89 $Å$), Asp75 (2.57 $Å$), and Phe130 (3.06 $Å$), in addition to hydrophobic interactions with residues Pro132, Ser131, Leu128, Tyr161, Gly153, and Gly151.

Figure 4. Ligand ZINC38628344 docked into the NS2B/NS3 binding pocket, with 2D protein-ligand interaction visual produced with PyMOL (left) and LigPlot (right).

ADMET Screening of Selected Compounds

Pharmacokinetic analyses focus on the absorption and elimination of drugs by the body. Key features, including gastrointestinal (GI) absorption, were assessed, with "High" absorption potential being the most favorable. Veber's criteria were applied, and 20 out of 39 hits that did not comply with Lipinski's Rule of Five (RO5) were eliminated, 12 of which breached one of the RO5 criteria (Supplementary Table 2). Overall, 31 compounds were deemed drug-like, while 7 showed the poorest drug-likeness, with two RO5 violations each: 5,7'-physcion-fallacinol, ZINC000095485956, ZINC000085594516, amentoflavone, ZINC000095486111, voucapane-18,19-di-(4-methyl)-benzenesulphonate, and ZINC000095485927 (Supplementary Table 2). Veber's rule, which requires TPSA ≤ 140 and rotatable bonds ≤ 10, further filtered the hits, with 26 showing zero violations. Solubility and pharmacological profiles indicated that only one compound (ZINC000095485927) was predicted to be insoluble, although many others showed moderate to poor solubility (Supplementary Table 2). In terms of GI absorption, 21 of the selected hits were rated High and 18 Low. The mutagenicity and tumorigenicity of the hits were also assessed with DataWarrior (Table 4).

Table 4. Prediction of ADME (absorption, distribution, metabolism, excretion) and toxicity profiles for the top selected hits.

N°	Ligands	ESOL Solubility Class	GI absorption	RO5 violation	Veber’s rule violation	Mutagenicity	Tumorigenicity
1	ZINC000004095704	Soluble	Low	1	1	None	None
2	ZINC000095485958	Soluble	Low	1	1	None	None
3	ZINC000095485940	Soluble	High	0	0	None	None
4	ZINC000095485986	Soluble	Low	0	1	None	None
5	dihydrolanneaflavonol	Moderately soluble	High	0	0	None	None
6	lettowianthine	Moderately soluble	High	0	0	High	High
7	millettosine	Moderately soluble	High	0	0	None	None
8	ZINC000095486053	Moderately soluble	High	0	0	None	None
9	ZINC000031168265	Soluble	High	0	0	None	None
10	ZINC000095485910	Moderately soluble	High	0	0	High	High

Molecular Dynamics Simulations

To further investigate the stability of predicted lead compounds within the active site, Molecular Dynamics Simulations were conducted using GROMACS 2020.5. The binding mechanisms of the various molecules within the active site are essential for effective drug design. Dynamic behavior analyses of both unbound proteins and their complexes were performed, plotting metrics like root mean square deviation (RMSD), radius of gyration (Rg), and root mean square fluctuation (RMSF) using Xmgrace. All simulations were executed for 100 ns.

Root mean square deviation (RMSD) for 100 ns MD simulations

The RMSD is a well-established indicator of protein stability; it compares the atomic coordinates of the protein backbone during the simulation with those of the starting structure. The RMSD analysis indicated that both the unbound protein and the four lead compounds were stable over the 100 ns simulation, whereas Prednisolone showed instability until 70 ns (Figure 5). The NS2B/NS3pro-Prednisolone complex showed significant fluctuations before stabilizing, and the NS2B/NS3pro-ZINC38628344 RMSD peaked at 0.25 nm and then stabilized (averaging 0.22 nm), while the other complexes were more stable, averaging around 0.17 nm. The unbound protein showed the least fluctuation overall.

Figure 5. RMSD vs. time graph for the unbound protein and NS2B/NS3pro-ligand complexes generated throughout the 100 ns MD simulation.
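For readers unfamiliar with the metric, the sketch below computes an RMSD between a reference structure and one trajectory frame in plain Python. It is a simplified illustration, not the analysis used in the study: the coordinates are invented, and the function assumes the frame has already been superimposed on the reference, a step that MD analysis tools normally perform before reporting RMSD.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Each argument is a list of (x, y, z) tuples in nm. No superposition is
    done here; the frame is assumed to be aligned to the reference already.
    """
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared_sum = sum(
        (a - b) ** 2
        for ref_atom, atom in zip(reference, frame)
        for a, b in zip(ref_atom, atom)
    )
    return math.sqrt(squared_sum / len(reference))


# Invented three-atom example: every atom is displaced by 0.02 nm along y.
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
frame = [(0.0, 0.02, 0.0), (0.1, 0.02, 0.0), (0.2, 0.02, 0.0)]

value = rmsd(reference, frame)
print(f"RMSD = {value:.3f} nm")
```

Applying such a function to every frame against the starting structure gives an RMSD-versus-time curve of the kind shown in Figure 5.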

Radius of gyration for 100 ns MD simulations

The folding and compactness of the five complexes and the unbound protein were assessed by plotting the radius of gyration (Rg) over the 100 ns simulation period. The Rg values for both the unbound NS2B/NS3 protease and the protein-ligand complexes ranged from 1.51 to 1.59 nm, as seen in Figure 6. The unbound protease showed steady fluctuations until around 50 ns, after which its Rg rose until the end of the simulation, whereas the protein-ligand complexes displayed comparable fluctuation trends throughout the 100 ns. The NS2B/NS3pro-Prednisolone complex exhibited the greatest fluctuations, peaking at 1.59 nm.

Figure 6. Rg graph comparing NS2B/NS3pro-ligand complexes and the unbound protein.

Root mean square fluctuations (RMSF) for 100 ns MD simulations

The RMSF trajectories of the protein-ligand complexes and the unbound NS2B/NS3 were analyzed. All predicted lead compounds caused noticeable changes in similar regions, as reflected in the RMSF plot. Significant fluctuations were observed from residue indexes 28-33, with additional variations between indexes 60-65 and 116-123. The RMSF graph also indicated fluctuations in the unbound protein, particularly around residues 102-106.

Figure 7. Analysis of the RMSF trajectories of the NS2B/NS3pro-ligand complexes and the unbound protein residues.

MMPBSA Computations

MMPBSA computations were used to estimate binding free energies as an indicator of potential activity.

Contributing Energy Terms

The binding free energies of the complexes were calculated using the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) approach. Contributions to the binding free energy include van der Waals energy, electrostatic energy, polar solvation energy, and solvent-accessible surface area (SASA) energy, reported as averages with standard deviations. The lead compounds ZINC38628344, ZINC95485940, ZINC14441502, and 2',4'-dihydroxychalcone exhibited binding energies of -44.957, -18.586, -25.881, and -55.805 $kJ/mol$, respectively, with 2',4'-dihydroxychalcone displaying the lowest and ZINC95485940 the highest binding free energy. Prednisolone had a binding free energy of -17.682 $kJ/mol$.

Table 5. MMPBSA energy contributions for NS2B/NS3-ligand complexes presented as averages ± standard deviations in kJ/mol.

N°	Compounds	van der Waals energy ($kJ/mol$)	Electrostatic energy ($kJ/mol$)	Polar solvation energy ($kJ/mol$)	SASA energy ($kJ/mol$)	Binding energy ($kJ/mol$)
1	ZINC38628344	-73.805 ± 4.608	-10.304 ± 1.231	48.041 ± 3.817	-8.983 ± 0.555	-44.957 ± 3.383
2	ZINC95485940	-54.337 ± 3.716	-22.090 ± 2.316	65.388 ± 4.613	-7.682 ± 0.473	-18.586 ± 2.821
3	ZINC14441502	-52.459 ± 3.949	-8.261 ± 0.862	41.318 ± 3.042	-6.400 ± 0.476	-25.881 ± 3.519
4	Prednisolone	-39.913 ± 4.112	-9.190 ± 1.346	36.390 ± 3.989	-5.355 ± 0.527	-17.682 ± 3.583
5	2',4'-dihydroxychalcone-(4-O-5''')-4'',2''',4'''-trihydroxychalcone	-160.105 ± 5.769	-41.801 ± 2.540	164.633 ± 6.076	-18.440 ± 0.639	-55.805 ± 3.467
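As a sanity check on Table 5, the four energy terms can be summed and compared with the reported binding energy, since in the usual MMPBSA decomposition the binding energy is the sum of the van der Waals, electrostatic, polar solvation, and SASA terms. The sketch below uses the mean values from Table 5; the 0.5 kJ/mol tolerance is an arbitrary choice for the example, and the small residual differences are assumed to come from how the averages were computed.

```python
# Mean energy terms in kJ/mol, copied from Table 5:
# (van der Waals, electrostatic, polar solvation, SASA, reported binding energy)
table_5 = {
    "ZINC38628344": (-73.805, -10.304, 48.041, -8.983, -44.957),
    "ZINC95485940": (-54.337, -22.090, 65.388, -7.682, -18.586),
    "ZINC14441502": (-52.459, -8.261, 41.318, -6.400, -25.881),
    "Prednisolone": (-39.913, -9.190, 36.390, -5.355, -17.682),
    "2',4'-dihydroxychalcone dimer": (-160.105, -41.801, 164.633, -18.440, -55.805),
}

TOLERANCE = 0.5  # kJ/mol, arbitrary tolerance for this check

# Difference between the summed terms and the reported binding energy.
differences = {}
for compound, (vdw, elec, polar, sasa, reported) in table_5.items():
    summed = vdw + elec + polar + sasa
    differences[compound] = summed - reported
    print(f"{compound}: sum = {summed:.3f}, reported = {reported:.3f}")

consistent = all(abs(d) < TOLERANCE for d in differences.values())
print("all rows within tolerance:", consistent)
```

The sums also make the trade-off visible: the chalcone dimer has by far the most favorable van der Waals term, but most of it is offset by a large polar solvation penalty.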

Per-residue Energy Decomposition

Per-residue energy decomposition was performed for each complex using the MMPBSA method. Residues contributing a binding free energy of at least ± 5 $kJ/mol$ were considered critical for ligand binding. In the NS2B/NS3-ZINC14441502 complex, only Tyr161 met this criterion, contributing -6.4629 $kJ/mol$ (Figure 8). For the NS2B/NS3-ZINC38628344 complex, Tyr161 and Leu128 contributed energies of -6.6957 and -3.4011 $kJ/mol$, respectively. Other key residues interacting with ZINC95485940, 2',4'-dihydroxychalcone, and Prednisolone contributed minor energy values.

Figure 8. MMPBSA plot illustrating binding free energy contributions for NS2B/NS3-ZINC14441502 complex.

Here is a summary of the data flow chart throughout this research:

graph TD;
    A[Bioactive Dataset - 343,305 compounds]--Data preprocessing (Compound standardization)-->B[21,250 Study data: 4470 actives + 16780 inactives];
    B[21,250 Study data: 4470 actives + 16780 inactives]--Data splitting (1:4)-->C[14,875 training data + 3,187 test + 3,188 externally held];
    C[14,875 training data + 3,187 test + 3,188 externally held]--Evaluation data-->D[3,188 externally held];
    D[3,188 externally held]--Model validation-->E{ML Model pool};
    E{ML Model pool}--Model selection-->F{QSAR models};
    G[PaDEL descriptors: 1,444]--Variance filter (Threshold = 0.1)-->H[Approved descriptors: 684];
    H[Approved descriptors: 684]--QSAR modeling-->F{QSAR models};
    I[18 Known inhibitors‡]--※Further model validation-->F{QSAR models};
    F{QSAR models}-->J{Logistic Regression};
    F{QSAR models}--※LR output-->K[11 Inhibitors marked **Active**‡];
    L[2683 New compounds: 812 ZINC & 1871 EANPDB]--LR model prediction-->J{Logistic Regression};
    J{Logistic Regression}--Yes-->M[933 active compounds];
    J{Logistic Regression}--No-->N[1750 inactive compounds];
    M[933 active compounds]--Compound selection based on 2FOM structure-->O[853 selected compounds];
    O[853 selected compounds]--NS2B_NS3 Molecular Docking (Affinity ≤ -8.0 kcal)-->P[59 ligands + 2 Known inhibitors];
    P[59 ligands + 2 Known inhibitors]--Binding affinities postdocking-->Q[39 Top docked hits];
    Q[39 Top docked hits]---->R{ADMET Screening};
    R{ADMET Screening}--Veber's rules & Lipinski's RO5-->S[20 Top non-violating hits];
    S[20 Top non-violating hits]-->T[Top Protein-Ligand complexes];
    T[Top Protein-Ligand complexes]-->U{Molecular Dynamics Simulations};
    T[Top Protein-Ligand complexes]-->V{MMPBSA Computations};
    U{Molecular Dynamics Simulations}--RMSD-->W[2',4'-dihydroxychalcone/ZINC14441502/ZINC95485940 > ZINC38628344];
    U{Molecular Dynamics Simulations}--Rg-->X[ZINC38628344/ZINC14441502/ZINC95485940 > 2',4'-dihydroxychalcone];
    U{Molecular Dynamics Simulations}--RMSF-->Y[ZINC38628344 \ ZINC95485940 \ 2',4'-dihydroxychalcone \ ZINC14441502];
    V{MMPBSA Computations}--Contributing energy Terms-->Z[2',4'-dihydroxychalcone > ZINC38628344 > ZINC14441502 > ZINC95485940];
    V{MMPBSA Computations}--Per-residue Decomposition-->AA[ZINC38628344 > ZINC14441502 > ZINC95485940 > 2',4'-dihydroxychalcone];

How to use

The documentation and videos give a general overview of how the pipeline was built and can be utilized to identify novel Dengue Virus inhibitors.

Tutorial

Study pipeline describes how the models were constructed, selected, validated, and implemented. It also pinpoints how the various scripts were written and put into action.

Molecular docking and dynamics simulations, though computational, are also briefly described, and some of the scripts used to run these analyses are identified and demonstrated.

Tutorial 2

Construction of a possible PyPi installation package for novel antiviral therapeutics prediction and discovery.

Data Availability

The data utilized for the project can be found here.

Reproducibility Prerequisites

Note

The code and scripts were run on Python 3.8, Anaconda3 2024.06.1, and Jupyter Notebook version 7.

R 4.3.0 was used for some of the data visualization to plot graphs from MMPBSA computations.

Credits

The Team members include:

George Hanson
Joseph Adams
Daveson Innocento Brank Kepgang
Andy Asante
Emmanuel Israel Nsedu
Hem Bondarwad
Maureen Kisaakye
Lewis Tem Bueh
Luke S. Zondagh
Soham Amod Shirolkar
Olaitan I. Awe


omicscodeathon/denguedrug
