Machine Learning and Molecular Docking Prediction of Potential Inhibitors against Dengue Virus

omicscodeathon/denguedrug


DenD Open In Collab License: MIT GitHub Repo stars Github all releases Python GitHub contributors Github tag

denguedrug

Machine Learning and Molecular Docking Prediction of Potential Inhibitors against Dengue Virus.

Overview

This project aims at building an in silico pipeline to identify novel Dengue Virus inhibitors. We will incorporate Machine Learning (ML) and Molecular Modeling techniques into the pipeline.

Dengue virus (DENV) is a member of the Flaviviridae family and is responsible for the most prevalent mosquito-borne viral hemorrhagic fever. Transmission to humans occurs primarily through bites from mosquito species such as Aedes aegypti and Aedes albopictus, which are widespread in tropical and subtropical climates, in both urban and rural regions. Some people infected with DENV develop the severe and sometimes fatal diseases known as Dengue hemorrhagic fever (DHF) and Dengue shock syndrome (DSS). The spread of dengue fever has resulted in several medical emergencies and deaths, yet no drug is currently available and, despite the prevalence of the disease, treatment remains symptomatic. The structural information available for DENV presents an opportunity to discover potent antiviral agents capable of disrupting the early stages of DENV infection. Regions with a high prevalence of Dengue virus infection in Africa are highlighted in the figure below.

Our approach trains different Machine Learning models on the Anti-Dengue dataset from PubChem to discriminate potential anti-Dengue compounds from non-anti-Dengue compounds. The predicted compounds are then screened against a Dengue protein target for downstream analysis. Details of the pipeline are presented in the "Description" section.


{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": 1,
      "properties": {
        "ID": 0
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
              [25,33],
              [31,31],
              [38,20],
              [44,12],
              [52,11.5],
              [47,2],
              [39,-5],
              [41.5,-15],
              [35,-20],
              [30,-23],
              [21,-18],
              [25,33]
          ]
        ]
      }
    }
  ]
}

East Africa Coordinates Mapping of High Dengue prevalence regions.

Please cite and star the repository if you utilize the pipeline for research or commercial purposes.

Table of contents

  1. Objectives
  2. Description
  3. Manuscript
  4. Results
  5. How to use
  6. Data Availability
  7. Reproducibility Prerequisites
  8. Credits

Objectives

  • Identify the Dengue virus protein target.
  • Identify a Dengue virus ligand database for ML training and molecular modeling method validation.
  • Determine the Python ML algorithms to be utilized in the project.
  • Process ligand database and train ML model.
  • Evaluate ML performance and perform EDA.
  • Validate molecular modeling method using prepared ligand database (Actives vs Non-actives).
  • Virtual screening of predicted actives into identified protein crystal structures.
  • Assess and identify hits using the following criteria: docking score and interactions with important residues.
  • Assess hits ADMET properties.
  • Conduct MD simulations to determine compounds' binding mode stability and binding free energy, and MMPBSA computations.
  • Compile results and observations.
  • Finalize write-up.

Description


This figure illustrates the proposed DengueDrug pipeline to be utilized to identify potent Dengue Virus Inhibitors.


Fig. Proposed Dengue Drug Identification Pipeline.

Step 1: Identification of Dengue Virus inhibitors database for ML training

The ligand database was obtained from PubChem BioAssay ID: 651640. It was generated experimentally by the Broad Institute using the "in vivo" DENV2 CPE-Based HTS method, computed in a Cell-Based and Microorganism Combination System. A total of 347,136 compounds were analyzed for Dengue Virus inhibition, of which 5,946 actives and 324,845 non-actives were identified. An active is defined as a compound that exhibits an ATP activity level above 20% at 10 $\mu M$.

Step 2: Preprocessing

  • The unprocessed database can be found here.

  • The molecular descriptors of the actives and inactives were calculated with PaDEL-Descriptor using the Descriptor Calculator Python script.

  • The actives and inactives databases were combined and all missing descriptors were filled with the value 0. Next, dimensionality reduction was conducted using a variance filter (scikit-learn's VarianceThreshold class).

  • The data was then standardized using the mean and standard deviation of each descriptor.
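
The preprocessing steps above can be sketched in plain Python. This is only an illustration of the arithmetic, not the project's script: the pipeline itself uses scikit-learn, the function name `preprocess` and the toy rows are hypothetical, and the threshold of 0.1 is the value quoted in the Results section.

```python
from statistics import mean, pstdev, pvariance


def preprocess(rows, variance_threshold=0.1):
    """Fill missing values, drop low-variance columns, standardize the rest.

    rows: list of descriptor rows; None marks a missing descriptor value.
    Returns (indices of kept columns, standardized rows).
    """
    # 1. Fill missing descriptors with 0, as described in Step 2.
    filled = [[0.0 if v is None else float(v) for v in row] for row in rows]
    columns = list(zip(*filled))

    # 2. Variance filter: keep columns whose population variance exceeds
    #    the threshold (the same rule a variance filter applies).
    kept = [i for i, col in enumerate(columns)
            if pvariance(col) > variance_threshold]

    # 3. Standardize each kept column to zero mean and unit standard deviation.
    scaled_columns = []
    for i in kept:
        col = columns[i]
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(v - mu) / sigma for v in col])

    return kept, [list(row) for row in zip(*scaled_columns)]


# Toy example (hypothetical values): the second column is constant and is dropped.
toy = [[1.0, 5.0, None], [2.0, 5.0, 0.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
kept, scaled = preprocess(toy)
print(kept)  # [0, 2]
```

In practice the filter and the scaler would be fitted on the training split only and then applied to the test and external sets, so that no information leaks from the held-out data.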

Step 3: Model construction

  • The data was split into training, test, and external datasets. The training dataset comprised 70% of the data (14,875 compounds), and the test and external datasets comprised 15% each (~3,188 compounds). The training dataset contained 3,105 actives and 11,770 inactives.

  • The ML models were constructed using the LazyPredict Python package. The models with the highest Accuracy, F1-score, Balanced Accuracy, and ROC AUC were selected for validation.

  • The models chosen for further validation were K-Nearest Neighbours, Gaussian Naïve Bayes, Support Vector Machine, Random Forest, and Logistic Regression. The models were cross-validated using K-fold splitting of the training data, and their suitability was evaluated using Accuracy, F1-score, Precision, Recall, Specificity, and the true and false positive and negative rates as selection metrics.

  • The models' predictive ability was assessed on the test data using the Accuracy, F1-score, Precision, and Recall evaluation metrics.

  • The logistic regression (LR) model achieved the best results on the test dataset and was therefore evaluated on the external dataset, where it correctly classified 82% of actives and 98% of inactives.
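
As a quick check of the split sizes quoted above, the following sketch shuffles 21,250 indices and cuts them 70/15/15. It assumes a plain random split; the README does not say whether the split was stratified, and `split_indices` and the seed are hypothetical.

```python
import random


def split_indices(n, train_pct=70, test_pct=15, seed=0):
    """Shuffle range(n) and cut it into training, test, and external parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding at the cut points.
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    external = indices[n_train + n_test:]  # remainder, roughly 15%
    return train, test, external


train, test, external = split_indices(21250)
print(len(train), len(test), len(external))  # 14875 3187 3188
```

The sizes agree with the 14,875 / 3,187 / 3,188 compounds reported in the Results section.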

Step 4: Prediction

Step 5: Molecular Docking

  • The crystal structure of the Dengue virus 2 NS2B/NS3 protease (PDB: 2FOM) was identified for structure-based virtual screening.

  • AutoDock Vina was utilized to dock the 7,722 compounds into the Dengue virus 2 NS2B/NS3 protease.

  • The potential hits were selected using the following criteria:

    • AutoDock Vina binding score.
    • Presence of binding interactions between important binding site residues and the ligand (LigPlot+ v1.4.5).

Step 6: ADMET prediction

Step 7: Molecular Dynamics (MD) Simulations

  • The binding-mode stability of the hits was assessed through 100-nanosecond (ns) MD simulations using GROMACS.
  • Stability was assessed with metrics such as root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of gyration (Rg), plotted with Xmgrace.
  • The retention of the compounds' binding interactions with important residues throughout the MD simulations was assessed with the ProLIF Python library.
  • The compounds' binding free energies throughout the MD simulations were calculated using Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA).

Manuscript

Important

When using the pipeline or findings for research or commercial purposes, please cite our research and star the repository.

Results

From the various analyses and computations performed throughout this in silico exploration, the following results were obtained and observations made.


Data Acquisition and Processing

The bioactivity dataset obtained from PubChem was imbalanced: 1/3 of the compounds were active and inactives dominated the dataset, as seen in Figure 1 below. PaDEL was used to generate 1,444 molecular descriptors, which convert chemical information into numerical values and thereby provide a mathematical representation of the compounds for QSAR modeling. The dataset of 21,250 compounds was split into a training set (14,875), a test set (3,187), and an externally held set (3,188). Applying a variance filter with a threshold of 0.1 reduced the descriptors from 1,444 to 684, removing those with minimal variance so that only the most informative features were retained for subsequent modeling.



Figure 1. Three-dimensional plot of the correlation between active and inactive compounds in the processed dataset according to ALogP, XLogP, and Zagreb.


Model Development and Evaluation

Five machine learning algorithms (k-NN, Gaussian Naïve Bayes, SVM, Random Forest, and Logistic Regression) were used to build predictive models, each assessed on several statistical parameters including accuracy, precision, recall, and F1 score. LR produced the best results across most metrics, followed by the SVM model. With 81% accuracy and an F1 score of 0.51, Gaussian Naïve Bayes performed worst; its low precision (0.55) and recall (0.47) show that it struggled to balance identifying true positives against minimizing false positives. Given its consistent performance across the evaluation metrics, Logistic Regression was the most dependable model for predicting potential inhibitors (Table 1).

Table 1. Evaluation of ML models performance on retained datasets.

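| Model | Accuracy | Precision | Recall | F1 Score |
|-------|----------|-----------|--------|----------|
| LR    | 0.94     | 0.91      | 0.76   | 0.83     |
| SVM   | 0.93     | 0.94      | 0.71   | 0.81     |
| KNN   | 0.92     | 0.89      | 0.68   | 0.77     |
| RF    | 0.91     | 0.94      | 0.60   | 0.73     |
| NB    | 0.81     | 0.55      | 0.47   | 0.51     |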
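
The F1 scores in Table 1 follow from the precision and recall columns, since F1 is their harmonic mean. The short sketch below recomputes them from the rounded table values; the dictionary simply transcribes Table 1, and `f1_score` is a local helper, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) transcribed from Table 1.
table1 = {
    "LR": (0.91, 0.76, 0.83),
    "SVM": (0.94, 0.71, 0.81),
    "KNN": (0.89, 0.68, 0.77),
    "RF": (0.94, 0.60, 0.73),
    "NB": (0.55, 0.47, 0.51),
}

for model, (precision, recall, reported) in table1.items():
    computed = round(f1_score(precision, recall), 2)
    print(f"{model}: computed F1 = {computed:.2f}, reported F1 = {reported:.2f}")
```

For every model the recomputed value matches the reported F1 score to two decimal places.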



Figure 2. Bar plot of performance metrics for each Machine Learning model.


Prediction of Inhibitors and New Compounds

Eighteen (18) known Dengue Virus inhibitors sourced from the literature were used for initial testing and to further assess model performance. The Logistic Regression model classified 11 of these inhibitors as active (Table 2), surpassing the other ML models. These compounds underwent the same preprocessing as the training data to ensure consistency in descriptor calculation and transformation. Notable compounds predicted as active by the LR model included Pentoxifylline, Prochlorperazine, Balapiravir, Celgosivir, and Bortezomib. This validation suggested that the model could extrapolate to compounds with the same mode of action. The LR model was then used to predict activity for 812 compounds from the ZINC database and 1871 from the EANPDB database. Of the 2683 evaluated compounds, 933 were predicted as active and suitable for further investigation. This approach highlights the importance of careful descriptor selection and data preprocessing in QSAR modeling when dealing with imbalanced data in anti-Dengue drug discovery.

Table 2. Prediction results for known Dengue Virus inhibitors using LR.

| No. | Inhibitor | Prediction | Mechanism of Action | References |
|-----|-----------|------------|---------------------|------------|
| 1 | Pentoxifylline | 1 | Immune modulation | Salgado et al., 2012 |
| 2 | 4-hydroxyphenyl retinamide | 0 | Inhibits viral replication | Carocci et al., 2015; Fraser et al., 2014 |
| 3 | Prochlorperazine | 1 | Inhibits viral binding and viral entry | Simanjuntak et al., 2015 |
| 4 | Balapiravir | 1 | Inhibits viral replication | Nguyen et al., 2013 |
| 5 | Bortezomib | 1 | Inhibits viral replication | Ci et al., 2023 |
| 6 | Leflunomide | 1 | Immunosuppressive effects | Wu et al., 2011 |
| 7 | SKI-417616 | 1 | Inhibition of D4R suppressed DENV infection | Smith et al., 2014 |
| 8 | Celgosivir | 1 | Inhibits viral replication | Tian et al., 2018 |
| 9 | UV-4B | 1 | Inhibits viral replication | Franco et al., 2021 |
| 10 | 2-C-methylcytidine | 0 | Inhibits viral replication | Lee et al., 2015 |
| 11 | Ketotifen | 1 | Vascular leakage | Lai et al., 2017 |
| 12 | Chloroquine | 1 | Inhibits viral replication | Lai et al., 2017 |
| 13 | Dasatinib | 0 | RNA replication inhibition | de Wispelaere et al., 2013 |
| 14 | Lovastatin | 0 | Inhibits viral replication | Whitehorn et al., 2016 |
| 15 | ST-148 | 0 | Inhibits viral replication | Byrd et al., 2013 |
| 16 | Dexamethasone | 0 | Inhibits viral replication | Kularatne et al., 2009 |
| 17 | Prednisolone | 1 | Inhibits viral replication | Lai et al., 2017 |
| 18 | Ivermectin | 0 | Helicase inhibition | Xu et al., 2018 |

  0 = Inactive, 1 = Active


Target Selection and Molecular Docking of Predicted Compounds

This research focused on the NS2B/NS3 protease, chosen as the primary target from the seven nonstructural proteins of the Dengue Virus to validate the Logistic Regression (LR) model predictions. As a crucial enzyme in viral replication and assembly, it is a significant target for antiviral drug development. Potential inhibition sites for the DENV protease include the active site and the binding site of the NS3 protease with its cofactor, NS2B. The active site, which is the main target for intervention, comprises a conserved catalytic triad consisting of His51, Asp75, and Ser135. A search of the Protein Data Bank revealed several solved structures for the NS2B/NS3 complex in Dengue Virus serotype, including PDB IDs 4M9T, 2FOM, 4M9M, and 4M9I, with resolutions of 1.74, 1.50, 1.53, and 2.40 $Å$ and R-values of 0.215, 0.176, 0.203, and 0.215, respectively. For this study, 2FOM was selected because of its favorable resolution and R-value. Its 3D structure, with a ligand docked at the active site (which also contains residues such as Leu128, Pro132, Ser131, and Tyr161), is shown below (Figure 3).


Figure 3. NS2B/NS3 protease structure as visualized in PyMOL, highlighting ligand docking. [A. Pale-yellow cartoon structure representation of the protein structure; B. Light-green surface representation of the protein with a ligand (blue) positioned in the active site].

A total of 853 compounds, curated from the compounds predicted as active by the LR model, were docked into the active site of the NS2B/NS3 protease using AutoDock Vina. Among these, anhydrophlegmacin exhibited the strongest binding affinity, -9.2 $kcal/mol$. The binding affinities of the compounds ranged from -9.2 to -3.6 $kcal/mol$, supporting the predictive capability of the Logistic Regression model. With a threshold set at -8.0 $kcal/mol$, 59 compounds with equivalent or better binding energies were selected for further investigation. This threshold is more stringent than the conventional -7.0 $kcal/mol$ standard for classifying compounds as active against a given target (Kwofie et al., 2022). The binding affinity reflects the strength of the interaction between a ligand and the target protein. Protein-ligand complexes were inspected visually in PyMOL to identify the most promising docked compounds. Two known inhibitors, Leflunomide and Prednisolone, were included as controls and showed binding affinities of -7.1 and -7.0 $kcal/mol$, respectively. The binding affinities for all predicted compounds and inhibitors were systematically recorded and stored here.
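
The threshold-based selection described above amounts to a simple filter on the docking scores. The sketch below is illustrative only: `select_hits` is a hypothetical helper, and the dictionary holds just a few of the affinities quoted in this README (three from Table 3 plus the two controls), not the full docking output.

```python
def select_hits(scores, threshold=-8.0):
    """Return (name, score) pairs with score <= threshold, strongest binders first.

    scores: mapping of compound name to binding affinity in kcal/mol
    (more negative means stronger predicted binding).
    """
    hits = [(name, score) for name, score in scores.items() if score <= threshold]
    return sorted(hits, key=lambda pair: pair[1])


# A handful of affinities quoted in this README (kcal/mol).
affinities = {
    "phaseollidin": -8.4,
    "Leflunomide": -7.1,       # known inhibitor, control
    "anhydrophlegmacin": -9.2,
    "Prednisolone": -7.0,      # known inhibitor, control
    "chryslandicin": -9.0,
}

print(select_hits(affinities))
# [('anhydrophlegmacin', -9.2), ('chryslandicin', -9.0), ('phaseollidin', -8.4)]
```

Both controls fall short of the -8.0 $kcal/mol$ cut-off, which is why they are carried forward separately as reference compounds rather than as hits.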


Mechanism of Binding Characterization of selected compounds

Building on the structure-based molecular docking approach, interactions between the predicted compounds and the identified binding pocket were analyzed. Biomolecular interactions between the NS2B/NS3 protease and the docked compounds were illustrated using LigPlot to identify key residues within the protein's active site, a crucial step in identifying promising lead compounds. Hydrogen bonding and hydrophobic interactions between the selected compounds and active site residues were investigated to identify potent inhibitors of the NS2B/NS3 protease.

Ligands docked into the protease's active site interacted with essential residues such as His51, Ser135, Leu128, Pro132, Ser131, Tyr161, and Asp75, as detailed in Table 3 and the accompanying Supplementary file 1. The compounds with the highest binding affinities, anhydrophlegmacin and anhydrophlegmacin-9,10-quinones_B2, interacted with the same residues, including His51, Asp75, Gly151, Leu128, Pro132, and Gly153. These ligands formed hydrogen bonds with Asp75, Ser135, and His51, with bond lengths of 2.57, 3.06, and 2.86 $Å$, respectively. Prednisolone established hydrogen bonds with Gly151, Asp75, His51, and Gly153 (Table 3). Furthermore, ZINC14441502 interacted with Gly151 and Ser135 at bond lengths of 2.86 and 2.99 $Å$, respectively, and made hydrophobic contacts with Leu128, Gly153, Asn152, Val72, Asp75, His151, and Phe130 (Supplementary file 1). In total, 39 of the 56 compounds, which docked effectively and showed strong interactions, were chosen for subsequent analysis.

Table 3. Top 15 Protein-ligand interactions between selected hits and NS2B/NS3 post-docking, including two known inhibitors.

| No. | Compound name | Binding Affinity ($kcal/mol$) | Hydrogen bonding with bond length ($Å$) | Hydrophobic contacts |
|-----|---------------|-------------------------------|------------------------------------------|----------------------|
| 1 | anhydrophlegmacin | -9.2 | Asn152 (2.76), Gly153 (2.88), Ser135 (3.06), Gly151 (2.86) | Val72, Asp75, His51, Pro132, Tyr150, Leu128 |
| 2 | anhydrophlegmacin-9,10-quinones_B2 | -9.2 | Val72 (2.96), Asp75 (2.57), His51 (2.86), Lys73 (2.94) | Leu128, Pro132, Gly151, Gly153, Tyr161, Trp50 |
| 3 | ZINC000035941652 | -9.1 | Leu149 (3.06) | Trp83, Asn152, Ala164, Ile165, Lys73, Asn167, Thr120, Ile123, Ala166, Lys74, Gly148, Leu76 |
| 4 | chryslandicin | -9.0 | Val72 (2.74) | Gly153, Trp50, His51, Tyr161, Leu128, Pro132, Gly151, Asn152, Asp75 |
| 5 | ZINC000085594516 | -8.8 | Ser135 (3.09) | Leu128, Tyr150, Pro132, Phe130, Gly151, His51, Asn152, Gly153, Asp75 |
| 6 | 6a,12a-dehydromillettone | -8.7 | None | His151, Asp75, Gly151, Gly153, Tyr150, Phe130, Pro132, Leu128 |
| 7 | ZINC000028462577 | -8.6 | Ser135 (2.67), Val72 (2.94) | Trp50, Gly151, Leu128, Phe130, His51, Gly153, Pro132, Tyr150 |
| 8 | anhydrophlegmacin-9',10'-quinone | -8.6 | Asn152 (2.88), Gly153 (2.84), Ser135 (2.94) | Asp75, Val154, Val72, Trp50, His51, Pro132, Leu128, Gly151 |
| 9 | 2',4'-dihydroxychalcone-(4-O-5''')-4'',2''',4'''-trihydroxychalcone | -8.6 | Leu149 (2.99), Thr120 (3.26) | Val154, Lys73, Val72, Asn152, His51, Asp75, Gly148, Leu76, Gly153, Trp83, Lys74, Ile165, Ala166, Ala164, Asn167, Ile123 |
| 10 | ZINC000095485910 | -8.6 | Phe130 (2.71) | Ser135, Gly151, Leu128, His51, Asp75, Gly153, Pro132, Tyr150 |
| 11 | ZINC000095485955 | -8.6 | Trp83 (2.84), Leu149 (3.20), Asn152 (2.80) | Gly87, Val146, Met149, Leu76, Ala164, Asn167, Ile165, Ala166, Gly148, Leu85, Val147 |
| 12 | ZINC000095486025 | -8.5 | Leu128 (3.34), Gly153 (2.87) | Val72, His51, Asp75, Ser135, Gly151, Phe130, Pro132, Tyr150, Tyr161, Val54, Lys73, Asn152 |
| 13 | ZINC000038628344 | -8.5 | His51 (2.89), Ser135 (2.68), Asp75 (2.57), Phe130 (3.06), Tyr150 (3.10) | Pro132, Ser131, Leu128, Tyr161, Gly153, Gly151 |
| 14 | ZINC000095486053 | -8.4 | Gly151 (2.99) | His51, Pro132, Tyr150, Ser135, Phe130, Leu128 |
| 15 | phaseollidin | -8.4 | Gly87 (2.83), Val146 (2.98) | Leu85, Trp83, Gly148, Leu149, Ala164, Leu76, Asn167, Asn152, Lys74, Ile165, Trp89, Ala166, Glu88, Glu86, Val147 |

Figure 4 shows the interactions between ZINC38628344 and the NS2B/NS3 protease. With a binding affinity of -8.5 $kcal/mol$, the ligand established hydrogen bonds with His51 (2.89 $Å$), Asp75 (2.57 $Å$), and Phe130 (3.06 $Å$), in addition to hydrophobic interactions with residues Pro132, Ser131, Leu128, Tyr161, Gly153, and Gly151.



Figure 4. Ligand ZINC38628344 docked into the NS2B/NS3 binding pocket, with 2D protein-ligand interaction visual produced with PyMOL (left) and LigPlot (right).


ADMET Screening of Selected Compounds

Pharmacokinetic analyses focus on how the body absorbs and eliminates medications. Key features, including gastrointestinal (GI) absorption, were assessed, with "High" absorption potential being the most favorable. Veber's criteria were applied, and the 20 of the 39 hits that did not comply with Lipinski's Rule of Five (RO5) were eliminated; 12 of these breached one of the RO5 criteria (Supplementary Table 2). Overall, 31 compounds were deemed drug-like, while 7 showed the poorest drug-likeness, with two RO5 violations each: 5,7'-physcion-fallacinol, ZINC000095485956, ZINC000085594516, amentoflavone, ZINC000095486111, voucapane-18,19-di-(4-methyl)-benzenesulphonate, and ZINC000095485927 (Supplementary Table 2). Veber's rule, which requires TPSA ≤ 140 Å² and rotatable bonds ≤ 10, further filtered the hits, with 26 showing zero violations. Solubility and pharmacological profiles indicated that only one compound (ZINC000095485927) was predicted to be insoluble, although many others showed moderate to poor solubility (Supplementary Table 2). For GI absorption, 21 of the selected hits were rated High and 18 Low. The mutagenicity and tumorigenicity of the hits were also assessed with DataWarrior (Table 4).

Table 4. Prediction of ADME (absorption, distribution, metabolism, excretion) and toxicity profiles for the top selected hits.

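| No. | Ligand | ESOL Solubility Class | GI absorption | RO5 violations | Veber's rule violations | Mutagenicity | Tumorigenicity |
|-----|--------|-----------------------|---------------|----------------|-------------------------|--------------|----------------|
| 1 | ZINC000004095704 | Soluble | Low | 1 | 1 | None | None |
| 2 | ZINC000095485958 | Soluble | Low | 1 | 1 | None | None |
| 3 | ZINC000095485940 | Soluble | High | 0 | 0 | None | None |
| 4 | ZINC000095485986 | Soluble | Low | 0 | 1 | None | None |
| 5 | dihydrolanneaflavonol | Moderately soluble | High | 0 | 0 | None | None |
| 6 | lettowianthine | Moderately soluble | High | 0 | 0 | High | High |
| 7 | millettosine | Moderately soluble | High | 0 | 0 | None | None |
| 8 | ZINC000095486053 | Moderately soluble | High | 0 | 0 | None | None |
| 9 | ZINC000031168265 | Soluble | High | 0 | 0 | None | None |
| 10 | ZINC000095485910 | Moderately soluble | High | 0 | 0 | High | High |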
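
The two rule sets behind the violation counts can be written as simple threshold checks. The sketch below uses the standard cut-offs (RO5: molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors; Veber: TPSA ≤ 140 Å² and at most 10 rotatable bonds). The `Descriptors` class and the example values are hypothetical and do not describe any compound from this study; the README does not show how the web tools were called.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float      # Da
    logp: float
    h_donors: int
    h_acceptors: int
    tpsa: float            # topological polar surface area, in square angstroms
    rotatable_bonds: int


def ro5_violations(d):
    """Number of Lipinski Rule of Five criteria that are breached."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def veber_violations(d):
    """Number of Veber criteria that are breached."""
    return sum([d.tpsa > 140, d.rotatable_bonds > 10])


# Hypothetical descriptor values, for illustration only.
example = Descriptors(mol_weight=538.5, logp=4.2, h_donors=6,
                      h_acceptors=10, tpsa=181.6, rotatable_bonds=3)
print(ro5_violations(example), veber_violations(example))  # 2 1
```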

Molecular Dynamics Simulations

To further investigate the stability of predicted lead compounds within the active site, Molecular Dynamics Simulations were conducted using GROMACS 2020.5. The binding mechanisms of the various molecules within the active site are essential for effective drug design. Dynamic behavior analyses of both unbound proteins and their complexes were performed, plotting metrics like root mean square deviation (RMSD), radius of gyration (Rg), and root mean square fluctuation (RMSF) using Xmgrace. All simulations were executed for 100 ns.

Root mean square deviation (RMSD) for 100 ns MD simulations

The RMSD is a well-established indicator of protein stability; it compares the atomic coordinates of the protein backbone in the complex with the original coordinates. The RMSD analysis indicated that both the unbound protein and the four lead compounds were stable over the 100 ns simulation, whereas Prednisolone showed instability until 70 ns (Figure 5). The NS2B/NS3pro-Prednisolone complex fluctuated significantly before stabilizing. The RMSD of NS2B/NS3pro-ZINC38628344 peaked at 0.25 nm and then stabilized (averaging 0.22 nm), while the other complexes were more stable, averaging around 0.17 nm. The unbound protein showed the least fluctuation overall.



Figure 5. RMSD vs. time graph for the unbound protein and NS2B/NS3pro-ligand complexes generated throughout the 100 ns MD simulation.
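
For readers unfamiliar with the metric, the RMSD between a reference structure and one trajectory frame is the square root of the mean squared distance between matching atoms. The sketch below is a minimal, dependency-free illustration with made-up coordinates. It assumes the frame has already been superimposed on the reference and omits the least-squares fitting that trajectory analysis tools normally perform first; it is not the GROMACS implementation.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    reference, frame: sequences of (x, y, z) tuples in the same atom order,
    in the same units (for example nm). No superposition is performed.
    """
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum((a - b) ** 2
                  for atom_ref, atom_frame in zip(reference, frame)
                  for a, b in zip(atom_ref, atom_frame))
    return math.sqrt(squared / len(reference))


# Made-up two-atom example: displacements of 0.3 and 0.4 nm.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frm = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(round(rmsd(ref, frm), 3))  # 0.354
```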

Radius of gyration for 100 ns MD simulations

The folding and compactness of the five complexes and the unbound protein were assessed by plotting the radius of gyration (Rg) over the 100 ns simulation period. The Rg values for both the unbound NS2B/NS3 protease and the protein-ligand complexes ranged from 1.51 to 1.59 nm, as seen in Figure 6. The unbound protease showed steady fluctuations until around 50 ns, after which its Rg rose until the end of the simulation, whereas the protein-ligand complexes displayed comparable fluctuation trends throughout the 100 ns. The NS2B/NS3pro-Prednisolone complex exhibited the greatest fluctuations, peaking at 1.59 nm.



Figure 6. Rg graph comparing NS2B/NS3pro-ligand complexes and the unbound protein.

Root mean square fluctuations (RMSF) for 100 ns MD simulations

The RMSF trajectories of the protein-ligand complexes and the unbound NS2B/NS3 were analyzed. All predicted lead compounds caused noticeable changes in similar regions, as reflected in the RMSF plot. Significant fluctuations were observed from residue indexes 28-33, with additional variations between indexes 60-65 and 116-123. The RMSF graph also indicated fluctuations in the unbound protein, particularly around residues 102-106.



Figure 7. Analysis of the RMSF trajectories of the NS2B/NS3pro-ligand complexes and the unbound protein residues.


MMPBSA Computations

MMPBSA computations were used to estimate binding free energies as an indicator of potential activity.

Contributing Energy Terms

The binding free energies of the complexes were calculated using the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) approach. Contributions to the binding free energy include van der Waals energy, electrostatic interactions, polar solvation energy, and solvent-accessible surface area (SASA) energy, reported as averages with standard deviations. The lead compounds ZINC38628344, ZINC95485940, ZINC14441502, and 2',4'-dihydroxychalcone exhibited binding energies of -44.957, -18.586, -25.881, and -55.805 $kJ/mol$, respectively, with 2',4'-dihydroxychalcone displaying the lowest and ZINC95485940 the highest binding free energy. Prednisolone had a binding free energy of -17.682 $kJ/mol$.

Table 5. MMPBSA energy contributions for NS2B/NS3-ligand complexes presented as averages ± standard deviations in kJ/mol.

| No. | Compound | van der Waals energy ($kJ/mol$) | Electrostatic energy ($kJ/mol$) | Polar solvation energy ($kJ/mol$) | SASA energy ($kJ/mol$) | Binding energy ($kJ/mol$) |
|-----|----------|----------------------------------|---------------------------------|-----------------------------------|------------------------|---------------------------|
| 1 | ZINC38628344 | -73.805 ± 4.608 | -10.304 ± 1.231 | 48.041 ± 3.817 | -8.983 ± 0.555 | -44.957 ± 3.383 |
| 2 | ZINC95485940 | -54.337 ± 3.716 | -22.090 ± 2.316 | 65.388 ± 4.613 | -7.682 ± 0.473 | -18.586 ± 2.821 |
| 3 | ZINC14441502 | -52.459 ± 3.949 | -8.261 ± 0.862 | 41.318 ± 3.042 | -6.400 ± 0.476 | -25.881 ± 3.519 |
| 4 | Prednisolone | -39.913 ± 4.112 | -9.190 ± 1.346 | 36.390 ± 3.989 | -5.355 ± 0.527 | -17.682 ± 3.583 |
| 5 | 2',4'-dihydroxychalcone-(4-O-5''')-4'',2''',4'''-trihydroxychalcone | -160.105 ± 5.769 | -41.801 ± 2.540 | 164.633 ± 6.076 | -18.440 ± 0.639 | -55.805 ± 3.467 |
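
In MMPBSA, the binding energy is the sum of the four contributing terms. The sketch below adds up the averaged terms from Table 5 and compares each sum with the reported binding energy. The values are transcribed from the table; the small residual differences are expected when averaged terms are summed, though the exact cause here is an assumption.

```python
# (van der Waals, electrostatic, polar solvation, SASA), reported binding energy,
# all in kJ/mol, transcribed from Table 5.
table5 = {
    "ZINC38628344": ((-73.805, -10.304, 48.041, -8.983), -44.957),
    "ZINC95485940": ((-54.337, -22.090, 65.388, -7.682), -18.586),
    "ZINC14441502": ((-52.459, -8.261, 41.318, -6.400), -25.881),
    "Prednisolone": ((-39.913, -9.190, 36.390, -5.355), -17.682),
    "2',4'-dihydroxychalcone-(4-O-5''')-4'',2''',4'''-trihydroxychalcone":
        ((-160.105, -41.801, 164.633, -18.440), -55.805),
}


def summed_binding_energy(terms):
    """Sum of the four MMPBSA energy contributions."""
    return sum(terms)


for name, (terms, reported) in table5.items():
    total = summed_binding_energy(terms)
    print(f"{name}: summed {total:.3f}, reported {reported:.3f}, "
          f"difference {total - reported:+.3f} kJ/mol")
```

For all five complexes the summed terms agree with the reported binding energy to within about 0.4 $kJ/mol$, well inside the quoted standard deviations.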

Per-residue Energy Decomposition

Per-residue energy decomposition was performed for each complex using the MMPBSA method. Residues contributing a binding free energy of at least ± 5 $kJ/mol$ were considered critical for ligand binding. In the NS2B/NS3-ZINC14441502 complex, only Tyr161 met this criterion, contributing -6.4629 $kJ/mol$ (Figure 8). For the NS2B/NS3-ZINC38628344 complex, Tyr161 and Leu128 contributed -6.6957 and -3.4011 $kJ/mol$, respectively. Other key residues interacting with ZINC95485940, 2',4'-dihydroxychalcone, and Prednisolone contributed minor energy values.



Figure 8. MMPBSA plot illustrating binding free energy contributions for NS2B/NS3-ZINC14441502 complex.
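
The ± 5 $kJ/mol$ criterion can be applied as a filter on the absolute per-residue contribution. The sketch below uses only the three contributions quoted in the text above; `key_residues` is a hypothetical helper, not part of the MMPBSA tooling.

```python
def key_residues(contributions, cutoff=5.0):
    """Keep residues whose absolute energy contribution (kJ/mol) reaches the cutoff."""
    return {residue: energy
            for residue, energy in contributions.items()
            if abs(energy) >= cutoff}


# Per-residue contributions quoted in the text (kJ/mol).
reported = {
    "ZINC14441502": {"Tyr161": -6.4629},
    "ZINC38628344": {"Tyr161": -6.6957, "Leu128": -3.4011},
}

for ligand, contributions in reported.items():
    print(ligand, key_residues(contributions))
# ZINC14441502 {'Tyr161': -6.4629}
# ZINC38628344 {'Tyr161': -6.6957}
```

Under this cutoff, Tyr161 is the only residue classed as critical in both complexes; Leu128 contributes favorably to ZINC38628344 binding but falls below the 5 $kJ/mol$ threshold.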


Here is a summary of the data flow chart throughout this research:

graph TD;
    A[Bioactive Dataset - 343,305 compounds]--Data preprocessing (Compound standardization)-->B[21,250 Study data: 4470 actives + 16780 inactives];
    B[21,250 Study data: 4470 actives + 16780 inactives]--Data splitting (1:4)-->C[14,875 training data + 3,187 test + 3,188 externally held];
    C[14,875 training data + 3,187 test + 3,188 externally held]--Evaluation data-->D[3,188 externally held];
    D[3,188 externally held]--Model validation-->E{ML Model pool};
    E{ML Model pool}--Model selection-->F{QSAR models};
    G[PaDEL descriptors: 1,444]--Variance filter (Threshold = 0.1)-->H[Approved descriptors: 684];
    H[Approved descriptors: 684]--QSAR modeling-->F{QSAR models};
    I[18 Known inhibitors‡]--※Further model validation-->F{QSAR models};
    F{QSAR models}-->J{Logistic Regression};
    F{QSAR models}--※LR output-->K[11 Inhibitors marked **Active**‡];
    L[2683 New compounds: 812 ZINC & 1871 EANPDB]--LR model prediction-->J{Logistic Regression};
    J{Logistic Regression}--Yes-->M[933 active compounds];
    J{Logistic Regression}--No-->N[1750 inactive compounds];
    M[933 active compounds]--Compound selection based on 2FOM structure-->O[853 selected compounds];
    O[853 selected compounds]--NS2B_NS3 Molecular Docking (Affinity ≤ -8.0 kcal)-->P[59 ligands + 2 Known inhibitors];
    P[59 ligands + 2 Known inhibitors]--Binding affinities postdocking-->Q[39 Top docked hits];
    Q[39 Top docked hits]---->R{ADMET Screening};
    R{ADMET Screening}--Veber's rules & Lipinski's RO5-->S[20 Top non-violating hits];
    S[20 Top non-violating hits]-->T[Top Protein-Ligand complexes];
    T[Top Protein-Ligand complexes]-->U{Molecular Dynamics Simulations};
    T[Top Protein-Ligand complexes]-->V{MMPBSA Computations};
    U{Molecular Dynamics Simulations}--RMSD-->W[2',4'-dihydroxychalcone/ZINC14441502/ZINC95485940 > ZINC38628344];
    U{Molecular Dynamics Simulations}--Rg-->X[ZINC38628344/ZINC14441502/ZINC95485940 > 2',4'-dihydroxychalcone];
    U{Molecular Dynamics Simulations}--RMSF-->Y[ZINC38628344 \ ZINC95485940 \ 2',4'-dihydroxychalcone \ ZINC14441502];
    V{MMPBSA Computations}--Contributing energy Terms-->Z[2',4'-dihydroxychalcone > ZINC38628344 > ZINC14441502 > ZINC95485940];
    V{MMPBSA Computations}--Per-residue Decomposition-->AA[ZINC38628344 > ZINC14441502 > ZINC95485940 > 2',4'-dihydroxychalcone];

How to use

The documentation and videos give a general overview of how the pipeline was built and can be utilized to identify novel Dengue Virus inhibitors.

Tutorial

The study pipeline tutorial describes how the models were constructed, selected, validated, and implemented. It also shows how the various scripts were written and run.

Molecular docking and dynamics simulations are also briefly described, and some of the scripts used to run these analyses are identified and demonstrated.

Tutorial 2

Construction of a possible PyPI installation package for novel antiviral therapeutics prediction and discovery.

Data Availability

The data utilized for the project can be found here.

Reproducibility Prerequisites

Note

The code and scripts were run on Python 3.8, Anaconda3 2024.06.1, and Jupyter Notebook version 7.

R 4.3.0 was used for some of the data visualization to plot graphs from MMPBSA computations.

Credits

The Team members include:

  1. George Hanson
  2. Joseph Adams
  3. Daveson Innocento Brank Kepgang
  4. Andy Asante
  5. Emmanuel Israel Nsedu
  6. Hem Bondarwad
  7. Maureen Kisaakye
  8. Lewis Tem Bueh
  9. Luke S. Zondagh
  10. Soham Amod Shirolkar
  11. Olaitan I. Awe
