Data analysis and prediction of small-molecule accumulation in Gram-negative bacteria

matvey83/GramNegAccum

 
 

GramNegAccum

Data analysis and prediction of small-molecule accumulation in Gram-negative bacteria, published as:

Predictive Compound Accumulation Rules Yield a Broad-Spectrum Antibiotic. Richter, M. F.; Drown, B. S.; Riley, A. P.; Garcia, A.; Shirai, T.; Svec, R. L.; Hergenrother, P. J. Nature 2017, published on web May 10, 2017.

Analysis of accumulation data

Description of assay

The accumulation of a diverse library of small molecules was measured in an LC-MS assay. E. coli cells were incubated with compounds for 10 min before being washed and lysed. Clarified lysates were analyzed by LC-MS/MS.

Datasets

Several collections of compounds are included in accum/data. These correspond to the published Supplementary Tables.

| Name   | Compounds | Description                                                   |
|--------|-----------|---------------------------------------------------------------|
| table1 | 12        | Controls for accumulation analysis                            |
| table2 | 100       | Initial dataset for accumulation with diversity of functionality |
| table3 | 54        | SAR analysis that examines specific descriptors               |
| table4 | 68        | Primary amines                                                |
| table5 | 79        | Common antibiotics excluding beta-lactams                     |
| table6 | 49        | Common beta-lactams                                           |

Generation of physicochemical descriptors

Initial 3D coordinates and protonation states for molecules were determined using Schrödinger's LigPrep. For mixtures of epimers, the most stable diastereomer was used. Ensembles of conformers were generated using the MOE LowModeMD conformer search (see accum/scripts/conf_search.zsh). Molecular descriptors were then calculated for each conformer and averaged over each ensemble (see accum/scripts/ensemble_average.py). Output data are located at accum/data/table4.csv.
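The averaging step can be illustrated with a minimal sketch. It is not the contents of accum/scripts/ensemble_average.py. It assumes an unweighted mean over conformers (the repository's script may weight conformers differently), and the function name, the `compound` key, and the descriptor names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def ensemble_average(rows, id_key="compound"):
    """Average each numeric descriptor over the conformers of each compound.

    rows: iterable of dicts, one per conformer, holding a compound
    identifier under id_key plus numeric descriptor values.
    Assumption: a plain (unweighted) mean across conformers.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for name, value in row.items():
            if name != id_key:
                grouped[row[id_key]][name].append(float(value))
    return {
        compound: {name: fmean(values) for name, values in descriptors.items()}
        for compound, descriptors in grouped.items()
    }


# Hypothetical conformer-level descriptor values for two compounds.
conformers = [
    {"compound": "cmpd_A", "descriptor_a": 0.10, "descriptor_b": 4},
    {"compound": "cmpd_A", "descriptor_a": 0.20, "descriptor_b": 4},
    {"compound": "cmpd_B", "descriptor_a": 0.50, "descriptor_b": 1},
]
averaged = ensemble_average(conformers)
```

Each compound ends up with one row of descriptor values regardless of how many conformers its ensemble contained.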

Data preprocessing

All data analysis, model training, and figure generation were performed using R. The distributions and co-correlations of descriptors were examined in accum/analysis/feature_select.R. Descriptors with near-zero variance or high co-correlation were removed to improve model stability.
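The filtering idea can be sketched in a few lines of Python. This is a simplified illustration, not a port of accum/analysis/feature_select.R. The two cutoffs, the greedy keep-first rule for correlated pairs, and all names are assumptions made for the example.

```python
from math import sqrt
from statistics import fmean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(columns, var_cutoff=1e-8, corr_cutoff=0.9):
    """Return descriptor names that survive both filters.

    columns: dict mapping descriptor name -> list of values.
    Both cutoffs are illustrative assumptions.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= var_cutoff:
            continue  # near-constant descriptor carries no signal
        if any(abs(pearson(values, columns[k])) > corr_cutoff for k in kept):
            continue  # redundant with a descriptor that was already kept
        kept.append(name)
    return kept


# Hypothetical descriptor matrix: d2 tracks d1 closely, d3 is constant.
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.1, 5.9, 8.0],
    "d3": [5.0, 5.0, 5.0, 5.0],
    "d4": [1.0, -1.0, 1.0, -1.0],
}
selected = select_features(columns)
```

In this toy input only `d1` and `d4` remain: `d3` is dropped for zero variance and `d2` for its correlation with `d1`.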

Random forest classification model

A random forest model was trained using the R package caret. Several cross-validation methods were compared (see accum/analysis/compareCV.R), which led to the selection of repeated 10-fold CV (n=20) as the final method. Variable importance was measured to identify molecular features that may contribute to small-molecule accumulation (see accum/analysis/rand_forest.R).
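To make the resampling scheme concrete, here is a minimal sketch of how repeated k-fold cross-validation partitions a dataset. It covers only the splitting, not the random forest itself, and it is not what caret does internally (caret may, for example, stratify folds by outcome). Reading "n=20" as 20 repeats is an assumption, as are the function name, the seed, and the sample count of 100.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Every repeat reshuffles the sample indices and cuts them into k folds;
    each fold serves once as the held-out set within its repeat.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, sorted(train_idx), sorted(test_idx)


# Hypothetical dataset size; 10 folds x 20 repeats gives 200 resamples.
splits = list(repeated_kfold(n_samples=100))
```

A model would be fit on `train_idx` and scored on `test_idx` for each resample, with the scores averaged across all resamples. Repeating the shuffle reduces the dependence of the estimate on any single partition.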
