Graph neural network with attention mechanisms that predicts a scoring function (interface RMSD, i-RMSD) for protein complexes and decoys.
Make sure to create a dedicated environment as follows:
conda env create --name <YOUR_ENV_NAME> --file=environment_graph_predictions.yml
Tutorials for data preparation, gridsearch training, testing, and inference are available in this repository.
1. Fully automated data preparation pipeline that creates balanced graph datasets from PDB files of protein complexes and decoys.
2. Automated gridsearch for graph neural network architecture selection (Convolution, Node Attention, Edge Attention, Node+Edge Attention, customizable); optimizer selection; training from scratch, resuming training, or transfer learning; feature selection.
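To illustrate what a gridsearch over architectures and optimizers involves, here is a minimal Python sketch. It is not the repository's implementation: the search-space keys, the option names, and the `train_and_score` callable are all hypothetical placeholders chosen for the example.

```python
from itertools import product

# Hypothetical search space; these keys and values are illustrative only
# and are not the repository's actual configuration names.
SEARCH_SPACE = {
    "architecture": [
        "convolution",
        "node_attention",
        "edge_attention",
        "node_edge_attention",
    ],
    "optimizer": ["adam", "sgd"],
    "learning_rate": [1e-3, 1e-4],
}


def expand_grid(space):
    """Return one dict per combination of the values in `space`."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in product(*(space[key] for key in keys))
    ]


def run_gridsearch(space, train_and_score):
    """Evaluate every configuration and return (best_config, best_score).

    `train_and_score` is a user-supplied callable (an assumption of this
    sketch) that trains a model for one configuration and returns a
    validation error, where lower is better.
    """
    results = [(train_and_score(config), config) for config in expand_grid(space)]
    best_score, best_config = min(results, key=lambda item: item[0])
    return best_config, best_score
```

With the space above, `expand_grid` yields 4 x 2 x 2 = 16 configurations. In practice `train_and_score` would wrap the actual training loop and return, for example, the validation error on i-RMSD.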
Conclusion: A precision within 2 Å is reached by using attention at both the node and the edge level, which leverages complex interaction patterns between the nodes. Further training and exploration of architectures and hyperparameters are required and might improve performance. Possible next steps include:
- augmented training dataset
- introduction of more features (PSSM, depth, HSE)
- hyperparameter gridsearch exploration
- use pretrained model as a feature embedding
- deeper version of Edge + Node attention network
- energy scoring functions
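
For readers unfamiliar with attention at both the node and the edge level, the following pure-Python sketch shows the general idea: each neighbour's contribution is weighted by a score that combines node features and edge features. This is a simplified illustration under stated assumptions, not the network used in this repository. The dot-product scoring rule, the `edge_weight` vector (a stand-in for learned parameters), and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(node_feats, edges, edge_feats, edge_weight):
    """Aggregate neighbour features with node + edge attention.

    node_feats:  {node_id: [float, ...]}
    edges:       {node_id: [neighbour_id, ...]}
    edge_feats:  {(node_id, neighbour_id): [float, ...]}
    edge_weight: vector that scores edge features (assumed, not learned here)
    """
    updated = {}
    for i, neighbours in edges.items():
        if not neighbours:
            # Isolated node: keep its own features unchanged.
            updated[i] = list(node_feats[i])
            continue
        # Node term (feature similarity) plus edge term (scored edge features).
        scores = [
            dot(node_feats[i], node_feats[j]) + dot(edge_weight, edge_feats[(i, j)])
            for j in neighbours
        ]
        alphas = softmax(scores)
        dim = len(node_feats[i])
        # Attention-weighted sum of the neighbours' feature vectors.
        updated[i] = [
            sum(alpha * node_feats[j][d] for alpha, j in zip(alphas, neighbours))
            for d in range(dim)
        ]
    return updated
```

A real graph attention layer would add learned projection matrices, nonlinearities, and multiple heads; the sketch only shows how node and edge information can enter the same attention score.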