Introduction

Source code for Seq2Symm: Rapid and accurate prediction of protein homo-oligomer symmetry

Rapid prediction of homo-oligomer symmetries using a single sequence as input

Getting Started

Dependencies are in the yaml file esm2_finetune.yaml

conda env create --name esm2 --file=esm2_finetune.yaml

Downloads

the predictions from the model on various datasets, predictions on proteomes http://files.ipd.uw.edu/pub/seq2symm/predictions.zip
the trained model http://files.ipd.uw.edu/pub/seq2symm/ESM2_model.ckpt

Training the model

python src/finetune.py --meta_data_file ../datasets/homomer_pdbids_hash_clusterid_labels_sampled.csv --data_dir ../datasets/ --model_dir models/ --output_model seq2symm --output_dir outputs/seq2symm --bs 16 --data_splits_file ../datasets/train_val_test_splits.pkl --limit 65536 --granularity 3 --n_classes 20 --n_epoch 100 --weighted_sampler 1

Predicting using the model

A jupyter notebook is available at src/load_chkpt_and_predict.ipynb that shows examples of how this is done for two different file formats

Predicting via command line

The src/predict_oligmerization.py script predicts the oligomerization states of protein sequences provided in a FASTA file, outputting the probabilities for each symmetry state with a probability >= 1%.

python predict_oligomerization.py -input_file <input_fasta> -chkpt_file <model_checkpoint> [-output_file <results.csv>] [-batch_size <n>]

-input_file: Path to the input FASTA file containing the sequences for prediction (required).
-chkpt_file: Path to the model checkpoint file (required).
-output_file: Path to save the prediction results as a CSV file (default: ./results.csv).
-batch_size: Batch size for processing sequences (default: 1).

Example output:

fasta_id, predicted_oligomerization_state
ID1, "{'C1': 0.9, 'C2': 0.1, 'Other': 0.001}"
ID2, "{'C4': 0.85, 'C6': 0.15, 'D3': 0.01}"

Which you can load with pandas like this:

import pandas as pd
import ast
df = pd.read_csv("results.csv")
df['predicted_oligomerization_state'] = df['predicted_oligomerization_state'].apply(ast.literal_eval)

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
datasets		datasets
src		src
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
SUPPORT.md		SUPPORT.md
esm2_finetune.yaml		esm2_finetune.yaml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Getting Started

Downloads

Training the model

Predicting using the model

Predicting via command line

About

Releases

Packages

Contributors 3

Languages

License

microsoft/seq2symm

Folders and files

Latest commit

History

Repository files navigation

Introduction

Getting Started

Downloads

Training the model

Predicting using the model

Predicting via command line

About

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages