Improving Protein Succinylation Sites Prediction Using Features Extracted from Protein Language Model

LMSuccSite

Web Server

http://kcdukkalab.org/LMSuccSite/

Authors

Suresh Pokharel (1), Pawel Pratyush (1), Michael Heinzinger (2), Robert H. Newman (3), Dukka B KC (1)*
(1) Department of Computer Science, Michigan Technological University, Houghton, MI, USA
(2) Department of Informatics, Bioinformatics and Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
(3) Department of Biology, College of Science and Technology, North Carolina A&T State University, Greensboro, NC, USA

* Corresponding Author: [email protected]

Installation

If git is installed on your machine, clone the repository by entering this command into the terminal:

git clone git@github.com:KCLabMTU/LMSuccSite.git

Alternatively, download the repository as a zip file from its GitHub page.

Install Libraries

Python version: 3.9.7

To install the required libraries, run the following command:

pip install -r requirements.txt

Required libraries and versions:
Bio==1.5.2
keras==2.9.0
matplotlib==3.5.1
numpy==1.23.5
pandas==1.5.0
protobuf==3.20.*
requests==2.27.1
scikit_learn==1.2.0
seaborn==0.11.2
tensorflow==2.9.1
torch==1.11.0
tqdm==4.63.0
transformers==4.18.0
xgboost==1.5.0
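
After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the `check_versions` helper and the `PINNED` subset are illustrative names, and only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version

# A subset of the pins listed above; extend it as needed.
PINNED = {
    "keras": "2.9.0",
    "numpy": "1.23.5",
    "tensorflow": "2.9.1",
    "torch": "1.11.0",
    "transformers": "4.18.0",
    "xgboost": "1.5.0",
}


def check_versions(pinned, get_version=version):
    """Return {package: (expected, found)} for packages that are missing or differ."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every checked package matches its pin.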

Install Transformers

pip install -q SentencePiece transformers
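
SentencePiece is required by the T5 tokenizer in Transformers, which ProtT5 relies on. ProtT5 tokenizers conventionally expect residues separated by spaces, with the rare residues U, Z, O and B mapped to X. The sketch below shows only that preprocessing step; the helper name `prepare_for_prott5` is hypothetical, and whether the scripts in this repository prepare sequences in exactly this way is an assumption.

```python
import re


def prepare_for_prott5(sequence):
    """Format a protein sequence the way ProtT5 tokenizers conventionally expect.

    Assumption: rare residues (U, Z, O, B) are replaced by X and residues
    are separated by single spaces.
    """
    cleaned = re.sub(r"[UZOB]", "X", sequence.upper())
    return " ".join(cleaned)


print(prepare_for_prott5("MKUB"))
```

The resulting string is what would typically be passed to the tokenizer before computing embeddings.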

Model evaluation using the existing benchmark independent test set

To evaluate our model on the independent test set, use the evaluate_model.py script. The test sequences and their corresponding ProtT5 features are already provided in the data/test/ folder. After installing the requirements, run the following command:

python evaluate_model.py
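
This README does not list which metrics evaluate_model.py reports. As an illustration only, the sketch below computes metrics commonly used for binary site prediction (accuracy, sensitivity, specificity and the Matthews correlation coefficient) from 0/1 labels. The function name `binary_metrics` and the choice of metrics are assumptions, not the script's actual output.

```python
import math


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels (1 = succinylated site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```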

Running the LMSuccSite model on your own sequences

To predict succinylation sites in your own sequences, follow these steps:

  1. Copy the sequences you want to analyze into input/sequence.fasta
  2. Run python predict.py
  3. Find the results in the output folder
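
Before running the prediction, it can be useful to check that the FASTA input parses cleanly and to see which residues are candidate sites. Succinylation occurs on lysine (K), so the sketch below lists the 1-based lysine positions for each sequence. The helpers `read_fasta` and `lysine_sites` are hypothetical and are not part of predict.py; how predict.py itself parses the input is not described here.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[header] += line.upper()
    return records


def lysine_sites(sequence):
    """Return 1-based positions of lysine (K) residues."""
    return [i + 1 for i, residue in enumerate(sequence) if residue == "K"]


example = ">demo_protein\nMKT\nAKK\n"
for name, seq in read_fasta(example).items():
    print(name, seq, lysine_sites(seq))
```

To check a real input, pass the contents of input/sequence.fasta to `read_fasta` in place of the `example` string.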

Training and other experiments

  1. The training data is in the data/train/ folder.
  2. All code and models related to training are in the training codes folder.

Citation

Pokharel, S., Pratyush, P., Heinzinger, M. et al. Improving protein succinylation sites prediction using embeddings from protein language model. Sci Rep 12, 16933 (2022). https://doi.org/10.1038/s41598-022-21366-2

Link: https://rdcu.be/cXFfM

Contact

For questions or discussion, please send an email to [email protected] (CC: [email protected], [email protected]).

Additional files can be found at https://drive.google.com/drive/folders/1gzRzxoNI3LTWuU24qiBB-vu1t6-AsGW4?usp=drive_link