Improving Protein Succinylation Sites Prediction Using Features Extracted from Protein Language Model
http://kcdukkalab.org/LMSuccSite/
Suresh Pokharel¹, Pawel Pratyush¹, Michael Heinzinger², Robert H. Newman³, Dukka B KC¹*
¹Department of Computer Science, Michigan Technological University, Houghton, MI, USA
²Department of Informatics, Bioinformatics and Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
³Department of Biology, College of Science and Technology, North Carolina A&T State University, Greensboro, NC, USA
* Corresponding Author: [email protected]
If git is installed on your machine, clone the repository by entering this command into the terminal:
git clone git@github.com:KCLabMTU/LMSuccSite.git
Alternatively, download the repository as a ZIP file.
Python version: 3.9.7
To install the required libraries, run the following command:
pip install -r requirements.txt
Required libraries and versions:
Bio==1.5.2
keras==2.9.0
matplotlib==3.5.1
numpy==1.23.5
pandas==1.5.0
protobuf==3.20.*
requests==2.27.1
scikit_learn==1.2.0
seaborn==0.11.2
tensorflow==2.9.1
torch==1.11.0
tqdm==4.63.0
transformers==4.18.0
xgboost==1.5.0
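To confirm that an existing environment matches the pinned versions listed above, a small check can compare them with what is actually installed. This is a minimal sketch and not part of the LMSuccSite repository: it uses only the standard library (importlib.metadata), treats the ".*" suffix as a simple shell-style wildcard rather than full PEP 440 matching, and looks up each distribution by the name exactly as written, so a package installed under a differently spelled name may be reported as missing. The function names are hypothetical.

```python
import fnmatch
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above.
PINS = """\
Bio==1.5.2
keras==2.9.0
matplotlib==3.5.1
numpy==1.23.5
pandas==1.5.0
protobuf==3.20.*
requests==2.27.1
scikit_learn==1.2.0
seaborn==0.11.2
tensorflow==2.9.1
torch==1.11.0
tqdm==4.63.0
transformers==4.18.0
xgboost==1.5.0
"""


def parse_pins(text):
    """Return {package: version_pattern} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, spec = line.partition("==")
        pins[name.strip()] = spec.strip()
    return pins


def matches(installed, pattern):
    """Simplified check: exact match, or a shell-style wildcard such as '3.20.*'."""
    return fnmatch.fnmatchcase(installed, pattern)


def check_environment(pins):
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, pattern in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, matches(installed, pattern))
    return report


if __name__ == "__main__":
    pins = parse_pins(PINS)
    for name, (installed, ok) in check_environment(pins).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:15} pinned {pins[name]:10} installed {installed} [{status}]")
```

Running the file prints one line per pinned package; any MISMATCH line points to a package that pip install -r requirements.txt should bring back in line.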
The ProtT5 tokenizer relies on SentencePiece, so also install it together with transformers:
pip install -q SentencePiece transformers
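SentencePiece is needed because the ProtT5 tokenizer is SentencePiece-based, and the data/test/ folder ships with precomputed ProtT5 features. For readers who want to see how per-residue ProtT5 embeddings are commonly extracted with Hugging Face transformers, the sketch below follows the preprocessing used in the public ProtTrans examples (rare residues U, Z, O and B mapped to X, residues separated by spaces). The checkpoint name Rostlab/prot_t5_xl_uniref50 and the helper names are assumptions; this is not necessarily how the LMSuccSite features were generated, so check predict.py in the repository for the actual pipeline.

```python
import re

# Assumption: the public ProtT5-XL-UniRef50 checkpoint on the Hugging Face Hub.
MODEL_NAME = "Rostlab/prot_t5_xl_uniref50"


def prepare_sequence(seq):
    """Map rare amino acids (U, Z, O, B) to X and space-separate the residues,
    which is the input format the ProtT5 tokenizer expects."""
    return " ".join(re.sub(r"[UZOB]", "X", seq.upper()))


def embed_per_residue(sequences, model_name=MODEL_NAME):
    """Return one (len(seq), 1024) tensor of per-residue embeddings per sequence."""
    # Heavy imports are kept local so prepare_sequence works without torch/transformers.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).to(device).eval()

    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        add_special_tokens=True,
        padding="longest",
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
        )
    # Keep exactly one vector per residue: drop the trailing </s> token and padding.
    return [out.last_hidden_state[i, : len(s)].cpu() for i, s in enumerate(sequences)]
```

The full ProtT5-XL model is large, so the first call downloads several gigabytes of weights, and a GPU is strongly recommended for anything beyond a few short sequences.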
To evaluate our model on the independent test set, run the evaluate_model.py script. The test sequences and their corresponding ProtT5 features are already placed in the data/test/ folder. Once you have installed the requirements, run the following command:
python evaluate_model.py
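The metrics printed by evaluate_model.py are not listed here. As a hedged illustration, the sketch below computes four measures commonly reported for succinylation-site predictors (accuracy, sensitivity, specificity and the Matthews correlation coefficient) from 0/1 labels, assuming 1 marks a succinylated lysine and 0 a non-succinylated one. The function name binary_metrics is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and MCC for binary labels (1 = positive site)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is conventionally taken as 0 when any confusion-matrix margin is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

Since scikit-learn is already a requirement, sklearn.metrics.matthews_corrcoef can be used to cross-check the MCC value.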
To predict succinylation sites in your own sequences, follow these steps:
- Copy the sequences you want to predict into the input/sequence.fasta file.
- Run python predict.py
- Find the results inside the output folder.
- Find the training data in the data/train/ folder.
- Find all the code and models related to training in the training codes folder.
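Succinylation occurs on lysine (K) residues, so each K in input/sequence.fasta is a candidate site. The sketch below shows one way to parse a FASTA file and extract a fixed-length window centred on every lysine, padding the sequence ends with "-". It is not taken from predict.py: the window half-width of 16 (a 33-residue window), the padding character and the function names are hypothetical choices for illustration.

```python
from pathlib import Path


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def lysine_windows(seq, half_window=16, pad="-"):
    """Yield (1-based position, window) for every K, with the K at the window centre."""
    padded = pad * half_window + seq + pad * half_window
    for i, residue in enumerate(seq):
        if residue == "K":
            # seq[i] sits at padded[i + half_window], the middle of this slice.
            yield i + 1, padded[i : i + 2 * half_window + 1]


def candidate_sites(path="input/sequence.fasta", half_window=16):
    """List (header, position, window) for every lysine in a FASTA file."""
    sites = []
    for header, seq in parse_fasta(Path(path).read_text()):
        for pos, window in lysine_windows(seq, half_window):
            sites.append((header, pos, window))
    return sites
```

Calling candidate_sites() returns one (header, position, window) tuple per lysine, with positions numbered from 1 as is usual for residue numbering.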
Pokharel, S., Pratyush, P., Heinzinger, M. et al. Improving protein succinylation sites prediction using embeddings from protein language model. Sci Rep 12, 16933 (2022). https://doi.org/10.1038/s41598-022-21366-2
Link: https://rdcu.be/cXFfM
Please send an email to [email protected] (CC: [email protected], [email protected]) with any questions or to start a discussion.
Additional files can be found at https://drive.google.com/drive/folders/1gzRzxoNI3LTWuU24qiBB-vu1t6-AsGW4?usp=drive_link