davidwongmedinfo/CTTCR

The CATCR Model

The CATCR model comprises two components: CATCR-D and CATCR-G. CATCR-D is a discriminative model that predicts epitope-CDR3-beta pairs, while CATCR-G is a generative model designed to generate CDR3-beta sequences that bind to a given epitope.

Environment Requirements:

  • Python 3.9.18
  • PyTorch 2.1.0
  • CUDA 12.2
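
A quick way to compare an installed environment against the versions listed above is sketched below. It only reports what it finds. The function name `report_environment` is an illustrative choice and not part of the repository, and `torch.version.cuda` is the CUDA version PyTorch was built against, which may differ from the system toolkit.

```python
import importlib.util
import sys

# Versions listed in this README; used for comparison only.
EXPECTED = {"python": "3.9.18", "torch": "2.1.0", "cuda": "12.2"}


def report_environment():
    """Return detected Python, PyTorch, and CUDA versions (None if unavailable)."""
    found = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "torch": None,
        "cuda": None,
    }
    # Import torch only if it is installed, so the check does not crash without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        found["torch"] = torch.__version__.split("+")[0]  # drop build tag such as "+cu121"
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


if __name__ == "__main__":
    found = report_environment()
    for name, expected in EXPECTED.items():
        status = "ok" if found[name] == expected else "differs"
        print(f"{name}: expected {expected}, found {found[name]} ({status})")
```

A mismatch is not necessarily a problem; the listed versions are simply the ones stated in this README.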

Training and Testing Data

The training and test data used for this model can be downloaded from [https://1drv.ms/u/s!ArPas8clhc3Fg6Qgfy9Zs1x4NADinw?e=48DLxi]. The pre-trained model is available at [https://1drv.ms/u/s!ArPas8clhc3Fg6EXGzXiMpYUxh8JZg?e=CjeSeB].

The internal test data contains "seen" epitopes paired with "unseen" CDR3 sequences. The external test data contains both "unseen" epitopes and "unseen" CDR3 sequences. The PDB data were generated with OpenFold (a PyTorch implementation of AlphaFold2).
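
The distinction between the internal and external test sets can be made concrete with a small sketch. It assumes that "seen" means "appears somewhere in the training pairs"; the function name `classify_test_pair` and the sequences are illustrative only and are not taken from the repository or its data.

```python
def classify_test_pair(epitope, cdr3, train_pairs):
    """Label a test pair by whether its epitope and CDR3 occur in the training pairs."""
    train_epitopes = {e for e, _ in train_pairs}
    train_cdr3s = {c for _, c in train_pairs}
    seen_epitope = epitope in train_epitopes
    seen_cdr3 = cdr3 in train_cdr3s
    if seen_epitope and not seen_cdr3:
        return "internal"  # seen epitope, unseen CDR3
    if not seen_epitope and not seen_cdr3:
        return "external"  # unseen epitope, unseen CDR3
    return "overlaps-training"  # the CDR3 already occurs in training


# Illustrative sequences only.
train_pairs = [
    ("GILGFVFTL", "CASSIRSSYEQYF"),
    ("NLVPMVATV", "CASSLAPGATNEKLFF"),
]
print(classify_test_pair("GILGFVFTL", "CASSFRSAYEQYF", train_pairs))
print(classify_test_pair("KLGGALQAK", "CASSQDRGTEAFF", train_pairs))
```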

Datasets

  • Use DataLoader in Dataprocess.py to load one batch of training or test data.
  • b128_SupTrainBatch: Training set for CATCR-D.
  • b128_SupInterTestBatch: Internal test set for CATCR-D.
  • b128_SupTestBatch: External test set for CATCR-D.
  • b128_UnsupTrainBatch: Training set for CATCR-G.
  • b128_UnsupInterTestBatch: Internal test set for CATCR-G.
  • b128_UnsupTestBatch: External test set for CATCR-G.
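
The dataset names above follow a regular pattern, which the sketch below spells out. It assumes that `b128` denotes a batch size of 128 and that `Sup`/`Unsup` map to CATCR-D/CATCR-G as in the list; the helper `dataset_name` is hypothetical and does not exist in Dataprocess.py.

```python
# Mapping inferred from the dataset list; not part of the repository code.
MODEL_PREFIX = {"CATCR-D": "Sup", "CATCR-G": "Unsup"}
SPLIT_SUFFIX = {
    "train": "TrainBatch",
    "internal_test": "InterTestBatch",
    "external_test": "TestBatch",
}


def dataset_name(model, split, batch_size=128):
    """Build a dataset name such as 'b128_SupTrainBatch' from model and split."""
    return f"b{batch_size}_{MODEL_PREFIX[model]}{SPLIT_SUFFIX[split]}"


for model in MODEL_PREFIX:
    for split in SPLIT_SUFFIX:
        print(f"{model:8s} {split:14s} {dataset_name(model, split)}")
```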

Data Preprocessing Tools

DataTransferTools.py provides tools for preprocessing the sequence and PDB data.

  • The PDB2PositionFrame module converts .pdb files into Pandas DataFrames.
  • The RepDistanceAndSequenceMatrix module generates training data for a given sequence.
  • Note: A segment-based coding method is recommended. The correspondence between peptide sequences and their codes is listed in node_index.csv.
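
To illustrate the kind of transformation these tools perform, here is a minimal, standard-library sketch that reads ATOM records from PDB text into a table of positions and derives a C-alpha distance matrix. It is not the implementation in DataTransferTools.py: the functions `parse_pdb_atoms`, `ca_distance_matrix`, and `atom_line` are hypothetical, and the choice of C-alpha distances is an assumption made for illustration.

```python
import math


def parse_pdb_atoms(pdb_text):
    """Return one dict per ATOM record, using the fixed PDB column layout."""
    rows = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        rows.append({
            "atom": line[12:16].strip(),     # columns 13-16: atom name
            "residue": line[17:20].strip(),  # columns 18-20: residue name
            "chain": line[21],               # column 22: chain identifier
            "res_seq": int(line[22:26]),     # columns 23-26: residue number
            "x": float(line[30:38]),         # columns 31-38
            "y": float(line[38:46]),         # columns 39-46
            "z": float(line[46:54]),         # columns 47-54
        })
    return rows


def ca_distance_matrix(rows):
    """Pairwise Euclidean distances between C-alpha atoms, in file order."""
    coords = [(r["x"], r["y"], r["z"]) for r in rows if r["atom"] == "CA"]
    return [[math.dist(a, b) for b in coords] for a in coords]


def atom_line(serial, atom, res, chain, seq, x, y, z):
    """Demo-only helper: format one ATOM record in fixed columns."""
    name = f" {atom:<3s}"  # columns 13-16
    return (
        f"{'ATOM':<6s}{serial:5d} {name} {res:>3s} {chain}{seq:4d}"
        f"{'':4s}{x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Toy three-residue structure with made-up coordinates.
pdb_text = "\n".join([
    atom_line(1, "N", "GLY", "A", 1, -1.0, 0.5, 0.0),
    atom_line(2, "CA", "GLY", "A", 1, 0.0, 0.0, 0.0),
    atom_line(3, "CA", "ILE", "A", 2, 3.8, 0.0, 0.0),
    atom_line(4, "CA", "LEU", "A", 3, 3.8, 3.8, 0.0),
])
rows = parse_pdb_atoms(pdb_text)
for row in ca_distance_matrix(rows):
    print(" ".join(f"{d:6.2f}" for d in row))
```

The list of dicts returned by `parse_pdb_atoms` can be passed directly to `pandas.DataFrame(rows)` if a DataFrame is preferred.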

Training and Testing Procedures

  • Training involves three modules: the discriminator (CATCR-D), the residue contact matrix transformer (RCMT), and the generative model (CATCR-G).
  • Train CATCR-D using SupTrain.py.
  • Train the RCMT using StructureTransformer.py.
  • Train CATCR-G using e2t_Generator.py.
  • Note: when training the RCMT and CATCR-G, a pretrained epitope encoder is recommended. Our pretrained models are available at [https://1drv.ms/u/s!ArPas8clhc3Fg6EXGzXiMpYUxh8JZg?e=CjeSeB], or you can train the encoder from scratch.
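
The three training scripts can be run one after another with a small driver such as the sketch below. It assumes the scripts are launched from the repository root and accept no command-line arguments, neither of which this README confirms; the order simply follows the list above. The names `TRAINING_STEPS`, `build_commands`, and `run_all` are illustrative.

```python
import subprocess
import sys

# (module, script) pairs in the order listed in this README.
TRAINING_STEPS = [
    ("CATCR-D", "SupTrain.py"),
    ("RCMT", "StructureTransformer.py"),
    ("CATCR-G", "e2t_Generator.py"),
]


def build_commands(python=sys.executable):
    """Return one command per training script (assumes no arguments are needed)."""
    return [[python, script] for _, script in TRAINING_STEPS]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for (module, _), cmd in zip(TRAINING_STEPS, build_commands()):
        print(f"[{module}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the sequence if a script exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Running with `dry_run=True` only prints the commands, which is a safe way to check paths before starting a long training job.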

Demo
