The CATCR model comprises two components: CATCR-D and CATCR-G. CATCR-D is a discriminative model that predicts epitope-CDR3-beta pairs, while CATCR-G is a generative model designed to generate CDR3-beta sequences that bind to a given epitope.
- Python 3.9.18
- PyTorch 2.1.0
- CUDA 12.2
The training and test data used for this model can be downloaded from [https://1drv.ms/u/s!ArPas8clhc3Fg6Qgfy9Zs1x4NADinw?e=48DLxi]. The pre-trained model is available at [https://1drv.ms/u/s!ArPas8clhc3Fg6EXGzXiMpYUxh8JZg?e=CjeSeB].
The internal test data contains "seen" epitopes paired with "unseen" CDR3 sequences. The external test data contains both "unseen" epitopes and "unseen" CDR3 sequences. The PDB data was generated by OpenFold (a PyTorch version of AlphaFold2).
- Use `DataLoader` in `Dataprocess.py` to load one batch of training or test data.
- b128_SupTrainBatch: Training set for CATCR-D
- b128_SupInterTestBatch: Internal test set for CATCR-D
- b128_SupTestBatch: External test set for CATCR-D
- b128_UnsupTrainBatch: Training set for CATCR-G
- b128_UnsupInterTestBatch: Internal test set for CATCR-G
- b128_UnsupTestBatch: External test set for CATCR-G
We provide a data processing tool to preprocess the sequence and PDB data in DataTransferTools.py.
- The PDB2PositionFrame module converts .pdb files into Pandas DataFrames.
- The RepDistanceAndSequenceMatrix module generates training data for a given sequence.
- Note: A segment-based coding method is recommended. The correspondence between peptide sequences and their codes is listed in node_index.csv.
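
To illustrate the kind of conversion described above, here is a minimal sketch that reads `ATOM` records from PDB-format text and computes pairwise C-alpha distances. It is not the code in DataTransferTools.py: the function names `parse_atom_records` and `ca_distance_matrix`, the chosen fields, and the toy coordinates are all assumptions made for illustration. It relies only on the fixed-column layout of the PDB format and the Python standard library.

```python
import math


def parse_atom_records(pdb_text):
    """Parse ATOM records from PDB-format text into a list of dicts.

    Hypothetical helper for illustration; fields follow the fixed-column
    layout of the PDB ATOM record.
    """
    records = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        records.append({
            "atom_name": line[12:16].strip(),  # columns 13-16
            "res_name": line[17:20].strip(),   # columns 18-20
            "chain_id": line[21],              # column 22
            "res_seq": int(line[22:26]),       # columns 23-26
            "x": float(line[30:38]),           # columns 31-38
            "y": float(line[38:46]),           # columns 39-46
            "z": float(line[46:54]),           # columns 47-54
        })
    return records


def ca_distance_matrix(records):
    """Return pairwise Euclidean distances between C-alpha atoms."""
    ca = [(r["x"], r["y"], r["z"]) for r in records if r["atom_name"] == "CA"]
    return [[math.dist(a, b) for b in ca] for a in ca]


# Toy input with made-up coordinates (not taken from the CATCR data).
EXAMPLE_PDB = "\n".join([
    "ATOM      1  CA  GLY A   1       0.000   0.000   0.000",
    "ATOM      2  CA  ALA A   2       3.000   4.000   0.000",
    "ATOM      3  N   SER A   3       1.000   1.000   1.000",
])

records = parse_atom_records(EXAMPLE_PDB)
distances = ca_distance_matrix(records)
print(distances)
```

If pandas is installed, the list of dicts can be passed directly to `pandas.DataFrame(records)` to obtain a tabular view similar in spirit to what PDB2PositionFrame is described as producing.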
- The CATCR model consists of three modules: the discriminator (CATCR-D), the residue contact matrix transformer (RCMT), and the generative model (CATCR-G).
- Train CATCR-D using SupTrain.py.
- Train the RCMT using StructureTransformer.py.
- Train CATCR-G using e2t_Generator.py.
- Note: when training the RCMT and CATCR-G, a pretrained epitope encoder is recommended. Our pretrained models are provided at [https://1drv.ms/u/s!ArPas8clhc3Fg6EXGzXiMpYUxh8JZg?e=CjeSeB], or you can train the encoder from scratch.
- We provide demos for CATCR-D and CATCR-G. The demo for CATCR-D is `D_test.py`, and the demo for CATCR-G is `BeamSearch.py`.
- To run the demos, download the demo data from [https://1drv.ms/u/s!ArPas8clhc3FhpQkWsWOPT9Jz9_k5A?e=EqFozq] and the pretrained model from [https://1drv.ms/u/s!ArPas8clhc3Fg6EXGzXiMpYUxh8JZg?e=CjeSeB], then put the two folders under CTTCR.
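
For readers unfamiliar with beam search, the decoding strategy the CATCR-G demo is named after, the following is a generic, minimal sketch. It does not reproduce BeamSearch.py: the `beam_search` function, its parameters, and the `toy_model` stand-in (a fixed distribution over three amino-acid letters) are hypothetical and only show how candidate sequences are expanded and pruned by cumulative log-probability.

```python
import math


def beam_search(next_log_probs, beam_width=3, max_len=5, end_token="<eos>"):
    """Generic beam search over a next-token log-probability function.

    next_log_probs(prefix) must return a dict mapping token -> log-probability.
    Returns (sequence, score) pairs sorted from best to worst.
    """
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Sequences that hit max_len without an end token are still returned.
    finished.extend(beams)
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_model(prefix):
    """Hypothetical stand-in for a trained generator (not CATCR-G)."""
    if len(prefix) >= 3:
        return {"<eos>": 0.0}  # force termination after three residues
    return {"C": math.log(0.5), "A": math.log(0.3), "S": math.log(0.2)}


results = beam_search(toy_model, beam_width=2, max_len=5)
for seq, score in results:
    print("".join(t for t in seq if t != "<eos>"), round(score, 3))
```

In a real setting, `next_log_probs` would wrap a forward pass of the generator conditioned on the epitope; a larger `beam_width` explores more candidate sequences at a higher computational cost.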