Enable 3D Ligand Files #34

Open
rjrich opened this issue Nov 20, 2024 · 7 comments

Comments

@rjrich

rjrich commented Nov 20, 2024

When I used Boltz-1 to predict the structure of a protein-ligand complex, the protein structure was fine, but the ligand pose was very distorted and far from the pose in a reference X-ray crystal structure. The problem seems to be with using SMILES strings as ligand files. Are there plans to support true 3D coordinate ligand files such as SDF or MOL2? I have tried various conversions to SMILES in an attempt to preserve 3D and stereochemical information, but none has worked satisfactorily. Thanks.

@jwohlwend
Owner

Would you mind sharing with me your input config so I can take a look? Also does your ligand happen to have a CCD code I can compare against?

@rjrich
Author

rjrich commented Nov 21, 2024 via email

@rjrich
Author

rjrich commented Nov 21, 2024 via email

@RuikangSun

RuikangSun commented Nov 25, 2024

> The problem seems to be with using SMILES strings as ligand files.

Sorry, but I'm not sure that 3D ligand files would give more meaningful results. One of the purposes of AlphaFold 3 and Boltz is to predict protein-ligand interactions, and a 3D ligand conformation generated in silico is not the same as the conformation in the protein binding pocket.
A SMILES string can carry stereochemical information (L-proline is C(O)(=O)[C@@H]1CCCN1, for example). I have also tried predicting protein-ligand interactions with SMILES as input. Although the binding pose was terribly wrong, the ligand geometry (bond lengths, bond angles, etc.) was reasonable.

Cheers!
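
As a rough illustration of the point about stereochemistry, the sketch below lists the bracket atoms in a SMILES string that carry an `@` or `@@` chirality tag, using only the Python standard library. The helper name `chiral_atoms` and the "flat" proline string are made up for this example. It only inspects the text: it can show that a string has lost its stereo tags, not that the tags present are chemically correct.

```python
import re

# Bracket atoms that contain a chirality tag, e.g. [C@H] or [C@@].
_CHIRAL_ATOM = re.compile(r"\[[^\]]*@[^\]]*\]")


def chiral_atoms(smiles: str) -> list:
    """Return the bracket atoms in `smiles` that carry an @ or @@ tag."""
    return _CHIRAL_ATOM.findall(smiles)


l_proline = "C(O)(=O)[C@@H]1CCCN1"   # string quoted in this comment
flat_proline = "C(O)(=O)C1CCCN1"     # same graph with the tag stripped (illustrative)

print(chiral_atoms(l_proline))     # ['[C@@H]']
print(chiral_atoms(flat_proline))  # []
```

If the second kind of result comes back for a chiral ligand, the input was probably a non-isomeric SMILES, and any stereochemistry in the prediction would not have come from the input string.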

@rjrich
Author

rjrich commented Nov 25, 2024

Perhaps at least part of the problem is getting the SMILES string correct, especially for those of us who are not fluent in SMILES. For example, the PubChem SMILES string for ethinylestradiol (EE2) is as follows:
C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=C3C=CC(=C4)O
If that string is opened in a 3D viewer, one can see that the structure of the 5-membered ring is incorrect.

Moreover, I found various incorrect versions of the SMILES string for this compound in other databases.
In addition, if the 3D SDF file is downloaded from PubChem and converted to SMILES using Open Babel, the resulting string is also incorrect.

However, the ChEMBL string appears to be correct:
[H][C@]12CC[C@@]3(C)[C@@]([H])(CC[C@@]3(O)C#C)[C@]1([H])CCc1cc(O)ccc21
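
A stray or missing parenthesis is easy to introduce when copying SMILES by hand, so a quick syntax check before submitting a job may help. The sketch below is a standard-library-only check, not a cheminformatics toolkit: the function name `smiles_syntax_problems` is hypothetical, and it looks only at parentheses, square brackets, and ring-closure pairing. It says nothing about valence, ring geometry, or whether the stereo tags are right.

```python
def smiles_syntax_problems(smiles: str) -> list:
    """Return a list of basic syntax problems; empty if none were found."""
    problems = []
    depth = 0            # current branch nesting level
    in_bracket = False   # inside [...], where digits are not ring closures
    open_rings = set()   # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            problems.append(f"unmatched ']' at position {i}")
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                depth = 0
        elif ch.isdigit() or ch == "%":
            if ch == "%":                 # two-digit label such as %12
                label = smiles[i + 1:i + 3]
                i += 2
            else:
                label = ch
            # A label opens a ring on first use and closes it on the next.
            open_rings.symmetric_difference_update({label})
        i += 1
    if in_bracket:
        problems.append("unclosed '['")
    if depth > 0:
        problems.append(f"{depth} unclosed '('")
    if open_rings:
        problems.append(f"unclosed ring labels: {sorted(open_rings)}")
    return problems


pubchem = "C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@]2(C#C)O)CCC4=C3C=CC(=C4)O"
chembl = "[H][C@]12CC[C@@]3(C)[C@@]([H])(CC[C@@]3(O)C#C)[C@]1([H])CCc1cc(O)ccc21"

print(smiles_syntax_problems(pubchem))        # []
print(smiles_syntax_problems(chembl))         # []
print(smiles_syntax_problems("(" + pubchem))  # reports one unclosed '('
```

Both strings pass this check, so a syntax error alone would not explain a difference between them; comparing them properly would need a toolkit that canonicalizes SMILES.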

When entering the SMILES strings in the YAML file for submitting a Boltz job, I enclosed them in single quotes ('....').
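
Quoting matters here because a YAML plain scalar cannot begin with `[`, which would start a flow sequence, and the ChEMBL string begins with `[H]`. The sketch below builds a minimal input file as text with the SMILES in single quotes. The key layout (`version`, `sequences`, `protein`, `ligand`, `smiles`) is my understanding of the Boltz YAML schema and should be checked against the project documentation; the protein sequence is a placeholder, not the receptor from this thread, and both function names are hypothetical.

```python
def yaml_single_quoted(value: str) -> str:
    """Wrap `value` as a YAML single-quoted scalar (a literal ' is doubled)."""
    return "'" + value.replace("'", "''") + "'"


def boltz_style_input(protein_seq: str, smiles: str) -> str:
    """Return YAML text with one protein chain and one SMILES ligand.

    The key names are assumed from the Boltz docs, not taken from this thread.
    """
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {protein_seq}",
        "  - ligand:",
        "      id: B",
        f"      smiles: {yaml_single_quoted(smiles)}",
    ]
    return "\n".join(lines) + "\n"


chembl_ee2 = "[H][C@]12CC[C@@]3(C)[C@@]([H])(CC[C@@]3(O)C#C)[C@]1([H])CCc1cc(O)ccc21"
placeholder_seq = "MKTAYIAKQR"  # hypothetical sequence, for illustration only

print(boltz_style_input(placeholder_seq, chembl_ee2))
```

Single quotes are a reasonable choice because SMILES can contain backslashes (for double-bond geometry), which YAML double-quoted scalars would treat as escape characters.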

Thus far, although Boltz is superb at predicting my protein receptor structures, the ligands are still distorted and do not agree with the crystal structures of the complexes. For example, the ethinyl group in EE2 was converted to an ethyl group, and the entire ligand was flipped 180 degrees around its long axis. In contrast, conventional docking with Vina reproduced the crystal structure almost exactly. Nevertheless, Boltz is a spectacular and most welcome achievement in rapid and accurate protein structure prediction starting with only an amino acid sequence file!

@rjrich
Author

rjrich commented Nov 26, 2024

> Would you mind sharing with me your input config so I can take a look? Also does your ligand happen to have a CCD code I can compare against?

Per your request, please find attached my input YAML file in a zip archive. It contains the CCD codes for the two ligands. The reference PDB ID for the complex is 4X1G.
4x1g_ee2_nonat_ccd.yaml.zip

@benf549

benf549 commented Nov 26, 2024

I've also wondered about this feature. 3D ligand inputs would additionally be useful for keeping the atom order constant; remapping between the atom order in one stage of a pipeline and another can be quite tedious...
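
To make the remapping concrete, here is a minimal sketch of the easy case, assuming both pipeline stages label the ligand atoms with the same unique names (as in a PDB or mmCIF ligand record). That assumption does not hold for SMILES input, where matching atoms would need a graph-based comparison in a cheminformatics toolkit. The function name and the toy coordinates are hypothetical.

```python
def reorder_to_reference(ref_names, names, coords):
    """Return `coords` permuted so that row i belongs to atom ref_names[i].

    Assumes atom names are unique and identical between the two stages.
    """
    if len(set(names)) != len(names):
        raise ValueError("atom names must be unique")
    if len(ref_names) != len(names) or set(ref_names) != set(names):
        raise ValueError("the two stages do not contain the same atom names")
    index_of = {name: i for i, name in enumerate(names)}
    return [coords[index_of[name]] for name in ref_names]


# Toy example: three atoms written in a different order by a later stage.
ref_names = ["C1", "C2", "O1"]
names = ["O1", "C1", "C2"]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

print(reorder_to_reference(ref_names, names, coords))
# [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
```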
