Prediction Failed with "Duplicate Entity error" #11

Open
LiorZ opened this issue Nov 18, 2024 · 3 comments

@LiorZ

LiorZ commented Nov 18, 2024

Stack:

    return self.predict_loop.run()
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 178, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/pytorch_lightning/loops/prediction_loop.py", line 124, in run
    self._predict_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/pytorch_lightning/loops/prediction_loop.py", line 266, in _predict_step
    call._call_callback_hooks(trainer, "on_predict_batch_end", predictions, *hook_kwargs.values())
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 218, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/pytorch_lightning/callbacks/prediction_writer.py", line 155, in on_predict_batch_end
    self.write_on_batch_end(trainer, pl_module, outputs, batch_indices, batch, batch_idx, dataloader_idx)
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/boltz/data/write/writer.py", line 143, in write_on_batch_end
    f.write(to_mmcif(new_structure))
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/boltz/data/write/mmcif.py", line 189, in to_mmcif
    dumper.write(fh, [system])
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/modelcif/dumper.py", line 870, in write
    return ihm.dumper.write(fh, systems, format, dumpers, variant)
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/ihm/dumper.py", line 3918, in write
    d.finalize(system)
  File "/home/lior/mambaforge/envs/boltz/lib/python3.10/site-packages/ihm/dumper.py", line 300, in finalize
    raise ValueError("Duplicate entity %s found" % entity)
ValueError: Duplicate entity <ihm.Entity(None)> found

Input FASTA:

>A|protein|./msa/seq.a3m
AKVAKINIAVAGTGYVGLSIAVLLAQHHQVTAVDIIQEKVDLINSKKSPIQDDYIEKYLAEKDLNLVATLDAEKAYKDAEIVVIAAPTNYDSAKNYFDTSHVEAVIQTVLSVNPQALMVIKSTIPVGFTQSMREKFGTENIIFSPEFLRESKALYDNLYPSRIIVSYESNSPQTVIEGAKLFAKLLQQGALKENVEVLHMGATEAEAVKLFANTYLALRVSYFNELDTYAELKGLDTESIIRGVGLDPRIGDHYNNPSFGYGGYCLPKDTKQLLANYNDIPQNMMTAIVESNRTRKDFIADQILKIAGYYDYSSHDQYSQLGEKEVIIGVYRLTMKSNSDNFRQSSIQGVMKRLKAKGAKVIIFEPTLENGSTFFGSKVINNLNQFKSKSHAIVANRYDSVLDDVLDKVYTRDIFRRD
>B|smiles
C1C=CN(C=C1C(=O)N)[C@H]2[C@@H]([C@@H]([C@H](O2)COP(=O)(O)OP(=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)N4C=NC5=C(N=CN=C54)N)O)O)O)O
>C|smiles
C1=CN(C(=O)NC1=O)[C@H]2[C@@H]([C@@H]([C@H](O2)COP(=O)(O)OP(=O)(O)O[C@@H]3[C@@H]([C@H]([C@@H]([C@H](O3)CO)O)O)O)O)O
@jwohlwend
Owner

I found the issue. All SMILES ligands were being given the LIG three-letter code in the mmCIF, which led to multiple entities with the same underlying code. The current solution is to create one entity for all ligands provided as SMILES strings, which I don't love but is a reasonable workaround for now.

I pushed a fix to the main branch and will make a new release on PyPI at some point today.

@LiorZ
Author

LiorZ commented Nov 19, 2024

Hi!
Thanks a lot for the fix and for this amazing package.
I also fixed this issue in my personal fork, and right when I was about to submit a pull request, I saw you had already fixed it.
It seems the issue stems from the ihm.ChemComp equality check, which performs this comparison to decide whether the entities are the same:

    # Equal if all identifiers are the same
    def __eq__(self, other):
        return ((self.code, self.code_canonical, self.id, self.type)
                == (other.code, other.code_canonical, other.id, other.type))
https://github.com/ihmwg/python-ihm/blob/afb05c04b23e40cd07c168cceeeac2456c611b36/ihm/__init__.py#L975C1-L979C1
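
To illustrate why this comparison treats two different ligands as the same component, here is a minimal sketch. `FakeChemComp` is a hypothetical stand-in, not the real ihm class: it copies only the tuple comparison quoted above, and the field values ("LIG", "X", "non-polymer") are assumptions chosen for illustration.

```python
class FakeChemComp:
    """Hypothetical stand-in that mirrors only the quoted __eq__ logic."""

    def __init__(self, id, code, code_canonical, type="non-polymer"):
        self.id = id
        self.code = code
        self.code_canonical = code_canonical
        self.type = type

    def _key(self):
        # Same four identifiers, in the same order, as the quoted comparison.
        return (self.code, self.code_canonical, self.id, self.type)

    def __eq__(self, other):
        return self._key() == other._key()

    def __hash__(self):
        # Keep hashing consistent with equality so sets and dicts behave.
        return hash(self._key())


# Two chemically different SMILES ligands that both received the same label.
# The identifier values are assumed for illustration.
ligand_b = FakeChemComp(id="LIG", code="LIG", code_canonical="X")
ligand_c = FakeChemComp(id="LIG", code="LIG", code_canonical="X")

same = ligand_b == ligand_c
print(same)
```

Because none of the four compared identifiers carries the SMILES string, the two objects compare equal even though the underlying molecules differ.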

Passing a unique value for the "code_canonical" argument to the constructor:
chem_comp = lambda x: ihm.NonPolymerChemComp(id=x,code_canonical=f"X{k}") # noqa: E731
solves this issue.
See here in my repo
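
The effect of a per-ligand `code_canonical` can be sketched without ihm installed. In the snippet below, `ligand_key` and `find_duplicates` are hypothetical helpers, the tuple order follows the `__eq__` quoted earlier, and the default values ("LIG", "X", "non-polymer") are assumptions. It is not the library's duplicate check, only an illustration of the idea.

```python
from collections import Counter


def ligand_key(index, code="LIG", unique_canonical=True):
    """Build the (code, code_canonical, id, type) tuple for ligand `index`.

    Hypothetical helper: with unique_canonical=True the canonical code
    becomes "X0", "X1", ... so each ligand gets a distinct tuple.
    """
    code_canonical = f"X{index}" if unique_canonical else "X"
    return (code, code_canonical, code, "non-polymer")


def find_duplicates(keys):
    """Return the identifier tuples that occur more than once."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


# Two SMILES ligands sharing every identifier: flagged as duplicates.
shared = [ligand_key(i, unique_canonical=False) for i in range(2)]
# The same two ligands with a unique canonical code each: no duplicates.
unique = [ligand_key(i, unique_canonical=True) for i in range(2)]

print(find_duplicates(shared))
print(find_duplicates(unique))
```

With this approach each SMILES ligand keeps its own entity, rather than all of them being merged into a single one.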

If you like this solution better, I'd be happy to create a pull request (although you can also copy/paste it easily enough).

Thanks again!

@jwohlwend
Copy link
Owner

Oh that's clever, I think better than my solution. Would you mind opening a PR? Thank you!!
