bug: pdb_2022_09_28_mmcif_files.tar replacement with mmcif_files #88

Closed
hegelab opened this issue Nov 20, 2024 · 11 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@hegelab

hegelab commented Nov 20, 2024

Hi,

You introduced this change in run_alphafold.py, line 185, replacing the first line below with the second:
pdb_2022_09_28_mmcif_files.tar # ~200k PDB mmCIF files in this tar.
mmcif_files/ # Directory containing ~200k PDB mmCIF files.

So the run fails, since mmcif_files does not exist.

@Augustin-Zidek
Collaborator

Augustin-Zidek commented Nov 20, 2024

Yes, this was done to significantly speed up template search. You will have to untar your PDB database (the download script has been updated to untar it).

See https://github.com/google-deepmind/alphafold3/blob/main/fetch_databases.sh#L28 for the exact command to run on the tarfile.
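For anyone who prefers to do the extraction from Python, here is a minimal sketch. It is not the command from fetch_databases.sh; it assumes the tarball contains a top-level mmcif_files/ directory, and the function name untar_pdb_archive is hypothetical.

```python
import tarfile
from pathlib import Path


def untar_pdb_archive(tar_path: Path, db_dir: Path) -> Path:
    """Extract the PDB mmCIF tarball into db_dir; return the expected mmcif_files path."""
    db_dir.mkdir(parents=True, exist_ok=True)
    # Use the "data" extraction filter when this Python version supports it,
    # so that unsafe members (absolute paths, links escaping db_dir) are rejected.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path) as archive:
        archive.extractall(path=db_dir, **kwargs)
    # Assumption: the archive unpacks into a directory named mmcif_files.
    return db_dir / "mmcif_files"
```

Extracting roughly 200k small files takes a while, but it only has to be done once per database directory.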

Augustin-Zidek added the question label Nov 20, 2024

@charlesbeattie
Collaborator

You can pass the following flag to restore the original behaviour:

--pdb_database_path='${DB_DIR}/pdb_2022_09_28_mmcif_files.tar'

This will be considerably slower for each run of AlphaFold, so I would recommend untarring that file and keeping the default.
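Note that the single quotes in the flag keep the shell from expanding ${DB_DIR}, so the literal placeholder is what the program receives. As a rough illustration of how such a placeholder could then be filled in, here is a sketch that assumes a string.Template-style substitution. The helper resolve_db_path and the directory /data/af3_databases are hypothetical and not taken from run_alphafold.py.

```python
import string


def resolve_db_path(path_with_placeholder: str, db_dir: str) -> str:
    """Replace the ${DB_DIR} placeholder with a concrete database directory."""
    return string.Template(path_with_placeholder).substitute(DB_DIR=db_dir)


# The flag value exactly as passed on the command line (placeholder intact).
flag_value = "${DB_DIR}/pdb_2022_09_28_mmcif_files.tar"
print(resolve_db_path(flag_value, "/data/af3_databases"))
# prints /data/af3_databases/pdb_2022_09_28_mmcif_files.tar
```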

@hegelab
Author

hegelab commented Nov 21, 2024

Thanks :-)

hegelab closed this as completed Nov 21, 2024
hegelab reopened this Nov 21, 2024

@hegelab
Author

hegelab commented Nov 21, 2024

It occurred to me:

If the structure search takes a significant amount of time, you may want to add an option to skip it. I do not see such an option; you can replace the template list with an empty list before starting inference, but that is post-processing after the template search has already completed.

In most of my past cases with AF2, I needed to run the prediction without structural templates. I suppose that AF3 works without structural templates as well as with them (similar to AF2).

@Augustin-Zidek
Collaborator

Yes, you are right, I will add an option to disable template search.

But I am also fixing the template search performance, so it should be less of an issue.

Augustin-Zidek added the enhancement label Nov 21, 2024

@kkorotkovuky

> Yes, you are right, I will add an option to disable template search.
>
> But I am also fixing the template search performance, so should be less of an issue.

The template-free option will be extremely useful. Also, having the option to limit the template search by the release date - as implemented in AF2 - would be great.

@hegelab
Author

hegelab commented Nov 22, 2024

This may be important: in AF2, the filter-by-date was applied after the template search had completed. I would not run the search on mmCIF entries that can already be excluded based on the date. E.g. if I restricted the templates to those later than 2050, the search was still performed on all entries and in the end none were used, since it was only 2021.
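The ordering suggested here (filter by date first, then search) can be sketched as follows. This is only an illustration of the idea, not how AF2 or AF3 implement it; TemplateEntry, search_with_date_prefilter, the cutoff parameter, and the example PDB IDs and dates are all hypothetical.

```python
import datetime
from typing import Callable, Iterable, List, NamedTuple


class TemplateEntry(NamedTuple):
    pdb_id: str
    release_date: datetime.date


def search_with_date_prefilter(
    entries: Iterable[TemplateEntry],
    cutoff: datetime.date,
    search_fn: Callable[[TemplateEntry], bool],
) -> List[TemplateEntry]:
    """Drop entries released after the cutoff, then run the expensive search."""
    eligible = [e for e in entries if e.release_date <= cutoff]
    return [e for e in eligible if search_fn(e)]


calls = []  # Records which entries the expensive step was actually run on.


def expensive_search(entry: TemplateEntry) -> bool:
    calls.append(entry.pdb_id)
    return True  # Stand-in for a real structural match test.


entries = [
    TemplateEntry("1abc", datetime.date(2019, 5, 1)),
    TemplateEntry("7xyz", datetime.date(2023, 1, 15)),
]
hits = search_with_date_prefilter(entries, datetime.date(2021, 9, 30), expensive_search)
print([h.pdb_id for h in hits], calls)  # prints ['1abc'] ['1abc']
```

The point is that the expensive step never sees the entry released after the cutoff, instead of searching everything and discarding results afterwards.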

@Augustin-Zidek
Collaborator

Template search should now be much faster (up to ~100x in the mmCIF fetching and parsing stage after Hmmsearch) thanks to d6b06d6.

Starting work on the template-free and date filter features.

@Augustin-Zidek
Collaborator

The ability to run template-free was added in 1942639.

@Augustin-Zidek
Collaborator

Max template date flag added in e0cfd70.

I am going to close this issue as everything reported in here has been resolved. Thanks everyone for chiming in. Summary:

  • d6b06d6: template search is now up to 100x faster in the post-Hmmsearch stage.
  • 1942639: you can run template-free by setting templates to [] while keeping unpairedMsa and pairedMsa unset.
  • e0cfd70: you can control max template date using the --max_template_date flag, just like in AlphaFold 2.
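As a sketch of the template-free setup from the second bullet, the snippet below builds an input with templates set to [] and leaves unpairedMsa and pairedMsa out entirely. Only those three field names come from this thread; the remaining keys (name, modelSeeds, sequences, protein, id, sequence, dialect, version) are assumptions about the input JSON layout and should be checked against the input documentation. The helper template_free_input and the example sequence are hypothetical.

```python
import json


def template_free_input(name: str, sequence: str, seed: int = 1) -> dict:
    """Build an input dict for one protein chain with template search disabled."""
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [
            {
                "protein": {
                    "id": "A",
                    "sequence": sequence,
                    # Empty list requests a template-free run; unpairedMsa and
                    # pairedMsa are deliberately left unset.
                    "templates": [],
                }
            }
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


print(json.dumps(template_free_input("example", "MKTAYIAKQR"), indent=2))
```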

Happy folding -- be it with or without templates. :)

@Augustin-Zidek
Collaborator

Sorry, I missed one case (MSA set to empty, templates unset). Fix in 4bebfb0.

4 participants