
Models: GraphTransformerProcessor chunking #66

Open · wants to merge 4 commits into develop
Conversation

@japols japols (Member) commented Jan 8, 2025

Describe your changes

This PR adds chunking for the GraphTransformerProcessorBlock to reduce memory usage in inference. The functionality is equivalent to the GraphTransformerMapperBlock chunking and uses the same env variable ANEMOI_INFERENCE_NUM_CHUNKS to control chunking behaviour.
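
For illustration, a minimal sketch of the chunking pattern in PyTorch. It uses plain dense attention rather than the block's graph attention, and the helper name `chunked_attention` is hypothetical; only the env variable `ANEMOI_INFERENCE_NUM_CHUNKS` comes from this PR.

```python
import os

import torch
import torch.nn.functional as F

# Sketch only: the actual GraphTransformerProcessorBlock uses graph attention
# over edges; this dense example just shows the chunking pattern. The env
# variable ANEMOI_INFERENCE_NUM_CHUNKS is the one named in this PR (1 = off).
NUM_CHUNKS = int(os.environ.get("ANEMOI_INFERENCE_NUM_CHUNKS", "1"))


def chunked_attention(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
    """Compute attention over query chunks so that only one chunk's attention
    weights are materialised at a time (hypothetical helper)."""
    if NUM_CHUNKS <= 1:
        return F.scaled_dot_product_attention(query, key, value)
    chunks = [
        F.scaled_dot_product_attention(q, key, value)
        for q in torch.tensor_split(query, NUM_CHUNKS, dim=-2)
    ]
    return torch.cat(chunks, dim=-2)
```

With `NUM_CHUNKS > 1`, each query chunk attends to the full key/value set and the results are concatenated, so the output matches the unchunked computation while peak memory is bounded by a single chunk's attention weights; this is the same memory/runtime trade-off the block-level chunking aims for.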

Type of change


  • New feature (non-breaking change which adds functionality)

Checklist before requesting a review

  • I have performed a self-review of my code
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation and docstrings to reflect the changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have ensured that the code is still pip-installable after the changes and runs
  • I have not introduced new dependencies in the inference portion of the model
  • I have run this on single GPU
  • I have run this on multi-GPU or multi-node
  • I have run this to work on LUMI (or made sure the changes work independently)
  • I have run the Benchmark Profiler against the old version of the code

Tag possible reviewers

@ssmmnn11 @gabrieloks

@japols japols self-assigned this Jan 8, 2025
@ssmmnn11 ssmmnn11 (Member) commented Jan 8, 2025

Hi Jan, thank you for adding this, very nice. Looking at the code, it seems that some parts do the same thing but look slightly different. I was wondering if this would also be a good opportunity to reduce code duplication between GraphTransformerProcessorBlock and GraphTransformerMapperBlock, maybe by encapsulating it in a common routine? I think the differences are:

  • the dim of x_skip
  • providing size (I think that goes in the block though, ... need to check)
  • the part that updates the source nodes.

(attached image: diff1)

@japols japols (Member, Author) commented Jan 10, 2025


I moved the common "attention" part to the GraphTransformerBaseBlock.
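
To make the refactor concrete, here is a rough sketch of what such a shared routine could look like. Apart from the class names GraphTransformerBaseBlock, GraphTransformerProcessorBlock, and GraphTransformerMapperBlock mentioned in the thread, the structure and method names are illustrative, not the actual anemoi-models code.

```python
from abc import abstractmethod

from torch import nn


class GraphTransformerBaseBlock(nn.Module):
    """Illustrative sketch only; the real anemoi-models classes differ.

    Idea from the thread: keep the attention step both blocks share here,
    and let the subclasses supply only their differences (dim of x_skip,
    providing `size`, and updating the source nodes).
    """

    def run_attention(self, x, edge_attr, edge_index, **kwargs):
        # Hypothetical shared routine called by both subclasses; the chunking
        # added in this PR would then live here once rather than in each block.
        out = self.attention(x, edge_attr, edge_index, **kwargs)
        return out + self.skip_connection(x)

    @abstractmethod
    def attention(self, x, edge_attr, edge_index, **kwargs):
        """Subclass-specific attention (e.g. with or without `size`)."""

    @abstractmethod
    def skip_connection(self, x):
        """Subclass-specific residual, since the dim of x_skip differs."""


class GraphTransformerProcessorBlock(GraphTransformerBaseBlock):
    """Operates on a single node set; only its own differences remain here."""


class GraphTransformerMapperBlock(GraphTransformerBaseBlock):
    """Maps source to destination nodes; also updates the source nodes."""
```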

3 participants