datashader does not handle NaNs well in ocean plots #12

Open
lzampier opened this issue Nov 25, 2024 · 2 comments
Labels: bug, training

@lzampier
Member

What happened?

With the new default configuration in anemoi-training, which sets datashader: True, the colorbar bounds in the callback plots are set incorrectly. This appears to be caused by NaNs in the field.

Here is an example of the wrong plots:
[Image: Screenshot 2024-11-25 at 14 29 05]

The result is ok without datashader.
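
A minimal sketch of how NaNs can corrupt colorbar bounds, assuming the limits are derived from the minimum and maximum of the plotted field. This is an illustration only, not the anemoi-training or datashader code; `colour_limits` is a hypothetical helper and the sample values are made up.

```python
import numpy as np


def colour_limits(field):
    """Hypothetical helper: return (vmin, vmax) computed from finite values only."""
    finite = field[np.isfinite(field)]
    if finite.size == 0:
        # Arbitrary fallback for an all-NaN field (an assumption of this sketch).
        return 0.0, 1.0
    return float(finite.min()), float(finite.max())


# Made-up ocean-like field in which a masked (land) point is stored as NaN.
field = np.array([0.2, np.nan, 0.9, 0.5])

# A plain min/max propagates the NaN, so both bounds become NaN.
naive = (field.min(), field.max())

# Restricting to finite values gives bounds that a colorbar can use.
robust = colour_limits(field)
```

Comparing `naive` and `robust` on a field from the failing run may help confirm whether NaNs are what breaks the bounds.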

What are the steps to reproduce the bug?

Here is an ORAS6-based config for anemoi-training that can be used to reproduce the issue. Please ask @lzampier if you need more details.

# anemoi-training: develop
# anemoi-models: feature/mask-bounding-dependent-ice-variables

defaults:
- data: mod_oce_for_atm
- dataloader: native_grid
- diagnostics: evaluation
- hardware: atos
- graph: encoder_decoder_only 
- model: transformer 
- training: default
- _self_


### This file is for local experimentation.
##  When you commit your changes, assign the new features and keywords
##  to the correct defaults.
# For example to change from default GPU count:
# hardware:
#   num_gpus_per_node: 1

data:
  resolution: o96
  normalizer:
    min-max: [avg_sivol, avg_siconc, avg_icesalt, avg_sialb, avg_siue, avg_sivn, avg_snvol]
    max:
    none:
    - cos_latitude
    - sin_latitude
    - cos_longitude
    - sin_longitude
    - cos_solar_zenith_angle
    - cos_julian_day
    - cos_local_time
    - sin_julian_day
    - sin_local_time
  frequency: 6h
  timestep: 24h
  diagnostic:
  forcing:
  - cos_latitude
  - sin_latitude
  - cos_longitude
  - sin_longitude
  - cos_solar_zenith_angle
  - cos_julian_day
  - cos_local_time
  - sin_julian_day
  - sin_local_time
  - 10u
  - 10v
  - 2t
  - 2d
  - ssrd
  - strd
  - tp
  - msl
  - lsm
  imputer:
    mean:
      - avg_zos
      - avg_tos
      - avg_sos
      - avg_svn
      - avg_sve
  const_imputer:
    0:
      - avg_sivol
      - avg_siconc
      - avg_icesalt
      - avg_sialb
      - avg_siue
      - avg_sivn
      - avg_snvol

hardware:
  paths:
    data: /home/mlx/ai-ml/datasets/
  files:
    dataset_atm: aifs-ea-an-oper-0001-mars-${data.resolution}-1979-2023-6h-v7.zarr
    dataset_oce: aifs-o6-tpa-ocda-0001-mars-${data.resolution}-2005-2023-6h-v2-ocean-surface-sea-ice.zarr
diagnostics:
  log:
    mlflow:
      enabled: True
      offline: False
      authentication: True
      experiment_name: 'coupled-ocean-atmos'
      run_name: 'mod:oce - for:atm - 24h - 2005-2021'
      
model:
  num_channels: 256
  bounding: #These are applied in order
    - _target_: anemoi.models.layers.bounding.ReluBounding #[0, infinity)
      variables:
        - avg_sivol
        - avg_snvol
        - avg_icesalt
    - _target_: anemoi.models.layers.bounding.HardtanhBounding #[0, 1]
      variables:
        - avg_siconc
        - avg_sialb
      min_val: 0
      max_val: 1

dataloader:
  limit_batches:
    training: 300
    validation: 300
  dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
  training:
    dataset:
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
      start: 2005-01-03
      end: 2021
      select: [avg_svn, avg_sve, avg_siue, avg_sivn, avg_sivol, avg_snvol, avg_siconc, avg_icesalt, avg_sialb, avg_tos, avg_sos, avg_zos, cos_latitude, sin_latitude, cos_longitude, sin_longitude, cos_solar_zenith_angle, cos_julian_day, cos_local_time, sin_julian_day, sin_local_time, lsm]
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_atm}
      start: 2005-01-03
      end: 2021
      select: [10u, 10v, 2t, 2d, ssrd, strd, tp, msl]
    start: 2005-01-03
    end: 2021

  validation:
    dataset:
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
      start: 2022
      end: 2022
      select: [avg_svn, avg_sve, avg_siue, avg_sivn, avg_sivol, avg_snvol, avg_siconc, avg_icesalt, avg_sialb, avg_tos, avg_sos, avg_zos, cos_latitude, sin_latitude, cos_longitude, sin_longitude, cos_solar_zenith_angle, cos_julian_day, cos_local_time, sin_julian_day, sin_local_time, lsm]
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_atm}
      start: 2022
      end: 2022
      select: [10u, 10v, 2t, 2d, ssrd, strd, tp, msl]
    start: 2022
    end: 2022
      
  test:
    dataset:
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
      start: 2023
      end: 2023-12-28
      select: [avg_svn, avg_sve, avg_siue, avg_sivn, avg_sivol, avg_snvol, avg_siconc, avg_icesalt, avg_sialb, avg_tos, avg_sos, avg_zos, cos_latitude, sin_latitude, cos_longitude, sin_longitude, cos_solar_zenith_angle, cos_julian_day, cos_local_time, sin_julian_day, sin_local_time, lsm]
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_atm}
      start: 2023
      end: 2023-12-28
      select: [10u, 10v, 2t, 2d, ssrd, strd, tp, msl]
    start: 2023
    end: 2023

training:
  max_steps: 150000
  lr:
    iterations: 150000 #${training.max_steps}
    min: 3e-7 #Not scaled by #GPU
  variable_loss_scaling:
    default: 1
    sfc:
      avg_svn: 0.5
      avg_sve: 0.5
      avg_siue: 100
      avg_sivn: 100
      avg_sivol: 500
      avg_snvol: 300
      avg_siconc: 200
      avg_icesalt: 30
      avg_sialb: 30
      avg_tos: 100
      avg_sos: 10
      avg_zos: 10
  metrics:
      - avg_zos
      - avg_tos
      - avg_sivol
      - avg_siconc
      - avg_sos
      - avg_svn
      - avg_sve
      - avg_icesalt
      - avg_sialb
      - avg_siue
      - avg_sivn
      - avg_snvol
  # rollout:
  #   epoch_increment: 1
  #   max: 4

Version

current anemoi-training develop (25-11-2024)

Platform (OS and architecture)

atos

Relevant log output

No response

Accompanying data

No response

Organisation

No response

@lzampier added the bug label Nov 25, 2024
@sahahner
Member

solved by ecmwf/anemoi-training#152

@lzampier
Member Author

lzampier commented Dec 4, 2024

The colour bar error has come back, this time only in the last two columns of the panel:

[Image: gnn_pred_val_sample_rstep00_batch0000_rank0_epoch000]

@lzampier lzampier reopened this Dec 4, 2024
@JesperDramsch JesperDramsch transferred this issue from ecmwf/anemoi-training Dec 19, 2024
theissenhelen added a commit that referenced this issue Jan 6, 2025
* feat: add configurability to dropout in MultiHeadSelfAttention

Co-authored-by: Rilwan (Akanni) Adewoyin

* test: adjust to dropout_p

* doc: update changelog

* Feature/integrate reusable workflows (#16)

* ci: add public pr label

* ci: add readthedocs update check

* ci: add downstream ci

* ci: add ci-config

* chore(deps): remove unused dependency

* docs: update changelog

* ci: switch to main

* chore: changelog 0.2.1

* Update error messages from invalid sub_graph in model instantiation (#20)

* ci: inherit pypi publish flow (#17)

* ci: inherit pypi publish flow

Co-authored-by: Helen Theissen

* docs: add to changelog

* fix: typo in reusable workflow

* fix: another typo

* chore: bump actions/setup-python to v5

* ci: run downstream-ci for changes in src and tests

* docs: update changelog

---------

Co-authored-by: Helen Theissen

* Update CHANGELOG.md to KeepChangelog format

* [pre-commit.ci] pre-commit autoupdate (#25)

updates:
- [github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0](psf/black-pre-commit-mirror@24.4.2...24.8.0)
- [github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2](astral-sh/ruff-pre-commit@v0.4.6...v0.6.2)
- [github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1](tox-dev/pyproject-fmt@2.1.3...2.2.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Ci/changelog-release-updater (#26)

* ci: add changelof release updater

* docs: update changelog

---------

Co-authored-by: Rilwan (Akanni) Adewoyin
Co-authored-by: Gert Mertes
Co-authored-by: Mario Santa Cruz
Co-authored-by: Jesper Dramsch
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>