datashader does not handle well nans in ocean plots #12

lzampier · 2024-11-25T13:35:44Z

What happened?

With the new default configuration in anemoi-training, which sets datashader: True, the colorbar bounds in the callback plots are set incorrectly. This appears to be caused by NaNs in the field.

Here is an example of the wrong plots:

The plots are correct when datashader is disabled.
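
For anyone triaging this, here is a minimal sketch of the suspected failure mode, not the actual plotting code in anemoi-training. It only shows how NaNs propagate through a plain min/max and how NaN-aware reductions avoid that. The helper name `colorbar_bounds` and the sample values are hypothetical.

```python
import numpy as np


def colorbar_bounds(field):
    """Return (vmin, vmax) computed over the non-NaN values only.

    Hypothetical helper for illustration; not part of anemoi-training.
    """
    field = np.asarray(field, dtype=float)
    if np.isnan(field).all():
        # Nothing valid to scale against; fall back to an arbitrary range.
        return 0.0, 1.0
    return float(np.nanmin(field)), float(np.nanmax(field))


# Toy ocean field: land points are stored as NaN.
sst = np.array([271.3, np.nan, 285.0, 300.2, np.nan])

# Plain reductions propagate NaN, so both bounds become NaN.
naive = (sst.min(), sst.max())

# NaN-aware reductions ignore the masked points.
safe = colorbar_bounds(sst)
```

If the bounds handed to the colorbar are NaN (or are derived from an array that still contains NaNs), the colour scale is undefined, which would be consistent with the plots described above.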

What are the steps to reproduce the bug?

Here is an ORAS6-based config for anemoi-training that can be used to reproduce the issue. Please ask @lzampier if you need more details.

# anemoi-training: develop
# anemoi-models: feature/mask-bounding-dependent-ice-variables

defaults:
- data: mod_oce_for_atm
- dataloader: native_grid
- diagnostics: evaluation
- hardware: atos
- graph: encoder_decoder_only 
- model: transformer 
- training: default
- _self_


### This file is for local experimentation.
##  When you commit your changes, assign the new features and keywords
##  to the correct defaults.
# For example to change from default GPU count:
# hardware:
#   num_gpus_per_node: 1

data:
  resolution: o96
  normalizer:
    min-max: [avg_sivol, avg_siconc, avg_icesalt, avg_sialb, avg_siue, avg_sivn, avg_snvol]
    max:
    none:
    - cos_latitude
    - sin_latitude
    - cos_longitude
    - sin_longitude
    - cos_solar_zenith_angle
    - cos_julian_day
    - cos_local_time
    - sin_julian_day
    - sin_local_time
  frequency: 6h
  timestep: 24h
  diagnostic:
  forcing:
  - cos_latitude
  - sin_latitude
  - cos_longitude
  - sin_longitude
  - cos_solar_zenith_angle
  - cos_julian_day
  - cos_local_time
  - sin_julian_day
  - sin_local_time
  - 10u
  - 10v
  - 2t
  - 2d
  - ssrd
  - strd
  - tp
  - msl
  - lsm
  imputer:
    mean:
      - avg_zos
      - avg_tos
      - avg_sos
      - avg_svn
      - avg_sve
  const_imputer:
    0:
      - avg_sivol
      - avg_siconc
      - avg_icesalt
      - avg_sialb
      - avg_siue
      - avg_sivn
      - avg_snvol

hardware:
  paths:
    data: /home/mlx/ai-ml/datasets/
  files:
    dataset_atm: aifs-ea-an-oper-0001-mars-${data.resolution}-1979-2023-6h-v7.zarr
    dataset_oce: aifs-o6-tpa-ocda-0001-mars-${data.resolution}-2005-2023-6h-v2-ocean-surface-sea-ice.zarr
diagnostics:
  log:
    mlflow:
      enabled: True
      offline: False
      authentication: True
      experiment_name: 'coupled-ocean-atmos'
      run_name: 'mod:oce - for:atm - 24h - 2005-2021'
      
model:
  num_channels: 256
  bounding: #These are applied in order
    - _target_: anemoi.models.layers.bounding.ReluBounding #[0, infinity)
      variables:
        - avg_sivol
        - avg_snvol
        - avg_icesalt
    - _target_: anemoi.models.layers.bounding.HardtanhBounding #[0, 1]
      variables:
        - avg_siconc
        - avg_sialb
      min_val: 0
      max_val: 1

dataloader:
  limit_batches:
    training: 300
    validation: 300
  dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
  training:
    dataset:
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
      start: 2005-01-03
      end: 2021
      select: [avg_svn, avg_sve, avg_siue, avg_sivn, avg_sivol, avg_snvol, avg_siconc, avg_icesalt, avg_sialb, avg_tos, avg_sos, avg_zos, cos_latitude, sin_latitude, cos_longitude, sin_longitude, cos_solar_zenith_angle, cos_julian_day, cos_local_time, sin_julian_day, sin_local_time, lsm]
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_atm}
      start: 2005-01-03
      end: 2021
      select: [10u, 10v, 2t, 2d, ssrd, strd, tp, msl]
    start: 2005-01-03
    end: 2021

  validation:
    dataset:
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
      start: 2022
      end: 2022
      select: [avg_svn, avg_sve, avg_siue, avg_sivn, avg_sivol, avg_snvol, avg_siconc, avg_icesalt, avg_sialb, avg_tos, avg_sos, avg_zos, cos_latitude, sin_latitude, cos_longitude, sin_longitude, cos_solar_zenith_angle, cos_julian_day, cos_local_time, sin_julian_day, sin_local_time, lsm]
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_atm}
      start: 2022
      end: 2022
      select: [10u, 10v, 2t, 2d, ssrd, strd, tp, msl]
    start: 2022
    end: 2022
      
  test:
    dataset:
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_oce}
      start: 2023
      end: 2023-12-28
      select: [avg_svn, avg_sve, avg_siue, avg_sivn, avg_sivol, avg_snvol, avg_siconc, avg_icesalt, avg_sialb, avg_tos, avg_sos, avg_zos, cos_latitude, sin_latitude, cos_longitude, sin_longitude, cos_solar_zenith_angle, cos_julian_day, cos_local_time, sin_julian_day, sin_local_time, lsm]
    - dataset: ${hardware.paths.data}/${hardware.files.dataset_atm}
      start: 2023
      end: 2023-12-28
      select: [10u, 10v, 2t, 2d, ssrd, strd, tp, msl]
    start: 2023
    end: 2023

training:
  max_steps: 150000
  lr:
    iterations: 150000 #${training.max_steps}
    min: 3e-7 #Not scaled by #GPU
  variable_loss_scaling:
    default: 1
    sfc:
      avg_svn: 0.5
      avg_sve: 0.5
      avg_siue: 100
      avg_sivn: 100
      avg_sivol: 500
      avg_snvol: 300
      avg_siconc: 200
      avg_icesalt: 30
      avg_sialb: 30
      avg_tos: 100
      avg_sos: 10
      avg_zos: 10
  metrics:
      - avg_zos
      - avg_tos
      - avg_sivol
      - avg_siconc
      - avg_sos
      - avg_svn
      - avg_sve
      - avg_icesalt
      - avg_sialb
      - avg_siue
      - avg_sivn
      - avg_snvol
  # rollout:
  #   epoch_increment: 1
  #   max: 4

Version

current anemoi-training develop (25-11-2024)

Platform (OS and architecture)

atos

Relevant log output

No response

Accompanying data

No response

Organisation

No response


sahahner · 2024-11-27T12:38:15Z

solved by ecmwf/anemoi-training#152

lzampier · 2024-12-04T12:50:36Z

The colour bar error has come back, this time only in the last two columns of the panel:
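
The linked PR titles mention the increment and persistence error panels, so here is a rough sketch of how a symmetric colour limit for an error field could skip NaNs. This is an assumption about the general approach, not the code from those PRs. The function name `symmetric_error_limit` and the sample arrays are made up for illustration.

```python
import numpy as np


def symmetric_error_limit(truth, pred):
    """Largest absolute finite error, for a colour scale of [-limit, +limit].

    Hypothetical helper for illustration; not taken from the linked PRs.
    """
    err = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    finite = err[np.isfinite(err)]  # drop NaN (and inf) before reducing
    if finite.size == 0:
        # No valid points; fall back to an arbitrary limit.
        return 1.0
    return float(np.max(np.abs(finite)))


truth = np.array([1.0, np.nan, 3.0, 4.0])
pred = np.array([1.5, 2.0, np.nan, 2.0])

# Errors are [0.5, nan, nan, -2.0]; only the finite ones set the limit.
limit = symmetric_error_limit(truth, pred)
```

A NaN in either the truth or the prediction makes the difference NaN, so error panels can pick up NaNs even when each input field is masked in different places.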


lzampier added the bug Something isn't working label Nov 25, 2024

lzampier assigned anaprietonem Nov 25, 2024

sahahner linked a pull request Nov 27, 2024 that will close this issue

162 datashader does not handle well nans in ocean plots ecmwf/anemoi-training#170

Closed

sahahner closed this as completed Nov 27, 2024

lzampier reopened this Dec 4, 2024

anaprietonem mentioned this issue Dec 19, 2024

Exclude nans from increment and persistent error color bars ecmwf/anemoi-training#208

Draft

JesperDramsch added the training label Dec 19, 2024

JesperDramsch transferred this issue from ecmwf/anemoi-training Dec 19, 2024

anaprietonem mentioned this issue Jan 6, 2025

fix(training, plots) Exclude nans from error colorbars #59

Merged
