first cut at OMZ notebook, with cached intermediate results #43

Open: 3 commits to be merged into base: master

klindsay28
Contributor

PR is intended for initial discussion only, do not merge

questions/issues to consider/resolve:

  1. Do cache files belong in this repo? If not, where do they go so that they are useful to someone cloning the repo? This highlights the tension between this repo being a stepping stone towards a general package and a repo targeted at analysis of the hires MARBL runs.
  2. Figure generation should have a save_pngs option.
  3. The dask cluster/client gets instantiated even if all computations are cached. This is a bottleneck if casper is busy. It might be preferable to instantiate the cluster only if results are not available in the cache.
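
One possible shape for item 3 is sketched below: the cluster/client is created through a factory that is called only on a cache miss. Everything here is an assumption for illustration, not code from the notebook: `cached_compute`, `compute_fn`, and `client_factory` are hypothetical names, and pickle stands in for whatever format the cache actually uses (for example netCDF).

```python
import os
import pickle


def cached_compute(cache_path, compute_fn, client_factory):
    """Return the cached result if present; otherwise start a client and compute.

    client_factory is a zero-argument callable (hypothetical) that would create
    the dask cluster/client. It is called only when the cache file is missing.
    """
    if os.path.exists(cache_path):
        # Cache hit: no cluster or client is created at all.
        with open(cache_path, "rb") as fptr:
            return pickle.load(fptr)

    # Cache miss: only now request compute resources.
    client = client_factory()
    try:
        result = compute_fn(client)
    finally:
        # Release the resources if the client object supports it.
        close = getattr(client, "close", None)
        if callable(close):
            close()

    with open(cache_path, "wb") as fptr:
        pickle.dump(result, fptr)
    return result
```

With this arrangement, a second run against a populated cache returns without touching the scheduler, so a busy casper queue would not delay it.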

    propagate_coord_bounds:
        propagate coordinate bounds from ds_in to ds
    clean_ds:
        prep Dataset for saving to a netCDF file
    gen_hash:
        generate a deterministic hash of obj
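
The body of gen_hash is not shown in this conversation. A minimal sketch of one way to get a deterministic hash, assuming obj is built from JSON-friendly types (dicts, lists, strings, numbers), could look like this:

```python
import hashlib
import json


def gen_hash(obj):
    """Return a hex digest that depends only on the content of obj.

    Sketch only: assumes obj is JSON-serializable. sort_keys=True makes the
    result independent of dict insertion order.
    """
    serialized = json.dumps(obj, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Python's built-in hash() is not suitable for this purpose because string hashing is randomized per process, so the value would differ between notebook sessions.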
rename var_lt_thres_area_sum->var_lt_thres_weight_sum

var_lt_thres_weight_sum related
    rename mask->weight
    include area factor in weight generation
    incorporate effect of masking into weight generation
    mv addition of region dim outside of var_lt_thres_weight_sum
    incorporate effect of region dim into weight generation
    add sum_dims argument to var_lt_thres_weight_sum
        instead of hard-coding them
    cache related
        use utils fcns propagate_coord_bounds, to_netcdf_prep when writing cache file
        rm cache from repo, relocate cache
        add hash of some var_lt_thres_weight_sum arguments to cache file
        add sum_dims to cache filename
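
As an illustration of the last two cache items, the sketch below builds a cache filename that carries both sum_dims and a hash of selected arguments. The function name `cache_filename`, its parameters, the filename layout, and the example values are all hypothetical; the layout actually used in the commits is not shown here.

```python
import hashlib
import json
import os


def cache_filename(cache_dir, varname, sum_dims, hashed_args):
    """Build a cache path that encodes sum_dims and a hash of hashed_args.

    Hypothetical layout: <varname>_lt_thres_weight_sum_<dims>_<hash>.nc
    """
    # Sort the dims so that ("nlat", "nlon") and ("nlon", "nlat") share a file.
    sum_dims_str = "_".join(sorted(sum_dims))
    # Hash the remaining arguments (assumed JSON-serializable) deterministically.
    serialized = json.dumps(hashed_args, sort_keys=True)
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()[:12]
    fname = f"{varname}_lt_thres_weight_sum_{sum_dims_str}_{digest}.nc"
    return os.path.join(cache_dir, fname)


# Example with made-up values: changing the threshold changes the filename.
path_a = cache_filename("cache", "O2", ["nlat", "nlon"], {"thres": 20.0})
path_b = cache_filename("cache", "O2", ["nlat", "nlon"], {"thres": 60.0})
```

Keeping sum_dims readable in the name makes cache files easy to identify by eye, while the hash guards against silently reusing a result computed with different arguments.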

use cluster.adapt instead of cluster.scale
    avoids starting up large resource if not needed, say if results are cached

add text cells describing notebook workflow