Add --end-year YYYY to postprocessing #210
Comments
Thanks for organizing this! The forward-looking approach, which does not need the "-t YYYY" clue for the workflow to know that year YYYY's output is ready for postprocessing, is even easier than how you had it. The Cylc workflow itself can know whether the history files are available, so strictly speaking it should not need the wrapper to tell it anything about when new history files arrive. The relevant Cylc feature is custom (task) triggering functions. We don't yet use them in fre-workflows, but the pp-shield prototype Cylc template does, e.g. (https://gitlab.gfdl.noaa.gov/fre2/workflows/pp-shield/-/blob/main/include/shield/shield.cylc?ref_type=heads#L32)
The "is_history_complete" trigger there is called repeatedly by the Cylc scheduler, and when it passes for a certain cycle point (year in the PP context), the tasks that depend on it can start. So in an ideal world, we can totally outsource the history-file-present logic to Cylc. But until we master the Cylc triggers, we'll want the ability to have the wrapper tell the workflow to start a particular year (cycle point) of postprocessing.
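For anyone unfamiliar with the feature, a Cylc custom trigger is a plain Python function that returns a `(satisfied, results)` tuple. Below is a minimal sketch of what a history check could look like. It is not the pp-shield implementation: the directory layout, the `pattern` default, and the argument names are all assumptions made for illustration, and how the workflow config passes the cycle point in is not shown.

```python
from pathlib import Path


def is_history_complete(history_dir, year, pattern="{year}0101.nc.tar"):
    """Report whether the history archive for `year` is present.

    Returns a (satisfied, results) tuple, the shape Cylc expects from a
    custom trigger function. The file name pattern is hypothetical; adjust
    it to whatever the history files are actually called.
    """
    # Normalise the year to four digits so "1980" and 1980 behave the same.
    year_str = f"{int(year):04d}"
    candidate = Path(history_dir) / pattern.format(year=year_str)

    if candidate.is_file():
        # The results dict is made available to the tasks that depend on the trigger.
        return True, {"history_file": str(candidate)}
    # Not satisfied yet; the scheduler will simply call the function again later.
    return False, {}
```

Keeping the check to a single file-existence test makes it cheap to call over and over, which matters because the scheduler polls it for every waiting cycle point.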
I'd like to make clear that we don't need the is_history_complete trigger either - the current just-enough-functionality plan for fre pp run triggers a couple of possible Cylc commands:
The minimal functionality to get postprocessing updated with new data is more like the following:
see: https://github.com/NOAA-GFDL/fre-cli/tree/main/fre/pp#readme
After talking with Chris, this functionality seems to come in 3 stages:
Yes, agreed. Cylc external triggers are the ultimate solution to workflows knowing when input data is available, and regular interactive Cylc task control (
Prior versions of fre relied on specifying the last (and sometimes first) years of post-processed data to control how many years of data were processed at once. This is how fre dealt with chunks of history files being copied over from wherever the model was running; the postprocessing syntax looked a bit like this:
where the long pause between successive calls to bronx-or-earlier's wrapper equivalent gave time for new files to be transferred over to the pp nodes, and fre was smart enough to know that prior years had already been post-processed, so only the data in the range ($year-1) - $year needed to be processed with each call.
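To make the "only process what is new" behavior concrete, here is a small sketch of the bookkeeping involved. This is not how fre implemented it; the function name, the `processed` set, and the example years are all hypothetical and only illustrate the idea of skipping years that were handled by an earlier call.

```python
def years_needing_pp(first_year, end_year, processed):
    """Return the years up to end_year that have not been post-processed yet.

    `processed` is any collection of years handled by earlier calls.
    """
    return [y for y in range(first_year, end_year + 1) if y not in processed]


if __name__ == "__main__":
    processed = set()
    # Each successive call supplies a later end year, as new history arrives.
    for end_year in (1980, 1981, 1982):
        todo = years_needing_pp(1980, end_year, processed)
        print(f"end year {end_year}: process {todo}")
        processed.update(todo)
```

Each call after the first only picks up the newly arrived year, because everything earlier is already in `processed`.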
This functionality is not present in the fre-cli codebase, and if we want to maintain backwards compatibility on this particular command we'd need to change our command-line options. However, we may NOT need it - canopy is capable of pausing jobs and running them again when new data is present. The logic flow for that would look more like this:
Whether or not we implement this is going to depend a lot on whether users miss this functionality - but for now, it's important to remember that this functionality is NOT present in fre-cli.