-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix regressions introduced by PR #359 "Flexible locations for data served by thredds" #478
Conversation
…ved by thredds" In PR #359: `secure-thredds/config/magpie/permissions.cfg` started to use variable but was never renamed to `.template` so those variable never get template expanded (commit 317d96c). `bootstrap-testdata` default value was removed but did not source `read-configs.include.sh` so the variable stayed blank (commit 4ab0fc7). The default value was there initially so the script can be used in standalone situation (not inside a checkout).
… for data served by thredds"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I just left one explanatory comment and there's one small change request.
birdhouse/scripts/bootstrap-testdata
Outdated
DATASET_ROOT="${BIRDHOUSE_DATA_PERSIST_ROOT}/${THREDDS_SERVICE_DATA_LOCATION_ON_HOST}" | ||
# Default for when unable to source read-configs.include.sh (ie when | ||
# used standalone outside of the checkout). | ||
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:=/data/datasets}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be :-
instead of :=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will set THREDDS_SERVICE_DATA_LOCATION_ON_HOST
if undefined, and then set DATASET_ROOT
with it, which will result in the same behavior in this case.
Since the rest of the script doesn't rely on THREDDS_SERVICE_DATA_LOCATION_ON_HOST
, not a big deal, but we can use :-
since it is more common to indicate the default value.
However, maybe consider BIRDHOUSE_DATA_PERSIST_ROOT
as well:
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:-${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/datasets}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I started to use :=
instead of :-
to actually set THREDDS_SERVICE_DATA_LOCATION_ON_HOST
because had the sourcing of read-configs.include.sh
worked, that THREDDS_SERVICE_DATA_LOCATION_ON_HOST
would also exist for real.
I was trying to emulate the successful sourcing of read-configs.include.sh
even in standalone mode, but it was not needed since we don't use THREDDS_SERVICE_DATA_LOCATION_ON_HOST
afterwards. I can revert to :-
no problem.
@@ -10,8 +10,22 @@ | |||
# Need write-access to DATASET_ROOT (/data/datasets/). | |||
|
|||
|
|||
THIS_FILE="$(readlink -f "$0" || realpath "$0")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preferred way to do this is to use the birdhouse
executable which should be used as the interface for interacting with the stack (see #428).
birdhouse configs -c birdhouse/scripts/bootstrap-testdata
I know a lot of the other scripts do this for backwards compatibility reasons but going forward I would rather encourage using the birdhouse
executable.
I'm fine with this for the PR since it puts it in line with some of the other scripts in this directory but I'd like to move away from this strategy soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the scripts having that "just in case" they are called directly.
If omitted, a server using the old way would suddenly get some operations working and others not.
That being said, we should promote using birdhouse
binary rather than calling the scripts directly.
However, it needs to be improved (other PRs), since the above command actually executes the script, and in this specific case, that leads to downloading all the test data. So, not that great for a "print configs" command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmigneault I agree with everything you're saying.
I'm not sure I understand what you mean by:
So, not that great for a "print configs" command.
Do you mind clarifying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
birdhouse configs -c birdhouse/scripts/bootstrap-testdata
Oh thanks for the reminder, my habits still stayed pre 2.0 version !
going forward I would rather encourage using the
birdhouse
executable.
I think this is absolutely amazing for various external one-off scripts outside of this repo because then the user do not have to remember the boilerplate to source read-configs.include.sh
.
But for official scripts committed to this repo, I think I would prefer the current backward compatible behavior for the following reasons:
- backward-compatible, can still be used standalone (without
birdhouse
executable and even without being in a checkout). Historically all the scripts inscripts/
actually started as fully standalone scripts without the need to be in a checkout. I wrote them for my own quick usage, then I realized they could be useful for other, so I commit them in the repo for sharing. - when one script calls another script (ex: this
bootstrap-testdata
is actually being called bybootstrap-instance-for-testsuite
, the single entrypoint for any CI to prep the instance with test data so we do not need to update the CI steps if we change our testdata initialization process), then the script should be able to run "standalone".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mishaschwartz
I've tried doing birdhouse configs -c birdhouse/scripts/bootstrap-testdata
, and the operation proceeded to download test data, although birdhouse -h
indicates that configs
only "prints a command that can be used to load configuration". I assumed that means it should only print the configurable variables, but not run the script itself? Must be a side effect of the sourcing mechanism that runs the script anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with this more detailed help, I'm not sure what is expected 😅
What is the use of the -p
flag (load variant?) vs not having it (execute variant?) ? Shouldn't load be the default for a "configs" operation?
I'm somewhat expecting birdhouse configs
to behave like docker compose config
or birdhouse-compose config
. It does any relevant templating replacement and prints the resulting "state" of what would be run (or available variables), but doesn't actually run it.
Is that correct expectation?
If not, why would one use configs to run the script, rather than just running the script?
It would be clearer to have something like birdhouse run -c <script-path>
for that purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you have a minute, run birdhouse configs -h
and check out the examples at the bottom to see if that clears up some of your questions.
To try to answer some of them here:
What is the use of the -p flag (load variant?) vs not having it (execute variant?)
with -p
you can essentially source the environment variables into your current process. So for example, we could replace the code that we're discussing right now with this to get the same result:
eval $(birdhouse configs -p)
I'm somewhat expecting birdhouse configs to behave like docker compose config or birdhouse-compose config
You can do that with birdhouse compose config
and you'll get the result you're expecting.
Is that correct expectation?
Not really... think of it this way... we're solving two problems:
- with
-c
we can load all the environment variables and then run a command in a subprocess - with
-p
we can load all of the environment variables in the current process (like we were sourcing read-configs.include.sh)
If not, why would one use configs to run the script, rather than just running the script?
Because now the script doesn't need to require that all relevant environment variables are sourced. Also we can run arbitrary commands/scripts so it makes testing and script writing easier.
It would be clearer to have something like birdhouse run -c for that purpose.
That's currently what we can do with birdhouse configs -c <script-path>
. Are you saying that you'd prefer run
as a subcommand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I wasn't clear by saying "I'm somewhat expecting birdhouse configs to behave like docker compose config [...]". I understand the confusion I created.
What I actually meant was not literally the "same command", but "same behavior", as in, it evaluates the configs as if it would do its usual command, but stops just before and shows the "config" instead. In the case of birdhouse configs
, the "configs" would just print the list of resolved variable names and values. Doing a birdhouse compose ...
right after should resolve exactly the same name/values, but do the compose operation with them. Basically, I was expecting birdhouse configs
to be equivalent to a pip freeze
, or any other <tool> list
command, and adding the -c
meant "using this config file instead of the default" (rather than run this command). Basically, a "show me resolved values" command.
Maybe that could be a birdhouse configs --list
?
I've read the examples, and yeah, it's clearer to me what it does now. I guess the -c
is potentially misleading because it could do way more that just "load configs" depending on what the script contains (as it did in this case downloading test data). Maybe there needs to be more warnings about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, a "show me resolved values" command. Maybe that could be a birdhouse configs --list ?
Ah ok I see what you mean now. The only issue I see with implementing this is that all the configuration settings get loaded as environment variables. So I don't see a great way to correctly display all the resolved values without including all the other environment variables that exist on the system as well. You can just do:
birdhouse configs -c printenv
This also includes other environment variables that are used internally but may not have much relevance to the user such as DELAYED_EVAL
and VARS
.
If we prefixed all our environment variables with BIRDHOUSE_
then we could do it but as it stands, we don't have a clean way of implementing this.
If you can think of something I'm missing please let me know.
and adding the -c meant "using this config file instead of the default"
Are you thinking of the -e
or --env-file
flag here?
I guess the -c is potentially misleading because it could do way more that just "load configs" depending on what the script contains (as it did in this case downloading test data). Maybe there needs to be more warnings about that.
I mean the help string for this command says "Execute the given command after loading configuration settings", I'm not sure how else to warn the user that the command that they provide will be executed other than by saying "execute the given command"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge this PR. Maybe you guys can continue the conversation over #479?
birdhouse/optional-components/secure-thredds/config/magpie/permissions.cfg.template
Outdated
Show resolved
Hide resolved
@@ -10,8 +10,22 @@ | |||
# Need write-access to DATASET_ROOT (/data/datasets/). | |||
|
|||
|
|||
THIS_FILE="$(readlink -f "$0" || realpath "$0")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the scripts having that "just in case" they are called directly.
If omitted, a server using the old way would suddenly get some operations working and others not.
That being said, we should promote using birdhouse
binary rather than calling the scripts directly.
However, it needs to be improved (other PRs), since the above command actually executes the script, and in this specific case, that leads to downloading all the test data. So, not that great for a "print configs" command.
birdhouse/scripts/bootstrap-testdata
Outdated
DATASET_ROOT="${BIRDHOUSE_DATA_PERSIST_ROOT}/${THREDDS_SERVICE_DATA_LOCATION_ON_HOST}" | ||
# Default for when unable to source read-configs.include.sh (ie when | ||
# used standalone outside of the checkout). | ||
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:=/data/datasets}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will set THREDDS_SERVICE_DATA_LOCATION_ON_HOST
if undefined, and then set DATASET_ROOT
with it, which will result in the same behavior in this case.
Since the rest of the script doesn't rely on THREDDS_SERVICE_DATA_LOCATION_ON_HOST
, not a big deal, but we can use :-
since it is more common to indicate the default value.
However, maybe consider BIRDHOUSE_DATA_PERSIST_ROOT
as well:
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:-${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/datasets}"
## Overview Update description of the `configs` subcommand to better describe it. The description when calling `bin/birdhouse -h` now matches the description when calling `bin/birdhouse configs -h` ## Changes **Non-breaking changes** help string update **Breaking changes** None ## Related Issue / Discussion #478 (comment) ## Additional Information Links to other issues or sources. ## CI Operations <!-- The test suite can be run using a different DACCS config with ``birdhouse_daccs_configs_branch: branch_name`` in the PR description. To globally skip the test suite regardless of the commit message use ``birdhouse_skip_ci`` set to ``true`` in the PR description. Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the commit message will override ``birdhouse_skip_ci`` from the PR description. Such commit command can be used to override the PR description behavior for a specific commit update. However, a commit message cannot 'force run' a PR which the description turns off the CI. To run the CI, the PR should instead be updated with a ``true`` value, and a running message can be posted in following PR comments to trigger tests once again. --> birdhouse_daccs_configs_branch: master birdhouse_skip_ci: false
In PR #359:
secure-thredds/config/magpie/permissions.cfg
started to use variable but was never renamed to.template
so those variable never get template expanded (commit 317d96c3).bootstrap-testdata
default value was removed but did not sourceread-configs.include.sh
so the variable stayed blank (commit 4ab0fc74). The default value was there initially so the script can be used in standalone situation (not inside a checkout).Overview
Please include a summary of the changes and which issues are fixed.
Please also include relevant motivation and context.
List any dependencies that are required for this change.
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
Links to other issues or sources.
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false