Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix regressions introduced by PR #359 "Flexible locations for data served by thredds" #478

Merged
merged 6 commits into from
Nov 22, 2024

Conversation

tlvu
Copy link
Collaborator

@tlvu tlvu commented Nov 21, 2024

In PR #359:

secure-thredds/config/magpie/permissions.cfg started to use variable but was never renamed to .template so those variable never get template expanded (commit 317d96c3).

bootstrap-testdata default value was removed but did not source read-configs.include.sh so the variable stayed blank (commit 4ab0fc74). The default value was there initially so the script can be used in standalone situation (not inside a checkout).

Overview

Please include a summary of the changes and which issues are fixed.

Please also include relevant motivation and context.

List any dependencies that are required for this change.

Changes

Non-breaking changes

  • Adds...
  • New component version X:1.2.3

Breaking changes

  • New component version Y:2.0.0
    • Requires a new environment variable. See the following link.

Related Issue / Discussion

Additional Information

Links to other issues or sources.

  • Things to do...

CI Operations

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false

…ved by thredds"

In PR #359:

`secure-thredds/config/magpie/permissions.cfg` started to use variable
but was never renamed to `.template` so those variable never get
template expanded (commit 317d96c).

`bootstrap-testdata` default value was removed but did not source
`read-configs.include.sh` so the variable stayed blank (commit
4ab0fc7).  The default value was there
initially so the script can be used in standalone situation (not inside a
checkout).
@github-actions github-actions bot added ci/deployment Related to deployment utilities and scripts component/magpie Related to https://github.com/Ouranosinc/Magpie labels Nov 21, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 21, 2024
@tlvu tlvu changed the title fix regression introduced by PR #359 "Flexible locations for data served by thredds" fix regressions introduced by PR #359 "Flexible locations for data served by thredds" Nov 21, 2024
Copy link
Collaborator

@mishaschwartz mishaschwartz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I just left one explanatory comment and there's one small change request.

DATASET_ROOT="${BIRDHOUSE_DATA_PERSIST_ROOT}/${THREDDS_SERVICE_DATA_LOCATION_ON_HOST}"
# Default for when unable to source read-configs.include.sh (ie when
# used standalone outside of the checkout).
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:=/data/datasets}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be :- instead of :=

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will set THREDDS_SERVICE_DATA_LOCATION_ON_HOST if undefined, and then set DATASET_ROOT with it, which will result in the same behavior in this case.

Since the rest of the script doesn't rely on THREDDS_SERVICE_DATA_LOCATION_ON_HOST, not a big deal, but we can use :- since it is more common to indicate the default value.

However, maybe consider BIRDHOUSE_DATA_PERSIST_ROOT as well:

DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:-${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/datasets}"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I started to use := instead of :- to actually set THREDDS_SERVICE_DATA_LOCATION_ON_HOST because had the sourcing of read-configs.include.sh worked, that THREDDS_SERVICE_DATA_LOCATION_ON_HOST would also exist for real.

I was trying to emulate the successful sourcing of read-configs.include.sh even in standalone mode, but it was not needed since we don't use THREDDS_SERVICE_DATA_LOCATION_ON_HOST afterwards. I can revert to :- no problem.

@@ -10,8 +10,22 @@
# Need write-access to DATASET_ROOT (/data/datasets/).


THIS_FILE="$(readlink -f "$0" || realpath "$0")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred way to do this is to use the birdhouse executable which should be used as the interface for interacting with the stack (see #428).

birdhouse configs -c birdhouse/scripts/bootstrap-testdata

I know a lot of the other scripts do this for backwards compatibility reasons but going forward I would rather encourage using the birdhouse executable.

I'm fine with this for the PR since it puts it in line with some of the other scripts in this directory but I'd like to move away from this strategy soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the scripts having that "just in case" they are called directly.
If omitted, a server using the old way would suddenly get some operations working and others not.

That being said, we should promote using birdhouse binary rather than calling the scripts directly.
However, it needs to be improved (other PRs), since the above command actually executes the script, and in this specific case, that leads to downloading all the test data. So, not that great for a "print configs" command.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmigneault I agree with everything you're saying.

I'm not sure I understand what you mean by:

So, not that great for a "print configs" command.

Do you mind clarifying

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

birdhouse configs -c birdhouse/scripts/bootstrap-testdata

Oh thanks for the reminder, my habits still stayed pre 2.0 version !

going forward I would rather encourage using the birdhouse executable.

I think this is absolutely amazing for various external one-off scripts outside of this repo because then the user do not have to remember the boilerplate to source read-configs.include.sh.

But for official scripts committed to this repo, I think I would prefer the current backward compatible behavior for the following reasons:

  • backward-compatible, can still be used standalone (without birdhouse executable and even without being in a checkout). Historically all the scripts in scripts/ actually started as fully standalone scripts without the need to be in a checkout. I wrote them for my own quick usage, then I realized they could be useful for other, so I commit them in the repo for sharing.
  • when one script calls another script (ex: this bootstrap-testdata is actually being called by bootstrap-instance-for-testsuite, the single entrypoint for any CI to prep the instance with test data so we do not need to update the CI steps if we change our testdata initialization process), then the script should be able to run "standalone".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mishaschwartz
I've tried doing birdhouse configs -c birdhouse/scripts/bootstrap-testdata, and the operation proceeded to download test data, although birdhouse -h indicates that configs only "prints a command that can be used to load configuration". I assumed that means it should only print the configurable variables, but not run the script itself? Must be a side effect of the sourcing mechanism that runs the script anyway.

Copy link
Collaborator

@fmigneault fmigneault Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with this more detailed help, I'm not sure what is expected 😅
What is the use of the -p flag (load variant?) vs not having it (execute variant?) ? Shouldn't load be the default for a "configs" operation?

I'm somewhat expecting birdhouse configs to behave like docker compose config or birdhouse-compose config. It does any relevant templating replacement and prints the resulting "state" of what would be run (or available variables), but doesn't actually run it.

Is that correct expectation?
If not, why would one use configs to run the script, rather than just running the script?
It would be clearer to have something like birdhouse run -c <script-path> for that purpose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have a minute, run birdhouse configs -h and check out the examples at the bottom to see if that clears up some of your questions.

To try to answer some of them here:

What is the use of the -p flag (load variant?) vs not having it (execute variant?)

with -p you can essentially source the environment variables into your current process. So for example, we could replace the code that we're discussing right now with this to get the same result:

eval $(birdhouse configs -p)

I'm somewhat expecting birdhouse configs to behave like docker compose config or birdhouse-compose config

You can do that with birdhouse compose config and you'll get the result you're expecting.

Is that correct expectation?

Not really... think of it this way... we're solving two problems:

  • with -c we can load all the environment variables and then run a command in a subprocess
  • with -p we can load all of the environment variables in the current process (like we were sourcing read-configs.include.sh)

If not, why would one use configs to run the script, rather than just running the script?

Because now the script doesn't need to require that all relevant environment variables are sourced. Also we can run arbitrary commands/scripts so it makes testing and script writing easier.

It would be clearer to have something like birdhouse run -c for that purpose.

That's currently what we can do with birdhouse configs -c <script-path>. Are you saying that you'd prefer run as a subcommand?

Copy link
Collaborator

@fmigneault fmigneault Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I wasn't clear by saying "I'm somewhat expecting birdhouse configs to behave like docker compose config [...]". I understand the confusion I created.

What I actually meant was not literally the "same command", but "same behavior", as in, it evaluates the configs as if it would do its usual command, but stops just before and shows the "config" instead. In the case of birdhouse configs, the "configs" would just print the list of resolved variable names and values. Doing a birdhouse compose ... right after should resolve exactly the same name/values, but do the compose operation with them. Basically, I was expecting birdhouse configs to be equivalent to a pip freeze, or any other <tool> list command, and adding the -c meant "using this config file instead of the default" (rather than run this command). Basically, a "show me resolved values" command.
Maybe that could be a birdhouse configs --list ?

I've read the examples, and yeah, it's clearer to me what it does now. I guess the -c is potentially misleading because it could do way more that just "load configs" depending on what the script contains (as it did in this case downloading test data). Maybe there needs to be more warnings about that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, a "show me resolved values" command. Maybe that could be a birdhouse configs --list ?

Ah ok I see what you mean now. The only issue I see with implementing this is that all the configuration settings get loaded as environment variables. So I don't see a great way to correctly display all the resolved values without including all the other environment variables that exist on the system as well. You can just do:

birdhouse configs -c printenv

This also includes other environment variables that are used internally but may not have much relevance to the user such as DELAYED_EVAL and VARS.

If we prefixed all our environment variables with BIRDHOUSE_ then we could do it but as it stands, we don't have a clean way of implementing this.

If you can think of something I'm missing please let me know.

and adding the -c meant "using this config file instead of the default"

Are you thinking of the -e or --env-file flag here?

I guess the -c is potentially misleading because it could do way more that just "load configs" depending on what the script contains (as it did in this case downloading test data). Maybe there needs to be more warnings about that.

I mean the help string for this command says "Execute the given command after loading configuration settings", I'm not sure how else to warn the user that the command that they provide will be executed other than by saying "execute the given command"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge this PR. Maybe you guys can continue the conversation over #479?

@@ -10,8 +10,22 @@
# Need write-access to DATASET_ROOT (/data/datasets/).


THIS_FILE="$(readlink -f "$0" || realpath "$0")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the scripts having that "just in case" they are called directly.
If omitted, a server using the old way would suddenly get some operations working and others not.

That being said, we should promote using birdhouse binary rather than calling the scripts directly.
However, it needs to be improved (other PRs), since the above command actually executes the script, and in this specific case, that leads to downloading all the test data. So, not that great for a "print configs" command.

DATASET_ROOT="${BIRDHOUSE_DATA_PERSIST_ROOT}/${THREDDS_SERVICE_DATA_LOCATION_ON_HOST}"
# Default for when unable to source read-configs.include.sh (ie when
# used standalone outside of the checkout).
DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:=/data/datasets}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will set THREDDS_SERVICE_DATA_LOCATION_ON_HOST if undefined, and then set DATASET_ROOT with it, which will result in the same behavior in this case.

Since the rest of the script doesn't rely on THREDDS_SERVICE_DATA_LOCATION_ON_HOST, not a big deal, but we can use :- since it is more common to indicate the default value.

However, maybe consider BIRDHOUSE_DATA_PERSIST_ROOT as well:

DATASET_ROOT="${THREDDS_SERVICE_DATA_LOCATION_ON_HOST:-${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/datasets}"

@tlvu tlvu merged commit 8dcb5a3 into master Nov 22, 2024
3 of 5 checks passed
@tlvu tlvu deleted the fix-testing-helper branch November 22, 2024 15:58
@github-actions github-actions bot added the ci/operations Continuous Integration components label Nov 22, 2024
mishaschwartz added a commit that referenced this pull request Nov 26, 2024
## Overview

Update description of the `configs` subcommand to better describe it.
The description when calling `bin/birdhouse -h` now matches the
description when calling `bin/birdhouse configs -h`

## Changes

**Non-breaking changes**
help string update

**Breaking changes**
None

## Related Issue / Discussion


#478 (comment)

## Additional Information

Links to other issues or sources.

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.

Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
Such commit command can be used to override the PR description behavior
for a specific commit update.
However, a commit message cannot 'force run' a PR which the description
turns off the CI.
To run the CI, the PR should instead be updated with a ``true`` value,
and a running message can be posted in following PR comments to trigger
tests once again.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/deployment Related to deployment utilities and scripts ci/operations Continuous Integration components component/magpie Related to https://github.com/Ouranosinc/Magpie documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants