node details overrides + logging utils #415
Conversation
I like the spirit of this PR. I like that we can probably use this strategy to address the issue raised in #414 as well.
I just have a few suggested changes:
I prefer not to rename `read-configs.include.sh`. The rest: OK with the spirit. I will have to deploy this one for testing.
@@ -16,15 +16,26 @@
 #
 # # Source the script providing function read_configs.
 # # read_configs uses COMPOSE_DIR to find default.env and env.local.
-# . $COMPOSE_DIR/read-configs.include.sh
+# . ${COMPOSE_DIR}/scripts/read-configs.include.sh
I'd rather not move this `read-configs.include.sh` file to the `scripts/` folder.

To me, this is another entrypoint to the platform (it gives the value of various config vars and sources other files, like the new `logging.include.sh`). It's at the same level as the `pavics-compose.sh` file; we want it to clearly stand out.

It's not just scripts in the `scripts/` folder that use it. Scripts under the `deployment/` folder use it too, as do scripts from all the external repos that we have no control over.

It has no `pavics` in its name, so why rename it? This will force all external repos to update as well, which is fairly backward-incompatible.
I'm not sure I understand the use case where `read-configs.include.sh` would be used as an entrypoint. As its name indicates, it is intended only to be included, to provide utility functions used by other scripts, in the same way the new logging script provides reporting utilities. Those should all be contained in some "utils" group.
Ideally, the file structure should probably be more like:

    scripts
        pavics-compose.sh
        utils
            logging.include.sh
            read-configs.include.sh
            ...
    deployment
        ...
Or even:

    scripts
        birdhouse (+x)
        utils
            logging.include.sh
            read-configs.include.sh
            ...
        run
            pavics-compose.sh
        deploy
            ...
With some kind of `birdhouse run ...` / `birdhouse deploy ...` CLI that sources/dispatches operations to relevant scripts.
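For illustration only, such a dispatcher could be as simple as the following sketch (the `birdhouse` entry script, the `deploy/deploy.sh` target and the `utils/` paths are hypothetical names, not actual files in this PR):

```sh
#!/bin/sh
# Hypothetical "birdhouse" dispatcher: source the shared utilities once,
# then delegate each subcommand to the script that implements it.
BIRDHOUSE_ROOT="$(cd "$(dirname "$0")" && pwd)"
. "${BIRDHOUSE_ROOT}/utils/read-configs.include.sh"   # assumed location
. "${BIRDHOUSE_ROOT}/utils/logging.include.sh"        # assumed location

CMD="$1"
[ "$#" -gt 0 ] && shift
case "${CMD}" in
    run)    exec "${BIRDHOUSE_ROOT}/run/pavics-compose.sh" "$@" ;;
    deploy) exec "${BIRDHOUSE_ROOT}/deploy/deploy.sh" "$@" ;;   # hypothetical
    *)      echo "Usage: birdhouse {run|deploy} [args...]" >&2; exit 2 ;;
esac
```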
Any external script (as in, outside this repo) that directly sources `read-configs.include.sh` is not using the code the way it is meant to be used. External users should not interact directly with utilities that are intended to evolve as the scripts gain new capabilities.
I tend to agree with @fmigneault that we really shouldn't be using the `*.include.sh` scripts as entrypoints.

@tlvu do you have an example where you're currently using it as an entrypoint now? If you show us the example, I'm sure we can help you figure out a more robust way of accomplishing the task without having to use `read-configs.include.sh` as a workaround.
By entrypoint I do not mean `*.include.sh` is executed directly. I mean it is sort of the API for external scripts to interact with the birdhouse platform.

All external scripts will source this `.include.sh` file before doing pretty much anything, so the location of this file should not change. The location should also be front and center so it is visible, like the `pavics-compose.sh` script.
> All external scripts will source this `.include.sh` file before doing pretty much anything

Which external scripts are you talking about here? Can you give us an example, please?

I'm having difficulty remembering which scripts you are running at Ouranos, outside of the birdhouse-deploy code, that would require sourcing the environment variables.
Off the top of my head, we have external scripts for batch-creating and deleting demo users, and for backing up and syncing data between the prod and staging systems...

I think there is no limit on which external scripts are allowed to source `read-configs.include.sh`. The backup-and-sync script, for example, needs to access `MAGPIE_PERSIST_DIR`, which is a delayed-eval var, to sync the DB to the staging system. Many vars are useful to many external scripts.
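For illustration, such an external script boils down to something like this sketch (the paths and the staging host are placeholders; `read_configs` is the function mentioned in the diff above):

```sh
#!/bin/sh
# External backup/sync script (sketch): source the include file to get
# read_configs, which loads default.env + env.local under COMPOSE_DIR
# and resolves delayed-eval variables such as MAGPIE_PERSIST_DIR.
COMPOSE_DIR="/path/to/birdhouse-deploy/birdhouse"   # site-specific path
export COMPOSE_DIR
. "${COMPOSE_DIR}/read-configs.include.sh"
read_configs

# MAGPIE_PERSIST_DIR is now resolved; replicate the Magpie DB backup
# to the staging machine ("staging-host" is a placeholder).
rsync -a "${MAGPIE_PERSIST_DIR}/" "staging-host:${MAGPIE_PERSIST_DIR}/"
```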
This is unrelated to this PR, but is there a reason why those scripts haven't been added to this repo? It sounds like those would all be generally useful for anyone managing a birdhouse deployment.
These are quick scripts with lots of hardcoded hostnames and paths specific to us. If I have time I can "generalize" them. The easiest to generalize is probably our batch demo-user creation and deletion script.
I'm curious about your batch create/delete user script. What does it do that https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/scripts/create-magpie-users cannot handle already? The backup/sync between systems resembles https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/scripts/sync-data.
Do your scripts perform other operations? They seem like nice additions to the repo if they add more capabilities.
My batch user creation/deletion script uses `create-magpie-users` internally, but it generates the Magpie `config.yml` file with the number of demo users we need to batch-create, with some presets (emails start with [email protected] and all have the same password).
For the backup/sync, `sync-data` is a full live sync; I wrote it when our new production machine went online. For the incremental sync to staging, we have a daily backup of the Magpie DB that we simply replicate to the staging machine. We do not replicate the content of user Jupyter data, because that's big for nothing, nor the GeoServer data, because it rarely changes. So basically we are aiming for speed, and we don't care if the data is incomplete or not fully up to date. Needs are different.
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2369/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-90.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1486/
NOTEBOOK TEST RESULTS
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2484/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : branch_name
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://

Infrastructure deployment failed. Instance has not been destroyed. @matprov
@tlvu @mishaschwartz What was left to do? I thought I had pushed every update to address the comments.
run tests
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2485/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-154.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1524/
NOTEBOOK TEST RESULTS
@mishaschwartz Agreed it's not blocking; that's why I was okay to go forward with this, as long as my other feedback is handled.
@fmigneault Quickly from memory:
Reviewed this PR again and I see only 2 blockers for me:

1. `read-configs.include.sh` should move back to its old location
2. notify calling scripts when `COMPOSE_DIR` is not auto-detected properly

I have provided more suggestions, but given we want to expedite this PR, they can be done in a different PR.
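For the second point, a guard along these lines is what I have in mind (a sketch only; the message wording and the `pavics-compose.sh` existence check are illustrative):

```sh
# Warn callers when COMPOSE_DIR could not be auto-detected, so external
# scripts know they must export it explicitly before sourcing.
if [ -z "${COMPOSE_DIR}" ] || [ ! -f "${COMPOSE_DIR}/pavics-compose.sh" ]; then
    echo "WARNING: COMPOSE_DIR [${COMPOSE_DIR}] was not auto-detected properly." >&2
    echo "WARNING: export COMPOSE_DIR explicitly before sourcing this script." >&2
fi
```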
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2489/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-154.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1526/
NOTEBOOK TEST RESULTS
@tlvu
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2490/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-154.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1528/
NOTEBOOK TEST RESULTS
LGTM, just minor changes requested.

By the way, since you've changed a lot of scripts, including the autodeploy scripts, please perform an autodeploy test to guard against typos.
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2492/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-154.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1529/
NOTEBOOK TEST RESULTS
E2E Test Results

DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2493/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : fix-node-details
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-133.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1530/
NOTEBOOK TEST RESULTS
All good for me. Please perform an autodeploy test in case there are accidental typos.
Can you do this test according to what you expect it to accomplish? We do not employ the
I don't think I should be doing this test for you. We have a fully reproducible stack so that we can be independent in our testing. If the stack is not reproducible, then yes, you will need to depend on the person with the proper setup to test for you. The autodeploy test procedure is well documented here: https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/components#how-to-test-platform-autodeploy-is-not-broken-by-a-pr

If Misha or I start playing with Magpie or Weaver code, would you want us to ask you to test for us? I don't think you see yourself as our QA agent. If you make changes to Magpie, I bet you will test that Weaver, Thredds, Geoserver and all the WPS services are still working properly with the new Magpie, because they are all part of the stack and the goal is for all the built-in parts to work properly together. Autodeploy is also a built-in part of the stack. Last time, we discovered very late that Weaver

I went ahead and tested autodeploy for you anyway, because I think we all want this PR to be merged. Good news: everything is fine with autodeploy. However, I see too many WARNING messages that I think should probably be at DEBUG level instead, to avoid drowning out real WARNINGs. This could be a separate PR; it is not blocking this one.
@fmigneault Thank you for all your work on this!

@fmigneault I'm still happy with this, so feel free to merge whenever you're able.
@tlvu I understand where you're coming from. I think the difference here is that all the other tests we rely on are part of an automated CI pipeline, and the autodeploy tests are the only tests that have to be run manually. If we could make the autodeploy tests automated, then we wouldn't have the issue of having to ask anyone in particular to verify that test before each PR.
Thanks @mishaschwartz, you've touched on an important point here. Allow me to elaborate further.

To put it more clearly, we only think about the fresh-deployment scenario and we forget about the upgrade scenario for an existing system already in production. Autodeploy is just one point of this upgrade scenario. Examples of other points related to upgrades: DB upgrades that can handle existing data, backward compatibility for the "public API" (ex: the location of

The fact that Cowbird is not backward compatible with existing Jupyter users (#425) is another example of us forgetting about the upgrade scenario. So I think we should always look further than just getting the automated CI pipeline to pass, because that CI pipeline does not include the upgrade scenario.
@tlvu If the CI did a small test to run a deployment (even a pseudo-one if needed) and reported errors, then I could plan fixes for it, but I currently have no automated way to get this feedback. I am not expecting anyone to run end-to-end tests of the full stack or of individual components that are not handled in some automated way, because that takes too much time and effort, and it is not reproducible anyway. Since autodeploy seems that critical on Ouranos' side, I believe it would be worth investing time on your end in a test case that does the necessary checks to ensure it works, because even a custom setup on my end would not guarantee it works on yours (e.g.:
Overview
Add server details overrides and logging utilities.
Changes
Non-breaking changes
Compose script utilities:

- Add `BIRDHOUSE_COLOR` option and various logging/messaging definitions in `birdhouse/scripts/logging.include.sh`.
- Replace `echo` in scripts by utility variables `MSG_DEBUG`, `MSG_INFO`, `MSG_WARN` and `MSG_ERROR`, as applicable per respective messages.
- Update `birdhouse/scripts` utilities to employ the same `COMPOSE_DIR` variable (auto-resolved or explicitly set) in order to include or source any relevant dependencies they might have within the `birdhouse-deploy` repository.
- Add `info` option (ie: `pavics-compose.sh info`) that will stop processing just before the `docker-compose` call. This can be used to run a "dry-run" of the command and validate that what is loaded is as expected, by inspecting the provided log messages (see the sketch after this list).
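A rough usage sketch of the above (exactly how the `MSG_*` variables expand is an assumption here; the `info` invocation is as described):

```sh
# Source the logging utilities, then emit leveled messages instead of
# bare "echo" calls (assuming MSG_* expand to level/color prefixes).
. "${COMPOSE_DIR}/scripts/logging.include.sh"
echo "${MSG_INFO}Starting the platform..."
echo "${MSG_WARN}Using a default SERVER_* value; override it in env.local."

# "Dry-run": load and validate all configs, stop before docker-compose.
./pavics-compose.sh info
```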
Defaults:

- Add `SERVER_[...]` variables with defaults using previously hard-coded values referring to PAVICS. These variables use a special combination of `DELAYED_EVAL` and `OPTIONAL_VARS` definitions that can make use of a variable formatted as `<ANY_NAME>='${__DEFAULT__<ANY_NAME>}'`, which will print a warning message indicating that the default is employed, although it is STRONGLY recommended to override it. This allows a middle ground: a backward-compatible `env.local`, while flagging potentially misused configurations (a sketch of the pattern follows below).
- Canarie-API: updated references to employ the `SERVER_[...]` variables, which were pointing at Ouranosinc/pavics-sdi instead of the intended bird-house/birdhouse-deploy.
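A rough sketch of the `__DEFAULT__` pattern mentioned above (the `SERVER_NAME` example, its default value, and the exact `DELAYED_EVAL`/`OPTIONAL_VARS` list syntax are illustrative assumptions, not the literal code):

```sh
# default.env sketch: the variable defaults to a sentinel that expands
# to a legacy hard-coded value at delayed-eval time.
export __DEFAULT__SERVER_NAME="pavics.ouranos.ca"   # illustrative legacy default
export SERVER_NAME='${__DEFAULT__SERVER_NAME}'      # sentinel, evaluated later
DELAYED_EVAL="${DELAYED_EVAL} SERVER_NAME"
OPTIONAL_VARS="${OPTIONAL_VARS} \$SERVER_NAME"

# At load time, read_configs evaluates SERVER_NAME; if it still matches
# the __DEFAULT__ sentinel value, a warning recommends overriding it in
# env.local.
```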
Breaking changes
Related Issue / Discussion
CI
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false