Reorganize and update the entire birdhouse stack #328
Conversation
This reverts commit 5b411ad.
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1546/
Result: failure
BIRDHOUSE_DEPLOY_BRANCH: break-fix-everything
DACCS_CONFIGS_BRANCH: master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master
PAVICS_SDI_BRANCH: master
DESTROY_INFRA_ON_EXIT: true
PAVICS_HOST: https://host-140-36.rdext.crim.ca
Infrastructure deployment failed. Instance has not been destroyed. @matprov
@mishaschwartz
For point (2) regarding the Docker Compose version, a neat approach to manage that would be to remove all

Maybe (5) and (6) could be combined, since services will disappear no matter what if an existing instance blindly updates the stack anyway.

All those changes except (6) can be "reasonably" breaking, as they are fairly easy to patch up. I'm not sure (6) is even worth the changes as proposed in this PR. It doesn't seem to add any benefit from a technical point of view. The changes seem conceptual at most, and I don't agree with many of them, such as the placement of some items in

I find the proposed changes (specifically the (6) part) introduce too many problems and a lot of unnecessary debugging effort (which we don't have enough time for already), not to mention incompatibility with existing PRs and instances. We won't go through all our deployed instances to adapt locations of configs and overrides. We are also just barely starting to get a "working and stable" stack with the nightly CI. Given that we cannot validate those massive changes directly with the CI, I consider this too much of a regression to accept them. I would prefer we focus on stabilizing and increasing the TRL of the stack rather than adding more instability. I will close this PR directly since I don't intend to accept it in a whole chunk.
We had a whole discussion about making all breaking changes at once... I apologize if I misunderstood the intent of that discussion. The whole point was to make major changes all at once so that you didn't have to regularly "go through all our deployed instances...". I would have loved to make smaller, more incremental changes over the last few weeks and months as I was working on them, but, as per our previous discussions on this topic, I was trying to put all breaking changes into a single PR to avoid ongoing disruption to your already deployed instances.

I also seem to have misunderstood our discussion about reorganizing the components of the stack into separate directories. I was under the impression that logically dividing components based on functionality was desirable.

I also, apparently, seem to have misunderstood the discussion about removing version dependencies (version 3.4 currently) from the docker compose files, which I was under the impression was a roadblock for other updates but hadn't been changed until now because it breaks backwards compatibility.

I agree that the current CI configuration is not sufficient to validate these changes (as I mentioned in the TODO list part of the description), which is why I left this as a draft PR until that can be addressed; a discussion I was hoping to continue with you or your team at CRIM shortly.

It would really help me, @fmigneault, if you could comment here with any changes that are no-gos for you so that I can be sure to avoid them when suggesting changes in the future. From what I understand from your comment above, they are:
I would also like to hear from @tlvu as well as anyone else who would like to offer an opinion.
Quick impression from reading the PR description only; I have not had time to read the code change. I have a huge deadline for a project this week, so I can only spend time on this PR next week.

So yes, anything breaking backward compat should be done together to avoid multiple manual interventions on the various existing deployments (prod and all the numerous staging/test instances at various organizations). Example: removal of the 3.4 version in all the

I realize I probably was not clear that breaking backward compat, in my mind, means breaking pavics-compose.sh, autodeploy, and the format of the

That said, probably some of those changes can be done without breaking backward compat, so they do not have to be done inside this breaking PR? I am not sure on this one since I did not read the code change. Maybe the docker-compose v2 update, the new "/version" and "/services" endpoints, the documentation update to remove mentions of PAVICS in favor of Birdhouse, and the new tests are not breaking backward compat, so they don't need to be in this PR?

The idea is to make each PR easier to digest so it's quicker to spot problems, giving a faster review process in general. Cognitively small PRs that change only 1 or 2 features at most are the best, but that cannot always be achieved, so it's okay to break the rule from time to time.

Most of the changes in this PR are already planned, so I don't think you misunderstood anything, Misha. Again, small PRs are preferred, but when it's not possible, it's just not possible.

Here are my comments on the changes, based solely on the PR description, not the code. I have indicated whether I think each breaks an "end-user visible feature" or the operating of the "platform".
My point is exactly related to this. Only parts 5 & 6 (from my comment) are actually breaking. Those can go together (if needed). The rest are new features that should be contained on their own.

I'm in favor of 5 being pushed relatively fast because they are all, indeed, deprecated components. I am less in favor of 6 because, after seeing the changes, I don't have the impression that the reorganization of the components (core/data/services/...) is more logical than the current one (config/components/optional-components) [see my previous comment for more details about the mixed use of service "categories" and proposed locations]. I find the PR only switches the current mixed-grouping logic for another; both the current categorization and the new one mix and match services of different uses and concepts. In theory, when we discussed it, it seemed like a good idea, but the result doesn't look as good as anticipated. Provided that the restructuring of the directories doesn't bring any advantage, IMO it is not worth the effort it will cause.

I agree with the general "what can / can't break" from @tlvu. If something "requires" breaking changes, then so be it. But if a small patch can preserve backward compat, then we might as well do so to make it transparent for active instances and users.
On the 1st point, I also agree.
With this, I fully agree. There was no misunderstanding on your part. I have been using Docker Compose v2, with the version overridden to 3.9, for some time already on my dev machine. So I consider those changes almost non-breaking ("almost" because volume mounts can be weird, and cowbird workspace symlinks might not like the change).
I don't remember testing the WPS client (birdy) with the weaver endpoint yet.
Thank you both for your feedback and for clarifying what you all consider breaking changes for your purposes. I will do the following immediately since I can do so without breaking "end-user visibility" or "platform" (as @tlvu helpfully put it):
Any further steps will require some extensive reworking to pull the parts out into individual PRs, and I will continue to work on that as time permits. I will not make any changes that move components in any way; there are too many concerns with that strategy at the moment.
Overview
Reorganize and update the entire birdhouse stack.
Moves all components to the following directories under the birdhouse/ subdirectory:
Deprecates the following services by moving them to the services/.deprecated/ directory. These services are no longer maintained but are kept for historical reasons. These services will no longer work with the current stack and will require significant changes if they need to be added back to the stack in the future.
Access to all routes now goes through twitcher by default. This ensures that all services are protected behind the authorization proxy by default. The only exception is jupyterhub, which accesses magpie directly to authorize users. This includes WPS outputs, which previously were protected through twitcher only if the secure-data-proxy component was enabled.
Docker compose no longer exposes any container ports outside the default network except for ports 80 and 443 from the proxy container. This ensures that ports that are not intended for external access are not exposed to the wider internet even if firewall rules are not set correctly.
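As an illustrative sketch of this policy in a compose file (service names and images below are placeholders, not the stack's actual definitions), only the proxy publishes host ports while every other container stays reachable solely on the internal compose network:

```yaml
# Illustrative only: service names and images are placeholders.
services:
  proxy:
    image: nginx
    ports:            # published on the host: the only externally reachable ports
      - "80:80"
      - "443:443"
  some-backend:
    image: example/backend
    expose:           # reachable by other services on the compose network only;
      - "8080"        # no host port is published
```

With this layout, even a misconfigured host firewall cannot accidentally expose `some-backend` to the wider internet, because docker never binds its port on the host.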
Docker compose files were previously locked to version 3.4. The docker compose documentation currently recommends not specifying a version number so that the most recent syntax can be used if available. The version numbers were removed from all docker-compose.yml and docker-compose-extra.yml files.
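Concretely, the change amounts to dropping the top-level `version` key; a minimal sketch with a placeholder service name:

```yaml
# Before: the file was pinned to the 3.4 schema.
# version: "3.4"

# After: with no top-level "version" key, the compose tool applies the most
# recent syntax it supports.
services:
  example-service:      # placeholder name, not an actual birdhouse service
    image: example/image
```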
Docker compose was being invoked with the docker-compose command instead of the docker compose subcommand. This meant that docker compose version 1 (which has since been deprecated) was being used. This updates the stack to use docker compose version 2+.

By default, all services authorized through twitcher/magpie were blocked for all users. A more reasonable default is to allow anonymous access, so that users with no login credentials can try out the stack with limited allocated resources. This can be disabled by changing the MAGPIE_DEFAULT_PERMISSION_GROUP environment variable from "anonymous"; default access will then be given to whichever group is indicated by MAGPIE_DEFAULT_PERMISSION_GROUP instead.
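The v1-to-v2 switch is mostly a matter of which command gets invoked. A minimal POSIX shell sketch (a hypothetical helper, not code from pavics-compose.sh) of preferring the v2 plugin while tolerating a legacy v1 install:

```shell
# Hypothetical helper, not code from this repository: prefer the Compose v2
# plugin ("docker compose") and fall back to the legacy v1 binary if needed.
if docker compose version >/dev/null 2>&1; then
  COMPOSE="docker compose"
elif command -v docker-compose >/dev/null 2>&1; then
  COMPOSE="docker-compose"
else
  COMPOSE=""   # neither variant is available on this machine
fi
echo "compose command: ${COMPOSE:-none found}"
```

A wrapper like this lets scripts run `$COMPOSE up -d` unchanged on hosts that have only one of the two variants installed.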
Previously, WPS services could be accessed directly or through weaver. This meant that there were two endpoints providing the same service, and it also required that access permissions be synchronized between both. This redundancy was unnecessary, so WPS services are now only available through weaver. This also ensures that WPS outputs are organized by username (see services/weaver/config/twitcher/weaver_hooks.py:add_x_wps_output_context), which simplifies permission management of WPS outputs.
A new endpoint "/services" is added that provides a JSON string describing each of the user-facing services currently enabled on the stack. This is a static string and serves a different purpose than the endpoints served by canarie-api (monitoring status). This endpoint is meant to be polled by the node registry scripts (in development) to provide information about which services are meant to be available without having to poll other endpoints directly.
A new endpoint "/version" is added that provides a string containing the current version number of the stack (e.g. "1.26.0"). This endpoint is meant to be polled by the node registry scripts (in development).
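Since neither endpoint's schema is spelled out here, the payloads below are invented purely for illustration; a registry script might consume them roughly like this:

```shell
# Hypothetical payloads: the real "/services" schema and field names are not
# specified in this PR description. A registry script would fetch these from
# the "/services" and "/version" endpoints instead of using literal strings.
SERVICES_JSON='{"services": [{"name": "weaver", "href": "/weaver"}]}'
VERSION_STRING='1.26.0'

# Count the advertised services by parsing the JSON payload.
COUNT=$(printf '%s' "$SERVICES_JSON" \
  | python3 -c 'import json, sys; print(len(json.load(sys.stdin)["services"]))')
echo "services advertised: $COUNT, stack version: $VERSION_STRING"
```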
The documentation and comments previously contained references to PAVICS explicitly. This is not desirable since this stack could be used to deploy a stack that is not called "PAVICS". All explicit references to "PAVICS" have been changed to "Birdhouse".
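A rename like this is usually mechanical. The sketch below is illustrative only: the file and its contents are made up, and this is not the actual command used in the PR.

```shell
# Create a throwaway example file containing the old project name.
printf 'Welcome to the PAVICS documentation.\n' > /tmp/rename_example.txt

# Replace every explicit "PAVICS" mention with "Birdhouse" (GNU sed syntax;
# BSD/macOS sed would need `sed -i ''` instead).
sed -i 's/PAVICS/Birdhouse/g' /tmp/rename_example.txt

cat /tmp/rename_example.txt
```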
Added some additional deployment tests that can be used to ensure that the components of this stack are configured correctly. This should ensure that any future changes will not break the deployment mechanisms used here.
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information