Use a single JupyterHub instance to spawn pods in multiple clusters #34
@vvcb I've reviewed this project and it looks like a good start. However, I have a number of concerns and potential improvements to raise before we choose this as an option.

The solution relies on an internal kubespawner to do the work, rather than following the standard Kubernetes-native design pattern, which uses controllers to perform the bulk of the task. This is an omission in the JupyterHub kubespawner as well, so it is not surprising. It means that all of the work is carried out by a single service, and that service is the same one the user interacts with, which creates a single vector for attack. If this service is breached through an exploit in a library that JupyterHub uses, it could be used to run other workloads on the cluster, potentially exposing the entire cluster and all of its data. If we're going back to the drawing board on how the kubespawner works, I would expect us to want to close this gap.

Additionally, I am concerned about calling a CLI application from inside the application. In my experience, while this may work now, the CLI will evolve and its syntax will likely change, adding a layer of complexity that could make the solution harder to support. It may seem easier to develop this way, but it is ultimately a false economy. When it comes to exception handling, such models often obscure error messages and make debugging more difficult. Kubernetes provides client libraries and design models for interacting with its APIs that are built with an upgrade path in mind. In theory, this should let us keep working with the API long into the future in spite of any changes to the CLI, and should provide appropriate exception handling and feedback in the event of any problems.
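A minimal sketch of the client-library approach, in Python since that is what JupyterHub and kubespawner are written in. It assumes the official `kubernetes` Python client, whose `CoreV1Api.create_namespaced_pod(namespace, body)` raises `ApiException` with `status` and `reason` attributes. The names `spawn_pod`, `PodSpawnError`, and `FakeCoreV1Api` are hypothetical, and the fake exists only so the example runs without a cluster.

```python
from types import SimpleNamespace
from typing import Any, List, Protocol, Tuple


class PodApi(Protocol):
    """The single method this sketch needs from kubernetes.client.CoreV1Api."""

    def create_namespaced_pod(self, namespace: str, body: Any) -> Any: ...


class PodSpawnError(RuntimeError):
    """Wraps an API failure while keeping its HTTP status and reason."""

    def __init__(self, status: Any, reason: str) -> None:
        super().__init__(f"pod creation failed (status={status}): {reason}")
        self.status = status
        self.reason = reason


def spawn_pod(api: PodApi, namespace: str, manifest: dict) -> str:
    """Create a pod through the API client and return the created pod's name."""
    try:
        pod = api.create_namespaced_pod(namespace=namespace, body=manifest)
    except Exception as exc:
        # The real client raises kubernetes.client.exceptions.ApiException, which
        # exposes .status and .reason, so nothing has to be scraped from CLI stderr.
        if not hasattr(exc, "status"):
            raise  # not an API error: let the original exception propagate
        raise PodSpawnError(exc.status, getattr(exc, "reason", "")) from exc
    return pod.metadata.name


class FakeCoreV1Api:
    """Hypothetical in-memory stand-in so the sketch runs without a cluster."""

    def __init__(self, fail_with_status: Any = None) -> None:
        self.fail_with_status = fail_with_status
        self.created: List[Tuple[str, dict]] = []

    def create_namespaced_pod(self, namespace: str, body: Any) -> Any:
        if self.fail_with_status is not None:
            err = Exception("rejected by API server")
            err.status = self.fail_with_status  # mimic ApiException's attributes
            err.reason = "Forbidden"
            raise err
        self.created.append((namespace, body))
        # The real client returns a V1Pod; only .metadata.name is used here.
        return SimpleNamespace(metadata=SimpleNamespace(name=body["metadata"]["name"]))
```

Against a real cluster you would pass `kubernetes.client.CoreV1Api()` after calling `kubernetes.config.load_incluster_config()` (or `load_kube_config()` outside the cluster). The caller then gets a typed error with a status code instead of a parsed string.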
Using the client library may seem more difficult to learn, but it is really just a standard, well-documented API, so it isn't as hard to implement as it first appears.

Security-wise I also have concerns: the model relies on the various clusters being able to talk directly to the control planes of the other clusters, which is not a model I'm comfortable with. That would make it possible to provision workloads on the linked clusters with very few real controls over what is being provisioned.

I would suggest instead that we work on a new version of the kubespawner that is designed to work the way Kubernetes is designed to work:

```mermaid
sequenceDiagram
    participant Hub
    participant API
    participant Operator
    Hub ->> Hub: User Logs into JupyterHub and selects workspace
    Hub ->> API: Create Custom Resource
    Operator ->> API: Fetch Updated Custom Resources
    Operator ->> API: Create Pod and wait for readiness
    Operator ->> API: Update Status of Custom Resource to PodReady
    Operator ->> Operator: Update Proxy
    Operator ->> API: Update Status of Custom Resource to ProxyReady
    Hub ->> API: Fetch Status
    Hub ->> Hub: Redirect User Session to Pod
```
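The Hub side of this flow could look roughly like the sketch below. It assumes the official `kubernetes` Python client's `CustomObjectsApi` (`create_namespaced_custom_object` and `get_namespaced_custom_object`). The CRD group `notebooks.example.org`, the kind `JupyterNotebook`, the `extensions` spec field, and the use of `status.phase` are hypothetical; the phase values come from the diagram. The point it illustrates is that the Hub only writes a custom resource and reads its status, so it never needs permission to create pods itself.

```python
import time
from typing import Any, Callable, Optional

# Hypothetical CRD coordinates; the proposal does not fix these names.
GROUP, VERSION, PLURAL = "notebooks.example.org", "v1alpha1", "jupyternotebooks"


def notebook_resource(user: str, workspace: str, extensions: Optional[dict] = None) -> dict:
    """Build the custom resource the Hub creates instead of a Pod."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "JupyterNotebook",
        "metadata": {"name": f"jupyter-{user}"},
        "spec": {
            "user": user,
            "workspace": workspace,
            # Free-form developer metadata for custom operators to interpret.
            "extensions": extensions or {},
        },
    }


def request_notebook(api: Any, namespace: str, body: dict) -> None:
    """Hub ->> API: Create Custom Resource (api is a CustomObjectsApi)."""
    api.create_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, body)


def wait_for_proxy(api: Any, namespace: str, name: str, timeout: float = 300.0,
                   interval: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Hub ->> API: Fetch Status, repeated until the operator reports ProxyReady."""
    deadline = time.monotonic() + timeout
    while True:
        obj = api.get_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name)
        phase = obj.get("status", {}).get("phase")
        if phase == "ProxyReady":
            return obj
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} still in phase {phase!r}")
        sleep(interval)
```

Polling keeps the sketch short; a production version would more likely stream changes with `kubernetes.watch.Watch` instead.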
As part of the custom resource definition, you could have custom properties that allow developers to extend the notebook definition with their own metadata, which can then be used in their own implementation of the JupyterNotebooksOperator. We could also build the operator with event hooks to facilitate the development of extensions, so that anyone who wanted it to do other things would only need to extend the operator code.

We can then add a pub/sub model from one server to another that keeps the other server updated and provides feedback on just the one resource type we care about (our custom resource). The resource would then be picked up and implemented on the relevant cluster:

```mermaid
sequenceDiagram
    participant Hub-1
    participant API-1
    participant Operator-1
    participant Publisher-1
    participant Subscriber-1
    participant Subscriber-2
    participant Publisher-2
    participant API-2
    participant Operator-2
    Hub-1 ->> Hub-1: User Logs into JupyterHub and selects workspace
    Hub-1 ->> API-1: Create Custom Resource
    par
        Operator-1 ->> API-1: Fetch Updated Custom Resources
        Operator-1 ->> Operator-1: No Action as managed off server
    and
        Publisher-1 ->> API-1: Fetch Updated Custom Resources
        Subscriber-2 ->> Publisher-1: Fetch Updated Custom Resources
        Subscriber-2 ->> API-2: Custom Resource Added
        Operator-2 ->> API-2: Fetch Updated Custom Resources
        par
            Operator-2 ->> API-2: Create Pod and wait for readiness
        and
            Operator-2 ->> API-2: Create Service and wait for readiness
        end
        Operator-2 ->> API-2: Update Status of Custom Resource to PodReady
        Publisher-2 ->> API-2: Fetch Changes to Custom Resource
        Subscriber-1 ->> Publisher-2: Fetch Changes to Custom Resource
        Subscriber-1 ->> API-1: Update Status of Custom Resource to PodReady
        Operator-1 ->> API-1: Fetch Changes to Custom Resource
        Operator-1 ->> Operator-1: Update Proxy
        Operator-1 ->> API-1: Update Status of Custom Resource to ProxyReady
    end
    Hub-1 ->> API-1: Fetch Status
    Hub-1 ->> Hub-1: Redirect User Session to Pod
```
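The operator behaviour in this diagram can be sketched as one ownership-aware reconcile function that runs unchanged on every cluster, plus the subscriber's status copy. This is an in-memory model under stated assumptions: the annotation keys, hook event names, and class and function names are hypothetical; real pod, service, and proxy calls are replaced by hook emissions; and the transport between publisher and subscriber is left out.

```python
import copy
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical annotation keys recording which cluster created the resource
# and which cluster implements it; the proposal does not name them.
ORIGIN = "notebooks.example.org/origin-cluster"
TARGET = "notebooks.example.org/target-cluster"

Hook = Callable[[dict], None]


class NotebookOperator:
    """Ownership-aware reconcile loop for one cluster, with extension hooks."""

    def __init__(self, cluster: str) -> None:
        self.cluster = cluster
        self.hooks: Dict[str, List[Hook]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Hook], Hook]:
        """Decorator letting extensions run code on an event, e.g. 'pod-created'."""
        def register(fn: Hook) -> Hook:
            self.hooks[event].append(fn)
            return fn
        return register

    def _emit(self, event: str, cr: dict) -> None:
        for fn in self.hooks[event]:
            fn(cr)

    def reconcile(self, cr: dict) -> dict:
        """Return the resource as it should look after one pass on this cluster."""
        cr = copy.deepcopy(cr)
        notes = cr["metadata"]["annotations"]
        phase = cr.setdefault("status", {}).get("phase")
        if notes[TARGET] == self.cluster and phase is None:
            # Implementing cluster: create the pod and service (stubbed as hooks).
            self._emit("pod-created", cr)
            self._emit("service-created", cr)
            cr["status"]["phase"] = "PodReady"
        elif notes[ORIGIN] == self.cluster and phase == "PodReady":
            # Originating cluster: route the user's session through its proxy.
            self._emit("proxy-updated", cr)
            cr["status"]["phase"] = "ProxyReady"
        # Otherwise the resource is managed off-server or already done: no action.
        return cr


def apply_remote_status(local: dict, remote: dict) -> dict:
    """Subscriber: copy the status reported by the other cluster onto the local copy."""
    merged = copy.deepcopy(local)
    merged["status"] = copy.deepcopy(remote.get("status", {}))
    return merged
```

When origin and target are the same cluster, two passes of the same `reconcile` give the single-cluster flow (PodReady, then ProxyReady); when they differ, cluster 1 takes no action until cluster 2's PodReady status is copied back.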
This model would have instances of every component running on both clusters, with each publisher able to serve many subscribers at once. We would attach ownership information to each service, recording which cluster created it, which one implemented it, and so on. We could even add election logic that allows multiple clusters to bid on a resource: the cluster that meets the required capabilities and has the greatest amount of free resources would be responsible for implementing it. This would require the development of the following:
Ultimately these should be relatively simple to implement, and I suspect this new framework would be extremely useful to the community, as it would make extension easier than it is at present and would increase the security of the solution.
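The election logic could start as something as simple as the sketch below, which assumes each cluster publishes a bid listing its capabilities and its free CPU and memory. The `Bid` and `Request` shapes and the `elect` function are hypothetical. The proposal doesn't say how to weigh CPU against memory, so ranking by free CPU, then free memory, then cluster name (for a deterministic tie-break) is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Bid:
    """A cluster's offer to implement a notebook request (hypothetical shape)."""
    cluster: str
    capabilities: FrozenSet[str]  # e.g. frozenset({"gpu"})
    free_cpu: float               # cores
    free_memory_gib: float


@dataclass(frozen=True)
class Request:
    """What the notebook needs (hypothetical shape)."""
    capabilities: FrozenSet[str]
    cpu: float
    memory_gib: float


def elect(request: Request, bids: Iterable[Bid]) -> Optional[str]:
    """Return the capable cluster with the most free resources, or None."""
    eligible = [
        b for b in bids
        if request.capabilities <= b.capabilities  # has every required capability
        and b.free_cpu >= request.cpu
        and b.free_memory_gib >= request.memory_gib
    ]
    if not eligible:
        return None
    # Lexicographic ranking: free CPU, then free memory, then name as tie-break,
    # so every cluster evaluating the same bids picks the same winner.
    best = max(eligible, key=lambda b: (b.free_cpu, b.free_memory_gib, b.cluster))
    return best.cluster
```

With the GPU offloading use case from this issue, a request needing `{"gpu"}` would only ever be won by a cluster that advertises GPUs, however much free capacity the cloud cluster has.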
Per my conversation with @vvcb, I have raised the primary design pattern as an issue on the kubespawner project:
Excited to see ongoing conversations about this :)
100% agreed! I used Mostly just wanted to quickly respond here, as I wanted to explain away that particular code smell :) I'll try to respond to the other bits over the next day or so.
Currently, JupyterHub spawns and manages pods in the same cluster that it is installed on.
However, it would be useful to be able to spawn pods on more than one cluster, all managed by the same JupyterHub instance.
The main use case for us now is the ability to deploy pods to an on-prem cluster that may have bespoke compute, allowing us to make use of existing investments in our own or our partners' infrastructure. A good example is offloading GPU workloads to existing on-prem GPU compute to save costs.
Looks like @yuvipanda has already built https://github.com/yuvipanda/jupyterhub-multicluster-kubespawner and it will be worth investigating this further.
This may also be something to consider for the remote access work.