[Feature Request]: Move tolerationSettings from notebooks generally to data science projects #1306
Comments
Could this be applied to models as well? Maybe we could have a set of tolerations to allow models to be served on GPU nodes which are dedicated to serving by means of taints.
This is no longer the case now that we have AcceleratorProfiles; I think RHOAI 1.33 or 2.4 introduced Accelerator Profiles. Tolerations behind GPU usage, so you can effectively use taints, are already covered. @bdattoma This request is about allowing more flexibility for general tolerations on Notebooks (and, I imagine, on a whole set of DS Project resources in general -- unrelated to GPUs or Accelerators).
I think this predates the UX flow. Moving to UX. UX Context: I think we need to design a way to bring the
Is it possible to set a custom toleration for the accelerator? For example, if I don't want to use the default nvidia.com/gpu key, which I think is automatically added when attaching the GPU profile.
@bdattoma Yes, it is -- when you create the AcceleratorProfile (or modify the one we create on migration), you can pick whatever tolerations you want, and as many as you want. Our old world was a single static toleration, so we migrate with that, but it is modifiable. The Admin UI is coming in 2.6 I believe, and is currently in incubation if you want to check it out. The tracker: #1255
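To illustrate the above, a sketch of an AcceleratorProfile carrying multiple custom tolerations. Field names are assumed from the AcceleratorProfile CRD and the names/values are hypothetical examples; verify against the version installed on your cluster:

```yaml
apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: nvidia-gpu          # example name
  namespace: opendatahub
spec:
  displayName: NVIDIA GPU
  enabled: true
  identifier: nvidia.com/gpu
  tolerations:
    # default toleration carried over from the old single-toleration world
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
    # an additional custom toleration, e.g. for nodes dedicated to serving
    - key: dedicated
      operator: Equal
      value: gpu-serving
      effect: NoSchedule
```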
Feature description
Currently, the notebook toleration settings from the OdhDashboardConfig apply to all notebooks in all namespaces.
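For reference, this cluster-wide setting lives under the notebook controller section of the OdhDashboardConfig. A sketch (field names assumed from memory; verify against your dashboard version):

```yaml
apiVersion: opendatahub.io/v1alpha
kind: OdhDashboardConfig
metadata:
  name: odh-dashboard-config
  namespace: opendatahub
spec:
  notebookController:
    enabled: true
    notebookTolerationSettings:
      enabled: true
      key: NotebooksOnly   # one key, applied to every notebook pod in every namespace
```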
Assume we have a cluster with different dedicated nodes per customer:
The idea is to have namespaces per customer. It can be one namespace per user (I have grown used to that concept), but there needs to be a way to ensure that user/workbench namespaces can belong to different customers and have different scheduling placements for their pods, i.e. control over which nodes they land on.
So, my suggestion would be to
Describe alternatives you've considered
For now, we do not have multiple customers with data science project namespaces grouped per customer, so we schedule all notebooks on nodes with a given node taint key, e.g. key: opendatahub, using the existing mechanism in OdhDashboardConfig.
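Concretely, this relies on the standard Kubernetes taint/toleration pairing (node and key names below are examples, not taken from the issue):

```yaml
# Node side: taint the dedicated nodes, e.g.
#   kubectl taint nodes worker-1 opendatahub=true:NoSchedule
#
# Pod side: the toleration the dashboard injects into each notebook pod
# (sketch; the exact operator/effect the dashboard uses may differ):
tolerations:
  - key: opendatahub
    operator: Exists
    effect: NoSchedule
```

Because the key is configured once in OdhDashboardConfig, every notebook in every namespace tolerates the same taint, which is exactly what this request wants to scope per project.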
But going forward, the issue of moving to namespace-specific instead of cluster-wide configs will become important, be it for tolerations or for things like linking all service accounts to an image pull secret (including the dynamically created service accounts for notebooks in data science projects).
Anything else?
No response