Hello!
I deployed Gemma 2 2b it on GKE in Autopilot mode following these instructions: https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi#autopilot. I get this error: "Node scale up in zones us-central1-c associated with this pod failed: GCE quota exceeded. Pod is at risk of not being scheduled." I checked the quota and there is enough GPU quota, but the pod is stuck in the Pending state.
Hi again @piksida, as mentioned via LinkedIn, the issue is related to quota rather than to the TGI deployment. To fix it, visit https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#gpu_quota and make sure that the quota for the GPU you want to use, e.g. NVIDIA L4, is greater than or equal to 1 (which, from what you shared via LinkedIn, already seems to be the case). In any case, try increasing it to at least 3 so that we can rule that option out. Otherwise, you can try creating the GKE cluster in Standard mode with a node pool that has a single NVIDIA L4, which should work with your current quota.
See the screenshot below for the filters that you need to apply in order to see the quota that you need to increase:
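If you prefer to check this from a script rather than the console, here is a minimal sketch of the comparison. It assumes the quota entries have the `metric` / `limit` / `usage` shape that `gcloud compute regions describe us-central1 --format=json` prints under `quotas`. The sample numbers, the `required` amounts, and the helper name `find_quota_shortfalls` are hypothetical, and the metric name `NVIDIA_L4_GPUS` should be verified against your own output. The sample deliberately shows a case where the GPU quota is fine but another quota is not, since "GCE quota exceeded" does not say which quota was hit.

```python
import json


def find_quota_shortfalls(quotas, required):
    """Return {metric: (available, needed)} for metrics without enough headroom.

    `quotas` is a list of dicts with "metric", "limit" and "usage" keys.
    `required` maps a metric name to the amount the new node would need.
    A metric missing from `quotas` is treated as having zero available.
    """
    by_metric = {q["metric"]: q for q in quotas}
    shortfalls = {}
    for metric, needed in required.items():
        entry = by_metric.get(metric)
        available = entry["limit"] - entry["usage"] if entry else 0.0
        if available < needed:
            shortfalls[metric] = (available, needed)
    return shortfalls


# Hypothetical sample data, NOT taken from the reporter's project.
sample_region_json = """
{
  "name": "us-central1",
  "quotas": [
    {"metric": "NVIDIA_L4_GPUS", "limit": 1.0, "usage": 0.0},
    {"metric": "CPUS", "limit": 8.0, "usage": 6.0}
  ]
}
"""

region = json.loads(sample_region_json)

# Assumed requirements for one GPU node; adjust to the machine type in use.
required = {"NVIDIA_L4_GPUS": 1, "CPUS": 8}

shortfalls = find_quota_shortfalls(region["quotas"], required)
for metric, (available, needed) in shortfalls.items():
    print(f"{metric}: {available} available, {needed} needed")
```

With real data you would replace `sample_region_json` with the saved output of the `gcloud` command above. Any metric that shows up in `shortfalls` is a candidate for a quota increase request.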