Is your feature request related to a problem? Please describe
With the rise of ML on EKS, NVIDIA GPU-based instances are now very common. Customers use the NVIDIA GPU Operator on EKS, which uses the Kubernetes operator framework to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others. DCGM provides a /metrics endpoint as described here.
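To make the /metrics endpoint mentioned above concrete, here is a minimal Python sketch of reading GPU utilization from Prometheus-style text output. It assumes the endpoint serves the Prometheus text exposition format and reports a `DCGM_FI_DEV_GPU_UTIL` gauge with a `gpu` label. The sample payload, the `Hostname` value, and the helper names `parse_metrics` and `gpu_utilization` are made up for illustration and were not captured from a real cluster. It is not an ADOT configuration, and the parser is simplified (it skips lines that carry a trailing timestamp).

```python
import re

# Illustrative payload only (an assumption, not real cluster output).
SAMPLE = '''\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 12
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 1024
'''

# metric_name, optional {labels}, then a single value token.
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# key="value" pairs inside the braces.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith('#'):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # unsupported line shape (e.g. trailing timestamp)
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


def gpu_utilization(samples):
    """Map each GPU index label to its reported utilization percentage."""
    return {
        labels.get('gpu'): value
        for name, labels, value in samples
        if name == 'DCGM_FI_DEV_GPU_UTIL'
    }
```

In a real setup the text would come from an HTTP GET against the exporter's /metrics endpoint rather than from the inline `SAMPLE` string, and a collector such as ADOT would do the scraping instead of hand-written code.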
Describe the solution you'd like
A complete example that shows how customers can configure ADOT (AWS Distro for OpenTelemetry), AMP (Amazon Managed Service for Prometheus), and AMG (Amazon Managed Grafana) to scrape and view the GPU metrics.
Describe alternatives you've considered
None.
Additional context
None.
This issue has been automatically marked as stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed in 10 days