
[FEATURE] Example that shows how to configure ADOT, AMP and AMG for NVIDIA GPU Operator #233

Closed
askulkarni2 opened this issue Sep 26, 2023 · 1 comment · Fixed by #257

Comments

@askulkarni2

Is your feature request related to a problem? Please describe

With the rise of ML on EKS, NVIDIA GPU-based instances have become very common. Customers run the NVIDIA GPU Operator on EKS, which uses the Kubernetes operator framework to automate the management of all the NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others. The DCGM-based monitoring component (DCGM Exporter) exposes GPU metrics on a /metrics endpoint as described here.
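For context, the /metrics endpoint serves metrics in the Prometheus text exposition format, which is what an ADOT collector would scrape. Below is a minimal, stdlib-only sketch of parsing that output. Assumptions: the metric names (`DCGM_FI_DEV_GPU_UTIL`, `DCGM_FI_DEV_FB_USED`) follow DCGM Exporter's naming, but the sample payload, label values and numbers are made up for illustration, and `parse_metrics` is a hypothetical helper, not part of any NVIDIA or AWS tooling.

```python
"""Sketch: parse Prometheus text-format output such as DCGM Exporter's /metrics.

The SAMPLE payload below is hypothetical, for illustration only.
"""
import re
from typing import Dict, List, Tuple

# Matches a sample line, e.g.: DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 87
# Groups: metric name, optional raw label block, value (a trailing timestamp is ignored).
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair: name="value" (escaped quotes are kept as-is).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Return (name, labels, value) for each sample line, skipping comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE metadata.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload shaped like DCGM Exporter output.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa",Hostname="node-1"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb",Hostname="node-1"} 12
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa",Hostname="node-1"} 30210
"""

if __name__ == "__main__":
    # Per-GPU utilization keyed by the "gpu" label.
    util = {
        labels["gpu"]: value
        for name, labels, value in parse_metrics(SAMPLE)
        if name == "DCGM_FI_DEV_GPU_UTIL"
    }
    print(util)  # {'0': 87.0, '1': 12.0}
```

In a real cluster the collector does this parsing itself; the sketch is only meant to show what the scraped data looks like.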

Describe the solution you'd like

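A complete example that shows how customers can configure ADOT (AWS Distro for OpenTelemetry) to scrape the GPU metrics, AMP (Amazon Managed Service for Prometheus) to store them, and AMG (Amazon Managed Grafana) to view them.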
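As a rough starting point, here is a sketch that generates an ADOT collector configuration wiring a `prometheus` receiver (scraping DCGM Exporter) to a `prometheusremotewrite` exporter that signs requests to AMP with the `sigv4auth` extension. Assumptions: the DCGM Exporter Service is named `nvidia-dcgm-exporter` (as deployed by the GPU Operator), the collector has RBAC permissions for Kubernetes endpoint discovery, and the region, workspace ID, job name and scrape interval are placeholders. The config is emitted as JSON, which YAML parsers generally accept. `build_adot_config` and `amp_remote_write_url` are hypothetical helpers, not an AWS API.

```python
"""Sketch: generate an ADOT collector config for DCGM Exporter -> AMP.

All names below (service name, job name, region, workspace ID) are
placeholders/assumptions for illustration.
"""
import json
from typing import Any, Dict


def amp_remote_write_url(region: str, workspace_id: str) -> str:
    """AMP remote-write endpoint for a workspace."""
    return (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/remote_write"
    )


def build_adot_config(
    region: str,
    workspace_id: str,
    service_name: str = "nvidia-dcgm-exporter",  # assumed Service name
) -> Dict[str, Any]:
    """Build a collector config: prometheus receiver -> remote write to AMP."""
    return {
        # SigV4-sign outgoing requests for the AMP ("aps") service.
        "extensions": {"sigv4auth": {"region": region, "service": "aps"}},
        "receivers": {
            "prometheus": {
                "config": {
                    "scrape_configs": [
                        {
                            "job_name": "dcgm-exporter",
                            "scrape_interval": "30s",
                            # Discover every endpoint so each GPU node's
                            # exporter pod is scraped individually.
                            "kubernetes_sd_configs": [{"role": "endpoints"}],
                            # Keep only endpoints of the DCGM Exporter Service.
                            "relabel_configs": [
                                {
                                    "source_labels": ["__meta_kubernetes_service_name"],
                                    "action": "keep",
                                    "regex": service_name,
                                }
                            ],
                        }
                    ]
                }
            }
        },
        "exporters": {
            "prometheusremotewrite": {
                "endpoint": amp_remote_write_url(region, workspace_id),
                "auth": {"authenticator": "sigv4auth"},
            }
        },
        "service": {
            "extensions": ["sigv4auth"],
            "pipelines": {
                "metrics": {
                    "receivers": ["prometheus"],
                    "exporters": ["prometheusremotewrite"],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder region and workspace ID.
    print(json.dumps(build_adot_config("us-west-2", "ws-EXAMPLE"), indent=2))
```

On the AMG side, the AMP workspace would then be added as a Prometheus data source so the `DCGM_FI_*` series can be charted in dashboards.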

Describe alternatives you've considered

None.

Additional context

None.

bonclay7 self-assigned this on Oct 6, 2023

github-actions bot commented Dec 6, 2023

This issue has been automatically marked as stale because it has been open 60 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.

2 participants