Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s-1.30: Enable feature-gate of DRA alpha feature #1251

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

victortoso
Copy link
Member

@victortoso victortoso commented Aug 20, 2024

What this PR does / why we need it:

This is related to ongoing projects to test Dynamic Resource Allocation (DRA) in KubeVirt:

https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/

I'm enabling it in k8s-1.30 as per ongoing PCI passthrough driver developed by Sibasish Behera as part of his GSoC 2024 project, see:

https://github.com/kubevirt/dra-pci-driver

Special notes for your reviewer:

This patch mimics what was being done by hand in the dra-pci-driver, described here

I validated that it worked by checking resourceclasses exist within this image.

When the Node is up:

toso@tapioca ~/s/k/kubevirt (main)> cluster-up/kubectl.sh get resourceclasses
selecting docker as container runtime
No resources found

After installing dra-pci-driver

toso@tapioca ~/s/k/kubevirt (main)> cluster-up/kubectl.sh get resourceclasses
selecting docker as container runtime
NAME              DRIVERNAME                 AGE
pci.kubevirt.io   pci.resource.kubevirt.io   4m23s

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

Enable  Dynamic Resource Allocation (DRA) Alpha feature in k8s-1.30

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Aug 20, 2024
@victortoso
Copy link
Member Author

/cc @TheRealSibasishBehera @alicefr

The demo is not yet fully working as I don't see the actual emulated nvme devices as allocatable yet. So, marking as draft till I figure that out.

@victortoso victortoso marked this pull request as draft August 20, 2024 19:33
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 20, 2024
@victortoso
Copy link
Member Author

The demo is not yet fully working as I don't see the actual emulated nvme devices as allocatable yet.

Seems to be an issue with the registration between dri-pci-driver and the kubelet. Not sure yet what's wrong :)

After cluster is up, I can see that most configs are correct, eg:

toso@tapioca ~/s/k/kubevirt (main)> ./cluster-up/ssh.sh node01 "sudo grep -ri \"DynamicResourceAllocation\" /etc/kubernetes/"
selecting docker as container runtime
/etc/kubernetes/manifests/kube-scheduler.yaml:    - --feature-gates=DynamicResourceAllocation=true
/etc/kubernetes/manifests/kube-controller-manager.yaml:    - --feature-gates=DynamicResourceAllocation=true
/etc/kubernetes/manifests/kube-apiserver.yaml:    - --feature-gates=DynamicResourceAllocation=true
/etc/kubernetes/kubeadm.conf:    feature-gates: DynamicResourceAllocation=true
/etc/kubernetes/kubeadm.conf:    feature-gates: DynamicResourceAllocation=true
/etc/kubernetes/kubeadm.conf:    feature-gates: DynamicResourceAllocation=true

It seems that kubelet's /etc/kubernetes/kubelet.conf is missing and same goes to what was documented, for path /var/lib/kubelet/config.yaml

toso@tapioca ~/s/k/kubevirt (main)> ./cluster-up/ssh.sh node01 "sudo grep -ri \"DynamicResourceAllocation\" /var/lib/kubelet"
selecting docker as container runtime

Bringing the dri-pci-driver up, as said in the first commet, shows the resourceclass just fine but registrations seems not to happen.

toso@tapioca ~/s/k/d/demo (main)> kubectl get resourceclasses
NAME              DRIVERNAME                 AGE
pci.kubevirt.io   pci.resource.kubevirt.io   6s

toso@tapioca ~/s/k/kubevirt (main)> k logs -v 10 --all-containers -n dra-pci-driver dra-pci-driver-kubeletplugin-49hxt -f
E0821 07:14:04.828142       1 nonblockinggrpcserver.go:125] "handling request failed" err="failed registration process: RegisterPlugin error -- no handler registered for plugin type: DRAPlugin at socket /var/lib/kubelet/plugins_registry/pci.resource.kubevirt.io.sock" logger="registrar" requestID=12 request="&RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- no handler registered for plugin type: DRAPlugin at socket /var/lib/kubelet/plugins_registry/pci.resource.kubevirt.io.sock,}"

Changing the kubelet's config and restarting it did not solve it. Registration shows no error anymore but still could not see the nvme resources in the node

[vagrant@node01 ~]$ sudo vi /etc/kubernetes/kubelet.conf
[vagrant@node01 ~]$ sudo vi /var/lib/kubelet/config.yaml
[vagrant@node01 ~]$ sudo systemctl daemon-reload
[vagrant@node01 ~]$ sudo systemctl restart kubelet.service
toso@tapioca ~/s/k/kubevirt (main)> k logs --max-log-requests 10 -v 10 --all-containers -n dra-pci-driver dra-pci-driver-kubeletplugin-w6szb -f
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:00.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:01.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:02.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:03.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:04.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:05.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:06.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:07.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:08.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:09.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:1f.0
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:1f.2
2024/08/21 07:18:57 GetDevicePCIID: basepath: /sys/bus/pci/devices, pciAddress: 0000:00:1f.3
I0821 07:18:57.603631       1 nonblockinggrpcserver.go:105] "GRPC server started" logger="dra"
I0821 07:18:57.603736       1 nonblockinggrpcserver.go:105] "GRPC server started" logger="registrar"

For this PR, I think kubelet's config needs to be set otherwise the registration will not work. I might be missing something else otherwise I'd see the resources in the node.

@TheRealSibasishBehera
Copy link

Thanks @victortoso for working on this .
As I have mentioned in the set up doc, we will need to change the kube-apiserver, kube-controller-manager, kube-scheduler and kubelet configuration

And before deploying the dra-pci-driver, we have to bind the nvme devices to vfio-driver .
For now , there are also some permission issues with selinux, so while trying use sudo setenforce 0

To verify that the devices are allocatable , check the NAS crd
kubectl describe nas node01 -n dra-pci-driver

@victortoso
Copy link
Member Author

To verify that the devices are allocatable , check the NAS crd
kubectl describe nas node01 -n dra-pci-driver

toso@tapioca ~/s/k/kubevirt (main) [1]> k describe nas node01 -n dra-pci-driver
Name:         node01
Namespace:    dra-pci-driver
Labels:       <none>
Annotations:  <none>
API Version:  nas.pci.resource.kubevirt.io/v1alpha1
Kind:         NodeAllocationState
Metadata:
  Creation Timestamp:  2024-08-21T15:36:34Z
  Generation:          4
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            node01
    UID:             d6f18aca-1bad-4235-b6f0-166251301266
  Resource Version:  4366
  UID:               c0c20bbb-e875-421c-ad48-1b7a0fc1abde
Spec:
  Allocatable Devices:
    Pci:
      Pci Address:    0000:00:07.0
      Resource Name:  devices.kubevirt.io/nvme
      Uuid:           98b11d5d-db05-4a2a-9f7d-7d5c2b5edce3
    Pci:
      Pci Address:    0000:00:08.0
      Resource Name:  devices.kubevirt.io/nvme
      Uuid:           1b95adf6-2c1e-41b0-b2a7-47fac8f03bca
Status:               Ready
Events:               <none>

This is related to ongoing projects to test Dynamic Resource
Allocation (DRA) in KubeVirt:

    https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/

I'm enabling it in k8s-1.30 as per ongoing PCI passthrough driver
developed by Sibasish Behera as part of his GSoC 2024 project, see:

    https://github.com/kubevirt/dra-pci-driver

Based on Alice's patch.
Signed-off-by: Victor Toso <[email protected]>
@victortoso victortoso marked this pull request as ready for review August 21, 2024 15:40
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 21, 2024
@kubevirt-bot kubevirt-bot requested a review from qinqon August 21, 2024 15:40
@victortoso
Copy link
Member Author

To verify that the devices are allocatable , check the NAS crd
kubectl describe nas node01 -n dra-pci-driver

I was looking at the wrong place, with kubectl describe node node01 -n dra-pci-driver. Thanks!
Now the kubelet's config was patched to include featureGate, so all bits should be in place for the kubevirtci k8-1.30 image.

@dhiller
Copy link
Contributor

dhiller commented Aug 21, 2024

/approve

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhiller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 21, 2024
@alicefr
Copy link
Member

alicefr commented Aug 22, 2024

@victortoso do we want to have this always on? I know it is more work but it will be good to have an KUBEVIRTCI env variable which enable this when we spin the cluster. Although, I'm not sure how to do a configurable kubeadm.conf, maybe using the default one and adding on the fly the additional option and overwrite the configuration?

@akalenyu
Copy link
Contributor

@victortoso do we want to have this always on? I know it is more more but it will be good to have an KUBEVIRTCI env variable which enable this when we spin the cluster. Although, I'm not sure how to do a configurable kubeadm.conf, maybe using the default one and adding on the fly the additional option and overwrite the configuration?

Yeah you probably don't want all other lanes to test the k8s beast with DRA enabled, it might hurt the integrity of what we end up testing, with DRA being the huge feature it is

@alicefr
Copy link
Member

alicefr commented Aug 22, 2024

/hold

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 22, 2024
@victortoso
Copy link
Member Author

Sorry, looking at this again.

I'm not sure how to do a configurable kubeadm.conf, maybe using the default one and adding on the fly the additional option and overwrite the configuration?

I'm looking at what was done with KUBEVIRT_CPU_MANAGER_POLICY here and we might be able to do it in a similar way. Probably a small refactor too to make we do both changes once in case we want to enable DRA & CPU_MANAGER_POLICY.

Now, looking at the comment there it feels we can use drop-in replacement already. This should reduce the seds while adding a few extra kubeadm.conf files and running kubelet with a --config-dir parameter by default.

@lyarwood does it sound reasonable? any suggestions?

Cheers,

@lyarwood
Copy link
Member

lyarwood commented Oct 7, 2024

Sorry, looking at this again.

I'm not sure how to do a configurable kubeadm.conf, maybe using the default one and adding on the fly the additional option and overwrite the configuration?

I'm looking at what was done with KUBEVIRT_CPU_MANAGER_POLICY here and we might be able to do it in a similar way. Probably a small refactor too to make we do both changes once in case we want to enable DRA & CPU_MANAGER_POLICY.

Now, looking at the comment there it feels we can use drop-in replacement already. This should reduce the seds while adding a few extra kubeadm.conf files and running kubelet with a --config-dir parameter by default.

@lyarwood does it sound reasonable? any suggestions?

Yes if we can get away with using drop-ins now we should do so. The KUBEVIRT_CPU_MANAGER_POLICY needs to get reworked anyway as it's duplicating the launch options we currently pass to kubelet during the provision step.

`echo "KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice --fail-swap-on=false ` + nodeIP + ` --feature-gates=CPUManager=true,NodeSwap=true --cpu-manager-policy=static --kube-reserved=cpu=250m --system-reserved=cpu=250m" | tee /etc/sysconfig/kubelet > /dev/null`,

I thought I had captured this in a follow up issue but I can't seem to find it, I'll create one now.

@victortoso victortoso mentioned this pull request Oct 11, 2024
8 tasks
@kubevirt-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/S
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants