xfsprogs in csi-driver container and on host do not match #2588
Comments
Thanks for providing the info, @kaitimmer. The problem is that the AKS node is still on the Ubuntu 5.15 kernel, which does not support the XFS RMAPBT attribute (kernel 6.x does), while the new Alpine 3.20.2 image ships an xfsprogs that enables this RMAPBT attribute; that causes the incompatibility. I would rather revert to the old Alpine image. If you have the AKS-managed CSI driver and want to revert to v1.30.4, just email me, thanks. Note that this bug only impacts new XFS disks created with Azure Disk CSI driver v1.30.5.
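For anyone who wants to confirm the mismatch on their own cluster, here is a rough sketch (the daemonset and container names assume the standard csi-azuredisk-node deployment in kube-system; adjust if yours differ):

```shell
# Kernel version per node (Ubuntu AKS nodes report 5.15.x; RMAPBT needs 6.x)
kubectl get nodes -o wide

# mkfs.xfs version bundled in the CSI driver image
# (assumes the usual csi-azuredisk-node daemonset / "azuredisk" container names)
kubectl -n kube-system exec daemonset/csi-azuredisk-node -c azuredisk -- mkfs.xfs -V
```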
I reached out to you via email about our specific clusters.
@andyzhangx |
@monotek we will upgrade to the Alpine base image 3.18.9, which also fixes the CVE. Here is the PR: #2590
Ah, ok, thanks! :) We were not sure about that, as we saw kernel 6.6 here too: https://github.com/microsoft/azurelinux/releases/tag/3.0.20240824-3.0 So I guess the AKS nodes still use Azure Linux 2.x?
@monotek Azure Linux 3.x (preview) is on kernel 6.6, while Azure Linux 2.x is on kernel 5.15
and unfortunately we cannot downgrade to a specific (older) package version on the newer Alpine base image.
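As a rough illustration (not from the original comment), the mkfs.xfs version that a given Alpine base pulls in can be checked with a throwaway container:

```shell
# Compare the xfsprogs shipped by the two Alpine releases mentioned above
docker run --rm alpine:3.18.9 sh -c 'apk add --no-cache xfsprogs >/dev/null && mkfs.xfs -V'
docker run --rm alpine:3.20.2 sh -c 'apk add --no-cache xfsprogs >/dev/null && mkfs.xfs -V'
```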
I appreciate the help from @andyzhangx in resetting azuredisk-csi to v1.30.4 on our cluster. However, we restart the cluster daily and each restart upgrades it to v1.30.5 again. Is there a lasting solution?
@ctrmcubed the hotfix has now been rolled out completely in the northeurope and westeurope regions, please check. We will roll it out to the other regions next.
Btw, the issue only occurs when formatting new XFS PVC disks (there is no data loss risk here). Once your cluster has the fix (CSI driver v1.30.6 or v1.30.4), you need to delete any existing broken XFS PVC and then create a new PVC with the fixed CSI driver version. (Only Azure Disk CSI driver v1.30.5 is broken here.)
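For example, with a StatefulSet volumeClaimTemplate the recreation looks roughly like this (workload and PVC names below are placeholders):

```shell
# Placeholder names; scale the workload down first so the broken volume is detached
kubectl scale statefulset my-app --replicas=0
kubectl delete pvc data-my-app-0                # drop the broken XFS PVC (the disk is removed too if reclaimPolicy is Delete)
kubectl scale statefulset my-app --replicas=1   # the new PVC is provisioned and formatted by the fixed driver
```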
Confirmed this is now working in my region using v1.30.6.
This issue has been fixed in all regions now.
What happened:
When we try to mount a new XFS volume to a pod (via a volumeClaimTemplate), we see the following error:
What you expected to happen:
Mounting new volumes should just work.
How to reproduce it:
Create a new disk with the following StorageClass:
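A minimal sketch of such a StorageClass for the Azure Disk CSI driver (illustrative parameters, not the exact manifest from this report):

```shell
# Illustrative StorageClass that provisions Azure managed disks formatted as XFS
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-xfs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS   # illustrative SKU
  fsType: xfs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
EOF
```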
And mount it to a pod.
Anything else we need to know?:
When we log in to the AKS node and run dmesg, we get the following information:
This is the output of the xfs disk on the node:
On other nodes with older XFS volumes, the mount still works. The differences in the XFS format are:
So the new XFS volumes get the RMAPBT attribute, which apparently can't be handled by the Ubuntu AKS node image.
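One way to check this on a node (device paths are placeholders; depending on the xfsprogs version you may need to point xfs_info at the mount point instead of the device):

```shell
# Compare the rmapbt flag of an old and a new volume
xfs_info /dev/sdX | grep -o 'rmapbt=[01]'   # older volume: rmapbt=0
xfs_info /dev/sdY | grep -o 'rmapbt=[01]'   # volume formatted by driver v1.30.5: rmapbt=1
```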
Our workaround for now is to log in to the Azure node and reformat the volume with 'mkfs.xfs -f /dev/sdX'.
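If the xfsprogs at hand supports it, the reverse-mapping btree can also be disabled explicitly at format time instead of relying on the defaults, e.g.:

```shell
# Placeholder device; rmapbt=0 keeps the filesystem mountable on 5.15 kernels
mkfs.xfs -f -m rmapbt=0 /dev/sdX
```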
Also, I assume that ffbeb55 might already be the hotfix for it. I'm raising this mainly for awareness, so that others do not spend hours tracking down the issue on their end.
I do not think this is the best way to fix this, but it'll do as a quick solution.
A proper solution might be to mount the tools directly from the host into the container so that this version mismatch does not happen again.
Or do it like this: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/Dockerfile, which also prevents the mismatch.
Can we please cut a new release that includes the fix?
Environment:
Kubernetes version (use kubectl version): 1.30.3