azuredisk-node-win returns wrong diskID and mounts the wrong disk (e.g. drive D:)
#2661
Comments
@mweibel thanks for the PR. Do you know whether the 3 disks are all data disks? What are the LUN numbers of the 3 disks?
@andyzhangx thanks for the fast reply! There's a collapsed section in the initial message with more detail. Does that answer your question? I also have some more detailed output from
I think the issue occurs when you set Temp disk placement. What happens if you don't use that setting?
To reproduce this, I ran the following pod:
apiVersion: v1
kind: Pod
metadata:
  name: debug-windows
  namespace: default
spec:
  containers:
    - name: main
      image: mcr.microsoft.com/windows/servercore:ltsc2022
      imagePullPolicy: IfNotPresent
      command: [ "cmd", "/c" ]
      args: [ "ping /t 127.0.0.1" ]
      volumeMounts:
        - name: test
          mountPath: /test
  volumes:
    - name: test
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
            storageClassName: ssd-ntfs
            volumeMode: Filesystem
  nodeSelector:
    kubernetes.io/os: windows
Storage class:
Running Get-Disk on the node the pod got assigned to:
So without this fix, the issue would still persist, even though Temp disk placement was not set.
I see, that's related to the NVMe disk controller VM SKU. I will take a look next week.
Can you elaborate, please? I'm not sure what you mean.
What happened:
Windows nodes are configured with Ephemeral disk set to Temp disk placement, on VM size Standard_D96ads_v5.
When the pod starts up and tries to stage the volume, the Azure Disk Windows node plugin looks up the disk by LUN. It reaches the following code:
azuredisk-csi-driver/pkg/os/disk/disk.go, lines 47 to 53 at commit 4e0e202
Because it looks the disk up by LUN only, if two disks share the same LUN it takes whichever disk comes first. This may be, for example, drive D: instead of the disk it should actually mount.

The issue does not always happen, because it depends on the order of the Get-Disk response. It does happen often, though: in a recent reproduction attempt, 3 out of 13 nodes had the issue.

On an affected node, the PowerShell cmdlet Get-Disk returns the following information:
With all properties
As you can see, Disks 3 and 1 share the same LUN but are on different adapters. Disk 3 is the correct one; Disk 1 is not.
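To make the ordering dependence concrete, here is a minimal Go sketch of a LUN-only lookup. It is not the driver's code (the referenced lines of disk.go are not reproduced in this report). The `diskInfo` type and the `parseLocation` and `findDiskByLUN` helpers are hypothetical, the Location format `PCI Slot N : Adapter N : Port N : Target N : LUN N` is an assumed example of what Get-Disk reports, and the disk numbers, adapters, and LUN values are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// diskInfo is a hypothetical, simplified view of one Get-Disk entry.
type diskInfo struct {
	Number   int
	Location string // assumed format: "PCI Slot 3 : Adapter 0 : Port 0 : Target 0 : LUN 0"
}

// parseLocation splits an assumed Location string into "Key Value" fields,
// e.g. {"PCI Slot": "3", "Adapter": "0", ..., "LUN": "0"}.
func parseLocation(loc string) map[string]string {
	fields := make(map[string]string)
	for _, part := range strings.Split(loc, ":") {
		tokens := strings.Fields(part)
		if len(tokens) >= 2 {
			// Everything but the last token is the key ("PCI Slot"), the last is the value.
			key := strings.Join(tokens[:len(tokens)-1], " ")
			fields[key] = tokens[len(tokens)-1]
		}
	}
	return fields
}

// findDiskByLUN returns the first disk whose LUN matches. If two disks on
// different adapters share that LUN, the result depends only on list order.
func findDiskByLUN(disks []diskInfo, lun string) (int, bool) {
	for _, d := range disks {
		if parseLocation(d.Location)["LUN"] == lun {
			return d.Number, true
		}
	}
	return 0, false
}

func main() {
	// Illustrative values: both disks report LUN 0, on different adapters.
	disk1 := diskInfo{Number: 1, Location: "PCI Slot 0 : Adapter 0 : Port 0 : Target 0 : LUN 0"}
	disk3 := diskInfo{Number: 3, Location: "PCI Slot 0 : Adapter 1 : Port 0 : Target 0 : LUN 0"}

	n, _ := findDiskByLUN([]diskInfo{disk1, disk3}, "0")
	fmt.Println("order 1,3 ->", n) // order 1,3 -> 1
	n, _ = findDiskByLUN([]diskInfo{disk3, disk1}, "0")
	fmt.Println("order 3,1 ->", n) // order 3,1 -> 3
}
```

The same two disks yield different answers depending only on the order in which they are listed, which matches the intermittent behavior described above.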
When it picks Disk 1, the following logs show up in azuredisk-win:
The mount still succeeds in the end, but it mounts the wrong disk (D: instead of the emptyDir).

We haven't been able to find dedicated documentation on the Azure disk drivers, but judging from the information we gathered on that node, we concluded that disks on Adapter 0 are critical disks and should be avoided.
Because Location is just a string that has to be parsed, we determined it might be better to use the Get-CimInstance cmdlet instead:

What you expected to happen:
Mounting works and mounts the correct disk.
How to reproduce it:
Create several pods, each running on a dedicated node configured as outlined above, and mount an emptyDir. Check which disk each pod has mounted and look for the error logs in azuredisk-win.
Anything else we need to know?:
Environment:
kubectl version: v1.31.1
uname -a: