Mounting disks under an NVMe disk controller in Windows fails #2365

Open
Flask opened this issue Jun 20, 2024 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@Flask

Flask commented Jun 20, 2024

What happened:
Mounting a managed disk on a VM with an NVMe disk controller fails:

I0620 07:48:36.892166    6464 utils.go:77] GRPC call: /csi.v1.Node/NodeStageVolume
I0620 07:48:36.892166    6464 utils.go:78] GRPC request: {"publish_context":{"LUN":"0"},"staging_target_path":"\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\disk.csi.azure.com\\3a07bbd56bedf026817504b649086872043fb4a71d1a81b17de2e82d86563b52\\globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ntfs"}},"access_mode":{"mode":7}},"volume_context":{"cachingMode":"ReadOnly","csi.storage.k8s.io/pv/name":"pvc-dcdeeaa3-cd7a-40ff-8e4e-3c3bd2430d7b","csi.storage.k8s.io/pvc/name":"mypod","csi.storage.k8s.io/pvc/namespace":"myns","fsType":"ntfs","kind":"Managed","requestedsizegib":"512","skuName":"Premium_LRS","storage.kubernetes.io/csiProvisionerIdentity":"1718807269317-6827-disk.csi.azure.com"},"volume_id":"/subscriptions/<subscription>/resourceGroups/myrg/providers/Microsoft.Compute/disks/pvc-dcdeeaa3-cd7a-40ff-8e4e-3c3bd2430d7b"}

Warning FailedMount 4m49s (x49 over 89m) kubelet MountVolume.MountDevice failed for volume "pvc-dcdeeaa3-cd7a-40ff-8e4e-3c3bd2430d7b" : rpc error: code = Internal desc = failed to find disk on lun 0. azureDisk - findDiskByLun(0) failed with error(could not find disk id for lun: 0)
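In the request above, publish_context carries only the LUN, so the node side has to translate LUN 0 into a Windows disk number; that translation is the step reported as failing. The sketch below is a simplified illustration, not the driver's code: it decodes the relevant fields from a trimmed copy of the logged request, including the access mode, using the VolumeCapability.AccessMode enum values from the CSI spec (7 is SINGLE_NODE_MULTI_WRITER).

```python
import json

# Trimmed copy of the NodeStageVolume request logged above (most fields omitted).
REQUEST = """{"publish_context":{"LUN":"0"},
 "volume_capability":{"AccessType":{"Mount":{"fs_type":"ntfs"}},"access_mode":{"mode":7}},
 "volume_context":{"cachingMode":"ReadOnly","fsType":"ntfs","skuName":"Premium_LRS"}}"""

# VolumeCapability.AccessMode.Mode values as defined in the CSI spec.
CSI_ACCESS_MODES = {
    0: "UNKNOWN",
    1: "SINGLE_NODE_WRITER",
    2: "SINGLE_NODE_READER_ONLY",
    3: "MULTI_NODE_READER_ONLY",
    4: "MULTI_NODE_SINGLE_WRITER",
    5: "MULTI_NODE_MULTI_WRITER",
    6: "SINGLE_NODE_SINGLE_WRITER",
    7: "SINGLE_NODE_MULTI_WRITER",
}


def summarize_stage_request(raw: str) -> dict:
    """Extract the fields needed to locate and format the staged disk."""
    req = json.loads(raw)
    cap = req["volume_capability"]
    return {
        # publish_context values are strings; the LUN is the only location hint.
        "lun": int(req["publish_context"]["LUN"]),
        "fs_type": cap["AccessType"]["Mount"]["fs_type"],
        "access_mode": CSI_ACCESS_MODES.get(cap["access_mode"]["mode"], "INVALID"),
    }


print(summarize_stage_request(REQUEST))
```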

What you expected to happen:
The PVC is mounted into the pod.

How to reproduce it:
Try to attach an Azure disk to a Windows Kubernetes node of VM size Standard_D4alds_v6.
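A minimal repro along these lines might look like the following sketch, which emits a PVC and a pod as a JSON List that kubectl can apply. It assumes the ssd-ntfs StorageClass posted later in this thread. The PVC and pod names, namespace, container image, command and mount path are hypothetical placeholders. The nodeSelector uses the well-known kubernetes.io/os and node.kubernetes.io/instance-type node labels to pin the pod to the affected VM size.

```python
import json

# Hypothetical names; "ssd-ntfs" is the StorageClass posted in this thread.
NAMESPACE = "myns"
PVC_NAME = "nvme-repro-pvc"

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": PVC_NAME, "namespace": NAMESPACE},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "ssd-ntfs",
        "resources": {"requests": {"storage": "512Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "nvme-repro", "namespace": NAMESPACE},
    "spec": {
        # Pin the pod to a Windows node of the affected VM size.
        "nodeSelector": {
            "kubernetes.io/os": "windows",
            "node.kubernetes.io/instance-type": "Standard_D4alds_v6",
        },
        "containers": [{
            "name": "app",
            # Hypothetical image; pick one matching the node's Windows Server version.
            "image": "mcr.microsoft.com/windows/servercore:ltsc2022",
            "command": ["powershell", "-Command", "Start-Sleep -Seconds 3600"],
            "volumeMounts": [{"name": "data", "mountPath": "C:\\mnt\\azuredisk"}],
        }],
        "volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": PVC_NAME}}],
    },
}

# kubectl accepts JSON manifests, e.g. save this output and run: kubectl apply -f repro.json
manifest = {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}
print(json.dumps(manifest, indent=2))
```

Because the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the disk is only provisioned and attached after the pod is scheduled, so the failure shows up as a FailedMount event once the pod lands on the node.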

Anything else we need to know?:

Environment:

  • CSI Driver version: v1.29.2
  • Kubernetes version (use kubectl version): v1.28.5
  • OS (e.g. from /etc/os-release): Windows Server 2019/2022
  • Others: csi-proxy 1.1.2
@andyzhangx
Member

Can this always be reproduced on the Standard_D4alds_v6 Windows VM SKU?

@Flask
Author

Flask commented Jun 20, 2024

Hey @andyzhangx, I've tried it 4-5 times with different machines in a VMSS. I think something has changed in how managed disks are attached to these VMs. Maybe this helps:

Managed disk on Standard_D96ads_v5:

get-disk

Number Friendly Name      Serial Number  HealthStatus  OperationalStatus  Total Size  Partition Style
------ -------------      -------------  ------------  -----------------  ----------  ---------------
...
11     Msft Virtual Disk                 Healthy       Online                 512 GB  GPT
...
ConvertTo-Json @(Get-Disk | select Number, Location)
[
    ...
    {
        "Number":  11,
        "Location":  "Integrated : Adapter 3 : Port 0 : Target 0 : LUN 0"
    },
    ...

On the Standard_D96alds_v6:

Get-Disk

Number Friendly Name               Serial Number                     HealthStatus  OperationalStatus  Total Size  Partition Style
------ -------------               -------------                     ------------  -----------------  ----------  ---------------
...
12     MSFT NVMe Accelerator v1.0  B91B_DB34_FB4F_48EE_AC80_7234...  Healthy       Online                 512 GB  GPT

ConvertTo-Json @(Get-Disk | select Number, Location)
[
    ...
    {
        "Number":  12,
        "Location":  "Integrated : Adapter 0"
    }
    ...

I've removed the unrelated entries to keep it simple and replaced them with "...".

@andyzhangx
Member

@Flask So on Standard_D96alds_v6, is disk number 12 the managed data disk? The Friendly Name of that disk is MSFT NVMe Accelerator v1.0, and it has no LUN mapping like the one on Standard_D96ads_v5, e.g. "Location": "Integrated : Adapter 3 : Port 0 : Target 0 : LUN 0"
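The difference between the two VM sizes can be illustrated with a small sketch that maps a LUN to a disk number from the Get-Disk Location strings posted above. This is a hypothetical simplification for illustration, not the driver's implementation (on Windows the driver works through csi-proxy). The function names only echo the error message. A Location with no LUN field, like the NVMe disk's "Integrated : Adapter 0", never matches, which reproduces the "could not find disk id for lun: 0" error.

```python
import json
import re
from typing import Optional

# Get-Disk | select Number, Location output from this thread, reduced to the relevant disks.
D96ADS_V5 = '[{"Number": 11, "Location": "Integrated : Adapter 3 : Port 0 : Target 0 : LUN 0"}]'
D96ALDS_V6 = '[{"Number": 12, "Location": "Integrated : Adapter 0"}]'

LUN_RE = re.compile(r"\bLUN\s+(\d+)\b")


def lun_from_location(location: Optional[str]) -> Optional[int]:
    """Return the LUN encoded in a Get-Disk Location string, or None if absent."""
    m = LUN_RE.search(location or "")
    return int(m.group(1)) if m else None


def find_disk_by_lun(get_disk_json: str, lun: int) -> int:
    """Map a LUN to a Windows disk number, failing like the error in this issue."""
    for disk in json.loads(get_disk_json):
        if lun_from_location(disk.get("Location")) == lun:
            return disk["Number"]
    raise LookupError(f"could not find disk id for lun: {lun}")


print(find_disk_by_lun(D96ADS_V5, 0))  # SCSI-attached disk on the v5 VM
try:
    find_disk_by_lun(D96ALDS_V6, 0)
except LookupError as err:
    print(err)  # the NVMe disk's Location carries no LUN to match
```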

@Flask
Author

Flask commented Jun 24, 2024

Exactly. The StorageClass is the same in both cases:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-ntfs
parameters:
  cachingMode: ReadOnly
  fsType: ntfs
  kind: managed
  skuName: Premium_LRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

@andyzhangx
Member

@Flask I think something is wrong with the Windows VM's internal configuration for this VM SKU. Can you file a support ticket with the Azure Windows VM team? Thanks.

@andyzhangx
Member

On Linux, there should be a udev rule that detects data disks automatically:
#2034 (comment)
I think Windows VMs of this SKU should also have an equivalent of that udev rule.
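On Linux, the LUN-to-device step can rely on udev rules that publish one symlink per LUN, so no Location parsing is needed. The sketch below reads such a directory and resolves each numbered link to its block device. The two paths are assumptions about the node image's udev rules, not something confirmed in this thread: /dev/disk/azure/scsi1 with lunN entries for SCSI-attached disks, and /dev/disk/azure/data/by-lun with plain numeric entries for NVMe disks (as provided by azure-vm-utils, also an assumption).

```python
import os
from pathlib import Path
from typing import Dict

# Assumed udev symlink directories; adjust for the node image in use.
SCSI_LUN_DIR = "/dev/disk/azure/scsi1"        # entries named lun0, lun1, ...
NVME_LUN_DIR = "/dev/disk/azure/data/by-lun"  # entries named 0, 1, ...


def luns_to_devices(directory: str, prefix: str = "") -> Dict[int, str]:
    """Map LUN -> resolved block device path from udev-created symlinks."""
    result: Dict[int, str] = {}
    base = Path(directory)
    if not base.is_dir():
        return result
    for entry in base.iterdir():
        if not entry.name.startswith(prefix):
            continue
        suffix = entry.name[len(prefix):]
        # Skip non-numeric names such as partition links (e.g. lun0-part1).
        if suffix.isdigit() and entry.is_symlink():
            result[int(suffix)] = os.path.realpath(entry)
    return result


print(luns_to_devices(SCSI_LUN_DIR, prefix="lun"))
print(luns_to_devices(NVME_LUN_DIR))
```

On a machine without these directories both calls simply return empty mappings.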

@andyzhangx
Member

FYI, NVMe disks are already supported on Linux nodes as of the v1.30.3 release; we still need to figure out how to get the <lun, disk-num> mapping on Windows nodes.

@andyzhangx andyzhangx added the kind/bug Categorizes issue or PR as related to a bug. label Aug 1, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 30, 2024
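As a worked check of the triage windows: taking the kind/bug label on Aug 1, 2024 as the last activity (an assumption, since the earlier comments are undated on this page), 90 days later is Oct 30, 2024, which matches the date the stale label was applied. If nothing else happens, the same rules would mark the issue rotten on Nov 29 and close it on Dec 29, 2024.

```python
from datetime import date, timedelta

# Inactivity windows from the triage bot's rules above.
STALE_AFTER = timedelta(days=90)
ROTTEN_AFTER = timedelta(days=30)
CLOSE_AFTER = timedelta(days=30)

last_activity = date(2024, 8, 1)  # assumed: kind/bug label added
stale = last_activity + STALE_AFTER
rotten = stale + ROTTEN_AFTER     # assuming no further activity
closed = rotten + CLOSE_AFTER     # assuming no further activity

print(stale, rotten, closed)  # 2024-10-30 2024-11-29 2024-12-29
```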