DSA initcontainer failures #1836

Closed
mregmi opened this issue Sep 13, 2024 · 9 comments · Fixed by #1849
Labels: bug (Something isn't working), dsa (DSA device plugin related issue)

Comments

@mregmi
Contributor

mregmi commented Sep 13, 2024

Describe the bug
The DSA initcontainer fails on OpenShift on a system with DSA available.

The log shows:

accel-config disable-device dsa0
disabled 1 device(s) out of 1
accel-config load-config -e -c scratch/dsa0.conf
Error enabling wq
Enabling device dsa0
Enabling wq wq0.2
Error[0x80110000] dsa0/wq0.2: Unknown error

dmesg shows:
[82721.495946] user: probe of wq0.2 failed with error -95

The system has the Intel IOMMU enabled, and sm_on is also added to the kernel boot parameters.
We are using the default dsa.conf from the repo.

Please let us know if we are missing something.

To Reproduce
Create the DSA plugin using the operator UI. The pod is created but its initcontainer fails.

Expected behavior
The DSA plugin should be in running state

System (please complete the following information):

  • OS version: RHEL 9.2
  • Kernel version: 5.14.0-427.26.1.el9_4.x86_64
  • Device plugins version: v0.29.0
  • Hardware info: SPR with DSA, QAT and IAA

@mythi
Contributor

mythi commented Sep 16, 2024

The DSA initcontainer fails on OpenShift on a system with DSA available.

Can you check the output of cat /sys/bus/dsa/devices/dsa0/pasid_enabled and also share the output of dmesg | grep -i for idxd and dmar?
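
The requested checks can be gathered in one pass. The following is a minimal sketch, not something posted in the thread: report_pasid is a hypothetical helper name, and the sysfs root is a parameter only so the function can be run against a test directory.

```shell
#!/bin/sh
# Print pasid_enabled for every DSA device found under a sysfs root.
# report_pasid is a hypothetical helper; the default root is the real sysfs path.
report_pasid() {
    root="${1:-/sys/bus/dsa/devices}"
    for f in "$root"/dsa*/pasid_enabled; do
        # Skip the unexpanded glob when no DSA device is present.
        [ -r "$f" ] || continue
        dev=$(basename "$(dirname "$f")")
        printf '%s pasid_enabled=%s\n' "$dev" "$(cat "$f")"
    done
}

# On the affected node (dmesg usually needs root):
#   report_pasid
#   dmesg | grep -i -e idxd -e dmar
```

A value of 0 for any device is the condition discussed in the rest of this thread.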

@mregmi
Contributor Author

mregmi commented Sep 17, 2024

It looks like, for some reason, it cannot enable the user SVA feature. Here is the info:

sh-5.1# cat /sys/bus/dsa/devices/dsa0/pasid_enabled
0

sh-5.1# dmesg | grep -i idxd
[    5.105795] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[    5.105827] idxd 0000:6a:01.0: Unable to turn on user SVA feature.
[    5.111156] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[    5.111281] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[    5.111336] idxd 0000:e7:01.0: Unable to turn on user SVA feature.
[    5.118566] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
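
To see at a glance which devices hit this, the dmesg output can be filtered for the exact message above. This is a rough sketch under the assumption that the message text matches the lines shown here; sva_failures is a hypothetical helper that reads dmesg text on stdin.

```shell
#!/bin/sh
# Print the PCI address of each idxd device whose probe logged the
# "Unable to turn on user SVA feature." message. Reads dmesg text on stdin.
# sva_failures is a hypothetical helper name.
sva_failures() {
    sed -n 's/.*idxd \([0-9a-f:.]*\): Unable to turn on user SVA feature\..*/\1/p'
}

# Usage on the node:
#   dmesg | sva_failures
```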

IOMMU and sm_on are enabled:

sh-5.1# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/boot/ostree/rhcos-dbed79638d76ee2cf8a7f72b963b7f36f1377de1c4337a80ad6e8a76388e434b/vmlinuz-5.14.0-427.26.1.el9_4.x86_64 rw ostree=/ostree/boot.0/rhcos/dbed79638d76ee2cf8a7f72b963b7f36f1377de1c4337a80ad6e8a76388e434b/0 ignition.platform.id=metal ip=dhcp root=UUID=e1ebfb3c-e448-49ac-a545-9f1d647ddfe9 rw rootflags=prjquota boot=UUID=204bb77d-88d5-4eaa-bcfe-95afe0f54d1c intel_iommu=on,sm_on modules_load=vfio-pci systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1
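
The same command-line check can be scripted. The sketch below is an illustration, not part of the original report: iommu_scalable_mode_requested is a hypothetical helper that looks for both on and sm_on among the comma-separated intel_iommu= options. It only shows what was requested at boot; as the pasid_enabled value of 0 above shows, that does not guarantee PASID support was actually enabled.

```shell
#!/bin/sh
# Return 0 if the given kernel command line requests the Intel IOMMU
# with scalable mode (intel_iommu=on,sm_on in any option order).
# iommu_scalable_mode_requested is a hypothetical helper name.
iommu_scalable_mode_requested() {
    # $1 is intentionally unquoted so it splits into individual arguments.
    for arg in $1; do
        case "$arg" in
            intel_iommu=*)
                opts=",${arg#intel_iommu=},"
                case "$opts" in *,on,*) ;; *) continue ;; esac
                case "$opts" in *,sm_on,*) return 0 ;; esac
                ;;
        esac
    done
    return 1
}

# Usage on the node:
#   iommu_scalable_mode_requested "$(cat /proc/cmdline)" && echo "sm_on requested"
```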

DMAR info:

sh-5.1# dmesg | grep -i dmar
[    0.013731] ACPI: DMAR 0x00000000777E0000 000518 (v01 INTEL  M50FCP   00000001 INTL 20091013)
[    0.013774] ACPI: Reserving DMAR table memory at [mem 0x777e0000-0x777e0517]
[    0.037393] DMAR: IOMMU enabled
[    0.037395] DMAR: Enable scalable mode if hardware supports
[    0.166032] DMAR: Host address width 52
[    0.166035] DMAR: DRHD base: 0x000000d97fc000 flags: 0x0
[    0.166047] DMAR: dmar0: reg_base_addr d97fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166053] DMAR: DRHD base: 0x000000e17fc000 flags: 0x0
[    0.166066] DMAR: dmar1: reg_base_addr e17fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166070] DMAR: DRHD base: 0x000000e97fc000 flags: 0x0
[    0.166077] DMAR: dmar2: reg_base_addr e97fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166081] DMAR: DRHD base: 0x000000f17fc000 flags: 0x0
[    0.166087] DMAR: dmar3: reg_base_addr f17fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166091] DMAR: DRHD base: 0x000000f97fc000 flags: 0x0
[    0.166097] DMAR: dmar4: reg_base_addr f97fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166101] DMAR: DRHD base: 0x000000d13fc000 flags: 0x0
[    0.166107] DMAR: dmar5: reg_base_addr d13fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166111] DMAR: DRHD base: 0x000000f9ffc000 flags: 0x0
[    0.166119] DMAR: dmar6: reg_base_addr f9ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166124] DMAR: DRHD base: 0x000000fa7fc000 flags: 0x0
[    0.166130] DMAR: dmar7: reg_base_addr fa7fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166134] DMAR: DRHD base: 0x000000faffc000 flags: 0x0
[    0.166140] DMAR: dmar8: reg_base_addr faffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166143] DMAR: DRHD base: 0x000000fb7fc000 flags: 0x0
[    0.166149] DMAR: dmar9: reg_base_addr fb7fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166153] DMAR: DRHD base: 0x0000009f7fc000 flags: 0x0
[    0.166158] DMAR: dmar10: reg_base_addr 9f7fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166162] DMAR: DRHD base: 0x000000a93fc000 flags: 0x0
[    0.166168] DMAR: dmar11: reg_base_addr a93fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166172] DMAR: DRHD base: 0x000000b2ffc000 flags: 0x0
[    0.166178] DMAR: dmar12: reg_base_addr b2ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166181] DMAR: DRHD base: 0x000000bcbfc000 flags: 0x0
[    0.166187] DMAR: dmar13: reg_base_addr bcbfc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166190] DMAR: DRHD base: 0x000000c67fc000 flags: 0x0
[    0.166196] DMAR: dmar14: reg_base_addr c67fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166200] DMAR: DRHD base: 0x000000c6ffc000 flags: 0x0
[    0.166205] DMAR: dmar15: reg_base_addr c6ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166209] DMAR: DRHD base: 0x000000c77fc000 flags: 0x0
[    0.166214] DMAR: dmar16: reg_base_addr c77fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166218] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x0
[    0.166223] DMAR: dmar17: reg_base_addr c7ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166227] DMAR: DRHD base: 0x000000c87fc000 flags: 0x0
[    0.166232] DMAR: dmar18: reg_base_addr c87fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9e86f050df
[    0.166236] DMAR: DRHD base: 0x000000957fc000 flags: 0x1
[    0.166245] DMAR: dmar19: reg_base_addr 957fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    0.166249] DMAR: RMRR base: 0x00000076cc1000 end: 0x00000076cc3fff
[    0.166254] DMAR: ATSR flags: 0x0
[    0.166262] DMAR: RHSA base: 0x000000957fc000 proximity domain: 0x0
[    0.166265] DMAR: RHSA base: 0x0000009f7fc000 proximity domain: 0x0
[    0.166267] DMAR: RHSA base: 0x000000a93fc000 proximity domain: 0x0
[    0.166269] DMAR: RHSA base: 0x000000b2ffc000 proximity domain: 0x0
[    0.166272] DMAR: RHSA base: 0x000000bcbfc000 proximity domain: 0x0
[    0.166274] DMAR: RHSA base: 0x000000c67fc000 proximity domain: 0x0
[    0.166276] DMAR: RHSA base: 0x000000c6ffc000 proximity domain: 0x0
[    0.166278] DMAR: RHSA base: 0x000000c77fc000 proximity domain: 0x0
[    0.166281] DMAR: RHSA base: 0x000000c7ffc000 proximity domain: 0x0
[    0.166283] DMAR: RHSA base: 0x000000c87fc000 proximity domain: 0x0
[    0.166285] DMAR: RHSA base: 0x000000d97fc000 proximity domain: 0x1
[    0.166288] DMAR: RHSA base: 0x000000e17fc000 proximity domain: 0x1
[    0.166290] DMAR: RHSA base: 0x000000e97fc000 proximity domain: 0x1
[    0.166292] DMAR: RHSA base: 0x000000f17fc000 proximity domain: 0x1
[    0.166295] DMAR: RHSA base: 0x000000f97fc000 proximity domain: 0x1
[    0.166297] DMAR: RHSA base: 0x000000d13fc000 proximity domain: 0x1
[    0.166299] DMAR: RHSA base: 0x000000f9ffc000 proximity domain: 0x1
[    0.166301] DMAR: RHSA base: 0x000000fa7fc000 proximity domain: 0x1
[    0.166304] DMAR: RHSA base: 0x000000faffc000 proximity domain: 0x1
[    0.166306] DMAR: RHSA base: 0x000000fb7fc000 proximity domain: 0x1
[    0.166308] DMAR: SATC flags: 0x0
[    0.166313] DMAR-IR: IOAPIC id 8 under DRHD base  0x957fc000 IOMMU 19
[    0.166316] DMAR-IR: HPET id 0 under DRHD base 0x957fc000
[    0.166319] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.173218] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    3.236655] DMAR: IOMMU feature pasid inconsistent
[    3.236656] DMAR: IOMMU feature pasid inconsistent
[    3.236657] DMAR: IOMMU feature pasid inconsistent
[    3.236657] DMAR: IOMMU feature pasid inconsistent
[    3.236658] DMAR: IOMMU feature pasid inconsistent
[    3.236659] DMAR: IOMMU feature pasid inconsistent
[    3.236660] DMAR: IOMMU feature pasid inconsistent
[    3.236661] DMAR: IOMMU feature pasid inconsistent
[    3.236661] DMAR: IOMMU feature pasid inconsistent
[    3.236662] DMAR: IOMMU feature pasid inconsistent
[    3.236662] DMAR: IOMMU feature pasid inconsistent
[    3.236663] DMAR: IOMMU feature pasid inconsistent
[    3.236664] DMAR: dmar18: Using Queued invalidation
[    3.236669] DMAR: dmar17: Using Queued invalidation
[    3.236672] DMAR: dmar16: Using Queued invalidation
[    3.236675] DMAR: dmar15: Using Queued invalidation
[    3.236677] DMAR: dmar14: Using Queued invalidation
[    3.236681] DMAR: dmar13: Using Queued invalidation
[    3.236684] DMAR: dmar12: Using Queued invalidation
[    3.236686] DMAR: dmar11: Using Queued invalidation
[    3.236689] DMAR: dmar10: Using Queued invalidation
[    3.236692] DMAR: dmar9: Using Queued invalidation
[    3.236694] DMAR: dmar8: Using Queued invalidation
[    3.236697] DMAR: dmar7: Using Queued invalidation
[    3.236700] DMAR: dmar6: Using Queued invalidation
[    3.236703] DMAR: dmar5: Using Queued invalidation
[    3.236706] DMAR: dmar4: Using Queued invalidation
[    3.236709] DMAR: dmar3: Using Queued invalidation
[    3.236711] DMAR: dmar2: Using Queued invalidation
[    3.236715] DMAR: dmar1: Using Queued invalidation
[    3.236718] DMAR: dmar0: Using Queued invalidation
[    3.236720] DMAR: dmar19: Using Queued invalidation
[    3.496196] DMAR: Intel(R) Virtualization Technology for Directed I/O

@mythi
Contributor

mythi commented Sep 17, 2024

AFAIK getting it to 1 is a precondition for this to work. Maybe you are missing some setting in your BIOS?

@tkatila
Contributor

tkatila commented Sep 18, 2024

For reference, same logs from our SPR node that runs the DSA tests:

# cat /sys/bus/dsa/devices/dsa?/pasid_enabled
1
1
# dmesg | grep idxd
[   12.818021] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[   12.818139] idxd 0000:6a:01.0: No in-kernel DMA with PASID.
[   12.841209] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[   12.841305] idxd 0000:6a:02.0: enabling device (0140 -> 0142)
[   12.841364] idxd 0000:6a:02.0: No in-kernel DMA with PASID.
[   12.858269] idxd 0000:6a:02.0: Intel(R) Accelerator Device (v100)
[   12.858492] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[   12.858618] idxd 0000:e7:01.0: No in-kernel DMA with PASID.
[   12.885874] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[   12.886072] idxd 0000:e7:02.0: enabling device (0140 -> 0142)
[   12.886163] idxd 0000:e7:02.0: No in-kernel DMA with PASID.
[   13.079519] idxd 0000:e7:02.0: Intel(R) Accelerator Device (v100)
# dmesg | grep -i dmar                                    
[    0.013033] ACPI: DMAR 0x00000000773D0000 000468 (v01 INTEL  INTEL ID 00000001 INTL 20091013)                                           
[    0.013079] ACPI: Reserving DMAR table memory at [mem 0x773d0000-0x773d0467]                                                            
[    0.659261] DMAR: IOMMU enabled                                   
[    0.659263] DMAR: Enable scalable mode if hardware supports
[    1.369139] DMAR: Host address width 52           
[    1.369140] DMAR: DRHD base: 0x000000c97fc000 flags: 0x0
[    1.369149] DMAR: dmar0: reg_base_addr c97fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                         
[    1.369153] DMAR: DRHD base: 0x000000c9bfc000 flags: 0x0
[    1.369158] DMAR: dmar1: reg_base_addr c9bfc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                         
[    1.369160] DMAR: DRHD base: 0x000000c9ffc000 flags: 0x0
[    1.369164] DMAR: dmar2: reg_base_addr c9ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                         
[    1.369167] DMAR: DRHD base: 0x000000ca3fc000 flags: 0x0
[    1.369171] DMAR: dmar3: reg_base_addr ca3fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                         
[    1.369172] DMAR: DRHD base: 0x000000f97fc000 flags: 0x0
[    1.369186] DMAR: dmar4: reg_base_addr f97fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                         
[    1.369188] DMAR: DRHD base: 0x000000c93fc000 flags: 0x0
[    1.369193] DMAR: dmar5: reg_base_addr c93fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                         
[    1.369194] DMAR: DRHD base: 0x000000f9ffc000 flags: 0x0
[    1.369199] DMAR: dmar6: reg_base_addr f9ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                         
[    1.369201] DMAR: DRHD base: 0x000000fa7fc000 flags: 0x0
[    1.369206] DMAR: dmar7: reg_base_addr fa7fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                         
[    1.369208] DMAR: DRHD base: 0x000000faffc000 flags: 0x0
[    1.369212] DMAR: dmar8: reg_base_addr faffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                         
[    1.369214] DMAR: DRHD base: 0x000000fb7fc000 flags: 0x0
[    1.369218] DMAR: dmar9: reg_base_addr fb7fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                         
[    1.369220] DMAR: DRHD base: 0x00000095bfc000 flags: 0x0
[    1.369224] DMAR: dmar10: reg_base_addr 95bfc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                        
[    1.369226] DMAR: DRHD base: 0x000000973fc000 flags: 0x0
[    1.369230] DMAR: dmar11: reg_base_addr 973fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                        
[    1.369231] DMAR: DRHD base: 0x000000977fc000 flags: 0x0
[    1.369235] DMAR: dmar12: reg_base_addr 977fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                        
[    1.369237] DMAR: DRHD base: 0x00000097bfc000 flags: 0x0
[    1.369241] DMAR: dmar13: reg_base_addr 97bfc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                        
[    1.369242] DMAR: DRHD base: 0x000000c67fc000 flags: 0x0
[    1.369246] DMAR: dmar14: reg_base_addr c67fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df                                        
[    1.369248] DMAR: DRHD base: 0x000000c6ffc000 flags: 0x0
[    1.369252] DMAR: dmar15: reg_base_addr c6ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                        
[    1.369253] DMAR: DRHD base: 0x000000c77fc000 flags: 0x0
[    1.369257] DMAR: dmar16: reg_base_addr c77fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                        
[    1.369259] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x0
[    1.369263] DMAR: dmar17: reg_base_addr c7ffc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df                                        
[    1.369264] DMAR: DRHD base: 0x000000c87fc000 flags: 0x0
[    1.369268] DMAR: dmar18: reg_base_addr c87fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ef9ea6f050df
[    1.369270] DMAR: DRHD base: 0x000000957fc000 flags: 0x1          
[    1.369274] DMAR: dmar19: reg_base_addr 957fc000 ver 6:0 cap 19ed008c40780c66 ecap 3ee9e86f050df
[    1.369275] DMAR: RMRR base: 0x00000073e3b000 end: 0x00000073e3dfff
[    1.369278] DMAR: ATSR flags: 0x0
[    1.369282] DMAR: RHSA base: 0x000000957fc000 proximity domain: 0x0
[    1.369284] DMAR: RHSA base: 0x00000095bfc000 proximity domain: 0x0
[    1.369285] DMAR: RHSA base: 0x000000973fc000 proximity domain: 0x0
[    1.369286] DMAR: RHSA base: 0x000000977fc000 proximity domain: 0x0
[    1.369287] DMAR: RHSA base: 0x00000097bfc000 proximity domain: 0x0
[    1.369288] DMAR: RHSA base: 0x000000c67fc000 proximity domain: 0x0
[    1.369289] DMAR: RHSA base: 0x000000c6ffc000 proximity domain: 0x0
[    1.369290] DMAR: RHSA base: 0x000000c77fc000 proximity domain: 0x0
[    1.369291] DMAR: RHSA base: 0x000000c7ffc000 proximity domain: 0x0
[    1.369292] DMAR: RHSA base: 0x000000c87fc000 proximity domain: 0x0
[    1.369293] DMAR: RHSA base: 0x000000c97fc000 proximity domain: 0x1
[    1.369294] DMAR: RHSA base: 0x000000c9bfc000 proximity domain: 0x1
[    1.369295] DMAR: RHSA base: 0x000000c9ffc000 proximity domain: 0x1
[    1.369296] DMAR: RHSA base: 0x000000ca3fc000 proximity domain: 0x1
[    1.369297] DMAR: RHSA base: 0x000000f97fc000 proximity domain: 0x1
[    1.369298] DMAR: RHSA base: 0x000000c93fc000 proximity domain: 0x1
[    1.369299] DMAR: RHSA base: 0x000000f9ffc000 proximity domain: 0x1
[    1.369300] DMAR: RHSA base: 0x000000fa7fc000 proximity domain: 0x1
[    1.369301] DMAR: RHSA base: 0x000000faffc000 proximity domain: 0x1
[    1.369302] DMAR: RHSA base: 0x000000fb7fc000 proximity domain: 0x1
[    1.369303] DMAR: SATC flags: 0x0
[    1.369307] DMAR-IR: IOAPIC id 8 under DRHD base  0x957fc000 IOMMU 19
[    1.369309] DMAR-IR: HPET id 0 under DRHD base 0x957fc000
[    1.369311] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    1.375726] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    3.589196] DMAR: IOMMU feature pasid inconsistent
[    3.589198] DMAR: IOMMU feature prs inconsistent
[    3.589199] DMAR: IOMMU feature pasid inconsistent
[    3.589200] DMAR: IOMMU feature prs inconsistent
[    3.589201] DMAR: IOMMU feature pasid inconsistent
[    3.589201] DMAR: IOMMU feature prs inconsistent
[    3.589202] DMAR: IOMMU feature pasid inconsistent
[    3.589203] DMAR: IOMMU feature prs inconsistent
[    3.589204] DMAR: IOMMU feature pasid inconsistent
[    3.589205] DMAR: IOMMU feature prs inconsistent
[    3.589206] DMAR: IOMMU feature pasid inconsistent
[    3.589207] DMAR: IOMMU feature prs inconsistent
[    3.589208] DMAR: IOMMU feature pasid inconsistent
[    3.589208] DMAR: IOMMU feature prs inconsistent
[    3.589209] DMAR: IOMMU feature pasid inconsistent
[    3.589210] DMAR: IOMMU feature prs inconsistent
[    3.589211] DMAR: IOMMU feature pasid inconsistent
[    3.589211] DMAR: IOMMU feature prs inconsistent
[    3.589212] DMAR: IOMMU feature pasid inconsistent
[    3.589213] DMAR: IOMMU feature prs inconsistent
[    3.589213] DMAR: IOMMU feature pasid inconsistent
[    3.589214] DMAR: IOMMU feature prs inconsistent
[    3.589215] DMAR: IOMMU feature pasid inconsistent
[    3.589215] DMAR: IOMMU feature prs inconsistent
[    3.589216] DMAR: dmar18: Using Queued invalidation
[    3.589227] DMAR: dmar17: Using Queued invalidation
[    3.589230] DMAR: dmar16: Using Queued invalidation
[    3.589232] DMAR: dmar15: Using Queued invalidation
[    3.589234] DMAR: dmar14: Using Queued invalidation
[    3.589241] DMAR: dmar13: Using Queued invalidation
[    3.589243] DMAR: dmar12: Using Queued invalidation
[    3.589246] DMAR: dmar11: Using Queued invalidation
[    3.589247] DMAR: dmar10: Using Queued invalidation
[    3.589256] DMAR: dmar9: Using Queued invalidation
[    3.589258] DMAR: dmar8: Using Queued invalidation
[    3.589261] DMAR: dmar7: Using Queued invalidation
[    3.589263] DMAR: dmar6: Using Queued invalidation
[    3.589273] DMAR: dmar5: Using Queued invalidation
[    3.589275] DMAR: dmar4: Using Queued invalidation
[    3.589277] DMAR: dmar3: Using Queued invalidation
[    3.589279] DMAR: dmar2: Using Queued invalidation
[    3.589287] DMAR: dmar1: Using Queued invalidation
[    3.589289] DMAR: dmar0: Using Queued invalidation
[    3.589292] DMAR: dmar19: Using Queued invalidation
[    4.277035] DMAR: Intel(R) Virtualization Technology for Directed I/O

@uMartinXu

AFAIK getting it to 1 is a precondition for this to work. Maybe you are missing some setting in your BIOS?

Do you have instructions for the DSA-related BIOS settings, or a related document you can share with us? We also need to include these BIOS settings in our README. Thanks!

@mregmi
Contributor Author

mregmi commented Sep 18, 2024

The DSA documentation does not mention any BIOS settings other than Intel® Virtualization Technology or Directed I/O (VT-d). Both of these are enabled.

https://www.intel.com/content/www/us/en/content-details/759709/intel-data-streaming-accelerator-user-guide.html

It was suggested to us (intel/idxd-config#46) to disable 5-level paging via kernel boot parameters. Do you have it disabled on your nodes? We are going to try this, but it is not documented in the DSA guide as a requirement.

@mythi
Contributor

mythi commented Sep 18, 2024

Look for ENQCMD/ENQCMDS

@mregmi
Contributor Author

mregmi commented Sep 19, 2024

ENQCMD/ENQCMDS are enabled in the BIOS and we still see this issue. We also tried disabling 5-level page tables, as suggested in the issue above, but that does not resolve the issue either.
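
The thread does not say how these two settings were verified from the running system. One possible cross-check, sketched here as an assumption, is to look at the CPU flags the kernel reports: enqcmd for ENQCMD support and la57 for 5-level paging. cpu_has_flag is a hypothetical helper, and a missing flag may reflect kernel configuration as well as BIOS settings.

```shell
#!/bin/sh
# Return 0 if the named CPU flag appears in the first "flags" line of a
# cpuinfo file. cpu_has_flag is a hypothetical helper; the file argument
# defaults to /proc/cpuinfo and exists mainly to make testing possible.
cpu_has_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Put each flag on its own line, then match the whole line exactly.
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -x "$flag" >/dev/null
}

# Usage on the node:
#   cpu_has_flag enqcmd && echo "ENQCMD visible to the kernel"
#   cpu_has_flag la57   && echo "5-level paging active"
```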

@mregmi
Contributor Author

mregmi commented Sep 23, 2024

Update:
The first set of issues, regarding SVA feature enablement, was caused by a hardware issue and was resolved by a BIOS configuration change.

After the idxd driver loaded properly, the initcontainer was still failing. It turns out the driver on RHEL 9.2 requires "driver_name" to be present in the DSA config file (intel/idxd-config@84f099c). After adding "driver_name" to the config, the initcontainer started properly. Thanks to @mythi for the help debugging the issue.
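
For anyone hitting the same failure, a quick way to tell whether a config file already carries the field is a plain text search. This is only a sketch: conf_has_driver_name is a hypothetical helper, and the example workqueue entry in the comment, including the value "user", is an assumption based on the upstream idxd-config sample configs rather than something stated in this thread.

```shell
#!/bin/sh
# Return 0 if the given accel-config file mentions "driver_name" at all.
# conf_has_driver_name is a hypothetical helper name.
# Assumed shape of a workqueue entry that includes the field:
#   { "dev": "wq0.0", "mode": "dedicated", "type": "user",
#     "name": "appX", "driver_name": "user", ... }
conf_has_driver_name() {
    grep -q '"driver_name"' "$1"
}

# Usage:
#   conf_has_driver_name dsa.conf || echo "dsa.conf has no driver_name field"
```

The check does not confirm that every workqueue entry sets the field, only that the file mentions it.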

@tkatila added the bug (Something isn't working) and dsa (DSA device plugin related issue) labels Sep 25, 2024