nvme show-regs is giving invalid argument error #2092

Closed
imthenachoman opened this issue Oct 15, 2023 · 5 comments

@imthenachoman

TL;DR;

I'm getting an error with show-regs:

$ sudo ./nvme-cli-latest-x86_64.AppImage show-regs -H /dev/nvme0n1
get-property: Invalid argument

Full story:

I'm getting regular emails from the smart daemon about my NVMe disk.

SMART error (ErrorCount) detected on host: desk

This message was generated by the smartd daemon running on:

   host name:  [redacted]
   DNS domain: [redacted]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 2519 to 2521

Device info:
KBG30ZMV256G TOSHIBA, S/N:X8OPD1PGP12P, FW:ADHA0101

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sat Oct  7 23:38:04 2023 EDT
Another message will be sent in 24 hours if the problem persists.

I've been trying to figure this out for months but I've not had any luck. Here are the various commands I have tried and their output.

smartctl -a /dev/nvme0

$ sudo smartctl -a /dev/nvme0
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-13-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KBG30ZMV256G TOSHIBA
Serial Number:                      X8OPD1PGP12P
Firmware Version:                   ADHA0101
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                0x00080d
Controller ID:                      0
NVMe Version:                       1.2.1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            00080d 04004ad9aa
Local Time is:                      Sun Oct 15 17:53:35 2023 EDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0017):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.30W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.30W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    4  4  4  4     8000   32000
 4 -   0.0050W       -        -    4  4  4  4     8000   40000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -    4096       0         0
 1 +     512       0         3

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    33%
Data Units Read:                    35,454,740 [18.1 TB]
Data Units Written:                 70,575,255 [36.1 TB]
Host Read Commands:                 306,457,518
Host Write Commands:                881,616,851
Controller Busy Time:               12,766
Power Cycles:                       342
Power On Hours:                     21,991
Unsafe Shutdowns:                   617
Media and Data Integrity Errors:    0
Error Information Log Entries:      2,528
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               31 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       2528     0  0x301c  0xc002  0x000            -     4     -
  1       2527     0  0x201d  0xc004  0x028            -     1     -
  2       2526     0  0x101d  0xc004  0x028            -     1     -
  3       2525     0  0x6005  0xc002  0x000            -     4     -
  4       2524     0  0x6004  0xc004  0x028            -     1     -
  5       2523     0  0x5006  0xc004  0x028            -     1     -
  6       2522     0  0x1006  0xc005  0x028            -     1     -
  7       2521     0  0x4013  0xc005  0x028            -     0     -

nvme error-log /dev/nvme0

nvme.log

nvme list

$ sudo ./nvme-cli-latest-x86_64.AppImage list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            X8OPD1PGP12P         KBG30ZMV256G TOSHIBA                     0x1        256.06  GB / 256.06  GB    512   B +  0 B   ADHA0101
@ikegami-t
Contributor

I checked the nvme-cli implementation below. It looks like the mmap fails, and get-property also appears to be unsupported by the drive in use, which is what causes the error.

static void *mmap_registers(nvme_root_t r, struct nvme_dev *dev)
{
...
	membase = mmap(NULL, getpagesize(), PROT_READ, MAP_SHARED, fd, 0);
	if (membase == MAP_FAILED) {
		if (map_log_level(0, false) >= LOG_DEBUG) {
			fprintf(stderr, "%s failed to map. ", dev->name);
			fprintf(stderr, "Did your kernel enable CONFIG_IO_STRICT_DEVMEM?\n");
		}
		membase = NULL;
...
	return membase;
...
static int show_registers(int argc, char **argv, struct command *cmd, struct plugin *plugin)
{
...
	bar = mmap_registers(r, dev);
	if (!bar) {
		err = nvme_get_properties(dev_fd(dev), &bar);
		if (err)
			goto free_tree;
		fabrics = true;
	}

The mmap failure is also discussed in issue comment #1941 (comment) and in issue #1846, so please refer to those as well. It seems that disabling the kernel config option CONFIG_IO_STRICT_DEVMEM resolves the error. (Sorry, I am not sure what impact changing the kernel config would have on your environment.)
To make sure, can you check the result of the nvme get-property command in your environment? (It will probably fail with the same error.)
Note: In my local environment the error is different: Invalid Command Opcode, as shown below, rather than the Invalid argument you reported.

tokunori@tokunori-desktop:~/nvme-cli$ sudo .build/nvme get-property /dev/nvme1 --offset=0
NVMe status: Invalid Command Opcode: A reserved coded value or an unsupported value in the command opcode field(0x2001)
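
The control flow quoted above (try mmap first, use another access path only when the mapping fails) can be sketched in isolation. This is a minimal illustration, not nvme-cli code: `try_map_readonly`, `pick_reg_source` and `enum reg_source` are hypothetical names, and the path argument stands in for whatever file the caller wants to map.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of nvme-cli): try to map the first `len`
 * bytes of `path` read-only. Returns NULL instead of MAP_FAILED so the
 * caller can test the result with a plain `if (!ptr)`.
 */
static void *try_map_readonly(const char *path, size_t len)
{
	assert(path != NULL && len > 0);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	void *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */

	return base == MAP_FAILED ? NULL : base;
}

/* Which access path the caller ended up with. */
enum reg_source { REG_SOURCE_MMAP, REG_SOURCE_FALLBACK };

/*
 * Same shape as the branch in show_registers(): prefer the mapping and
 * report the fallback path only when the mapping is unavailable.
 */
static enum reg_source pick_reg_source(const char *path, size_t len, void **out)
{
	*out = try_map_readonly(path, len);
	return *out ? REG_SOURCE_MMAP : REG_SOURCE_FALLBACK;
}
```

In this sketch the fallback is only reported, not performed; in nvme-cli the fallback is the nvme_get_properties() call, which is where the "get-property: Invalid argument" message in this issue comes from.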

@igaw
Collaborator

igaw commented Nov 2, 2023

Device: /dev/nvme0, number of Error Log entries increased from 2519 to 2521

This is a known problem with certain devices, and we are working on it. The issue is that the device reports an error for a valid command that the firmware does not handle. Instead of just ignoring it (which would be the right thing to do), it logs it as an error.
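
For readers who want to see what the drive is actually complaining about, the Status values in the smartctl error log above (0xc002, 0xc004, 0xc005) can be decoded by hand. The sketch below assumes that smartctl 7.3 prints the raw 16-bit status field of the error log entry, with the phase tag in bit 0 and the NVMe status layout above it; that assumption is mine and is not confirmed anywhere in this thread. `decode_status` and `generic_sc_name` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a raw 16-bit NVMe status word (phase tag included). */
struct nvme_status_parts {
	unsigned phase; /* bit 0: phase tag */
	unsigned sc;    /* bits 8:1: status code */
	unsigned sct;   /* bits 11:9: status code type */
	unsigned more;  /* bit 14: more information in the error log */
	unsigned dnr;   /* bit 15: do not retry */
};

static struct nvme_status_parts decode_status(uint16_t raw)
{
	struct nvme_status_parts s;

	s.phase = raw & 0x1;
	s.sc = (raw >> 1) & 0xff;
	s.sct = (raw >> 9) & 0x7;
	s.more = (raw >> 14) & 0x1;
	s.dnr = (raw >> 15) & 0x1;
	return s;
}

/* Names for the few generic (SCT 0) status codes relevant here. */
static const char *generic_sc_name(unsigned sct, unsigned sc)
{
	assert(sct <= 0x7 && sc <= 0xff);

	if (sct != 0)
		return "non-generic status";
	switch (sc) {
	case 0x00: return "Successful Completion";
	case 0x01: return "Invalid Command Opcode";
	case 0x02: return "Invalid Field in Command";
	default:   return "other generic status";
	}
}
```

Under that assumption, 0xc002 decodes to Invalid Command Opcode, and 0xc004 and 0xc005 both decode to Invalid Field in Command (they differ only in the phase bit), all with the do-not-retry bit set. That would fit the description above: the drive rejects commands it does not implement and records each rejection in its error log.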

show-regs

This is also a known configuration issue. When the Linux kernel is configured with CONFIG_IO_STRICT_DEVMEM, it prevents user space from mapping IO memory. Either compile a kernel without this option enabled or set the kernel command line option iomem=relaxed.
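
To check which of the two situations applies on a given machine, the kernel config text and the boot command line can be scanned for the relevant entries. This is a rough sketch under a few assumptions: the config text is supplied by the caller (on many distributions it lives in a file such as /boot/config-<release>, but that location varies), the command line is the content of /proc/cmdline, and the relaxed setting is spelled `iomem=relaxed` as in the upstream kernel parameter documentation. All three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `token` occurs in `text` as a whole entry, i.e. bounded
 * on both sides by the start/end of the string or one of `delims`.
 */
static bool contains_token(const char *text, const char *token,
			   const char *delims)
{
	assert(text && token && delims && *token);

	size_t n = strlen(token);

	for (const char *p = strstr(text, token); p; p = strstr(p + 1, token)) {
		bool starts = (p == text) || strchr(delims, p[-1]) != NULL;
		bool ends = p[n] == '\0' || strchr(delims, p[n]) != NULL;

		if (starts && ends)
			return true;
	}
	return false;
}

/* Kernel config text: one "CONFIG_FOO=y" entry per line. */
static bool strict_devmem_enabled(const char *kconfig_text)
{
	return contains_token(kconfig_text, "CONFIG_IO_STRICT_DEVMEM=y", "\n");
}

/* Kernel command line: whitespace-separated parameters. */
static bool iomem_relaxed(const char *cmdline)
{
	return contains_token(cmdline, "iomem=relaxed", " \t\n");
}
```

If `strict_devmem_enabled` is true and `iomem_relaxed` is false for the running kernel, the mmap in show-regs is expected to fail in the way described above.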

get-property

The error message is not very helpful here. As @ikegami-t pointed out, the kernel doesn't allow us to map the IO space, and the device does not support the get-property call for the IO space.

@igaw
Collaborator

igaw commented Dec 8, 2023

@sanyer

sanyer commented Jan 22, 2024

No more problems using:

nvme version 2.7.1 (git 2.7.1)
libnvme version 1.7 (git 1.7)

@igaw
Collaborator

igaw commented Jan 22, 2024

Thanks for testing. There is still a minor issue: list reports the wrong format in certain cases. It is already fixed in master, so the next version should fix this issue for good. I know, famous last words :)

@igaw igaw closed this as completed Jan 22, 2024