Some data parsing seems wrong? #46

Open
QiE2035 opened this issue Jul 27, 2022 · 10 comments
@QiE2035

QiE2035 commented Jul 27, 2022

The code of py-SMART in device.update():

            if 'used endurance' in line:
                pct = int(line.split(':')[1].strip()[:-1])
                self.diagnostics.Life_Left = 100 - pct
            if 'Specified cycle count' in line:
                self.diagnostics.Start_Stop_Spec = int(
                    line.split(':')[1].strip())
            if 'Accumulated start-stop cycles' in line:
                self.diagnostics.Start_Stop_Cycles = int(
                    line.split(':')[1].strip())
                if self.diagnostics.Start_Stop_Spec != 0:
                    self.diagnostics.Start_Stop_Pct_Left = int(round(
                        100 - (self.diagnostics.Start_Stop_Cycles /
                               self.diagnostics.Start_Stop_Spec), 0))
            if 'Specified load-unload count' in line:
                self.diagnostics.Load_Cycle_Spec = int(
                    line.split(':')[1].strip())
            if 'Accumulated load-unload cycles' in line:
                self.diagnostics.Load_Cycle_Count = int(
                    line.split(':')[1].strip())
                if self.diagnostics.Load_Cycle_Spec != 0:
                    self.diagnostics.Load_Cycle_Pct_Left = int(round(
                        100 - (self.diagnostics.Load_Cycle_Count /
                               self.diagnostics.Load_Cycle_Spec), 0))
            if 'Elements in grown defect list' in line:
                self.diagnostics.Reallocated_Sector_Ct = int(
                    line.split(':')[1].strip())
            if 'read:' in line:
                line_ = ' '.join(line.split()).split(' ')
                if line_[1] == '0' and line_[2] == '0' and line_[3] == '0' and line_[4] == '0':
                    self.diagnostics.Corrected_Reads = 0
                elif line_[4] == '0':
                    self.diagnostics.Corrected_Reads = int(
                        line_[1]) + int(line_[2]) + int(line_[3])
                else:
                    self.diagnostics.Corrected_Reads = int(line_[4])
                self.diagnostics.Reads_GB = float(line_[6])
                self.diagnostics.Uncorrected_Reads = int(line_[7])
            if 'write:' in line:
                line_ = ' '.join(line.split()).split(' ')
                if (line_[1] == '0' and line_[2] == '0' and
                        line_[3] == '0' and line_[4] == '0'):
                    self.diagnostics.Corrected_Writes = 0
                elif line_[4] == '0':
                    self.diagnostics.Corrected_Writes = int(
                        line_[1]) + int(line_[2]) + int(line_[3])
                else:
                    self.diagnostics.Corrected_Writes = int(line_[4])
                self.diagnostics.Writes_GB = float(line_[6])
                self.diagnostics.Uncorrected_Writes = int(line_[7])
            if 'verify:' in line:
                line_ = ' '.join(line.split()).split(' ')
                if (line_[1] == '0' and line_[2] == '0' and
                        line_[3] == '0' and line_[4] == '0'):
                    self.diagnostics.Corrected_Verifies = 0
                elif line_[4] == '0':
                    self.diagnostics.Corrected_Verifies = int(
                        line_[1]) + int(line_[2]) + int(line_[3])
                else:
                    self.diagnostics.Corrected_Verifies = int(line_[4])
                self.diagnostics.Verifies_GB = float(line_[6])
                self.diagnostics.Uncorrected_Verifies = int(line_[7])
            if 'non-medium error count' in line:
                self.diagnostics.Non_Medium_Errors = int(
                    line.split(':')[1].strip())
            if 'Accumulated power on time' in line:
                self.diagnostics.Power_On_Hours = int(
                    line.split(':')[1].split(' ')[1])

And my output of smartctl:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.18.14-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDC PC SN730 SDBPNTY-512G-1101
Serial Number:                      211515805454
Firmware Version:                   11190001
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 512,110,190,592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      8215
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b48720fd3
Local Time is:                      Wed Jul 27 11:39:39 2022 CST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     4000   10000
 4 -   0.0035W       -        -    4  4  4  4     4000   40000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    63,787,598 [32.6 TB]
Data Units Written:                 14,028,396 [7.18 TB]
Host Read Commands:                 869,234,784
Host Write Commands:                179,000,138
Controller Busy Time:               1,670
Power Cycles:                       1,059
Power On Hours:                     1,562
Unsafe Shutdowns:                   106
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

For example, the smartctl output above reports Percentage Used, but py-SMART only sets Life_Left from a line containing 'used endurance'. The two don't seem to match?

@ralequi ralequi self-assigned this Jul 27, 2022
@ralequi
Collaborator

ralequi commented Jul 28, 2022

Diagnostic information does not have a stable format across device types in smartctl... It's a real pain in the ass, I must say.

It depends on whether it is an SSD over SATA/SAS, a RAID card, or an NVMe device. Each interface gives a very different output.

I don't have an "endurance" entry in my testing dataset, but I think it is worth mapping "Percentage Used" so that Life_Left = 100 - Percentage Used.
I have to check the docs and verify that the value is what it seems to be.

Also, I'm going to check if we can add more values from the NVMe Log to the diagnostics structure, but, in any case, it may be a better idea to design a refactor.

If you are interested in a subset of those params, please tell me so I can prioritize them.
Thank you.
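
The mapping suggested above can be sketched roughly as follows. This is only an illustration, not py-SMART's actual code: the helper name `life_left_from_nvme_text` is hypothetical, and it assumes the NVMe text line looks like the `Percentage Used: 1%` line in the smartctl output posted in this issue.

```python
import re
from typing import Optional

# Matches the NVMe health line, e.g. "Percentage Used:                    1%"
_PCT_USED = re.compile(r"^Percentage Used:\s*(\d+)%")


def life_left_from_nvme_text(smartctl_output: str) -> Optional[int]:
    """Return 100 - 'Percentage Used', or None if the line is absent.

    Hypothetical helper for illustration; not part of py-SMART.
    """
    for line in smartctl_output.splitlines():
        match = _PCT_USED.match(line.strip())
        if match:
            # NVMe allows the reported value to exceed 100, so clamp at 0.
            return max(0, 100 - int(match.group(1)))
    return None


# Excerpt of the output posted in this issue.
sample = """Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    63,787,598 [32.6 TB]"""

print(life_left_from_nvme_text(sample))  # 99
```

Returning None when the line is missing lets the caller tell "not reported by this interface" apart from a real value of 0.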

@ralequi
Collaborator

ralequi commented Sep 26, 2022

I've added a new field to device called if_attributes.
It stores interface-specific data.
In the future, this will be just a handler and the device will expose metrics in a way that is almost transparent to the interface (or at least that's the idea).

Please confirm that this field fixes your issue and that you can get the info you need.

To install it with pip, try: pip install git+https://github.com/truenas/py-SMART@develop

@synodriver

synodriver commented Feb 11, 2023

What about using the JSON format with the -j flag? It's very simple to parse with pydantic.
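
A minimal sketch of that idea, using only the standard library (a dataclass instead of pydantic, to keep it dependency-free). The JSON string below is hand-written from the values in the text output posted in this issue, not real `smartctl -j` output, and the key names (`nvme_smart_health_information_log`, `percentage_used`, and so on) are what I recall from smartctl 7.x; treat them as assumptions and check them against your own output. `NvmeHealth` and `parse_nvme_health` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class NvmeHealth:
    """Small subset of the NVMe health log (illustrative only)."""
    temperature: int
    available_spare: int
    percentage_used: int
    power_on_hours: int

    @property
    def life_left(self) -> int:
        # Percentage Used may exceed 100 on NVMe, so clamp at 0.
        return max(0, 100 - self.percentage_used)


def parse_nvme_health(raw_json: str) -> NvmeHealth:
    # Key names are assumed; verify them against real `smartctl -j` output.
    log = json.loads(raw_json)["nvme_smart_health_information_log"]
    return NvmeHealth(
        temperature=log["temperature"],
        available_spare=log["available_spare"],
        percentage_used=log["percentage_used"],
        power_on_hours=log["power_on_hours"],
    )


# Hand-written sample mirroring the text output in this issue.
SAMPLE_JSON = """
{
  "nvme_smart_health_information_log": {
    "temperature": 46,
    "available_spare": 100,
    "percentage_used": 1,
    "power_on_hours": 1562
  }
}
"""

health = parse_nvme_health(SAMPLE_JSON)
print(health.life_left, health.power_on_hours)  # 99 1562
```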

@synodriver

synodriver commented Feb 11, 2023

My implementation is here, with JSON format and asyncio support. I need more test data because I don't have any SAS or SCSI drives. Those TODOs haven't been implemented yet. To get the full data for the drive, I added a new attribute, raw_data, which stores the original dict from the process.

@synodriver

Here is my own test data, from my HDD and SSD.
Do you have more? It would be very helpful for testing.

@ralequi
Collaborator

ralequi commented Feb 15, 2023

Thanks for your contribution, @synodriver.

However...
On the one hand, notice that this is a very old project; that's the reason JSON wasn't used in the first place.
On the other hand, the JSON output (as far as I could test) varies as much as the text output. It depends on whether the device is ATA, SCSI/SAS or NVMe, the version of the interface itself, the type of disk (SSD vs. HDD), and a long etcetera.

Of course it would be easier to migrate everything to JSON parsing and update the output API of pysmart (which is... not very friendly and very dependent on the underlying protocol), BUT we would need tons of tests to release something along those lines and I don't have every single test case (in recent months there were many issues related to corner cases, most of them NOT related to text parsing but to regexes and postprocessing!).

If you really want to help, we can create a branch and begin working there together. I would upload as many tests as I can and we would try to do our best. Nonetheless, consider that this may be a long-term update, as we would have to wait until we have enough test data to release it confidently.

@ulmitov

ulmitov commented Feb 21, 2023

JSON output does not always hold all the output that the text output gives, especially in older smartctl versions. I work with v7.2 and sometimes observe parts of the logs missing from the JSON output. In older versions I've seen it more frequently. Text output is more reliable.
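
Both concerns raised in this thread (JSON sections differ per protocol, and some sections may be missing) suggest that a JSON-based parser should dispatch on the protocol and tolerate absent keys. A rough sketch under those assumptions follows; the registry and function names are hypothetical, only an NVMe extractor is shown, and the `device.protocol` and `nvme_smart_health_information_log` keys are assumed from smartctl 7.x output and should be verified.

```python
from typing import Any, Callable, Dict, Optional

Report = Dict[str, Any]


def _nvme_life_left(report: Report) -> Optional[int]:
    # The whole section may be absent, e.g. on older smartctl versions.
    log = report.get("nvme_smart_health_information_log")
    if not log or "percentage_used" not in log:
        return None
    return max(0, 100 - int(log["percentage_used"]))


# Hypothetical registry: one extractor per protocol string.
# ATA and SCSI extractors would be added here once test data exists.
_EXTRACTORS: Dict[str, Callable[[Report], Optional[int]]] = {
    "NVMe": _nvme_life_left,
}


def life_left(report: Report) -> Optional[int]:
    """Return remaining life in percent, or None if it cannot be derived."""
    protocol = report.get("device", {}).get("protocol")
    extractor = _EXTRACTORS.get(protocol)
    return extractor(report) if extractor else None


nvme_report = {
    "device": {"protocol": "NVMe"},
    "nvme_smart_health_information_log": {"percentage_used": 1},
}
print(life_left(nvme_report))  # 99
```

A None result is the signal for the caller to fall back to text parsing rather than report a misleading default.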

@synodriver

The regexes always cause some weird problems on my computer, so I decided to use the JSON format instead. As for the output data... to be honest, I haven't noticed that.

@ralequi
Collaborator

ralequi commented Feb 27, 2023

> The regexes always cause some weird problems on my computer, so I decided to use the JSON format instead. As for the output data... to be honest, I haven't noticed that.

@synodriver, in that case, please share those issues so we can fix them.

Thank you

@ralequi
Collaborator

ralequi commented Feb 27, 2023

> The regexes always cause some weird problems on my computer, so I decided to use the JSON format instead. As for the output data... to be honest, I haven't noticed that.

I've added your tests to the test folder. I haven't found any issues. Please describe the problem you were having with the regexes.

@ralequi ralequi added the needs more info More info is required to continue label May 17, 2023