extend self-test log processing #151

Open: wants to merge 1 commit into `master`

aritas1 commented Aug 20, 2023

This adds metrics for monitoring the execution time of the most recent self-tests.

It also fixes the missing `smartctl_device_self_test_log_count` metric, which was not reported because smartctl was not called with the `--log=selftest` argument.
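A minimal sketch of the argument change, assuming the exporter builds the smartctl command line as a string slice. The helper name `smartctlArgs` and the other flags in the list are hypothetical illustrations, not the exporter's actual code; `--json`, `--info`, `--health`, `--attributes` and `--log=selftest` are standard smartctl options.

```go
package main

// smartctlArgs is a hypothetical helper that builds the argument list for one
// smartctl invocation. The self-test log section (ata_smart_self_test_log)
// only shows up in the JSON output when the self-test log is requested, either
// explicitly as here or through a broader option; metrics derived from it,
// such as smartctl_device_self_test_log_count, depend on that section.
func smartctlArgs(device string) []string {
	return []string{
		"--json",         // machine-readable output
		"--info",         // device identity
		"--health",       // overall health assessment
		"--attributes",   // SMART attributes
		"--log=selftest", // self-test log (the flag this PR adds)
		device,           // device path goes last
	}
}
```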

```go
// assume the table will always be in descending order
processedTypes := make(map[string]bool)

for _, logEntry := range smart.json.Get("ata_smart_self_test_log.standard.table").Array() {
```
Contributor commented:

This should accept either the `standard` or the `extended` table. Some argument and device combinations produce only one of them, and the JSON layout of both is the same.
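A minimal sketch of reading both tables through one code path. The diff reads the table with `smart.json.Get(...)`; this sketch uses `encoding/json` instead so it stays self-contained, and the names `selfTestEntry`, `selfTestTable` and `selfTestEntries` are hypothetical. It assumes the extended log sits under `ata_smart_self_test_log.extended` with the same `table` layout, as described above.

```go
package main

import "encoding/json"

// selfTestEntry holds the subset of fields from one self-test log row that
// this sketch needs; smartctl emits more keys per row.
type selfTestEntry struct {
	Type struct {
		Value  int    `json:"value"`
		String string `json:"string"`
	} `json:"type"`
	Status struct {
		Value  int    `json:"value"`
		String string `json:"string"`
	} `json:"status"`
	LifetimeHours int `json:"lifetime_hours"`
}

type selfTestTable struct {
	Table []selfTestEntry `json:"table"`
}

// selfTestEntries returns the rows from the standard and the extended
// self-test logs. Either one may be absent (nil pointer); because both share
// the same row layout, they are appended into a single slice.
func selfTestEntries(raw []byte) ([]selfTestEntry, error) {
	var doc struct {
		Log struct {
			Standard *selfTestTable `json:"standard"`
			Extended *selfTestTable `json:"extended"`
		} `json:"ata_smart_self_test_log"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var entries []selfTestEntry
	for _, t := range []*selfTestTable{doc.Log.Standard, doc.Log.Extended} {
		if t != nil {
			entries = append(entries, t.Table...)
		}
	}
	return entries, nil
}
```

If a device reports both logs, the same test may appear in both; this sketch does not deduplicate them.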

```go
logTestType = "unknown"
}

if !processedTypes[logTestType] {
```
Contributor commented:

This implicitly trusts that the tests appear in newest-to-oldest order. I'm not sure I trust drives enough for that.
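One way to drop the ordering assumption is to pick the newest row per test type explicitly, for example by `lifetime_hours`. This is a sketch with a hypothetical, trimmed-down row type (`testRun`) and helper (`latestByType`); it still assumes `lifetime_hours` increases monotonically, which may not hold if a drive's counter wraps.

```go
package main

// testRun is a hypothetical, trimmed-down view of one self-test log row.
type testRun struct {
	TypeName      string // e.g. "Short offline"
	LifetimeHours int    // power-on hours when the test ran
}

// latestByType keeps, for each test type, the row with the highest
// LifetimeHours, regardless of the order in which rows appear.
// On a tie, the row seen first is kept.
func latestByType(runs []testRun) map[string]testRun {
	latest := make(map[string]testRun)
	for _, r := range runs {
		if cur, ok := latest[r.TypeName]; !ok || r.LifetimeHours > cur.LifetimeHours {
			latest[r.TypeName] = r
		}
	}
	return latest
}
```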

```go
testTime = testTime * 60 * 60

// skip running tests
if testRunningIndicator != 0 {
```
Contributor commented:

This is not correct. From one of my systems:

```json
"status": {
  "value": 41,
  "string": "Interrupted (host reset)",
  "remaining_percent": 90
}
```

`status.passed` is NOT present in this case.

I don't presently have any SATA drives with failing checks to compare, but I worry that their status values are also non-zero.
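A small sketch of why the missing key matters when decoding in Go: with a plain `bool` field, an absent `passed` silently becomes `false`, indistinguishable from a real failure. A pointer keeps "absent" and "false" apart. The type and function names (`selfTestStatus`, `parseStatus`) are hypothetical, and the struct covers only the keys shown in the example above.

```go
package main

import "encoding/json"

// selfTestStatus decodes the "status" object of one self-test log row.
// Pointer fields stay nil when the key is missing from the JSON, so an
// interrupted test without "passed" is not mistaken for a failed one.
type selfTestStatus struct {
	Value            int    `json:"value"`
	String           string `json:"string"`
	RemainingPercent *int   `json:"remaining_percent"`
	Passed           *bool  `json:"passed"`
}

// parseStatus unmarshals a raw "status" object.
func parseStatus(raw []byte) (selfTestStatus, error) {
	var s selfTestStatus
	err := json.Unmarshal(raw, &s)
	return s, err
}
```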

Contributor commented:

OK, it definitely needs work. Also, from the smartctl sources:

```cpp
  std::string msgstat;
  switch (test_status >> 4) {
    case 0x0: msgstat = "Completed without error";       break;
    case 0x1: msgstat = "Aborted by host";               break;
    case 0x2: msgstat = "Interrupted (host reset)";      break;
    case 0x3: msgstat = "Fatal or unknown error";        break;
    case 0x4: msgstat = "Completed: unknown failure";    break;
    case 0x5: msgstat = "Completed: electrical failure"; break;
    case 0x6: msgstat = "Completed: servo/seek failure"; break;
    case 0x7: msgstat = "Completed: read failure";       break;
    case 0x8: msgstat = "Completed: handling damage??";  break;
    case 0xf: msgstat = "Self-test routine in progress"; break;
    default:  msgstat = strprintf("Unknown status (0x%x)", test_status >> 4);
  }
```

So if `status.value >> 4` is 0xF, skip the entry as still running; otherwise, map the status to the corresponding error.
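A sketch of that rule in Go, decoding the raw `status.value` byte the same way the smartctl switch above does. The names `statusMessages` and `decodeSelfTestStatus` are hypothetical. The remaining-percentage calculation (low nibble times 10) is an assumption that matches the earlier example, where 41 = 0x29 decodes to "Interrupted (host reset)" with `remaining_percent` 90.

```go
package main

import "fmt"

// statusMessages mirrors the upper-nibble table from the smartctl sources.
var statusMessages = map[int]string{
	0x0: "Completed without error",
	0x1: "Aborted by host",
	0x2: "Interrupted (host reset)",
	0x3: "Fatal or unknown error",
	0x4: "Completed: unknown failure",
	0x5: "Completed: electrical failure",
	0x6: "Completed: servo/seek failure",
	0x7: "Completed: read failure",
	0x8: "Completed: handling damage??",
	0xf: "Self-test routine in progress",
}

// decodeSelfTestStatus splits a raw status byte into its message, whether the
// test is still running (upper nibble 0xF), and the remaining percentage
// (assumed to be the low nibble times 10).
func decodeSelfTestStatus(value int) (string, bool, int) {
	code := (value >> 4) & 0x0f
	msg, ok := statusMessages[code]
	if !ok {
		msg = fmt.Sprintf("Unknown status (0x%x)", code)
	}
	running := code == 0xf
	remaining := (value & 0x0f) * 10
	return msg, running, remaining
}
```

Entries with `running == true` would be skipped; everything else maps to its message instead of being inferred from the presence of `passed`.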

```diff
@@ -399,6 +401,50 @@ func (smart *SMARTctl) mineDeviceErrorLog() {
}
}

func (smart *SMARTctl) mineDeviceSelfTest() {
validTypes := map[int]string{
```
Contributor commented:

From the smartctl sources:

```cpp
  switch (test_type) {
    case 0x00: msgtest = "Offline";            break;
    case 0x01: msgtest = "Short offline";      break;
    case 0x02: msgtest = "Extended offline";   break;
    case 0x03: msgtest = "Conveyance offline"; break;
    case 0x04: msgtest = "Selective offline";  break;
    case 0x7f: msgtest = "Abort offline test"; break;
    case 0x81: msgtest = "Short captive";      break;
    case 0x82: msgtest = "Extended captive";   break;
    case 0x83: msgtest = "Conveyance captive"; break;
    case 0x84: msgtest = "Selective captive";  break;
    default:
      if ((0x40 <= test_type && test_type <= 0x7e) || 0x90 <= test_type)
        msgtest = strprintf("Vendor (0x%02x)", test_type);
      else
        msgtest = strprintf("Reserved (0x%02x)", test_type);
  }
```
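A Go mirror of that table, which the PR's `validTypes` map could be checked against. `testTypeNames` and `testTypeName` are hypothetical names; the vendor and reserved ranges are copied from the `default` branch above.

```go
package main

import "fmt"

// testTypeNames mirrors the test_type table from the smartctl sources.
var testTypeNames = map[int]string{
	0x00: "Offline",
	0x01: "Short offline",
	0x02: "Extended offline",
	0x03: "Conveyance offline",
	0x04: "Selective offline",
	0x7f: "Abort offline test",
	0x81: "Short captive",
	0x82: "Extended captive",
	0x83: "Conveyance captive",
	0x84: "Selective captive",
}

// testTypeName returns the smartctl-style name for a raw test type, falling
// back to "Vendor" or "Reserved" using the same ranges as smartctl.
func testTypeName(t int) string {
	if name, ok := testTypeNames[t]; ok {
		return name
	}
	if (0x40 <= t && t <= 0x7e) || 0x90 <= t {
		return fmt.Sprintf("Vendor (0x%02x)", t)
	}
	return fmt.Sprintf("Reserved (0x%02x)", t)
}
```

Since smartctl already emits `type.string` in its JSON, mirroring the table mostly matters if the exporter wants a fixed, bounded set of label values rather than whatever string the drive's type code produces.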


2 participants