Segmentation fault(coredump) when using -c or -C options #26

dks0296586 · 2022-11-01T18:35:52Z

We have deployed the exporter to approximately 200 AIX servers of various versions and TL levels with no issues.

There are 10 servers, all running at least AIX 7.1, that are having issues.
When we set either -C or -c and Prometheus initiates the scrape, we get a segmentation fault. This happens on every version of the exporter we have tested (1.14.3.0, 1.12.1.0, 1.8.0.0, and possibly others).

./node_exporter_aix -p 50005 -a -cmdif
Node exporter for AIX version 1.14.3.0 listening on port 50005
Segmentation fault(coredump)

We tested the debug version that was posted in another segmentation fault issue, and got a little extra info:

./node_exporter_aix_debug -p 50005 -a -cmdif
Node exporter for AIX version 1.12.1.0 listening on port 50005
Number of cpu records: 160
Segmentation fault(coredump)

We found that 9 of the 10 servers have 8 SMT threads with over 128 virtual CPUs allocated.
All the other servers that are working have fewer than 64 virtual CPUs.

Is there a limit on the number of CPUs that we could be hitting, causing the segmentation faults?

mattdurham · 2022-11-02T14:17:10Z

Can you give https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.6 a whirl? We have been testing it with some of our users and it solved their segmentation fault; I would love to see if it also solves your issues. Once it's baked in a bit, I'm going to submit a PR to upstream the changes.

dks0296586 · 2022-11-03T13:37:42Z

We were able to confirm that 120 logical CPUs is fine, but adding 1 more (SMT8) to reach 128 logical CPUs causes the segmentation fault.
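To make the boundary above concrete, here is a minimal sketch of the arithmetic. It assumes that "adding 1 more (smt8)" means adding one virtual processor running in SMT8 mode, so each virtual processor contributes 8 logical CPUs. The function name `logical_cpus` is hypothetical and is not part of the exporter.

```python
# Hypothetical helper (not part of node_exporter_aix):
# logical CPUs = virtual processors x SMT threads per processor.
def logical_cpus(virtual_processors: int, smt_threads: int = 8) -> int:
    return virtual_processors * smt_threads


# Counts mentioned in this thread, assuming SMT8 on every virtual processor:
# 120 (reported working), 128 (reported crashing on older releases),
# 160 (the debug build's "Number of cpu records"), 168 (highest count tested).
for vps in (15, 16, 20, 21):
    print(f"{vps} virtual processors x SMT8 = {logical_cpus(vps)} logical CPUs")
```

Under that assumption the failure appears between 15 and 16 virtual processors, which would be consistent with a limit somewhere above 120 and at or below 128 logical CPUs. The thread does not confirm the exact limit or its cause.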

Can you give https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.6 a whirl?

We will give this a try today!

dks0296586 · 2022-11-03T15:39:03Z

Can you give https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.6 a whirl? Testing it with some of our users and it solved segmentation fault, would love to see if it also solves your issues. Once its baked in a bit going to submit PR to upstream the changes.

This version seems to be working initially with only "-c" on 128+ logical CPUs (tested up to 168). Definitely an improvement.
The "-C" option is still causing the same segmentation fault errors.

mattdurham · 2022-11-04T11:42:04Z

https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.7 <- give this a whirl. The -C option goes through a different path than the other collects, so I had to change that one too.

dks0296586 · 2022-11-04T16:00:18Z

https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.7 <- give this a whirl. The -C goes through a different path than other collects so had to change that one too.

That seems to be running with no segmentation faults!

While working through this issue, we noticed that our CPU usage % doesn't seem to be coming out right on this or the older versions. Have you noticed this?
This probably doesn't belong in this thread; I can start a new one to discuss.

mattdurham · 2022-11-07T16:10:08Z

I haven't, but it's not something I have looked into. If you want to start a new discussion and tag me with the exact details, I can take a look.

lbsivahari · 2023-08-02T17:24:51Z

Please refer to pull requests #33, #34, and #35 and check whether they fix your issue.
