Segmentation fault(coredump) when using -c or -C options #26
Comments
Can you give https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.6 a whirl? We are testing it with some of our users and it solved the segmentation fault, so I would love to see if it also solves your issues. Once it's baked in a bit, I'm going to submit a PR to upstream the changes.
We were able to confirm that 120 logical CPUs is fine, but adding 1 more (SMT8) to reach 128 logical CPUs causes the segmentation fault.
We will give this a try today!
This version seems to be working initially with only "-c" on 128+ (tested up to 168) logical CPUs. Definitely an improvement.
https://github.com/grafana/node_exporter_aix/releases/tag/v1.15.7 <- give this a whirl. The -C option goes through a different path than the other collectors, so I had to change that one too.
That seems to be running with no segmentation faults! While working through this, we noticed that our CPU usage % doesn't seem to come out right on this or the older versions. Have you noticed this?
I haven't, but it's not something I have looked into. If you want to start a new discussion and tag me with the exact details, I can take a look.
We have deployed the exporter to approximately 200 AIX servers of various versions and TL levels with no issues.
There are 10 servers, all running at least AIX 7.1, that are having issues.
When we set either -C or -c and Prometheus initiates the scrape, we get a segmentation fault. This happens on every exporter version we have tested (1.14.3.0, 1.12.1.0, 1.8.0.0, maybe others).
We tested the debug version that was posted in another segmentation fault issue, and got a little extra info:
We found that 9 of the 10 servers have 8 SMT threads with over 128 virtual CPUs allocated.
All the other servers that are working have fewer than 64 virtual CPUs.
Is there a limit on number of CPUs that we could be hitting to cause the segmentation faults?
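For anyone triaging a fleet against the numbers reported in this thread, here is a minimal sketch. It assumes that "1 more (SMT8)" means one additional core with 8 SMT threads, and that the 120 and 128 figures are observations from this thread rather than documented limits of the exporter. The helper names `logical_cpus` and `triage` are hypothetical and not part of node_exporter_aix.

```python
import os

# Observed in this thread only; not documented limits of the exporter.
LAST_KNOWN_GOOD = 120  # logical CPUs reported as working
FIRST_KNOWN_BAD = 128  # logical CPUs reported as crashing before v1.15.6


def logical_cpus(cores: int, smt_threads: int) -> int:
    """Logical CPUs = cores x SMT threads per core (assumed relationship)."""
    return cores * smt_threads


def triage(n_logical: int) -> str:
    """Classify a host by logical CPU count (hypothetical helper)."""
    if n_logical <= LAST_KNOWN_GOOD:
        return "ok"
    if n_logical >= FIRST_KNOWN_BAD:
        return "affected"
    # Counts between the two reported values were not mentioned in the thread.
    return "untested"


if __name__ == "__main__":
    print(triage(logical_cpus(15, 8)))  # 15 cores at SMT8 = 120 logical CPUs
    print(triage(logical_cpus(16, 8)))  # 16 cores at SMT8 = 128 logical CPUs
    print(triage(os.cpu_count() or 0))  # the host this script runs on
```

Hosts that land in the "affected" bucket are the ones worth moving to v1.15.7 first, since that release is the one reported above to run with both -c and -C.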