
Add support for generating metrics #4

Open
calebhailey opened this issue Feb 7, 2022 · 8 comments · May be fixed by #5
Assignees
Labels
enhancement New feature or request

Comments

@calebhailey
No description provided.

@jspaleta jspaleta self-assigned this Mar 3, 2022
@jspaleta

jspaleta commented Mar 3, 2022

Straw Proposal

  1. Add a new boolean flag to optionally emit metrics output
  2. Internally hold an array of process objects that match the input search configuration
  3. Construct per-process metric names from the available process functions: https://pkg.go.dev/github.com/shirou/gopsutil/[email protected]/process#Process
  4. Tag each per-process metric with search_string, full_cmdline, process_name, and pid
  5. Add a process_count metric, tagged by search_string, that can be used in threshold alerting
  6. Add other count metrics, tagged by search_string and summed over matching processes, for use in threshold alerting
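The numbered steps above could be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: `ProcInfo` and `procstatLine` are hypothetical stand-ins (the real check would pull values from gopsutil's `process.Process`), and the exposition style mirrors the sample output shown later in this thread:

```go
package main

import "fmt"

// ProcInfo is a hypothetical stand-in for a matched gopsutil process.
type ProcInfo struct {
	Name     string
	PID      int32
	CPUUsage float64
}

// procstatLine renders one per-process gauge (steps 3-4: metric name from a
// process function, tagged with search_string, process name, and pid).
func procstatLine(field, host, search string, p ProcInfo, value float64) string {
	return fmt.Sprintf(
		`procstat{field=%q,host.name=%q,process.executable.name=%q,search_string=%q,pid="%d"} %v`,
		field, host, p.Name, search, p.PID, value)
}

func main() {
	procs := []ProcInfo{{Name: "sensu-backend", PID: 4242, CPUUsage: 0.75}}
	// Step 5: a process_count metric per search_string for threshold alerting.
	fmt.Printf("procstat{field=%q,search_string=%q} %d\n", "process_count", "sensu", len(procs))
	for _, p := range procs {
		fmt.Println(procstatLine("cpu_usage", "carbon", "sensu", p, p.CPUUsage))
	}
}
```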

@calebhailey

calebhailey commented Mar 3, 2022

References:

Relevant queries from the Sumo Logic process dashboard(s):

metric=procstat field=cpu_usage host.name={{host.name}}  process.executable.name={{process.executable.name}}  | max by host.name, process.executable.name | topk (10, max) 
metric=procstat field=memory_usage host.name={{host.name}}  process.executable.name={{process.executable.name}}  | max by host.name, process.executable.name | topk (10, max) 
metric=procstat field=cpu_usage host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | filter avg > 65 |  topk(10, avg) 
metric=procstat field=memory_usage host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | filter avg > 65 |  topk(10, avg) 
metric=procstat field=created_at host.name={{host.name}} process.executable.name={{process.executable.name}} | min by process.executable.name, host.name | bottomk(10, max) 
metric=procstat field=created_at host.name={{host.name}} process.executable.name={{process.executable.name}} user=* | count by user | topk(10, latest)
metric=procstat field=num_fds host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | filter latest > 1 |  topk(10, latest) 
metric=procstat (field=read_count or field=write_count) AND host.name={{host.name}}  process.executable.name={{process.executable.name}} | sum by host.name, process.executable.name | filter sum > 65 |  topk(10, sum) 
metric=procstat field=num_threads host.name={{host.name}} process.executable.name={{process.executable.name}} | max by process.executable.name, host.name | topk(10, max) 

metric=procstat field=cpu_usage host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=read_bytes host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=write_bytes host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=num_fds  host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=major_faults host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=involuntary_context_switches  host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=signals_pending  host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier 
metric=procstat field=memory_usage host.name={{host.name}} process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name | outlier
metric=procstat field=cpu_usage host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by field, host.name, user, process.executable.name  
metric=procstat field=memory_usage host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by field,host.name, user, process.executable.name  
metric=procstat field=num_fds host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by field, host.name, user, process.executable.name  
metric=procstat field=num_threads host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by field, host.name, user, process.executable.name
metric=procstat field=read_bytes host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by field, host.name, user, process.executable.name
metric=procstat field=write_bytes host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by field, host.name, user, process.executable.name  
metric=procstat field=memory_rss host.name={{host.name}}  process.executable.name={{process.executable.name}} | avg by host.name, process.executable.name  | outlier 

host.name={{host.name}}  metric=processes field=total | sum by host.name | outlier
metric=processes host.name={{host.name}}   field=total_threads | sum by host.name | outlier
host.name={{host.name}}  metric=processes  field=dead   
host.name={{host.name}}    metric=processes field=zombies    
host.name={{host.name}}  metric=processes  AND (field=running OR field=stopped OR field=sleeping OR field=blocked OR field=dead OR field=wait OR field=idle OR field=paging) | avg by field
host.name={{host.name}}   metric=processes field=total | avg by field

@jspaleta

jspaleta commented Mar 3, 2022

So... since this uses a field named field... should I make this a sumologic-compat flag?
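A minimal sketch of such an opt-in flag, using only Go's standard flag package. The name `--sumologic-compat` is hypothetical (the thread has not settled on one), and the real check wires its flags through the Sensu plugin SDK rather than a bare FlagSet:

```go
package main

import (
	"flag"
	"fmt"
)

// parseCompatFlag wires up a hypothetical --sumologic-compat boolean flag
// on a dedicated FlagSet, so parsing is isolated and testable.
func parseCompatFlag(args []string) (bool, error) {
	fs := flag.NewFlagSet("sensu-processes-check", flag.ContinueOnError)
	compat := fs.Bool("sumologic-compat", false,
		"emit metrics using the Sumo Logic procstat naming scheme (field=... labels)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *compat, nil
}

func main() {
	on, err := parseCompatFlag([]string{"--sumologic-compat"})
	if err == nil && on {
		fmt.Println("Sumo Logic compatible metrics enabled")
	}
}
```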

@jspaleta

jspaleta commented Mar 3, 2022

Development branch pushed... here's initial implementation output:

$ ./sensu-processes-check -s '[{"search_string":"sensu", "full_cmdline" : true }]'
# OK       | 1 >= 1 (found >= required) evaluated true for "sensu"
# Status - OK
# HELP procstat SumoLogic Compatibility
# TYPE procstat gauge
procstat{field="cpu_usage",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu"} 0.7501450595132637 1646347704969

This should be compatible with the first SumoLogic metric query:

metric=procstat field=cpu_usage host.name={{host.name}}  process.executable.name={{process.executable.name}}  | max by host.name, process.executable.name | topk (10, max) 

@jspaleta jspaleta linked a pull request Mar 4, 2022 that will close this issue
@jspaleta

jspaleta commented Mar 4, 2022

All SumoLogic metric queries referencing the procstat metric family should now work with the feature branch in the linked pull request.

Note: some metrics require privileged access. These metrics are silently ignored if the executable cannot access them.

@jspaleta

jspaleta commented Mar 4, 2022

Okay, SumoLogic metric queries referencing the processes metric family should now work as well.
Here is the output when matching a single process:

$ ./sensu-processes-check -s '[{"search_string":"sensu", "full_cmdline" : true }]' 
# OK       | 1 >= 1 (found >= required) evaluated true for "sensu"
# Status - OK
# HELP processes SumoLogic Dashboard Compatible Cumulative Process Metrics
# TYPE processes gauge
processes{field="total",host.name="carbon",units="count"} 1 1646355921928
processes{field="total_threads",host.name="carbon",units="count"} 14 1646355921928
processes{field="parked",host.name="carbon",units="count"} 0 1646355921928
processes{field="wait",host.name="carbon",units="count"} 0 1646355921928
processes{field="zombies",host.name="carbon",units="count"} 0 1646355921928
processes{field="running",host.name="carbon",units="count"} 0 1646355921928
processes{field="sleeping",host.name="carbon",units="count"} 1 1646355921928
processes{field="unknown",host.name="carbon",units="count"} 0 1646355921928
processes{field="blocked",host.name="carbon",units="count"} 0 1646355921928
processes{field="dead",host.name="carbon",units="count"} 0 1646355921928
processes{field="stopped",host.name="carbon",units="count"} 0 1646355921928
processes{field="idle",host.name="carbon",units="count"} 0 1646355921928
processes{field="unexpected",host.name="carbon",units="count"} 0 1646355921928
# HELP procstat SumoLogic Dashboard Compatible Per Process Metrics
# TYPE procstat gauge
procstat{field="cpu_usage",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="percent"} 0.8524630207020176 1646355921928
procstat{field="memory_usage",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="percent"} 0.7666301131248474 1646355921928
procstat{field="memory_rss",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="bytes"} 1.25980672e+08 1646355921928
procstat{field="memory_vms",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="bytes"} 6.205018112e+09 1646355921928
procstat{field="memory_swap",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="bytes"} 0 1646355921928
procstat{field="memory_data",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="bytes"} 0 1646355921928
procstat{field="memory_stack",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="bytes"} 0 1646355921928
procstat{field="memory_locked",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="bytes"} 0 1646355921928
procstat{field="created_at",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="nanoseconds"} 1.646275163e+18 1646355921928
procstat{field="num_fds",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 0 1646355921928
procstat{field="num_threads",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 1.4e+07 1646355921928
procstat{field="major_faults",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 1042 1646355921928
procstat{field="minor_faults",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 37969 1646355921928
procstat{field="child_major_faults",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 0 1646355921928
procstat{field="child_minor_faults",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 314065 1646355921928
procstat{field="involuntary_context_switches",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 9416 1646355921928
procstat{field="voluntary_context_switches",host.name="carbon",process.executable.name="sensu-backend",search_string="sensu",units="count"} 864533 1646355921928

@asachs01

@jspaleta @calebhailey noting that the request in this internal ref may be related to this. The need there appears to be alerting when a named process uses more than X% CPU. While the metrics above would go part way toward solving that request, we'd still need to add a status component, if that's even possible.

@lspxxv lspxxv added the enhancement New feature or request label Jul 13, 2022