Add Jenkins health metrics (#998)
* Add Jenkins health metrics

* Add Jenkins health metrics

* Try fix the build

* Improve code readability

* Improve code

* Improve code

* revert CI config

* Fix bug

* add metric jenkins.plugins.updates

* Better code docs

---------

Co-authored-by: Ivan Fernandez Calvo <[email protected]>
cyrille-leclerc and kuisathaverat authored Dec 16, 2024
1 parent e1da710 commit a85780e
Showing 7 changed files with 500 additions and 215 deletions.
76 changes: 76 additions & 0 deletions docs/monitoring-metrics.md
@@ -117,6 +117,44 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
<td></td>
<td>Job failed</td>
</tr>
<tr>
<td>jenkins.executor</td>
<td><code>${executors}</code></td>
<td>
<code>label</code>,<br/>
<code>status</code>
</td>
<td>
Jenkins build agent <code>label</code> like <code>linux</code><br/>
<code>busy</code>, <code>idle</code>, <code>connecting</code>
</td>
<td>
Jenkins executors broken down by <code>label</code> and <code>status</code>. An executor annotated with
multiple labels is reported once per label
</td>
</tr>
<tr>
<td>jenkins.executor.total</td>
<td><code>${executors}</code></td>
<td>
<code>status</code>
</td>
<td>
<code>busy</code>, <code>idle</code>
</td>
<td>Jenkins executors broken down by <code>status</code></td>
</tr>
<tr>
<td>jenkins.node</td>
<td><code>${nodes}</code></td>
<td>
<code>status</code>
</td>
<td>
<code>online</code>, <code>offline</code>
</td>
<td>Jenkins build nodes</td>
</tr>
<tr>
<td>jenkins.executor.available</td>
<td><code>${executors}</code></td>
@@ -166,6 +204,15 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
<td></td>
<td></td>
</tr>
<tr>
<td>jenkins.queue</td>
<td><code>${tasks}</code></td>
<td><code>status</code></td>
<td>
<code>blocked</code>, <code>buildable</code>, <code>stuck</code>, <code>waiting</code>, <code>unknown</code>
</td>
<td>Number of tasks in the queue. See the <code>status</code> descriptions in the <a href="https://javadoc.jenkins.io/hudson/model/Queue.html">Queue Javadoc</a></td>
</tr>
<tr>
<td>jenkins.queue.waiting</td>
<td><code>${items}</code></td>
@@ -208,6 +255,35 @@ Inventory of health metrics collected by the Jenkins OpenTelemetry integration:
<td></td>
<td>Disk Usage size</td>
</tr>
<tr>
<td>http.server.request.duration</td>
<td><code>s</code></td>
<td>
<code>http.request.method</code>,<br/>
<code>url.scheme</code>,<br/>
<code>error.type</code>, <br/>
<code>http.response.status_code</code>, <br/>
<code>http.route</code>, <br/>
<code>server.address</code>, <br/>
<code>server.port</code>
</td>
<td></td>
<td>HTTP server request duration metric as defined by the <a href="https://opentelemetry.io/docs/specs/semconv/http/http-metrics/#metric-httpserverrequestduration">OpenTelemetry specification</a></td>
</tr>
<tr>
<td>jenkins.plugins</td>
<td><code>${plugins}</code></td>
<td><code>status</code></td>
<td><code>active</code>, <code>inactive</code>, <code>failed</code></td>
<td>Jenkins plugins broken down by activation <code>status</code></td>
</tr>
<tr>
<td>jenkins.plugins.updates</td>
<td><code>${plugins}</code></td>
<td><code>status</code></td>
<td><code>hasUpdate</code>, <code>isUpToDate</code></td>
<td>Jenkins plugins broken down by updatability <code>status</code></td>
</tr>
</table>
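
The `jenkins.executor` row above notes that an executor carrying several labels is reported once per label, so summing the metric across labels can exceed the real executor count. A minimal sketch of that effect, assuming a hypothetical `LabeledExecutor` class that stands in for a Jenkins executor (it is not a Jenkins API type):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for a Jenkins executor; not a Jenkins API type. */
final class LabeledExecutor {
    final Set<String> labels;
    final boolean busy;

    LabeledExecutor(Set<String> labels, boolean busy) {
        this.labels = labels;
        this.busy = busy;
    }
}

final class ExecutorLabelCount {
    /** Counts executors per "label/status" key; an executor with N labels contributes N times. */
    static Map<String, Integer> countByLabelAndStatus(List<LabeledExecutor> executors) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LabeledExecutor executor : executors) {
            String status = executor.busy ? "busy" : "idle";
            for (String label : executor.labels) {
                counts.merge(label + "/" + status, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LabeledExecutor> executors = List.of(
                new LabeledExecutor(Set.of("linux", "docker"), true),
                new LabeledExecutor(Set.of("linux"), false));
        // Two executors exist, but the per-label counts add up to three.
        System.out.println(countByLabelAndStatus(executors));
    }
}
```

For a controller-wide count that does not double count, `jenkins.executor.total` is the safer metric to aggregate.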

## Jenkins agents metrics
src/main/java/io/jenkins/plugins/opentelemetry/init/JenkinsExecutorMonitoringInitializer.java
@@ -5,11 +5,15 @@

package io.jenkins.plugins.opentelemetry.init;

import static io.jenkins.plugins.opentelemetry.semconv.JenkinsOtelSemanticAttributes.STATUS;

import hudson.Extension;
import hudson.model.Computer;
import hudson.model.LoadStatistics;
import hudson.model.Node;
import io.jenkins.plugins.opentelemetry.JenkinsControllerOpenTelemetry;
import io.jenkins.plugins.opentelemetry.api.OpenTelemetryLifecycleListener;
-import io.opentelemetry.api.common.AttributeKey;
+import io.jenkins.plugins.opentelemetry.semconv.JenkinsOtelSemanticAttributes;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.ObservableLongMeasurement;
@@ -19,6 +23,8 @@
import javax.annotation.PostConstruct;
import javax.inject.Inject;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

@@ -42,30 +48,87 @@ public void postConstruct() {
logger.log(Level.FINE, () -> "Start monitoring Jenkins controller executor pool...");

Meter meter = Objects.requireNonNull(jenkinsControllerOpenTelemetry).getDefaultMeter();

final ObservableLongMeasurement queueLength = meter.gaugeBuilder(JENKINS_EXECUTOR_QUEUE).setUnit("${items}").setDescription("Executors queue items").ofLongs().buildObserver();
final ObservableLongMeasurement totalExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_TOTAL).setUnit("${executors}").setDescription("Total executors").ofLongs().buildObserver();
final ObservableLongMeasurement nodes = meter.gaugeBuilder(JENKINS_NODE).setUnit("${nodes}").setDescription("Nodes").ofLongs().buildObserver();
final ObservableLongMeasurement executors = meter.gaugeBuilder(JENKINS_EXECUTOR).setUnit("${executors}").setDescription("Per label executors").ofLongs().buildObserver();

// TODO the metrics below should be deprecated in favor of
// * `jenkins.executor` metric with the `status` and `label` attributes
// * `jenkins.node` metric with the `status` attribute
// * `jenkins.executor.total` metric with the `status` attribute
final ObservableLongMeasurement availableExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_AVAILABLE).setUnit("${executors}").setDescription("Available executors").ofLongs().buildObserver();
final ObservableLongMeasurement busyExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_BUSY).setUnit("${executors}").setDescription("Busy executors").ofLongs().buildObserver();
final ObservableLongMeasurement idleExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_IDLE).setUnit("${executors}").setDescription("Idle executors").ofLongs().buildObserver();
final ObservableLongMeasurement onlineExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_ONLINE).setUnit("${executors}").setDescription("Online executors").ofLongs().buildObserver();
final ObservableLongMeasurement connectingExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_CONNECTING).setUnit("${executors}").setDescription("Connecting executors").ofLongs().buildObserver();
final ObservableLongMeasurement definedExecutors = meter.gaugeBuilder(JENKINS_EXECUTOR_DEFINED).setUnit("${executors}").setDescription("Defined executors").ofLongs().buildObserver();
final ObservableLongMeasurement queueLength = meter.gaugeBuilder(JENKINS_EXECUTOR_QUEUE).setUnit("${items}").setDescription("Executors queue items").ofLongs().buildObserver();

logger.log(Level.FINER, () -> "Metrics: " + availableExecutors + ", " + busyExecutors + ", " + idleExecutors + ", " + onlineExecutors + ", " + connectingExecutors + ", " + definedExecutors + ", " + queueLength);

meter.batchCallback(() -> {
logger.log(Level.FINE, () -> "Recording Jenkins controller executor pool metrics...");
logger.log(Level.FINER, () -> "Metrics: " + availableExecutors + ", " + busyExecutors + ", " + idleExecutors + ", " + onlineExecutors + ", " + connectingExecutors + ", " + definedExecutors + ", " + queueLength);
Jenkins jenkins = Jenkins.get();

// TOTAL EXECUTORS
AtomicInteger totalExecutorsIdle = new AtomicInteger();
AtomicInteger totalExecutorsBusy = new AtomicInteger();
AtomicInteger nodeOnline = new AtomicInteger();
AtomicInteger nodeOffline = new AtomicInteger();

if (jenkins.getNumExecutors() > 0) {
nodeOnline.incrementAndGet();
Optional.ofNullable(jenkins.toComputer())
.map(Computer::getExecutors)
.ifPresent(e -> e.forEach(executor -> {
if (executor.isIdle()) {
totalExecutorsIdle.incrementAndGet();
} else {
totalExecutorsBusy.incrementAndGet();
}
}));
}
jenkins.getNodes().stream().map(Node::toComputer).filter(Objects::nonNull).forEach(node -> {
if (node.isOnline()) {
nodeOnline.incrementAndGet();
node.getExecutors()
.forEach(executor -> {
if (executor.isIdle()) {
totalExecutorsIdle.incrementAndGet();
} else {
totalExecutorsBusy.incrementAndGet();
}
});
} else {
nodeOffline.incrementAndGet();
}
});

totalExecutors.record(totalExecutorsBusy.get(), Attributes.of(STATUS, "busy"));
totalExecutors.record(totalExecutorsIdle.get(), Attributes.of(STATUS, "idle"));
nodes.record(nodeOnline.get(), Attributes.of(STATUS, "online"));
nodes.record(nodeOffline.get(), Attributes.of(STATUS, "offline"));

// PER LABEL
jenkins.getLabels().forEach(label -> {
LoadStatistics.LoadStatisticsSnapshot loadStatisticsSnapshot = label.loadStatistics.computeSnapshot();
-Attributes attributes = Attributes.of(AttributeKey.stringKey("label"), label.getDisplayName());
+Attributes attributes = Attributes.of(JenkinsOtelSemanticAttributes.LABEL, label.getDisplayName());

executors.record(loadStatisticsSnapshot.getBusyExecutors(), attributes.toBuilder().put(STATUS, "busy").build());
executors.record(loadStatisticsSnapshot.getIdleExecutors(), attributes.toBuilder().put(STATUS, "idle").build());
executors.record(loadStatisticsSnapshot.getConnectingExecutors(), attributes.toBuilder().put(STATUS, "connecting").build());
queueLength.record(loadStatisticsSnapshot.getQueueLength(), attributes);

// TODO the metrics below should be deprecated in favor of `jenkins.executor` metric with the `status`
// and `label` attributes
availableExecutors.record(loadStatisticsSnapshot.getAvailableExecutors(), attributes);
busyExecutors.record(loadStatisticsSnapshot.getBusyExecutors(), attributes);
idleExecutors.record(loadStatisticsSnapshot.getIdleExecutors(), attributes);
onlineExecutors.record(loadStatisticsSnapshot.getOnlineExecutors(), attributes);
definedExecutors.record(loadStatisticsSnapshot.getDefinedExecutors(), attributes);
connectingExecutors.record(loadStatisticsSnapshot.getConnectingExecutors(), attributes);
queueLength.record(loadStatisticsSnapshot.getQueueLength(), attributes);
});
-}, availableExecutors, busyExecutors, idleExecutors, onlineExecutors, connectingExecutors, definedExecutors, queueLength);
+}, availableExecutors, busyExecutors, idleExecutors, onlineExecutors, connectingExecutors, definedExecutors, totalExecutors, executors, nodes, queueLength);
}
}
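The controller-wide tally in the callback above counts executors only on online nodes and counts each node as either online or offline. A minimal sketch of that tallying logic in plain Java, assuming a hypothetical `NodeSnapshot` class in place of Jenkins' `Computer` (the built-in controller node could be modeled as one more snapshot):

```java
import java.util.List;

/** Hypothetical snapshot of one node; a stand-in for Jenkins' Computer, not a real Jenkins type. */
final class NodeSnapshot {
    final boolean online;
    final int idleExecutors;
    final int busyExecutors;

    NodeSnapshot(boolean online, int idleExecutors, int busyExecutors) {
        this.online = online;
        this.idleExecutors = idleExecutors;
        this.busyExecutors = busyExecutors;
    }
}

/** Totals that would feed the jenkins.executor.total and jenkins.node gauges. */
final class ExecutorTally {
    int idle;
    int busy;
    int nodesOnline;
    int nodesOffline;

    static ExecutorTally of(List<NodeSnapshot> nodes) {
        ExecutorTally tally = new ExecutorTally();
        for (NodeSnapshot node : nodes) {
            if (node.online) {
                tally.nodesOnline++;
                tally.idle += node.idleExecutors;
                tally.busy += node.busyExecutors;
            } else {
                // Executors of offline nodes are skipped, as in the callback above.
                tally.nodesOffline++;
            }
        }
        return tally;
    }

    public static void main(String[] args) {
        ExecutorTally tally = ExecutorTally.of(List.of(
                new NodeSnapshot(true, 1, 1),
                new NodeSnapshot(false, 2, 0)));
        System.out.println("idle=" + tally.idle + " busy=" + tally.busy
                + " online=" + tally.nodesOnline + " offline=" + tally.nodesOffline);
    }
}
```

Because offline nodes contribute no executors, `jenkins.executor.total` can drop when an agent disconnects even though its configuration is unchanged.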
src/main/java/io/jenkins/plugins/opentelemetry/init/PluginMonitoringInitializer.java
@@ -0,0 +1,93 @@
/*
* Copyright The Original Author or Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.jenkins.plugins.opentelemetry.init;

import hudson.Extension;
import hudson.PluginManager;
import io.jenkins.plugins.opentelemetry.JenkinsControllerOpenTelemetry;
import io.jenkins.plugins.opentelemetry.api.OpenTelemetryLifecycleListener;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.ObservableLongMeasurement;
import jenkins.YesNoMaybe;
import jenkins.model.Jenkins;

import javax.annotation.PostConstruct;
import javax.inject.Inject;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

import static io.jenkins.plugins.opentelemetry.semconv.JenkinsOtelSemanticAttributes.STATUS;
import static io.jenkins.plugins.opentelemetry.semconv.JenkinsSemanticMetrics.JENKINS_PLUGINS;
import static io.jenkins.plugins.opentelemetry.semconv.JenkinsSemanticMetrics.JENKINS_PLUGINS_UPDATES;

/**
* <p>
* Monitor the Jenkins plugins
* </p>
* <p>
* TODO report on `hasUpdate` plugin count.
* </p>
*/
@Extension(dynamicLoadable = YesNoMaybe.MAYBE, optional = true)
public class PluginMonitoringInitializer implements OpenTelemetryLifecycleListener {

private static final Logger logger = Logger.getLogger(PluginMonitoringInitializer.class.getName());

@Inject
JenkinsControllerOpenTelemetry jenkinsControllerOpenTelemetry;

@PostConstruct
public void postConstruct() {

logger.log(Level.FINE, () -> "Start monitoring Jenkins plugins...");

Meter meter = Objects.requireNonNull(jenkinsControllerOpenTelemetry).getDefaultMeter();

final ObservableLongMeasurement plugins = meter
.gaugeBuilder(JENKINS_PLUGINS)
.setUnit("${plugins}")
.setDescription("Jenkins plugins")
.ofLongs()
.buildObserver();
final ObservableLongMeasurement pluginUpdates = meter
.gaugeBuilder(JENKINS_PLUGINS_UPDATES)
.setUnit("${plugins}")
.setDescription("Jenkins plugin updates")
.ofLongs()
.buildObserver();
meter.batchCallback(() -> {
logger.log(Level.FINE, () -> "Recording Jenkins plugins metrics...");

AtomicInteger active = new AtomicInteger();
AtomicInteger inactive = new AtomicInteger();
AtomicInteger hasUpdate = new AtomicInteger();
AtomicInteger isUpToDate = new AtomicInteger();

PluginManager pluginManager = Jenkins.get().getPluginManager();
pluginManager.getPlugins().forEach(plugin -> {
if (plugin.isActive()) {
active.incrementAndGet();
} else {
inactive.incrementAndGet();
}
if (plugin.hasUpdate()) {
hasUpdate.incrementAndGet();
} else {
isUpToDate.incrementAndGet();
}
});
int failed = pluginManager.getFailedPlugins().size();
plugins.record(active.get(), Attributes.of(STATUS, "active"));
plugins.record(inactive.get(), Attributes.of(STATUS, "inactive"));
plugins.record(failed, Attributes.of(STATUS, "failed"));
pluginUpdates.record(hasUpdate.get(), Attributes.of(STATUS, "hasUpdate"));
pluginUpdates.record(isUpToDate.get(), Attributes.of(STATUS, "isUpToDate"));
}, plugins, pluginUpdates);
}
}
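The plugin callback above produces two independent breakdowns: activation status for `jenkins.plugins` and update status for `jenkins.plugins.updates`, with the failed count taken from a separate list. A minimal sketch of that bookkeeping, assuming a hypothetical `PluginState` class in place of Jenkins' plugin wrapper type:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plugin state; a stand-in for the Jenkins plugin wrapper, not a real Jenkins type. */
final class PluginState {
    final boolean active;
    final boolean hasUpdate;

    PluginState(boolean active, boolean hasUpdate) {
        this.active = active;
        this.hasUpdate = hasUpdate;
    }
}

final class PluginTally {
    /** Counts for jenkins.plugins; failedCount is passed in because failed plugins come from a separate list. */
    static Map<String, Integer> byActivation(List<PluginState> plugins, int failedCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Seed every status so that a zero count is still reported.
        counts.put("active", 0);
        counts.put("inactive", 0);
        counts.put("failed", failedCount);
        for (PluginState plugin : plugins) {
            counts.merge(plugin.active ? "active" : "inactive", 1, Integer::sum);
        }
        return counts;
    }

    /** Counts for jenkins.plugins.updates. */
    static Map<String, Integer> byUpdate(List<PluginState> plugins) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hasUpdate", 0);
        counts.put("isUpToDate", 0);
        for (PluginState plugin : plugins) {
            counts.merge(plugin.hasUpdate ? "hasUpdate" : "isUpToDate", 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<PluginState> plugins = List.of(
                new PluginState(true, false),
                new PluginState(true, true),
                new PluginState(false, false));
        System.out.println(byActivation(plugins, 0));
        System.out.println(byUpdate(plugins));
    }
}
```

Each plugin lands in exactly one bucket per breakdown, so the `hasUpdate` and `isUpToDate` values sum to the number of installed plugins.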