adds optional per table metrics #5030

keith-turner · 2024-11-02T20:46:26Z

Adds optional tableId tags to meters for a subset of metrics in the tablet server and scan server. In a follow-on change the compactor could be updated to emit per-table metrics; however, its current code is very process oriented, and that change should be in its own commit.

Each server process will automatically remove meters for tables that were deleted or that it has not been servicing in a while.
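
To make the cleanup behavior above concrete, here is a minimal sketch of a per-table cache that drops entries for deleted or idle tables. It is not the code in this PR: the class name `PerTableCacheSketch`, the `maxIdleMillis` parameter, and the use of plain strings for table ids are all assumptions made for illustration, and a real implementation would also remove the evicted meters from the meter registry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: one metrics object per table, evicted when stale or deleted. */
final class PerTableCacheSketch<T> {

  private static final class Entry<T> {
    final T metrics;
    long lastUsedMillis;

    Entry(T metrics, long nowMillis) {
      this.metrics = metrics;
      this.lastUsedMillis = nowMillis;
    }
  }

  private final Map<String, Entry<T>> cache = new HashMap<>();
  // Assumed factory: builds the meters tagged with the given table id.
  private final Function<String, T> factory;
  private final long maxIdleMillis;

  PerTableCacheSketch(Function<String, T> factory, long maxIdleMillis) {
    this.factory = factory;
    this.maxIdleMillis = maxIdleMillis;
  }

  /** Returns the metrics for a table, creating them on first use and marking them active. */
  synchronized T get(String tableId, long nowMillis) {
    Entry<T> entry =
        cache.computeIfAbsent(tableId, tid -> new Entry<>(factory.apply(tid), nowMillis));
    entry.lastUsedMillis = nowMillis;
    return entry.metrics;
  }

  /**
   * Drops entries whose table no longer exists or that have been idle for longer than
   * maxIdleMillis. Returns the number of entries removed.
   */
  synchronized int refresh(Set<String> existingTableIds, long nowMillis) {
    int before = cache.size();
    cache.entrySet().removeIf(e -> !existingTableIds.contains(e.getKey())
        || nowMillis - e.getValue().lastUsedMillis > maxIdleMillis);
    return before - cache.size();
  }

  /** Snapshot of the table ids that currently have metrics. */
  synchronized Set<String> trackedTables() {
    return Set.copyOf(cache.keySet());
  }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a periodic task would call `refresh` with the current time and the current set of table ids.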

closes #4511

dlmarion · 2024-11-04T12:48:07Z

server/base/src/main/java/org/apache/accumulo/server/metrics/PerTableMetrics.java

+   * currently have no table metrics object in the cache. It will also remove an per table metrics
+   * object from the cache that have been inactive for a while or where the table was deleted.
+   */
+  public synchronized void refresh() {


I'm curious if it might be better to evaluate whether per table metrics should be added or removed when a tablet is hosted or unhosted in the ScanServer and TabletServer. There are explicit mechanisms in the TabletServer for hosting and unhosting tablets. In the ScanServer we have the TabletMetadataLoader for hosting a tablet, and we could add an evictionListener to the tabletMetadataCache to handle a tablet removal. Thoughts on that?

It probably would be better to do that for all cases. It's only being done for the tablet server on tablet load for this method. The three other cases you mentioned are not done.

For the scan server I could not find a good place to register on tablet load; I will circle back and see what I can find. For now it's probably ok that the scan server does not register on load because it has no gauges, so when a scan happens it will touch meters, which will load the metrics. However, that is shaky ground: if gauges were ever used, they may not be loaded until the timer task kicks in. It would also be good to push code to TabletHostingServer so that the metrics code can interact w/ the same code for each server type.

If all 4 cases are covered with callbacks then we could run the timer task less frequently. For the unload case I was completely leaving that to the timer task to catch.

Made some changes related to this in afa401f. Was able to optimize and centralize the code for detecting changes in the set of table ids. With those changes, a tablet load can be handled efficiently and it is cheap to detect whether anything needs to be done. However, for the case of a tablet being unloaded, I found it is hard to handle efficiently: when one tablet is unloaded, other tablets may still have that same table id, so all tablets would need to be scanned on each tablet unload to see if anything needs to be done. Decided not to do anything for this case and leave it to the periodic timer task. Was able to centralize that timer task and make it more efficient though.
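
The asymmetry between load and unload described above can be sketched roughly as follows. This is an illustration only, not the code from afa401f: `TableIdChangeSketch`, its method names, and the use of strings for table ids are hypothetical. A load is a constant-time set insert, while removals are only discovered by the periodic pass that scans every hosted tablet once.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of tracking which table ids a server currently has metrics for. */
final class TableIdChangeSketch {

  // Table ids that currently have per-table metrics registered.
  private final Set<String> registered = new HashSet<>();

  /** Load path: cheap check, returns true only when this table id is new to the server. */
  synchronized boolean tabletLoaded(String tableId) {
    return registered.add(tableId);
  }

  /**
   * Periodic path: takes the table id of every hosted tablet (duplicates expected) and
   * returns the table ids that are no longer hosted, so their meters can be removed.
   */
  synchronized Set<String> refresh(Collection<String> tableIdsOfHostedTablets) {
    Set<String> hosted = new HashSet<>(tableIdsOfHostedTablets);
    Set<String> removed = new HashSet<>(registered);
    removed.removeAll(hosted);
    // After a refresh the registered set matches exactly what is hosted.
    registered.clear();
    registered.addAll(hosted);
    return removed;
  }
}
```

An unload callback could not simply remove the table id, since another hosted tablet may belong to the same table; that is why the sketch leaves removal to `refresh`.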

keith-turner · 2024-11-04T16:15:17Z

The static analysis checks in the build had a really good find. Found a bug where I forgot to check the future for the scheduled task.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project accumulo-server-base: Compilation failure
[ERROR] /home/runner/work/accumulo/accumulo/server/base/src/main/java/org/apache/accumulo/server/metrics/PerTableMetrics.java:[81,56] error: [FutureReturnValueIgnored] Return value of methods returning Future must be checked. Ignoring returned Futures suppresses exceptions thrown from the code that completes the Future.
[ERROR]     (see https://errorprone.info/bugpattern/FutureReturnValueIgnored)
[ERROR]   Did you mean 'var unused = context.getScheduledExecutor().scheduleAtFixedRate(this::refresh, 30, 30, TimeUnit.SECONDS);' or to remove this line?
[ERROR] 
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project accumulo-server-base: Compilation failure
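
For context on why this check matters: with a plain `java.util.concurrent.ScheduledExecutorService`, a periodic task that throws stops running silently, and the exception is only visible through the returned future. The sketch below shows one way to keep and inspect that future. It is not the fix made in this PR; `ScheduledFutureCheckSketch` and `failureOf` are hypothetical names, and it uses a JDK executor rather than the one returned by `context.getScheduledExecutor()`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: surfacing the exception that killed a periodic task. */
final class ScheduledFutureCheckSketch {

  /** Returns the failure that stopped the task, or null if it is still scheduled or cancelled. */
  static Throwable failureOf(ScheduledFuture<?> future) throws InterruptedException {
    if (!future.isDone() || future.isCancelled()) {
      return null;
    }
    try {
      // A periodic task never completes normally, so a done, uncancelled future has failed.
      future.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    }
  }

  public static void main(String[] args) throws Exception {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    try {
      // Keep the future instead of discarding it, which is what the check enforces.
      ScheduledFuture<?> future = executor.scheduleAtFixedRate(() -> {
        throw new IllegalStateException("refresh failed");
      }, 0, 30, TimeUnit.SECONDS);

      // Wait for the first run to fail; later runs are suppressed by the executor.
      while (!future.isDone()) {
        Thread.sleep(10);
      }
      System.out.println("task stopped by: " + failureOf(future));
    } finally {
      executor.shutdownNow();
    }
  }
}
```

A long-running server would more likely poll such a future from a watcher task or log the failure, rather than block on it as `main` does here.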

dlmarion · 2024-11-15T21:12:54Z

server/base/src/main/java/org/apache/accumulo/server/metrics/PerTableMetrics.java

+    return perTableMetrics.computeIfAbsent(tableId, tid -> {
+      List<Meter> meters = new ArrayList<>();
+      T tableMetrics = newPerTableMetrics(registry, tableId, meters::add,
+          List.of(Tag.of(TABLE_ID_TAG_NAME, tid.canonical())));


I'm thinking it might be more useful to have a tableName tag and use the table name instead of the tableId.

dlmarion · 2024-11-18T17:47:42Z

pom.xml

@@ -169,7 +169,7 @@
      <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-bom</artifactId>
-        <version>1.12.2</version>
+        <version>1.13.6</version>


FYI, some things will have to change in the accumulo-testing Terraform contrib code when this is merged, due to the version change.

dlmarion · 2024-11-18T18:56:37Z

server/tserver/src/main/java/org/apache/accumulo/tserver/metrics/TabletServerMetrics.java

@@ -63,8 +81,55 @@ private long getTotalEntriesWritten() {
    return FileCompactor.getTotalEntriesWritten();
  }

+  public static class TableMetrics {


I did some digging to see if Micrometer had any support for dynamic tags. I found micrometer-metrics/micrometer#4097, which essentially allows you to create templates for Meters, then these get registered when you supply the tags. I'm wondering if you had seen this, and if not, if it would change your implementation here. I'm thinking we could create the templates (MeterProvider in Micrometer) when the servers start up, then just apply and remove based on the table id tags.

keith-turner added this to the 4.0.0 milestone Nov 2, 2024

dlmarion reviewed Nov 4, 2024

keith-turner added 4 commits November 4, 2024 23:27

code review update

afa401f

Merge remote-tracking branch 'upstream/main' into accumulo-4511-2

f2cb289

fix imports

1c08f78

add override annotation

60c48f0

dlmarion mentioned this pull request Nov 15, 2024

New Monitor server implementation #5012

Draft

dlmarion reviewed Nov 15, 2024

dlmarion reviewed Nov 18, 2024
