adds optional per table metrics #5030

Open
wants to merge 5 commits into
base: main

keith-turner
Contributor

Adds optional tableId tags to meters for a subset of metrics in the tablet server and scan server. In a follow-on change the compactor could be updated to emit per-table metrics; however, its current code is very process-oriented, so that change should be in its own commit.

Each server process will automatically remove meters for tables that were deleted or that it has not been servicing in a while.

closes #4511
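
A minimal sketch of the lifecycle described above, using only the JDK so that it stays self-contained. `PerTableCache`, `TableEntry`, and the idle timeout are hypothetical names, not the classes in this PR, and a `LongAdder` stands in for a Micrometer meter tagged with the table ID. A real implementation would also have to deregister the removed meters from the registry.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the set of meters tagged with one table ID. */
final class TableEntry {
  final LongAdder scans = new LongAdder(); // stands in for a tagged counter
  volatile long lastUsedMillis;

  TableEntry(long nowMillis) {
    this.lastUsedMillis = nowMillis;
  }
}

/** Hypothetical cache: creates entries on first use, drops deleted or idle tables on refresh. */
final class PerTableCache {
  private final Map<String, TableEntry> entries = new ConcurrentHashMap<>();
  private final long idleTimeoutMillis;

  PerTableCache(long idleTimeoutMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
  }

  /** Returns the entry for a table, creating it if absent, and marks it as recently used. */
  TableEntry get(String tableId, long nowMillis) {
    TableEntry entry = entries.computeIfAbsent(tableId, id -> new TableEntry(nowMillis));
    entry.lastUsedMillis = nowMillis;
    return entry;
  }

  /** Removes entries whose table no longer exists or that have been idle for too long. */
  synchronized void refresh(Set<String> existingTableIds, long nowMillis) {
    entries.entrySet().removeIf(e -> !existingTableIds.contains(e.getKey())
        || nowMillis - e.getValue().lastUsedMillis > idleTimeoutMillis);
  }

  /** Snapshot of the table IDs that currently have an entry. */
  Set<String> trackedTableIds() {
    return Set.copyOf(entries.keySet());
  }
}
```

Passing the clock value in as a parameter is a choice made only to keep the sketch deterministic; it is not something the PR is known to do.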

@keith-turner keith-turner added this to the 4.0.0 milestone Nov 2, 2024
* currently have no table metrics object in the cache. It will also remove any per table metrics
* objects from the cache that have been inactive for a while or where the table was deleted.
*/
public synchronized void refresh() {
Contributor

I'm curious if it might be better to evaluate whether per table metrics should be added or removed when a tablet is hosted or unhosted in the ScanServer and TabletServer. There are explicit mechanisms in the TabletServer for hosting and unhosting tablets. In the ScanServer we have the TabletMetadataLoader for hosting a tablet, and we could add an evictionListener to the tabletMetadataCache to handle a tablet removal. Thoughts on that?

Contributor Author

It probably would be better to do that for all cases. In this method it's only being done for the tablet server on tablet load. The three other cases you mentioned are not handled.

For the scan server I could not find a good place to register on tablet load; I will circle back and see what I can find. For now it's probably OK that the scan server does not register on load, because it has no gauges, so when a scan happens it will touch meters, which will load the metrics. However, that is shaky ground: if gauges were ever used, they might not be loaded until the timer task kicks in. It would also be good to push code to TabletHostingServer so that the metrics code can interact with the same code for each server type.

If all 4 cases are covered with callbacks then we could run the timer task less frequently. For the unload case I was completely leaving that to the timer task to catch.

Contributor Author

Made some changes related to this in afa401f. Was able to optimize and centralize the code for detecting changes in the set of table IDs. With those changes, a tablet being loaded can be handled efficiently, detecting whether anything needs to be done. However, the case of a tablet being unloaded is hard to handle efficiently: when one tablet is unloaded, other tablets may still have that same table ID, so all tablets would need to be scanned on each tablet unload to see if anything needs to be done. Decided not to do anything for this case and leave it to the periodic timer task. Was able to centralize that timer task and make it more efficient, though.
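
To illustrate the asymmetry described above: a load can be handled with a single set lookup, while an unload cannot tell on its own whether the table is still hosted, so this sketch leaves removal to a periodic full pass. `HostedTableTracker` and its methods are hypothetical and only model the idea; they are not the code from afa401f.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker of which table IDs currently need per-table metrics. */
final class HostedTableTracker {
  private Set<String> activeTableIds = new HashSet<>();

  /** Load path, O(1): returns true only when the table is new and metrics must be created. */
  synchronized boolean onTabletLoad(String tableId) {
    return activeTableIds.add(tableId);
  }

  /**
   * Periodic path: recomputes the set from the table ID of every hosted tablet and returns
   * the table IDs that disappeared, whose metrics can now be removed.
   */
  synchronized Set<String> refresh(Collection<String> tableIdOfEachHostedTablet) {
    Set<String> current = new HashSet<>(tableIdOfEachHostedTablet);
    Set<String> removed = new HashSet<>(activeTableIds);
    removed.removeAll(current);
    activeTableIds = current;
    return removed;
  }
}
```

An unload callback would need the same full pass as `refresh`, which is why running it per unload is more expensive than running it on a timer.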

@keith-turner
Contributor Author

The static analysis checks in the build had a really good find. They caught a bug where I forgot to check the future for the scheduled task.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project accumulo-server-base: Compilation failure
[ERROR] /home/runner/work/accumulo/accumulo/server/base/src/main/java/org/apache/accumulo/server/metrics/PerTableMetrics.java:[81,56] error: [FutureReturnValueIgnored] Return value of methods returning Future must be checked. Ignoring returned Futures suppresses exceptions thrown from the code that completes the Future.
[ERROR]     (see https://errorprone.info/bugpattern/FutureReturnValueIgnored)
[ERROR]   Did you mean 'var unused = context.getScheduledExecutor().scheduleAtFixedRate(this::refresh, 30, 30, TimeUnit.SECONDS);' or to remove this line?
[ERROR] 
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:compile (default-compile) on project accumulo-server-base: Compilation failure
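
For context, the check fires because an exception thrown by a periodic task is captured in the returned `ScheduledFuture` and suppresses further runs without any visible error. A minimal JDK-only sketch of keeping and checking that future follows; `FutureCheckDemo` and its timings are illustrative assumptions, and the actual fix in this PR may use a different mechanism.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class FutureCheckDemo {
  /** Schedules a periodic task that fails on its first run and returns the recovered failure. */
  static Throwable failureOfPeriodicTask() {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    try {
      // Keep the future instead of discarding it.
      ScheduledFuture<?> future = executor.scheduleAtFixedRate(() -> {
        throw new IllegalStateException("refresh failed");
      }, 0, 10, TimeUnit.MILLISECONDS);
      try {
        // A periodic future only completes through failure or cancellation,
        // so get() blocks until the task throws.
        future.get();
        return null;
      } catch (ExecutionException e) {
        return e.getCause(); // the exception thrown inside the task
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return e;
      }
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("periodic task failed with: " + failureOfPeriodicTask());
  }
}
```

Blocking on `get()` is only suitable for a demonstration; a server would more likely poll the future or hand it to a watcher that logs or escalates the failure.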

return perTableMetrics.computeIfAbsent(tableId, tid -> {
List<Meter> meters = new ArrayList<>();
T tableMetrics = newPerTableMetrics(registry, tableId, meters::add,
List.of(Tag.of(TABLE_ID_TAG_NAME, tid.canonical())));
Contributor

I'm thinking it might be more useful to have a tableName tag and use the table name instead of the tableId.

@@ -169,7 +169,7 @@
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-bom</artifactId>
-<version>1.12.2</version>
+<version>1.13.6</version>
Contributor

FYI that some things will have to change down in the accumulo-testing Terraform contrib code when this is merged due to the version change.

@@ -63,8 +81,55 @@ private long getTotalEntriesWritten() {
return FileCompactor.getTotalEntriesWritten();
}

public static class TableMetrics {
Contributor

I did some digging to see if Micrometer had any support for dynamic tags. I found micrometer-metrics/micrometer#4097, which essentially allows you to create templates for Meters, then these get registered when you supply the tags. I'm wondering if you had seen this, and if not, if it would change your implementation here. I'm thinking we could create the templates (MeterProvider in Micrometer) when the servers start up, then just apply and remove based on the table id tags.

Successfully merging this pull request may close these issues.

Explore adding table ids tags to some metrics
2 participants