-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detailed tasks information as metrics #371
Conversation
… yet) and modify some methods to be more reusable
&& operationType != OperationType.KEY_CACHE_SAVE | ||
&& operationType != OperationType.ROW_CACHE_SAVE; | ||
}) | ||
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: I'm getting the following exception when running in Cassandra 4.0:
java.lang.IllegalArgumentException: No enum constant org.apache.cassandra.db.compaction.OperationType.Compaction
at java.base/java.lang.Enum.valueOf(Unknown Source)
at org.apache.cassandra.db.compaction.OperationType.valueOf(OperationType.java:20)
at io.k8ssandra.metrics.prometheus.CassandraTasksExports.lambda$getCompactionStats$0(CassandraTasksExports.java:288)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at io.k8ssandra.metrics.prometheus.CassandraTasksExports.getCompactionStats(CassandraTasksExports.java:293)
at io.k8ssandra.metrics.prometheus.CassandraTasksExports.collect(CassandraTasksExports.java:46)
at io.prometheus.client.Collector.collect(Collector.java:45)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:204)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:219)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:152)
at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:71)
at io.prometheus.client.exporter.common.TextFormat.writeFormat(TextFormat.java:53)
at io.k8ssandra.metrics.http.NettyServerHandler.channelRead0(NettyServerHandler.java:66)
at io.k8ssandra.metrics.http.NettyServerHandler.channelRead0(NettyServerHandler.java:30)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I added .toUpperCase() and a catch clause to prevent this Exception. Should help with 4.0 as the CompactionMetrics calls seem to work reliably only with 4.1 and newer.
familySamples.addAll(getStreamInfoStats()); | ||
|
||
// Collect other sstableOperations (if not part of Compactions metrics already) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo: we need follow up tickets for these missing types of operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on 4.1 we have everything, for example UpgradeSSTables is here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: I'm getting exceptions when hitting the metrics endpoint if a compaction is running:
WARN [epollEventLoopGroup-6-1] 2023-10-04 08:56:11,917 DefaultChannelPipeline.java:1152 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IndexOutOfBoundsException: Index 5 out of bounds for length 5
at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
at java.base/jdk.internal.util.Preconditions.checkIndex(Unknown Source)
at java.base/java.util.Objects.checkIndex(Unknown Source)
at java.base/java.util.ArrayList.get(Unknown Source)
at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:119)
at io.prometheus.client.exporter.common.TextFormat.writeFormat(TextFormat.java:53)
at io.k8ssandra.metrics.http.NettyServerHandler.channelRead0(NettyServerHandler.java:66)
at io.k8ssandra.metrics.http.NettyServerHandler.channelRead0(NettyServerHandler.java:30)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
A pretty similar exception happens when a streaming operation is ongoing:
WARN [epollEventLoopGroup-6-1] 2023-10-04 09:23:05,143 DefaultChannelPipeline.java:1152 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IndexOutOfBoundsException: Index 4 out of bounds for length 4
at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
at java.base/jdk.internal.util.Preconditions.checkIndex(Unknown Source)
at java.base/java.util.Objects.checkIndex(Unknown Source)
at java.base/java.util.ArrayList.get(Unknown Source)
at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:119)
at io.prometheus.client.exporter.common.TextFormat.writeFormat(TextFormat.java:53)
at io.k8ssandra.metrics.http.NettyServerHandler.channelRead0(NettyServerHandler.java:66)
at io.k8ssandra.metrics.http.NettyServerHandler.channelRead0(NettyServerHandler.java:30)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
…s to verify the methods do not error and have correct labelValues and labelNames sizes in the end results also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's working nicely now
While Cassandra metrics already include compaction and streaming statistics, they do not include per-stream, per-compaction or per-sstabletask metrics which would allow to follow one single compaction's progress for example.
This PR adds those (a lot of extra work is because of 3.11 support, which has different APIs than 4.0 and up that would have simplified this implementation).
Fixes #293