KEDA Unable to Retrieve correct Kafka Metrics from ScaledObject on GKE #5730
Comments
Could you change the log-level to debug in operator and send the operator logs? |
In addition to the log, could you provide the |
Sorry for the delay. I still have this issue and created a new GCP/GKE cluster specifically to debug it. I was able to reproduce the issue and got the logs from the exact moment when the current metric value switched in the HPA. The logs from the controller when this switch happened:
the scaled object:
|
Found people with a similar issue: https://kubernetes.slack.com/archives/CKZJ36A5D/p1709761505122509 |
@SpiritZhou @dttung2905 Yesterday I created an EKS cluster with the same setup and versions as the GCP one, and it works perfectly. Can you think of anything that could differ between GCP and AWS? Any kind of blocker, or anything else that could be causing the issue? I have also followed the k8s events and couldn't find anything wrong in GCP. |
I did some more tests. I believe the Kafka connection is OK: I was able to produce and consume messages inside the pod using the Go Sarama library (the same knative kafka extension library). I created a Debian pod in Kubernetes/GCP/GKE, attached to it and:
another thing I noticed/not sure if relevant is that the metric is not listed here:
but I can query it, as it shows above. |
You should try to query the specific metric for the ScaledObject, see the examples down below: https://keda.sh/docs/2.14/operate/metrics-server/#querying-metrics-exposed-by-keda-metrics-server |
Thanks @zroubalik, I'm able to query the metric, and I can also see the metric value via OpenTelemetry/Datadog. The metric value there is correct (the expected value), and not the unexpected |
This code:
fmt.Println("+Inf", int64(math.Inf(1)))
fmt.Println("-Inf", int64(math.Inf(-1)))
fmt.Println("NaN", int64(math.NaN()))
prints
+Inf 9223372036854775807
-Inf -9223372036854775808
NaN 0
on Apple M1. On same Apple M1, but when compiling with
|
There are places in the HPA controller where such a conversion happens. One case that I was investigating is when using KEDA to target a custom resource, with TargetType If This On EKS, when we fixed status.replicas on the target resource, things started to work correctly for us. On GKE, we still see |
I've followed up on this issue. We've seen that the scaler's actual calculation is correct. However, the HPA displays the wrong values when queried. @pstibrany As you pointed out, the issue seems to be how the usage is calculated. I still have pending to try to use Looking at the source code in the method it turns out that even if number of replicas is 0, the replicas calculation is correct because in the end the Knowing this is the root cause granted me peace of mind, thank you! Now the remaining question is why That's assuming that GCP did not deploy their own custom version of the Autoscaler that introduced something else that could be interfering. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
I think this issue might be worth documenting |
I opened a case for this and my related discussion #6375. Case 55415916: HPA scaler has huge delay when coming from 0 replica deployment. Let's see if this gets us somewhere... |
Report
KEDA is unable to retrieve metrics correctly from a ScaledObject/ScaleTarget using a Kafka trigger when deployed to a GKE cluster (It works locally)
Expected Behavior
When HPA calculates the current metric value, it should not return -9223372036854775808m, but a valid Kafka lag.
Actual Behavior
When the Kafka ScaledObject is deployed to GKE:
Steps to Reproduce the Problem
Logs from KEDA operator
There is no error or warning in the KEDA operator logs.
KEDA Version
2.13.1
Kubernetes Version
1.27
Platform
Google Cloud
Scaler Details
Kafka
Anything else?
No response