-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recompute TTL values on each get
#34
Conversation
TTL values must be recomputed on each `get` action instead of being fixed to a constant value for the lifetime of the cached object. Prior to this change, the constant TTL value `X` would cause kafka-connect to continually reschedule connector restarts `X` milliseconds in the future, effectively ensuring that the connector never actually restarts. With this change, the reschedule actions have a timestamp that shrinks until the TTL is reached. Before (note contant restart time of ~30 mins): ``` 2021-12-03 20:19:65,634 INFO || Scheduling a restart of connector debezium_infra in 1799900 ms [org.apache.kafka.connect.runtime.WorkerConfigTransformer] 2021-12-03 20:18:25,143 INFO || Scheduling a restart of connector debezium_infra in 1799900 ms [org.apache.kafka.connect.runtime.WorkerConfigTransformer] ``` After (note restart time decreases ~40 seconds and the log messages are ~40 seconds apart): ``` 2021-12-03 21:09:24,228 INFO || Scheduling a restart of connector debezium_infra in 2063263 ms [org.apache.kafka.connect.runtime.WorkerConfigTransformer] 2021-12-03 21:08:45,858 INFO || Scheduling a restart of connector debezium_infra in 2101945 ms [org.apache.kafka.connect.runtime.WorkerConfigTransformer] ```
Hey @kppullin . Please accept our apologies for leaving this PR sitting here going stale. I just have one question, and that is how to ensure the automatic scheduling:
I'm trying to write tests for some of these scenarios to ensure they work, but have been unable to ensure that the automatic restart ever happens (even with your fixes). From the documentation:
Are there any specific configuration you had to supply to ensure the connector was restarted on the TTL change? (Other than the |
@davidsloan since it's been so long I've lost all the details and context on this change. I looked through our configs and do not see anything that'd relate or be required for the modification in this PR to work. FWIW this change, along with #32, has been working well for a couple years now in our deployed kafka-connect clusters. However, there may be a separate need to refresh the "parent" vault credentials which expire after 32 days by default... this remains to be confirmed and something we may have seen just a couple times as we typically restart our clusters more frequently than 32 days. |
Hi @kppullin I appreciate the feedback above! I think the majority of these changes will be covered under #47 and #50 but we did not update the Azure secret provider yet. If you would like to rebase your PR for Azure then it will be accepted, otherwise we will try to get around to fixing this functionality in the future. |
@davidsloan - I'm unsure when I'll next have time to revisit this (as well as #32 and #33), but would hope to find time in the next few months. We are actiely looking into resolving the issue mentioned above where the parent/login token expires after 32 days in our spring boot services and then will move on to fixing this on the kafka-connect & secret-provider side next, at which point will likely include revisiting the various PRs I've pending. |
Your Azure work has been rebased against the latest code in PR #63 . Thanks again for the contribution! |
TTL values must be recomputed on each
get
action insteadof being fixed to a constant value for the lifetime of the cached object.
Prior to this change, the constant TTL value
X
would cause kafka-connectto continually reschedule connector restarts
X
milliseconds in the future,effectively ensuring that the connector never actually restarts.
With this change, the reschedule actions have a timestamp that shrinks until
the TTL is reached.
Before (note contant restart time of ~30 mins):
After (note restart time decreases ~40 seconds and the log messages are ~40 seconds apart):