Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recompute TTL values on each get #34

Closed
wants to merge 3 commits into from
Closed

Conversation

kppullin
Copy link

@kppullin kppullin commented Dec 3, 2021

TTL values must be recomputed on each get action instead
of being fixed to a constant value for the lifetime of the cached object.

Prior to this change, the constant TTL value X would cause kafka-connect
to continually reschedule connector restarts X milliseconds in the future,
effectively ensuring that the connector never actually restarts.

With this change, the reschedule actions have a timestamp that shrinks until
the TTL is reached.

Before (note contant restart time of ~30 mins):

2021-12-03 20:19:65,634 INFO   ||  Scheduling a restart of connector debezium_infra in 1799900 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]
2021-12-03 20:18:25,143 INFO   ||  Scheduling a restart of connector debezium_infra in 1799900 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]

After (note restart time decreases ~40 seconds and the log messages are ~40 seconds apart):

2021-12-03 21:09:24,228 INFO   ||  Scheduling a restart of connector debezium_infra in 2063263 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]
2021-12-03 21:08:45,858 INFO   ||  Scheduling a restart of connector debezium_infra in 2101945 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]

TTL values must be recomputed on each `get` action instead
of being fixed to a constant value for the lifetime of the cached object.

Prior to this change, the constant TTL value `X` would cause kafka-connect
to continually reschedule connector restarts `X` milliseconds in the future,
effectively ensuring that the connector never actually restarts.

With this change, the reschedule actions have a timestamp that shrinks until
the TTL is reached.

Before (note contant restart time of ~30 mins):
```
2021-12-03 20:19:65,634 INFO   ||  Scheduling a restart of connector debezium_infra in 1799900 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]
2021-12-03 20:18:25,143 INFO   ||  Scheduling a restart of connector debezium_infra in 1799900 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]
```

After (note restart time decreases ~40 seconds and the log messages are ~40 seconds apart):
```
2021-12-03 21:09:24,228 INFO   ||  Scheduling a restart of connector debezium_infra in 2063263 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]
2021-12-03 21:08:45,858 INFO   ||  Scheduling a restart of connector debezium_infra in 2101945 ms   [org.apache.kafka.connect.runtime.WorkerConfigTransformer]
```
@davidsloan
Copy link
Contributor

Hey @kppullin . Please accept our apologies for leaving this PR sitting here going stale.

I just have one question, and that is how to ensure the automatic scheduling:

2021-12-03 21:09:24,228 INFO || Scheduling a restart of connector debezium_infra in 2063263 ms [org.apache.kafka.connect.runtime.WorkerConfigTransformer]

I'm trying to write tests for some of these scenarios to ensure they work, but have been unable to ensure that the automatic restart ever happens (even with your fixes).

From the documentation:

A value of 'restart' indicates that Connect should restart/reload the connector with the updated configuration properties.The restart may actually be scheduled in the future if the external configuration provider indicates that a configuration value will expire in the future.

Are there any specific configuration you had to supply to ensure the connector was restarted on the TTL change? (Other than the config.action.reload that is referred to above?

@kppullin
Copy link
Author

@davidsloan since it's been so long I've lost all the details and context on this change. I looked through our configs and do not see anything that'd relate or be required for the modification in this PR to work.

FWIW this change, along with #32, has been working well for a couple years now in our deployed kafka-connect clusters. However, there may be a separate need to refresh the "parent" vault credentials which expire after 32 days by default... this remains to be confirmed and something we may have seen just a couple times as we typically restart our clusters more frequently than 32 days.

@davidsloan
Copy link
Contributor

Hi @kppullin I appreciate the feedback above!

I think the majority of these changes will be covered under #47 and #50 but we did not update the Azure secret provider yet. If you would like to rebase your PR for Azure then it will be accepted, otherwise we will try to get around to fixing this functionality in the future.

@kppullin
Copy link
Author

@davidsloan - I'm unsure when I'll next have time to revisit this (as well as #32 and #33), but would hope to find time in the next few months. We are actiely looking into resolving the issue mentioned above where the parent/login token expires after 32 days in our spring boot services and then will move on to fixing this on the kafka-connect & secret-provider side next, at which point will likely include revisiting the various PRs I've pending.

@davidsloan
Copy link
Contributor

Your Azure work has been rebased against the latest code in PR #63 .

Thanks again for the contribution!

@davidsloan davidsloan closed this Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants