
80 clean up left over job status progress and result queues #63

Merged

Conversation

cwang39403
Contributor


if auto_dead_letter_after_ttl is not None:
    message_ttl = auto_dead_letter_after_ttl
    queue_ttl = auto_dead_letter_after_ttl * 2
Contributor Author


I removed JOB_QUEUES_REMOVAL_BUFFER (previously set too small) and set queue_ttl to twice auto_dead_letter_after_ttl.

Why: if a worker runs for a long time (almost, but slightly less than, auto_dead_letter_after_ttl), the job result message may be returned to a queue that is about to expire. Once the queue expires, the message is dropped silently.
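For context, here is a minimal sketch of how the two TTLs might be applied when declaring the result queue. The use of pika and the queue/exchange names are illustrative assumptions, not necessarily this project's actual code:

```python
import pika

# Example value: 48 hours, expressed in milliseconds (assumption for illustration).
AUTO_DEAD_LETTER_AFTER_TTL_MS = 48 * 60 * 60 * 1000

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

message_ttl = AUTO_DEAD_LETTER_AFTER_TTL_MS
queue_ttl = AUTO_DEAD_LETTER_AFTER_TTL_MS * 2  # queue outlives the message TTL

channel.queue_declare(
    queue="job.result",  # hypothetical queue name
    arguments={
        # Message is dead-lettered if not consumed within message_ttl.
        "x-message-ttl": message_ttl,
        # Queue itself is deleted after queue_ttl without consumers; set to 2x
        # so the result can still be dead-lettered before the queue expires.
        "x-expires": queue_ttl,
        # Hypothetical dead-letter exchange monitored by the orchestrator.
        "x-dead-letter-exchange": "job.dead-letter",
    },
)
```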

Member


If I understand the rationale correctly, then no SDK will have connected to this queue for a period of time. Let's call this time queue_offline_time. If we set queue_ttl == auto_dead_letter_after_ttl + 30 seconds, then the queue will be deleted after the SDK has been offline for that period. We are assuming that a period of 48 hours (auto_dead_letter_after_ttl) is sufficient to say that an error scenario has occurred and that no SDK will retrieve any further messages from this queue. Wouldn't it then follow that auto_dead_letter_after_ttl is already long enough for the queue to expire and any results to be dropped silently?

I am making the assumption here that connecting to a queue is sufficient to not trigger the TTL. Let's say the queue TTL is set to 2 hours. Do you know whether RabbitMQ also triggers the queue TTL and expires the queue if it has been empty for 2 hours but a consumer was connected the whole time?

Contributor Author


Let's say the SDK never reconnects. In that case we would want the job result message to at least be dead-lettered and logged by the orchestrator, and the queues to be cleaned up in the end. The issue raised here is that this is not what happens: the job result message can instead be dropped silently because the queue has already expired.

  1. The SDK submits a job and declares the queues (assume auto_dead_letter_after_ttl = 1 hr and queue_ttl = auto_dead_letter_after_ttl + 30 seconds).
  2. A worker starts processing the job (assume it takes 50 min to complete).
  3. Meanwhile, the SDK goes offline 1 min after the job submission due to errors. The queue TTL starts counting down, so the queue will expire in 1 hr + 30 sec.
  4. The worker finishes the job (after 50 min) and the result is returned to the queue. The job result would be dead-lettered after 1 hr, but the job queue expires before that happens (see the timing sketch below).
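A back-of-the-envelope check of that race, using the numbers from the scenario above (variable names are just for illustration):

```python
# All times in minutes, measured from job submission.
auto_dead_letter_after_ttl = 60
queue_ttl = auto_dead_letter_after_ttl + 0.5            # old scheme: TTL + 30 s

sdk_offline_at = 1                                       # SDK drops 1 min after submit
result_published_at = 50                                 # worker finishes after 50 min

queue_expires_at = sdk_offline_at + queue_ttl            # 61.5 min
dead_letter_at = result_published_at + auto_dead_letter_after_ttl  # 110 min

# The queue is gone before the result can be dead-lettered,
# so the result is dropped silently.
assert queue_expires_at < dead_letter_at
```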

"I am making the assumption here that connecting to a queue is sufficient to not trigger the TTL." -> You are right on this. Queue TTL is activated and starts counting down when there is no consumer (SDK offline). When the queue has a consumer (SDK reconnects), the Queue TTL is basically inactive.

Member


Ahh gotcha! Does publishing the message to the queue not refresh the Queue TTL? Or are only consumer-side subscriptions counted towards refreshing a queue TTL?

Contributor Author


I had a misconception that republishing a message to the queue resets the Queue TTL, but that turned out not to be the case...
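For anyone who wants to verify this against a local broker, a small sketch (assuming pika and a default RabbitMQ on localhost; the queue name is illustrative). The publish does not count as "use", so the queue still expires:

```python
import time

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Queue expires after 5 seconds without consumers, basic.get, or redeclaration.
channel.queue_declare(queue="ttl-demo", arguments={"x-expires": 5000})
channel.basic_publish(exchange="", routing_key="ttl-demo", body=b"result")

time.sleep(6)  # longer than the queue TTL; no consumer ever attached

try:
    # Passive declare only checks existence; it fails if the queue expired.
    channel.queue_declare(queue="ttl-demo", passive=True)
except pika.exceptions.ChannelClosedByBroker:
    print("Queue expired even though a message was published to it.")
```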
