retry wait for result independently from job creation #1042

github-christophe-oudar · 2023-12-05T19:39:39Z

resolves #1045

Problem

When an error occurs either on job creation or while waiting for the result, the combined "job creation + wait for result" step is retried as a whole.
The underlying wait-for-result step polls for the result every X seconds, so it can "fail" on a transient problem such as a network error, and that failure leads to retrying the whole job.
If the job isn't idempotent, this leads to a bug (which is what happened to a coworker).
If the job is idempotent, you have likely wasted slot time and BQ resources.

Solution

To solve that, let's split the step into two functions that are each retried on their own, so that a failure while waiting retries access to the already-running job instead of creating a new one.
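As a rough illustration of the idea (not the code in this PR), here is a minimal sketch where job creation and waiting for the result are retried separately. `FakeClient`, `retry`, and `run_query` are hypothetical names invented for this example and are not part of dbt-bigquery or the BigQuery client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Hypothetical helper: call fn until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("attempts must be at least 1")


class FakeClient:
    """Stand-in for a BigQuery client (assumption, not the real API)."""

    def __init__(self, poll_failures: int) -> None:
        self.jobs_created = 0
        self._poll_failures = poll_failures

    def create_job(self, sql: str) -> str:
        # Each call starts a brand new job, which is what we want to avoid repeating.
        self.jobs_created += 1
        return f"job-{self.jobs_created}"

    def wait_for_result(self, job_id: str) -> str:
        # Simulate transient network errors while polling for the result.
        if self._poll_failures > 0:
            self._poll_failures -= 1
            raise ConnectionError("transient network error while polling")
        return f"result of {job_id}"


def run_query(client: FakeClient, sql: str) -> str:
    # Step 1: job creation, retried on its own.
    job_id = retry(lambda: client.create_job(sql))
    # Step 2: waiting for the result, retried on its own against the same job id.
    return retry(lambda: client.wait_for_result(job_id))


client = FakeClient(poll_failures=2)
result = run_query(client, "select 1")
```

In this sketch the two polling failures are absorbed by the second retry loop, so `client.jobs_created` stays at 1 and the result belongs to the first job. If both steps were wrapped in a single retry, each polling failure would create another job.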

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

github-christophe-oudar · 2023-12-05T19:44:51Z

dbt/adapters/bigquery/connections.py

@@ -787,7 +789,7 @@ def reopen_conn_on_error(error):
                target=fn,
                predicate=_ErrorCounter(self.get_job_retries(conn)).count_error,
                sleep_generator=self._retry_generator(),
-                deadline=self.get_job_retry_deadline_seconds(conn),
+                timeout=self.get_job_retry_deadline_seconds(conn),


The previous field (deadline) is deprecated in the driver in favor of timeout; the behavior is the same.

McKnight-42 · 2023-12-06T20:20:53Z

@github-christophe-oudar wanted to point you to this open PR I have as I think these are related #977

github-christophe-oudar · 2023-12-06T21:07:43Z

@McKnight-42 thanks for pointing it out 👍
That solution looks interesting too, but I wonder whether my approach would be a simpler fix for the problem.
Wouldn't the query fail if you provide the id of an existing running job?
I see that PR has been stale for a month; what's the status?
It would be great to move forward on a solution that properly retries on the job status, as it looks like I'm not the only person affected.

McKnight-42 · 2023-12-06T21:38:49Z

@github-christophe-oudar The ticket is still being worked on. I had to set it aside for a bit due to some other work and traveling, but it's on my board. I plan to set aside some time to dig into these PRs and problems a little more over the next few days and will keep you updated.

github-christophe-oudar · 2023-12-06T22:03:42Z

Ok, great to know!
I think both approaches could be combined.
The sooner we can deal with that bug, the better 🙏

github-actions · 2024-07-15T01:56:17Z

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

github-actions · 2024-07-23T01:54:23Z

Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.

cla-bot bot added the cla:yes label Dec 5, 2023

Kayrnt force-pushed the fix-retry-job branch 4 times, most recently from 36051fb to ccb141d Compare December 6, 2023 16:53

retry wait for result independently from job creation

120fce1

Kayrnt force-pushed the fix-retry-job branch from ccb141d to 120fce1 Compare December 6, 2023 16:56

github-christophe-oudar marked this pull request as ready for review December 6, 2023 16:57

github-christophe-oudar requested a review from a team as a code owner December 6, 2023 16:57

McKnight-42 self-requested a review December 6, 2023 21:39

McKnight-42 mentioned this pull request Dec 20, 2023

[ADAP-1062] [Bug] Retries on wait for result step is recreating the whole job #1045

Closed

2 tasks

McKnight-42 self-assigned this Jan 16, 2024

github-actions bot added the Stale label Jul 15, 2024

github-actions bot closed this Jul 23, 2024
