Use deterministic job_ids to avoid retrying successful queries #977
Conversation
…ed in previous run
…ed in previous run
…mcknight/adap-924
dbt/adapters/bigquery/connections.py (Outdated)
def cancel_open(self) -> None:
    # previously just: pass
    names = []
    this_connection = self.get_if_exists()
    with self.lock:
        for thread_id, connection in self.thread_connections.items():
            if connection is this_connection:
                continue

            if connection.handle is not None and connection.state == ConnectionState.OPEN:
                client = connection.handle
                for job_id in self.jobs_by_thread.get(thread_id, []):

                    def fn():
                        return client.cancel_job(job_id)

                    self._retry_and_handle(msg=f"Cancel job: {job_id}", conn=connection, fn=fn)

                self.close(connection)

            if connection.name is not None:
                names.append(connection.name)
Do we need to return names here? If we do, we need to re-add it and change the function's return type to match what's in dbt-core on the SQLConnectionManager. Within dbt-bigquery the only place we call this is in unit tests, so that means modifying two unit tests to take the change into account.
All integration/functional tests are failing now.
Solved this part of it by not going through the defined client and just using conn.name.
@colin-rogers-dbt it's possible we may still have to use this. It seems to pass locally, but I'm not a fan of the interaction when you add a breakpoint around ln524.
…der to create unique job_id
…mcknight/adap-924
                self.close(connection)

            if connection.name is not None:
Do we specifically want all connections which are not this_connection? Or do we only want connections in which we cancelled jobs? In this current flow, a connection for which connection.state == ConnectionState.CLOSED will show up in names, which doesn't feel like an intuitive list to get from cancel_open.
If I backtrack cancel_open to core, it looks like one of the only places we call it is cancel_open_connections, which does make me think the desired result is that all closed connections are accounted for?
In other words we should only be returning the connections which we cancelled during this call, right?
            if connection is this_connection:
                continue

            if connection.handle is not None and connection.state == ConnectionState.OPEN:
This could be over-engineering, but I would consider putting the contents of this if block into its own method.
To be clear, are you referring to lines 308-321?
I'm referring to lines 310-318. Then this would look like:
names = []
this_connection = self.get_if_exists()
with self.lock:
    for thread_id, connection in self.thread_connections.items():
        if connection is this_connection:
            continue
        if connection.handle and connection.state == ConnectionState.OPEN:
            self.close_thread(thread_id, connection)  # or whatever name you choose
        if name := connection.name:
            names.append(name)
return names
Something else worth considering is whether we want to handle all of the threads within a connection. I don't know if there's more than one thread for a connection, but I feel like there is. If there's a connection with more than one thread, you'll close that connection in the second condition above when you get to the first thread. Then you'll skip past the second condition for every other thread, since connection.state should be closed at that point.
I think what you probably want is a list of job_ids by connection. Then for each connection you would cancel its jobs. Once all jobs are cancelled, close the connection.
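Roughly something like this (just a sketch, not a full implementation; it assumes jobs_by_thread, _retry_and_handle, and the connection attributes behave as they do in the current diff):

# sketch only: group tracked job ids by connection so a connection shared by
# several threads is only closed after all of its jobs have been cancelled
jobs_by_connection = {}  # id(connection) -> (connection, [job_ids])
for thread_id, connection in self.thread_connections.items():
    if connection is this_connection:
        continue
    entry = jobs_by_connection.setdefault(id(connection), (connection, []))
    entry[1].extend(self.jobs_by_thread.get(thread_id, []))

names = []
for connection, job_ids in jobs_by_connection.values():
    if connection.handle is not None and connection.state == ConnectionState.OPEN:
        client = connection.handle
        for job_id in job_ids:
            self._retry_and_handle(
                msg=f"Cancel job: {job_id}",
                conn=connection,
                fn=lambda job_id=job_id: client.cancel_job(job_id),
            )
        # only close once every job tracked for this connection is cancelled
        self.close(connection)
        # only report connections we actually cancelled/closed in this call
        if connection.name is not None:
            names.append(connection.name)
return names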
dbt/adapters/bigquery/connections.py (Outdated)
# build out deterministic_id
model_name = conn.credentials.schema  # schema name as model name is not
invocation_id = str(uuid.uuid4())
job_id = define_job_id(model_name, invocation_id)
I don't think uuid.uuid4() is deterministic, which means job_id is not either. Have you considered an md5 hash of sufficient attributes (model, connection name, etc.)?
Currently calling uuid directly as part of getting the unit tests swapped over for the functionality. I think the initial/current plan was to use the invocation_id we define via tracking in core (https://docs.getdbt.com/reference/dbt-jinja-functions/invocation_id), which is itself a uuid based on the docs.
Instead of using invocation_id (which we only sometimes have) should we just use the actual query text (which we have to have)?
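A sketch of what a deterministic id along those lines could look like, hashing the query text plus a couple of identifying attributes (the function and parameter names here are illustrative, not what the PR uses):

import hashlib


def define_job_id(sql: str, model_name: str, invocation_id: str) -> str:
    # same sql + model + invocation always produces the same job_id, so a
    # retry can look the existing job up instead of re-submitting the query
    raw = f"{model_name}|{invocation_id}|{sql}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()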
dbt/adapters/bigquery/jobs.py (Outdated)
def define_job_id(model_name, invocation_id):
    job_id = f"{model_name}_{invocation_id}"
What's the constraint on job_id? Is there a max length? Can all characters that go into a model name also be used in a job id?
Will definitely have to test this. I think all characters are fine since we should just be combining two strings, but the length may hit a limit.
I would want to make sure that we can submit a job_id to BQ with some weird characters. People put all kinds of things in their model names. An alternative is to hash the model_name so that it's only alpha-numeric.
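A sketch of that hashing idea (assuming BigQuery job ids only accept letters, digits, underscores, and dashes, which is worth double-checking against the docs; the helper name is illustrative):

import hashlib
import re


def sanitize_for_job_id(model_name: str) -> str:
    # keep a readable prefix but replace anything BigQuery might reject,
    # and append a short hash so two sanitized names can't collide
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", model_name)
    digest = hashlib.md5(model_name.encode("utf-8")).hexdigest()[:8]
    return f"{safe}_{digest}"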
+1. If we want to stick with uuid, we can just generate a deterministic one with uuid.uuid5.
def define_job_id(sql, invocation_id=None):
    if invocation_id:
        job_id = str(uuid.uuid5(invocation_id, sql))
    else:
        job_id = str(uuid.uuid5(_INVOCATION_ID, sql))
    job_id = job_id.replace("-", "_")
    return job_id
I would leverage a macro to let end users override that logic to make it unique across invocations if needed
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.
Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.
resolves #949
docs dbt-labs/docs.getdbt.com/#
Problem
Currently, if we experience a transient exception like RemoteDisconnected, we can sometimes end up re-running a query that has already been successfully kicked off.
Solution
On retry, poll by a deterministic job_id to see whether a successful job has already been kicked off.
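A rough sketch of the idea with the google-cloud-bigquery client (the helper name and the exact checks are illustrative, not necessarily what the PR implements):

from google.api_core.exceptions import NotFound
from google.cloud import bigquery


def submit_or_reuse(client: bigquery.Client, sql: str, job_id: str):
    # on retry, look up the deterministic job_id first: if a previous attempt
    # already submitted this query successfully, poll that job instead of
    # running the statement a second time
    try:
        existing = client.get_job(job_id)
        if existing.error_result is None:
            return existing
    except NotFound:
        pass
    return client.query(sql, job_id=job_id)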
Checklist