'bigquery' key error when running compare-reports fails to produce diff summary #932

Stochastic-Squirrel · 2023-12-06T22:15:20Z

First of all, I am really enjoying this tool! Unfortunately I have come across this bug which is blocking a rollout to the wider team so I am hoping that there is a quick fix!

Describe the bug
When comparing two piperider reports, a warning "bigquery" is returned and no comparison summary is generated.

$ piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report
────────────────────────────── Comparison report ───────────────────────────────
Selected reports:
  Base:   /target_prod_profile/run.json
  Target: /outputs/pre-release-20231206213745/run.json
Warning:
'bigquery'
Got problem to generate changeset.
Comparison report: 
/data_warehouse/comparison_report/index.html
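
Because index.html is still written when the changeset fails, a CI job may want to flag this case explicitly. A minimal sketch, assuming the console output has been captured as text and that the message wording matches the log above; the helper name `changeset_failed` is hypothetical and not part of PipeRider, and I have not checked what exit code the command returns in this case.

```python
# Marker line copied from the console output shown above.
CHANGESET_FAILURE_MARKER = "Got problem to generate changeset."


def changeset_failed(console_output: str) -> bool:
    """Return True if the captured output contains the changeset failure line."""
    return any(
        line.strip() == CHANGESET_FAILURE_MARKER
        for line in console_output.splitlines()
    )


# Abbreviated copies of the two outputs quoted in this issue.
failed_output = """Selected reports:
Warning:
'bigquery'
Got problem to generate changeset.
Comparison report:
"""
ok_output = """Selected reports:
Impact Summary:
  Code Changes: added=0, removed=0, modified=2
Comparison report:
"""

failed_flag = changeset_failed(failed_output)
ok_flag = changeset_failed(ok_output)
print(failed_flag, ok_flag)
```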

Reproduce
Unfortunately, I cannot provide the manifest jsons but I will try my best to describe the issue and steps taken.

I generate a run report on production, covering all models in prod

dbt compile -t prod
piperider run --dbt-target prod --debug --report-dir $CI_PROJECT_DIR

When I open an MR, I run dbt in staging on the modified and new models only

dbt run --fail-fast -t pre-release --select "state:modified.body+ state:modified.configs+ state:new+" --defer --state /target_prod

I create the staging piperider run report only on the models above

piperider run --select "state:modified.body+ state:modified.configs+ state:new+" --state /target_prod --dbt-target pre-release --debug --report-dir $CI_PROJECT_DIR

I then compare the two reports

piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report

What's strange is that the diff summary report works for some MRs but not others. I have tried to find the common trait but I am unable to.
The MR and subsequent report comparison that works is a very simple test case where I add a text column to an existing table with a constant value e.g.

...
"apples" as fruit,
...

Looking at the comparison report, row and col information for both base (production) and target (staging) are recorded.

What I have tried

Tried locally and in my CI environment for the failed cases
Tested the run.json of the MR that did work above both locally and in the CI pipeline and it still works
Tried different combinations of adding cols, removing cols, enforcing data contracts etc
piperider diagnose passes

I attached a debugger and I tried to figure out what was going on.

The GraphDataChangeSet object fails to be created
This is due to the function call for list_changes_in_unique_id failing
This fails because, when it invokes the dbt task, the AdapterContainer's lookup_adapter function is called, which attempts to fetch the bigquery adapter using the 'bigquery' key. That lookup raises a KeyError for 'bigquery' (hence the warning printed on the console)

    def lookup_adapter(self, adapter_name: str) -> Adapter:
        return self.adapters[adapter_name]

Relevant code linked here
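
The bare 'bigquery' text in the warning is consistent with a dictionary lookup on a container that has no adapter registered under that key. A minimal sketch of that behaviour, using a hypothetical `AdapterRegistry` class as a stand-in (it is not dbt's actual AdapterContainer, and whether the adapter really is unregistered at that point in PipeRider is an assumption on my part):

```python
class AdapterRegistry:
    """Hypothetical stand-in for an adapter container; not the real dbt class."""

    def __init__(self) -> None:
        # No adapters registered yet, which is the assumed failure condition.
        self.adapters: dict[str, object] = {}

    def lookup_adapter(self, adapter_name: str) -> object:
        # Same shape as the snippet above: a plain dict lookup with no fallback.
        return self.adapters[adapter_name]


def describe_lookup_failure(registry: AdapterRegistry, adapter_name: str) -> str:
    """Return the text a caller would see if it printed the raised exception."""
    try:
        registry.lookup_adapter(adapter_name)
    except KeyError as exc:
        # str() of a KeyError is the repr of the missing key, quotes included.
        return str(exc)
    return "ok"


message = describe_lookup_failure(AdapterRegistry(), "bigquery")
print(message)
```

If the exception is caught and printed as-is, the only visible text is the quoted key, which matches the console warning.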

Expected behavior
Diff summary reports for dbt models that have been changed.
Example output below from the successful MR comparison

Selected reports:
  Base:   /target_prod_profile/run.json
  Target: /outputs/pre-release-20231206165611/run.json
Impact Summary:
  Code Changes: added=0, removed=0, modified=2
  Resource Impact: potentially_impacted=7, assessed=7, skipped=0, impacted=5
Comparison report: 
/data_warehouse/comparison_report/index.html
Comparison summary: 
/data_warehouse/comparison_report/summary.md

Desktop (please complete the following information):

OS: macOS local env and ubuntu in CI env
Python Version 3.10
Version v0.41 piperider
dbt-core 1.7.2
dbt-bigquery 1.7.2


DaveFlynn · 2023-12-07T00:41:42Z

@Stochastic-Squirrel Thanks for reporting this issue. The team is taking a look and we'll get back to you shortly

DaveFlynn · 2023-12-07T04:21:36Z

Hi @Stochastic-Squirrel

We're having some difficulty in reproducing the issue. We'll continue to look into this.

It looks like you already ran PipeRider with --debug? If not, I would suggest that as a further debugging step. There may be a lot of output. If there's nothing sensitive in it, you could share it with us: either attach it here, or email it to [email protected]

In the meantime, you could try out an Impact Report manually by using the DBT Manifest Analyzer in PipeRider Cloud:

Log into PipeRider Cloud
Click the Analyze tab
Upload two manifest files into the Manifest Analyzer

I'll follow up on this when we have had more success reproducing the issue.

Thanks,

Dave

Stochastic-Squirrel · 2023-12-07T13:02:46Z

Hi Dave, thanks for reaching out!

I have been using the --debug flag throughout and nothing is printed to the console. Unfortunately I haven't been able to glean any more info!
I tried the website now and I experienced a server error when attempting uploads
Sentry event id: 229c6e399c2b48d08f61682dff0ac69a
I have tried uploading a single manifest as well as two at the same time.

Unfortunately I don't feel comfortable sharing the manifests in their entirety. I'll try to cut them down to a minimal set.
What are the essential keys needed? I am thinking of maybe isolating a single table that does not contain any sensitive information.

However, I do have an update on what causes the error!
I experimented some more and I noticed that it is only when changing the model YAML files that the error occurs. Changes to the SQL models seem to work fine.

Here are some scenarios that cause the error

Changing the data type of an existing column in prod array<string> to array
Renaming an existing column in prod fruit to fruits_and_vegetables in staging in the model yaml
Switching enforcement of contracts on and off has no effect
Adding a new column to the model YAML in staging that is not in prod does raise an error

In my case, I have to tweak model yamls for SQL models that are affected by a change e.g. data type change.
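
To check which models in an MR have YAML column changes of this kind, the two manifests can be compared directly. A minimal sketch, assuming the usual dbt manifest.json layout where `nodes` maps a unique_id to a node whose `columns` entry maps column names to dicts with a `data_type` field; the function names and the sample manifests below are made up for illustration.

```python
from typing import Any


def column_signature(node: dict[str, Any]) -> dict[str, Any]:
    """Map column name -> declared data_type for one manifest node."""
    columns = node.get("columns") or {}
    return {name: (col or {}).get("data_type") for name, col in columns.items()}


def nodes_with_yaml_column_changes(
    base: dict[str, Any], target: dict[str, Any]
) -> list[str]:
    """Return unique_ids present in both manifests whose declared columns differ."""
    base_nodes = base.get("nodes", {})
    target_nodes = target.get("nodes", {})
    changed = []
    for unique_id in sorted(base_nodes.keys() & target_nodes.keys()):
        if column_signature(base_nodes[unique_id]) != column_signature(
            target_nodes[unique_id]
        ):
            changed.append(unique_id)
    return changed


# Made-up manifests: one model has a column renamed in YAML, one is unchanged.
base_manifest = {
    "nodes": {
        "model.demo.fruit_sales": {"columns": {"fruit": {"data_type": "string"}}},
        "model.demo.orders": {"columns": {"id": {"data_type": "int64"}}},
    }
}
target_manifest = {
    "nodes": {
        "model.demo.fruit_sales": {
            "columns": {"fruits_and_vegetables": {"data_type": "string"}}
        },
        "model.demo.orders": {"columns": {"id": {"data_type": "int64"}}},
    }
}

changed = nodes_with_yaml_column_changes(base_manifest, target_manifest)
print(changed)
```

With real files, the two dicts would come from `json.load` on each manifest.json. Models listed here are the ones I would expect to trigger the error, based on the scenarios above.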

I hope this makes it a bit easier to recreate the issue on your end.

popcornylu · 2023-12-15T02:24:29Z

@Stochastic-Squirrel

Hi, thanks for the information. I understand there are privacy concerns about providing the real run.json. Would it be possible to provide the two run.json files from a dummy project instead? That would help us reproduce the issue.

Expected reproduction steps

Download your two run.json files

Run

piperider compare-reports --base run_base.json --target run.json --output ./comparison_report

Get error result:

Selected reports:
  Base:   run_base.json
  Target: run.json
Warning:
'bigquery'
Got problem to generate changeset.
Comparison report: 
/data_warehouse/comparison_report/index.html

Stochastic-Squirrel · 2023-12-20T06:37:38Z

thanks @popcornylu. I'll try to reproduce this error in a dummy project as soon as I can!
