'bigquery' key error when running compare-reports fails to produce diff summary #932

Open
Stochastic-Squirrel opened this issue Dec 6, 2023 · 5 comments

Comments

@Stochastic-Squirrel

Stochastic-Squirrel commented Dec 6, 2023

First of all, I am really enjoying this tool! Unfortunately I have come across this bug which is blocking a rollout to the wider team so I am hoping that there is a quick fix!

Describe the bug
When comparing two piperider reports, a warning "bigquery" is returned and no comparison summary is generated.

$ piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report
────────────────────────────── Comparison report ───────────────────────────────
Selected reports:
  Base:   /target_prod_profile/run.json
  Target: /outputs/pre-release-20231206213745/run.json
Warning:
'bigquery'
Got problem to generate changeset.
Comparison report: 
/data_warehouse/comparison_report/index.html

Reproduce
Unfortunately, I cannot provide the manifest jsons but I will try my best to describe the issue and steps taken.

  1. I generate a run report on production, covering all models in prod
dbt compile -t prod
piperider run --dbt-target prod --debug --report-dir $CI_PROJECT_DIR
  2. When I open an MR, I run dbt run in staging on the modified and new models only
dbt run --fail-fast -t pre-release --select "state:modified.body+ state:modified.configs+ state:new+" --defer --state /target_prod
  3. I create the staging piperider run report only on the models above
piperider run --select "state:modified.body+ state:modified.configs+ state:new+" --state /target_prod --dbt-target pre-release --debug --report-dir $CI_PROJECT_DIR
  4. I then compare the two reports
piperider compare-reports --base /target_prod_profile/run.json --target /outputs/latest/run.json --output ./comparison_report

What's strange is that the diff summary report works for some MRs but not others. I have tried to find the common trait but I am unable to.
The MR and subsequent report comparison that works is a very simple test case where I add a text column to an existing table with a constant value e.g.

...
"apples" as fruit,
...

Looking at the comparison report, row and col information for both base (production) and target (staging) are recorded.

What I have tried

  • Tried locally and in my CI environment for the failed cases
  • Tested the run.json of the MR that did work above both locally and in the CI pipeline and it still works
  • Tried different combinations of adding cols, removing cols, enforcing data contracts etc
  • piperider diagnose passes

I attached a debugger and I tried to figure out what was going on.

  • The GraphDataChangeSet object fails to be created
  • This is due to the function call for list_changes_in_unique_id failing
  • This fails because, when it invokes the dbt task, the AdapterContainer's lookup_adapter function is called, which attempts to fetch the bigquery adapter using the 'bigquery' key. That lookup raises a KeyError for 'bigquery' (hence the warning printed on the console)
    def lookup_adapter(self, adapter_name: str) -> Adapter:
        return self.adapters[adapter_name]

Relevant code linked here
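
A note on why the console shows only 'bigquery' and nothing else: in Python, the string form of a KeyError is just the repr of the missing key. The sketch below uses a hypothetical stand-in class (FakeAdapterContainer is not dbt's actual AdapterContainer) that mirrors only the dictionary lookup quoted above, and it assumes the adapter was never registered in the container. It is meant to illustrate the symptom, not to pin down the root cause.

```python
class FakeAdapterContainer:
    """Hypothetical stand-in; mirrors only the dict lookup quoted above."""

    def __init__(self):
        # Assumption: nothing has registered a "bigquery" adapter yet.
        self.adapters = {}

    def lookup_adapter(self, adapter_name):
        return self.adapters[adapter_name]


def describe_lookup_failure(container, adapter_name):
    """Return the text a caller would print for the caught KeyError, or None."""
    try:
        container.lookup_adapter(adapter_name)
    except KeyError as exc:
        # str() of a KeyError is the repr of the missing key.
        return str(exc)
    return None


message = describe_lookup_failure(FakeAdapterContainer(), "bigquery")
print(message)
```

If that assumption holds, a bare 'bigquery' warning would point to the adapter not being registered at the moment the dbt task runs, rather than to a problem in the report contents themselves.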

Expected behavior
Diff summary reports for dbt models that have been changed.
Example output below from the successful MR comparison

Selected reports:
  Base:   /target_prod_profile/run.json
  Target: /outputs/pre-release-20231206165611/run.json
Impact Summary:
  Code Changes: added=0, removed=0, modified=2
  Resource Impact: potentially_impacted=7, assessed=7, skipped=0, impacted=5
Comparison report: 
/data_warehouse/comparison_report/index.html
Comparison summary: 
/data_warehouse/comparison_report/summary.md

Desktop (please complete the following information):

  • OS: macOS local env and ubuntu in CI env
  • Python Version 3.10
  • Version v0.41 piperider
  • dbt-core 1.7.2
  • dbt-bigquery 1.7.2
@DaveFlynn
Contributor

@Stochastic-Squirrel Thanks for reporting this issue. The team is taking a look and we'll get back to you shortly

@DaveFlynn
Contributor

Hi @Stochastic-Squirrel

We're having some difficulty in reproducing the issue. We'll continue to look into this.

It looks like you already ran PipeRider with --debug? If not, I would suggest that as a further debugging step. There may be a lot of output. If there's nothing sensitive in it, you could share it with us: either attach it here or email it to [email protected]

In the meantime, you could try out an Impact Report manually by using the DBT Manifest Analyzer in PipeRider Cloud:

  1. Log into PipeRider Cloud
  2. Click the Analyze tab
  3. Upload two manifest files into the Manifest Analyzer

I'll follow up on this when we have had more success reproducing the issue.

Thanks,

Dave

@Stochastic-Squirrel
Author

Stochastic-Squirrel commented Dec 7, 2023

Hi Dave, thanks for reaching out!

I have been using the --debug flag throughout and nothing is printed to the console. Unfortunately I haven't been able to glean any more info!
I tried the website now and I experienced a server error when attempting uploads
Sentry event id: 229c6e399c2b48d08f61682dff0ac69a
I have tried uploading a single manifest as well as two at the same time.

Unfortunately I don't feel comfortable sharing the manifests in their entirety. I'll try to cut them down to a minimal set.
What are the essential keys needed? I am thinking of maybe isolating a single table that does not contain any sensitive information.
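
One way to start cutting a report down, sketched here under assumptions: list the top-level keys of a parsed JSON report together with their types and serialized sizes, so it is easier to see which sections are large or likely to hold sensitive data. The helper name summarize_top_level and the sample keys below are made up for illustration; they do not describe the real layout of run.json.

```python
import json


def summarize_top_level(report):
    """Return {key: (type name, serialized size in characters)} for a parsed JSON object."""
    return {
        key: (type(value).__name__, len(json.dumps(value)))
        for key, value in report.items()
    }


# Made-up sample; a real report would be loaded with json.load(open(path)).
sample_report = {"example_section": {"t": {}}, "example_id": "abc"}
summary = summarize_top_level(sample_report)
for key, (type_name, size) in summary.items():
    print(key, type_name, size)
```

This only inspects structure; which keys compare-reports actually requires is still a question for the maintainers.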

However, I do have an update on what causes the error!
I experimented some more and I noticed that it is only when changing the model YAML files that the error occurs. Changes to the SQL models seem to work fine.

Here are some scenarios that cause the error

  • Changing the data type of an existing column in prod array<string> to array
  • Renaming an existing prod column, fruit, to fruits_and_vegetables in the staging model YAML
  • Switching enforcement of contracts on and off has no effect
  • Adding a new column to the model YAML in staging that is not in prod does raise an error

In my case, I have to tweak model yamls for SQL models that are affected by a change e.g. data type change.
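
To help narrow down which models carry YAML-level column changes between prod and staging, something like the following sketch could be run against the two dbt manifests. It assumes the usual manifest layout, where each entry under "nodes" has a "columns" mapping keyed by column name with an optional "data_type" field; the function name column_changes and the sample data are hypothetical.

```python
def column_changes(base_manifest, target_manifest):
    """Map unique_id -> column-level differences for nodes present in both manifests."""
    changes = {}
    base_nodes = base_manifest.get("nodes", {})
    target_nodes = target_manifest.get("nodes", {})
    for unique_id in sorted(base_nodes.keys() & target_nodes.keys()):
        base_cols = base_nodes[unique_id].get("columns", {})
        target_cols = target_nodes[unique_id].get("columns", {})
        added = sorted(target_cols.keys() - base_cols.keys())
        removed = sorted(base_cols.keys() - target_cols.keys())
        # Columns declared on both sides whose declared data_type differs.
        retyped = sorted(
            name
            for name in base_cols.keys() & target_cols.keys()
            if base_cols[name].get("data_type") != target_cols[name].get("data_type")
        )
        if added or removed or retyped:
            changes[unique_id] = {"added": added, "removed": removed, "retyped": retyped}
    return changes


# Made-up manifests imitating the column rename scenario described above.
base = {"nodes": {"model.demo.fruit_table": {"columns": {"fruit": {"data_type": "string"}}}}}
target = {
    "nodes": {
        "model.demo.fruit_table": {
            "columns": {"fruits_and_vegetables": {"data_type": "string"}}
        }
    }
}
result = column_changes(base, target)
print(result)
```

A rename shows up as one removed and one added column, since the manifest has no notion of renames. Real manifests would be parsed with json.load before being passed in.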

I hope this makes it a bit easier to recreate the issue on your end.

@popcornylu
Contributor

@Stochastic-Squirrel

Hi, thanks for the information. I understand there are privacy concerns about providing the real run.json. Would it be possible to provide the two run.json files from a dummy project instead? That would help us reproduce the issue.

Expected reproduction steps

  1. Download your two run.json files
  2. Run
    piperider compare-reports --base run_base.json --target run.json --output ./comparison_report
    
  3. Get error result:
    Selected reports:
      Base:   run_base.json
      Target: run.json
    Warning:
    'bigquery'
    Got problem to generate changeset.
    Comparison report: 
    /data_warehouse/comparison_report/index.html
    

@Stochastic-Squirrel
Author

Thanks @popcornylu. I'll try to reproduce this error in a dummy project as soon as I can!


3 participants