make microbatch models skippable #11020

Merged
merged 3 commits into from
Nov 21, 2024

@MichelleArk MichelleArk commented Nov 21, 2024

Resolves #11021

Problem

  1. Skipping a microbatch model fails with TypeError: object of type 'Field' has no len()
  2. Reaching that code path at all indicates the microbatch model is still partially running even though it should be skipped

Solution

  1. Stop using field(default_factory=...) outside a dataclass (MicrobatchModelRunner)
  2. Return results early if initial microbatch run results in a skipped result
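
The second solution step can be sketched roughly as follows. This is not dbt's actual runner code: `Status`, `BatchResult`, and `run_batches` are hypothetical names invented for illustration, and the only assumption carried over from the description is that a skipped first result should end the run before any further batches are attempted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    # Hypothetical statuses, not dbt's real status enum.
    SUCCESS = "success"
    SKIPPED = "skipped"


@dataclass
class BatchResult:
    batch_idx: int
    status: Status


def run_batches(
    batch_count: int,
    run_batch: Callable[[int], BatchResult],
) -> List[BatchResult]:
    """Run batch 0 first and stop immediately if it comes back skipped."""
    first = run_batch(0)
    if first.status is Status.SKIPPED:
        # Early return: no further batch work is attempted for a skipped model.
        return [first]
    return [first] + [run_batch(i) for i in range(1, batch_count)]


calls: List[int] = []


def always_skipped(idx: int) -> BatchResult:
    # Records each invocation so we can check how many batches actually ran.
    calls.append(idx)
    return BatchResult(idx, Status.SKIPPED)


results = run_batches(3, always_skipped)
```

In this sketch only the first batch is ever invoked when the result is skipped, which is the behavior the description asks for.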

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

🎩

Invoking dbt with ['run']
16:21:04  Running with dbt=1.9.0-b4
16:21:04  Registered adapter: postgres=1.9.0-b1
16:21:04  Unable to do partial parsing because saved manifest not found. Starting full parse.
16:21:05  Found 2 models, 429 macros
16:21:05  
16:21:05  Concurrency: 4 threads (target='default')
16:21:05  
16:21:05  1 of 2 START sql table model test17322060642773181458_test_microbatch.input_model  [RUN]
16:21:05  1 of 2 ERROR creating sql table model test17322060642773181458_test_microbatch.input_model  [ERROR in 0.05s]
16:21:05  2 of 2 SKIP relation test17322060642773181458_test_microbatch.microbatch_model . [SKIP]
16:21:05  
16:21:05  Finished running 1 incremental model, 1 table model in 0 hours 0 minutes and 0.55 seconds (0.55s).
16:21:05  
16:21:05  Completed with 1 error, 0 partial successs, and 0 warnings:
16:21:05  
16:21:05    Database Error in model input_model (models/input_model.sql)
  column "invalid" does not exist
  LINE 14: select invalid as event_time
                  ^
  compiled code at target/run/test/models/input_model.sql
16:21:05  

(extra '.' in skip result is a general issue, not just for microbatch models)


Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@@ -335,7 +335,7 @@ def execute(self, model, manifest):

class MicrobatchModelRunner(ModelRunner):
    batch_idx: Optional[int] = None
    batches: Dict[int, BatchType] = field(default_factory=dict)

Contributor Author

Usage of field(default_factory=...) was invalid outside of a dataclass
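
A minimal standalone reproduction of why this fails, using simplified stand-in classes (`PlainRunner` and `DataclassRunner` are illustrative names, not dbt classes). In a plain class nothing processes `field(...)`, so the attribute is the `dataclasses.Field` object itself; only the `@dataclass` decorator turns it into a per-instance default.

```python
from dataclasses import Field, dataclass, field
from typing import Dict


class PlainRunner:
    # Not a dataclass: field() is never processed, so this class attribute
    # is the Field object itself rather than a dict.
    batches: Dict[int, str] = field(default_factory=dict)


@dataclass
class DataclassRunner:
    # Inside a dataclass, the generated __init__ calls dict() per instance.
    batches: Dict[int, str] = field(default_factory=dict)


plain = PlainRunner()
is_field = isinstance(plain.batches, Field)

try:
    len(plain.batches)
    error_message = ""
except TypeError as exc:
    # Same error text as reported in the linked issue.
    error_message = str(exc)

fresh = DataclassRunner()
```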

Contributor

Have we verified that having it set as batches: Dict[int, BatchType] = {} doesn't make it so two instances of MicrobatchModelRunner share the same underlying dict?

Contributor

Did a test. We do indeed need to do something different, because a dict assigned at class level is shared by every instance:

>>> class MyClass:
...   batches = {}
... 
>>> instance1 = MyClass()
>>> instance2 = MyClass()
>>> instance1.batches
{}
>>> instance2.batches
{}
>>> instance1.batches['catch_phrase'] = "I like cats!"
>>> instance1.batches
{'catch_phrase': 'I like cats!'}
>>> instance2.batches
{'catch_phrase': 'I like cats!'}

Contributor Author

Good call. Fixed with an __init__ override:

>>> class MyClass:
...   def __init__(self):
...     self.batches = {}
...
>>> instance1 = MyClass()
>>> instance2 = MyClass()
>>> instance1.batches
{}
>>> instance2.batches
{}
>>> instance1.batches['catch_phrase'] = "I like cats!"
>>> instance1.batches
{'catch_phrase': 'I like cats!'}
>>> instance2.batches
{}
>>>
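
For a runner that subclasses another class, the same pattern would also need to call the parent constructor. The sketch below assumes a hypothetical `BaseRunner` with a single `name` argument; the real `ModelRunner` signature is not shown in this thread, so treat the names and arguments as placeholders.

```python
from typing import Dict, Optional


class BaseRunner:
    """Hypothetical stand-in for a parent runner that takes constructor args."""

    def __init__(self, name: str) -> None:
        self.name = name


class BatchRunner(BaseRunner):
    # An immutable default such as None is safe as a class attribute.
    batch_idx: Optional[int] = None

    def __init__(self, name: str) -> None:
        super().__init__(name)
        # A new dict per instance, so runners never share batch state.
        self.batches: Dict[int, str] = {}


first = BatchRunner("a")
second = BatchRunner("b")
first.batches[0] = "2024-11-21"
```

Only the mutable attribute needs to move into `__init__`; the `None` default can stay at class level because it cannot be mutated in place.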

codecov bot commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.08%. Comparing base (fd6ec71) to head (052d699).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11020      +/-   ##
==========================================
- Coverage   89.14%   89.08%   -0.07%     
==========================================
  Files         183      183              
  Lines       23760    23764       +4     
==========================================
- Hits        21182    21171      -11     
- Misses       2578     2593      +15     
Flag          Coverage Δ
integration   86.39% <100.00%> (-0.14%) ⬇️
unit          62.17% <75.00%> (+0.04%) ⬆️

Components          Coverage Δ
Unit Tests          62.17% <75.00%> (+0.04%) ⬆️
Integration Tests   86.39% <100.00%> (-0.14%) ⬇️

@MichelleArk MichelleArk marked this pull request as ready for review November 21, 2024 16:29
@MichelleArk MichelleArk requested a review from a team as a code owner November 21, 2024 16:29
@QMalcolm QMalcolm left a comment

:shipit:

@MichelleArk MichelleArk merged commit a42303c into main Nov 21, 2024
54 of 55 checks passed
@MichelleArk MichelleArk deleted the skippable-microbatch-model branch November 21, 2024 17:40
Successfully merging this pull request may close these issues.

[Bug] Skipping microbatch model fails with typeError: object of type 'Field' has no len()