The add_test_edges() function is called during the dbt build command, and inserts edges into the execution graph which are meant to ensure that models downstream from a node will not run until all the tests on that node have passed.
The function is slow in certain projects, and recent data from the field show that it inflates the number of edges in the graph by a factor of six. It is slow enough that it often shows up in performance profiles, but is even more problematic in terms of memory consumption, as memory use is high enough to cause OOM crashes.
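To make the edge inflation concrete, here is a minimal sketch of the behavior described above. It is not dbt's actual code: the function name `naive_test_edges`, the plain `(parent, child)` edge-set representation, and the node names are all assumptions chosen for illustration. The sketch adds an edge from each test to every non-test node that sits downstream of all the nodes the test depends on.

```python
from collections import defaultdict


def descendants(children, start):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def naive_test_edges(edges, tests):
    """Hypothetical stand-in for the edge insertion described in this issue.

    edges: set of (parent, child) pairs in the execution graph.
    tests: set of node names that are tests.
    Returns the extra (test, node) edges to add.
    """
    children, parents = defaultdict(set), defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
        parents[child].add(parent)

    extra = set()
    for test in tests:
        deps = parents[test]
        if not deps:
            continue
        # A node qualifies only if it is downstream of every dependency of the test.
        downstream = set.intersection(*(descendants(children, d) for d in deps))
        extra.update((test, node) for node in downstream if node not in tests)
    return extra


# model1 -> model2 -> model3, with test1 on model1 and test2 on model2.
edges = {
    ("model1", "model2"),
    ("model2", "model3"),
    ("model1", "test1"),
    ("model2", "test2"),
}
print(sorted(naive_test_edges(edges, {"test1", "test2"})))
```

Under this sketch, a linear chain of n models with one test each gains n(n-1)/2 extra edges, because every test is connected to every model after it. That quadratic growth is one plausible way to arrive at the kind of inflation reported above.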
Acceptance criteria
If possible, implement a new version of this function which adds edges to achieve the desired test-dependency behavior but inserts fewer edges and runs more quickly.
Add a new behavior flag which causes the new function to be used, while retaining the old function on the default code path.
Follow up by gathering data about the relative performance of the two implementations and monitoring for regressions.
Suggested Tests
Existing tests should suffice, but we should add additional tests to reduce the risks associated with the new implementation.
Impact to Other Teams
None.
Will backports be required?
No.
Context
No response
As @peterallenwebb noted, a source of complexity here is that add_test_edges currently accounts for tests that depend on multiple models, not just one. It may be difficult to take a similar approach of running test nodes "just in time" after a model completes during handle_job_queue, because some tests depend on multiple models that must all finish before the test can run.
One thought here is to remove the transitive edges, e.g. test1 -> model 3 (add_test_edges).
@gshank mentioned that we could also perform this operation only for the selected parts of the DAG, or skip it when people select tests in the build command.
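The idea of dropping transitive edges, together with the multi-model caveat raised above, can be sketched as follows. This is one possible approach under stated assumptions, not a proposed or existing dbt implementation: the name `frontier_test_edges` and the edge-set representation are hypothetical. A test is connected only to the "frontier" of qualifying nodes, meaning nodes that are downstream of all of the test's dependencies and have no parent that also qualifies. Every other qualifying node is still ordered after the test through a path that starts at a frontier node.

```python
from collections import defaultdict


def frontier_test_edges(edges, tests):
    """Hypothetical reduced variant: add (test, node) edges only where needed.

    edges: set of (parent, child) pairs in the execution graph.
    tests: set of node names that are tests.
    """
    children, parents = defaultdict(set), defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
        parents[child].add(parent)

    def descendants(start):
        """Return every node reachable from `start`, excluding `start`."""
        seen, stack = set(), [start]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    extra = set()
    for test in tests:
        deps = parents[test]
        if not deps:
            continue
        # Non-test nodes downstream of every dependency of the test.
        qualifying = set.intersection(*(descendants(d) for d in deps)) - tests
        for node in qualifying:
            # If a parent already qualifies, the ordering is implied transitively.
            if not (parents[node] & qualifying):
                extra.add((test, node))
    return extra


# Single-model tests: the transitive edge test1 -> model3 is no longer added.
chain = {
    ("model1", "model2"),
    ("model2", "model3"),
    ("model1", "test1"),
    ("model2", "test2"),
}
print(sorted(frontier_test_edges(chain, {"test1", "test2"})))

# Multi-model test: t depends on a and b, which only meet again at z.
diamond = {("a", "x"), ("x", "z"), ("b", "y"), ("y", "z"), ("a", "t"), ("b", "t")}
print(sorted(frontier_test_edges(diamond, {"t"})))
```

The second example shows why looking only at the direct children of a tested model is not enough for multi-model tests: neither x nor y is downstream of both a and b, so the first node that must wait for t is z. Note that this sketch still walks descendants for every test, so it reduces the number of inserted edges but would not by itself address the running time.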
ChenyuLInx changed the title from "Improve the Performance Characteristics of add_test_edges()" to "[SPIKE+] Improve the Performance Characteristics of add_test_edges()" on Nov 4, 2024.