Add group property to records. Correct filter on file records. #169

Merged: 4 commits, merged Jul 16, 2024
6 changes: 6 additions & 0 deletions .changes/unreleased/Under the Hood-20240716-125753.yaml
```diff
@@ -0,0 +1,6 @@
+kind: Under the Hood
+body: Add record grouping mechanism to record/replay.
+time: 2024-07-16T12:57:53.434099-04:00
+custom:
+  Author: peterallenwebb
+  Issue: "169"
```
26 changes: 12 additions & 14 deletions dbt_common/clients/system.py
```diff
@@ -38,6 +38,15 @@
     c_bool = None
 
 
+def _record_path(path: str) -> bool:
+    return (
+        # TODO: The first check here obviates the next two checks but is probably too coarse?
+        "dbt/include" not in path
+        and "dbt/include/global_project" not in path
+        and "/plugins/postgres/dbt/include/" not in path
+    )
+
+
 @dataclasses.dataclass
 class FindMatchingParams:
     root_path: str
@@ -61,12 +70,7 @@ def __init__(
     def _include(self) -> bool:
         # Do not record or replay filesystem searches that were performed against
         # files which are actually part of dbt's implementation.
-        return (
-            "dbt/include"
-            not in self.root_path  # TODO: This actually obviates the next two checks but is probably too coarse?
-            and "dbt/include/global_project" not in self.root_path
-            and "/plugins/postgres/dbt/include/" not in self.root_path
-        )
+        return _record_path(self.root_path)
 
 
 @dataclasses.dataclass
@@ -150,10 +154,7 @@ class LoadFileParams:
     def _include(self) -> bool:
         # Do not record or replay file reads that were performed against files
         # which are actually part of dbt's implementation.
-        return (
-            "dbt/include/global_project" not in self.path
-            and "/plugins/postgres/dbt/include/" not in self.path
-        )
+        return _record_path(self.path)
 
 
 @dataclasses.dataclass
@@ -248,10 +249,7 @@ class WriteFileParams:
     def _include(self) -> bool:
         # Do not record or replay file reads that were performed against files
         # which are actually part of dbt's implementation.
-        return (
-            "dbt/include/global_project" not in self.path
-            and "/plugins/postgres/dbt/include/" not in self.path
-        )
+        return _record_path(self.path)
 
 
 @Recorder.register_record_type
```
9 changes: 5 additions & 4 deletions dbt_common/record.py
```diff
@@ -20,7 +20,8 @@ class Record:
     to the request, and the 'result' is what is returned."""
 
     params_cls: type
-    result_cls: Optional[type]
+    result_cls: Optional[type] = None
+    group: Optional[str] = None
 
     def __init__(self, params, result) -> None:
         self.params = params
@@ -309,9 +310,9 @@ def record_replay_wrapper(*args, **kwargs) -> Any:
             if recorder is None:
                 return func_to_record(*args, **kwargs)
 
-            if (
-                recorder.recorded_types is not None
-                and record_type.__name__ not in recorder.recorded_types
+            if recorder.recorded_types is not None and not (
+                record_type.__name__ in recorder.recorded_types
+                or record_type.group in recorder.recorded_types
             ):
                 return func_to_record(*args, **kwargs)
 
```
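With `group` on `Record`, the wrapper bypasses recording only when `recorded_types` is set and contains neither the record class's name nor its group. The self-contained sketch below reproduces that condition using stand-in classes and does not import dbt_common. `QueryRecord` comes from the guide's example, and its `Database` group follows the guide's description of adapter records. `FileReadRecord` and `is_recorded` are hypothetical names chosen for illustration.

```python
from typing import List, Optional


class Record:
    """Stand-in for dbt_common's Record, keeping only the attributes this PR touches."""

    result_cls: Optional[type] = None  # now defaults to None
    group: Optional[str] = None  # new: optional group name shared by related records


class QueryRecord(Record):
    # Assumed grouping, following the guide: database interaction records use "Database".
    group = "Database"


class FileReadRecord(Record):
    # Hypothetical record type that belongs to no group.
    pass


def is_recorded(record_type: type, recorded_types: Optional[List[str]]) -> bool:
    """Negation of the PR's skip condition: True if calls of this type get recorded."""
    if recorded_types is None:
        return True  # no filter configured, so every type is recorded
    return (
        record_type.__name__ in recorded_types
        or record_type.group in recorded_types
    )


if __name__ == "__main__":
    print(is_recorded(QueryRecord, ["Database"]))  # selected through its group
    print(is_recorded(FileReadRecord, ["Database"]))  # not selected
```

An ungrouped type has `group = None`, which never matches one of the configured name strings. Such a type can therefore only be selected by its class name.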
9 changes: 6 additions & 3 deletions docs/guides/record_replay.md
````diff
@@ -31,13 +31,16 @@ The final detail needed is to define the classes specified by `params_cls` and `
 With these decorators applied and classes defined, dbt is able to record all file access during a run, and mock out the accesses during replay, isolating dbt from actually loading files. At least it would if dbt only used this function for all file access, which is only mostly true. We hope to continue improving the usefulness of this mechanism by adding more recorded functions and routing more operations through them.
 
 ## How to record/replay
-If `DBT_RECORDER_MODE` is not `replay` or `record`, case insensitive, this is a no-op. Invalid values are ignored and do not throw exceptions.
-
-`DBT_RECODER_TYPES` is optional. It indicates which types to filter the results by and expects a list of strings values for the `Record` subclasses. Any invalid types will be ignored. `all` is a valid type and behaves the same as not populating the env var.
+Record/replay behavior is activated and configured via environment variables. When DBT_RECORDER_MODE is unset, the entire subsystem is disabled, and the decorators described above have no effect at all. This helps isolate the subsystem from core's application code, reducing the risk of performance impact or regressions.
+
+The record/replay subsystem is activated by setting the `DBT_RECORDER_MODE` variable to `replay`, `record`, or `diff`, case insensitive. Invalid values are ignored and do not throw exceptions.
+
+`DBT_RECODER_TYPES` is optional. It indicates which types to filter the results by and expects a list of strings values for the `Record` subclasses or groups of such classes. For example, all records of database/DWH interaction performed by adapters belong to the `Database` group. Any invalid type or group name will be ignored. `all` is a valid value for this variable and has the same effect as not populating the variable.
 
 
 ```bash
 DBT_RECORDER_MODE=record DBT_RECODER_TYPES=QueryRecord,GetEnvRecord dbt run
+DBT_RECORDER_MODE=record DBT_RECODER_TYPES=Database dbt run
 ```
 
 replay need the file to replay
````
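The updated guide describes two environment variables. The sketch below shows one possible reading of those rules. Mode names match case-insensitively. Unknown modes and unknown type or group names are silently dropped. `all`, or leaving the variable unset, means no filtering. The `RecorderMode` enum and the function names are hypothetical and are not dbt_common's API. Each function takes the raw string value instead of reading a particular variable, and the sketch assumes the types list is comma-separated, as in the guide's examples.

```python
from enum import Enum
from typing import Collection, List, Optional


class RecorderMode(Enum):
    """Hypothetical mirror of the three documented modes."""

    RECORD = "record"
    REPLAY = "replay"
    DIFF = "diff"


def parse_recorder_mode(value: Optional[str]) -> Optional[RecorderMode]:
    """Return the mode for a DBT_RECORDER_MODE value, or None to leave record/replay disabled."""
    if value is None:
        return None
    try:
        return RecorderMode(value.strip().lower())  # case-insensitive match on the value
    except ValueError:
        return None  # invalid values are ignored rather than raised


def parse_recorded_types(
    value: Optional[str], known_names: Collection[str]
) -> Optional[List[str]]:
    """Return the type/group names to record, or None to record every type."""
    if value is None or value.strip() == "all":
        return None
    names = [name.strip() for name in value.split(",") if name.strip()]
    # Assumption: unknown type or group names are dropped silently.
    return [name for name in names if name in known_names]


if __name__ == "__main__":
    known = {"QueryRecord", "GetEnvRecord", "Database"}
    print(parse_recorder_mode("Record"))
    print(parse_recorded_types("Database,NotARecord", known))
```

Returning `None` for `all` keeps that case distinct from a list whose entries were all invalid. The guide does not say how that last case is handled, so treat the empty-list result here as an assumption.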