Implement `standardized` and `clean` stage dbt model generation for TIGER-taskflows #226

MattTriano · 2024-12-18T16:08:47Z

At present, the TIGER taskflow (shown below, in the first image) stops after ingesting a dataset into the `data_raw` schema. As far as I know, TIGER datasets are released as static vintages that don't change over time. That means the logic I use for Socrata taskflows (capture all distinct record versions in `data_raw`, then deduplicate in `clean` by selecting the latest version of each record) shouldn't produce a different record set. That's why I was originally fine with the TIGER taskflow leaving these datasets in the `data_raw` stage.

I now think a consistent data-flow pattern across taskflows is better, and I appreciate being able to standardize column names and types in the `standardized` stage. So I want to add tasks that generate the `_standardized` and `_clean` model files and then run them, as the Socrata-taskflow task_group (shown in the second image) does.
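A minimal sketch of what the model-file generation step might look like is below. It is not the project's actual implementation; every name here (`make_standardized_model`, `make_clean_model`, `write_models`, the `tiger_tracts_2022` table, the `ingested_at` version column) is hypothetical. It assumes a PostgreSQL warehouse (for the `::` casts and `DISTINCT ON`) and a dbt source named `data_raw`. The `clean` model uses `DISTINCT ON` as one way to "select the latest record-version per key"; the Socrata taskflow may implement that deduplication differently.

```python
from pathlib import Path


def make_standardized_model(table_name: str, columns: list[tuple[str, str, str]]) -> str:
    """Build SQL for a hypothetical `<table>_standardized` dbt model.

    `columns` holds (raw_name, standardized_name, sql_type) tuples; each raw
    column is cast to its type and renamed (PostgreSQL `::` cast syntax).
    """
    select_lines = ",\n".join(
        f"    {raw}::{sql_type} AS {std}" for raw, std, sql_type in columns
    )
    return (
        "{{ config(materialized='view') }}\n\n"
        "SELECT\n"
        f"{select_lines}\n"
        # Doubled braces in the f-string emit literal dbt Jinja braces.
        f"FROM {{{{ source('data_raw', '{table_name}') }}}}\n"
    )


def make_clean_model(table_name: str, key_cols: list[str], version_col: str) -> str:
    """Build SQL for a hypothetical `<table>_clean` dbt model.

    Keeps one row per key: the row with the greatest `version_col` value
    (PostgreSQL `DISTINCT ON` keeps the first row of each key group in ORDER BY order).
    """
    keys = ", ".join(key_cols)
    return (
        "{{ config(materialized='table') }}\n\n"
        f"SELECT DISTINCT ON ({keys}) *\n"
        f"FROM {{{{ ref('{table_name}_standardized') }}}}\n"
        f"ORDER BY {keys}, {version_col} DESC\n"
    )


def write_models(
    models_dir: Path,
    table_name: str,
    columns: list[tuple[str, str, str]],
    key_cols: list[str],
    version_col: str,
) -> list[Path]:
    """Write both model files into `models_dir` and return their paths."""
    models_dir.mkdir(parents=True, exist_ok=True)
    std_path = models_dir / f"{table_name}_standardized.sql"
    clean_path = models_dir / f"{table_name}_clean.sql"
    std_path.write_text(make_standardized_model(table_name, columns))
    clean_path.write_text(make_clean_model(table_name, key_cols, version_col))
    return [std_path, clean_path]


if __name__ == "__main__":
    # Hypothetical TIGER tract columns; `ingested_at` stands in for whatever
    # column records when each record version was captured in data_raw.
    cols = [
        ("geoid", "geoid", "text"),
        ("aland", "land_area_sq_m", "bigint"),
        ("ingested_at", "ingested_at", "timestamptz"),
    ]
    print(make_standardized_model("tiger_tracts_2022", cols))
    print(make_clean_model("tiger_tracts_2022", ["geoid"], "ingested_at"))
```

Because TIGER vintages are static, the `clean` step here should return the same rows as `standardized` apart from exact duplicates; keeping it mainly preserves the same `data_raw` → `standardized` → `clean` shape used for Socrata datasets.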
