View data summary as part of dbt run. #3979
Comments
Really cool ideas @akashshah59! There's a lot you're touching on here, I'll try to break some of it down.
Option 1: Show table samples in the terminal while dbt is running
This follows your proposal for
There's actually a way to do this in dbt today, but it's exclusively limited to
In practice, this isn't so useful, but it is pretty cool. There's a related feature request (#3265) to do this for test failures, which is conceivable in conjunction with the
Option 2: Use dbt to run interactive queries against your database
This is more like your proposal of a dedicated
Followed by:
Or, to follow your example, a handy-dandy CLI command that shortcuts the above:
That's exactly the kind of development workflow that the dbt Server exists to enable. The initial version (
Agree! The existing Server sort of does this, by storing results as an agate table (dataframe) and then returning them as a JSON object. I think this kind of workflow would be most compelling in a notebook application, where the notebook persists SQL cell results as data frames (available to R/python context), and a dbt Server makes it possible to run dbt-SQL in that notebook, with access to your full project context.
coda
Of course, if you're really committed to doing it all from the command line, and avoiding a data warehouse console, there are some cool SQL-from-CLI tools (e.g. whale) :)
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.
Describe the feature
dbt has proved to be a great tool for organizing our data science project ETLs. However, to view the results of our intermediate transformations, or even the final result of an entire pipeline, data scientists have to log in to Snowflake and get a summary of the data, using a LIMIT clause or computing some basic statistics on their features. Further, as data scientists, we often want to experiment with a feature and view the results then and there, and this requires us to go back into the data warehouse UI again and again.
While the current dbt project structure allows us scientists to focus more on code rather than boilerplate create statements, what would be additionally useful is a feature or flag that displays intermediate results from views or tables created as part of the dbt run command.
An additional thought would be to return the intermediate results as a dataframe-like structure, allowing us to manipulate or play around with that data after performing a dbt run. As I visualize it: dbt view-data <model_name>
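To make the "dataframe-like structure" idea concrete, here is a minimal sketch of the kind of summary a data scientist might want from a small preview of a model. It uses only the Python standard library. The function name `summarize`, the sample column names, and the sample rows are all hypothetical and are not part of dbt.

```python
from statistics import mean


def summarize(column_names, rows):
    """Return per-column summary statistics for a small result set.

    `rows` is a sequence of tuples, one value per column, as a database
    cursor would typically return them. This is an illustrative helper,
    not a dbt API.
    """
    summary = {}
    for index, name in enumerate(column_names):
        values = [row[index] for row in rows]
        non_null = [v for v in values if v is not None]
        stats = {"count": len(values), "nulls": len(values) - len(non_null)}
        # Treat a column as numeric only if every non-null value is an
        # int or float; bool is excluded even though it subclasses int.
        numeric = [
            v for v in non_null
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric and len(numeric) == len(non_null):
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        summary[name] = stats
    return summary


# Hypothetical rows standing in for what a preview query might return.
columns = ["customer_id", "order_total"]
rows = [(1, 20.0), (2, None), (3, 40.0)]
result = summarize(columns, rows)
```

In this sketch, non-numeric columns only report `count` and `nulls`, which keeps the output safe to print for any model.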
Describe alternatives you've considered
dbt run --view-data True
dbt view-data <model_name>
Additional context
This feature should not be database-specific; however, we use dbt with Snowflake and it would be great to start there.
I understand that this might be difficult to implement for larger analytics jobs, especially ones that involve huge data volumes, perhaps gigabytes. However, we were hoping to get around that using some sampling strategy.
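One way the sampling strategy could look, as a rough sketch: build a preview query that caps the number of rows returned and, optionally, samples a percentage of the table first. The helper name `build_preview_query` and its parameters are hypothetical. The `sample (<percent>)` clause assumes Snowflake; other warehouses would need their own syntax.

```python
def build_preview_query(relation, limit=10, sample_percent=None):
    """Build a SQL string that previews a bounded slice of `relation`.

    `relation` is assumed to be a trusted, already-quoted relation name
    (for example one rendered by dbt), not user input.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    query = f"select * from {relation}"
    if sample_percent is not None:
        if not 0 < sample_percent <= 100:
            raise ValueError("sample_percent must be in the range (0, 100]")
        # Assumption: Snowflake-style row sampling by percentage.
        query += f" sample ({sample_percent})"
    return f"{query} limit {limit}"


# Hypothetical relation name, used for illustration only.
preview_sql = build_preview_query("analytics.orders", limit=5, sample_percent=1)
```

Sampling before applying the limit keeps the preview from always showing the same first rows, while the limit keeps terminal output short regardless of table size.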
Who will this benefit?
Data scientists, engineers, and everyone who makes frequent changes to their dbt models.
Are you interested in contributing this feature?
Yes.