Existing Duck db file as a data source #2910
Comments
Hey, thanks for the request. The use case definitely makes sense to me.
Firstly, here's a way you might be able to get this working today – use
Secondly, I'm thinking about how we can better support your use case. If we added support for the following, do you think that would address your use case?
Third, do you have a hosting use case for your dashboards? If yes, can you share where you store/host the DuckDB
Oooo, I'm going to try your suggestion of using
I can see this being useful. It's still fuzzy in my mind how I'd like the overall workflow to go, but this meets the requirement of Person A writes
I would love this. Rill as a database client and also a dashboarding tool <3
Not yet! I'll probably try a few things while prototyping. Right now I think something low-tech would work best. Like a private S3 bucket. I think this would be a sweet spot between local analysis and full-bore data pipelines, and it would be hard for someone to screw up--worst case is they upload an invalid file; that still wouldn't corrupt existing data. Thank you for your quick response!
Coming back to say that the
@begelundmuller Would the This allows definition of data sources reading local data files, but with transforms in the database that Rill sees as tables.
I'd suggest a change where Rill throws an error rather than overwriting DuckDB objects for this approach.
Hey @kmatt, yes it would allow that, but if it's a view that queries files directly, you might run into performance issues since Rill issues a lot of queries and therefore benefits from data being ingested into DuckDB's native format (we use However, in the last release (0.32), we actually added support for free-form DuckDB SQL sources – I think they should address your use case. The syntax is:
# sources/my_source.yaml
type: duckdb
sql: "SELECT * FROM read_parquet('./data/my_data.parquet')" # can be any valid DuckDB SELECT statement
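If you have several queries to expose this way, the source files can be generated rather than written by hand. The following is only a minimal sketch, assuming the two-key layout shown above (`type: duckdb` plus `sql`); the helper name `write_duckdb_source` and the project directory argument are hypothetical and not part of Rill.

```python
import json
from pathlib import Path


def write_duckdb_source(project_dir: str, name: str, select_sql: str) -> Path:
    """Write sources/<name>.yaml holding a free-form DuckDB SQL source.

    Hypothetical helper: it only mirrors the two keys shown in the thread
    (type and sql) and does not validate the SQL.
    """
    sources_dir = Path(project_dir) / "sources"
    sources_dir.mkdir(parents=True, exist_ok=True)
    path = sources_dir / f"{name}.yaml"
    # json.dumps emits a double-quoted string, which is also a valid YAML
    # scalar, so single quotes inside the SQL need no extra escaping.
    path.write_text(
        f"type: duckdb\nsql: {json.dumps(select_sql)}\n", encoding="utf-8"
    )
    return path
```

Calling `write_duckdb_source(".", "my_source", "SELECT * FROM read_parquet('./data/my_data.parquet')")` would produce a file with the same two lines as the example above.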
Unfortunately we can't easily do this. Since code file definitions and the underlying data can change in-between Rill sessions, we don't always know if a table in DuckDB was created by a previous version of Rill or by the user. We keep some metadata about ingested data in an internal table ( What if we instead added support for creating tables in a custom database schema? For example, we could add a flag
Hey We just merged the changes that allow a DuckDB data file to be used as a source. This will be available in the next release, but it can already be used with the nightly as well (
Is your feature request related to a problem? Please describe.
I have existing duckdb files lying around (and also a program that takes a zip and emits a duck) that I would like to analyze with Rill.
Describe the solution you'd like
My ideal (albeit maybe naive) workflow would be:
Describe alternatives you've considered
As far as I can tell the only alternative is to use CSV/JSON/Parquet files, but this is not a realistic option. The major reason is that the value of going straight from zip to duck is codifying the data modeling of files in the zip, especially complex MAP(VARCHAR, STRUCT(...)) types for deeply-nested JSON.
Additional context
Rill looks amazing! Maybe I'm just holding it wrong with this feature request, but I really want the dashboarding features without necessarily using the data modeling features.