Explicit connection reuse #92
Hello @timrburnham,

If it is about saving the time to reconnect to the database, connection pooling does enable this. This sounds more like you want everything happening in the same transaction? Also, while declaring temporary tables is currently possible, I would need to understand very precisely why you want these commands on the same connection. Maybe there is something here which still fits in the scope of the library. Alternatively you could also try:
As I see it, every piece of software has ownership issues. It is just that in Rust the compiler tells you about them.
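A minimal sketch of that distinction, assuming the `DSN=DuckDB;` data source used in the examples below: pooling only avoids the cost of re-establishing physical connections, it does not pin successive calls to one specific connection.

import arrow_odbc

# Process-wide ODBC connection pooling: saves the cost of re-establishing
# physical connections, but does not pin successive calls to one connection.
arrow_odbc.enable_odbc_connection_pooling()

# Each call checks a connection out of the pool and returns it when done.
# The second call may receive a different connection than the first, so
# connection-scoped state (temporary tables, session settings) can be lost.
arrow_odbc.read_arrow_batches_from_odbc("select 1;", "DSN=DuckDB;")
arrow_odbc.read_arrow_batches_from_odbc("select 2;", "DSN=DuckDB;")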
Thanks for the quick analysis! I don't think I need transaction control, just the same session--I really thought it might work with connection pooling. I'm not clear what's being reset. In this test case, I'm using the DuckDB driver under unixODBC, since it's simple to set up and use. For reference, my ~/.odbc.ini contains:

[DuckDB]
Driver=DuckDB Driver
Database=:memory:

and my ~/.odbcinst.ini:

[DuckDB Driver]
driver = /home/tim/libduckdb_odbc.so

Here's a non-working case, using arrow-odbc:

import arrow_odbc
import pyarrow as pa

arrow_odbc.enable_odbc_connection_pooling()

# create temp table, DuckDB syntax
sql = """\
create temporary table temp_keys (
    k varchar(128)
)
on commit preserve rows
;"""
arrow_odbc.read_arrow_batches_from_odbc(sql, 'DSN=DuckDB;')

# insert a bunch of rows from Arrow Table
filter = pa.Table.from_pydict({"k": ["key1", "key2", "key3"]})
reader = pa.RecordBatchReader.from_batches(filter.schema, filter.to_batches())
arrow_odbc.insert_into_table(
    reader=reader,
    chunk_size=1000,
    table="temp_keys",
    connection_string='DSN=DuckDB;',
)

# join against the temp table on a (possibly different) pooled connection
sql = """\
select v from
values ('key1', 1), ('key6', 6) as fake_big_table(k, v)
inner join temp_keys
using (k)
;"""
results = arrow_odbc.read_arrow_batches_from_odbc(sql, 'DSN=DuckDB;')

Results:
And after pip installing the native DuckDB driver, here's a working version where we explicitly reuse the same connection:

import duckdb
import pyarrow as pa

duck = duckdb.connect(':memory:')
sql = """\
create temporary table temp_keys (
k varchar(128)
)
on commit preserve rows
;"""
duck.sql(sql)
filter = pa.Table.from_pydict({"k": ["key1", "key2", "key3"]})
duck.sql("insert into temp_keys(k) select k from filter;")
sql = """\
select *
from values ('key1', 1), ('key6', 6) as fake_big_table(k, v)
inner join temp_keys
using (k)
;"""
results = duck.sql(sql).to_arrow_table() Results:
The general idea is: if I have a million rows in a table and only want a thousand of them, and I already know the keys, it's easy to upload them into a temp table and join against my big table. You can get pretty far binding parameters to WHERE clauses, but joins are better in some cases.
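For comparison, a minimal sketch of the parameter-binding alternative mentioned above, assuming the same `DSN=DuckDB;` data source and arrow-odbc's `parameters` argument; it works well for a handful of keys, but needs one placeholder per key, which is where the temp-table join starts to win.

import arrow_odbc

keys = ["key1", "key2", "key3"]
placeholders = ", ".join("?" for _ in keys)  # one '?' per key

sql = f"""\
select v
from (values ('key1', 1), ('key6', 6)) as fake_big_table(k, v)
where k in ({placeholders})
;"""

reader = arrow_odbc.read_arrow_batches_from_odbc(
    sql,
    "DSN=DuckDB;",
    parameters=keys,  # bound positionally to the '?' placeholders
)
for batch in reader:
    print(batch)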
Hello @timrburnham, thanks for the detailed response and sorry for the delayed answer. The use case is legit, however it is quite removed from the library's current scope.

Although working, this is of course not the intended use. I'll give it some thought. However, please note that, at least during the month of May, I won't have any time to act on this, so please be aware that you will need a workaround / different solution in the meantime.

Best, Markus
Thanks very much, Markus! Using permanent tables is fine as a workaround, of course; I'm only thinking of the ergonomics. I really, really like using arrow-odbc, thanks for a fantastic package!
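A sketch of the permanent-table workaround under discussion, mirroring the arrow-odbc calls above; `DSN=Warehouse;` and `big_table` are placeholders. Note it only helps if the DSN points at a persistent database, since with `Database=:memory:` every new connection sees its own empty database anyway.

import uuid

import arrow_odbc
import pyarrow as pa

CONN = "DSN=Warehouse;"  # placeholder DSN for a persistent (file- or server-backed) database

# A uniquely named permanent table stands in for the temporary one, so
# concurrent jobs do not collide and the table survives across connections.
table_name = f"temp_keys_{uuid.uuid4().hex}"

arrow_odbc.read_arrow_batches_from_odbc(
    f"create table {table_name} (k varchar(128));", CONN
)

filter = pa.Table.from_pydict({"k": ["key1", "key2", "key3"]})
reader = pa.RecordBatchReader.from_batches(filter.schema, filter.to_batches())
arrow_odbc.insert_into_table(
    reader=reader, chunk_size=1000, table=table_name, connection_string=CONN
)

results = arrow_odbc.read_arrow_batches_from_odbc(
    f"select v from big_table inner join {table_name} using (k);", CONN
)
for batch in results:
    print(batch)

# Drop the scratch table once the results have been consumed.
arrow_odbc.read_arrow_batches_from_odbc(f"drop table {table_name};", CONN)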
This is mostly a note to myself: I have not decided yet whether to take this into the scope of the library.
Note to myself:
Going forward we aim to combine 1. and 2., i.e. we offer an explicit connection object. However, internally it would be modeled as
This sounds ideal! Would it be practical for the Python Connection object to directly offer an
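Purely as a hypothetical sketch of what such an explicit connection object could look like; none of these names exist in arrow-odbc today, and `connect`, `execute`, `read_arrow_batches`, and the method form of `insert_into_table` are illustrative only.

import pyarrow as pa

import arrow_odbc

# Hypothetical API -- every name on `conn` below is illustrative, not real.
conn = arrow_odbc.connect("DSN=DuckDB;")  # one ODBC connection, held open

conn.execute("create temporary table temp_keys (k varchar(128));")

filter = pa.Table.from_pydict({"k": ["key1", "key2", "key3"]})
reader = pa.RecordBatchReader.from_batches(filter.schema, filter.to_batches())
conn.insert_into_table(reader=reader, table="temp_keys", chunk_size=1000)

# Same connection throughout, so the temporary table is still visible here.
results = conn.read_arrow_batches("select * from temp_keys;")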
It would be convenient to be able to re-use specific database connections for explicitly sequential tasks. For instance, declare a temporary table, insert some rows from an Arrow table, and join against that session table in a subsequent select.
Things I have tried unsuccessfully:
I assume there are Rust ownership issues that make this difficult, but it would be amazing for ETL jobs which compute intermediate data sets. Right now I'm creating tables dynamically and dropping them after.