Arrow IPC stream format doesn't require schema anymore #61

Draft
wants to merge 1 commit into
base: master
from

Conversation

abdolence
Owner

No description provided.

@abdolence abdolence linked an issue Aug 14, 2024 that may be closed by this pull request
@domoritz

This is great. Exactly what I need.

@abdolence
Owner Author

@domoritz
Good if it works for you, though I'm still curious whether you are actually able to read this stream. Looking at many APIs (Python's, for example), they all suspiciously require a schema when you write a stream.

@domoritz

Let me explain my use case. I have a database query that generates a stream of Arrow buffers in https://github.com/uwdata/mosaic/blob/711f0c3ffd658b8f1425aa88e1e4f8c430169935/packages/duckdb-server-rust/src/db.rs#L102-L131. I use this method later in an Axum handler to create a response. Since I have to send the batch stream across thread boundaries, I don't have easy access to the schema and only get the stream of Arrow batches. Each batch has a schema, so I thought this library could just use the schema of the batches (e.g. the first batch) instead of me having to find a way to pass the schema through. So yes, I do need a schema in the byte stream so I can read it, but I was hoping there was some way for me to get it from the batches.
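To make the "use the schema of the first batch" idea concrete, here is a minimal sketch using only the standard library. `Schema`, `Batch`, and `schema_from_first_batch` are hypothetical stand-ins introduced for illustration; they are not types or functions from this library or from the `arrow` crate. The sketch assumes, as the comment above says, that every batch carries a shared reference to its schema.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow schema (just a list of column names).
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

/// Hypothetical stand-in for a record batch; it carries a shared
/// reference to its schema, as assumed in the discussion.
#[derive(Debug, Clone)]
pub struct Batch {
    pub schema: Arc<Schema>,
    pub rows: usize,
}

/// Peek at the first batch to recover the schema without consuming it.
/// Returns `None` for an empty stream, which has no batch to borrow from.
pub fn schema_from_first_batch<I>(
    batches: I,
) -> Option<(Arc<Schema>, std::iter::Peekable<I>)>
where
    I: Iterator<Item = Batch>,
{
    let mut batches = batches.peekable();
    // Cloning the Arc is cheap: only the reference count is bumped.
    let schema = batches.peek()?.schema.clone();
    Some((schema, batches))
}
```

A caller would write the returned schema first and then drain the returned iterator, which still yields the first batch. The `None` case is the weak spot of this approach: a query that returns no batches leaves nothing to take a schema from.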

@abdolence
Owner Author

> So yes, I do need a schema in the byte stream

OK, then this change won't work for you, since the schema is basically absent from the stream?
Reading the spec, it seems fine, but I'm concerned that people wouldn't be able to read it with other libraries.

@domoritz

I see. I'll figure out a way to get the schema then.
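One possible way to pass the schema across the thread boundary, sketched with the standard library only, is to send it as the first message on the same channel as the batches. This is an assumption about how it could be done, not what either participant implemented. `Schema`, `Batch`, `Message`, and `spawn_query` are hypothetical names used for illustration.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an Arrow schema.
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

/// Hypothetical stand-in for a record batch.
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub rows: usize,
}

/// Messages sent from the query thread: the schema once, then the batches.
#[derive(Debug, PartialEq)]
pub enum Message {
    Schema(Arc<Schema>),
    Batch(Batch),
}

/// Run a (fake) query on another thread and return the receiving end.
pub fn spawn_query(schema: Arc<Schema>, row_counts: Vec<usize>) -> mpsc::Receiver<Message> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Send the schema first so it is known even when there are no batches.
        if tx.send(Message::Schema(schema)).is_err() {
            return; // the receiver was dropped
        }
        for rows in row_counts {
            if tx.send(Message::Batch(Batch { rows })).is_err() {
                return;
            }
        }
    });
    rx
}
```

The receiving side reads the first message as the schema and treats everything after it as batches. Unlike peeking at the first batch, this also covers a result with zero batches.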

@abdolence abdolence marked this pull request as draft August 15, 2024 15:41
Development

Successfully merging this pull request may close these issues.

Arrow streaming without schema
2 participants