Request: offer async, generator, and async generator variants #6

Open
Goldziher opened this issue Jun 19, 2024 · 4 comments
Comments

@Goldziher
Goldziher commented Jun 19, 2024

Hi there!

Thanks for this neat library. I'm giving it a go.

It would be great to have an async version of the chunkerify function, plus two variants that return a generator and an async generator.

Use cases:

  • Async evaluation suits non-blocking situations, for example chunking dynamically inside a web request, where a blocking (sync) call can in some cases impact the backend service as a whole. It could also allow for a concurrent (not parallel) version of chunking.
  • Returning a generator allows evaluating chunks incrementally and executing code in between, for example in a for loop.
  • Returning an async generator offers the same within an async context.

The simplest (but not performant) way to implement the async logic is to execute the sync version using something like anyio.to_thread.run_sync: https://anyio.readthedocs.io/en/stable/threads.html.
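
A minimal sketch of that thread-offloading workaround. To keep it dependency-free it uses the standard library's `asyncio.to_thread` (Python 3.9+) in place of `anyio.to_thread.run_sync`; the idea is the same. `split_words` and `chunk_async` are hypothetical names, with `split_words` standing in for a real chunker.

```python
import asyncio
from typing import Callable


def split_words(text: str, chunk_size: int = 3) -> list[str]:
    """Hypothetical stand-in for a synchronous chunker."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


async def chunk_async(chunker: Callable[[str], list[str]], text: str) -> list[str]:
    """Run a synchronous chunker in a worker thread so the event loop stays free."""
    # asyncio.to_thread hands the call to a thread pool and awaits its result.
    return await asyncio.to_thread(chunker, text)


async def main() -> None:
    chunks = await chunk_async(split_words, "a b c d")
    print(chunks)


asyncio.run(main())
```

This keeps the event loop responsive while the chunker runs, but in CPython a CPU-bound chunker still competes for the GIL, so it adds thread hand-off overhead without making the chunking itself any faster.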

@umarbutler
Owner

Offering a generator chunker and perhaps even support for lazy chunking is something I’m open to. I’ll start work on that shortly.

With regard to offering an asynchronous generator, I’m not too sure what value there would be in that when there isn’t anything I’m aware of in my chunker that is IO-bound. And seeing as synchronous functions and generators are already callable within asynchronous environments, making chunkers asynchronous would only seem to add more overhead. If there’s something I’m missing here, however, please let me know.

@Goldziher
Author

Using an async iterator / generator allows for streaming the source rather than loading it all into memory.
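
To illustrate the streaming idea, here is a sketch of an async generator that consumes an async stream of texts and yields chunks as each text arrives. All names (`split_words`, `fake_text_stream`, `aiter_chunks`, `collect`) are hypothetical, and the stream is simulated; a real source might be a network response or a file read in pieces.

```python
import asyncio
from typing import AsyncIterator, Callable


def split_words(text: str, chunk_size: int = 3) -> list[str]:
    """Hypothetical stand-in for a synchronous chunker."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


async def fake_text_stream() -> AsyncIterator[str]:
    """Simulated async source; each text arrives after yielding to the event loop."""
    for text in ("a b c d", "e f"):
        await asyncio.sleep(0)
        yield text


async def aiter_chunks(
    chunker: Callable[[str], list[str]], texts: AsyncIterator[str]
) -> AsyncIterator[str]:
    """Yield chunks as texts arrive, holding only one text in memory at a time."""
    async for text in texts:
        # `yield from` is not allowed in async generators, so loop explicitly.
        for chunk in chunker(text):
            yield chunk


async def collect() -> list[str]:
    return [chunk async for chunk in aiter_chunks(split_words, fake_text_stream())]


print(asyncio.run(collect()))
```

The chunking call itself is still synchronous here; only the waiting on the source is asynchronous, which is where the memory benefit of streaming comes from.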

@umarbutler
Owner

So you imagine it being used to handle inputs that are async iterators, is that right? For example:

chunker = chunkerify(...)
texts = my_async_text_generator()

# Normally you'd do this:
chunks = [chunker(text) async for text in texts]

# But you'd like to be able to do this(?)
chunks = await chunker(texts)

@Goldziher
Author

For a stream, I would use an async iterator (e.g. an async generator).

But using async for chunking is purely for IO-bound situations, like using chunking in an API. The advantage of

chunks = await chunker(texts)

is that it would be run in an async worker thread rather than the main thread, and thus would not block the execution of other async threads.

I can fake it by doing something like

await anyio.to_thread.run_sync(chunker, texts)

But this is pretty suboptimal since it slows execution quite a bit.
