Fixing blocking behavior of imap_queued (#14)
When using `process_folder` in the SDK, processing appears to get stuck on a single file:

* [The imap function is used for processing](https://github.com/cohere-ai/cohere-compass-sdk/blob/main/compass_sdk/parser.py#L150-L156)
* [Definition of `imap_queued`](https://github.com/cohere-ai/cohere-compass-sdk/blob/main/compass_sdk/parser.py#L150-L156)
* The interesting part is the [popleft line](https://github.com/cohere-ai/cohere-compass-sdk/blob/main/compass_sdk/utils.py#L24): if one file is too big, the loop gets stuck waiting on that file and does not move on to any further files. That is why processing is very slow when there are a couple of big PDFs.

This change aims to improve that by using `as_completed` and waiting only for the first completed future.

## Auto generated

<!-- begin-generated-description -->

This PR introduces changes to the `compass_sdk/utils.py` and `pyproject.toml` files, primarily focusing on updating import statements and adding a new dependency.

## `compass_sdk/utils.py` Changes:

- **Import Updates:** The PR modifies the import statements for the `concurrent` and `concurrent.futures` modules. It now imports `concurrent.futures` directly from `concurrent`, ensuring a more organized and explicit import structure.
- **Data Structure Change:** In the `imap_queued` function, the data structure used to store tasks has been changed from a `deque` to a `set`. This is done by creating a `futures_set` set and adding tasks to it using `futures_set.add()`.
- **Task Management:** The task management logic has been updated. Instead of using a `while` loop to manage tasks, the code now employs `futures.wait()` and `futures.as_completed()` to handle task completion and yielding results.

## `pyproject.toml` Changes:

- **New Dependency:** The `aiohttp` dependency has been added with the version specified as "3.10.5". This addition suggests the introduction of asynchronous HTTP capabilities to the project.

<!-- end-generated-description -->
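The approach described above can be illustrated with a minimal sketch. This is not the code from the commit: the signature of `imap_queued`, the `max_queued` parameter, and the `demo` helper are assumptions made for illustration. Only `futures_set`, `futures.wait()`, and `futures.as_completed()` are named in the description. The sketch keeps a bounded set of in-flight futures and yields whichever one finishes first, so a single slow item no longer holds back the results of faster ones.

```python
import time
from concurrent import futures
from typing import Callable, Iterable, Iterator, List, Set, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def imap_queued(
    executor: futures.Executor,
    f: Callable[[T], R],
    it: Iterable[T],
    max_queued: int,
) -> Iterator[R]:
    """Yield f(x) for each x in `it`, in completion order.

    Hypothetical signature: at most `max_queued` futures are kept in
    flight once the generator pauses to hand back a result.
    """
    assert max_queued >= 1
    futures_set: Set[futures.Future] = set()

    for x in it:
        futures_set.add(executor.submit(f, x))
        # When the set is over the limit, block until at least one future
        # (any of them, not necessarily the oldest) has completed.
        while len(futures_set) > max_queued:
            done, futures_set = futures.wait(
                futures_set, return_when=futures.FIRST_COMPLETED
            )
            for fut in done:
                yield fut.result()

    # Drain whatever is still pending, again in completion order.
    for fut in futures.as_completed(futures_set):
        yield fut.result()


def demo() -> List[str]:
    """Hypothetical usage: one slow task submitted first, then fast ones."""

    def work(item):
        name, seconds = item
        time.sleep(seconds)
        return name

    items = [("slow", 0.5), ("fast-1", 0.01), ("fast-2", 0.01), ("fast-3", 0.01)]
    with futures.ThreadPoolExecutor(max_workers=2) as pool:
        return list(imap_queued(pool, work, items, max_queued=2))
```

With a `deque` and `popleft()`, the slow task in `demo` would be awaited first and every fast result would queue up behind it. With the set-based version, results arrive in completion order, so callers that rely on input order would need to carry an index alongside each result.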