Commit
fix: Fix the driver block hanging issue in serialized execution mode (#11647)

Summary:
Pull Request resolved: #11647

Fix the driver block hanging issue in serialized execution mode, such as the mode used by Gluten. When a task calls next, it keeps iterating through all the drivers from all the pipelines. If there are no runnable drivers, it generates a future by collecting all the blocked driver futures and returns it to the user. If there is no cross-driver blocking dependency, the user can wait for all the drivers to resume and then continue; otherwise the task simply hangs.

Gluten found this in the following two cases, which might also happen for Meta internal use cases when we have more complex pipelines:
(1) hash join: the probe operator initially waits for the build to finish, while the hash build (or its associated pipeline) might wait for an external event for input. Then the whole task hangs;
(2) local exchange: the producer pipeline might wait for the consumer pipeline to consume data before it can proceed. Then the whole task hangs.

This PR fixes the issue by using collect any instead of collect all. Using collect any also improves performance, because a task can proceed with the first ready future instead of waiting for all the blocked drivers to proceed, e.g. the Gluten union case for Spark under serialized execution mode.
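The hash join case above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `BlockedDriver`, `WaitPolicy`, `waitCompletes` and `runHashJoinTask` are hypothetical names invented for illustration, plain readiness flags stand in for the real driver futures, and none of this is the code changed by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blocked driver's future: a readiness flag.
struct BlockedDriver {
  bool ready = false;
};

// kAll models "collect all", kAny models "collect any".
enum class WaitPolicy { kAll, kAny };

// Returns true if a wait on the combined future would complete given the
// current readiness of the blocked drivers.
bool waitCompletes(const std::vector<BlockedDriver>& drivers, WaitPolicy policy) {
  assert(!drivers.empty());
  std::size_t numReady = 0;
  for (const auto& driver : drivers) {
    numReady += driver.ready ? 1 : 0;
  }
  return policy == WaitPolicy::kAll ? numReady == drivers.size() : numReady > 0;
}

// Models the hash join case: the build driver waits for external input and
// the probe driver waits for the build. Returns true if the task makes
// progress to completion, false if the user's wait never returns (a hang).
bool runHashJoinTask(WaitPolicy policy) {
  constexpr std::size_t kBuild = 0;
  constexpr std::size_t kProbe = 1;
  std::vector<BlockedDriver> drivers(2);

  // The external event arrives, so only the build driver's future is ready.
  drivers[kBuild].ready = true;

  // The user waits on the combined future before calling next again. The
  // probe can only be unblocked by running the build, which in serialized
  // mode requires this wait to return first.
  if (!waitCompletes(drivers, policy)) {
    return false;
  }

  // The task resumes, runs the build to completion, and unblocks the probe.
  drivers[kProbe].ready = true;
  return waitCompletes(drivers, WaitPolicy::kAll);
}
```

In this model `runHashJoinTask(WaitPolicy::kAll)` reports a hang, while `runHashJoinTask(WaitPolicy::kAny)` completes, which mirrors the difference between the two collection strategies described above.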
Given that a future is not usable after it is passed to collect any, and that we can't call into a driver while it has a pending future or waiting event, we introduce a BlockingDriverState in task to capture the blocking driver state under serialized execution mode:
(1) When a driver returns a future, we set it into the blocking state, which sets up the blocking flag (a bool indicating whether the driver is blocked) and the continuation that clears the blocking state, e.g. for the derived future used by the collect any operation;
(2) When the task collects any future from the blocking drivers, it gets a derived future from the blocking driver state, which creates a promise contract and keeps the promise to signal when the corresponding driver future becomes ready.

Since the driver future becomes ready on an async code path, there is a lock inside each blocking driver state to guard against concurrent access.

Unit tests added to reproduce the issue and verify the fix. #11442

Reviewed By: Yuhta, bikramSingh91, oerling

Differential Revision: D66438632

fbshipit-source-id: c052fa3de2f4a48b1261382368d7af9fdce9fbef
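The blocking driver state described in the commit message could be modeled roughly as below. This is a minimal sketch, not the BlockingDriverState added by this commit: the class name `BlockingStateSketch`, its methods, and the `ReadyFlag` type are all hypothetical, and a shared atomic flag stands in for a real promise/future pair so that the example depends only on the standard library.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical one-shot signal standing in for a promise/future pair: the
// state keeps one copy (the "promise" side), the task gets another (the
// "future" side).
using ReadyFlag = std::shared_ptr<std::atomic<bool>>;

class BlockingStateSketch {
 public:
  // Called by the task when the driver returns a blocking future.
  void setBlocked() {
    std::lock_guard<std::mutex> guard(mutex_);
    assert(!blocked_);
    blocked_ = true;
  }

  // Models the continuation on the driver's own future. It may run on an
  // async code path, hence the lock. It clears the blocking flag and
  // signals every derived flag handed out so far.
  void onDriverFutureReady() {
    std::vector<ReadyFlag> pending;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      blocked_ = false;
      pending.swap(pending_);
    }
    for (auto& flag : pending) {
      flag->store(true);
    }
  }

  // Returns a fresh derived flag for one collect-any round. A new one is
  // created per call because a future is consumed once it is passed to
  // collect any. The flag is already set if the driver is not blocked.
  ReadyFlag derivedFuture() {
    std::lock_guard<std::mutex> guard(mutex_);
    auto flag = std::make_shared<std::atomic<bool>>(!blocked_);
    if (blocked_) {
      pending_.push_back(flag);
    }
    return flag;
  }

  bool blocked() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return blocked_;
  }

 private:
  mutable std::mutex mutex_;
  bool blocked_{false};
  std::vector<ReadyFlag> pending_;
};
```

In this sketch the task would call `setBlocked()` when a driver returns a future, request one `derivedFuture()` per blocked driver for each wait round, and skip any driver whose `blocked()` is still true when iterating again.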
1 parent ea3656e, commit e80bf12
Showing 5 changed files with 427 additions and 16 deletions.