[FEAT] [Join Optimizations] Add broadcast join. #3278

Triggered via pull request December 7, 2023 20:50
Status Success
Total duration 22s
Artifacts

release-drafter.yml

on: pull_request
update_release_draft
5s

Annotations

2 errors
update_release_draft
Validation Failed: {"resource":"Release","code":"invalid","field":"target_commitish"} { name: 'HttpError', id: '7133853818', status: 422, response: { url: 'https://api.github.com/repos/Eventual-Inc/Daft/releases/132770221', status: 422, headers: { 'access-control-allow-origin': '*', 'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset', connection: 'close', 'content-length': '195', 'content-security-policy': "default-src 'none'", 'content-type': 'application/json; charset=utf-8', date: 'Thu, 07 Dec 2023 20:50:35 GMT', 'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', server: 'GitHub.com', 'strict-transport-security': 'max-age=31536000; includeSubdomains; preload', vary: 'Accept-Encoding, Accept, X-Requested-With', 'x-accepted-github-permissions': 'contents=write', 'x-content-type-options': 'nosniff', 'x-frame-options': 'deny', 'x-github-api-version-selected': '2022-11-28', 'x-github-media-type': 'github.v3; format=json', 'x-github-request-id': '0407:1FD4:FD1DFF:20A14CD:6572301B', 'x-ratelimit-limit': '1000', 'x-ratelimit-remaining': '986', 'x-ratelimit-reset': '1701985799', 'x-ratelimit-resource': 'core', 'x-ratelimit-used': '14', 'x-xss-protection': '0' }, data: { message: 'Validation Failed', errors: [ { resource: 'Release', code: 'invalid', field: 'target_commitish' } ], documentation_url: 'https://docs.github.com/rest/releases/releases#update-a-release' } }, request: { method: 'PATCH', url: 'https://api.github.com/repos/Eventual-Inc/Daft/releases/132770221', headers: { accept: 'application/vnd.github.v3+json', 'user-agent': 'probot/12.2.5 octokit-core.js/3.5.1 Node.js/16.20.2 (linux; x64)', authorization: 'token [REDACTED]', 'content-type': 'application/json; 
charset=utf-8' }, body: '{"body":"## Changes\\n\\n## ✨ New Features\\n\\n- [FEAT] [JSON Reader] Add native streaming + parallel JSON reader. @clarkzinzow (#1679)\\n\\n## 🚀 Performance Improvements\\n\\n- [PERF] Enable Predicates in Parquet Reader @samster25 (#1702)\\n\\n## 📖 Documentation\\n\\n- [DOCS] Add notebooks used for pydata global 2023 presentation @jaychia (#1703)\\n","draft":true,"prerelease":false,"make_latest":"true","name":"v0.2.7","tag_name":"v0.2.7","target_commitish":"refs/pull/1706/merge"}', request: {} }, event: { id: '7133853818', name: 'pull_request', payload: { action: 'edited', changes: { body: { from: 'This PR adds a broadcast join implementation as a new join algorithm/strategy, where all partitions of a small table are broadcasted to each partition in the larger table, such that we do a local (hash) of the entire small table with each individual partition of the larger table.\r\n' + '\r\n' + 'The query planner chooses the broadcast join as its join strategy if one of the sides of the join is smaller than a preconfigured broadcasting threshold (set to 10 MiB by default, but is user-configurable).\r\n' + '\r\n' + 'If the smaller side of the join is the right side, we invert the join for planning and scheduling simplicity so we can always broadcast the left side; we then swap back to the correct join ordering when performing the local joins. This means that we always form the probe table on the left side of the join; a future optimization (applicable to both the broadcast join and the hash join) would be to have local joins build the probe table on the smaller side while preserving the expected column ordering. We would still need to always build the probe table on the l
update_release_draft
HttpError: Validation Failed: {"resource":"Release","code":"invalid","field":"target_commitish"} at /home/runner/work/_actions/release-drafter/release-drafter/v5/dist/index.js:8462:21 at processTicksAndRejections (node:internal/process/task_queues:96:5) at async Job.doExecute (/home/runner/work/_actions/release-drafter/release-drafter/v5/dist/index.js:30793:18) { name: 'AggregateError', event: { id: '7133853818', name: 'pull_request', payload: { action: 'edited', changes: { body: { from: 'This PR adds a broadcast join implementation as a new join algorithm/strategy, where all partitions of a small table are broadcasted to each partition in the larger table, such that we do a local (hash) of the entire small table with each individual partition of the larger table.\r\n' + '\r\n' + 'The query planner chooses the broadcast join as its join strategy if one of the sides of the join is smaller than a preconfigured broadcasting threshold (set to 10 MiB by default, but is user-configurable).\r\n' + '\r\n' + 'If the smaller side of the join is the right side, we invert the join for planning and scheduling simplicity so we can always broadcast the left side; we then swap back to the correct join ordering when performing the local joins. This means that we always form the probe table on the left side of the join; a future optimization (applicable to both the broadcast join and the hash join) would be to have local joins build the probe table on the smaller side while preserving the expected column ordering. We would still need to always build the probe table on the left side of the join if we need to preserve the row-ordering of the right side of the join, e.g. if the right side of the join is range-partitioned.\r\n' + '\r\n' + '## TODOs\r\n' + '\r\n' + '- [x] Test coverage.\r\n' + '- [ ] (Follow-up?) 
TPC-H benchmarking demonstrating speedup due to use of broadcast join.\r\n' + '- [ ] (Follow-up) In local joins, build the probe table on the smaller side of the join.\r\n' + '- [ ] (Follow-up) Add table size approximations for operators that affect cardinality.' } }, number: 1706, organization: { avatar_url: 'https://avatars.githubusercontent.com/u/98941975?v=4', description: 'Eventual Computing', events_url: 'https://api.github.com/orgs/Eventual-Inc/events', hooks_url: 'https://api.github.com/orgs/Eventual-Inc/hooks', id: 98941975, issues_url: 'https://api.github.com/orgs/Eventual-Inc/issues', login: 'Eventual-Inc', members_url: 'https://api.github.com/orgs/Eventual-Inc/members{/member}', node_id: 'O_kgDOBeW8Fw', public_members_url: 'https://api.github.com/orgs/Eventual-Inc/public_members{/member}', repos_url: 'https://api.github.com/orgs/Eventual-Inc/repos', url: 'https://api.github.com/orgs/Eventual-Inc' }, pull_request: { _links: { comments: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/issues/1706/comments' }, commits: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/1706/commits' }, html: { href: 'https://github.com/Eventual-Inc/Daft/pull/1706' }, issue: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/issues/1706' }, review_comment: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/comments{/number}' }, review_comments: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/1706/comments' }, self: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/1706' }, statuses: { href: 'https://api.github.com/repos/Eventual-Inc/Daft/statuses/4fb0e12e84fb572291fa2f323a0b351843068604' } }, active_lo