
[FEAT] Add smart planning of ScanTasks starting with merging by filesizes #3259

Triggered via pull request on December 5, 2023 19:21
Status: Success
Total duration: 25s

Workflow: release-drafter.yml (on: pull_request)
Job: update_release_draft (7s)

Annotations

2 errors
update_release_draft
HttpError: Validation Failed: {"resource":"Release","code":"invalid","field":"target_commitish"}
  status: 422, event id: 7105546658

  Request that failed:
    PATCH https://api.github.com/repos/Eventual-Inc/Daft/releases/131989933
    accept: application/vnd.github.v3+json
    user-agent: probot/12.2.5 octokit-core.js/3.5.1 Node.js/16.20.2 (linux; x64)
    authorization: token [REDACTED]

  Request body (the draft release being updated):
    name: v0.2.6
    tag_name: v0.2.6
    target_commitish: refs/pull/1692/merge
    draft: true, prerelease: false, make_latest: true
    body:
      ## Changes

      ## ✨ New Features
      - [FEAT] Enable Comparison between timestamp / dates @samster25 (#1689)
      - [FEAT] Enable MicroPartitions by default @jaychia (#1684)
      - [FEAT] Temporal Literals for Date and Timestamp @samster25 (#1683)
      - [FEAT] Partitioning exprs for Iceberg @samster25 (#1680)

      ## 👾 Bug Fixes
      - [BUG] Use schema_hints as hints instead of definitive schema @colin-ho (#1636)
      - [BUG] Allow for use of Ray jobs for benchmarking @jaychia (#1690)
      - [BUG] fix off by 1 for retries for cred provider @samster25 (#1681)

      ## 🧰 Maintenance
      - [CHORE] bump gcs and s3fs @samster25 (#1699)
      - [CHORE] Add warmup step for remote tpch benchmarking @jaychia (#1691)
      - [CHORE] drop s3 compat mode for gcs for anonymous mode @samster25 (#1682)
      - [CHORE] Remove usage of credentials in workflows @jaychia (#1686)
      - [CHORE] Iceberg Image Caching @samster25 (#1687)
      - [CHORE] Bump Iceberg Version and V1 of caching @samster25 (#1685)

      ## ⬆️ Dependencies
      - Bump globset from 0.4.13 to 0.4.14 @dependabot (#1694)
      - Bump libc from 0.2.149 to 0.2.150 @dependabot (#1693)
      - Bump google-github-actions/auth from 1 to 2 @dependabot (#1698)

  Response (422 Unprocessable Entity):
    date: Tue, 05 Dec 2023 19:21:55 GMT
    x-accepted-github-permissions: contents=write
    x-ratelimit: 970/1000 remaining (resource: core, resets at 1701805370)
    data:
      message: Validation Failed
      errors: [{ resource: Release, code: invalid, field: target_commitish }]
      documentation_url: https://docs.github.com/rest/releases/releases#update-a-release

  Triggering event: pull_request (action: edited), event id 7105546658; the
  payload carries the edited PR body and is truncated in the log.
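The failure mode is visible in the request body above: because the run was triggered by a `pull_request` event, release-drafter appears to pass the checked-out merge ref, `refs/pull/1692/merge`, as `target_commitish`, which the GitHub Releases API rejects as an invalid commitish. A common mitigation is to draft releases only on pushes to the default branch. The following is a hedged sketch of such a trigger configuration, not the repository's actual `release-drafter.yml`:

```yaml
# Illustrative sketch only: restricting the drafter job to pushes on main
# means target_commitish is always a real branch ref, never refs/pull/.../merge.
name: Release Drafter

on:
  push:
    branches:
      - main

jobs:
  update_release_draft:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: release-drafter/release-drafter@v5
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Alternatively, the `commitish` option in the drafter's config file can pin the release target to a fixed branch while keeping `pull_request` triggers for label autolabeling.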
update_release_draft
AggregateError wrapping the same HttpError:
  HttpError: Validation Failed: {"resource":"Release","code":"invalid","field":"target_commitish"}
      at /home/runner/work/_actions/release-drafter/release-drafter/v5/dist/index.js:8462:21
      at processTicksAndRejections (node:internal/process/task_queues:96:5)
      at async Job.doExecute (/home/runner/work/_actions/release-drafter/release-drafter/v5/dist/index.js:30793:18)

  Triggering event: pull_request (action: edited) on Eventual-Inc/Daft#1692,
  base: Eventual-Inc:main, head commit 29f56139e807dcc50b2a6b2a81b71cc8a63c9df1.
  Previous PR body (from the edit diff):

    Refactors/changes required on ScanTask itself:

    1. Added a `ScanTask::merge`
    2. Added a `ScanTask::partition_spec()`
    3. Added some validation in `ScanTask::new` to assert that all the underlying sources have the same partition spec

    I then added a new module `daft_scan::scan_task_iterators` which contains functions that perform transformations on a `Box<dyn Iterator<Item = DaftResult<ScanTaskRef>>>`.

    TODO:

    - [x] Make the file_size configurable (as an environment variable/context flag) so that our unit-tests still run correctly when we do multi-file tests for multi-partition dataframes
    - [ ] Figure out if it is possible to make `PartitionSpec` implement `Hash`, which will help with the performance of the `MergeByFileSize` iterator when it needs to check for matches

  The remainder of the payload (organization, pull_request links, and repository
  metadata from the GitHub API) is truncated in the log.
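The PR body captured in the log describes coalescing small `ScanTask`s by file size via a `MergeByFileSize` iterator. The following is an illustrative Rust sketch of that idea under simplified assumptions: `ScanTask`, `ScanTask::merge`, and `merge_by_file_size` here are toy stand-ins mirroring the names in the PR description, not Daft's actual types or logic (which also checks partition-spec compatibility before merging).

```rust
// Toy model of a scan task: a set of files and their combined size.
#[derive(Debug)]
struct ScanTask {
    files: Vec<String>,
    size_bytes: u64,
}

impl ScanTask {
    // Merge two tasks into one, concatenating file lists and summing sizes.
    fn merge(a: ScanTask, b: ScanTask) -> ScanTask {
        let mut files = a.files;
        files.extend(b.files);
        ScanTask {
            files,
            size_bytes: a.size_bytes + b.size_bytes,
        }
    }
}

/// Greedily coalesce adjacent tasks while the running size stays at or
/// below `target` bytes; otherwise start a new output task.
fn merge_by_file_size(tasks: impl Iterator<Item = ScanTask>, target: u64) -> Vec<ScanTask> {
    let mut out: Vec<ScanTask> = Vec::new();
    for task in tasks {
        match out.last() {
            Some(last) if last.size_bytes + task.size_bytes <= target => {
                let prev = out.pop().unwrap();
                out.push(ScanTask::merge(prev, task));
            }
            _ => out.push(task),
        }
    }
    out
}

fn main() {
    let tasks = vec![
        ScanTask { files: vec!["a.parquet".into()], size_bytes: 10 },
        ScanTask { files: vec!["b.parquet".into()], size_bytes: 20 },
        ScanTask { files: vec!["c.parquet".into()], size_bytes: 80 },
    ];
    let merged = merge_by_file_size(tasks.into_iter(), 64);
    // a and b (30 bytes combined) coalesce; adding c would exceed the target.
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0].files, vec!["a.parquet", "b.parquet"]);
    println!("{} merged tasks", merged.len());
}
```

The PR's open TODO about making `PartitionSpec` implement `Hash` fits this shape naturally: with hashable specs, candidate tasks for merging can be bucketed in a `HashMap` keyed by spec instead of compared pairwise.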