[FEAT] Add streaming + parallel CSV reader, with decompression support. #2305
Workflow: release-drafter.yml (triggered on: pull_request)
Jobs: update_release_draft (6s), label (4s)
Annotations: 2 errors
update_release_draft
Validation Failed: {"resource":"Release","code":"invalid","field":"target_commitish"}
{
name: 'HttpError',
id: '6564704347',
status: 422,
response: {
url: 'https://api.github.com/repos/Eventual-Inc/Daft/releases/124512520',
status: 422,
headers: {
'access-control-allow-origin': '*',
'access-control-expose-headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset',
connection: 'close',
'content-length': '195',
'content-security-policy': "default-src 'none'",
'content-type': 'application/json; charset=utf-8',
date: 'Wed, 18 Oct 2023 17:55:05 GMT',
'referrer-policy': 'origin-when-cross-origin, strict-origin-when-cross-origin',
server: 'GitHub.com',
'strict-transport-security': 'max-age=31536000; includeSubdomains; preload',
vary: 'Accept-Encoding, Accept, X-Requested-With',
'x-accepted-github-permissions': 'contents=write',
'x-content-type-options': 'nosniff',
'x-frame-options': 'deny',
'x-github-api-version-selected': '2022-11-28',
'x-github-media-type': 'github.v3; format=json',
'x-github-request-id': 'A148:4420:90717F:124E5DF:65301BF9',
'x-ratelimit-limit': '1000',
'x-ratelimit-remaining': '950',
'x-ratelimit-reset': '1697652395',
'x-ratelimit-resource': 'core',
'x-ratelimit-used': '50',
'x-xss-protection': '0'
},
data: {
message: 'Validation Failed',
errors: [
{
resource: 'Release',
code: 'invalid',
field: 'target_commitish'
}
],
documentation_url: 'https://docs.github.com/rest/releases/releases#update-a-release'
}
},
request: {
method: 'PATCH',
url: 'https://api.github.com/repos/Eventual-Inc/Daft/releases/124512520',
headers: {
accept: 'application/vnd.github.v3+json',
'user-agent': 'probot/12.2.5 octokit-core.js/3.5.1 Node.js/16.20.2 (linux; x64)',
authorization: 'token [REDACTED]',
'content-type': 'application/json; charset=utf-8'
},
body: '{"body":"## Changes\\n\\n## ✨ New Features\\n\\n- [FEAT] IOStats for Native Reader @samster25 (#1493)\\n\\n## 🚀 Performance Improvements\\n\\n- [PERF] Micropartition, lazy loading and Column Stats @samster25 (#1470)\\n- [PERF] Use pyarrow table for pickling rather than ChunkedArray @samster25 (#1488)\\n- [PERF] Use region from system and leverage cached credentials when making new clients @samster25 (#1490)\\n- [PERF] Update default max\\\\_connections 64->8 because it is now per-io-thread @jaychia (#1485)\\n- [PERF] Pass-through multithreaded\\\\_io flag in read\\\\_parquet @jaychia (#1484)\\n\\n## 👾 Bug Fixes\\n\\n- [BUG] Fix handling of special characters in S3LikeSource @jaychia (#1495)\\n- [BUG] Fix local globbing of current directory @jaychia (#1494)\\n- [BUG] fix script to upload file 1 at a time @samster25 (#1492)\\n- [CHORE] Add tests and fixes for Azure globbing @jaychia (#1482)\\n\\n## 🧰 Maintenance\\n\\n- [CHORE] Better logging for physical plan @jaychia (#1499)\\n- [CHORE] Refactor logging @jaychia (#1489)\\n- [CHORE] Add Workflow to build artifacts and upload to S3 @samster25 (#1491)\\n- [CHORE] Update default num\\\\_tries on S3Config to 25 @jaychia (#1487)\\n- [CHORE] Add tests and fixes for Azure globbing @jaychia (#1482)\\n","draft":true,"prerelease":false,"make_latest":"true","name":"v0.1.21","tag_name":"v0.1.21","target_commitish":"refs/pull/1501/merge"}',
request: {}
},
event: { /* truncated by the Actions UI; the same payload appears in full in the second annotation below */ }
}
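Both annotations share one root cause: the workflow runs on pull_request, so the ref that release-drafter hands the Releases API as target_commitish is the PR merge ref (refs/pull/1501/merge, visible in the PATCH body above), and that field only accepts a branch name or a commit SHA. The usual remedy is to draft releases only from pushes to the default branch and leave pull_request events to the labeler job. A minimal sketch, assuming a main default branch and the two-job layout above (this is not the repository's actual workflow file):

name: Release Drafter
on:
  push:
    branches: [main]                          # drafting only ever sees a real branch ref
  pull_request:
    types: [opened, reopened, synchronize]    # kept only for the labeler job

jobs:
  update_release_draft:
    if: github.event_name == 'push'           # never runs against refs/pull/N/merge
    runs-on: ubuntu-latest
    permissions:
      contents: write                         # matches the x-accepted-github-permissions header above
    steps:
      - uses: release-drafter/release-drafter@v5
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

release-drafter's config also documents a commitish option for pinning the release target explicitly, which is the alternative when the job must keep running on pull_request events.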
update_release_draft
HttpError: Validation Failed: {"resource":"Release","code":"invalid","field":"target_commitish"}
at /home/runner/work/_actions/release-drafter/release-drafter/v5/dist/index.js:8462:21
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async Job.doExecute (/home/runner/work/_actions/release-drafter/release-drafter/v5/dist/index.js:30793:18)
{
name: 'AggregateError',
event: {
id: '6564704347',
name: 'pull_request',
payload: {
action: 'edited',
changes: {
body: {
from: 'This PR adds streaming + parallel CSV reading and parsing, along with support for streaming decompression. In particular, this PR:\r\n' +
'- Adds support for streaming decompression for brotli, bz, deflate, gzip, lzma, xz, zlib, and zstd.\r\n' +
'- Performs chunk-based streaming CSV reads, filling up a small buffer of unparsed records.\r\n' +
'- Pipelines chunk-based CSV parsing with reading by spawning Tokio + rayon parsing tasks.\r\n' +
'- Performs chunk parsing, as well as column parsing within a chunk, in parallel on the rayon threadpool.\r\n' +
'- Changes schema inference to involve an (at most) 1 MiB file peek rather than a full file read.\r\n' +
'- Gathers a mean row size in bytes estimate during schema inference and propagates this estimate back to the reader.\r\n' +
'- Unifies local and cloud reads + schema inference.\r\n' +
'- Adds thorough Rust-side local + cloud test coverage.\r\n' +
'\r\n' +
'The streaming + parallel reading leads to a 4-8x speed up over the pyarrow reader and the previous non-parallel reader when benchmarking large file (~1 GB) reads, while also resulting in lower memory utilization due to the streaming reading + parsing.\r\n' +
'\r\n' +
'## TODOs (follow-up PRs)\r\n' +
'\r\n' +
'- [ ] Add snappy decompression support (need to essentially do something like [this](https://github.com/belltoy/tokio-snappy/blob/master/src/lib.rs))'
}
},
number: 1501,
organization: {
avatar_url: 'https://avatars.githubusercontent.com/u/98941975?v=4',
description: 'Eventual Computing',
events_url: 'https://api.github.com/orgs/Eventual-Inc/events',
hooks_url: 'https://api.github.com/orgs/Eventual-Inc/hooks',
id: 98941975,
issues_url: 'https://api.github.com/orgs/Eventual-Inc/issues',
login: 'Eventual-Inc',
members_url: 'https://api.github.com/orgs/Eventual-Inc/members{/member}',
node_id: 'O_kgDOBeW8Fw',
public_members_url: 'https://api.github.com/orgs/Eventual-Inc/public_members{/member}',
repos_url: 'https://api.github.com/orgs/Eventual-Inc/repos',
url: 'https://api.github.com/orgs/Eventual-Inc'
},
pull_request: {
_links: {
comments: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/issues/1501/comments'
},
commits: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/1501/commits'
},
html: { href: 'https://github.com/Eventual-Inc/Daft/pull/1501' },
issue: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/issues/1501'
},
review_comment: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/comments{/number}'
},
review_comments: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/1501/comments'
},
self: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/pulls/1501'
},
statuses: {
href: 'https://api.github.com/repos/Eventual-Inc/Daft/statuses/d0cd093357b690e3461bc457af21064c9a14ee6e'
}
},
active_lock_reason: null,
additions: 1616,
assignee: null,
assignees: [],
author_association: 'CONTRIBUTOR',
auto_merge: null,
base: { /* remainder of the payload truncated by the Actions UI */ }
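The PR description quoted in the payload above doubles as a design walkthrough, so a few sketches of the techniques it names follow. None of this is Daft's actual code: every crate, type, and function name below is an assumption. Streaming decompression of the listed codecs is typically layered under the CSV parser as just another byte stream; the async-compression crate (assumed here, the PR does not name its dependency) ships Tokio decoders for exactly those formats:

// Sketch only. Assumed Cargo deps: tokio = { version = "1" } and
// async-compression = { version = "0.4", features = ["tokio", "gzip", "zstd"] }.
use async_compression::tokio::bufread::{GzipDecoder, ZstdDecoder};
use tokio::io::{AsyncBufRead, AsyncRead};

// Pick a streaming decoder from the file extension so the downstream CSV
// parser always consumes plain bytes, compressed input or not.
fn maybe_decompress<R>(reader: R, path: &str) -> Box<dyn AsyncRead + Send + Unpin>
where
    R: AsyncBufRead + Send + Unpin + 'static,
{
    match path.rsplit('.').next() {
        Some("gz") => Box::new(GzipDecoder::new(reader)),
        Some("zst") => Box::new(ZstdDecoder::new(reader)),
        // brotli, bz2, deflate, lzma, xz, and zlib decoders follow the same shape
        _ => Box::new(reader),
    }
}

Boxing to dyn AsyncRead keeps the rest of the reader oblivious to whether decompression happened at all, which is what makes the decompression "streaming": no file is ever inflated whole.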
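The read/parse pipeline is the Tokio-plus-rayon pattern the description outlines: a reader task fills a small bounded buffer of unparsed chunks, and each chunk is parsed on the rayon pool so CPU-bound work never stalls the I/O reactor. A runnable miniature, with parse_chunk standing in for the real CSV parser:

// Sketch, not Daft's code. Assumed deps:
// tokio = { version = "1", features = ["rt-multi-thread", "macros", "sync"] }, rayon = "1".
use tokio::sync::{mpsc, oneshot};

// Placeholder for the real CSV parser: just counts newline bytes.
fn parse_chunk(chunk: Vec<u8>) -> usize {
    // In the real reader, columns within a chunk are parsed in parallel
    // on rayon as well (e.g. a par_iter over the chunk's columns).
    chunk.iter().filter(|&&b| b == b'\n').count()
}

#[tokio::main]
async fn main() {
    // Small bounded buffer of unparsed chunks: backpressure caps memory
    // while reading stays ahead of parsing.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(8);

    // Reader task: the real one streams (possibly decompressed) bytes and
    // cuts them on record boundaries before sending.
    tokio::spawn(async move {
        for chunk in [b"a,b\n1,2\n".to_vec(), b"3,4\n".to_vec()] {
            if tx.send(chunk).await.is_err() {
                break;
            }
        }
    });

    // Parse each chunk on the rayon pool and hand the result back over a
    // oneshot channel; this miniature awaits chunks in order, whereas the
    // real reader keeps several parse tasks in flight at once.
    while let Some(chunk) = rx.recv().await {
        let (done_tx, done_rx) = oneshot::channel();
        rayon::spawn(move || {
            let _ = done_tx.send(parse_chunk(chunk));
        });
        let lines = done_rx.await.unwrap();
        println!("parsed chunk with {lines} line(s)");
    }
}

The bounded channel is what delivers the lower memory utilization the description claims: once the buffer is full, the reader simply waits, so the speed-up never comes at the cost of footprint.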
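Finally, the bounded schema inference: peek at most 1 MiB, infer from complete records only, and carry the resulting mean row size back to the reader so it can size its chunks. A sketch under those assumptions, with dtype sniffing itself elided:

// Sketch only; not Daft's actual inference code.
const PEEK_BYTES: usize = 1 << 20; // at most 1 MiB is read for inference

// Derive (estimated total rows, mean row size in bytes) from a peek buffer;
// full inference would also sniff a dtype per column from the same bytes.
fn row_size_estimate(peek: &[u8], file_size: u64) -> (u64, f64) {
    let upto = peek.len().min(PEEK_BYTES);
    // Count only complete records: drop everything after the last newline.
    let end = peek[..upto].iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    if end == 0 {
        return (0, 0.0); // no complete record in the peek; caller must fall back
    }
    let rows = peek[..end].iter().filter(|&&b| b == b'\n').count() as f64;
    let mean_row_bytes = end as f64 / rows;
    ((file_size as f64 / mean_row_bytes) as u64, mean_row_bytes)
}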