[FEAT] Minimal indices dtype for FixedShapeSparseTensors #3149
CodSpeed Performance Report: Merging #3149 will degrade performance by 14.33%.
This makes sense! My main question is whether this would only apply to FixedShape sparse tensors, right?
## The Rationales

Thanks to the [great work](Eventual-Inc#3018) from @universalmind303, Daft now supports the `INTERVAL` type exposed from `arrow2`. Beyond DataFrame support, this PR aims to unlock simple `INTERVAL` usage in SQL syntax, mainly copied from [planner.rs](https://github.com/sgl-project/sglang/pull/1790/files#diff-ea02b059cdabc0939616c35c6566dbcf980a5794306dedd241c2823afd9b2db2).

Notes: This naive implementation doesn't fully support complex interval scenarios, such as leap years or relative duration addition and subtraction. Follow-ups may need more carefully handled logic.

---------

Signed-off-by: Austin Liu <[email protected]>
* Removes Int128 Type
* Refactor Decimal128 to be backed by a DataArray rather than a LogicalArray
* Implements math operations for Decimal
* Implements comparison operations for Decimal
* Enables Between for Decimal128

Likely also increases performance due to removing heap alloc in some places.

---------

Co-authored-by: Colin Ho <[email protected]>
…#2776) Bumps [slackapi/slack-github-action](https://github.com/slackapi/slack-github-action) from 1.26.0 to 1.27.0. Release notes: https://github.com/slackapi/slack-github-action/releases. Full changelog: https://github.com/slackapi/slack-github-action/compare/v1.26.0...v1.27.0

> **Note**
> Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [image](https://github.com/image-rs/image) from 0.24.9 to 0.25.4. Changelog: https://github.com/image-rs/image/blob/main/CHANGES.md. Commits: https://github.com/image-rs/image/compare/v0.24.9...v0.25.4

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [adlfs](https://github.com/fsspec/adlfs) from 2023.10.0 to 2024.7.0. Release notes: https://github.com/fsspec/adlfs/releases. Full diff: https://github.com/fsspec/adlfs/compare/2023.10.0...2024.7.0

> **Note**
> Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
I'm using RustRover to write and debug Rust code, and I noticed that a debug run in RustRover doesn't work out of the box: no variables are shown when a breakpoint is hit. It turns out that the RustRover IDE launches the debug process with the test profile, which inherits from the dev[^1] profile. I am not sure why the dev[^2] profile in this project is configured without debug enabled. This PR adds a test profile with debug enabled.

[^1]: https://doc.rust-lang.org/cargo/reference/profiles.html#test
[^2]: https://github.com/Eventual-Inc/Daft/blob/main/Cargo.toml#L86
…. /tmp/**.csv) (Eventual-Inc#3100) Closes Eventual-Inc#1820. The main issue seems to be that the `globset` crate is permissive about what kind of pattern it builds (no error is thrown when we try to build a pattern for `/tmp/**.csv`, for instance), so we have to check for any such patterns ourselves.
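A minimal sketch of the kind of up-front check described above, written in Python for illustration only. The rule it applies (a `**` must be an entire path component) is an assumption about what "such patterns" means, and `has_malformed_double_star` is a hypothetical helper, not a function from Daft or `globset`.

```python
def has_malformed_double_star(pattern: str) -> bool:
    """Return True if `**` appears glued to other characters in a path component.

    Assumption: a recursive wildcard is only well-formed when the component
    is exactly `**` (as in `/tmp/**/*.csv`), so `/tmp/**.csv` is rejected.
    """
    return any("**" in part and part != "**" for part in pattern.split("/"))


# The pattern from the commit message is flagged; the well-formed variant is not.
print(has_malformed_double_star("/tmp/**.csv"))
print(has_malformed_double_star("/tmp/**/*.csv"))
```

Running the check before handing the pattern to a permissive glob builder lets the caller raise a clear error instead of silently matching nothing.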
Streaming writes for swordfish (parquet + csv only). Iceberg and delta writes are here: Eventual-Inc#2966

Implement streaming writes as a blocking sink. Unpartitioned writes run with 1 worker, and partitioned writes run with NUM_CPUs workers. As a drive-by, made blocking sinks parallelizable.

**Behaviour**
- Unpartitioned: Make writes to a `TargetFileSizeWriter`, which manages file sizes and row group sizes, as data is streamed in.
- Partitioned: Partition data via a `Dispatcher` and send to workers based on the hash. Each worker runs a `PartitionedWriter` that manages partitioning by value, file sizes, and row group sizes.

**Benchmarks:**
I made a new benchmark suite in `tests/benchmarks/test_streaming_writes.py`. It tests writes of tpch lineitem to parquet/csv, with and without partition columns, at different file and row group sizes. The streaming executor performs much better when there are partition columns, as seen in this screenshot. Without partition columns it is about the same; when the target row group size / file size is decreased, it is slightly slower, likely because it does more slicing, but this needs more investigation. Memory usage is the same for both.

<img width="1400" alt="Screenshot 2024-10-03 at 11 22 32 AM" src="https://github.com/user-attachments/assets/53b4d77d-553a-4181-8a4d-9eddaa3adaf7">

Memory test on read->write parquet tpch lineitem sf1:

Native: <img width="1078" alt="Screenshot 2024-10-08 at 1 48 34 PM" src="https://github.com/user-attachments/assets/3eda33c6-9413-415f-b808-ac3c7437e269">

Python: <img width="1090" alt="Screenshot 2024-10-08 at 1 48 50 PM" src="https://github.com/user-attachments/assets/f92b9a9f-a3b5-408b-98d5-4ba2d66b7be4">

---------

Co-authored-by: Colin Ho <[email protected]>
Spawns compute tasks on joinsets so that they can be cancelled. --------- Co-authored-by: Colin Ho <[email protected]>
This PR marks `PartitionTasks` as done only after they have been explicitly marked as done by the runner. Previously, we used the existence of the `.results` on a PartitionTask to determine whether or not it is done. However, this is not quite correct in the case of the RayRunner, which will attach a result containing a Ray ObjectRef, which is a future. This future may not (and is likely not) be completed yet at the time of PartitionTask creation. --------- Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
@jaychia @colin-ho Just added temporal doc section to expressions.rst. Let me know what you think of the content and then we can finalize which page or user-guide section to put it on from there. Thanks! --------- Co-authored-by: Colin Ho <[email protected]>
A whole bunch of boilerplate for tpc-ds benchmarking and testing. I wanted to keep this separate from others as there's not much functionality here, just adding a `dsdgen` command to the makefile to generate tpc-ds datasets. I called it `dsdgen` because that's what duckdb calls it, and this uses the duckdb implementation to generate all of the datasets. The answers were copied from [duckdb/duckdb/extension/tpcds/dsdgen/answers](https://github.com/duckdb/duckdb/tree/10c42435f1805ee4415faa5d6da4943e8c98fa55/extension/tpcds/dsdgen/answers)

Usage:

```sh
# defaults to sf=1 and dir=data/tpc-ds
> make dsdgen
> make dsdgen SCALE_FACTOR=<scale_factor> OUTPUT_DIR=<output_dir>
```

## Notes for reviewer

Most files here are boilerplate. The only relevant files are:
- Makefile
- requirements_dev.txt
- benchmarking/tpc-ds/datagen.py
When running in a Ray Job, without the user invoking any Ray commands or `ray.init()` explicitly, the `ray.is_initialized()` function returns False. This means that Daft "does not know" that it is running inside of a Ray cluster, and thus will not default to using the RayRunner. This can lead to unexpected behavior when using `daft-launcher` because a user must know to call `daft.context.set_runner_ray()`. This PR changes that behavior by attempting to look up the `$RAY_JOB_ID` environment variable, as a heuristic to tell whether or not it is currently running inside of a Ray job. To test, I just ran a Ray job and called `daft.context.get_context()` after initializing a Daft dataframe <img width="1350" alt="image" src="https://github.com/user-attachments/assets/0a6d8ae4-034a-424d-a3d7-9311d08be454"> --------- Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
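The heuristic described above can be sketched roughly as follows. Only the `RAY_JOB_ID` environment variable comes from the description; the function name `running_inside_ray_job` and its `environ` parameter are hypothetical and are not Daft's actual API.

```python
import os


def running_inside_ray_job(environ=None) -> bool:
    """Heuristic: treat a non-empty RAY_JOB_ID as 'we are inside a Ray job'.

    `environ` is a hypothetical injection point for testing; by default the
    real process environment is consulted.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("RAY_JOB_ID"))


# A runner-selection step could consult the heuristic before falling back
# to a local runner.
print(running_inside_ray_job({"RAY_JOB_ID": "01000000"}))
print(running_inside_ray_job({}))
```

Because this only inspects the environment, it works even when `ray.is_initialized()` still returns False, which is the situation the PR describes.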
@samster25 I also added a
True, to avoid dtype ambiguity for dynamic sparse tensors (i.e. when concatenating 2 dataframes). What do you think?
Hi @sagiahrac! Just took a look, just a few minor requests!
The `indices` in `FixedShapeSparseTensors` are limited by the total number of elements within each tensor. As long as they remain within the range defined by the tensor’s shape, we can choose a more compact data type for the indices, reducing memory usage without sacrificing functionality.
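As a rough illustration of the idea, the sketch below picks the narrowest unsigned integer type that can hold every index of a fixed shape. It assumes the indices are flat (row-major) offsets, so the largest possible index is the element count minus one; the function name `minimal_indices_dtype` and the returned dtype names are hypothetical and do not reflect Daft's actual implementation.

```python
import math

# Candidate unsigned types, narrowest first, with their maximum values.
_UNSIGNED_TYPES = (
    ("uint8", 2**8 - 1),
    ("uint16", 2**16 - 1),
    ("uint32", 2**32 - 1),
    ("uint64", 2**64 - 1),
)


def minimal_indices_dtype(shape) -> str:
    """Name of the smallest unsigned dtype that fits every flat index of `shape`."""
    n_elements = math.prod(shape)      # total number of elements in the tensor
    max_index = max(n_elements - 1, 0)  # largest flat offset (0 for empty tensors)
    for name, max_value in _UNSIGNED_TYPES:
        if max_index <= max_value:
            return name
    raise ValueError("shape is too large to index with uint64")


# A 16x16 tensor has 256 elements, so flat indices 0..255 fit in one byte,
# while a 1000x1000 tensor needs 32-bit indices.
print(minimal_indices_dtype((16, 16)))
print(minimal_indices_dtype((1000, 1000)))
```

Since the shape is fixed for the whole column, the choice can be made once from the type alone, which is why the same trick does not carry over directly to dynamically shaped sparse tensors.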