
[FEAT] Allow passing on_error="null" to ignore decoding errors in image decode #2033

Merged
merged 2 commits into from
Mar 22, 2024


jaychia
Contributor

@jaychia jaychia commented Mar 22, 2024

Allows lenient decoding of images for cases where some of the input bytes are expected to be invalid images.

API:

df.with_column("images", df["bytes"].image.decode(on_error="null"))
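To make the two modes concrete, here is a minimal pure-Python sketch of the per-row semantics. It assumes that on_error="null" turns each undecodable row into a null (None), while on_error="raise" (the default) aborts on the first bad row. decode_rows and toy_decode_png are hypothetical stand-ins, not Daft APIs, and the toy decoder only checks the PNG signature instead of decoding pixels.

```python
from typing import Callable, List, Literal, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def toy_decode_png(data: bytes) -> bytes:
    """Hypothetical stand-in for a real image decoder.

    Only checks the 8-byte PNG signature; a real decoder would return pixels.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("Decoding image from bytes failed")
    return data[len(PNG_SIGNATURE):]


def decode_rows(
    rows: List[Optional[bytes]],
    decode: Callable[[bytes], bytes],
    on_error: Literal["raise", "null"] = "raise",
) -> List[Optional[bytes]]:
    """Decode each row; on failure either re-raise or emit None (a null)."""
    if on_error not in ("raise", "null"):
        raise ValueError(f"on_error must be 'raise' or 'null', got {on_error!r}")
    out: List[Optional[bytes]] = []
    for row in rows:
        if row is None:
            # Assumption: null inputs stay null regardless of on_error.
            out.append(None)
            continue
        try:
            out.append(decode(row))
        except ValueError:
            if on_error == "raise":
                raise  # default: surface the first decoding failure
            out.append(None)  # on_error="null": bad bytes become a null
    return out


rows = [PNG_SIGNATURE + b"pixels", b"not an image", None]
print(decode_rows(rows, toy_decode_png, on_error="null"))  # [b'pixels', None, None]
```

With the default on_error="raise", the same rows would fail on the second element, which matches the error-raising test shown later in this PR.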

@github-actions github-actions bot added the enhancement New feature or request label Mar 22, 2024
@jaychia jaychia requested review from kevinzwang and colin-ho March 22, 2024 20:39

codecov bot commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 85.00000%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 84.71%. Comparing base (9d250af) to head (2f984c4).

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2033      +/-   ##
==========================================
+ Coverage   84.69%   84.71%   +0.02%     
==========================================
  Files          62       62              
  Lines        6787     6803      +16     
==========================================
+ Hits         5748     5763      +15     
- Misses       1039     1040       +1     
Files Coverage Δ
daft/series.py 93.13% <100.00%> (+0.15%) ⬆️
daft/expressions/expressions.py 91.08% <62.50%> (-0.55%) ⬇️

... and 1 file with indirect coverage changes
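The project-level percentages in the Coverage Diff follow directly from its Hits and Lines rows (with Misses = Lines − Hits), rounded to two decimal places:

```latex
\begin{aligned}
\text{Coverage} &= \text{Hits} / \text{Lines} \\
\text{main:}\ 5748 / 6787 &\approx 84.69\%, \quad \text{Misses} = 6787 - 5748 = 1039 \\
\text{\#2033:}\ 5763 / 6803 &\approx 84.71\%, \quad \text{Misses} = 6803 - 5763 = 1040
\end{aligned}
```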

Contributor

@colin-ho colin-ho left a comment

Just left a couple comments/questions!

@@ -1076,16 +1076,27 @@ def __repr__(self) -> str:
 class ExpressionImageNamespace(ExpressionNamespace):
     """Expression operations for image columns."""
 
-    def decode(self) -> Expression:
+    def decode(self, on_error: Literal["raise"] | Literal["null"] = "raise") -> Expression:
Contributor

Why not make this a raise_error_on_failure: bool, like the internal methods? Would we expect other options in the future?

Contributor Author

I was following the semantics of our current .url.download(on_error: "null" | "raise"), so that our APIs could be more consistent.
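The reply above keeps a string-valued on_error for consistency with .url.download. A minimal sketch of how such a public keyword could still be validated and mapped onto a boolean flag like the raise_error_on_failure the reviewer mentions for internal methods; to_raise_flag is a hypothetical helper and the mapping is an assumption for illustration, not Daft's code.

```python
from typing import Literal, get_args

# Public keyword values, equivalent to the Literal used in the diff.
OnError = Literal["raise", "null"]


def to_raise_flag(on_error: str) -> bool:
    """Hypothetical helper: map the public on_error string to an internal bool.

    "raise" -> True (propagate decode errors); "null" -> False (emit nulls).
    Unknown values are rejected eagerly rather than silently treated as one mode.
    """
    allowed = get_args(OnError)  # ("raise", "null")
    if on_error not in allowed:
        raise ValueError(f"on_error must be one of {allowed}, got {on_error!r}")
    return on_error == "raise"


# Usage: the user-facing API stays string-valued; the internals stay boolean.
raise_error_on_failure = to_raise_flag("null")
print(raise_error_on_failure)  # False
```

A string keyword leaves room for additional modes later (the question raised in the review) without breaking existing callers, at the cost of a runtime check like this one; a bool rules out invalid values by construction.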


with pytest.raises(ValueError, match="Decoding image from bytes failed"):
    s.image.decode(on_error="raise")

Contributor

Add a test for a parameter value that isn't "raise" or "null"?
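One way to cover the invalid-value case requested in this comment, sketched with the standard-library unittest module rather than pytest so it runs standalone. decode_with_on_error is a hypothetical stand-in for the real decode entry point, assumed to validate on_error before attempting any decoding; the error type and message Daft actually raises for an unknown value are not shown in this thread.

```python
import unittest
from typing import Optional


def decode_with_on_error(data: bytes, on_error: str = "raise") -> Optional[bytes]:
    """Hypothetical stand-in: validate on_error first, then 'decode' the bytes."""
    if on_error not in ("raise", "null"):
        raise ValueError(f"Unsupported on_error value: {on_error!r}")
    if not data.startswith(b"\x89PNG"):  # toy validity check, not real decoding
        if on_error == "raise":
            raise ValueError("Decoding image from bytes failed")
        return None
    return data


class TestOnErrorValidation(unittest.TestCase):
    def test_rejects_unknown_on_error(self) -> None:
        # Neither "raise" nor "null": should fail fast with a clear message,
        # even though the input bytes themselves are irrelevant here.
        with self.assertRaisesRegex(ValueError, "Unsupported on_error"):
            decode_with_on_error(b"\x89PNG", on_error="ignore")

    def test_null_mode_returns_none_on_bad_input(self) -> None:
        self.assertIsNone(decode_with_on_error(b"not an image", on_error="null"))
```

Saved to a file, this runs with python -m unittest <file>; in Daft's own pytest suite the same check would naturally sit next to the existing pytest.raises test.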

Contributor

@clarkzinzow clarkzinzow left a comment

LGTM overall!

@@ -1076,16 +1076,27 @@ def __repr__(self) -> str:
 class ExpressionImageNamespace(ExpressionNamespace):
     """Expression operations for image columns."""
 
-    def decode(self) -> Expression:
+    def decode(self, on_error: Literal["raise"] | Literal["null"] = "raise") -> Expression:
Contributor

If we expect this to always have boolean semantics, a better user-facing API might be raise_on_error: bool or the like.

Contributor Author

I was following the semantics of our current .url.download(on_error: "null" | "raise"), so that our APIs could be more consistent.

@jaychia jaychia merged commit eb315a8 into main Mar 22, 2024
31 checks passed
@jaychia jaychia deleted the jay/image-decode-null-on-error branch March 22, 2024 23:02