Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate enum values against other possible forms of expected values #1502

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

changhc
Copy link
Contributor

@changhc changhc commented Oct 25, 2024

Change Summary

Validate enum values against other possible forms of expected values, specifically tuples. The problem is actually bigger than what the initial commit solves. Please see my comment.

Related issue number

Fixes pydantic/pydantic#10629

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

Copy link

codspeed-hq bot commented Oct 25, 2024

CodSpeed Performance Report

Merging #1502 will not alter performance

Comparing changhc:10629-fix-enum (06ac56d) with main (cd0346d)

Summary

✅ 155 untouched benchmarks

Copy link

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.39%. Comparing base (ab503cb) to head (06ac56d).
Report is 226 commits behind head on main.

Files with missing lines Patch % Lines
src/validators/enum_.rs 96.96% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1502      +/-   ##
==========================================
- Coverage   90.21%   89.39%   -0.83%     
==========================================
  Files         106      112       +6     
  Lines       16339    17966    +1627     
  Branches       36       40       +4     
==========================================
+ Hits        14740    16060    +1320     
- Misses       1592     1886     +294     
- Partials        7       20      +13     
Files with missing lines Coverage Δ
src/serializers/config.rs 94.62% <100.00%> (+0.17%) ⬆️
src/serializers/mod.rs 100.00% <ø> (ø)
src/validators/enum_.rs 93.37% <96.96%> (ø)

... and 53 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2419981...06ac56d. Read the comment docs.

@changhc
Copy link
Contributor Author

changhc commented Oct 25, 2024

This issue lies mostly in validating JSON values. Take the case in the issue as an example:

from enum import Enum

class A(Enum):
    a = 1, 2

serialised = RootModel[A](A.a).model_dump_json()
parsed = RootModel[A].model_validate_json(serialised) # fail

What happens here is that A.a is serialised as [1, 2], but the validator is built with one possible enum value (1, 2) and thus cannot find any possible value that matches the input value [1, 2].

My initial commit does fix this issue by adding list(tuple) to the validator, but this does not cover all cases. The complete solution is to load the validator with serialised possible enum values and use them when the validator deals with JSON input values. Think the following example:

class Model(BaseModel):
    x: int

    @model_serializer
    def ser_model(self):
        return [self.x]

class A(Enum):
    a = 2, 3
    b = Model(x=1)

serialised = RootModel[A](A.b).model_dump_json() # [1]
parsed = RootModel[A].model_validate_json(serialised) # fail

To fix this issue completely, we need to know the serialised form of all enum values and add them to the validator. My initial fix solves the problem because tuples are serialised as JSON arrays and casting tuples to lists covers this case. The problem for me now is that I don't know how to find the corresponding serialisers given a PyAny instance. Since we know what possible enum values are and they are a limited set, I imagine this to be straightforward. @davidhewitt could you point me to the right place?

By the way, the following case is expected:

class MyEnum(Enum):
    a = 1, 2
    b = [1, 2]

v = SchemaValidator(core_schema.enum_schema(MyEnum, list(MyEnum.__members__.values())))
assert v.validate_json('[1, 2]') is MyEnum.a

Since both MyEnum.a and MyEnum.b are identical after serialisation, there is no way to determine what a JSON value represented before it was serialised, the first matched enum value is returned.

@davidhewitt
Copy link
Contributor

I think you might want to be looking in infer.rs to see what we would serialize for an arbitrary object?

@changhc
Copy link
Contributor Author

changhc commented Nov 8, 2024

Hi @davidhewitt, I've finally come up with a seemingly generalisable solution, but I would like to check with you and see if I missed out anything.

The idea is basically that we create a separate lookup in EnumValidator that contains the JSON form of all enum values and use this lookup for validation when the input type is JSON. The JSON form is acquired by calling serializers::to_jsonable_python. So far my implementation works with the following scenarios:

# root model with enum round trip
class Model(BaseModel):
    x: int

    @model_serializer
    def ser_model(self):
        return [self.x]

class A(Enum):
    e = "1+2j"
    a = 2, 3
    b = complex(1, 2)
    c = datetime.datetime.fromisoformat("2024-01-01T00:00:00Z")
    d = Model(x=2)
    f = float('inf')

for v in A.__members__.values():
    RootModel[A].model_validate_json(RootModel[A](v).model_dump_json())

# base model with enum round trip
class M(BaseModel):
    v: A

    #model_config = ConfigDict(ser_json_inf_nan='strings', use_enum_values=True)

for v in A.__members__.values():
    M.model_validate_json(M(v=v).model_dump_json())

I'm not quite sure about calling to_jsonable_python because ideally we should compose the JSON lookup using EnumSerializer so that it would be a round trip without any unexpected mismatch, e.g. think the complex value inference I missed out in my original implementation, but meanwhile that means exposing quite a few things from the serializer module and I'm not sure if that's the right thing to do.

At the same time, I have some questions regarding implementations around to_python(mode='JSON'), especially regarding float. Do you know why here it does not take inf_nan_mode into consideration? This is a case where my implementation does not fully work. In the class M above, if you uncomment model_config, it gives an error when processing A.f, that float('inf'):

pydantic_core._pydantic_core.ValidationError: 1 validation error for M
v
  Input should be '1+2j', (2, 3), (1+2j), datetime.datetime(2024, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), Model(x=2) or inf [type=enum, input_value='Infinity', input_type=str]

because the serialized value in model_dump_json ("Infinity") does not match the one acquired with to_jsonable_python (inf).

By the way, I'm still listing the original values in the error message instead of listing them in the JSON form because I'm not sure if showing the JSON form would be confusing.

Sorry if this is not very clear because I'm in a bit of a rush, but please let me know what you think. Thank you!

@changhc changhc marked this pull request as ready for review November 12, 2024 08:15
@changhc
Copy link
Contributor Author

changhc commented Nov 12, 2024

please review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(🐞) Enum with multiple values (tuple) can't be round tripped through json
3 participants