Validate enum values against other possible forms of expected values #1502

changhc · 2024-10-25T07:13:04Z

Change Summary

Validate enum values against other possible forms of expected values, specifically tuples. The problem is actually bigger than what the initial commit solves. Please see my comment.

Related issue number

Fixes pydantic/pydantic#10629

Checklist

Unit tests for the changes exist
Documentation reflects the changes where applicable
Pydantic tests pass with this pydantic-core (except for expected changes)
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

codspeed-hq · 2024-10-25T07:20:39Z

CodSpeed Performance Report

Merging #1502 will not alter performance

_{Comparing changhc:10629-fix-enum (06ac56d) with main (cd0346d)}

Summary

✅ 155 untouched benchmarks

codecov · 2024-10-25T07:24:53Z

Codecov Report

Attention: Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.39%. Comparing base (ab503cb) to head (06ac56d).
Report is 226 commits behind head on main.

Files with missing lines	Patch %	Lines
src/validators/enum_.rs	96.96%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1502      +/-   ##
==========================================
- Coverage   90.21%   89.39%   -0.83%     
==========================================
  Files         106      112       +6     
  Lines       16339    17966    +1627     
  Branches       36       40       +4     
==========================================
+ Hits        14740    16060    +1320     
- Misses       1592     1886     +294     
- Partials        7       20      +13

Files with missing lines	Coverage Δ
src/serializers/config.rs	`94.62% <100.00%> (+0.17%)`	⬆️
src/serializers/mod.rs	`100.00% <ø> (ø)`
src/validators/enum_.rs	`93.37% <96.96%> (ø)`

... and 53 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2419981...06ac56d. Read the comment docs.

changhc · 2024-10-25T07:41:14Z

This issue lies mostly in validating JSON values. Take the case in the issue as an example:

from enum import Enum

class A(Enum):
    a = 1, 2

serialised = RootModel[A](A.a).model_dump_json()
parsed = RootModel[A].model_validate_json(serialised) # fail

What happens here is that A.a is serialised as [1, 2], but the validator is built with one possible enum value (1, 2) and thus cannot find any possible value that matches the input value [1, 2].

My initial commit does fix this issue by adding list(tuple) to the validator, but this does not cover all cases. The complete solution is to load the validator with serialised possible enum values and use them when the validator deals with JSON input values. Think the following example:

class Model(BaseModel):
    x: int

    @model_serializer
    def ser_model(self):
        return [self.x]

class A(Enum):
    a = 2, 3
    b = Model(x=1)

serialised = RootModel[A](A.b).model_dump_json() # [1]
parsed = RootModel[A].model_validate_json(serialised) # fail

To fix this issue completely, we need to know the serialised form of all enum values and add them to the validator. My initial fix solves the problem because tuples are serialised as JSON arrays and casting tuples to lists covers this case. The problem for me now is that I don't know how to find the corresponding serialisers given a PyAny instance. Since we know what possible enum values are and they are a limited set, I imagine this to be straightforward. @davidhewitt could you point me to the right place?

By the way, the following case is expected:

class MyEnum(Enum):
    a = 1, 2
    b = [1, 2]

v = SchemaValidator(core_schema.enum_schema(MyEnum, list(MyEnum.__members__.values())))
assert v.validate_json('[1, 2]') is MyEnum.a

Since both MyEnum.a and MyEnum.b are identical after serialisation, there is no way to determine what a JSON value represented before it was serialised, the first matched enum value is returned.

davidhewitt · 2024-10-29T12:02:44Z

I think you might want to be looking in infer.rs to see what we would serialize for an arbitrary object?

…o 10629-fix-enum

changhc · 2024-11-08T08:45:42Z

Hi @davidhewitt, I've finally come up with a seemingly generalisable solution, but I would like to check with you and see if I missed out anything.

The idea is basically that we create a separate lookup in EnumValidator that contains the JSON form of all enum values and use this lookup for validation when the input type is JSON. The JSON form is acquired by calling serializers::to_jsonable_python. So far my implementation works with the following scenarios:

# root model with enum round trip
class Model(BaseModel):
    x: int

    @model_serializer
    def ser_model(self):
        return [self.x]

class A(Enum):
    e = "1+2j"
    a = 2, 3
    b = complex(1, 2)
    c = datetime.datetime.fromisoformat("2024-01-01T00:00:00Z")
    d = Model(x=2)
    f = float('inf')

for v in A.__members__.values():
    RootModel[A].model_validate_json(RootModel[A](v).model_dump_json())

# base model with enum round trip
class M(BaseModel):
    v: A

    #model_config = ConfigDict(ser_json_inf_nan='strings', use_enum_values=True)

for v in A.__members__.values():
    M.model_validate_json(M(v=v).model_dump_json())

I'm not quite sure about calling to_jsonable_python because ideally we should compose the JSON lookup using EnumSerializer so that it would be a round trip without any unexpected mismatch, e.g. think the complex value inference I missed out in my original implementation, but meanwhile that means exposing quite a few things from the serializer module and I'm not sure if that's the right thing to do.

At the same time, I have some questions regarding implementations around to_python(mode='JSON'), especially regarding float. Do you know why here it does not take inf_nan_mode into consideration? This is a case where my implementation does not fully work. In the class M above, if you uncomment model_config, it gives an error when processing A.f, that float('inf'):

pydantic_core._pydantic_core.ValidationError: 1 validation error for M
v
  Input should be '1+2j', (2, 3), (1+2j), datetime.datetime(2024, 1, 1, 0, 0, tzinfo=datetime.timezone.utc), Model(x=2) or inf [type=enum, input_value='Infinity', input_type=str]

because the serialized value in model_dump_json ("Infinity") does not match the one acquired with to_jsonable_python (inf).

By the way, I'm still listing the original values in the error message instead of listing them in the JSON form because I'm not sure if showing the JSON form would be confusing.

Sorry if this is not very clear because I'm in a bit of a rush, but please let me know what you think. Thank you!

changhc · 2024-11-12T08:15:43Z

please review

fix enum

6aac326

fix test

578c893

changhc mentioned this pull request Nov 3, 2024

Fix serializing complex values in enums #1524

Merged

4 tasks

changhc added 5 commits November 5, 2024 23:36

fix enum

dd7f7ca

fix test

85e8e09

Merge branch '10629-fix-enum' of github.com:changhc/pydantic-core int…

27700b3

…o 10629-fix-enum

validate values against their json form

48de7c6

Merge branch 'main' into 10629-fix-enum

5435ba7

changhc added 2 commits November 12, 2024 07:43

add tests

a1abde7

fix tests

06ac56d

changhc marked this pull request as ready for review November 12, 2024 08:15

pydantic-hooky bot added the ready for review label Nov 12, 2024

pydantic-hooky bot assigned sydney-runkle Nov 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate enum values against other possible forms of expected values #1502

Validate enum values against other possible forms of expected values #1502

changhc commented Oct 25, 2024 •

edited by pydantic-hooky bot

Loading

codspeed-hq bot commented Oct 25, 2024 •

edited

Loading

codecov bot commented Oct 25, 2024 •

edited

Loading

changhc commented Oct 25, 2024

davidhewitt commented Oct 29, 2024

changhc commented Nov 8, 2024

changhc commented Nov 12, 2024

Validate enum values against other possible forms of expected values #1502

Are you sure you want to change the base?

Validate enum values against other possible forms of expected values #1502

Conversation

changhc commented Oct 25, 2024 • edited by pydantic-hooky bot Loading

Change Summary

Related issue number

Checklist

codspeed-hq bot commented Oct 25, 2024 • edited Loading

CodSpeed Performance Report

Merging #1502 will not alter performance

Summary

codecov bot commented Oct 25, 2024 • edited Loading

Codecov Report

changhc commented Oct 25, 2024

davidhewitt commented Oct 29, 2024

changhc commented Nov 8, 2024

changhc commented Nov 12, 2024

changhc commented Oct 25, 2024 •

edited by pydantic-hooky bot

Loading

codspeed-hq bot commented Oct 25, 2024 •

edited

Loading

codecov bot commented Oct 25, 2024 •

edited

Loading