Upcoming Changes in V1 #155

Open
1 of 13 tasks
rnag opened this issue Nov 27, 2024 Discussed in #153 · 0 comments
Labels
enhancement New feature or request performance

Comments

@rnag
Owner

rnag commented Nov 27, 2024

Discussed in #153

Originally posted by rnag November 27, 2024
I want to add my thoughts on planned (breaking) changes in the next major release V1.

Planned Changes

  • There will/should be no default key transform on dump (serialization). So if dataclass fields are defined in snake_case, the keys in the JSON output will also be in snake_case.
    • Thus we can remove helper class JSONPyWizard, as it was only a stop-gap solution.
  • Similarly, there will be no "auto" key transform on load (de-serialization) anymore. See my note/comment under "Performance Improvements" below.
  • JSONWizard should be the default class name. It is time to do away with the alias to JSONSerializable, which IMO doesn't make much sense to retain in a library called Dataclass Wizard.
  • The default __str__() on a JSONWizard subclass will be removed (or at least changed).
    • Pretty-printing a dataclass instance as JSON is a bit unexpected and humorous to me (maybe childish?). I'm not sure what the new default will be. To take another library as an example, pydantic does it weirdly: it prints the field names with repr'd values separated by a space, and no class name. Maybe there's a middle ground, or it could involve leveraging pprint. I'll have to think on it.
  • We will no longer automatically (silently) convert a float value or a float in a str (ex. 123.4 or '12.3') to an int if the annotated type is int. There seems to be a lot of concern over this, and it appears to be tied to unintentional data loss. I agree: we shouldn't lose the fractional part when converting to int, especially as in Python we should strive to be more explicit and not do "silent" conversions like these.
  • The @dataclass decorator may no longer be required? For convenience, our library can use @dataclass_transform and apply @dataclass ourselves if a class isn't already decorated with it. This makes sense especially as most IDEs like PyCharm now support @dataclass_transform. I think this would be a huge help for users, and for me personally, as I sometimes forget to apply @dataclass.
  • All deprecated stuff can and should be removed (ex. the __pre_as_dict__() hook).
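
A minimal sketch of what the stricter int conversion described above could look like. The to_int_strict name is hypothetical and not part of the library, and accepting whole-number floats such as 12.0 is an assumption; the plan above only says that the fractional part should not be lost silently.

```python
def to_int_strict(value):
    """Hypothetical helper: convert to int only if no fractional part is lost."""
    if isinstance(value, bool):
        # bool is a subclass of int; assumption: reject it instead of coercing to 0/1
        raise TypeError(f'expected a number, got bool: {value!r}')
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        if value.is_integer():
            return int(value)
        raise ValueError(f'refusing to drop the fractional part of {value!r}')
    if isinstance(value, str):
        try:
            # Plain integer strings such as '12'
            return int(value)
        except ValueError:
            # Float-like strings such as '12.0'; float() itself raises
            # ValueError for non-numeric input such as 'abc'
            as_float = float(value)
            if as_float.is_integer():
                return int(as_float)
            raise ValueError(
                f'refusing to drop the fractional part of {value!r}'
            ) from None
    raise TypeError(f'cannot convert {type(value).__name__} to int')


print(to_int_strict('12'))    # 12
print(to_int_strict(12.0))    # 12
try:
    to_int_strict('12.3')
except ValueError as e:
    print('rejected:', e)
```

With a helper along these lines, both 123.4 and '12.3' raise an error instead of quietly becoming 123 and 12.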

Performance improvements

  • Improving some helper conversion functions. For example, as_str() is unnecessary; simply using the builtin str() appears to be the fastest approach. What a shocker 😮
    • Though as a best practice, we can also support None when loading to a str type. Something like '' if x is None else str(x) seems like a good middle ground to have 🤔
  • Methods under LoadMixin should now return a string of code instead of being defined as regular functions. This will boost performance, since we now exec a function anyway, so there's no need to nest functions when parsing individual fields.
    • If I have time, I can also do a similar thing for DumpMixin and dumpers.py. My reasoning is that by default we could use the type annotation on a field to determine how to dump/serialize it. For example, if the annotated type is str | None, then emit kwargs[field] = value in the string code to return the field value as-is. There is no need to check the type of value the way dataclasses does it (e.g. if type(obj) in _ATOMIC_TYPES: ...) each time asdict is called. My follow-up thought, though, was that this will prove tricky for cases like Optional and Union. For Union type annotated fields, maybe it's best to check the type of value directly after all.
    • Coincidentally, this also means some (or all?) of the Parsers in parsers.py can be removed, as they will be unnecessary.
  • The default behavior should be to iterate over dataclass fields on de-serialization, instead of looping over the JSON object. This will have the minor benefit of eliminating a for loop. I am thinking of maybe having a Meta setting such as input_letter_case or similar, so that e.g. input_letter_case='CAMEL' will automatically map the my_str dataclass field to myStr in the input JSON object. Plus of course, another setting such as wizard_mode=True or auto_key_transform=True would effectively disable this "minor optimization" mode and loop over the input JSON object, as this library currently does, and as the example on the front page of the docs clearly illustrates to users.
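
To make the ideas above concrete, here is a rough sketch of generating a load function as a string of code and exec'ing it once per class, iterating over the dataclass fields rather than over the input JSON object. The names make_loader, to_camel, and the camel_keys flag are hypothetical stand-ins for illustration only; this is not the library's implementation, and only str fields get special handling here.

```python
from dataclasses import dataclass, fields


def to_camel(name: str) -> str:
    """Hypothetical helper: my_str -> myStr."""
    first, *rest = name.split('_')
    return first + ''.join(part.capitalize() for part in rest)


def make_loader(cls, camel_keys=False):
    """Build a load function from a code string, one line per dataclass field."""
    lines = ['def load(o):', '    kwargs = {}']
    for f in fields(cls):
        # The JSON key is computed once here, at code-generation time
        key = to_camel(f.name) if camel_keys else f.name
        if f.type is str:
            # Support None when loading to a str field
            expr = f"'' if o[{key!r}] is None else str(o[{key!r}])"
        else:
            # Assumption: every other type is passed through unchanged
            expr = f'o[{key!r}]'
        lines.append(f'    kwargs[{f.name!r}] = {expr}')
    lines.append('    return cls(**kwargs)')

    namespace = {'cls': cls}
    exec('\n'.join(lines), namespace)
    return namespace['load']


@dataclass
class Example:
    my_str: str
    my_int: int


load = make_loader(Example, camel_keys=True)
print(load({'myStr': None, 'myInt': 3}))  # Example(my_str='', my_int=3)
```

The generated function has no loop and no per-key transform at load time, which is where the speedup would be expected to come from. In this sketch a missing key simply raises KeyError.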

I had more changes planned; if I remember them, I will jot them down here. Thanks all, and kindly let me know any comments or feedback down below! 👋

@rnag rnag added enhancement New feature or request performance labels Nov 27, 2024