[wiz-cli] duplicate dataclass schemas should be replaced with just one #59

Open
rnag opened this issue May 13, 2022 · 0 comments
Labels: bug, good first issue, help wanted, self-created

Comments

Owner

rnag commented May 13, 2022

  • Dataclass Wizard version: 0.22.1
  • Python version: 3.10
  • Operating System: Mac OS

Description

In certain cases, and especially in certain API responses (most notably AWS Rekognition), the input JSON object can contain multiple definitions for the same field, e.g. "element", all of which share an identical schema.

I'd like to eliminate those duplicate dataclass definitions in the output, so that the generated schema is a bit less verbose and we only have the data we care about.

For example, note the below sample input and output.

What I Did

I ran the following command from my Mac terminal:

echo '{
    "element": {
        "my_str": "string",
        "my_int": 3
    },
    "Elements": [
        {
            "my_str": "hello",
            "my_int": 5
        },
        {
            "myStr": "world",
            "MyInt": 7
        }
    ],
    "other_field": {
        "element": {
            "my_str": "other string",
            "my_int": 42
        }
    }
}' | wiz gs

The generated output is a bit noisy in this scenario, as it contains duplicate definitions of the dataclass Element:

from dataclasses import dataclass
from typing import List

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass

    """
    element: 'Element'
    elements: List['Element']
    other_field: 'OtherField'


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int


@dataclass
class OtherField:
    """
    OtherField dataclass

    """
    element: 'Element'


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int

I'd like to eliminate all the duplicate definitions, preferably by trimming every duplicate after the first dataclass schema for Element.

Resolution

There are multiple ways to achieve this, but I think the easiest might be to store the generated string or __repr__ for each schema in a dict keyed by class name, and then look up and compare whether those string definitions are the same. If they are, we just continue and return an empty __repr__ for every occurrence after the first. If not, we generate all the field names and types for the dataclass as normal.
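
A minimal sketch of that idea, written as a standalone helper rather than against the actual wiz-cli internals. The function name `dedupe_schemas` and the `(class_name, source)` input shape are assumptions made for illustration only. The generator code may be structured quite differently. The dict maps each class name to the set of sources already emitted, so a class that reuses a name with a different schema is still kept.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dedupe_schemas(generated: Iterable[Tuple[str, str]]) -> List[str]:
    """Drop repeated class definitions whose generated source is identical.

    `generated` yields (class_name, source) pairs in output order.
    NOTE: this helper and its input shape are hypothetical, not wiz-cli API.
    """
    seen: Dict[str, Set[str]] = {}  # class name -> sources already emitted
    kept: List[str] = []
    for name, source in generated:
        sources = seen.setdefault(name, set())
        if source in sources:
            # Same name and identical schema: emit nothing for the repeat.
            continue
        # First occurrence, or same name with a different schema: keep it.
        sources.add(source)
        kept.append(source)
    return kept


# Usage with shortened versions of the definitions from the output above.
ELEMENT = "@dataclass\nclass Element:\n    my_str: str\n    my_int: int\n"
OTHER = "@dataclass\nclass OtherField:\n    element: 'Element'\n"

blocks = [
    ("Element", ELEMENT),
    ("Element", ELEMENT),
    ("OtherField", OTHER),
    ("Element", ELEMENT),
]
print("\n\n".join(dedupe_schemas(blocks)))
```

Comparing the full generated string, rather than only the class name, avoids silently dropping a second `Element` whose fields differ from the first.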

@rnag rnag added bug Something isn't working help wanted Extra attention is needed good first issue Good for newcomers labels May 13, 2022
@rnag rnag changed the title [wiz-cli] duplicate dataclass declarations should be replaced with just one [wiz-cli] duplicate dataclass schemas should be replaced with just one May 13, 2022
@rnag rnag added the self-created Opened by me! label Nov 24, 2024