Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Introduce a zero_value field in LiteralType to capture DataClasses default fields #5887

Open
2 tasks done
EngHabu opened this issue Oct 22, 2024 · 5 comments
Open
2 tasks done
Assignees
Labels
enhancement New feature or request

Comments

@EngHabu
Copy link
Contributor

EngHabu commented Oct 22, 2024

Motivation: Why do you think this is important?

@dataclass
class MyOtherStruct:
    i: int = 5


@workflow
def my_wf_with_defaults(a: MyOtherStruct) -> int:
    return my_task()

a's LiteralType would look something like this:

metadata -> json schema in proto struct format + default value i=5
structure -> json schema (no i=5)
type: scalar ... STRUCT

This makes it difficult for clients (e.g. FlyteConsole) to display the zero value for a.

Goal: What should the final outcome look like, ideally?

message LiteralType {
  ...
  Literal zero_value = ;
}

Flytekit should populate this value for DataClasses. I think for other types the zero value is just empty/null
This is not to be confused with "default value"...

Describe alternatives you've considered

Parse metadata and extract field information from there...

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@EngHabu EngHabu added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Oct 22, 2024
@EngHabu
Copy link
Contributor Author

EngHabu commented Oct 22, 2024

cc @wild-endeavor @fg91 @eapolinario

@fg91
Copy link
Member

fg91 commented Oct 23, 2024

I have some clarification questions:

Flytekit should populate this value for DataClasses.

For the dataclass or for the attribute with a default value, in this case i, of the dataclass?

This is not to be confused with "default value"...

LiteralType doesn't have a default_value, do you mean the value that protobuf will default to for a field if not specified in the message? So zero_value would be the python default value while "default value" is protobuf's default?

And how is this affected by the #5742 RFC?

@wild-endeavor
Copy link
Contributor

wild-endeavor commented Oct 23, 2024

If you came across this paragraph, what would you think @EngHabu

The default keyword specifies a default value. This value is not used to fill in missing values during the validation process. Non-validation tools such as documentation generators or form generators may use this value to give hints to users about how to use a value. However, default is typically used to express that if a value is missing, then the value is semantically the same as if the value was present with the default value. The value of default should validate against the schema in which it resides, but that isn't required.

@wild-endeavor
Copy link
Contributor

Also is this a useful feature outside of dataclasses?

@EngHabu
Copy link
Contributor Author

EngHabu commented Oct 25, 2024

I think I got my wires crossed.

This code:

from dataclasses import dataclass
from mashumaro.jsonschema import build_json_schema
import typing
from flytekit import task

@dataclass
class Person:
    name: str = "foo"
    age: int = 123
    email: str = "[email protected]"

print("Output for build_json_schema")
print(build_json_schema(Person).to_json())

@task
def my_task(a: Person):
    ...

print("Output for task structure")
print(my_task.interface.inputs.get("a").to_flyte_idl().type.metadata)

Produces:

Output for build_json_schema
{'type': 'object', 'title': 'Person', 'properties': {'name': {'type': 'string', 'default': 'foo'}, 'age': {'type': 'integer', 'default': 123}, 'email': {'type': 'string', 'default': '[email protected]'}}, 'additionalProperties': False}

Output for task metadata
fields {
  key: "type"
  value {
    string_value: "object"
  }
}
fields {
  key: "title"
  value {
    string_value: "Person"
  }
}
fields {
  key: "properties"
  value {
    struct_value {
      fields {
        key: "name"
        value {
          struct_value {
            fields {
              key: "type"
              value {
                string_value: "string"
              }
            }
            fields {
              key: "default"
              value {
                string_value: "foo"
              }
            }
          }
        }
      }
      fields {
        key: "email"
        value {
          struct_value {
            fields {
              key: "type"
              value {
                string_value: "string"
              }
            }
            fields {
              key: "default"
              value {
                string_value: "[email protected]"
              }
            }
          }
        }
      }
      fields {
        key: "age"
        value {
          struct_value {
            fields {
              key: "type"
              value {
                string_value: "integer"
              }
            }
            fields {
              key: "default"
              value {
                number_value: 123
              }
            }
          }
        }
      }
    }
  }
}
fields {
  key: "additionalProperties"
  value {
    bool_value: false
  }
}

The vanilla build_json_schema and flytekit's output don't quite look the same
It looks like it's coming from here

Which looks pretty similar to what I did but with different results.

I also want to understand what .structure provide that's different? I get it won't have default values.

LiteralType doesn't have a default_value, do you mean the value that protobuf will default to for a field if not specified in the message? So zero_value would be the python default value while "default value" is protobuf's default?

Exactly. But looks like this can be extracted from metadata with some work :)

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Oct 31, 2024
@eapolinario eapolinario self-assigned this Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants