[Docs] MessagePack IDL, Pydantic Support, and Attribute Access #6022

Merged
merged 20 commits into from
Nov 21, 2024
16 changes: 10 additions & 6 deletions docs/user_guide/data_types_and_io/accessing_attributes.md
@@ -11,6 +11,10 @@ Note that while this functionality may appear to be the normal behavior of Pytho
Consequently, accessing attributes in this manner is, in fact, a specially implemented feature.
This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures.
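
To make the idea of a "specially implemented feature" concrete, here is a toy sketch of deferred attribute access. The `DeferredOutput` class, its `resolve` method, and the `Fruit` dataclass are hypothetical names invented for this illustration. This is not Flytekit's implementation; it only shows one way a placeholder object could record index, key, and attribute lookups when the workflow is defined and replay them once a real task output exists. It also rejects slices, mirroring the limitation that list slicing is not supported.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class DeferredOutput:
    """Hypothetical placeholder for a task output that does not exist yet."""

    path: Tuple[Union[int, str], ...] = ()

    def __getitem__(self, key: Union[int, str]) -> "DeferredOutput":
        # Index and key access are recorded; slices are rejected.
        if isinstance(key, slice):
            raise TypeError("slicing a deferred output is not supported")
        return DeferredOutput(self.path + (key,))

    def __getattr__(self, name: str) -> "DeferredOutput":
        # Only called when normal lookup fails, so `path` and `resolve` still work.
        if name.startswith("__"):
            raise AttributeError(name)
        return DeferredOutput(self.path + (name,))

    def resolve(self, value: Any) -> Any:
        """Replay the recorded lookups against a concrete value."""
        for step in self.path:
            if isinstance(step, int) or isinstance(value, dict):
                value = value[step]
            else:
                value = getattr(value, step)
        return value


@dataclass
class Fruit:
    name: str


# Nothing is looked up here; the access path is only recorded.
ref = DeferredOutput()["fruits"][0].name

# Later, once a concrete output is available, the path is replayed.
result = ref.resolve({"fruits": [Fruit(name="apple")]})
```

In this sketch, `ref.path` holds the recorded steps, and `resolve` applies them in order to the concrete value.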

```{important}
Flytekit version >= v1.14.0 supports Pydantic BaseModel V2, so you can access attributes on Pydantic BaseModel V2 outputs as well.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```
@@ -19,7 +23,7 @@ To begin, import the required dependencies and define a common task for subseque

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 1-10
:lines: 1-9
```

## List
@@ -31,38 +35,38 @@ Flyte currently does not support output promise access through list slicing.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 14-23
:lines: 13-22
```

## Dictionary
Access the output dictionary by specifying the key.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 27-35
:lines: 26-34
```

## Data class
Directly access an attribute of a dataclass.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 39-53
:lines: 38-51
```

## Complex type
Combinations of list, dict and dataclass also work effectively.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 57-80
:lines: 55-78
```

You can run all the workflows locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/attribute_access.py
:caption: data_types_and_io/attribute_access.py
:lines: 84-88
:lines: 82-86
```

## Failure scenario
18 changes: 17 additions & 1 deletion docs/user_guide/data_types_and_io/dataclass.md
@@ -11,8 +11,24 @@ When you've multiple values that you want to send across Flyte entities, you can
Flytekit uses the [Mashumaro library](https://github.com/Fatal1ty/mashumaro)
to serialize and deserialize dataclasses.

With the 1.14 release, `flytekit` adopted `MessagePack` as the
serialization format for dataclasses, overcoming a major limitation of previous versions, which serialized dataclasses into a JSON string within a Protobuf `struct` datatype:

to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
Contributor

no need to insert a new line

Member Author

I think insert a new line is more readable
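
The `int`-to-`float` conversion this paragraph describes can be reproduced without Flyte. The sketch below uses a hypothetical helper, `to_struct_like`, to mimic the fact that Protobuf's `Struct` has a single numeric kind (a double). It is a stand-in for illustration only, not Flytekit or Protobuf code, and the `record` contents are made up.

```python
from typing import Any


def to_struct_like(value: Any) -> Any:
    """Mimic a Struct round trip: every number becomes a float (double)."""
    if isinstance(value, bool):
        return value  # Struct has a separate bool kind, so bools survive
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, dict):
        return {key: to_struct_like(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_struct_like(item) for item in value]
    return value


# Hypothetical payload with a small int and an int above 2**53.
record = {"count": 3, "big_id": 2**53 + 1, "tags": ["a", "b"]}
round_tripped = to_struct_like(record)

# The boilerplate users had to write: cast back to int by hand.
count = int(round_tripped["count"])

# Large integers cannot be recovered, because a double cannot hold 2**53 + 1.
big_id = int(round_tripped["big_id"])
```

Here `round_tripped["count"]` is the float `3.0`, and `big_id` comes back as `2**53` rather than `2**53 + 1`. A format that keeps integers as integers avoids both the cast and the precision loss.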


:::{important}
If you're using Flytekit version < v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
:::

:::{important}
If you're using Flytekit version below v1.11.1, you will need to add `from dataclasses_json import dataclass_json` to your imports and decorate your dataclass with `@dataclass_json`.
Flytekit versions < v1.14.0 produce a Protobuf `struct` literal for dataclasses.

Flytekit versions >= v1.14.0 produce a msgpack bytes literal for dataclasses.

If you're using Flytekit version >= v1.14.0 and you want to produce a Protobuf `struct` literal for dataclasses, you can
set environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.
Contributor

Suggested change
set environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

This was already mentioned above

Also in the readthedocs build, you can see there are two important blocks nested


For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::
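
As a minimal sketch, the `FLYTE_USE_OLD_DC_FORMAT` variable named above can be set from Python in either of the two ways shown here. This page does not say when Flytekit reads the variable, so setting it before any Flytekit code runs is an assumption; `child_env` is a hypothetical name.

```python
import os

# Option 1: set the variable for the current Python process.
# Assumption: do this before any Flytekit serialization happens.
os.environ["FLYTE_USE_OLD_DC_FORMAT"] = "true"

# Option 2: build an environment for a child process (for example, one passed
# to subprocess.run(..., env=child_env)) without relying on the parent's state.
child_env = dict(os.environ, FLYTE_USE_OLD_DC_FORMAT="true")
```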

```{note}
3 changes: 2 additions & 1 deletion docs/user_guide/data_types_and_io/index.md
@@ -114,7 +114,7 @@ Here's a breakdown of these mappings:
- Use ``pyspark.DataFrame`` as a type hint.
* - ``pydantic.BaseModel``
- ``Map``
- To utilize the type, install the ``flytekitplugins-pydantic`` plugin.
- To utilize the type, install the ``pydantic>2`` module.
- Use ``pydantic.BaseModel`` as a type hint.
* - ``torch.Tensor`` / ``torch.nn.Module``
- File
@@ -144,6 +144,7 @@ flytefile
flytedirectory
structureddataset
dataclass
pydantic_basemodel
accessing_attributes
pytorch_type
enum_type
103 changes: 103 additions & 0 deletions docs/user_guide/data_types_and_io/pydantic_basemodel.md
@@ -0,0 +1,103 @@
(pydantic_basemodel)=

# Pydantic BaseModel

```{eval-rst}
.. tags:: Basic
```

`flytekit` version >=1.14 natively supports the `JSON` format that Pydantic `BaseModel` produces, enhancing the
interoperability of Pydantic BaseModels with the Flyte type system.
Contributor

no need to insert new line


:::{important}
Pydantic BaseModel V2 only works when you are using flytekit version >= v1.14.0.
:::

With the 1.14 release, `flytekit` adopted `MessagePack` as the serialization format for Pydantic `BaseModel`,
overcoming a major limitation of previous versions, which serialized into a JSON string within a Protobuf `struct` datatype:

Contributor

Suggested change

to store `int` types, Protobuf's `struct` converts them to `float`, forcing users to write boilerplate code to work around this issue.
Contributor

I think new line after a colon (unless a list) looks strange


:::{important}
By default, `flytekit >= 1.14` will produce `msgpack` bytes literals when serializing, preserving the types defined in your `BaseModel` class.
If you're serializing `BaseModel` using `flytekit` version >= v1.14.0 and you want to produce a Protobuf `struct` literal instead, you can set the environment variable `FLYTE_USE_OLD_DC_FORMAT` to `true`.

For more details, refer to the MessagePack IDL RFC: https://github.com/flyteorg/flyte/blob/master/rfc/system/5741-binary-idl-with-message-pack.md
:::

```{note}
You can use dataclasses and Flyte types (FlyteFile, FlyteDirectory, FlyteSchema, and StructuredDataset) as fields in a Pydantic BaseModel.
```

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```

To begin, import the necessary dependencies:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 1-9
```

Build your custom image with ImageSpec:
```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 11-14
```

## Python types
We define a Pydantic `BaseModel` with `int`, `str` and `dict` fields.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: Datum
```

You can send a Pydantic `BaseModel` between different tasks written in various languages, and input it through the Flyte console as raw JSON.
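
As a rough illustration of what "raw JSON" means for a model with `int`, `str` and `dict` fields, the sketch below builds such a payload with Python's standard `json` module. The field names `x`, `y` and `z` and their values are hypothetical, and the snippet does not involve Flytekit or Pydantic. It also shows one detail worth knowing when writing JSON by hand: object keys in JSON are always strings.

```python
import json

# Hypothetical field values for a model with int, str and dict fields.
datum = {"x": 1, "y": "hello", "z": {1: "one"}}

# Serialize to the raw JSON text you could paste into an input form.
raw = json.dumps(datum)

# Decoding shows what a consumer of that text receives.
decoded = json.loads(raw)
```

After the round trip, `decoded["x"]` is still the integer `1`, but the integer key in `z` has become the string `"1"`.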

:::{note}
All variables in a `BaseModel` should be **annotated with their type**. Failure to do so will result in an error.
:::

Once declared, a `BaseModel` can be returned as an output or accepted as an input.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 26-41
```

## Flyte types
We also define a `BaseModel` that accepts {std:ref}`StructuredDataset <structured_dataset>`,
{std:ref}`FlyteFile <files>` and {std:ref}`FlyteDirectory <folder>`.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 45-86
```

A `BaseModel` can hold Python types, dataclasses,
FlyteFile, FlyteDirectory and StructuredDataset values.

We define a workflow that calls the tasks created above.

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:pyobject: basemodel_wf
```

You can run the workflow locally as follows:

```{literalinclude} /examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py
:caption: data_types_and_io/pydantic_basemodel.py
:lines: 99-100
```

To run the workflow with `pyflyte run`, pass its inputs on the command line:
```
pyflyte run \
https://raw.githubusercontent.com/flyteorg/flytesnacks/b71e01d45037cea883883f33d8d93f258b9a5023/examples/data_types_and_io/data_types_and_io/pydantic_basemodel.py \
basemodel_wf --x 1 --y 2
```

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/