chore(weave): Create a system of defining schema-validated objects (#2826)

* init

* init

* init

* generation complete

* beginning ts implementation

* Initial TS implementation complete

* Initial TS tests complete

* Initial TS tests complete

* Initial python tests complete

* Typescript improvements

* Python Tests complete

* Attempted fix

* Attempted fix

* clean

* clean

* fixed bug

* maybe fix

* maybe fix

* Added diagram

* Removed first hack

* Removed second hack

* lint

* Fixed generation

* Fixed generation 2

* Fixed types

* Type fixes

* Addressed comments
tssweeney authored Oct 31, 2024
1 parent c39db4f commit 3e1647f
Showing 23 changed files with 2,027 additions and 69 deletions.
6 changes: 5 additions & 1 deletion Makefile
@@ -14,4 +14,8 @@ docs:
build:
	uv build

prepare-release: docs build

synchronize-base-object-schemas:
	cd weave && make generate_base_object_schemas && \
	cd ../weave-js && yarn generate-schemas
218 changes: 218 additions & 0 deletions dev_docs/BaseObjectClasses.md
@@ -0,0 +1,218 @@
# BaseObjectClasses

## Refresher on Objects and object storage

In Weave, we have a general-purpose data storage system for objects.
The payloads themselves are completely free-form - basically anything that can be JSON-serialized.
Users can "publish" runtime objects to Weave using `weave.publish`.
For example:

```python
config = {"model_name": "my_model", "model_version": "1.0"}
ref = weave.publish(config, name="my_model_config")
```

This will create a new object "version" in the collection called "my_model_config".
These can then be retrieved using `weave.ref().get()`:

```python
config = weave.ref("my_model_config").get()
```

Sometimes users are working with standard structured classes like `dataclasses` or `pydantic.BaseModel`.
In such cases, we have special serialization and deserialization logic that allows for cleaner serialization patterns.
For example, let's say the user does:

```python
class ModelConfig(weave.Object):
    model_name: str
    model_version: str
```

Then the user can publish an instance of `ModelConfig` as follows:

```python
config = ModelConfig(model_name="my_model", model_version="1.0")
ref = weave.publish(config)
```

This will result in an on-disk payload that looks like:

```json
{
  "model_name": "my_model",
  "model_version": "1.0",
  "_type": "ModelConfig",
  "_class_name": "ModelConfig",
  "_bases": ["Object", "BaseModel"]
}
```

And additionally, the user can query for all objects of the `ModelConfig` class using the `base_object_classes` filter in `objs_query` or `POST objs/query`.
Effectively, this is like creating a virtual table for that class.

**Terminology**: We use the term "Weave Object" (capital "O") to refer to instances of classes that subclass `weave.Object`.

**Technical note**: the `base_object_class` is the name of the first class in the hierarchy that directly subclasses `Object`, which is not necessarily the object's own `_class_name`.
For example, if the class hierarchy is:
* `A -> Object -> BaseModel`, then the `base_object_class` filter will be "A".
* `B -> A -> Object -> BaseModel`, then the `base_object_class` filter will still be "A"!
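
The rule in the technical note can be sketched in a few lines of Python. This is only an illustration: `BaseModel` and `Object` below are empty stand-ins for `pydantic.BaseModel` and `weave.Object`, and `base_object_class_name` is a hypothetical helper, not a function from the Weave codebase. It also assumes single inheritance, as in the two hierarchies above.

```python
from typing import Optional


class BaseModel:
    """Stand-in for pydantic.BaseModel (assumption: keeps the sketch dependency-free)."""


class Object(BaseModel):
    """Stand-in for weave.Object."""


class A(Object):
    pass


class B(A):
    pass


def base_object_class_name(cls: type) -> Optional[str]:
    """Return the name of the class sitting directly below Object in cls's MRO.

    Hypothetical helper: Weave's real resolution logic may differ.
    """
    mro = cls.__mro__
    for index, klass in enumerate(mro):
        if klass is Object:
            # index 0 means cls is Object itself, so nothing sits below it.
            return mro[index - 1].__name__ if index > 0 else None
    return None  # cls does not derive from Object at all


print(base_object_class_name(A))  # A
print(base_object_class_name(B))  # A
```

Both `A` and `B` resolve to "A", which is why instances of either class land in the same virtual table.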

Finally, the Weave library itself uses this mechanism for common objects like `Model`, `Dataset`, `Evaluation`, etc.
This allows the user to subclass these objects to add additional metadata or functionality, while categorizing them in the same virtual table.

## Validated Base Objects

While many Weave Objects are free-form and user-defined, there is often a need for well-defined schemas for configuration objects that are tightly defined by Weave itself. The BaseObject system provides a way to define these schemas once and use them consistently across the entire stack.

### Key Features

1. **Single Source of Truth**: Define your schema once using Pydantic models
2. **Full Stack Integration**: The schema is used for:
   - Python SDK validation
   - Server-side HTTP API validation
   - Frontend UI validation with generated TypeScript types
   - Future: OpenAPI schema generation
   - Future: TypeScript SDK type generation

### Usage Example

Here's how to define and use a validated base object:

1. **Define your schema** (in `weave/trace_server/interface/base_object_classes/your_schema.py`):

```python
from pydantic import BaseModel
from weave.trace_server.interface.base_object_classes import base_object_def

class NestedConfig(BaseModel):
    setting_a: int

class MyConfig(base_object_def.BaseObject):
    name: str
    nested: NestedConfig
    reference: base_object_def.RefStr

__all__ = ["MyConfig"]
```

2. **Use in Python**:
```python
# Publishing
ref = weave.publish(MyConfig(...))

# Fetching (maintains type)
config = ref.get()
assert isinstance(config, MyConfig)
```

3. **Use via HTTP API**:
```bash
# Creating
curl -X POST 'https://trace.wandb.ai/obj/create' \
  -H 'Content-Type: application/json' \
  -d '{
    "obj": {
      "project_id": "user/project",
      "object_id": "my_config",
      "val": {...},
      "set_base_object_class": "MyConfig"
    }
  }'

# Querying
curl -X POST 'https://trace.wandb.ai/objs/query' \
  -d '{
    "project_id": "user/project",
    "filter": {
      "base_object_classes": ["MyConfig"]
    }
  }'
```

4. **Use in React**:
```typescript
// Read with type safety
const result = useBaseObjectInstances("MyConfig", ...);

// Write with validation
const createFn = useCreateBaseObjectInstance("MyConfig");
createFn({...}); // TypeScript enforced schema
```
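
For scripts that cannot use the Weave SDK, the HTTP calls from step 3 can also be assembled in Python. The sketch below uses only the standard library and mirrors the endpoint paths and field names from the curl examples. The helper names are hypothetical, the `reference` value is a placeholder rather than a real ref string, and authentication is omitted because it is not covered here. The requests are built but not sent.

```python
import json
import urllib.request

TRACE_SERVER = "https://trace.wandb.ai"  # host taken from the curl examples


def build_create_request(project_id, object_id, val, base_object_class):
    """Build (but do not send) a POST obj/create request for a base object."""
    body = {
        "obj": {
            "project_id": project_id,
            "object_id": object_id,
            "val": val,
            # Asks the server to validate `val` against the named schema.
            "set_base_object_class": base_object_class,
        }
    }
    return urllib.request.Request(
        f"{TRACE_SERVER}/obj/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(project_id, base_object_class):
    """Build (but do not send) a POST objs/query request filtered by class."""
    body = {
        "project_id": project_id,
        "filter": {"base_object_classes": [base_object_class]},
    }
    return urllib.request.Request(
        f"{TRACE_SERVER}/objs/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder payload shaped like MyConfig; "<ref string>" is not a real ref.
example_val = {"name": "example", "nested": {"setting_a": 1}, "reference": "<ref string>"}
create_req = build_create_request("user/project", "my_config", example_val, "MyConfig")
query_req = build_query_request("user/project", "MyConfig")
print(create_req.get_method(), create_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it, once whatever credentials the server requires have been added.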

### Keeping Frontend Types in Sync

Run `make synchronize-base-object-schemas` to ensure the frontend TypeScript types are up to date with your Pydantic schemas.

### Implementation Notes

- Base objects are pure data schemas (fields only)
- The system is designed to work independently of the weave SDK to maintain clean separation of concerns
- Server-side validation ensures data integrity
- Client-side validation (both Python and TypeScript) provides early feedback
- Generated TypeScript types ensure type safety in the frontend
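
As a rough mental model of the server-side validation note, the sketch below looks a class name up in a registry, checks a raw payload against it, and stamps the class name onto the stored value. Everything here is an assumption made for illustration: it uses `dataclasses` as a dependency-free stand-in for Pydantic, `REGISTRY`, `register`, and `validate_val` are hypothetical names, and the real server may record different metadata.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

# Hypothetical in-memory registry keyed by class name.
REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    """Record a schema class so it can be looked up by name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
@dataclass
class MyConfig:
    # Simplified stand-in for the Pydantic MyConfig schema.
    name: str
    setting_a: int


def validate_val(val: Dict[str, Any], base_object_class: str) -> Dict[str, Any]:
    """Check val against the registered schema and return it with class metadata."""
    schema = REGISTRY.get(base_object_class)
    if schema is None:
        raise ValueError(f"unknown base object class: {base_object_class}")
    expected = {f.name: f.type for f in fields(schema)}
    if set(val) != set(expected):
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(val)}")
    for key, expected_type in expected.items():
        if not isinstance(val[key], expected_type):
            raise TypeError(f"field {key!r} must be {expected_type.__name__}")
    # Assumption: the stored payload carries the class name, as in the
    # on-disk example earlier in this document.
    return {**val, "_class_name": base_object_class}


stored = validate_val({"name": "cfg", "setting_a": 1}, "MyConfig")
print(stored["_class_name"])  # MyConfig
```

A payload with a missing field or an unregistered class name is rejected before anything is stored, which is the property the server-side check is meant to guarantee.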

### Architecture Flow

1. Define your schema in a Python file in the `weave/trace_server/interface/base_object_classes/` directory. See `weave/trace_server/interface/base_object_classes/test_only_example.py` as an example.
2. Make sure to register your schemas in `weave/trace_server/interface/base_object_classes/base_object_registry.py` by calling `register_base_object`.
3. Run `make synchronize-base-object-schemas` to generate the frontend types.
    * The first step (`make generate_base_object_schemas`) will run `weave/scripts/generate_base_object_schemas.py` to generate a JSON schema in `weave/trace_server/interface/base_object_classes/generated/generated_base_object_class_schemas.json`.
    * The second step (`yarn generate-schemas`) will read this file and use it to generate the frontend types located in `weave-js/src/components/PagePanelComponents/Home/Browse3/pages/wfReactInterface/generatedBaseObjectClasses.zod.ts`.
4. Now, each use case uses different parts:
    1. `Python Writing`. Users can directly import these classes and use them as normal Pydantic models, which get published with `weave.publish`. The Python client correctly builds the requisite payload.
    2. `Python Reading`. Users can call `weave.ref().get()` and the Weave Python SDK will return the instance with the correct type. Note: we do some special handling such that the returned object is not a WeaveObject, but literally the exact Pydantic class.
    3. `HTTP Writing`. In cases where the client/user does not want to add the special type information, users can publish base objects by setting the `set_base_object_class` field on `POST obj/create` to the name of the class. The Weave server will validate the object against the schema, update the metadata fields, and store the object.
    4. `HTTP Reading`. When querying for objects, the server will return the object with the correct type if the `base_object_class` metadata field is set.
    5. `Frontend`. The frontend will read the zod schema from `weave-js/src/components/PagePanelComponents/Home/Browse3/pages/wfReactInterface/generatedBaseObjectClasses.zod.ts` and use that to provide compile-time type safety when using `useBaseObjectInstances` and runtime type safety when using `useCreateBaseObjectInstance`.
    * Note: it is critical that all techniques produce the same digest for the same data, and the tests verify this. This way versions are not thrashed by different clients/users.
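
To see why the digest requirement matters, consider two clients that send the same data with keys in a different order. The sketch below is not Weave's actual digest algorithm, which is not described here. It assumes, purely for illustration, a SHA-256 hash over a canonical JSON encoding, and `stable_digest` is a hypothetical helper.

```python
import hashlib
import json


def stable_digest(val) -> str:
    """Hash a canonical JSON encoding so key order does not affect the result.

    Assumption: SHA-256 over sorted-key JSON. The real digest may differ.
    """
    canonical = json.dumps(val, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same logical object, as two different clients might serialize it.
from_python_sdk = {"name": "cfg", "nested": {"setting_a": 1}}
from_http_client = {"nested": {"setting_a": 1}, "name": "cfg"}

# Equal digests mean the second publish reuses the existing version.
print(stable_digest(from_python_sdk) == stable_digest(from_http_client))  # True

# Any real change in the data yields a different digest, hence a new version.
changed = {"name": "cfg", "nested": {"setting_a": 2}}
print(stable_digest(from_python_sdk) == stable_digest(changed))  # False
```

If the write paths encoded the same data differently, for example if one of them added a metadata field the others omit, each client would create a new version of an unchanged object.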

```mermaid
graph TD
subgraph Schema Definition
F["weave/trace_server/interface/<br>base_object_classes/your_schema.py"] --> |defines| P[Pydantic BaseObject]
P --> |register_base_object| R["base_object_registry.py"]
end
subgraph Schema Generation
M["make synchronize-base-object-schemas"] --> G["make generate_base_object_schemas"]
G --> |runs| S["weave/scripts/<br>generate_base_object_schemas.py"]
R --> |import registered classes| S
S --> |generates| J["generated_base_object_class_schemas.json"]
M --> |yarn generate-schemas| Z["generatedBaseObjectClasses.zod.ts"]
J --> Z
end
subgraph "Trace Server"
subgraph "HTTP API"
R --> |validates using| HW["POST obj/create<br>set_base_object_class"]
HW --> DB[(Weave Object Store)]
HR["POST objs/query<br>base_object_classes"] --> |Filters base_object_class| DB
end
end
subgraph "Python SDK"
PW[Client Code] --> |import & publish| W[weave.publish]
W --> |store| HW
R --> |validates using| W
PR["weave ref get()"] --> |queries| HR
R --> |deserializes using| PR
end
subgraph "Frontend"
Z --> |import| UBI["useBaseObjectInstances"]
Z --> |import| UCI["useCreateBaseObjectInstance"]
UBI --> |Filters base_object_class| HR
UCI --> |set_base_object_class| HW
UI[React UI] --> UBI
UI --> UCI
end
style F fill:#f9f,stroke:#333,stroke-width:2px
style P fill:#f9f,stroke:#333,stroke-width:2px
style R fill:#bbf,stroke:#333,stroke-width:2px
style DB fill:#dfd,stroke:#333,stroke-width:2px
style J fill:#ffd,stroke:#333,stroke-width:2px
style Z fill:#ffd,stroke:#333,stroke-width:2px
style M fill:#faa,stroke:#333,stroke-width:4px
```