Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from StrictYAML to Pydantic #870

Merged
merged 19 commits into from
Mar 8, 2024
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions RELEASE-NOTES.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@
## Fixes and improvements
* Adding `--image-name` option for image name argument in `spcs image-repository list-tags` for consistency with other commands.
* Fixed errors during `spcs image-registry login` not being formatted correctly.
* Project definition no longer accept extra fields. Any extra field will cause an error.
* Project definition no longer accept extra fields. Any extra field will cause an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# v2.1.0

Expand Down
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ dependencies = [
"typer==0.9.0",
"urllib3>=1.21.1,<2.3",
"GitPython==3.1.42",
"pydantic==2.6.3"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove need for strictyml? Or should it be done in separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Un-done. Left pydantic for some uses not directly related to project definitions

]
classifiers = [
"Development Status :: 3 - Alpha",
Expand Down
2 changes: 1 addition & 1 deletion src/snowflake/cli/api/commands/flags.py
Original file line number Diff line number Diff line change
Expand Up @@ -374,7 +374,7 @@ def project_definition_option(project_name: str):

def _callback(project_path: Optional[str]):
dm = DefinitionManager(project_path)
project_definition = dm.project_definition.get(project_name)
project_definition = getattr(dm.project_definition, project_name, None)
project_root = dm.project_root

if not project_definition:
Expand Down
44 changes: 20 additions & 24 deletions src/snowflake/cli/api/project/definition.py
Original file line number Diff line number Diff line change
@@ -1,45 +1,36 @@
from pathlib import Path
from typing import Dict, List, Union
from typing import Dict, List

import yaml.loader
from snowflake.cli.api.cli_global_context import cli_context
from snowflake.cli.api.constants import DEFAULT_SIZE_LIMIT_MB
from snowflake.cli.api.project.schemas.project_definition import (
project_override_schema,
project_schema,
)
from snowflake.cli.api.project.schemas.project_definition import ProjectDefinition
from snowflake.cli.api.project.util import (
append_to_identifier,
clean_identifier,
get_env_username,
to_identifier,
)
from snowflake.cli.api.secure_path import SecurePath
from strictyaml import (
YAML,
as_document,
load,
)
from yaml import load

DEFAULT_USERNAME = "unknown_user"


def merge_left(target: Union[Dict, YAML], source: Union[Dict, YAML]) -> None:
def merge_left(target: Dict, source: Dict) -> None:
"""
Recursively merges key/value pairs from source into target.
Modifies the original dict-like "target".
"""
for k, v in source.items():
if k in target and (
isinstance(v, dict) or (isinstance(v, YAML) and v.is_mapping())
):
if k in target and isinstance(target[k], dict):
# assumption: all inputs have been validated.
assert isinstance(target[k], dict) or isinstance(target[k], YAML)
merge_left(target[k], v)
else:
target[k] = v


def load_project_definition(paths: List[Path]) -> dict:
def load_project_definition(paths: List[Path]) -> ProjectDefinition:
"""
Loads project definition, optionally overriding values. Definition values
are merged in left-to-right order (increasing precedence).
Expand All @@ -49,22 +40,23 @@ def load_project_definition(paths: List[Path]) -> dict:
raise ValueError("Need at least one definition file.")

with spaths[0].open("r", read_file_limit_mb=DEFAULT_SIZE_LIMIT_MB) as base_yml:
definition = load(base_yml.read(), project_schema)
definition = load(base_yml.read(), Loader=yaml.loader.BaseLoader)

for override_path in spaths[1:]:
with override_path.open(
"r", read_file_limit_mb=DEFAULT_SIZE_LIMIT_MB
) as override_yml:
overrides = load(override_yml.read(), project_override_schema)
overrides = load(override_yml.read(), Loader=yaml.loader.BaseLoader)
merge_left(definition, overrides)

# TODO: how to show good error messages here?
definition.revalidate(project_schema)

return definition.data
return ProjectDefinition(**definition)


def generate_local_override_yml(project: Union[Dict, YAML]) -> YAML:
def generate_local_override_yml(
project: ProjectDefinition,
) -> ProjectDefinition:
"""
Generates defaults for optional keys in the same YAML structure as the project
schema. The returned YAML object can be saved directly to a file, if desired.
Expand All @@ -76,8 +68,8 @@ def generate_local_override_yml(project: Union[Dict, YAML]) -> YAML:
warehouse = conn.warehouse

local: dict = {}
if "native_app" in project:
name = clean_identifier(project["native_app"]["name"])
if project.native_app:
name = clean_identifier(project.native_app.name)
app_identifier = to_identifier(name)
user_app_identifier = append_to_identifier(app_identifier, f"_{user}")
package_identifier = append_to_identifier(app_identifier, f"_pkg_{user}")
Expand All @@ -90,8 +82,12 @@ def generate_local_override_yml(project: Union[Dict, YAML]) -> YAML:
},
"package": {"name": package_identifier, "role": role},
}
# TODO: this is an ugly workaround, because pydantics BaseModel.model_copy(update=) doesn't work properly
# After fixing UpdatableModel.update_from_dict it should be used here
target_definition = project.model_dump()
merge_left(target_definition, local)

return as_document(local, project_override_schema)
return ProjectDefinition(**target_definition)


def default_app_package(project_name: str):
Expand Down
3 changes: 2 additions & 1 deletion src/snowflake/cli/api/project/definition_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@

from snowflake.cli.api.exceptions import MissingConfiguration
from snowflake.cli.api.project.definition import load_project_definition
from snowflake.cli.api.project.schemas.project_definition import ProjectDefinition


def _compat_is_mount(path: Path):
Expand Down Expand Up @@ -100,5 +101,5 @@ def _user_definition_file_if_available(project_path: Path) -> Optional[Path]:
)

@functools.cached_property
def project_definition(self) -> dict:
def project_definition(self) -> ProjectDefinition:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the type safety Pydantic gives us!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too :)
As i cannot answer the comment above - yes, Pydantic can be used to generate JSON schemas, but i think i'll summon @sfc-gh-turbaszek to answer the long-term vision plans :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we could use description and examples fields of FieldInfo, to have the schema well documented.

return load_project_definition(self._project_config_paths)
24 changes: 24 additions & 0 deletions src/snowflake/cli/api/project/errors.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
from textwrap import dedent

from pydantic import ValidationError


class SchemaValidationError(Exception):
generic_message = "For field {loc} you provided '{loc}'. This caused: {msg}"
message_templates = {
"string_type": "{msg} for field '{loc}', you provided '{input}'",
"extra_forbidden": "{msg}. You provided field '{loc}' with value '{input}' that is not present in the schema",
"missing": "Your project definition is missing following fields: {loc}",
}

def __init__(self, error: ValidationError):
errors = error.errors()
message = f"During evaluation of {error.title} schema following errors were encoutered:\n"
message += "\n".join(
[
self.message_templates.get(e["type"], self.generic_message).format(**e)
for e in errors
]
)

super().__init__(dedent(message))
43 changes: 0 additions & 43 deletions src/snowflake/cli/api/project/schemas/native_app.py

This file was deleted.

Empty file.
31 changes: 31 additions & 0 deletions src/snowflake/cli/api/project/schemas/native_app/application.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
from __future__ import annotations

from typing import Literal, Optional

from pydantic import Field
from snowflake.cli.api.project.schemas.updatable_model import (
IdentifierField,
UpdatableModel,
)


class Application(UpdatableModel):
role: Optional[str] = Field(
title="Role to use when creating the application instance and consumer-side objects",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "application object" instead of "application instance"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

default=None,
)
name: Optional[str] = Field(
title="Name of the application created when you run the snow app run command",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... application object created ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

default=None,
)
warehouse: Optional[str] = IdentifierField(
title="Name of the application created when you run the snow app run command",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, and for line 26 as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

default=None,
)
debug: Optional[bool] = Field(
title="Whether to enable debug mode when using a named stage to create an application",
default=True,
)


DistributionOptions = Literal["internal", "external", "INTERNAL", "EXTERNAL"]
39 changes: 39 additions & 0 deletions src/snowflake/cli/api/project/schemas/native_app/native_app.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
from __future__ import annotations

import re
from typing import List, Optional, Union

from pydantic import Field, field_validator
from snowflake.cli.api.project.schemas.native_app.application import Application
from snowflake.cli.api.project.schemas.native_app.package import Package
from snowflake.cli.api.project.schemas.native_app.path_mapping import PathMapping
from snowflake.cli.api.project.schemas.updatable_model import UpdatableModel
from snowflake.cli.api.project.util import (
SCHEMA_AND_NAME,
)


class NativeApp(UpdatableModel):
name: str = Field(
title="Project identifier",
)
artifacts: List[Union[PathMapping, str]] = Field(
title="List of file source and destination pairs to add to the deploy root",
)
deploy_root: Optional[str] = Field(
title="Folder at the root of your project where the build step copies the artifacts.",
default="output/deploy/",
)
source_stage: Optional[str] = Field(
title="Identifier of the stage that stores the application artifacts.",
default="app_src.stage",
)
package: Optional[Package] = Field(title="PackageSchema", default=None)
application: Optional[Application] = Field(title="Application info", default=None)

@field_validator("source_stage")
@classmethod
def validate_source_stage(cls, input_value: str):
if not re.match(SCHEMA_AND_NAME, input_value):
raise ValueError("Incorrect value for Native Apps source stage value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rephrase to:
Incorrect value for source_stage property of native_app.
The goal is to avoid using "Native Apps" as a term, which comes from our marketing teams.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think latter one we can add something like regex_validator(value, pattern) that would cause a generic error like Incorrect value for native_app.source_stage. Allowed values: pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return input_value
40 changes: 40 additions & 0 deletions src/snowflake/cli/api/project/schemas/native_app/package.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
from __future__ import annotations

from typing import List, Optional

from pydantic import Field, field_validator
from snowflake.cli.api.project.schemas.native_app.application import DistributionOptions
from snowflake.cli.api.project.schemas.updatable_model import (
IdentifierField,
UpdatableModel,
)


class Package(UpdatableModel):
scripts: Optional[List[str]] = Field(
title="List of SQL file paths relative to the project root", default=None
)
role: Optional[str] = IdentifierField(
title="Role to use when creating the application package and provider-side objects",
default=None,
)
name: Optional[str] = IdentifierField(
title="Name of the application created when you run the snow app run command", # TODO: this description seems duplicated, is it ok?
sfc-gh-jsikorski marked this conversation as resolved.
Show resolved Hide resolved
default=None,
)
warehouse: Optional[str] = IdentifierField(
title="Warehouse used to run the scripts", default=None
)
distribution: Optional[DistributionOptions] = Field(
title="Distribution of the application package created by the Snowflake CLI",
default="internal",
)

@field_validator("scripts")
@classmethod
def validate_scripts(cls, input_list):
if len(input_list) != len(set(input_list)):
raise ValueError(
"Scripts field should contain unique values. Check the list for duplicates and try again"
sfc-gh-jsikorski marked this conversation as resolved.
Show resolved Hide resolved
)
return input_list
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
from __future__ import annotations

from typing import Optional

from snowflake.cli.api.project.schemas.updatable_model import UpdatableModel


class PathMapping(UpdatableModel):
src: str
dest: Optional[str] = None
40 changes: 21 additions & 19 deletions src/snowflake/cli/api/project/schemas/project_definition.py
Original file line number Diff line number Diff line change
@@ -1,23 +1,25 @@
from __future__ import annotations

from snowflake.cli.api.project.schemas import (
native_app,
snowpark,
streamlit,
)
from snowflake.cli.api.project.schemas.relaxed_map import RelaxedMap
from strictyaml import (
Int,
Optional,
)
from typing import Optional

project_schema = RelaxedMap(
{
"definition_version": Int(),
Optional("native_app"): native_app.native_app_schema,
Optional("snowpark"): snowpark.snowpark_schema,
Optional("streamlit"): streamlit.streamlit_schema,
}
)
from pydantic import Field
from snowflake.cli.api.project.schemas.native_app.native_app import NativeApp
from snowflake.cli.api.project.schemas.snowpark.snowpark import Snowpark
from snowflake.cli.api.project.schemas.streamlit.streamlit import Streamlit
from snowflake.cli.api.project.schemas.updatable_model import UpdatableModel

project_override_schema = project_schema.as_fully_optional()

class ProjectDefinition(UpdatableModel):
definition_version: int = Field(
title="Version of the project definition schema, which is currently 1"
)
native_app: Optional[NativeApp] = Field(
title="Native app definitions for the project", default=None
)
snowpark: Optional[Snowpark] = Field(
title="Snowpark functions and procedures definitions for the project",
default=None,
)
streamlit: Optional[Streamlit] = Field(
title="Streamlit definitions for the project", default=None
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add validator for definition_version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean definition == 1?

Copy link
Contributor

@sfc-gh-cgorrie sfc-gh-cgorrie Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to think about this -- when we get to project definition v2 (design currently in progress) it will look completely different. We should have a way to separate out the definitions by definition_version rather than have to have a complex base model that has to support both. I agree that for now ensuring definition_version == 1 is important, but we should make sure that when we introduce definition_version == 2 it's something that can live alongside this model rather than "within" it, if we have enough changes that it makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so my proposition is to add definition_version==1 to field validation. If user provides another version of definition, it's pointless to proceed with parsing any fields.
Then, when we add new versions, we can change load_project_definition() logic to decide which model should be used for provided data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's have ==1 for now. We will add more validation once we start working on v2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Loading
Loading