Describe the bug
If I try to subclass an expectation to define customizable behavior, everything goes well as long as I keep the original name for the input variable. However, if I add an alias for a Field, the expectation can still be instantiated, but it can no longer be validated.
To Reproduce
Code:
import great_expectations as gx
from pydantic import v1 as pydantic_v1
from pyspark.sql import SparkSession
from pyspark.sql import types as st
from great_expectations import expectations as gxe
from great_expectations.core.suite_parameters import SuiteParameterDict
class ExpectColumnValuesToStartWith(gxe.ExpectColumnValuesToMatchRegex):
    """Pre-fill a regex expression expectation with a caret, and escape all special characters."""

    regex: str | SuiteParameterDict = pydantic_v1.Field(
        default="(?s).*",
        alias="startswith",
        description="Expect rows in a given column to start with some particular value.",
    )

    @pydantic_v1.validator("regex", pre=True)
    def validate_regex(cls, v: str):
        # Escape any special character in the input, anchor it at the start
        # of the string, and allow anything after it.
        return (
            "^"
            + "".join(
                char if char not in set(r"[@_!#$%^&*()<>?/\|}{~:]") else "\\" + char
                for char in v
            )
            + ".*"
        )

    class Config(gxe.ExpectColumnValuesToMatchRegex.Config):
        # pydantic v1 spelling of what v2 calls populate_by_name
        allow_population_by_field_name = True
df = (
    SparkSession.builder.appName("spark")
    .getOrCreate()
    .createDataFrame(
        [("aaa", "bcc"), ("abb", "bdd"), ("acc", "abc")],
        st.StructType(
            [
                st.StructField("col1", st.StringType(), True),
                st.StructField("col2", st.StringType(), True),
            ]
        ),
    )
)
(
    gx.get_context()
    .data_sources.add_spark(name="spark_source")
    .add_dataframe_asset(name="dataframe_asset")
    .add_batch_definition_whole_dataframe(name="whole_dataframe")
    .get_batch(batch_parameters={"dataframe": df})
    .validate(ExpectColumnValuesToStartWith(column="col2", startswith="a"))
)
Please note that this fails even if I set extra = "allow" as a Config class attribute.
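For reference, that variant looks roughly like this (a sketch; extra = "allow" is the pydantic v1 spelling of the option, and the error is the same either way):

    class Config(gxe.ExpectColumnValuesToMatchRegex.Config):
        extra = "allow"  # pydantic v1 key for permitting extra fields
        allow_population_by_field_name = True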
Expected behavior
I expected this expectation to fail because of a bad row value, but instead I got a ValidationError, because Field aliases are not accounted for in the batch_definition.get_batch(...).validate method.
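For illustration, here is a minimal pydantic-v1-only sketch of what I suspect the mechanism is (the Demo model is hypothetical, not GX internals): once a field has an alias, any code path that rebuilds the model from field names rather than aliases raises a ValidationError unless population by field name is enabled.

    from pydantic import v1 as pydantic_v1

    class Demo(pydantic_v1.BaseModel):
        regex: str = pydantic_v1.Field(alias="startswith")

    demo = Demo(startswith="^a.*")  # population by alias works
    try:
        # Rebuilding from field names (as a serialize/re-parse round trip
        # might do) fails, because the model only accepts the alias:
        Demo(**demo.dict(by_alias=False))
    except pydantic_v1.ValidationError as err:
        print(err)  # reports "startswith ... field required"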
Environment (please complete the following information):
Operating System: Linux
Great Expectations Version: 1.0.5
Data Source: Spark
Cloud environment: Databricks
Additional context
See my discussion with Tyler Hoffman here.
A workaround is a factory function like the one below, but the documentation seems to implicitly suggest subclassing for customized expectations, so I think this bug is still worth pursuing, at the very least because subclassing would also let us do useful things like modifying the description Field parameter in one go:
def ExpectColumnValuesToStartWith(
    startswith: str, column: str
) -> gxe.ExpectColumnValuesToMatchRegex:
    """Pre-fill a regex expression expectation with a caret, and escape all special characters."""
    regex = (
        "^"
        + "".join(
            char if char not in set(r"[@_!#$%^&*()<>?/\|}{~:]") else "\\" + char
            for char in startswith
        )
        + ".*"
    )
    return gxe.ExpectColumnValuesToMatchRegex(regex=regex, column=column)
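Used in place of the subclass, the factory produces a plain built-in expectation, so validate works as expected (values mirror the repro above):

    expectation = ExpectColumnValuesToStartWith(startswith="a", column="col2")
    # equivalent to gxe.ExpectColumnValuesToMatchRegex(regex="^a.*", column="col2")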