Describe the bug
When integrating Great Expectations with Kedro, an expectation that uses condition_parser inside a pipeline does not yield the expected results. Specifically, validation fails with an exception when a row_condition is set on an expectation in the suite. I've used other expectations successfully, but this is the first one that uses a row condition.
Steps to reproduce the issue:
Add an expectation to the suite with ExpectColumnPairValuesToBeInSet.
Configure the value_pairs_set, column_A, and column_B.
Set condition_parser to 'pandas' and add a row_condition.
Expected behavior
Expectation should execute successfully within the Kedro pipeline with the row condition applied as specified.
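For reference, the semantics expected from the conditional expectation can be sketched in plain Python. This is only an illustration of the intended behavior (filter rows by the condition first, then check each remaining (colA, colB) pair against the allowed set), not Great Expectations' implementation. The `check_pairs` helper, the sample rows, and the `status` column are all hypothetical.

```python
# Plain-Python sketch of the intended semantics; not Great Expectations code.
def check_pairs(rows, allowed_pairs, condition):
    """Apply `condition` first, then test (colA, colB) membership on the rows that remain."""
    selected = [row for row in rows if condition(row)]
    unexpected = [
        row for row in selected if (row["colA"], row["colB"]) not in allowed_pairs
    ]
    return {
        "element_count": len(selected),
        "unexpected_count": len(unexpected),
        "success": not unexpected,
    }


# Hypothetical data: "status" is an invented column used only for the condition.
rows = [
    {"colA": "x", "colB": 1, "status": "active"},
    {"colA": "y", "colB": 2, "status": "active"},
    {"colA": "z", "colB": 9, "status": "inactive"},  # excluded by the condition
]

result = check_pairs(
    rows,
    allowed_pairs={("x", 1), ("y", 2)},
    condition=lambda row: row["status"] == "active",
)
```

Under these semantics the unexpected count can never exceed the element count, because both are computed from the same filtered rows.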
Environment:
Operating System: Windows
Great Expectations Version: 1.1.3
Data Source: Pandas
Additional context: This issue arises only when running within a Kedro pipeline.
data docs expectation result:
expect_column_pair_values_to_be_in_set raised an exception:
Traceback (most recent call last):
  File ".../site-packages/great_expectations/validator/validator.py", line 650, in graph_validate
    result = expectation.metrics_validate(
  File ".../site-packages/great_expectations/expectations/expectation.py", line 1113, in metrics_validate
    evr: ExpectationValidationResult = self._build_evr(
  File ".../site-packages/great_expectations/expectations/expectation.py", line 1130, in _build_evr
    evr = ExpectationValidationResult(**raw_response)
  File ".../site-packages/great_expectations/core/expectation_validation_result.py", line 96, in __init__
    raise gx_exceptions.InvalidCacheValueError(result)
great_expectations.exceptions.exceptions.InvalidCacheValueError: Invalid result values were found when trying to instantiate an ExpectationValidationResult.
- Invalid result values are likely caused by inconsistent cache values.
- Great Expectations enables caching by default.
- Please ensure that caching behavior is consistent between the underlying Dataset (e.g. Spark) and Great Expectations.
Result: {
    "element_count": 760,
    "unexpected_count": 98321,
    "unexpected_percent": 100.0,
    "missing_count": -97561,
    "missing_percent": -12836.973684210525,
    "unexpected_percent_total": 12936.973684210525,
    "unexpected_percent_nonmissing": 100.0,
I've removed some sensitive information, but if anything else is needed I can try to help.
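As a sanity check on the Result figures above, the negative values can be reproduced from the two counts alone. The relations below are inferred from the reported numbers, not taken from the Great Expectations source, so treat them as an assumption. They suggest that the element count (760) and the unexpected count (98321) were computed over different sets of rows.

```python
import math

# Counts copied from the reported Result.
element_count = 760
unexpected_count = 98321

# Assumed relations, inferred from the reported figures (not from GX source code).
missing_count = element_count - unexpected_count
missing_percent = missing_count / element_count * 100
unexpected_percent_total = unexpected_count / element_count * 100

# The derived values match the reported ones up to floating-point tolerance.
matches_report = (
    missing_count == -97561
    and math.isclose(missing_percent, -12836.973684210525)
    and math.isclose(unexpected_percent_total, 12936.973684210525)
)

# A consistent result would satisfy this; the reported one does not.
counts_are_consistent = unexpected_count <= element_count
```

If both counts came from the same conditioned rows, `unexpected_count` could not be larger than `element_count`, which fits the "inconsistent values" wording of the error message.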
Could you share more details about your workflow with GX and Kedro? Specifically, I'm trying to determine if you're transforming the DataFrame in a pipeline step without providing GX a reference to the updated DataFrame, which might result in applying the row condition to the untransformed data.
Additionally, are you properly updating your batch with each change to the DataFrame?
It would also be helpful if you could share an anonymized version of the row_condition for context.
An engineer I spoke with mentioned that the CachedDataset abstraction might be relevant here, so we may need to explore that as well.
Ultimately, please provide as much detail as possible about your workflow so we can investigate further. Thank you!
Expectation as added to the suite:
suite.add_expectation(
    gx.expectations.ExpectColumnPairValuesToBeInSet(
        value_pairs_set=<EXPECTED_SET>,
        column_A="colA",
        column_B="colB",
        condition_parser='pandas',
        row_condition='<ROW_CONDITION>',
    )
)