Issue Using condition_parser in Great Expectations within Kedro Pipeline #10644

fabio-scarel · 2024-11-08T02:48:42Z

Describe the bug
When integrating Great Expectations with Kedro, an expectation that uses condition_parser and row_condition does not yield the expected results when run within a pipeline. Specifically, the expectation fails once a row_condition is set on it in the expectation suite. I've used other expectations successfully, but this is the first one that uses a row condition.

Steps to reproduce the issue:

1. Add an expectation to the suite with ExpectColumnPairValuesToBeInSet.
2. Configure the value_pairs_set, column_A, and column_B.
3. Set condition_parser to 'pandas' and add a row_condition.

suite.add_expectation(
    gx.expectations.ExpectColumnPairValuesToBeInSet(
        value_pairs_set=<EXPECTED_SET>,
        column_A="colA",
        column_B="colB",
        condition_parser='pandas',
        row_condition='<ROW_CONDITION>',
    )
)

Expected behavior
Expectation should execute successfully within the Kedro pipeline with the row condition applied as specified.
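
For reference, the result I would expect can be sketched in plain pandas, without Great Expectations. This is only an illustration under assumptions: it assumes the 'pandas' condition parser filters rows the way DataFrame.query does, and the data, the "status" column, the value set, and the condition are hypothetical stand-ins for the redacted <EXPECTED_SET> and <ROW_CONDITION>.

```python
import pandas as pd

# Hypothetical data: the column names colA/colB match the report, the values do not.
df = pd.DataFrame(
    {
        "colA": ["a", "a", "b", "c"],
        "colB": [1, 2, 1, 9],
        "status": ["active", "active", "inactive", "active"],
    }
)

# Hypothetical stand-ins for the redacted <EXPECTED_SET> and <ROW_CONDITION>.
value_pairs_set = {("a", 1), ("a", 2)}
row_condition = 'status == "active"'

# Step 1: keep only the rows that satisfy the row condition.
filtered = df.query(row_condition)

# Step 2: collect the (colA, colB) pairs that are not in the allowed set.
pairs = list(zip(filtered["colA"], filtered["colB"]))
unexpected = [pair for pair in pairs if pair not in value_pairs_set]

# Both counts are taken from the same filtered frame.
element_count = len(filtered)
unexpected_count = len(unexpected)
unexpected_percent = 100.0 * unexpected_count / element_count

print(element_count, unexpected_count, round(unexpected_percent, 2))
```

Because both counts come from the same filtered frame, unexpected_count can never exceed element_count in this sketch.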

Environment:

Operating System: Windows
Great Expectations Version: 1.1.3
Data Source: Pandas
Additional context: This issue arises only when running within a Kedro pipeline.

data docs expectation result:

expect_column_pair_values_to_be_in_set raised an exception:
Traceback (most recent call last):
  File ".../site-packages/great_expectations/validator/validator.py", line 650, in graph_validate
    result = expectation.metrics_validate(
  File ".../site-packages/great_expectations/expectations/expectation.py", line 1113, in metrics_validate
    evr: ExpectationValidationResult = self._build_evr(
  File ".../site-packages/great_expectations/expectations/expectation.py", line 1130, in _build_evr
    evr = ExpectationValidationResult(**raw_response)
  File ".../site-packages/great_expectations/core/expectation_validation_result.py", line 96, in __init__
    raise gx_exceptions.InvalidCacheValueError(result)

great_expectations.exceptions.exceptions.InvalidCacheValueError: Invalid result values were found when trying to instantiate an ExpectationValidationResult.
- Invalid result values are likely caused by inconsistent cache values.
- Great Expectations enables caching by default.
- Please ensure that caching behavior is consistent between the underlying Dataset (e.g. Spark) and Great Expectations.
Result: {
  "element_count": 760,
  "unexpected_count": 98321,
  "unexpected_percent": 100.0,
  "missing_count": -97561,
  "missing_percent": -12836.973684210525,
  "unexpected_percent_total": 12936.973684210525,
  "unexpected_percent_nonmissing": 100.0,

I've removed some sensitive information, but if anything else is needed I can try to help.
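
As a sanity check on the numbers above, the negative and over-100% figures appear to follow arithmetically from the three counts. The sketch below is one reading of the result, not a confirmed diagnosis: the formulas are assumptions chosen because they reproduce the reported values, and they are not taken from the Great Expectations source.

```python
import math

# Values copied from the reported result.
element_count = 760
unexpected_count = 98321
missing_count = -97561

# Assumed relationship: non-missing rows = all rows minus missing rows.
nonmissing_count = element_count - missing_count

# Assumed percentage formulas (hypothetical, picked to match the report).
missing_percent = 100.0 * missing_count / element_count
unexpected_percent_total = 100.0 * unexpected_count / element_count
unexpected_percent_nonmissing = 100.0 * unexpected_count / nonmissing_count

# Compare against the reported figures, allowing for float rounding.
print(nonmissing_count == unexpected_count)
print(math.isclose(missing_percent, -12836.973684210525))
print(math.isclose(unexpected_percent_total, 12936.973684210525))
print(unexpected_percent_nonmissing)
```

Under these assumptions the non-missing count works out to 98321, the same as unexpected_count, while element_count is only 760. That would be consistent with the two counts being computed over different sets of rows, for example one with the row condition applied and one without.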


adeola-ak · 2024-12-02T22:05:06Z

Hi @fabio-scarel, thanks for reaching out!

Could you share more details about your workflow with GX and Kedro? Specifically, I'm trying to determine if you're transforming the DataFrame in a pipeline step without providing GX a reference to the updated DataFrame, which might result in applying the row condition to the untransformed data.

Additionally, are you properly updating your batch with each change to the DataFrame?

It would also be helpful if you could share an anonymized version of the row_condition for context.

An engineer I spoke with mentioned that the CachedDataset abstraction might be relevant here, so we may need to explore that as well.

Ultimately, please provide as much detail as possible about your workflow so we can investigate further. Thank you!

github-project-automation bot added this to GX Core Issues Board Nov 8, 2024

github-project-automation bot moved this to To Do in GX Core Issues Board Nov 8, 2024

adeola-ak moved this from To Do to In progress in GX Core Issues Board Nov 21, 2024
