
[BUG] Exception during validation of ExpectColumnValuesToNotBeNull #10410

Open

Utkarsh-Krishna opened this issue Sep 17, 2024 · 13 comments

Labels: bug

@Utkarsh-Krishna

Describe the bug
I am using a Spark/pandas DataFrame with multiple columns and passing one of them as the parameter for this expectation. For one column with no null values there is no exception and I get the expected result. But when I pass certain other columns, whether they contain nulls or not, I see exceptions.

To Reproduce
Traceback (from "exception_info" in the validation result):

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6e39b63e-ade0-4e51-94c2-99c6cf2319a5/lib/python3.9/site-packages/great_expectations/validator/validator.py", line 648, in graph_validate
    result = expectation.metrics_validate(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6e39b63e-ade0-4e51-94c2-99c6cf2319a5/lib/python3.9/site-packages/great_expectations/expectations/expectation.py", line 1081, in metrics_validate
    _validate_dependencies_against_available_metrics(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6e39b63e-ade0-4e51-94c2-99c6cf2319a5/lib/python3.9/site-packages/great_expectations/expectations/expectation.py", line 2773, in _validate_dependencies_against_available_metrics
    raise InvalidExpectationConfigurationError(  # noqa: TRY003
great_expectations.exceptions.exceptions.InvalidExpectationConfigurationError: Metric ('column_values.nonnull.unexpected_count', '657e384d8614677fff7d7be97ee019fe', ()) is not available for validation of configuration. Please check your configuration.

exception_message: Metric ('column_values.nonnull.unexpected_count', '657e384d8614677fff7d7be97ee019fe', ()) is not available for validation of configuration. Please check your configuration.
raised_exception: true

Environment (please complete the following information):

  • Databricks runtime 12.2 LTS
  • GX version 1.0.4
Utkarsh-Krishna changed the title from "Exception during validation of ExpectColumnValuesToNotBeNull" to "[BUG] Exception during validation of ExpectColumnValuesToNotBeNull" Sep 17, 2024
@adeola-ak
Contributor

Please share the expectations you have written.

@adeola-ak
Contributor

adeola-ak commented Sep 18, 2024

Just to clarify: are you passing columns with both non-null and null values to ExpectColumnValuesToNotBeNull, and only encountering exceptions when the columns have null values?

Could you share as much as possible about your suite and expectation configuration? The exception message suggests that the metric for non-null values isn't being resolved as expected. If your configuration looks fine, I'll escalate this to the team to investigate why the metric can't be computed.

@Utkarsh-Krishna
Author

Utkarsh-Krishna commented Sep 19, 2024

Irrespective of the column values (null or not null), this works for some columns and raises exceptions for others in the validation_results (see the code below).

I am sharing an example of the code that I have written.

CODE:

import great_expectations as gx

context = gx.get_context()
data_source_name = "my_data_source"
data_source = context.data_sources.add_spark(name=data_source_name)
data_asset_name = "my_dataframe_data_asset"
data_asset = data_source.add_dataframe_asset(name=data_asset_name)

batch_definition_name = "my_batch_definition"
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    batch_definition_name
)

suite = context.suites.add(
    gx.core.expectation_suite.ExpectationSuite(name="my_expectations")
)

suite.add_expectation(
    expectation=gx.expectations.ExpectColumnValuesToNotBeNull(column="id")
)

dataframe = spark.table("table1")
batch_parameters = {"dataframe": dataframe}

# Validation
validation_definition = context.validation_definitions.add(
    gx.core.validation_definition.ValidationDefinition(
        name="my_validation_definition",
        data=batch_definition,
        suite=suite,
    )
)

validation_results = validation_definition.run(batch_parameters=batch_parameters)
print(validation_results)
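
If it helps triage, here is a hedged snippet for pulling the per-expectation exception out of the result object, assuming the flat exception_info shape shown in the traceback above:

# Print the exception message for any expectation that raised during validation
for r in validation_results.results:
    info = r.exception_info or {}
    if info.get("raised_exception"):
        print(info.get("exception_message"))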

@adeola-ak
Contributor

For the columns that are working and not working: are they all within the same table, "table1"? Also, for the columns that are not working, what are their data types? I am not yet able to reproduce this in Databricks.

@Utkarsh-Krishna
Author

Data types for columns that are working: date, string
Data types for columns that are NOT working: string
NOT-working column data: 40-character alphanumeric strings

Just FYI: the same data works well with SparkDFDataset (GX version 0.18.17).
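
For reference, a minimal sketch of the legacy 0.x usage being compared here (assuming a Spark DataFrame df; SparkDFDataset was the pre-1.0 API):

from great_expectations.dataset import SparkDFDataset

# Legacy 0.x pattern: wrap the Spark DataFrame and call the expectation directly
dataset = SparkDFDataset(df)
result = dataset.expect_column_values_to_not_be_null("id")
print(result.success)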

@adeola-ak
Contributor

hi there, are you still running into this issue?

@Utkarsh-Krishna
Author

hi, yes I am.

@adeola-ak
Contributor

Can you run your code with the NYC taxi sample data from Databricks (more info here)? Let's try using the same data to see if we can narrow down what's going on, because I am still unable to reproduce this on varying types of data on my end. Your data itself could be the issue.

@Utkarsh-Krishna
Author

I found the issue: the code works for my data if the column name is passed in UPPER CASE, e.g. "ID", but the same code doesn't work if the column name is passed in lower case, e.g. "id".

So it looks like column names are now case sensitive, which was not the case with SparkDFDataset. Let me know if this is expected.
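
In other words, against the code above (the "ID"/"id" casing is illustrative of my schema):

print(dataframe.columns)  # the Spark schema reports the column as 'ID'

# Works: casing matches the schema exactly
gx.expectations.ExpectColumnValuesToNotBeNull(column="ID")

# Raises the "Metric ... is not available" error during validation
gx.expectations.ExpectColumnValuesToNotBeNull(column="id")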

@JerryLeeD3d

I have the same issue: ExpectColumnValuesToNotBeNull on a Spark DataFrame.

@Utkarsh-Krishna
Author

any updates?

@adeola-ak
Contributor

Ensuring that column names are passed in the exact case in which they are defined in the DataFrame schema is a temporary workaround while this is investigated further.
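
A minimal sketch of that workaround, resolving the exact casing from the DataFrame schema before adding the expectation (the schema_cased helper is hypothetical, not a GX API):

def schema_cased(df, name: str) -> str:
    """Return the column name with the exact casing used in the Spark schema."""
    matches = [c for c in df.columns if c.lower() == name.lower()]
    if not matches:
        raise KeyError(f"Column {name!r} not found among {df.columns}")
    return matches[0]

suite.add_expectation(
    gx.expectations.ExpectColumnValuesToNotBeNull(
        column=schema_cased(dataframe, "id")  # resolves to 'ID' if that is the schema casing
    )
)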

adeola-ak added the bug label Oct 22, 2024
@jwalant-dattani

For those still facing this issue, there could be another reason for this.

In my case the column names were in the expected case. However, I was getting this error when the input Spark DataFrame was created from a pandas DataFrame. I switched to creating it directly from the dataset using the spark.read.* API, and then it worked fine. Possibly some implicit conversion of schema or data underneath was causing this.

See if this workaround helps.
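
For anyone comparing, a sketch of the two construction paths (the CSV path is illustrative):

import pandas as pd

# Path that triggered the error: Spark DataFrame built from a pandas DataFrame
pdf = pd.read_csv("/dbfs/tmp/table1.csv")
df_from_pandas = spark.createDataFrame(pdf)

# Path that worked: read the dataset directly with the Spark reader API
df_direct = spark.read.csv("/dbfs/tmp/table1.csv", header=True, inferSchema=True)

batch_parameters = {"dataframe": df_direct}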
