[BUG] Exception during validation of ExpectColumnValuesToNotBeNull #10410
Comments
Please share the expectations you have written.
Just to clarify, are you passing columns with both non-null and null values to ExpectColumnValuesToNotBeNull and only encountering exceptions when the columns have null values? Could you please share as much as possible regarding your configuration of suites and expectations? After reviewing the exception message, it seems the error suggests that the metric related to non-null values isn't being handled as expected. If your configuration looks fine, I'll escalate this to the team to investigate why the metric can't be computed.
Irrespective of the column values (null or not null), this is working for some columns and raising exceptions for others in the validation_results (see the code below). I am sharing an example of the code that I have written.

CODE:

import great_expectations as gx

context = gx.get_context()

data_source_name = "my_data_source"
data_source = context.data_sources.add_spark(name=data_source_name)

data_asset_name = "my_dataframe_data_asset"
data_asset = data_source.add_dataframe_asset(name=data_asset_name)

batch_definition_name = "my_batch_definition"
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    batch_definition_name
)

suite = context.suites.add(
    gx.core.expectation_suite.ExpectationSuite(name="my_expectations")
)
suite.add_expectation(
    expectation=gx.expectations.ExpectColumnValuesToNotBeNull(
        column="id"
    )
)

dataframe = spark.table("table1")
batch_parameters = {"dataframe": dataframe}

# validation
validation_definition = context.validation_definitions.add(
    gx.core.validation_definition.ValidationDefinition(
        name="my_validation_definition",
        data=batch_definition,
        suite=suite,
    )
)
validation_results = validation_definition.run(batch_parameters=batch_parameters)
print(validation_results)
For the columns that are working and the ones that are not, are they all within the same table, "table1"? Also, for the columns that are not working, what are the types of the columns that result in exceptions? I am not able to reproduce this yet in Databricks.
Data types for the columns which are working: date, string. Just FYI.
Hi there, are you still running into this issue?
Hi, yes I am.
Can you run your code with the NYC taxi sample data from Databricks? More info here. Let's try using the same data to see if we can narrow down what's going on, because I am still unable to reproduce this on varying types of data on my end. I don't know if your data could be the issue.
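For reference, loading that sample data in a Databricks notebook might look like the minimal sketch below; the samples.nyctaxi.trips table name is an assumption based on the standard Databricks sample catalog, so adjust it to whatever your workspace exposes.

# Hypothetical sketch: load the NYC taxi sample data that ships with Databricks.
# The table name below is an assumption; your workspace may expose it differently.
dataframe = spark.read.table("samples.nyctaxi.trips")

# Then reuse it as the batch parameter from the snippet above.
batch_parameters = {"dataframe": dataframe}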
I found the issue: the code works for my data if the column name is passed in UPPER CASE, e.g. "ID", but the same code doesn't work if the column name is passed in lower case, e.g. "id". So it looks like column names are case sensitive now, which was not the case with SparkDFDataset.
I have the same issue with ExpectColumnValuesToNotBeNull on a Spark DataFrame.
Any updates?
Ensuring that column names are passed in the exact case in which they are defined in the DataFrame schema is a temporary workaround while this is investigated further; a sketch of this is shown below.
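A minimal sketch of that workaround, reusing the suite and DataFrame from the snippet above; the exact_cased helper is hypothetical, not part of Great Expectations.

# Hypothetical helper: resolve a column name against the DataFrame schema so
# the expectation receives the exact casing Spark reports.
def exact_cased(dataframe, name):
    matches = [c for c in dataframe.columns if c.lower() == name.lower()]
    if not matches:
        raise KeyError(f"Column {name!r} not found in DataFrame schema")
    return matches[0]  # e.g. returns "ID" when the schema defines the column as "ID"

suite.add_expectation(
    expectation=gx.expectations.ExpectColumnValuesToNotBeNull(
        column=exact_cased(dataframe, "id")
    )
)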
For those still facing this issue, there could be another cause. In my case the column names were already in the expected case; however, I was getting this error when the input Spark DataFrame was created from a pandas DataFrame. I switched to creating it directly from the dataset using the spark.read.* API, and then it worked fine. Possibly some implicit conversion of schema or data happening underneath was causing this. See if this workaround (sketched below) helps.
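A sketch of the two paths described above; the file path, format, and reader options are placeholders, not from the original report.

# What was failing: a Spark DataFrame created from a pandas DataFrame,
# where the schema is inferred via an implicit conversion.
# import pandas as pd
# pandas_df = pd.read_csv("data.csv")           # placeholder source
# dataframe = spark.createDataFrame(pandas_df)  # schema inferred from pandas

# Workaround that worked: read the dataset directly with the Spark reader,
# so Spark owns the schema from the start.
dataframe = spark.read.csv("data.csv", header=True, inferSchema=True)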
Describe the bug
I am using a Spark/pandas DataFrame. The DataFrame has multiple columns, and I am using one of them as the parameter for this expectation. If I use a column which has no null values, there is no exception and I get the expected result. But when I pass some other column (which also does not have any null values), or columns which do have nulls, I see exceptions.
To Reproduce
Traceback:
"exception_info": {
"exception_traceback": "Traceback (most recent call last):\n File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6e39b63e-ade0-4e51-94c2-99c6cf2319a5/lib/python3.9/site-packages/great_expectations/validator/validator.py", line 648, in graph_validate\n result = expectation.metrics_validate(\n File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6e39b63e-ade0-4e51-94c2-99c6cf2319a5/lib/python3.9/site-packages/great_expectations/expectations/expectation.py", line 1081, in metrics_validate\n _validate_dependencies_against_available_metrics(\n File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6e39b63e-ade0-4e51-94c2-99c6cf2319a5/lib/python3.9/site-packages/great_expectations/expectations/expectation.py", line 2773, in _validate_dependencies_against_available_metrics\n raise InvalidExpectationConfigurationError( # noqa: TRY003\ngreat_expectations.exceptions.exceptions.InvalidExpectationConfigurationError: Metric ('column_values.nonnull.unexpected_count', '657e384d8614677fff7d7be97ee019fe', ()) is not available for validation of configuration. Please check your configuration.\n",
"exception_message": "Metric ('column_values.nonnull.unexpected_count', '657e384d8614677fff7d7be97ee019fe', ()) is not available for validation of configuration. Please check your configuration.",
"raised_exception": true
Environment (please complete the following information):