-
@ZeMirella Thank you for filing this. We have not seen this error on Spark. I will keep this issue open, label it "help wanted", and invite the community to chime in with more details and context.
-
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
-
@eugmandel @ZeMirella I see from …
-
I'm in the same boat; I get the same error. Snippet for reference:
I'm also attempting this on Spark (using Synapse). Could we get the issue reopened? Also, is there any configuration that can be tweaked at the engine-definition level?
-
For anyone else who might experience this issue: after having a look at the code and playing around with the parameters, the following workaround worked for me.
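A minimal sketch of what such a workaround can look like, assuming GE's force_reuse_spark_context option on SparkDFExecutionEngine (the datasource and connector names are illustrative placeholders, not the original snippet):

# Hypothetical sketch: ask GE to reuse the SparkSession that Synapse/YARN
# already started instead of constructing a new JavaSparkContext, which is
# what raises "Promise already completed." on a YARN application master
# (see the traceback at the bottom of this thread).
# `context` is assumed to be an existing DataContext / BaseDataContext.
datasource_config = {
    "name": "my_spark_datasource",          # illustrative name
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SparkDFExecutionEngine",
        "force_reuse_spark_context": True,  # the key line of the workaround
    },
    "data_connectors": {
        "runtime_connector": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["run_id"],
        }
    },
}
context.add_datasource(**datasource_config)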
I think there is likely a bug somewhere in this version (0.14.6) in how the existing Spark context is retrieved.
-
@madmaxlax @alexszym What version of Great Expectations are you using? The original issue was for 0.12.1 (very old, and using the legacy batch_kwargs API). Can you share any steps to reproduce? Thanks!
-
I didn't specify a version, so it should have been grabbing whatever is the latest for GE. I was running it in a PySpark notebook in Azure Synapse, trying to follow the setup instructions in https://docs.greatexpectations.io/docs/deployment_patterns/how_to_instantiate_a_data_context_hosted_environments
-
@kenwade4 0.14.6 in my case |
-
We also encountered the same exception when trying to add a new runtime data source.
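For anyone comparing notes, validating an in-memory Spark DataFrame through a runtime data source of that shape usually looks roughly like the sketch below (names follow the hypothetical config in the workaround above; spark_df and my_suite are placeholders):

from great_expectations.core.batch import RuntimeBatchRequest

# Hypothetical usage: hand the in-memory DataFrame to GE at runtime rather
# than letting the datasource stand up its own Spark context.
batch_request = RuntimeBatchRequest(
    datasource_name="my_spark_datasource",
    data_connector_name="runtime_connector",
    data_asset_name="fact_order",                 # placeholder asset name
    runtime_parameters={"batch_data": spark_df},  # the in-memory DataFrame
    batch_identifiers={"run_id": "manual_run"},
)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_suite",
)
results = validator.validate()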
-
@alexszym @madmaxlax Thank you for posting this solution and confirmation. I tested out 0.14.13 and it works in Synapse. Have you tried upgrading to anything newer than 0.14.13? I tried version 0.15.9, but I am getting the same Promise error.
-
@wilson-mok I'm having the same problem on 0.15.9, also in Synapse.
-
@wilson-mok @frammnm The same workaround is still working for me on 0.15.11.
-
Thank you for the update; I can confirm that 0.15.11 works!
-
Describe the bug
The job raised intermittent errors during data processing, and I couldn't identify what was causing the exception below.
To Reproduce
Steps to reproduce the behavior:
We created a class where we pass all the necessary configuration to DataContextConfig and then hand that configuration to BaseDataContext:
import datetime

project_config = DataContextConfig(...)  # all of our configuration goes here
context = BaseDataContext(project_config=project_config)
expectation_suite = context.get_expectation_suite(expectation_suite_name)
batch_kwargs = {
"datasource": "my_spark_datasource",
"dataset": spark_df,
"data_asset_name": data_asset_name,
}
batch = context.get_batch(batch_kwargs, expectation_suite)
run_id = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S.%fZ")
validation_results = context.run_validation_operator(
validation_operator_name="action_list_operator",
assets_to_validate=[batch],
run_id=run_id,
)
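For concreteness, a minimal sketch of what the DataContextConfig(...) placeholder above can expand to (an in-memory context with an illustrative v2 Spark datasource; every name below is a placeholder, not the original configuration):

from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    InMemoryStoreBackendDefaults,
)

project_config = DataContextConfig(
    datasources={
        "my_spark_datasource": {  # must match batch_kwargs["datasource"]
            "class_name": "SparkDFDatasource",
            "module_name": "great_expectations.datasource",
        }
    },
    validation_operators={
        "action_list_operator": {  # referenced by run_validation_operator above
            "class_name": "ActionListValidationOperator",
            "action_list": [],
        }
    },
    store_backend_defaults=InMemoryStoreBackendDefaults(),  # keeps the sketch self-contained
)
context = BaseDataContext(project_config=project_config)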
Expected behavior
Pass the settings to the class and be able to use the library by passing in the tables to be validated.
Environment (please complete the following information):
Additional context
Traceback (most recent call last):
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2442, in _call_proxy
return_value = getattr(self.pool[obj_id], method)(*params)
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/pyspark.zip/pyspark/sql/utils.py", line 207, in call
raise e
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/pyspark.zip/pyspark/sql/utils.py", line 204, in call
self.func(DataFrame(jdf, self.sql_ctx), batch_id)
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/modules.zip/infrastructure/write/write_parquet.py", line 16, in
df, epochId, to_clean_dataframes, spark_context
File "streaming_fact_order.py", line 110, in foreach_batch_function
data_asset_name="fact_order",
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/modules.zip/data_quality/ge_validation.py", line 125, in validate
batch = context.get_batch(batch_kwargs, expectation_suite)
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1425, in get_batch
batch_parameters=batch_parameters,
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1136, in _get_batch_v2
datasource = self.get_datasource(batch_kwargs.get("datasource"))
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1833, in get_datasource
name=datasource_name, config=config
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1727, in _instantiate_datasource_from_config
datasource_name=name, message=str(e)
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_spark_datasource, error: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Promise already completed.
at scala.concurrent.Promise.complete(Promise.scala:53)
at scala.concurrent.Promise.complete$(Promise.scala:52)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
at scala.concurrent.Promise.success(Promise.scala:86)
at scala.concurrent.Promise.success$(Promise.scala:86)
at scala.concurrent.impl.Promise$DefaultPromise.success(Promise.scala:187)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$sparkContextInitialized(ApplicationMaster.scala:404)
at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:895)
at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)