-
@ZeMirella Thank you for filing this. We have not seen this error on Spark. I will keep this issue open, label it "help wanted", and invite the community to chime in with more details and context.
-
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
-
@eugmandel @ZeMirella I see from …
-
I'm in the same boat; I get the same error. Snippet for reference:
I'm also attempting this on Spark (using Synapse). Could we get the issue reopened? Also, is there any configuration that can be tweaked at the engine-definition level?
-
For anyone else who might experience this issue: after having a look at the code and playing around with the parameters, the following workaround worked for me.
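A minimal sketch of what such a workaround can look like, assuming GE's force_reuse_spark_context option on SparkDFExecutionEngine (the datasource and connector names are illustrative placeholders, not the original snippet):

# Hypothetical sketch: ask GE to reuse the SparkSession that Synapse/YARN
# already started instead of constructing a new JavaSparkContext, which is
# what raises "Promise already completed." on a YARN application master
# (see the traceback at the bottom of this thread).
# `context` is assumed to be an existing DataContext / BaseDataContext.
datasource_config = {
    "name": "my_spark_datasource",          # illustrative name
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SparkDFExecutionEngine",
        "force_reuse_spark_context": True,  # the key line of the workaround
    },
    "data_connectors": {
        "runtime_connector": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["run_id"],
        }
    },
}
context.add_datasource(**datasource_config)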
I think there is likely a bug somewhere in this version (0.14.6) in how the existing Spark context is retrieved.
-
@madmaxlax @alexszym What version of Great Expectations are you using? The original issue was for 0.12.1 (very old, and using the legacy batch_kwargs API). Can you share any steps to reproduce? Thanks!
-
I didn't specify a version, so it should have been grabbing whatever is the latest for GE. I was running it in a PySpark notebook in Azure Synapse, trying to follow the setup instructions in https://docs.greatexpectations.io/docs/deployment_patterns/how_to_instantiate_a_data_context_hosted_environments
-
@kenwade4 0.14.6 in my case |
-
We also encountered the same exception when trying to add a new runtime data source.
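For anyone comparing notes, validating an in-memory Spark DataFrame through a runtime data source of that shape usually looks roughly like the sketch below (names follow the hypothetical config in the workaround above; spark_df and my_suite are placeholders):

from great_expectations.core.batch import RuntimeBatchRequest

# Hypothetical usage: hand the in-memory DataFrame to GE at runtime rather
# than letting the datasource stand up its own Spark context.
batch_request = RuntimeBatchRequest(
    datasource_name="my_spark_datasource",
    data_connector_name="runtime_connector",
    data_asset_name="fact_order",                 # placeholder asset name
    runtime_parameters={"batch_data": spark_df},  # the in-memory DataFrame
    batch_identifiers={"run_id": "manual_run"},
)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_suite",
)
results = validator.validate()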
-
@alexszym @madmaxlax Thank you for posting this solution and confirmation. I tested out 0.14.13 and it works in Synapse. Have you tried upgrading to anything newer than 0.14.13? I tried version 0.15.9, but I am getting the same Promise error.
-
@wilson-mok I'm having the same problem on 0.15.9, also in Synapse.
-
@wilson-mok @frammnm The same workaround is still working for me on 0.15.11.
-
Thank you for the update; I can confirm that 0.15.11 works!
-
Describe the bug
The job raised intermittent errors during data processing, and I couldn't identify what was causing the exception below.
To Reproduce
Steps to reproduce the behavior:
We created a class where we pass all the necessary configuration to DataContextConfig and then hand that configuration to BaseDataContext:
import datetime

project_config = DataContextConfig(...)  # all of our configuration goes here
context = BaseDataContext(project_config=project_config)
expectation_suite = context.get_expectation_suite(expectation_suite_name)
batch_kwargs = {
"datasource": "my_spark_datasource",
"dataset": spark_df,
"data_asset_name": data_asset_name,
}
batch = context.get_batch(batch_kwargs, expectation_suite)
run_id = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S.%fZ")
validation_results = context.run_validation_operator(
validation_operator_name="action_list_operator",
assets_to_validate=[batch],
run_id=run_id,
)
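For concreteness, a minimal sketch of what the DataContextConfig(...) placeholder above can expand to (an in-memory context with an illustrative v2 Spark datasource; every name below is a placeholder, not the original configuration):

from great_expectations.data_context import BaseDataContext
from great_expectations.data_context.types.base import (
    DataContextConfig,
    InMemoryStoreBackendDefaults,
)

project_config = DataContextConfig(
    datasources={
        "my_spark_datasource": {  # must match batch_kwargs["datasource"]
            "class_name": "SparkDFDatasource",
            "module_name": "great_expectations.datasource",
        }
    },
    validation_operators={
        "action_list_operator": {  # referenced by run_validation_operator above
            "class_name": "ActionListValidationOperator",
            "action_list": [],
        }
    },
    store_backend_defaults=InMemoryStoreBackendDefaults(),  # keeps the sketch self-contained
)
context = BaseDataContext(project_config=project_config)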
Expected behavior
Pass the settings to the class and be able to use the library by passing in the tables to be validated.
Environment (please complete the following information):
Additional context
Traceback (most recent call last):
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2442, in _call_proxy
return_value = getattr(self.pool[obj_id], method)(*params)
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/pyspark.zip/pyspark/sql/utils.py", line 207, in call
raise e
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/pyspark.zip/pyspark/sql/utils.py", line 204, in call
self.func(DataFrame(jdf, self.sql_ctx), batch_id)
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/modules.zip/infrastructure/write/write_parquet.py", line 16, in
df, epochId, to_clean_dataframes, spark_context
File "streaming_fact_order.py", line 110, in foreach_batch_function
data_asset_name="fact_order",
File "/mnt/yarn/usercache/hadoop/appcache/application_1613275903560_8414/container_1613275903560_8414_01_000001/modules.zip/data_quality/ge_validation.py", line 125, in validate
batch = context.get_batch(batch_kwargs, expectation_suite)
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1425, in get_batch
batch_parameters=batch_parameters,
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1136, in _get_batch_v2
datasource = self.get_datasource(batch_kwargs.get("datasource"))
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1833, in get_datasource
name=datasource_name, config=config
File "/usr/local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1727, in _instantiate_datasource_from_config
datasource_name=name, message=str(e)
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_spark_datasource, error: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Promise already completed.
at scala.concurrent.Promise.complete(Promise.scala:53)
at scala.concurrent.Promise.complete$(Promise.scala:52)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
at scala.concurrent.Promise.success(Promise.scala:86)
at scala.concurrent.Promise.success$(Promise.scala:86)
at scala.concurrent.impl.Promise$DefaultPromise.success(Promise.scala:187)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$sparkContextInitialized(ApplicationMaster.scala:404)
at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:895)
at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:613)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)