Runtime Data Sources #10289
Replies: 9 comments
-
@ramananayak We could alter the return type annotation to be immutable. The idiomatic way to add or update a datasource is by using one of the context.sources methods, such as add_or_update_postgres:

import great_expectations as gx

context = gx.data_context.FileDataContext(context_root_dir="my_context_dir")
connection_string = "postgresql+psycopg2://<user_name>:<password>@<host>:<port>/<database>"
runtime_datasource = context.sources.add_or_update_postgres(
    name="ds_runtime",
    connection_string=connection_string,
    create_temp_table=True,
)
print(repr(runtime_datasource))
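One way to read the behavior reported in this thread, sketched with a toy class rather than GX's actual implementation: if a property hands back a dictionary that is built on each access, writing into that dictionary never reaches the underlying store, and a read-only view would make the mistake fail loudly instead. `ToyContext`, its properties, and `add_or_update` are hypothetical names used only for illustration; `types.MappingProxyType` is the standard-library read-only mapping view.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class ToyContext:
    """Hypothetical stand-in for a data context; NOT the GX implementation."""

    def __init__(self) -> None:
        self._datasources: Dict[str, str] = {}

    @property
    def datasources_copy(self) -> Dict[str, str]:
        # A fresh dict on every access: mutating it is silently lost.
        return dict(self._datasources)

    @property
    def datasources_view(self) -> Mapping[str, str]:
        # A read-only view: item assignment raises TypeError.
        return MappingProxyType(self._datasources)

    def add_or_update(self, name: str, connection_string: str) -> None:
        # The one supported way to change the store in this toy model.
        self._datasources[name] = connection_string


ctx = ToyContext()

# Writing into the returned copy does not reach the underlying store.
ctx.datasources_copy["ds_runtime"] = "postgresql+psycopg2://..."
lost_write_visible = "ds_runtime" in ctx.datasources_copy

# Writing into the read-only view is rejected outright.
try:
    ctx.datasources_view["ds_runtime"] = "postgresql+psycopg2://..."
    view_rejected_write = False
except TypeError:
    view_rejected_write = True

# Going through the method is what actually updates the store.
ctx.add_or_update("ds_runtime", "postgresql+psycopg2://...")
stored_after_method = "ds_runtime" in ctx.datasources_view
```

Whether GX returns a copy or a view here is not stated in the thread; the sketch only shows why going through an add-or-update method is the safer pattern either way.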
-
Thanks for the clarification, @Kilo59. I did some investigation into how FileDataContext() opens the context file. Is there any way to add configuration for true runtime use without changing the context file every time? The same applies to data assets: I don't see any example showing how to create a runtime data asset. I am currently testing with Fluent datasources, and every method just keeps adding data assets to the context file, so the config file keeps growing.

validations:
  - batch_request:
      data_asset_name: runtime_asset
      runtime_parameters:
        query: "select column 1 from table"
    expectation_suite_name: appstat_suite

I don't really know how I can achieve this in the latest version.
-
@ramananayak

import great_expectations as gx

context = gx.get_context(mode="ephemeral")
connection_string = "postgresql+psycopg2://<user_name>:<password>@<host>:<port>/<database>"
runtime_datasource = context.sources.add_or_update_postgres(
    name="ds_runtime",
    connection_string=connection_string,
    create_temp_table=True,
)
print(repr(runtime_datasource))

The code above ☝️ should work, but you won't have access to your file-backed checkpoints, expectations, etc. I will pass this along to our team working on the
-
There's a somewhat related issue where a user is creating an ephemeral context from a file context but is unable to load the fluent configs. Here is an updated example that should allow your ephemeral context to pull in the project config from your file context.

import great_expectations as gx

# Create two different contexts using THE SAME config
file_ctx = gx.get_context(mode="file")
ephm_ctx = gx.get_context(mode="ephemeral", project_config=file_ctx.config)

connection_string = "postgresql+psycopg2://<user_name>:<password>@<host>:<port>/<database>"
runtime_datasource = ephm_ctx.sources.add_or_update_postgres(
    name="ds_runtime",
    connection_string=connection_string,
    create_temp_table=True,
)
print(repr(runtime_datasource))
-
Hi @Kilo59, here is my version:

import yaml
import great_expectations as gx
from great_expectations.data_context import EphemeralDataContext
from great_expectations.data_context.types.base import DataContextConfig

context_root_dir = "path to my initial great_expectations.yml file"
# Load the existing project config and build an ephemeral context from it
with open(context_root_dir + "/great_expectations.yml", "r") as file:
    conf = yaml.safe_load(file)
context_config = DataContextConfig(**conf)
ephm_ctx = EphemeralDataContext(project_config=context_config)

connection_string = "postgresql+psycopg2://<user_name>:<password>@<host>:<port>/<database>"
runtime_datasource = ephm_ctx.sources.add_postgres(
    name="ds_runtime",
    connection_string=connection_string,
    create_temp_table=True,
)
print(repr(runtime_datasource))

This works, although I have to give the complete path for each of the GX directories (such as the plugin directory), which is understandable. But as I mentioned above, 0.17.11 supported RuntimeBatchRequest, where I could define the datasource, data asset, and runtime query as part of the checkpoint. I can see it is also available in the 0.18.9 documentation.

validations:
  - batch_request:
      data_asset_name: runtime_asset
      runtime_parameters:
        query: "select column 1 from table"
    expectation_suite_name: appstat_suite

Is this supported in the latest GX, or do I have to create the data asset for the input query separately, outside of the checkpoint, and then call the checkpoint as part of validation? This is a really helpful feature for us: we keep all the queries as part of the checkpoint, so they stay separate and it is easy to identify data assets and expectations together. Thanks!
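For readers weighing the second option in the question above, here is a minimal sketch of reading the validations entries as plain data, so that each query can be turned into a data asset before the checkpoint runs. It assumes the YAML has already been loaded into a Python dict (for example with `yaml.safe_load`); `RuntimeQuery` and `extract_runtime_queries` are hypothetical helper names and are not part of GX.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class RuntimeQuery:
    """One (asset, query, suite) triple taken from a validations entry."""

    asset_name: str
    query: str
    suite_name: str


def extract_runtime_queries(checkpoint_cfg: Dict[str, Any]) -> List[RuntimeQuery]:
    """Collect runtime queries from a checkpoint-style config dict."""
    found = []
    for validation in checkpoint_cfg.get("validations", []):
        batch_request = validation.get("batch_request", {})
        query = batch_request.get("runtime_parameters", {}).get("query")
        if query is None:
            continue  # not a runtime-query validation; skip it
        found.append(
            RuntimeQuery(
                asset_name=batch_request["data_asset_name"],
                query=query,
                suite_name=validation["expectation_suite_name"],
            )
        )
    return found


# Same shape as the YAML above, written as the dict a YAML loader would return.
checkpoint_cfg = {
    "validations": [
        {
            "batch_request": {
                "data_asset_name": "runtime_asset",
                "runtime_parameters": {"query": "select column 1 from table"},
            },
            "expectation_suite_name": "appstat_suite",
        }
    ]
}

runtime_queries = extract_runtime_queries(checkpoint_cfg)
```

Keeping the queries in a file of this shape would let analysts continue to edit one config per check, while the code that talks to GX stays separate.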
-
Hi @Kilo59, if you have any ideas on this, some pointers would really help. Thanks!
-
@ramananayak any workflow from

I think the issue is that the new "Fluent Style" Datasources (which are datasources created using the

The documentation for the old "Block Style" datasources is no longer part of our latest version. You'll have to refer to the 0.15 docs.

You can continue to use the old "Block Style" Datasources, or you can create a query asset on a Fluent datasource:

runtime_datasource = ephm_ctx.sources.add_postgres(
    name="ds_runtime",
    connection_string=connection_string,
    create_temp_table=True,
)
my_query_asset = runtime_datasource.add_query_asset(name="my_query", query="select column 1 from table")
batch_request = my_query_asset.build_batch_request()
# pass batch_request to your checkpoint

Does the
-
Hi @Kilo59, thanks for your response. Now I understand that

But I think runtime query config is a nice feature to have, because we have a lot of config that a user (say, an analyst) will set up, and all we do is wrap that config in an Airflow scheduler which runs these checks. As far as I know, a lot of people use this method to add multiple checks at once. Also, moving everything to a config file (in the case of a file data context) makes the config file very bulky, with a lot of unnecessary configs added to the context.
-
@ramananayak -- in your case, are you expecting to be able to use the validation results that come from these runtime assets at any time other than the immediate validation? We designed runtime assets to mean that the data would be available/provided at runtime, but the asset configuration itself was durable. The intent of that approach was to ensure that saved validation results could be identified by the asset's (durable) name. It sounds to me like that may be the gap, in that you're not looking to have the configuration of the asset persist at all. I'd love to jump on a call with you and @Kilo59 if you'd like to make sure I understand the case fully, since we've recently been looking at the question of how to support runtime cases more clearly.
-
Describe the bug
I want to add a fluent_datasource at runtime, after a FileDataContext is already defined.
context.fluent_datasources is a dictionary. When I add a new fluent_datasource to it, the new entry does not appear in the existing dictionary,
whereas the same approach works for datasources.
To Reproduce
Expected behavior
context.fluent_datasources should show the added runtime_datasource inside the dictionary.
Environment (please complete the following information):