I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I am using dbt-spark in my projects.
The get_relation function in the dbt adapter intermittently fails to find relations that exist in the database. This issue manifests indirectly when using dbt macros such as incremental, get_columns_in_relation, or others that rely on get_relation to check for the existence of a relation.
As I understand it, get_relation works by calling list_relations to fetch all relations in the given database and schema, then looking for a match. If the schema is already cached, list_relations returns the relations from the cache instead of querying the database. However, if a new relation (e.g., a table or view) is created after the schema cache is populated, the cache is not automatically refreshed. The relevant implementation of list_relations is:
    def list_relations(self, database: Optional[str], schema: str) -> List[BaseRelation]:
        if self._schema_is_cached(database, schema):
            return self.cache.get_relations(database, schema)
        schema_relation = self.Relation.create(
            database=database,
            schema=schema,
            identifier="",
            quote_policy=self.config.quoting,
        ).without_identifier()
        # we can't build the relations cache because we don't have a
        # manifest so we can't run any operations.
        relations = self.list_relations_without_caching(schema_relation)
        # if the cache is already populated, add this schema in
        # otherwise, skip updating the cache and just ignore
        if self.cache:
            for relation in relations:
                self.cache.add(relation)
            if not relations:
                # it's possible that there were no relations in some schemas. We want
                # to insert the schemas we query into the cache's `.schemas` attribute
                # so we can check it later
                self.cache.update_schemas([(database, schema)])
        fire_event(
            ListRelations(
                database=cast_to_str(database),
                schema=schema,
                relations=[_make_ref_key_dict(x) for x in relations],
            )
        )
        return relations
Consequently, get_relation may return None for an existing relation that is not present in the outdated cache. This behavior indirectly affects dbt macros such as incremental and get_columns_in_relation, which rely on get_relation to check for the existence of relations. As a result, these macros may intermittently fail or behave unexpectedly, depending on whether the cache is outdated at the time the macro executes.
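The suspected sequence can be illustrated with a toy model. This is only a sketch of the reasoning above, not dbt's implementation: `ToyAdapter`, `warehouse`, and the string-based relations are hypothetical, and it assumes the new relation is created through a path that never updates the cache.

```python
from typing import Dict, List, Optional, Set, Tuple

SchemaKey = Tuple[Optional[str], str]


class ToyAdapter:
    """Hypothetical stand-in for an adapter with a per-schema relations cache."""

    def __init__(self, warehouse: Dict[SchemaKey, Set[str]]) -> None:
        self.warehouse = warehouse  # what really exists in the database
        self.cache: Dict[SchemaKey, Set[str]] = {}  # one snapshot per cached schema

    def list_relations(self, database: Optional[str], schema: str) -> List[str]:
        key = (database, schema)
        if key in self.cache:
            # Schema already cached: answer from the snapshot, no query issued.
            return sorted(self.cache[key])
        # First visit: "query" the warehouse and store a copy as the snapshot.
        relations = set(self.warehouse.get(key, set()))
        self.cache[key] = relations
        return sorted(relations)

    def get_relation(
        self, database: Optional[str], schema: str, identifier: str
    ) -> Optional[str]:
        for name in self.list_relations(database, schema):
            if name == identifier:
                return name
        return None


warehouse = {(None, "analytics"): {"orders"}}
adapter = ToyAdapter(warehouse)

first = adapter.get_relation(None, "analytics", "orders")  # populates the cache
warehouse[(None, "analytics")].add("customers")  # created without touching the cache
stale = adapter.get_relation(None, "analytics", "customers")  # served from the snapshot
```

Under these assumptions `stale` is `None` even though `customers` exists in `warehouse`, which matches the symptom described here. Whether the real adapter can reach this state depends on whether every relation-creating path also updates the cache.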
If my understanding is incorrect, please clarify how caching and get_relation are supposed to work.
Expected Behavior
To reliably validate the existence of a relation, the lookup could proceed in the following stages:
1. Check the cache: If the schema is already cached, check whether the relation appears in the cached list of relations for that schema.
2. Fall back to a fresh query: If the relation is not found in the cache, query the database for the list of relations in the schema without relying on the cache. This step covers cases where the relation was created after the schema cache was populated.
3. Conclude non-existence: If the relation appears in neither the cached list nor the freshly queried list, conclude that the relation does not exist.
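The stages above could be sketched roughly as follows. This is a minimal illustration rather than a proposed patch: `get_relation_with_fallback`, `query_schema`, and the dictionary-based cache are hypothetical names that do not exist in dbt, and relations are represented as plain strings.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

SchemaKey = Tuple[Optional[str], str]


def get_relation_with_fallback(
    cache: Dict[SchemaKey, Set[str]],
    query_schema: Callable[[Optional[str], str], List[str]],
    database: Optional[str],
    schema: str,
    identifier: str,
) -> Optional[str]:
    key = (database, schema)
    # Stage 1: check the cached list for this schema, if there is one.
    if identifier in cache.get(key, set()):
        return identifier
    # Stage 2: cache miss, so query the schema directly and refresh the snapshot.
    fresh = set(query_schema(database, schema))
    cache[key] = fresh
    # Stage 3: absent from both lists means the relation does not exist.
    return identifier if identifier in fresh else None


# Hypothetical warehouse contents and a query function that records each call.
warehouse = {(None, "analytics"): {"orders", "customers"}}
calls: List[SchemaKey] = []


def query_schema(database: Optional[str], schema: str) -> List[str]:
    calls.append((database, schema))
    return sorted(warehouse.get((database, schema), set()))


cache = {(None, "analytics"): {"orders"}}  # stale snapshot: "customers" is missing

hit = get_relation_with_fallback(cache, query_schema, None, "analytics", "orders")
recovered = get_relation_with_fallback(cache, query_schema, None, "analytics", "customers")
missing = get_relation_with_fallback(cache, query_schema, None, "analytics", "payments")
```

The trade-off is that every genuine miss costs one extra query. That may matter because looking up a relation that does not exist yet is a common case, for example the first run of a model.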
Steps To Reproduce
While the issue is intermittent, it can occur under the following conditions:
Run a dbt operation that internally calls get_relation (e.g., incremental model or macros like get_columns_in_relation) for a schema.
Attached below is a screenshot of the error, which occurred even though the relation already existed. The same run succeeded on the next retry.
Relevant log output
No response
Environment
Additional Context
No response