[Bug] --empty doesn't work with relation that are already named and might conflict #124

github-christophe-oudar · 2024-03-09T00:57:25Z

Is this a new bug?

I believe this is a new bug
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

This is a follow-up to my comment on the original PR.

If anyone uses code like FROM {{ source('schema', 'table') }} my_table_alias together with the --empty flag,
the resulting SQL is: FROM (select * from `db`.`schema`.`table` where false limit 0) _dbt_limit_subq my_table_alias

The problem is linked to the alias that's already set up here:

dbt-adapters/dbt/adapters/base/relation.py

Lines 213 to 214 in 2a99e36

    
        return f"(select * from {rendered} where false limit 0) _dbt_limit_subq"
    else:

which prevents us from supplying our own alias. Also, if there are multiple references, I guess the tables will all be aliased to _dbt_limit_subq, resulting in an error.
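To make the failure concrete, here is a minimal Python sketch of the rendering step. The `render_limited` function below is a simplified stand-in written for illustration, not the actual dbt-adapters implementation; only the f-string mirrors the line quoted above.

```python
def render_limited(rendered: str) -> str:
    """Simplified stand-in for the --empty rendering quoted above (illustrative only)."""
    return f"(select * from {rendered} where false limit 0) _dbt_limit_subq"


# What {{ source('schema', 'table') }} expands to when --empty is used.
source_sql = render_limited("`db`.`schema`.`table`")

# The model author's alias lands right after the alias that was already appended.
compiled = f"FROM {source_sql} my_table_alias"
print(compiled)
```

The printed FROM clause carries two aliases in a row, which is the conflict described above.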

Expected Behavior

The simplest expectation would be that it results in FROM (select * from `db`.`schema`.`table` where false limit 0) my_table_alias (in practice, that no alias is appended automatically). However, if the user didn't alias the table, it will fail, which is another problem. Still, we could look into alternatives:

- finding a way to detect that the table is already followed by an alias
  - it looks complex given how dbt works
- enabling setting the alias through the kwargs of the source/ref macros
  - it seems reasonable but requires changes by end users

Steps To Reproduce

1. Create a simple model my_model.sql with SQL like:
   SELECT * FROM {{ source('schema', 'table') }} my_table_alias
2. Run the dbt command: dbt run -s my_model --empty

Relevant log output

No response

Environment

- OS: MacOS 13.2.1
- Python: 3.11
- dbt-adapter: 1.8.0b1

Additional Context

No response


dwreeves · 2024-03-21T21:48:33Z

> set no alias (but then I think some DBs requires an alias and then it would fail for them with default setup)

Yes, some do (e.g. Postgres) but some don't (e.g. Snowflake).

This could be an attribute of the dialect's adapter, e.g. requires_aliases_for_subqueries: bool.

For those that do not require aliases, aliasing shouldn't be used at all. Then the problem is solved for arbitrary queries, with no other fixes required.

For those that do require aliases, that is where it gets trickier. One of the issues I see with something like select * from {{ ref("foo", alias="foo") }} or select * from {{ ref("foo", alias=True) }} (pick your poison for an API) is that it isn't any less complex in terms of the pre-compiled code than just writing select * from {{ ref("foo") }} as foo.

So, based on that, and based on the fact that most (all?) OLAP systems (i.e. what typical users are using dbt with) don't require aliasing subqueries, I actually think there is a strong argument for not aliasing these subqueries at all: OLAP people don't need aliases; Postgres people do but the code isn't made simpler by forcing it into the Jinja instead of the SQL. (In that case, you also wouldn't even need a requires_aliases_for_subqueries: bool for the adapter.)

khaledh · 2024-04-18T14:21:01Z

Just want to add +1 as we're having the same issue (we're using dbt-bigquery).

dcarvalhofernandes · 2024-04-18T15:44:49Z

+1 on this, using dbt-snowflake

colin-rogers-dbt · 2024-04-18T18:34:46Z

Working on a PR for this. The approach is going to be to make the aliasing configurable for each adapter, so we can keep it for Postgres or MySQL but drop it for platforms that don't need it (e.g. BigQuery / Snowflake). This should at least fix this case for those platforms.
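A rough sketch of what a per-adapter switch could look like follows. The class `SketchRelation` and the attribute name `requires_aliases_for_subqueries` (borrowed from the suggestion earlier in this thread) are assumptions made for illustration; this is not the code from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRelation:
    """Hypothetical relation class; not the real dbt-adapters BaseRelation."""

    rendered: str
    # Assumed per-adapter switch: True for dialects that need subquery aliases.
    requires_aliases_for_subqueries: bool = True

    def render_limited(self) -> str:
        subquery = f"(select * from {self.rendered} where false limit 0)"
        if self.requires_aliases_for_subqueries:
            # Keep the automatic alias only where the dialect demands one.
            return f"{subquery} _dbt_limit_subq"
        return subquery


postgres_like = SketchRelation('"db"."schema"."table"')
bigquery_like = SketchRelation("`db`.`schema`.`table`", requires_aliases_for_subqueries=False)

print(f"FROM {postgres_like.render_limited()}")
print(f"FROM {bigquery_like.render_limited()} my_table_alias")
```

With the switch off, the user's own alias is the only one in the FROM clause; with it on, the original conflict remains for aliased refs.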

Worth noting the workaround, in the meantime, is to move the ref into a CTE.
So instead of:
SELECT * FROM {{ source('schema', 'table') }} my_table_alias
this should work...

WITH my_table_alias as (SELECT * FROM {{ source('schema', 'table') }})

SELECT * FROM my_table_alias
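The difference between the two forms can be checked locally with a small script. This sketch uses Python's built-in sqlite3 module as a stand-in for a real warehouse (an assumption: SQLite's parser is not the same as BigQuery's or Snowflake's, and it needs a SQLite version that understands the `false` keyword), and the `limited` string imitates what the source renders to under --empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer)")
conn.execute("insert into t values (1)")

# Stand-in for what {{ source('schema', 'table') }} renders to under --empty.
limited = "(select * from t where false limit 0) _dbt_limit_subq"


def runs(sql: str) -> bool:
    """Return True if the statement parses and executes, False on a SQL error."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.OperationalError:
        return False


# Original form: the user's alias follows the automatically added one.
direct_alias_ok = runs(f"SELECT * FROM {limited} my_table_alias")

# Workaround: the ref lives inside a CTE, so the user's name is the CTE name.
cte_ok = runs(
    f"WITH my_table_alias as (SELECT * FROM {limited}) SELECT * FROM my_table_alias"
)
print(direct_alias_ok, cte_ok)
```

In this stand-in, the directly aliased form is rejected as a syntax error while the CTE form runs and returns no rows.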

dwreeves · 2024-04-18T19:49:35Z

Users may still be using aliases for tables in Postgres, e.g.

select a.*
from {{ ref("base_table") }} a
left join {{ ref("foreign_entity") }} b
on a.foreign_id = b.id

The above is valid Postgres SQL, after rendering. So it seems like adding aliases can still end up breaking workflows for some Postgres users, unless I'm missing something.

colin-rogers-dbt · 2024-04-18T20:03:14Z

You're not missing anything, and we still need to come up with a better long-term solution here (other than asking people to write their SQL differently if they want to use --empty). Essentially, I'm going to get a fast-ish fix out to reduce the scope of this issue to just platforms that require aliasing subqueries.
Currently there isn't a way for us to interpret the context of the ref to know whether we should add an alias.

dwreeves · 2024-04-18T20:31:03Z

I guess my point is that, for SQL dialects requiring aliased subqueries, the path to making arbitrary queries work with --empty is:

- If an alias is added by dbt: replace aliased refs {{ ref("table") }} as x with (select * from {{ ref("table") }}) as x
- If an alias is not added by dbt: replace un-aliased refs {{ ref("table") }} with {{ ref("table") }} as x

In both cases, SQL rewrites are required (one set of users has to make changes), and between the two I think people would rather write as x than wrap in a subquery.
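The trade-off can be laid out as a small truth table. The sketch below is a toy model, not a SQL parser: it assumes a dialect that requires subquery aliases and treats a FROM item as valid only when exactly one alias follows the subquery. All names in it are made up for illustration.

```python
from typing import Dict, Optional, Tuple

SUBQUERY = '(select * from "table" where false limit 0)'


def render_from_item(dbt_adds_alias: bool, user_alias: Optional[str]) -> str:
    """Build the FROM item a model would compile to under --empty."""
    item = f"{SUBQUERY} _dbt_limit_subq" if dbt_adds_alias else SUBQUERY
    return f"{item} as {user_alias}" if user_alias else item


def valid_when_alias_required(dbt_adds_alias: bool, user_alias: Optional[str]) -> bool:
    """Toy rule: the subquery needs exactly one alias, no more and no fewer."""
    return int(dbt_adds_alias) + int(user_alias is not None) == 1


results: Dict[Tuple[bool, Optional[str]], bool] = {}
for dbt_adds_alias in (True, False):
    for user_alias in ("x", None):
        ok = valid_when_alias_required(dbt_adds_alias, user_alias)
        results[(dbt_adds_alias, user_alias)] = ok
        print(f"dbt alias={dbt_adds_alias!s:5} user alias={user_alias!s:4} "
              f"valid={ok!s:5} from {render_from_item(dbt_adds_alias, user_alias)}")
```

Under either policy exactly one of the two user styles breaks, which is why one group of users has to rewrite their SQL in both cases.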

Furthermore, unless someone can chime in with expertise on the Postgres query planner, I'd not rule out the possibility that replacing from table with from (select * from table) could cause performance troubles for non-empty queries. In my limited testing with Postgres on some queries we use in prod, it doesn't seem like replacing tbl with (select * from tbl) impacted the explains, but I don't know enough about Postgres, and don't have enough real world test cases, to rule out that possibility. (Could anyone who knows more than me chime in and assure it never impacts what the query planner does? 😅)

dwreeves · 2024-04-18T20:44:55Z

Ah, it seems like it does not ever impact the query planner to replace from tbl with from (select * from tbl)! https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-SUBQUERIES OK, that is one of my concerns alleviated. 😄

One more note on Postgres, specifically, for what it's worth, since apparently my information is a little out of date: Postgres ≤13 requires aliases, but ≥14 does not.

colin-rogers-dbt · 2024-04-18T23:26:28Z

@dwreeves It may be true that most users have those preferences but changing this behavior in dbt-postgres would amount to a breaking change for the set of users for whom this is working as opposed to preserving the present behavior where it's always been broken for this case.

Ultimately we need to find a better long-term approach for supporting --empty and making these kinds of dynamic SQL manipulations.

colin-rogers-dbt · 2024-05-07T17:27:34Z

closing, tracking the long term fix here: #199

benesch · 2024-06-07T19:47:22Z

> One more note on Postgres, specifically, for what it's worth, since apparently my information is a little out of date: Postgres ≤13 requires aliases, but ≥14 does not.

For posterity: aliases became optional in v16, not v14! See: postgres/postgres@bcedd8f

### Motivation

As per this bug [here](dbt-labs/dbt-adapters#124), the `--empty` flag doesn't work with relation that are already named and fails. A quick repro:

* Create a simple model with: `SELECT * FROM {{ ref('my_upstream_model')}} some_alias`
* The generated SQL when using the `--empty` flag, looks like this:

```sql
as (
    SELECT * FROM ( select * from "my_upstream_model" where false limit 0) _dbt_limit_subq_my_upstream_model alias
);
```

dwreeves · 2024-06-07T20:19:09Z

> > One more note on Postgres, specifically, for what it's worth, since apparently my information is a little out of date: Postgres ≤13 requires aliases, but ≥14 does not.
>
> For posterity: aliases became optional in v16, not v14! See: postgres/postgres@bcedd8f

Thanks! Seems I have poor reading comprehension, or confused myself clicking through the docs versions 😞

mvzakharov · 2024-10-24T12:41:08Z

Hello everyone. I use the dbt-vertica adapter and want to add that it has the same problem: it is not possible to use the --empty flag.

github-christophe-oudar added the bug (Something isn't working) and triage labels Mar 9, 2024

dbeatty10 added the pre-release (Bug not yet in a stable release) label Mar 13, 2024

martynydbt added High Severity (bug with significant impact that should be resolved in a reasonable timeframe) and Core 1.8 and removed triage labels Mar 15, 2024

martynydbt assigned McKnight-42 Mar 26, 2024

McKnight-42 assigned colin-rogers-dbt and unassigned McKnight-42 Apr 17, 2024

dbeatty10 mentioned this issue Apr 18, 2024

[Bug] --empty flag generates SQL that conflicts with subquery aliases dbt-labs/dbt-core#9967

Closed

2 tasks

This was referenced Apr 18, 2024

Make aliasing subquery configurable for render_limited #179

Merged

Stop adding aliases to render_limited output dbt-labs/dbt-snowflake#980

Merged

This was referenced Apr 23, 2024

Stop adding aliases to render_limited output dbt-labs/dbt-bigquery#1190

Merged

Stop adding aliases to render_limited output dbt-labs/dbt-redshift#782

Merged

colin-rogers-dbt mentioned this issue May 7, 2024

[Feature] Support contextual table aliasing #199

Open

3 tasks

colin-rogers-dbt closed this as completed May 7, 2024

github-christophe-oudar mentioned this issue May 28, 2024

Support --empty flag for schema-only dry runs dbt-labs/dbt-core#8971

Merged

5 tasks

bobbyiliev mentioned this issue Jun 7, 2024

dbt-materialize: Stop adding aliases to subqueries with --empty MaterializeInc/materialize#27522

Merged

5 tasks

jwills mentioned this issue Jul 11, 2024

dbt run --empty throws a syntax error while the model is ran correctly without the --empty flag duckdb/dbt-duckdb#412

Closed

dbeatty10 added the empty (Issues related to the --empty CLI flag) label Sep 26, 2024

Dewwi mentioned this issue Oct 23, 2024

Stop adding aliases to render_limited output dbt-labs/dbt-athena#749

Merged

4 tasks

dbeatty10 mentioned this issue Nov 19, 2024

[Bug] Model reference and AS with empty run fails dbt-labs/dbt-postgres#175

Open

2 tasks
