Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Incorrect schema name handling in selectable for SQLAlchemy (Bigquery) #10333

Open
TaerimL opened this issue Sep 4, 2024 · 5 comments
Open
Labels
bug Bugs bugs bugs! stack:big_query

Comments

@TaerimL
Copy link

TaerimL commented Sep 4, 2024

Describe the bug
I am encountering an issue when running validation with SQLAlchemy (BigQuery). Specifically, when generating a selectable with a query, the returned table object for temporary table has the schema referenced twice: once within the table name and another time within the schema attribute.

Consequently, the table reference, built using selectatble, is incorrectly formatted as schema_name.schema_name.table_name. instead of schema_name.table_name. This is problematic in BigQuery because the table name is already qualified with its schema name. This incorrectly formatted Table object is referenced during validations and fails.

It seems that the issue was handled for the function _generate_selectable_from_schema_name_and_table_name but not on generating it with a query.

To Reproduce
This behavior can be observed with the following arguments to the function _generate_selectable_from_query when the dialect = GXSqlDialect.BIGQUERY.

"create_temp_table" = True
"temp_table_schema_name" = "sample_schema_name"
"query" = "sample_query"

Expected behavior
In the function "_generate_selectable_from_query", the selectable should be in the format schema_name.table_name when dialect = GXSqlDialect.BIGQUERY.

I would like to suggest a possible solution below:

def _generate_selectable_from_query(
        self,
        query: str,
        dialect: GXSqlDialect,
        create_temp_table: bool,
        temp_table_schema_name: Optional[str] = None,
    ) -> sqlalchemy.Table | sqlalchemy.TextClause:
        """Helper method to generate Selectable from query string.

        Args:
            query (str): query passed in as RuntimeBatchRequest.
            dialect (GXSqlDialect): Needed for _create_temporary_table, since different backends name temp_tables differently.
            create_temp_table (bool): Should we create a temp_table? If not a `TextClause` will be returned instead of a Table.
            temp_table_schema_name (Optional[str], optional): Optional string for temp_table schema.  Defaults to None.

        Returns:
            sqlalchemy.Table: SqlAlchemy Table that is Selectable or a TextClause.
        """

        if not create_temp_table:
            return sa.text(query)
        _, temp_table_name = self._create_temporary_table(
            dialect=dialect,
            query=query,
            temp_table_schema_name=temp_table_schema_name,
        )

        if dialect == GXSqlDialect.BIGQUERY:
            if temp_table_schema_name is not None:
                logger.warning(
                    "schema_name should not be used when passing a table_name for biquery. Instead, include the schema name in the table_name string."
                )
            # In BigQuery the table name is already qualified with its schema name
            return sa.Table(
                temp_table_name,
                sa.MetaData(),
                schema=None,
            )

        return sa.Table(
            temp_table_name,
            sa.MetaData(),
            schema=temp_table_schema_name,
        )

Environment (please complete the following information):

  • Operating System: MacOS
  • Great Expectations Version: 0.18.19
  • Data Source: Sqlalchemy(BigQuery)
  • Cloud environment: GCP
@breno-costa
Copy link

Apart from the suggestion above, would you have other suggestion to resolve this issue?

@adeola-ak adeola-ak added the bug Bugs bugs bugs! label Sep 9, 2024
@adeola-ak
Copy link
Contributor

Hi @TaerimL! Thank you for bringing this to our attention - i'll be sharing this with the team and will follow up on progress. In the meantime, since you've offered a potential solution, I'd like to mention that we always welcome code contributions. if interested, you can find more details about contributing here

@adeola-ak adeola-ak moved this from To Do to Confirmed in GX Core Issues Board Sep 10, 2024
@tyler-hoffman
Copy link
Contributor

Hi @TaerimL , do you have a script handy that will reproduce the issue? The fix you've suggested sounds reasonable, but I'd like to reproduce it using public methods from an end-user flow.

Also, it looks like you are on gx version 0.18. We're focusing our efforts on version 1.0 now, so our team will likely just fix it in 1.0.

Thanks!

@TaerimL
Copy link
Author

TaerimL commented Oct 7, 2024

Hello @tyler-hoffman, you can regenerate by setting the following arguments: runtime_parameters={ "query": sql_query, "temp_table_schema_name": temp_table_dataset } in RuntimeBatchRequest, and create_temp_table=True and temp_table_dataset = temp_table_dataset in BigQueryQualityChecker, thank you :)

@adeola-ak
Copy link
Contributor

Hi @TaerimL, have you had a chance to upgrade to v1.x yet? I need to confirm whether this is an issue in our latest release before we can proceed with any further actions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs bugs bugs! stack:big_query
Projects
Status: In progress
Development

No branches or pull requests

5 participants