Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update fetch_r_script_checksum to iterate over multiple results batches within a cursor #120

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gcnuss
Copy link

@gcnuss gcnuss commented Jun 14, 2022

In our experience using schemachange, we have run into an issue where fetch_r_script_checksum fails due to the quantity of R-scripts in our pipeline exceeding the data size that will fit within one Result Batch as returned by snowflake-connector-python. It will successfully iterate over all the rows in the first "chunk" but fail to proceed into a second chunk if it exists. The suggested code update resolves this by first returning all results batches and iterating through the rows within each batch. It does require pyarrow to be installed, which I've added to setup.cfg. The specified range of versions matches those required by snowflake-connector-python.

Note that I did discover the latest version of snowflake-connector-python (2.7.8) no longer fails with the existing fetch_r_script_checksum code as-written, so upgrading that config requirement may also work. That said I tested the suggested code update below with snowflake-connector-python 2.6.2, 2.7.4, and 2.7.8 and it works on all of them, so it would be more flexible and could be implemented without upgrading to 2.7.8 yet.

@@ -20,6 +20,7 @@ install_requires =
pandas~=1.3
Copy link
Author

@gcnuss gcnuss Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why the install_requires are set to "~=" instead of >= or with upper/lower bounds like I added for pyarrow. I couldn't find other examples using this online, what is the purpose of that approach?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcnuss The pattern is aligned to this reference.

@jonrobinsdev
Copy link

+1. Am having the same issue currently with our snowflake migrations pipelines that use schemachange. Would love if this fix could be merged, or at the very least snowflake-connector-python could be updated to 2.7.8

@sfc-gh-jhansen
Copy link
Collaborator

I'm so sorry for the delay here everyone (@gcnuss, @jonrobinsdev), I've been super busy lately and unable to give schemachange the attention it deserves. I just published a new version of schemachange (3.4.2) which updated the dependency on snowflake-connector-python to the newest version 2.8.

Please let me know if that solves the problem, and if you still think we should move forward with the batching code (and dependency on pyarrow) here.

@sfc-gh-tmathew
Copy link
Collaborator

@gcnuss @jonrobinsdev We have been reviewing open requests and trying to cathup on open items. Would appreciate your help to address this open item.

Does the issue still exist? Does the latest version render this PR unnecessary?

Kindly confirm

@sfc-gh-tmathew sfc-gh-tmathew self-assigned this Nov 2, 2023
@sfc-gh-tmathew sfc-gh-tmathew added question Further information is requested dependencies Pull requests that update a dependency file labels Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants