SNOW-1374015: 🐛 Snowflake SQLAlchemy Driver fails when reflecting new VECTOR data type #499

Open
aaronsteers opened this issue May 9, 2024 · 8 comments
Labels
enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team

Comments

@aaronsteers

aaronsteers commented May 9, 2024

Symptom trying to fix?

Reflecting a SQL table raises an error if the table uses the (very new!) VECTOR data type.

What did you expect to see?

I think this driver needs to be updated to handle the VECTOR type. Internally, this is basically an array of floats, except that the number of items in the array is fixed at creation time.
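As a rough illustration of what handling the type during reflection could involve, here is a minimal sketch that recognizes a type string such as VECTOR(FLOAT, 1536) and extracts its element type and fixed dimension. The names `VectorSpec` and `parse_vector_type` are hypothetical and are not part of snowflake-sqlalchemy; the sketch also assumes the element type is reported as FLOAT or INT.

```python
import re
from typing import NamedTuple, Optional


class VectorSpec(NamedTuple):
    """Hypothetical container for a parsed VECTOR type."""
    element_type: str  # "FLOAT" or "INT" (assumed set of element types)
    dimension: int     # number of elements, fixed at creation time


# Matches e.g. "VECTOR(FLOAT, 1536)", ignoring case and extra whitespace.
_VECTOR_RE = re.compile(
    r"^\s*VECTOR\s*\(\s*(FLOAT|INT)\s*,\s*(\d+)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_vector_type(type_string: str) -> Optional[VectorSpec]:
    """Return a VectorSpec for a VECTOR type string, or None for other types."""
    match = _VECTOR_RE.match(type_string)
    if match is None:
        return None
    return VectorSpec(match.group(1).upper(), int(match.group(2)))
```

A reflection routine could call such a helper before falling back to its existing type lookup, so that unknown types degrade gracefully instead of raising.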

@aaronsteers aaronsteers added bug Something isn't working needs triage labels May 9, 2024
@github-actions github-actions bot changed the title 🐛 Snowflake SQLAlchemy Driver fails when reflecting new VECTOR data type SNOW-1374015: 🐛 Snowflake SQLAlchemy Driver fails when reflecting new VECTOR data type May 9, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this May 10, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka added enhancement The issue is a request for improvement or a new feature status-triage_done Initial triage done, will be further handled by the driver team and removed bug Something isn't working needs triage labels May 10, 2024
@sfc-gh-dszmolka
Contributor

Hello, and thank you for your interest in the public preview VECTOR data type! As documented on the feature page, it is currently

[..]only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.

We'll work on adding support in other Snowflake drivers and connectors and thank you for bearing with us while this happens.

@sfc-gh-dszmolka sfc-gh-dszmolka removed their assignment May 10, 2024
@aaronsteers
Author

@sfc-gh-dszmolka - Yes, that makes sense. Our workaround for now is to pass certain commands through the Snowflake Python client - but it would be beneficial long-term to switch back to the native SQLAlchemy integrations - and also to at least make sure SQLAlchemy does not break when attempting to scan or read from those tables.
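For anyone wanting to try the same kind of workaround, a minimal sketch follows. It is not the exact code used here: `describe_columns` is a hypothetical helper, it assumes `conn` is a DB-API connection (for example one created with the Snowflake Python connector), and it assumes that DESCRIBE TABLE returns the column name and type as the first two fields of each row.

```python
from contextlib import closing


def describe_columns(conn, table_name):
    """Return {column_name: type_string} by querying through a raw connection.

    Assumption: table_name is a trusted identifier, since it is interpolated
    directly into the SQL statement.
    """
    with closing(conn.cursor()) as cur:
        cur.execute(f"DESCRIBE TABLE {table_name}")
        # Assumed row layout: (name, type, ...remaining fields ignored)
        return {row[0]: row[1] for row in cur.fetchall()}
```

When the rest of the application already holds a SQLAlchemy engine, `engine.raw_connection()` may be a way to obtain such a connection without opening a second one, though that has not been verified against VECTOR columns here.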

Happy to use this issue as a tracking item for that future work. Thanks for your support.

@japborst

Hey @sfc-gh-dszmolka. As this feature is now out of public preview, do you know the status of this issue? Thanks!

@sfc-gh-dszmolka
Contributor

Hi @japborst, unfortunately I don't have any additional info on the implementation timeline at the moment, but I'm trying to get it from the team and will update this issue when/if I have any news. Thank you all for bearing with us!

@japborst

japborst commented Sep 3, 2024

Hey @sfc-gh-dszmolka!

On the website I read

The VECTOR data type is only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.

Do I read correctly that whilst there is Python support (the Python snowflake connector), it's primarily SQLAlchemy support that is missing?

@tazhigaliyev

For those who need the VECTOR data type in SQLAlchemy, this temporary workaround (there's nothing more permanent than temporary) might help:

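from sqlalchemy import Column
from sqlalchemy.types import UserDefinedType

class SFVector(UserDefinedType):
    """Snowflake VECTOR(<element type>, <length>) column type."""

    # The constructor arguments fully describe the type, so caching is safe.
    cache_ok = True

    def __init__(self, data_type, length):
        self.data_type = data_type  # element type, e.g. 'FLOAT'
        self.length = length        # fixed number of elements

    def get_col_spec(self, **kw):
        # Type text emitted in DDL such as CREATE TABLE
        return f"VECTOR({self.data_type}, {self.length})"

embedding = Column(SFVector('FLOAT', 1536), nullable=True)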
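Note that SFVector only controls the type text emitted in DDL. When values are written through raw SQL, they may still need an explicit cast; as far as I can tell from the Snowflake documentation, the syntax is an array literal cast to the vector type, e.g. [1.1, 2.2, 3]::VECTOR(FLOAT, 3). The sketch below builds such a literal. `vector_literal` is a hypothetical helper, not part of any Snowflake library.

```python
import math


def vector_literal(values, data_type="FLOAT"):
    """Render numbers as an array literal cast to VECTOR(<data_type>, <n>)."""
    items = list(values)
    if not items:
        raise ValueError("a VECTOR needs at least one element")
    if data_type.upper() == "INT":
        rendered = [str(int(v)) for v in items]
    else:
        # NaN and infinity have no plain numeric literal form, so reject them.
        if not all(math.isfinite(v) for v in items):
            raise ValueError("NaN and infinity cannot be written as literals")
        rendered = [repr(float(v)) for v in items]
    return f"[{', '.join(rendered)}]::VECTOR({data_type}, {len(items)})"
```

For example, `vector_literal([0.1, 0.2, 0.3])` produces `[0.1, 0.2, 0.3]::VECTOR(FLOAT, 3)`, which can be placed in the SELECT list of an INSERT ... SELECT statement.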

@MattLJoslin

Is there any ETA on this? It makes it quite hard to connect SQLAlchemy-based solutions such as Superset.

@aaronsteers
Author

FWIW: I've used a workaround similar to the SFVector type that @tazhigaliyev posted above, although it won't help when SQLAlchemy is wrapped by another tool (like Superset in the comment above). Would be great to see native support added.

Development

No branches or pull requests

5 participants