read_deltalake does not support Unity catalog tables on Azure storage #3024
Comments
Can confirm, `io_config` isn't populated at all; it has a value of `None` when reading from Azure Databricks using `unity.load_table()`. The current workaround is to use your own credential to access storage directly, which bypasses Unity's permissions model (and direct storage access is something most of my clients have essentially locked down or disabled).
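For anyone hitting the same wall, here is a minimal sketch of that kind of workaround, building an Azure `io_config` by hand and reading the table by its URI instead of relying on the credentials Unity vends (the storage account, key, and path below are placeholders):

```python
import daft
from daft.io import IOConfig, AzureConfig

# Hand-built credentials for the storage account backing the Unity table.
# This bypasses Unity's permission model entirely, which is the downside noted above.
io_config = IOConfig(
    azure=AzureConfig(
        storage_account="mystorageaccount",          # placeholder
        access_key="<storage-account-access-key>",   # placeholder
    )
)

# Read the Delta table directly by its abfss:// URI.
df = daft.read_deltalake(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/path/to/table",
    io_config=io_config,
)
```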
Thanks @jordandakota for confirming this too.
Thanks @anilmenon14 for the really well fleshed out issue! Great to have someone with more Databricks knowledge than us working on this. Here is the logic where we create our `io_config`; as you can see, it currently only populates AWS credentials (and also does not handle regions, because we could not figure out how Unity would be vending region information, given the early nature of the project).
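To make the gap concrete, here is a hedged sketch of what that creation logic might grow into, branching on whichever credential type Unity vends; the Azure field names (`azure_user_delegation_sas`, `sas_token`) are assumptions based on the Unity Catalog temporary-credentials API, not the actual Daft code:

```python
from daft.io import IOConfig, S3Config, AzureConfig

def io_config_from_unity_credentials(temp_creds) -> IOConfig:
    """Map Unity-vended temporary table credentials onto a Daft IOConfig (illustrative)."""
    # Only this AWS branch exists in Daft today.
    if getattr(temp_creds, "aws_temp_credentials", None) is not None:
        aws = temp_creds.aws_temp_credentials
        return IOConfig(
            s3=S3Config(
                key_id=aws.access_key_id,
                access_key=aws.secret_access_key,
                session_token=aws.session_token,
                # Region is not vended by Unity today, hence the us-east-1 default problem.
            )
        )
    # The missing piece: Unity vends an ADLS user-delegation SAS for Azure-backed tables.
    if getattr(temp_creds, "azure_user_delegation_sas", None) is not None:
        return IOConfig(
            azure=AzureConfig(sas_token=temp_creds.azure_user_delegation_sas.sas_token)
        )
    return IOConfig()
```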
Thank you, @jaychia. Apart from that, this block also needs to be fixed to handle Azure storage being passed down in the `io_config`. As for the AWS credential vending region, there is an interesting observation I noticed when I stepped into the code. I'd certainly love to look further into that and try to contribute on that issue as well :)
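Sketching the shape of that fix as well, assuming the `IOConfig` exposes its `s3` and `azure` sub-configs as attributes; the storage-option key names here are illustrative, so the exact delta-rs option names should be checked before relying on them:

```python
from daft.io import IOConfig

def storage_options_from_io_config(io_config: IOConfig) -> dict:
    """Translate a Daft IOConfig into delta-rs style storage options (illustrative only)."""
    if io_config.s3 is not None and io_config.s3.key_id is not None:
        return {
            "aws_access_key_id": io_config.s3.key_id,
            "aws_secret_access_key": io_config.s3.access_key,
            "aws_session_token": io_config.s3.session_token or "",
            "aws_region": io_config.s3.region_name or "us-east-1",  # the default discussed above
        }
    # This Azure branch is what the block above currently lacks.
    if io_config.azure is not None and io_config.azure.storage_account is not None:
        return {
            "azure_storage_account_name": io_config.azure.storage_account,
            "azure_storage_sas_token": io_config.azure.sas_token or "",
        }
    return {}
```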
I believe this is now resolved with #3025, and will be available in Daft v0.3.9.
Cross-posting information, originally mentioned in issue #2903, regarding testing on v0.3.9. The issue is fixed, but it needs a workaround, which I hope we can avoid in the future. I will look into this, since it was a bit of an oversight on my part not to test this earlier.
What works for Azure now on Daft v0.3.9:
What does not work for Azure:
Error:
Describe the bug
When Databricks Unity catalog tables are on Azure storage, the `daft.read_deltalake` method does not support reading from that storage. It appears to work only for AWS storage, and even then only with an `io_config` passed down if the AWS region of the storage is not `us-east-1` (see the issue here, where the commands to read data from AWS are shown as an example).

On Azure, the `io_config` is not populated upstream by the `unity.load_table` method, which is the primary cause of this issue. As a result, the `DeltaLakeScanOperator` instantiation within `daft.read_deltalake` is unable to handle non-S3 `io_config` objects.

A separate question, not part of this issue, is whether, for AWS storage, the region needs to be passed down at all when instantiating an object of the `DeltaLake` class, since this can possibly be done without specifying a region. Happy to help explore this further, since it would mean we no longer have to deal with the `us-east-1` default.

To Reproduce
This block of code is not actually being used and is mentioned here only in case a region is required anywhere.
Attempt 1: Passing an instance of `unity_catalog.UnityCatalogTable` to `daft.read_deltalake`
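A minimal sketch of what this attempt looks like (the endpoint, token, and three-part table name are placeholders):

```python
import daft
from daft.unity_catalog import UnityCatalog

# Connect to the Databricks-hosted Unity catalog (placeholder endpoint and PAT).
unity = UnityCatalog(
    endpoint="https://<workspace>.azuredatabricks.net",
    token="<databricks-personal-access-token>",
)

# load_table returns a UnityCatalogTable; for Azure-backed tables its io_config is not populated.
unity_table = unity.load_table("my_catalog.my_schema.my_table")

# Passing the UnityCatalogTable straight to read_deltalake fails for Azure storage.
df = daft.read_deltalake(unity_table)
```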
Error:
Attempt 2: Passing `table_uri` and `io_config`
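And a sketch of this second attempt, pulling the fields off the same `UnityCatalogTable` as in Attempt 1; since `io_config` comes back as `None` on Azure, `read_deltalake` has no credentials to work with:

```python
# unity_table is the UnityCatalogTable loaded in Attempt 1.
df = daft.read_deltalake(
    unity_table.table_uri,            # abfss://... URI of the Delta table
    io_config=unity_table.io_config,  # None on Azure, which is the crux of the bug
)
```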
Error:
Expected behavior
This is the behavior seen with AWS storage where I can successfully retrieve a table.
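For comparison, a sketch of the AWS-side flow that works (again, the endpoint, token, and table name are placeholders):

```python
import daft
from daft.unity_catalog import UnityCatalog

unity = UnityCatalog(
    endpoint="https://<workspace>.cloud.databricks.com",
    token="<databricks-personal-access-token>",
)

# On AWS, load_table vends temporary S3 credentials into the table's io_config...
aws_table = unity.load_table("my_catalog.my_schema.my_aws_table")

# ...so read_deltalake can authenticate and the dataframe comes back as expected.
df = daft.read_deltalake(aws_table)
df.show()
```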
Output:
Dataframe is displayed without issues.
Component(s)
Python Runner
Additional context
No response