Support for AWS Glue as an alternative Hive metastore implementation #112
I thought Glue exposed the same Thrift API that Hive uses. If that's the case, then we should be able to use the same lock API and code.
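I believe the API is only partially implemented and, unfortunately, doesn't include the locking mechanisms. Looking into it a bit when running Spark on EMR, for instance, the HiveMetaStoreClientFactory can be overridden to specify AWSGlueDataCatalogHiveClientFactory (see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html). The implementation used there supports the basic Hive metastore operations, e.g. create/alter/get table (calling back to the Glue public API), but throws UnsupportedOperationException for the lock method. So I was thinking the lock piece could be abstracted out: the generic Hive implementation would use the lock method via the Hive metastore, while a Glue override could use some other mechanism. At this point it's mainly a limitation of the Glue implementation, but I wanted to toss this out there as a nice-to-have for people not running their own Hive metastore.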
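A minimal Java sketch of abstracting out the lock piece might look like the following. Everything in it is hypothetical: `TableLockProvider`, `MetastoreLockClient`, `MetastoreLockProvider`, `ExternalStoreLockProvider` and `LockingTableOperations` are illustrative names, not Iceberg or Hive APIs, and the in-memory maps stand in for the table's metadata pointer and for an external lock store. The point is that the commit path only sees the lock interface, so a Hive deployment could delegate to the metastore's lock call, while a Glue-backed catalog (whose client throws UnsupportedOperationException for `lock`) plugs in a different provider.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lock abstraction: commit logic depends only on this interface.
interface TableLockProvider {
    /** Returns true if the lock was acquired, false if another writer holds it. */
    boolean tryLock(String database, String table, String owner);

    void unlock(String database, String table, String owner);
}

// Hypothetical shape of a metastore client's lock calls. A Glue-backed client
// would throw UnsupportedOperationException here, as described in this thread.
interface MetastoreLockClient {
    boolean lock(String database, String table, String owner);

    void unlock(String database, String table, String owner);
}

// Generic Hive path: delegate locking to the metastore itself.
final class MetastoreLockProvider implements TableLockProvider {
    private final MetastoreLockClient client;

    MetastoreLockProvider(MetastoreLockClient client) {
        this.client = client;
    }

    @Override
    public boolean tryLock(String database, String table, String owner) {
        return client.lock(database, table, owner);
    }

    @Override
    public void unlock(String database, String table, String owner) {
        client.unlock(database, table, owner);
    }
}

// Alternative path for catalogs without locks: track ownership in an external
// store. The ConcurrentHashMap is a placeholder for a service such as DynamoDB.
final class ExternalStoreLockProvider implements TableLockProvider {
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    @Override
    public boolean tryLock(String database, String table, String owner) {
        return owners.putIfAbsent(database + "." + table, owner) == null;
    }

    @Override
    public void unlock(String database, String table, String owner) {
        owners.remove(database + "." + table, owner); // removes only if owner holds it
    }
}

// Hypothetical commit flow: lock, check the expected metadata pointer, swap, unlock.
final class LockingTableOperations {
    private final TableLockProvider locks;
    private final Map<String, String> metadataLocations = new ConcurrentHashMap<>();

    LockingTableOperations(TableLockProvider locks) {
        this.locks = locks;
    }

    String currentLocation(String database, String table) {
        return metadataLocations.get(database + "." + table);
    }

    boolean commit(String database, String table, String expected, String updated, String owner) {
        if (!locks.tryLock(database, table, owner)) {
            return false; // another writer holds the lock; the caller can retry
        }
        try {
            String key = database + "." + table;
            if (!Objects.equals(metadataLocations.get(key), expected)) {
                return false; // a concurrent commit won; the caller must refresh first
            }
            metadataLocations.put(key, updated);
            return true;
        } finally {
            locks.unlock(database, table, owner);
        }
    }
}
```

For an `ops` instance built on `ExternalStoreLockProvider`, `ops.commit("db", "events", null, "v1.metadata.json", "writer-1")` returns true, and a second commit that still expects `null` returns false because the pointer has moved. Keeping the compare-and-swap inside the lock is what makes the provider swappable without changing commit semantics.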
The Glue client source has now been made available for reference; see the announcement.
I think that Glue should implement locking as required by the interface it exposes. I'd be fine adding a Glue-specific solution in Iceberg as well, but I'm not sure what that would look like. Good to know that Glue won't work, though.
> Looking into it a bit when running on Spark EMR for instance

I believe there is ongoing work to have the HiveMetaStoreClientFactory abstraction contributed to vanilla Apache Hive: https://issues.apache.org/jira/browse/HIVE-12679
On Fri, 7 Dec 2018 at 21:07, Ryan Rupp wrote:

> I believe the API is partially implemented and doesn't include locking mechanisms unfortunately. Looking into it a bit when running on Spark EMR for instance, the HiveMetaStoreClientFactory can be overridden to specify AWSGlueDataCatalogHiveClientFactory, see here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html. The implementation used there implements the basic Hive metastore operations e.g. create/alter/get table (calling back to the Glue public API) but UnsupportedOperationException is thrown for the lock method.
>
> So, I was thinking the lock piece could be abstracted out where the generic Hive implementation uses the lock method via the Hive metastore but then a Glue override could use some other mechanism. So I guess mainly at this point it's a limitation of the Glue implementation but wanted to toss this out there as a nice to have for people not running their own Hive metastore.
Similar to the functionality in Presto, I was wondering whether Glue could be substituted in as an alternative implementation of a Hive metastore. Looking at the current HiveTableOperations, it relies on:

The locking mechanism would be the problematic part, as I don't believe an equivalent API is available in Glue. Possibly there's another approach, or another service such as DynamoDB could be used for the locking functionality.
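One common way to build a lock on a key-value service like DynamoDB is a lease: a single record per table, written with a conditional put, plus an expiry so a writer that crashes does not hold the lock forever. The sketch below is hypothetical (`LeaseLock`, `tryAcquire` and `release` are made-up names) and uses `ConcurrentHashMap.compute` as an in-memory stand-in for the store's conditional write; it makes no AWS SDK calls, and a real implementation would also have to account for clock skew between writers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lease-based lock modeled on a conditional write to a key-value store
// (for example, a DynamoDB write guarded by a condition expression). The map is an
// in-memory stand-in: no AWS SDK calls are made here.
final class LeaseLock {
    // One lock record per table: the holder and the time its lease runs out.
    static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> records = new ConcurrentHashMap<>();
    private final long leaseMillis;

    LeaseLock(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Conditional acquire: succeeds if there is no record, the lease has expired
    // (so a crashed writer cannot block others forever), or the caller already
    // holds it (renewal). The clock is passed in to keep the sketch deterministic.
    boolean tryAcquire(String table, String owner, long nowMillis) {
        Lease wanted = new Lease(owner, nowMillis + leaseMillis);
        Lease stored = records.compute(table, (key, existing) ->
                existing == null
                        || existing.expiresAtMillis <= nowMillis
                        || existing.owner.equals(owner)
                    ? wanted
                    : existing);
        return stored == wanted; // reference check: did our record win?
    }

    // Conditional release: deletes the record only if the caller still holds it.
    boolean release(String table, String owner) {
        boolean[] released = {false};
        records.computeIfPresent(table, (key, existing) -> {
            if (existing.owner.equals(owner)) {
                released[0] = true;
                return null; // returning null removes the entry
            }
            return existing;
        });
        return released[0];
    }
}
```

With a 1,000 ms lease, `tryAcquire("db.events", "writer-1", 0)` succeeds, a competing `tryAcquire("db.events", "writer-2", 500)` fails, and the same call at time 1,000 succeeds because the first lease has expired. Making release conditional on ownership prevents a writer whose lease already expired from deleting someone else's lock.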