Added extra option to add readOnly thrift HMS uri #308
…d on read only calls for better spread of traffic
waggle-dance-api/src/main/java/com/hotels/bdp/waggledance/api/model/AbstractMetaStore.java
...core/src/main/java/com/hotels/bdp/waggledance/client/SplitTrafficMetastoreClientFactory.java
README.md
Outdated
@@ -167,6 +167,7 @@ The table below describes all the available configuration values for Waggle Danc
| `primary-meta-store.hive-metastore-filter-hook` | No | Name of the class which implements the `MetaStoreFilterHook` interface from Hive. This allows a metastore filter hook to be applied to the corresponding Hive metastore calls. Can be configured with the `configuration-properties` specified in the `waggle-dance-server.yml` configuration. They will be added in the HiveConf object that is given to the constructor of the `MetaStoreFilterHook` implementation you provide. |
| `primary-meta-store.database-name-mapping` | No | BiDirectional Map of database names and mapped name, where key=`<database name as known in the primary metastore>` and value=`<name that should be shown to a client>`. See the [Database Name Mapping](#database-name-mapping) section.|
| `primary-meta-store.glue-config` | No | Can be used instead of `remote-meta-store-uris` to federate to an AWS Glue Catalog ([AWS Glue](https://docs.aws.amazon.com/glue/index.html). See the [Federate to AWS Glue Catalog](#federate-to-aws-glue-catalog) section.|
| `primary-meta-store.read-only-remote-meta-store-uris` | No | Can be used to configure an extra read-only endpoint for the primary Metastore. This is an optimization if your environment runs separate Metastore endpoints and traffic needs to be differted efficiently. Waggle Dance will direct traffic to the read-write or read-only endpoints based on the call being done. For instance `get_table` will be a read-only call but `alter_table` will be forwarded to the read-write Metastore.|
Typo: differentiated or diverted?
I vote for diverted
diverted 🤦
You gave us a mashup of both! Haha.
if (metaStore.getReadOnlyRemoteMetaStoreUris() != null) {
  CloseableThriftHiveMetastoreIface readWrite = newHiveInstance(metaStore, name, metaStore.getRemoteMetaStoreUris(),
      properties);
  CloseableThriftHiveMetastoreIface readOnly = newHiveInstance(metaStore, name+"_ro",
Space missing before and after `+`.
formatted
tl;dr: Split traffic based on the HMS API method being called, e.g. `getTable` will go to a read-only HMS and `alterTable` will go to a read-write HMS.

The problem addressed here is running WD at scale. Generally our company deploys Waggle Dance as part of an Apiary Data lake: https://github.com/ExpediaGroup/apiary-data-lake.
This involves deploying ReadOnly and ReadWrite Metastores (HMS).
For the primary (local) metastore, Waggle Dance is configured to use the ReadWrite instance, which connects to a ReadWrite RDS backend. This means all traffic, both reads and writes, ends up on our ReadWrite RDS instance. This PR tries to split that traffic and move read traffic to the ReadOnly instance.
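
To illustrate the idea of routing by method name, here is a minimal, self-contained sketch using a JDK dynamic proxy. It is not the code from this PR: `MetastoreIface`, `RecordingMetastore`, `isReadOnlyCall` and `splitTraffic` are hypothetical names, and the rule "methods starting with `get_` are read-only" is an assumption made for the example. The real Hive Metastore API has many more read calls, so an actual implementation would need a more complete classification.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class SplitTrafficSketch {

  /** Hypothetical stand-in for the Thrift metastore interface (not the real API). */
  interface MetastoreIface {
    String get_table(String db, String table);

    void alter_table(String db, String table);
  }

  /** Fake metastore that records which calls it received. */
  static final class RecordingMetastore implements MetastoreIface {
    final String label;
    final List<String> calls = new ArrayList<>();

    RecordingMetastore(String label) {
      this.label = label;
    }

    @Override
    public String get_table(String db, String table) {
      calls.add("get_table");
      return label + " handled get_table(" + db + "." + table + ")";
    }

    @Override
    public void alter_table(String db, String table) {
      calls.add("alter_table");
    }
  }

  /** Assumption for this sketch only: a "get_" prefix marks a read-only call. */
  static boolean isReadOnlyCall(Method method) {
    return method.getName().startsWith("get_");
  }

  /** Returns a proxy that sends read-only calls to readOnly and the rest to readWrite. */
  static MetastoreIface splitTraffic(MetastoreIface readWrite, MetastoreIface readOnly) {
    InvocationHandler handler = (proxy, method, args) -> {
      MetastoreIface target = isReadOnlyCall(method) ? readOnly : readWrite;
      try {
        return method.invoke(target, args);
      } catch (InvocationTargetException e) {
        // Rethrow the exception raised by the underlying client, not the reflection wrapper.
        throw e.getCause();
      }
    };
    return (MetastoreIface) Proxy.newProxyInstance(
        MetastoreIface.class.getClassLoader(),
        new Class<?>[] { MetastoreIface.class },
        handler);
  }

  public static void main(String[] args) {
    RecordingMetastore readWrite = new RecordingMetastore("read-write");
    RecordingMetastore readOnly = new RecordingMetastore("read-only");
    MetastoreIface client = splitTraffic(readWrite, readOnly);

    System.out.println(client.get_table("db", "tbl"));
    client.alter_table("db", "tbl");
    System.out.println("read-only calls:  " + readOnly.calls);
    System.out.println("read-write calls: " + readWrite.calls);
  }
}
```

A proxy keeps the routing decision in one place, so callers keep using a single client object. Defaulting unknown methods to the read-write endpoint is the conservative choice in this sketch, since a write sent to a read-only endpoint would fail.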
The benefit would be: