feat: Implement gRPC server to ingest streaming features #3687
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: mehmettokgoz. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks for the review, @adchia. Waiting for your answers.
@adchia, addressed the review comments. ✅ Changes:
Thanks, @crispin-ki. Do you have any idea why the linter might be failing? The errors are not related to the changes in this PR, and I see it's green for your PR.
Yeah, it's a bit weird that these are failing now, seeing as you didn't touch them. I think you have a few options though:
In terms of which is preferred, I think all have pros and cons. Just go with whichever you feel is most appropriate (or, if unsure, it might be one for the maintainers to decide; they might in fact have a better solution). Logs for the failing workflow:
I found that it is related to the new version of flake8 (which upgrades pycodestyle). pycodestyle now checks for E721 (https://github.com/PyCQA/pycodestyle/blob/main/CHANGES.txt#L12). I set the flake8 version to
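For readers unfamiliar with the check: E721 is pycodestyle's rule about type comparisons, and the linked changelog entry describes `isinstance(x, t)` and `type(x) is t` as the allowed forms. A minimal illustration follows; the function names are invented for this example and are not taken from the Feast codebase.

```python
def is_int_flagged(value):
    # Compares types with ==, which is the pattern E721 reports.
    # The noqa marker only silences the linter for this demo line.
    return type(value) == int  # noqa: E721


def is_int_exact(value):
    # Allowed form: identity comparison against the exact type.
    return type(value) is int


def is_int_or_subclass(value):
    # Allowed form: isinstance also accepts subclasses (bool subclasses int).
    return isinstance(value, int)


print(is_int_flagged(3), is_int_exact(True), is_int_or_subclass(True))
```

The two allowed forms are not interchangeable: `type(x) is t` matches only the exact type, while `isinstance` also matches subclasses, so the right replacement depends on what the original comparison intended.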
f12dd5a to 43a2c91
Signed-off-by: mehmettokgoz <[email protected]> Signed-off-by: Danny C <[email protected]>
Signed-off-by: Danny C <[email protected]>
/lgtm
/lgtm
/lgtm /approve
Signed-off-by: Danny C <[email protected]>
# [0.34.0](v0.33.0...v0.34.0) (2023-09-07)

### Bug Fixes

* Add NUMERIC to bq_to_feast type map ([#3719](#3719)) ([6474b4b](6474b4b))
* Fix python unit tests ([#3734](#3734)) ([e81684d](e81684d))
* Handle unknown postgres source types gracefully ([#3634](#3634)) ([d7041f4](d7041f4))
* Pin protobuf version to avoid seg fault on some machines ([028cc20](028cc20))
* Remove unwanted excessive splitting of gcs path, so expected gcs parquet paths are returned from BigQueryRetrievalJob.to_remote_storage() ([#3730](#3730)) ([f2c5988](f2c5988))
* Run store.plan() only when need it. ([#3708](#3708)) ([7bc7c47](7bc7c47))
* Saved datasets no longer break CLI registry-dump command ([#3717](#3717)) ([f28ccc2](f28ccc2))
* Update py3.8 ci requirements for cython 3.0 release ([#3735](#3735)) ([1695c13](1695c13))

### Features

* Enhance customization of Trino connections when using Trino-based Offline Stores ([#3699](#3699)) ([ed7535e](ed7535e))
* Implement gRPC server to ingest streaming features ([#3687](#3687)) ([a3fcd1f](a3fcd1f))
* Implemented gRPC server for ingesting streaming features. Signed-off-by: mehmettokgoz <[email protected]> Signed-off-by: Danny C <[email protected]>
PR feast-dev#3687 added a spiffy feature to ingest streaming features, but this came along with a large batch of dependencies. Notably, this induces a core dependency on `protobuf>=4.21.6` while Feast itself is on `protobuf<4.23.4,>3.20`. This is a fiddly narrow range and excludes all 3.x uses. Signed-off-by: Chris Burroughs <[email protected]>
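To make the range concrete, here is a small sketch that checks a few candidate versions against both constraints quoted above. It handles only plain dotted release numbers with a hand-rolled comparison (an assumption made to keep the example dependency-free); real resolvers follow the full version-specifier rules.

```python
def parse(version):
    """Turn "4.21.6" into (4, 21, 6), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def allowed(version):
    """Return True if the version satisfies both constraints at once."""
    v = parse(version)
    new_dependency_side = v >= parse("4.21.6")        # protobuf>=4.21.6
    feast_side = parse("3.20") < v < parse("4.23.4")  # protobuf<4.23.4,>3.20
    return new_dependency_side and feast_side


for candidate in ["3.20.3", "4.21.5", "4.21.6", "4.23.3", "4.23.4"]:
    print(candidate, allowed(candidate))
```

Only versions from 4.21.6 up to, but not including, 4.23.4 pass both checks, which is why every 3.x release is excluded.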
What this PR does / why we need it:
Currently, the only way to ingest streaming features into Feast is the Spark/Kafka processor. This processor has several limitations: it retrieves data in batches and forces all transformations to run in the Python runtime.
The gRPC server listens for streaming features, which lets users apply transformations on more powerful streaming engines and then send the results to Feast. This feature introduces new streaming sources for Feast, in addition to Kafka and Kinesis.
Our main purpose for the gRPC ingestion service is to integrate the Hazelcast streaming engine with Feast. Hazelcast provides various connectors and supports processing those sources in real time. This server makes it possible to sink the transformed data into Feast.
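The flow described above (an external engine transforms the data, then hands finished rows to an ingestion endpoint) can be sketched without any gRPC machinery. Everything in this example is hypothetical: `PushRequest`, `IngestionService`, the field names, and the sample values are illustration only and do not reflect the service definition added in this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PushRequest:
    """Hypothetical request: one row of already-transformed features."""
    push_source_name: str
    features: Dict[str, str] = field(default_factory=dict)


class IngestionService:
    """Hypothetical handler that a gRPC servicer method could delegate to."""

    def __init__(self, sink: Callable[[str, List[Dict[str, str]]], None]):
        # `sink` stands in for whatever writes rows into the feature store.
        # Assumption: in a Feast deployment it would wrap a store push call.
        self._sink = sink

    def push(self, request: PushRequest) -> bool:
        # Reject empty payloads instead of writing a blank row.
        if not request.push_source_name or not request.features:
            return False
        self._sink(request.push_source_name, [dict(request.features)])
        return True


# Usage: collect pushed rows in a list instead of a real store.
received = []
service = IngestionService(lambda name, rows: received.append((name, rows)))
ok = service.push(
    PushRequest("driver_stats_push", {"driver_id": "1001", "conv_rate": "0.42"})
)
print(ok, received)
```

Keeping the handler independent of the transport makes it easy to test; a real servicer would only need to translate the incoming message into this request shape and map the boolean result to a response.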