Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Fix GCP Identity Aware Proxy integration errors occurring with grpcio>1.56.0 #2034

Closed
wants to merge 2 commits into from

Conversation

fg91
Copy link
Member

@fg91 fg91 commented Dec 8, 2023

Why are the changes needed?

Flyte recently integrated with GCP Identity Aware Proxy, including a reference deployment using the GCE Ingress controller instead of the Nginx Ingress controller: flyteorg/flyte#3965

Around the same time as the respective flytekit PR was merged, a restriction on the grpcio version required by flytekit was lifted.

With the previously pinned versions of grpcio, namely versions grpcio>=1.50.0,!=1.55.0,<1.53.1,<2.0, the integration with GCP Identity Aware Proxy worked. However, for newer versions of grpcio, flytekit isn't able to make requests to flyteadmin through IAP anymore.

This PR proposes a fix that enables flytekit to make requests through IAP also for newer grpcio versions.

What changes were proposed in this pull request?

For these grpcio versions, requests through IAP currently work/don't work:

* `1.53.0` works  <- newest allowed version before https://github.com/flyteorg/flytekit/commit/cf165f784f0034cbda5a25f03e24a85ee0e3c2d2
* `1.53.1` works
* `1.53.2` works
* `1.54.0` works
* `1.54.2` works
* `1.54.3` works
* `1.55.3` doesn't work (prior `1.55` versions have been yanked)
* `1.56.0` works
* `1.56.2` doesn't work
* `1.57.0` doesn't work
* `1.58.0` doesn't work
* `1.59.0` doesn't work

For the versions which don't work, the warning/error message is:

...
E1208 16:37:57.673192000 4371793280 hpack_parser.cc:999]               Error parsing 'content-type' metadata: invalid value
...
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Stream removed"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Stream removed", grpc_status:2, created_time:"2023-12-08T16:36:24.576706+01:00"}"

When using a version for which the call to flyteadmin is ultimately successful, this warning (similar but not exactly the same) is shown:

E1208 16:37:04.492457000 4307731840 hpack_parser.cc:853]               Error parsing metadata: error=invalid value key=content-type value=text/html; charset=UTF-8

A colleague of mine, @fellhorn, realized that when one makes an authenticated request to one of the flyteadmin grpc services through IAP, the content-type of the response is application/grpc ...

authenticated

... whereas if one makes an unauthenticated request (e.g. incognito window), the content type of the response returned by IAP is text/html:

unauthenticated

This explains the warning:

E1208 16:37:04.492457000 4307731840 hpack_parser.cc:853]               Error parsing metadata: error=invalid value key=content-type value=text/html; charset=UTF-8

Flytekit's AuthUnaryInterceptor first tries to make an unauthenticated request and if it receives a 401 error, tries again with auth metadata.

For authentication with flyteadmin this makes a lot of sense as it might not be configured to use authentication. Generally, the philosophy appears to be that the client doesn't have to be configured correctly, flyteadmin will tell it how to authenticate.

I propose to not attach auth metadata for proxies like GCP Identity Aware Proxy lazily but instead always include it proactively. This solves the error with the wrong content-type.


Considering alternatives:

If flyteadmin is behind a proxy, unauthenticated requests (with the proxy, not flyte) cannot ever reach flyteadmin. This means that flyteadmin cannot tell the user how to authenticate as the requests never reach it and even if they did, flyteadmin is not aware of the proxy. This means the user has to actively configure proxy-authorization by including a proxyCommand in the flyte client config anyways. And if the user does so, I would argue that it is also acceptable to make use of the knowledge that requests without proxy-authorization are expected to fail anyways. Also it saves a request every time.

The alternative would be to somehow handle the unexpected context type text/html on unauthenticated requests.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Fabio Grätz added 2 commits December 8, 2023 17:30
@fg91 fg91 requested a review from jeevb December 8, 2023 17:24
@fg91 fg91 self-assigned this Dec 8, 2023
@fg91 fg91 added the bug Something isn't working label Dec 8, 2023
Copy link

codecov bot commented Dec 8, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (ed14694) 84.78% compared to head (c537304) 85.83%.
Report is 1 commits behind head on master.

Files Patch % Lines
flytekit/clients/grpc_utils/auth_interceptor.py 42.85% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2034      +/-   ##
==========================================
+ Coverage   84.78%   85.83%   +1.05%     
==========================================
  Files         251      306      +55     
  Lines       20145    22904    +2759     
  Branches     3470     3470              
==========================================
+ Hits        17080    19660    +2580     
- Misses       2470     2649     +179     
  Partials      595      595              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jeevb
Copy link
Contributor

jeevb commented Dec 9, 2023

Not sure if we want to refresh credentials on every call. Alternatively, could we consider checking the expiry of an existing token (from the claims?) and use that to determine if the credentials need to be refreshed?

This is very reminiscent of this issue. Somewhat interesting that a GCP product does not return an appropriate content-type to a GRPC client.

As mentioned in the above PR, the Go client already maps HTTP errors to GRPC ones if a grpc-status header is not present. Perhaps, this warrants further discussion to enable this behavior on the Python client as well.

@fg91
Copy link
Member Author

fg91 commented Dec 9, 2023

Not sure if we want to refresh credentials on every call.

You are right, this is unreasonable. As this issue has been giving me headaches for a while, I was so happy yesterday that all of a sudden I had something working that my brain happily jumped to conclusions ^^

Alternatively, could we consider checking the expiry of an existing token (from the claims?) and use that to determine if the credentials need to be refreshed?

Since flytekit receives the token for the proxy-authorization header from an external command, it doesn't know the idp/key and cannot validate the token. But even without validating, it would still work to use the exp of the token to refresh it if it would be expired.
Since this would still be a rather unsatisfying bandaid and since just restricting the grpcio version works for now, shell we treat this as the backup plan if we can't manage a better fix?

This is very reminiscent of this issue.

Isn't it not only similar but exactly the same issue? Here, @yashykt also explained that "the value of content-type key received is text/html which is failing parsing".

Somewhat interesting that a GCP product does not return an appropriate content-type to a GRPC client,

Especially interesting that a GCP product does that even after flytekit, with this fix, explicitly says it accepts "application/grpc" ...

As mentioned in the above PR, the Go client already maps HTTP errors to GRPC ones if a grpc-status header is not present. Perhaps, this warrants further discussion to enable this behavior on the Python client as well.

Let's do that.

@kumare3
Copy link
Contributor

kumare3 commented Dec 10, 2023

OMG, now the proxy adds to the woes of grpc?

@fg91 fg91 marked this pull request as draft December 13, 2023 10:52
@fg91 fg91 closed this Mar 8, 2024
@fg91
Copy link
Member Author

fg91 commented Mar 8, 2024

Fixed in #2212

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants