-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws_credentials_http: Add support for EKS Pod Identities #9013
base: master
Are you sure you want to change the base?
Conversation
This patch rewrites how the HTTP credentials provider works to allow both ECS and EKS identities to work. It is based on the aws-sdk-go-v2 implementation. It validates that the endpoint is correct if the transport is HTTP, but does not support DNS resolution, however based on how the pod identity agent works today, DNS should not be needed. If the transport is HTTPS, which will not be the case when using EKS Pod Identities, any endpoint is allowed. This is in line with how the AWS SDK works. Similarly to the SDK, it also reads the authentication token environment variables, with the file taking precedence over the raw token variable. This has been tested against an EKS 1.30 cluster with AL2023 nodes. Signed-off-by: Andrew Titmuss <[email protected]>
This patch fixes the handling of IPv6 literals, which would fail to parse after not finding balanced `[]` characters. This happened because it searched for the first `:` before copying the host portion, so (assuming it didn't return NULL), the endpoint `[fd00:ec2::23]` would become `fd00`. I've added some tests around this case, but I wouldn't describe the tests as in-depth. This patch is needed for EKS Pod Identities to work on IPv6 clusters. Signed-off-by: Andrew Titmuss <[email protected]>
I suspect we probably need some docs updates for this as well, or at least a simple example of how to use it? Can we link a docs PR? |
I found this while searching for docs to update, but I can't see it referenced anywhere on docs.fluentbit.io. Do you want a link added to this page somewhere as well? |
I think it was linked from the S3 or similar pages, I'm sure I found it before. It really should be a docs page in the index probably too. |
Added the AWS Credentials docs to the index: fluent/fluent-bit-docs#1400 |
Looks like @PettitWesley has been working on the same feature for the aws/aws-for-fluent-bit distro: PettitWesley#32 Happy if you want to wait for that PR instead, it seems to handle some things differently to how I implemented it. For instance, my implementation reads the auth token environment variables once at startup (file path, not the contents) rather than on each refresh, which is in line with how I saw it implemented in the AWS SDKs. However, mine might not handle tokens with In any case, I definitely learned a lot about how the credential chain works by implementing this. |
This error handler is for JSON APIs that respond with a Code and Message field in their error responses. It is originally from PettitWesley/fluent-bit@7a2c1a8 Signed-off-by: Andrew Titmuss <[email protected]>
This patch fixes the following: 1. ensures malformed tokens (containing `\r\n`) are treated as invalid 2. fixes allowing any https host, which previously went through the same checks as http hosts 3. fixes a memory leak when provider initialisation fails 4. logs errors from the HTTP agent 5. rewrites the tests using the cases from PettitWesley/fluent-bit@195db30 Signed-off-by: Andrew Titmuss <[email protected]>
I've updated my PR with some improved error handling and rewritten tests that validate the unhappy path better, after looking at how my PR and @PettitWesley's branch differs. Running valgrind against the test cases reports no memory leaks, including fixing one that wasn't caught by the original tests I had written. |
// Loopback | ||
// IPv4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fluent Bit code style is to always use /* */ comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes that you linked that I made for the AWS Distro were also made on the master branch and tested thoroughly. Apologies that I forgot to put up a PR against this repo. Here is the PR: https://github.com/fluent/fluent-bit/pull/9206/files
This PR looks very very similar to my changes, which a few renamings and slight differences.
I strongly prefer the change I made in flb_utils to use SDS strings as opposed to this PR which modifies the existing flb_utils C string code- that must be thoroughly tested to avoid negative impact on other parts of this project.
If you continue with this work, please rebase your changes on top of my commits so we both get credit. I am unable to test this change right now, so help with testing is appreciated.
That's all good, I'm fine with your changes. As I had mentioned earlier, I had made my own implementation of the feature before finding your branch, then updated mine to account the behavioural differences between the two. Happy for this to be closed in favour of #9206 - I'll give it a whirl in my cluster too. |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
This PR initially includes two commits, as I had to patch
flb_utils
for IPv6 clusters to work.This patch rewrites how the HTTP credentials provider works to allow both ECS and EKS identities to work. It is based on the aws-sdk-go-v2 implementation.
It validates that the endpoint is correct if the transport is HTTP, but does not support DNS resolution, however based on how the pod identity agent works today, DNS should not be needed. If the transport is HTTPS, which will not be the case when using EKS Pod Identities today, any endpoint is allowed. This is in line with how the AWS SDK works.
Similarly to the SDK, it also reads the authentication token environment variables, with the file taking precedence over the raw token variable.
This has been tested against an EKS 1.30 cluster with AL2023 nodes, but it would be great if someone else could test this too rather than just taking my word for it 😄. I did test with an ECS cluster as well to ensure that did not break as a result of this change, but I don't generally use ECS, so if someone else could validate this that would be great.
Fixes #8550
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Not sure how I'd test this in-cluster, but the tests for the credential provider should be evidence enough. If more test cases are needed I'm happy to write them
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
fluent/fluent-bit-docs#1400
Backporting
I'm fine with this being in a minor release instead of a patch, given the testing it should receive
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.