This issue was moved to a discussion.
[bug] sts get caller identity does not work immediately #2517
Comments
@hjkatz -- This is ultimately a fault in server-side behavior. The IAM state change associated with retrieving credentials (via AssumeRole or whatever else) does not propagate immediately after credentials are returned. |
Hi @hjkatz, just to chime in as well. This is not an SDK issue and is not unique to SSO. What you are experiencing is a propagation delay that is solved with a retry. It is simply the nature of a distributed system. Thanks, |
@RanVaknin Kindly review the linked discussion. Retrying 5 times with a gradual backoff from 500ms to 2s still reproduces the issue. (I can provide an MVP example if desired too.) I understand that a distributed system will require time to propagate the new token. So I'll start with my goal instead: I want to verify that the SSO session via the SDK is valid. How can I test that reliably? |
@RanVaknin I realize that the discussion doesn't have as clear of an example as I'm suggesting. I'll upload a clear example later today. |
// Wrapper for sts.GetCallerIdentity() that supports retries
//
// Use the context to set a maximum time that retries can be performed.
// When the context is canceled then the final response will be returned.
//
// See: https://github.com/aws/aws-sdk-go-v2/discussions/2093#discussioncomment-8455830
func StsGetCallerIdentity(ctx context.Context, client *sts.Client) (result *sts.GetCallerIdentityOutput, err error) {
	if client == nil {
		return nil, errs.New("cannot call StsGetCallerIdentity with nil client!")
	}

	attempt := 0
	retries := 5
	// internal lib that implements a backoff between start -> end, without jitter (false)
	backoff := reliable.NewBackoff(100*time.Millisecond, 1*time.Second, false)

	for attempt < retries {
		attempt++

		select {
		case <-ctx.Done():
			// ran out of time, return whatever we got last
			return
		default:
			// continue below
		}

		result, err = client.GetCallerIdentity(ctx, &sts.GetCallerIdentityInput{})
		if err == nil {
			// success
			return // final values
		}

		// See: https://github.com/aws/aws-sdk-go-v2/discussions/2093
		if !strings.Contains(err.Error(), "api error InvalidClientTokenId") {
			// non-retryable error
			return // last result + error
		}

		// error, backoff and try again
		be := backoff.Wait(ctx) // Wait() == time.Sleep(backoff.NextDuration())
		if be != nil {
			// context canceled
			return // last values
		}
	}

	return // last values
}
I hope this helps reproduce what we're seeing. I'm also hopeful we can find a way via the SDK to verify the session credentials returned by the SSO credentials provider are valid. Some thoughts to get at this information:
Happy to discuss alternatives too! |
@hjkatz On average, how long does it take for This really seems like it should be something that sts models as In case you're not familiar with waiters -- https://aws.github.io/aws-sdk-go-v2/docs/making-requests/#using-waiters. The most ubiquitous example in my mind would be tl;dr waiting for async state changes is a problem we've solved at large, if I understand correctly it's just a question of pushing for sts to add some additional modeling to solve this specific case |
Followup - what is the delay between you provisioning the token (looks like through sso) and first calling sts.GetCallerIdentity()? If it's on the order of seconds, then what I said above generally stands. If this is like minutes or hours, that doesn't seem at all like acceptable behavior in the IAM sense and would warrant further investigation. |
It's on the order of milliseconds. For context we have a shared developer CLI that everyone uses for various commands/tools/utilities. In that CLI we annotate some commands as needing SSO to work correctly. Many of our commands are annotated and interact with AWS in some required way. Our goal was to warn the user that they have not started their SSO session for the running command. To do this we need to check if the session is valid (not sure how to do this), and we came up with generating a token then trying to see if it works.
For our use case I think the order of seconds is too long. It feels like a delay for our users interacting with a CLI. I would prefer milliseconds or some approach that does the bare minimum for testing that the SSO session is valid for generating an STS token or something like that. |
Sorry, my last question there was incomplete in wording. I'm trying to understand what the actual delay is you're observing between provisioning the token and then getting a successful call to sts.GetCallerIdentity(). |
I gotcha. I was writing up a test case to get some real data.
Here's the summary of 1 million attempts. It's looking much better today than when I originally opened the ticket. For whatever reason I'm not seeing anything take longer than ~100ms but in the past I would feel the delay more in the ~1-2s range, so maybe something's improved. (It could be my network as I'm at my parents' place atm) How about I run this test again daily and get back to you with more data? |
Today's summary also seems fine:
I'm suspecting that the way I'm using a backoff after getting the credentials should be a naive loop with |
Today seems the same too.
I'm going to add additional logging into our CLI and see if I can reproduce any inconsistency today, but my suspicion is that 5 attempts isn't enough and we just need to try more times. |
My recommendation in general there would not be to fix the number of attempts and instead write your "waiting" construct to accept a timeout. This is how modeled waiters are written (well, generated, but obviously we wrote the code to do that). The waiter then just retries "infinitely" (with an increasing backoff) until it either hits the success case or exceeds the caller-provided deadline. |
Describe the bug
See: #2093
Expected Behavior
I expect the call to sts.GetCallerIdentity() to succeed immediately after authenticating/receiving new sts credentials.
Current Behavior
The calls to sts.GetCallerIdentity() seem to be inconsistently failing.
Reproduction Steps
See: #2093
Possible Solution
See: #2093
Additional Information/Context
No response
AWS Go SDK V2 Module Versions Used
Compiler and Version used
go version go1.21.6 linux/amd64
Operating System and version
ubuntu