AWS Amplify Auth v1 to v2 migration fails 5-10% of the time, logs user out #2929

Closed
1 task done
camhart opened this issue Sep 24, 2024 · 40 comments
Labels
auth Related to the Auth category/plugins bug Something isn't working

Comments

@camhart

camhart commented Sep 24, 2024

Before opening, please confirm:

Language and Async Model

Java

Amplify Categories

Authentication

Gradle script dependencies

// Put output below this line

implementation 'com.amplifyframework:aws-auth-cognito:2.21.0'

Environment information

# Put output below this line
C:\Users\Cam\projects\project-android>gradlew --version

------------------------------------------------------------
Gradle 8.7
------------------------------------------------------------

Build time:   2024-03-22 15:52:46 UTC
Revision:     650af14d7653aa949fce5e886e685efc9cf97c10

Kotlin:       1.9.22
Groovy:       3.0.17
Ant:          Apache Ant(TM) version 1.10.13 compiled on January 4 2023
JVM:          20.0.2 (Oracle Corporation 20.0.2+9-78)
OS:           Windows 10 10.0 amd64

Please include any relevant guides or documentation you're referencing

No response

Describe the bug

I've updated my Android app to use AWS Amplify V2. I deployed it to beta users, and ~5-10% of them had issues with the data migration. Essentially, they ended up logged out of the app after it updated and migrated from v1 to v2. This shouldn't happen. If I have those customers uninstall and reinstall the Android app and log in again, everything works moving forward, but this isn't an acceptable solution.

I created a ticket with AWS support and they told me to create a github issue. See case 172444220700816.

Here's an example log output when the app attempts to make API calls but is unable to due to being logged out.

D/ 09-23 15:31:15.551 BackendCallTask( 5715): AUTH fetchAuthSessionRequest
D/ 09-23 15:31:16.729 BackendCallTask( 5715): AUTH fetchAuthSessionRequest result, isSignedIn=true
D/ 09-23 15:31:16.729 BackendCallTask( 5715): AUTH exception: SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:16.732 System.err( 5715): SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:16.732 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1.execute(SourceFile:48)
W/ 09-23 15:31:16.732 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1$1.invokeSuspend(Unknown Source:12)
W/ 09-23 15:31:16.733 System.err( 5715): Caused by: NotAuthorizedException(message=Invalid Refresh Token.)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.model.NotAuthorizedException$Builder.a(SourceFile:4)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.NotAuthorizedExceptionDeserializer.c(SourceFile:27)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.d(SourceFile:344)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.b(SourceFile:1)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.c(SourceFile:43)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.b(SourceFile:1)
D/ 09-23 15:31:28.709 BackendCallTask( 5715): AUTH fetchAuthSessionRequest
D/ 09-23 15:31:28.963 BackendCallTask( 5715): AUTH fetchAuthSessionRequest result, isSignedIn=true
D/ 09-23 15:31:28.963 BackendCallTask( 5715): AUTH exception: SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:28.963 System.err( 5715): SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:28.963 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1.execute(SourceFile:48)
W/ 09-23 15:31:28.963 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1$1.invokeSuspend(Unknown Source:12)
W/ 09-23 15:31:28.963 System.err( 5715): Caused by: NotAuthorizedException(message=Invalid Refresh Token.)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.model.NotAuthorizedException$Builder.a(SourceFile:4)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.NotAuthorizedExceptionDeserializer.c(SourceFile:27)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.d(SourceFile:344)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.b(SourceFile:1)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.c(SourceFile:43)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.b(SourceFile:1)
D/ 09-23 15:31:3

I'd like to request a feature addition to this library: the migration should create persistent migration logs that the app developer can retrieve to help troubleshoot issues like this. It would also be good to be able to retry the migration. Right now it seems to destroy all the old v1 data and assume everything worked, even when it didn't. The migration fails sporadically and I have no clue why, with no recourse for troubleshooting. I have to wait for a customer support ticket complaining about the problem in order to get logs, and those aren't very helpful, as they just show the user was signed out for some reason. I've been using AWS Amplify Auth v1 for several years without any issue keeping users logged in.
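
For illustration only, here is a rough sketch of the kind of persistent migration record being requested. Nothing like this exists in Amplify as far as this thread shows: `MigrationLog`, its step names, and the tab-separated file format are all hypothetical, and the file location (for example, somewhere in the app's private storage) is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;

// Hypothetical helper, NOT part of Amplify: appends one line per migration
// step to a file so the outcome survives process death and can be inspected.
class MigrationLog {
    private final Path file;

    MigrationLog(Path file) {
        this.file = file;
    }

    // Record a step as "timestamp <TAB> step <TAB> OK|FAILED <TAB> detail".
    void record(String step, boolean ok, String detail) {
        String line = Instant.now() + "\t" + step + "\t" + (ok ? "OK" : "FAILED")
                + "\t" + detail + System.lineSeparator();
        try {
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The migration counts as done only if the last recorded line is a
    // successful "complete" step; otherwise a retry would be warranted.
    boolean completed() {
        try {
            if (!Files.exists(file)) {
                return false;
            }
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            if (lines.isEmpty()) {
                return false;
            }
            String[] fields = lines.get(lines.size() - 1).split("\t", -1);
            return fields.length >= 3
                    && fields[1].equals("complete")
                    && fields[2].equals("OK");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a record like this, the old v1 data could be left in place until `completed()` returns true, and the log file could be attached to a support ticket.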

Reproduction steps (if applicable)

I've been unable to reproduce the issue myself.

Code Snippet

// Put your code below this line.

Log output

// Put your logs below this line


amplifyconfiguration.json

{
  "auth": {
    "plugins": {
      "awsCognitoAuthPlugin": {
        "IdentityManager": {
          "Default": {}
        },
        "CredentialsProvider": {
          "CognitoIdentity": {
            "Default": {
              "PoolId": "us-west-2:xxxxxxxxxxxx",
              "Region": "us-west-2"
            }
          }
        },
        "CognitoUserPool": {
          "Default": {
            "PoolId": "us-west-2_xxxxxxxxx",
            "AppClientId": "xxxxxxxxx",
            "AppClientSecret": "xxxxxxxxx",
            "Region": "us-west-2"
          }
        },
        "Auth": {
          "Default": {
            "OAuth": {
              "WebDomain": "cognitoauth.xxxxxxxxx.io",
              "AppClientId": "xxxxxxxx",
              "AppClientSecret": "xxxxxxxxx",
              "SignInRedirectURI": "xxxxxxxx://callback/",
              "SignOutRedirectURI": "xxxxxxxx://signout/",
              "Scopes": [
                "email",
                "openid",
                "profile",
                "aws.cognito.signin.user.admin"
              ]
            },
            "authenticationFlowType": "USER_SRP_AUTH"
          }
        }
      }
    }
  }
}

GraphQL Schema

// Put your schema below this line

Additional information and screenshots

One more detail. V1 of the amplify auth library has code that Google Play throws big warnings about and claims it'll stop accepting app updates that use it. Fixing this issue with the v1 -> v2 migration should be a top priority, as continuing to use v1 in the interim isn't an option. I essentially can't update my app unless it's using amplify v2.

@github-actions github-actions bot added pending-triage Issue is pending triage pending-maintainer-response Issue is pending response from an Amplify team member labels Sep 24, 2024
@mattcreaser
Member

Sorry to hear you're having issues @camhart. Can you please confirm that you updated directly to 2.21.1 and did not first try to use an older version of v2? There was a known issue in the migration code that was fixed in version 2.16.1.

Is reinstalling the app the only solution? What about calling Amplify.Auth.fetchAuthSession with options specifying forceRefresh = true?

Are there any obvious similarities between the affected users?

@mattcreaser mattcreaser added bug Something isn't working auth Related to the Auth category/plugins labels Sep 24, 2024
@github-actions github-actions bot removed pending-maintainer-response Issue is pending response from an Amplify team member pending-triage Issue is pending triage labels Sep 24, 2024
@ruisebas ruisebas added the pending-maintainer-response Issue is pending response from an Amplify team member label Sep 25, 2024
@camhart
Author

camhart commented Sep 25, 2024

Can you please confirm that you updated directly to 2.21.1 and did not first try to use an older version of v2? There was a known issue in the migration code that was fixed in version 2.16.1.

Yes, we went direct from v1 to v2.21.1.

Is reinstalling the app the only solution? What about calling Amplify.Auth.fetchAuthSession with options specifying forceRefresh = true?

I haven't tried this, but I didn't think it would be needed. The SDK is supposed to detect when credentials are expired and refresh them automatically, isn't it?

@mattcreaser
Member

That's correct, it should - I only suggested trying to force refresh the tokens as a way to gather more information about what is going wrong. Another thought is to try catching the exception and invoking signOut.
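
A minimal sketch of the two diagnostic steps suggested above: force a token refresh, and if that fails with an expired session, sign out so the user can sign in again. `AuthClient`, `SessionExpired`, `RecoveryOutcome`, and `SessionRecovery` are hypothetical stand-ins, not Amplify types. In a real app the calls would correspond to `Amplify.Auth.fetchAuthSession` with `forceRefresh` set and `Amplify.Auth.signOut`, both of which are asynchronous, so the actual wiring would look different.

```java
// Hypothetical stand-ins for the auth calls discussed in this thread.
// These are NOT Amplify types; they only model the control flow.
class SessionExpired extends Exception {
    SessionExpired(String message) {
        super(message);
    }
}

interface AuthClient {
    // Models fetchAuthSession; forceRefresh=true asks for new tokens
    // even if the cached ones still look valid.
    void fetchSession(boolean forceRefresh) throws SessionExpired;

    // Models signOut: clears locally stored credentials.
    void signOut();
}

enum RecoveryOutcome { REFRESHED, SIGNED_OUT }

class SessionRecovery {
    // Try a forced refresh first; fall back to signing out only when the
    // refresh is rejected (for example, "Invalid Refresh Token.").
    static RecoveryOutcome forceRefreshOrSignOut(AuthClient auth) {
        try {
            auth.fetchSession(true);
            return RecoveryOutcome.REFRESHED;
        } catch (SessionExpired e) {
            // Keep the failure reason: it is the extra diagnostic
            // information the forced refresh is meant to surface.
            System.err.println("Forced refresh failed: " + e.getMessage());
            auth.signOut();
            return RecoveryOutcome.SIGNED_OUT;
        }
    }
}
```

Signing out here is a recovery path of last resort: for an app where users cannot easily sign back in, logging the failure reason may matter more than the sign-out itself.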

We will need to investigate this issue to see what's going on - unfortunately it sounds like it will be difficult to reproduce. Any additional details about the affected users would be beneficial.

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 4, 2024
@camhart
Author

camhart commented Oct 10, 2024

unfortunately it sounds like it will be difficult to reproduce

Ideally you can add more tools to the library so I can better troubleshoot the issue to provide more info. I'm confident if I release the app to another 1% of my customers, I'll get a few emails about it. But I don't want to do that until there's some ability to troubleshoot. We need some sort of migration record to indicate what happened to the migration and to understand why it failed. I'm not asking for you to solve it immediately. But adding some support for better troubleshooting migration issues seems like a low hanging fruit that moves the needle forward.

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 10, 2024
@harsh62
Member

harsh62 commented Oct 21, 2024

@camhart Can you please share code snippets so that we can try to reproduce the issue in a local environment? Snippets showing how the Auth category is used in both V1 and V2 would really help us narrow down the investigation.
Please share any other details you think will help us isolate the issue.

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@camhart
Author

camhart commented Oct 21, 2024

@harsh62 I don't have code snippets to share that can reproduce the issue. I've tried multiple times with my entire app to replicate the problem and can't replicate it locally, but it is happening. This is why I'm arguing for better tools to investigate/troubleshoot problems relating to the migration.

Here are all the Amplify method calls I use:

  • Amplify.Auth.signIn
  • Amplify.Auth.fetchAuthSession
  • Amplify.Auth.signUp
  • Amplify.Auth.signInWithWebUI
  • Amplify.Auth.fetchUserAttributes
  • Amplify.Auth.confirmSignUp
  • Amplify.Auth.signOut

V1 used the same method calls but adjusted for the api changes between the two. I don't use Amplify for anything else--only Auth.

Please share any other details you think will help us isolate the issue.

My app is a long running background app that stays running 24/7 in the background on the device (it's a parental control app). It automatically launches itself after an app update has occurred.

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@harsh62
Member

harsh62 commented Oct 21, 2024

Are you able to isolate if the issue is happening with customers using Amplify.Auth.signInWithWebUI compared to Amplify.Auth.signIn?
Another follow up to that would be, if your customers are able to use Amplify.Auth.signIn and Amplify.Auth.signInWithWebUI interchangeably? i.e. customer could be using Amplify.Auth.signInWithWebUI in Amplify V1 and decided to use Amplify.Auth.signIn in Amplify V2.

If you could answer this, it would greatly narrow down our reproduction codepath.

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@camhart
Author

camhart commented Oct 21, 2024

Are you able to isolate if the issue is happening with customers using Amplify.Auth.signInWithWebUI compared to Amplify.Auth.signIn?

Not easily. If the problem is happening to customers logged in via one of those calls, it's not happening 100% of the time. I can release the app to another 1% of customers and wait for the support tickets to come in, but I'm really hoping to avoid doing that without having better tools in place to troubleshoot the migration.

Another follow up to that would be, if your customers are able to use Amplify.Auth.signIn and Amplify.Auth.signInWithWebUI interchangeably?

They can use one or the other, but not both. Once logged in one way, we don't give them the option to login again without signing out first.

i.e. customer could be using Amplify.Auth.signInWithWebUI in Amplify V1 and decided to use Amplify.Auth.signIn in Amplify V2.

We don't give customers the ability to logout once the device is setup (there's additional steps they have to take after logging in to set the device up with my app). There's only a very brief window where they can logout where the customer has logged in but not setup the device. Once the device is setup, if they want to logout they need to uninstall/reinstall the app. The customers who've reported the issue to me have all had their device setup fully, so there is no longer an option for them to logout at that point. So, long story short, it's not possible for them to use Amplify.Auth.signIn and then use Amplify.Auth.signInWithWebUI (or vice versa). Does that make sense?

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@harsh62
Member

harsh62 commented Oct 21, 2024

@camhart This is good information. Another question: has your amplifyconfiguration.json file changed in any way from Amplify V1 to V2?

From the issues reported, are you able to see anything in common among the affected users: device types, OS versions, manufacturers, or anything else?

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@camhart
Author

camhart commented Oct 21, 2024

Another question: has your amplifyconfiguration.json file changed in any way from Amplify V1 to V2?

No, it hasn't changed.

From the issues reported, are you able to see anything in common among the affected users: device types, OS versions, manufacturers, or anything else?

I haven't kept track of this. However, I do recall that one of the affected devices was a Samsung running OS version 13. I have multiple Samsung test devices, though, and I haven't been able to replicate the issue on any of them. When I release the app update to more customers, we get reports of customers having issues, and I can guarantee that many more have the issue but never report it. They'll just cancel their subscription with us or try to resolve it on their own.

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@harsh62
Member

harsh62 commented Oct 21, 2024

Thanks for providing all the information. One of our engineers will try to reproduce this issue locally by trying out different codepaths. We will get back to you when we have more updates.

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Oct 21, 2024
@tylerjroach
Member

@camhart One more question that would help in our research. Can you post all of the AWS dependencies you are using in Gradle? Ex Amplify as well as any other AWS SDKs.

@camhart
Author

camhart commented Nov 1, 2024

    implementation 'com.amplifyframework:aws-auth-cognito:2.21.0'
    coreLibraryDesugaring 'com.android.tools:desugar_jdk_libs:2.0.3'

    implementation 'com.amazonaws:aws-android-sdk-apigateway-core:2.16.1'

Those are the only dependencies being used. Let me know if you need anything else!

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Nov 1, 2024
@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 9, 2024
@camhart
Author

camhart commented Dec 9, 2024

I do want to add--thank you for your help. I'm not trying to be a complainer here, but I do want to pass the pain that I'm feeling along so you have an appropriate understanding of the impact this troubleshooting experience has had.

@tylerjroach
Member

Hi @camhart, I understand your frustrations. Thank you for quickly answering all of the questions sent your way. I know it has been a lot, but these types of edge cases are always difficult to figure out with lack of logs that highlight the problem. It's especially hard considering our team members, and yourself, have been unable to replicate the failure.

There could be something unique about these 5-10% of users that we haven't yet tracked down (ex: sign in method, device type, device OS, etc). We are continuing to look at any failure paths on our end.


Should I simply upgrade com.amazonaws:aws-android-sdk-apigateway-core to the latest version (2.77.1)? Would that potentially stop transitively calling code and wiping out credentials? How confident are you that it'll fix the problem I'm facing?

I don't believe this would directly fix the problem, but it is always best to try to keep up to date with our latest versions. You are using a version of API Gateway that is 5 years old, which means it is missing 5 years of bug fixes that may have been added along the way. Given that you are confident the issues are happening with each rollout, and CognitoCachingCredentials provider is no longer being used, I do not expect this to cause the invalid refresh token error you are seeing.

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 9, 2024
@camhart
Author

camhart commented Dec 9, 2024

Sounds good, I'll wait to hear further instruction from you then before trying anything. Getting this fixed is top priority on my end, so I'll respond quickly and as clearly as possible.

@tylerjroach
Member

@camhart If you wouldn't mind, join our discord channel https://discord.com/invite/amplify and you can reach out to me @tylerjroach. We can dm and set up a screenshare call.

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 9, 2024
@camhart
Author

camhart commented Dec 9, 2024

Just sent you a DM.

@tylerjroach
Member

Thanks, we can continue discussion there!

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 9, 2024
@tylerjroach
Member

tylerjroach commented Dec 10, 2024

I have identified an issue with migrating logins that have Device Tracking enabled. I am recognizing this ticket as a bug and we are actively working on a fix.

This is not an issue with your hosted ui (web) sign ins, as they do not use device tracking. This will be an issue with any SRP sign ins that use device tracking.

@tylerjroach tylerjroach added bug Something isn't working and removed question General question labels Dec 10, 2024
@tylerjroach
Member

@camhart I have discovered the root cause and am working on a fix here: #2963

I believe we should be able to migrate the missing device metadata to our new credential store, which would result in token refreshes immediately working without requiring another sign in.

The cause is aliased userIds. When an email address is used for signIn, the user's actual userId is a UUID. During the migration process, Amplify v2 attempts to migrate based on the email address, when it should be looking at the UUID userId instead.
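
To illustrate the mismatch described above, here is a toy model, not Amplify's migration code. It assumes the legacy store keys device metadata by the UUID userId, as the explanation implies. The class name, the sample UUID, and the device key value are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of a legacy credential store (hypothetical, for illustration).
// Device metadata is stored under the user's real userId, which is a UUID
// even when the user signs in with an email alias.
class DeviceMetadataLookup {
    private final Map<String, String> deviceKeyByUserId = new HashMap<>();

    void put(String userId, String deviceKey) {
        deviceKeyByUserId.put(userId, deviceKey);
    }

    // Returns the device key stored under exactly this key, if any.
    Optional<String> find(String key) {
        return Optional.ofNullable(deviceKeyByUserId.get(key));
    }
}
```

In this model, storing a device key under `"0b6f1c2e-uuid"` and then calling `find("user@example.com")` returns an empty result, so a migration keyed on the email alias carries the tokens over but leaves the device metadata behind. Looking up by the UUID finds it.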


In my testing, I also identified a workaround. If you are not actually using Device Tracking (primarily used to prevent repeated MFA validations on sign in), I believe the issue can immediately be resolved by changing the "Remember User Devices" setting to "Don't Remember" in the Cognito console. This turns off the device tracking verification on token refreshes. The refresh calls that were failing would now succeed, because Cognito no longer checks the device metadata upon refresh. If you were to re-enable this setting, the refreshes would begin failing again until our official fix is released.


TLDR: We are working on a fix, but if you don't actually need Device Tracking enabled for your use case, token refreshes will begin working again if you toggle "Remember User Devices" to "Don't Remember".

@camhart
Author

camhart commented Dec 11, 2024

Thank you for the update. Great news if we can migrate without causing people to have to sign in again.

Is there any risk that changing the "Remember User Devices" setting could have an adverse effect that couldn't easily be undone by changing it back?

The plan was to eventually offer MFA support. That's still the plan.

We are working on a fix

How long does a fix like this typically take to get released? A week? Three months?

Thanks again! Really happy to finally get this figured out.

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 11, 2024
@tylerjroach
Member

I do not believe there is any risk in the change.

  • Tokens that were generated while device tracking was turned on (and currently failing to refresh) would begin successfully refreshing with device tracking turned off. If device tracking were turned back on, they would begin failing again until this PR fix is ready.
  • Tokens that were generated while device tracking was turned off would continue to work even if device tracking were turned on again.
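
The two cases above, plus the effect of the fix, can be summarized as a small truth function. This is a simplified model of the behavior described in this thread, not Cognito's actual logic. `RefreshModel` and its parameter names are hypothetical.

```java
// Simplified model (an assumption, not Cognito's implementation) of when a
// token refresh succeeds, based on the cases described in this thread.
class RefreshModel {
    static boolean refreshSucceeds(boolean issuedWithDeviceTracking,
                                   boolean deviceTrackingEnabledNow,
                                   boolean deviceMetadataMigrated) {
        // A refresh is rejected only when all three conditions hold:
        // the token is bound to a device, the pool currently checks device
        // metadata on refresh, and the migrated client lacks that metadata.
        boolean deviceCheckApplies = issuedWithDeviceTracking && deviceTrackingEnabledNow;
        return !deviceCheckApplies || deviceMetadataMigrated;
    }
}
```

Under this model, `refreshSucceeds(true, true, false)` is the failing case reported here, while switching tracking off (`true, false, false`) or migrating the metadata (`true, true, true`) both make the refresh succeed.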

I don't see any adverse side effects in your case. MFA could still be enabled. Device Tracking is used as a way to bypass subsequent MFA requirements on future sign ins. Considering your app doesn't have signOut functionality, this really wouldn't matter in your case.

Once a fix is merged and ready, it will typically go in the next release. We try and release weekly if there are commits ready to go live.

@github-actions github-actions bot removed the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 12, 2024
@camhart
Author

camhart commented Dec 12, 2024

I can confirm that disabling device tracking fixed the issue for one of our customers (hopefully all of them--time will tell). I realized I had it disabled already in my dev environment--that's why my testing didn't catch the issue when I did my own 24 hour tests. Thank you for all the help! Very much appreciated.

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 12, 2024
@thisisabhash thisisabhash removed the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 13, 2024
@tylerjroach
Member

@camhart Amplify v2.25.1 has been released and contains the fix that properly migrates device metadata. It will also retroactively migrate any previously missed data. I would recommend giving your users time to upgrade to this version before turning device tracking back on.

@tylerjroach tylerjroach added the closing soon This issue will be closed in 7 days unless further comments are made. label Dec 16, 2024
@camhart
Author

camhart commented Dec 16, 2024 via email

@github-actions github-actions bot added the pending-maintainer-response Issue is pending response from an Amplify team member label Dec 16, 2024
@github-actions github-actions bot removed closing soon This issue will be closed in 7 days unless further comments are made. pending-maintainer-response Issue is pending response from an Amplify team member labels Dec 18, 2024
Contributor

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
