Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add scope user_id to JWT payload #33455

Merged
merged 14 commits into from
Oct 30, 2023
Merged

feat: Add scope user_id to JWT payload #33455

merged 14 commits into from
Oct 30, 2023

Conversation

moeez96
Copy link
Contributor

@moeez96 moeez96 commented Oct 10, 2023

Description

Ecommerce system uses the JWT payload attribute user_id to populate the field lms_user_id in the Ecommerce User table.
The JWT payload is built on edx-platform and sent to the Ecommerce system. Currently the attribute user_id is missing from this payload. This PR adds the missing attribute user_id to the JWT payload.

Supporting information

Jira ticket: https://2u-internal.atlassian.net/browse/LEARNER-9640
Related ADR: https://github.com/openedx/ecommerce/blob/master/docs/decisions/0004-unique-identifier-for-users.rst
Related Ecommerce system attribute mapping: https://github.com/openedx/ecommerce/blob/f0e196fe4371f56a14969a1fb0d2fed79c39630a/ecommerce/settings/base.py#L654

Testing instructions

  1. Login a user via JWT
  2. Use the JWT token to call an Ecommerce API for e.g. {{ecommerce_domain}}/api/iap/v1/basket/add/?sku={{sku}} GET
  3. Place a debugger in the edx-drf-extensions in the ecommerce system in https://github.com/openedx/edx-drf-extensions/blob/e700fbffedde78963ae1e46d4267d10787dd9d0e/edx_rest_framework_extensions/auth/jwt/authentication.py#L129 and ensure the payload has the user_id included in it.

Deadline

None

@moeez96 moeez96 changed the title [WIP] feat: Add user_id to JWT payload feat: Add user_id to JWT payload Oct 11, 2023
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should be closed. Instead, you’ll need to add the user_id scope for the oauth client in Django admin.

@moeez96
Copy link
Contributor Author

moeez96 commented Oct 16, 2023

Thanks for the review @robrap .
The API involved (in this context) for the creation of a JWT token is {{lms_domain}}/oauth2/access_token/

response = super().dispatch(request, *args, **kwargs)

This view does not add further scopes in the token but sets the default scopes provided here:
OAUTH2_DEFAULT_SCOPES = {

I believe by oauth client in Django admin. you mean the ApplicationAccess model in the edx_platform where we can specify scopes for Applications/Clients. I believe in the context of this API, this config will only allow us to filter the already added scopes and not insert new scopes in the payload.

This results in all live users using the {{lms_domain}}/oauth2/access_token/ url to create JWT tokens being unrecognizable by the ecommerce system.

Please let me know if I missed something, Thanks again!

@robrap
Copy link
Contributor

robrap commented Oct 16, 2023

@moeez96: Thank you for helping me understand the situation better. When external orgs request a token on behalf of one of their users, we explicitly were avoiding sending our user_id with the PII. That said, I couldn’t find the ADR or discussion around all of this, and it seems the design didn’t account for your current use case. This will require some research to ensure that the user id is only provided to the actual user who has the user id (or who could easily access the user id by other means).

@robrap
Copy link
Contributor

robrap commented Oct 16, 2023

  • Note that JWT cookies defaults to always add the user_id scope.
  • The current default scopes in the code you are trying to change intentionally doesn't include the user_id scope.
  • In the caller, we'll need to determine under what conditions it is ok to add user_id to the scope, unless we determine we wish to drop the scope altogether and replace it with your change, but there are security implications to that possibility that need to be discussed separately, as I attempted to discuss above with the external org example.

@moeez96
Copy link
Contributor Author

moeez96 commented Oct 17, 2023

  • I agree. Perhaps we could check if the grant_type equals password. If True, add user_id to the scope if not already present while creating the JWT.
  • I believe external orgs request tokens on behalf of their users using the client_credentials grant_type. And only the user will be aware of the password who can extract the user_id by other means.
  • Do you think this approach is feasible? Are there any other cases we should be mindful of?

@robrap
Copy link
Contributor

robrap commented Oct 17, 2023

I agree. Perhaps we could check if the grant_type equals password. If True, add user_id to the scope if not already present while creating the JWT.

I wish I could be more clear on where we should not be adding user_id, because maybe there would be a simpler solution for this overall. That said, I think your solution sounds great. With password grant type, add user_id scope as needed.

This would be a great time to capture an ADR regarding the past decision (as best as we can) and this updated decision as well. I am happy to help with that. It doesn't need to be perfect, but anything would be better than the current situation where none of this is documented.

@moeez96 moeez96 force-pushed the moeez96/LEARNER-9640 branch from 644f291 to ebbe087 Compare October 19, 2023 11:38
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I made a bunch of proposals for the ADR. Let me know what you think.

Comment on lines 174 to 175
# The scope `user_id` must be added for requests with grant_type password.
scopes = _update_user_id_in_scopes(scopes or ['email', 'profile'], grant_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking more closely add the code, I think we should follow the same pattern of either using the explicitly defined scopes, or using the default scopes. Something like the following:

    # Default scopes should only contain non-privileged data.
    # Do not be misled by the fact that `email` and `profile` are default scopes. They
    # were included for legacy compatibility, even though they contain privileged data.
    if grant_type == Application.GRANT_PASSWORD:
        default_scopes = ['user_id', 'email', 'profile']
    else
        default_scopes = ['email', 'profile']
    scopes = scopes or default_scopes

All of this could be packaged into a private method if you wish, but it is more consistent (i.e. won't override explicitly set scopes) and keeps all the scopes code together. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JWT consumes scopes from the access_token payload. By default, scopes will almost never be empty since the default scopes are always added to the access_token (in case of missing explicitly defined scopes).

OAUTH2_DEFAULT_SCOPES = {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. [Nit] Thanks for the clarification. This is just an opinion, but I'd prefer seeing _update_user_id_in_scopes renamed to _get_updated_scopes (or some name like that), and packaging all the scope related code (default + user_id) in one place. If you feel strongly otherwise, no need to update.
  2. I can't think of a clean and simply way to create a new appropriate setting related to user_id, but wondering if we could at least add a comment before
    'user_id': _('Know your user identifier'),
    that reads something like:
# user_id is added in code as a default scope for JWT cookies and all password grant_type JWTs

@moeez96 moeez96 requested a review from robrap October 20, 2023 12:21
Comment on lines 177 to 178
expected_scopes = data['scope'].split(' ')
self._update_expected_scopes_with_user_id(expected_scopes, grant_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need help understanding how this isn't just covering up a problem. Shouldn't data which comes from the response contain all the necessary scopes? Something feels fishy about this fix, so maybe you can help explain it a bit better. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just the expected scopes.
The scopes from the response are actually stored in the body of the decoded access token itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bug:

  • The scopes in the response used to match the scopes in the JWT, but now they don't.
  • I think the fix would be that this comment should be removed, and that we now need to set the scopes in the response to the scopes in the JWT.

Let me know your thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be a misunderstanding here:

  • In the tests, Actual scopes are derived by decoding the token here.
  • What we are editing in the tests are the expected scopes, since we added the logic to add the user_id scope to grant_type password requests.
  • In the tests, we are not manipulating the scopes returned in the response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call the API from the command line using a password grant type, is the user_id in the scopes in the dict of the response? This dict should be in sync with the scopes in the JWT, and I’m guessing it’s not, but I could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point now, the user_id scope is not added in the response to the API. Updated.

Copy link
Contributor

@robrap robrap Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPDATED: I'm glad we are on the same page about the API response scopes matching the JWT scopes, but I'm still confused about all the test changes. Shouldn't the test just show that these match, and if they don't match, why don't they?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out, reverted related tests changes.

Comment on lines 174 to 175
# The scope `user_id` must be added for requests with grant_type password.
scopes = _update_user_id_in_scopes(scopes or ['email', 'profile'], grant_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. [Nit] Thanks for the clarification. This is just an opinion, but I'd prefer seeing _update_user_id_in_scopes renamed to _get_updated_scopes (or some name like that), and packaging all the scope related code (default + user_id) in one place. If you feel strongly otherwise, no need to update.
  2. I can't think of a clean and simply way to create a new appropriate setting related to user_id, but wondering if we could at least add a comment before
    'user_id': _('Know your user identifier'),
    that reads something like:
# user_id is added in code as a default scope for JWT cookies and all password grant_type JWTs

@robrap robrap added the waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc. label Oct 20, 2023
@robrap
Copy link
Contributor

robrap commented Oct 20, 2023

@moeez96: Additionally, could we follow up this PR with a small edit to the Authentication ADR to mention/link to this new ADR from some appropriate place?

@moeez96
Copy link
Contributor Author

moeez96 commented Oct 23, 2023

could we follow up this PR with a small edit to the Authentication ADR

@robrap Can you confirm which Authentication ADR you are referring to here?

@robrap
Copy link
Contributor

robrap commented Oct 23, 2023

@robrap Can you confirm which Authentication ADR you are referring to here?

The newly added ADR in this PR.

@moeez96 moeez96 changed the title feat: Add user_id to JWT payload feat: Add scope user_id to JWT payload Oct 24, 2023
@moeez96
Copy link
Contributor Author

moeez96 commented Oct 24, 2023

I understand we aded a new ADR in this PR. Where do you suggest we mention/link this ADR to? In other words, which file are you suggesting a change in?

@robrap
Copy link
Contributor

robrap commented Oct 24, 2023

I understand we aded a new ADR in this PR. Where do you suggest we mention/link this ADR to? In other words, which file are you suggesting a change in?

Potentially as a sub-bullet of the JWT Authentication bullet here? https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0042-bp-authentication.html#oauth2-and-jwts. Note that this is not a blocker for this PR, but just an idea to make it easier to learn about this.

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about all the comments, but this code is brittle and needs to be secure, so I just want to ensure it works as expected. There is still something about the test code that doesn't add up for me.

@@ -79,10 +80,11 @@ def create_jwt_token_dict(token_dict, oauth_adapter, use_asymmetric_key=None):
# .. custom_attribute_name: create_jwt_grant_type
# .. custom_attribute_description: The grant type of the newly created JWT.
set_custom_attribute('create_jwt_grant_type', grant_type)
scopes = token_dict['scope'].split(' ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this should work similarly to jwt_expires_in, where it is processed once here and used twice below, in the JWT and the response dict.
Note: You may want to leave the old scopes = scopes or ['email', 'profile'] as it was in _create_jwt just in case, but otherwise, this will make it more clear that these two values should be matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I’d do this in one statement, but it is just a personal preference, so you can decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Altered

Comment on lines 95 to -94
jwt_token_dict = token_dict.copy()
# Note: only "scope" is not overwritten at this point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is true that we are now overwriting all response values, we no longer need to copy the dict. We could just have:

jwt_token_dict = {
    ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refresh_token is still not overwritten in token_dict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional thoughts:

  1. Can you restore the comment, and mention the refresh_token?
  2. I’m wondering if it would be best if the refresh_token had the extra scope as well, and if we didn’t need to add it here, because it would have been added earlier?
  3. If we called the api with the three password grant scopes explicitly (as a test from the command line), does it work or complain about the user_id request? If it complains, does that indicate where the scope update should happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Comment restored.
  2. The scope is only added in the JWT, it is not added in the access token or refresh token since it's not required there in this use case.
  3. If we call the API with payload having scopes set to grant types email, profile and user_id, it works in the same fashion and does not error out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #2 would be more accurate/consistent, because the refresh token would match the actual scopes.
That said, I agree that this shouldn't be a blocker, so I added a potential code comment just to capture what is going on.

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor improvements noted. Feel free to merge once they are complete. Thank you!

@@ -91,11 +93,12 @@ def create_jwt_token_dict(token_dict, oauth_adapter, use_asymmetric_key=None):
)

jwt_token_dict = token_dict.copy()
# Note: only "scope" is not overwritten at this point.
# Note: only "refresh_token" is not overwritten at this point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a possible code comment trail if we decide to ignore #2 from this comment: https://github.com/openedx/edx-platform/pull/33455/files#r1372861173

Suggested change
# Note: only "refresh_token" is not overwritten at this point.
# Note: only "refresh_token" is not overwritten at this point.
# At this time, the user_id scope added for grant type password is only added to the
# JWT, and is not added for the DOT access token or refresh token, so we must override
# here. If this inconsistency becomes an issue, then the user_id scope should be
# added earlier with the DOT tokens, and we would no longer need to override "scope".

Comment on lines 95 to -94
jwt_token_dict = token_dict.copy()
# Note: only "scope" is not overwritten at this point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #2 would be more accurate/consistent, because the refresh token would match the actual scopes.
That said, I agree that this shouldn't be a blocker, so I added a potential code comment just to capture what is going on.

@@ -325,7 +325,7 @@ def test_jwt_access_token_scopes_and_filters(self, grant_type):
self.assert_valid_jwt_access_token(
data['access_token'],
self.user,
scopes,
data['scope'].split(' '),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want an additional assertion against the original scopes, to ensure that all 3 match:

  1. The scopes sent to the request.
  2. The scopes in the response.
  3. The scopes in the JWT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scopes in the response and the scopes in the JWT will match.
The scopes sent to the request will be a subset of the scopes in the response in this case, since we add user_id in the response scopes on purpose in case of grant_type=password.
Assertion added.

@moeez96 moeez96 merged commit ba1f382 into master Oct 30, 2023
63 checks passed
@moeez96 moeez96 deleted the moeez96/LEARNER-9640 branch October 30, 2023 02:57
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

1 similar comment
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting on author PR author needs to resolve review requests, answer questions, fix tests, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants