Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shibboleth: multiple email addresses are stored as a single invalid address separated by a semicolon #2512

Closed
bencomp opened this issue Sep 14, 2015 · 21 comments
Assignees
Milestone

Comments

@bencomp
Copy link
Contributor

bencomp commented Sep 14, 2015

When I log in via Shibboleth and my Identity Provider (IdP) provides multiple email addresses separated by semicolon, the concatenation is treated as the email address. The login process works fine, but when I create a dataverse and click "Save", the pre-filled email address doesn't pass validation.

In the database I see [email protected];[email protected] in the email field.

I expect that one of the email addresses is chosen and stored. This choice could be the result of asking the user during account creation (after the first login), or perhaps on every login the first email is compared to the known address and updated when it changed (depending on the semantics of the ordering). This appears to be similar to #1608.

@pdurbin
Copy link
Member

pdurbin commented Sep 14, 2015

The login process works fine, but when I create a dataverse and click "Save", the pre-filled email address doesn't pass validation.

This is the part that concerns me the most. The application shouldn't allow an invalid email address to be persisted. The database entity BuiltinUser has @ValidateEmail(message = "Please enter a valid email address.") but AuthenticatedUser does not. It should probably be added for consistency between BuiltIn and Shibboleth accounts.

I can confirm that I'm seeing the same issue @bencomp reported. As I was saying at http://irclog.iq.harvard.edu/dataverse/2015-09-14 the quick fix in #1608 for names was to sort the multiple givenNames and just pick the first one but I agree that letting the user make a selection would be nicer.

Here's a screenshot of the error:

new_dataverse_-_2015-09-14_10 41 04

@pdurbin
Copy link
Member

pdurbin commented Sep 14, 2015

@michbarsinai since you know AuthenticatedUser best can you please take a look at 72ca330 at let me know if there is any problem with making AuthenticatedUser consistent with BuiltinUser in terms of using the same @ValidateEmail annotation? @scolapasta if you could take a look at this too, I'd appreciate it. I'm thinking we could make this change in 4.3 since I don't want to disrupt the 4.2 release train.

If you uncomment the @ValidateEmail line I'm proposing in that commit and run curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X PUT -d TWO_EMAILS and visit http://localhost:8080/shib.xhtml per http://guides.dataverse.org/en/latest/developers/dev-environment.html#shibboleth you should be able to preview the attempt to save two email addresses into a field that shouldn't allow it:

shib_-_2015-09-14_15 25 48

@pdurbin
Copy link
Member

pdurbin commented Sep 15, 2015

I played around with some ideas in a branch at 050249a that I'd like @scolapasta (and ideally @michbarsinai and @sekmiller) to review. Details are in the commit message and javadoc.

@eaquigley and @mcrosas we should probably think about how to handle multiple email addresses and multiple names coming in from Shibboleth. I added a related comment in the 2015-04-30 Dataverse User Accounts and Auth Meeting doc. Above @bencomp has suggested building a GUI/workflow for allowing users to select their preferred email address and name when multiple are asserted by Shibboleth.

The commit above is more of a minimum viable fix. Consistent with what we do for multiple first names being asserted, in the proposed commit, I sort them and pick the first one. It sounds like @bencomp isn't especially concerned about seeing multiple email addresses in production but https://www.incommon.org/federation/attributesummary.html does indicate that we should prepare for the "mail" attribute to be multivalued.

@pdurbin pdurbin assigned scolapasta and unassigned pdurbin Sep 15, 2015
@michbarsinai
Copy link
Member

Whether AuthenticatedUser should have multiple accounts or not, is really a requirement issue - I don't see any problem with that, except for the obvious added complexity. Is there any scenario that can't be supported by a single email account?

The Shib "driver" should basically take any data coming from a Shib server and translate it into an AuthenticatedUser record, removing data if needed. It's always better to have a GUI flow, as suggested by @bencomp, but automatic resolution (as implemented by @pdurbin) also works well. The point here is that the "source of truth" about the user's email is really the Shib server, not their AuthenitcatedUserrecord. Moreover, if the email given by the Shib IdP changes on a subsequent login, the email on the AuthenticatedUser should change as well*.

  • There's a possible issue here when Shib users update their info manually. In this case, we should probably not replace the manually entered data automatically. We should, however, provide a way to restart the automatic updates. Maybe a checkbox on the user info page, labeled "Automatically update my details from my institutional directory" would work.

@pdurbin
Copy link
Member

pdurbin commented Sep 15, 2015

@michbarsinai another way to think about this: what if I remove the @ValidateEmail annotation from BuiltinUser? It would be another way to achieve consistency with AuthenticatedUser but it sure seems like a bad idea. I haven't looked but I'm pretty sure all the code currently assumes the "email" field in AuthenticatedUser only contains a single valid email. I doubt it splits on semicolon.

Also, while as of this writing (version 4.2) Dataverse will copy the email from Shibboleth directly into the AuthenticatedUser table on each login, overwriting previous values (as specified for 4.0), we want to change how this works in a future version. @mcrosas and I talked about how we'd like Shibboleth to provide the initial population of the email address, we'd like Shibboleth users change their email address (as well as other fields such as username) at will within the Dataverse app.

To support all this, we'll need some more database tables, obviously. And while we're talking about email addresses, in #2170 there's an open question: "Perhaps we wouldn't need to confirm email addresses from institutions." @bencomp or anyone running Shibboleth, if you have an opinion on that, please comment on #2170.

@michbarsinai
Copy link
Member

Note that while BuiltinUser and AuthenitcatedUser look similar, they have very different semantics. BuiltinUser is really a part of a minimal IdP bundled with Dataverse. This IdP can theoretically go away, and can effectively be overridden (by changing the signup link and removing it as an authentication option in the settings table). AuthenticatedUser is a user in a Dataverse installation, that can submit Command objects to the Dataverse engine. It is the product of a trusted IdP identifying one of its registered users.
Thus, I'm not sure we can have compatibility between the two.

As the email field in AuthenticatedUser is really a product of code, it needs to be validated by the code producing it. We can add another validation there, but I think it would be hard to handle violations looking at an invalid AuthenticatedUser object. The code creating that object should ensure the object it creates is valid.

@pdurbin
Copy link
Member

pdurbin commented Sep 15, 2015

Yes, as part of 050249a (ideas, not merged) I refactored EMailValidator so I can call it from the Shib code before it gets persisted to the AuthenticatedUser table. (It also allows to write unit tests of just the email validation logic.) In fact, if we don't get a valid email address from Shibboleth, we should tell the user "The SAML assertion contained an invalid email address" like I put in the commit (but we can make the language friendlier) and perhaps prompt them to enter a valid email address (hmm, which should be validated, since it's user-entered, per #2170).

I'm a fan of belt and suspenders. Or a backstop, if you like. Yes, let's make the Shib code validate the email address but let's also put the @ValidateEmail annotation on the email field of AuthenticatedUser. What we don't want is this kind of stuff coming out of the AuthenticatedUser table:

464dd85e-5ad0-11e5-88ce-b0cc8ad2817a png_ png_image _1024_x_737_pixels _-_2015-09-15_16 12 19

@michbarsinai
Copy link
Member

I'm fine with adding an annotation there, as long as it's the second line of defence :-)

@mercecrosas mercecrosas modified the milestone: In Review Nov 30, 2015
@scolapasta scolapasta removed their assignment Jan 27, 2016
@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@pdurbin pdurbin self-assigned this Feb 4, 2016
@pdurbin
Copy link
Member

pdurbin commented Feb 26, 2016

In the database I see [email protected];[email protected] in the email field.

I expect that one of the email addresses is chosen and stored. This choice could be the result of asking the user during account creation (after the first login), or perhaps on every login the first email is compared to the known address and updated when it changed (depending on the semantics of the ordering). This appears to be similar to #1608.

@bencomp I want to make it clear to you and others interested in this issue that for "phase 1" effort underway in #2939 I am not planning on building any sort of GUI nor add an additional database table to handle any of the workflows you've suggested. Sorry. We're trying to keep this effort small and focused. My plan is to prevent multiple email addresses from being stored as a single address by enforcing the same Bean Validation rules we use elsewhere. Also, as you've noted, I'll probably use the same technique as #1608 where I simply sort the (unexpected) multiple values and pick the first one. Why Identity Providers think they're adding value by punting the choice to the Service Provider is beyond me. I'd much rather the Identity Provider hand me a single sane value for givenName or email or whatever.

@mheppler @eaquigley if you want to start working on mockups of what some future GUI would look like to let users choose between "Eleni" and "Elleni" when givenName: Eleni;Elleni comes over the wire, that's fine. See also the 2015-04-30 Dataverse User Accounts and Auth Meeting doc and the more recent "Allow to create dataverse groups based on attributes from the identity provider..." requirement from phase 2 of the Remote Authentication Business Requirements Document (BRD) [Update: nevermind, this is about #1515 ]. If you want to go ahead and create an issue for tracking purposes please feel free. I'd link to it from this issue before it's closed.

@bencomp
Copy link
Contributor Author

bencomp commented Feb 26, 2016

@pdurbin the first point of phase 2 of the BRD is #1515, isn't it?

@pdurbin
Copy link
Member

pdurbin commented Feb 26, 2016

@bencomp bah! You're right! Sorry! Doing too many things at once. Probably best to capture the idea in a new GitHub issue then. Please feel free to create one if you're so inclined!

@bencomp
Copy link
Contributor Author

bencomp commented Feb 26, 2016

"the idea" being (in user story form) "a user with multiple email addresses can choose one", right? I'm not really inclined to open this right now :)

@pdurbin
Copy link
Member

pdurbin commented Feb 26, 2016

@bencomp yes. No worries. There's a UI component to this so @eaquigley or @mheppler would be in a better position to define the requirement and open the new issue. To be clear, this is outside the scope of phase 1. It would go under "Issues that are definitely out of scope" at #2939.

@pdurbin
Copy link
Member

pdurbin commented Mar 1, 2016

I'm working on this issue now and I just wanted to add a side note that the email address رونیکا.محمدخان@example.com is not considered valid but it comes from https://randomuser.me which I use for testing Shibboleth (documented as "RANDOM" in the Dev Guide). In practice, we don't see right-to-left email addresses like this.

@michbarsinai
Copy link
Member

Just for the record, I've never seen a Hebrew email address either.

On 1 Mar 2016, at 22:40, Philip Durbin [email protected] wrote:

I'm working on this issue now and I just wanted to add a side note that the email address رونیکا.محمدخان@example.com is not considered valid but it comes from https://randomuser.me https://randomuser.me/ which I use for testing Shibboleth (documented as "RANDOM" in the Dev Guide). In practice, we don't see right-to-left email addresses like this.


Reply to this email directly or view it on GitHub #2512 (comment).

pdurbin added a commit that referenced this issue Mar 2, 2016
- Put email addresses throught the same "find single value" logic
  originally developed in #1608 for multiple first and last names.
- Add `@ValidateEmail` to the "email" field on AuthenticatedUser to
  match BuiltinUser.
- Add null check added to EmailValidator to make it testable.
- Add `INVALID_EMAIL` and `MISSING_REQUIRED_ATTR` modes for Shib testing
  in dev.
- Remove red warning when TestShib doesn't provide "mail" attribute.
- Catch authSvc.createAuthenticatedUser exceptions and handle errors
  better.
- Reformat code (getPrettyFacesHomePageString seems ok).
@pdurbin
Copy link
Member

pdurbin commented Mar 2, 2016

In ffd1d6e I implemented a fix:

  • Put email addresses throught the same "find single value" logic
    originally developed in Shibboleth: handle Identity Providers that provide multiple first or last names separated by semicolons in indeterminate order #1608 for multiple first and last names.
  • Add @ValidateEmail to the "email" field on AuthenticatedUser to
    match BuiltinUser.
  • Add null check added to EmailValidator to make it testable.
  • Add INVALID_EMAIL and MISSING_REQUIRED_ATTR modes for Shib testing
    in dev.
  • Remove red warning when TestShib doesn't provide "mail" attribute.
  • Catch authSvc.createAuthenticatedUser exceptions and handle errors
    better.
  • Reformat code (getPrettyFacesHomePageString seems ok).

To test this, curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X PUT -d TWO_EMAILS and then look for the following in server.log:

"Multiple email addresses were asserted by the Identity Provider ([email protected];[email protected] ). These were sorted and the first was chosen: [email protected]"

This is very much a dev mode thing (documented at http://guides.dataverse.org/en/4.2.4/developers/dev-environment.html#shibboleth ) so servers should be set to normal with curl -X DELETE http://localhost:8080/api/admin/settings/:DebugShibAccountType after testing.

I believe this issue is ready for QA. I showed it to @eaquigley (we decided to not show a message that the system is picking an email address for you from multiple values).

@pdurbin
Copy link
Member

pdurbin commented Mar 3, 2016

miché[email protected] is another example of an email address from https://randomuser.me that fails our bean validation:

screen shot 2016-03-03 at 9 50 25 am

I'm not sure if we should be worried about this or not.

@pdurbin
Copy link
Member

pdurbin commented Mar 4, 2016

I mentioned this email validation issue at yesterday's standup and we agreed to create a separate issue for it, which I just did: #2998

@pdurbin pdurbin changed the title Shibboleth: multiple email addresses are stored as a single one Shibboleth: multiple email addresses are stored as a single invalid address separated by a semicolon Mar 18, 2016
@pdurbin
Copy link
Member

pdurbin commented Mar 18, 2016

I'm passing this issue to QA but it's going to be tricky to test. In dev I use "localhost" per http://guides.dataverse.org/en/4.2.4/developers/dev-environment.html#shibboleth but on a real server running Shibboleth you get shibsp::ConfigurationException if you put Dataverse in dev mode using the :DebugShibAccountType "TWO_EMAILS" setting and try to visit https://shibtest.dataverse.org/shib.xhtml . I got myself all turned around and confused on this the other day at http://irclog.perlgeek.de/shibboleth/2016-03-15

Since @bencomp originally reported this issue I reached out to him at http://irclog.iq.harvard.edu/dataverse/2016-03-18#i_32926 to see if he can grab the war file at https://build.hmdc.harvard.edu:8443/job/shibtest.dataverse.org-build-2939-shib/1/edu.harvard.iq$dataverse/artifact/edu.harvard.iq/dataverse/4.3/dataverse-4.3.war and try it out (you need to upgrade to 4.3 first: https://github.com/IQSS/dataverse/releases/tag/v4.3 ). Of course anyone who is reading this is welcome to test out that war file as well!

It would be nice if we controlled a Shibboleth Identity Provider (IdP) so we could more easily do the QA ourselves but I don't know how to set one up. :(

The bottom line for this issue is that we are using the same logic (and code path) as the fix for #1608 where we sort the multiple values received and pick the first one. The fix is in pull request #3025.

@pdurbin pdurbin assigned kcondon and unassigned pdurbin Mar 18, 2016
@pdurbin
Copy link
Member

pdurbin commented Mar 30, 2016

@kcondon as I mentioned one way to test this is to put the server in "dev mode" by commenting out the the Apache Shib stuff like this:

#<Location /shib.xhtml>
#  AuthType shibboleth
#  ShibRequestSetting requireSession 1
#  require valid-user
#  ShibUseHeaders On
#</Location>

Then, to test this particular bug:

curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X PUT -d TWO_EMAILS

When you're done:

curl http://localhost:8080/api/admin/settings/:DebugShibAccountType -X DELETE

And put Apache back, of course.

@kcondon
Copy link
Contributor

kcondon commented Mar 30, 2016

Tested by stubbing out a IdP response and it correctly chose the first email of a sorted list and logged that fact. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants