Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend running as non-root user cannot kill child processes after timeout #5536

Closed
dpward opened this issue Mar 15, 2021 · 12 comments
Closed
Assignees
Labels
Closed: Fixed Issue was closed as fixed. Work in Progress

Comments

@dpward
Copy link
Contributor

dpward commented Mar 15, 2021

Each time the backend creates a krb5_child process, it schedules a timeout event. If the child process is still running when this timeout expires, the backend will send SIGKILL to it.

However when the backend is running as a non-root user, it does not have permission to do that:

[be[DOMAIN]] [krb5_child_timeout] (0x0040): Timeout for child [55046] reached. In case KDC is distant or network is slow you may consider increasing value of krb5_auth_timeout.
[be[DOMAIN]] [krb5_child_timeout] (0x0020): kill failed [1][Operation not permitted].
[be[DOMAIN]] [krb5_auth_done] (0x0020): child timed out!

The reason is described in the kill(2) man page.

  For a process to have permission to send a signal, it must either
  be privileged (under Linux: have the CAP_KILL capability in the
  user namespace of the target process), or the real or effective
  user ID of the sending process must equal the real or saved set-
  user-ID of the target process.

This condition is not being met. The backend does not have CAP_KILL when it is running as a non-root user. krb5_child changes both its real user ID and saved set-user-ID when it runs (although there is special handling for PKINIT).

Afterwards, the backend will proceed to launch a second krb5_child process, which actually will execute in parallel with the first one. Notice the interleaved log messages between two different PIDs here, which are both trying to accomplish the same thing:

[krb5_child[55075]] [sss_child_krb5_trace_cb] (0x4000): [55075] 1615784955.833998: Sending request (302 bytes) to DOMAIN
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814917: Initiating TCP connection to stream ADDRESS:PORT
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814918: Sending TCP request to stream ADDRESS:PORT
[krb5_child[55075]] [sss_child_krb5_trace_cb] (0x4000): [55075] 1615784955.833999: Initiating TCP connection to stream ADDRESS:PORT
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814919: Received answer (2094 bytes) from stream ADDRESS:PORT
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814920: Terminating TCP connection to stream ADDRESS:PORT
[krb5_child[55075]] [sss_child_krb5_trace_cb] (0x4000): [55075] 1615784955.834000: Sending TCP request to stream ADDRESS:PORT

I'm not sure which approach is preferable for fixing this.

@dpward
Copy link
Contributor Author

dpward commented Apr 13, 2021

@sumit-bose @simo5 Thoughts on this?

In this situation we have sssd_be running as the sssd user account, not as root. What if we allowed krb5_child to run with sssd as the saved set-user-ID? It would be possible for the process to change its real and effective user ID to sssd — but that is different from it being able to change to root.

@alexey-tikhonov
Copy link
Member

alexey-tikhonov commented Apr 13, 2021

In this situation we have sssd_be running as the sssd user account, not as root.

Is this the only issue you face when trying to run sssd as non-root user?

Unfortunately, this feature isn't "production ready" in general.

For example, see https://lists.fedorahosted.org/archives/list/[email protected]/thread/CRMNX6NYDKBXY2ACKARU4LMOJRTNZ265/

@dpward
Copy link
Contributor Author

dpward commented Apr 13, 2021

Actually, yes, this has generally been working (RHEL 8). I have run it this way for a month or so.

This is the one issue I have encountered so far that is related to running SSSD as a non-root user, although I may be missing others. I see the reports you linked to; but being able to run SSSD in a container with no root account seems like a very specialized use case. There was a request to automate SSSD configuration for running as a non-root user, but these configuration changes can be applied manually right now.

This particular issue is different. The code here needs adjusting somehow in order to handle running as a non-root user, in what is otherwise a very typical deployment. But there is more than one approach that would resolve this, and it's not obvious which one might be preferred. It would also be possible to add CAP_KILL to the sssd_be process instead of what I described earlier; but my assumption is that it would be preferable not to grant the process that capability.

@sumit-bose
Copy link
Contributor

Hi,

I think there are two items consider. First, if you are using FAST or Kerberos ticket validation krb5_child must be able to read a keytab, typically /etc/krb5.keytab. So in this case you either have to make sure that either /etc/krb5.keytab can be read by the SSSD user or use a dedicated keytab file readable by the SSSD user. Reading the keytab at runtime when running as SSSD user is an issue at other places as well, so we have to find implement some better solution here, using gssproxy might be a way to solve this.

Second, krb5_child will try to become the user trying to log in to make sure the credential cache is written with the right ownership and permissions. If I understand it correctly having the set-UID as SSSD user should be sufficient for krb5_child to receive the signal in this case, but it should be tested as well.

HTH

bye,
Sumit

@simo5
Copy link
Contributor

simo5 commented Apr 14, 2021

In theory we could transfer the krbtgt received by krb5_child back to pam_sss via the pam conversation back and let pam_sss it write it out as the user, but there are several chicken/egg issues that may arise from such a change in workflow as well.

@simo5
Copy link
Contributor

simo5 commented Apr 14, 2021

Access to the keytab is the real blocking issue.

@dpward
Copy link
Contributor Author

dpward commented Apr 14, 2021

Access to the keytab is the real blocking issue.

Would this be a reasonable solution?

%pre common
setfacl -m u:sssd:r /etc/krb5.keytab

%postun common
setfacl -x u:sssd /etc/krb5.keytab

I don't know if that is oversimplifying things, or if there would be benefits to using gssproxy instead.

@simo5
Copy link
Contributor

simo5 commented Apr 14, 2021

No, for a few reasons.

First of all at install time there may be no keytab, often systems are joined to a domain after sssd has already been installed.

Secondarily if the system (or the admin) makes a change to the keytab (like obtaining a new one) and doesn't carefully preserve that permission, you suddenly get users locked out, unable to login, and figuring out why will be painful.

Additionally, of course, giving access to sssd to the keytab directly gives another avenue to the sssd user to circle around and elevate privileges by forging a ticket and logging into the system. But that is already the case as sssd can forge a password check anyway, so perhaps not hugely important.

Perhaps having a systemd preexec script run by the sssd unit that fixes those permissions could help, but it would still require a reboot, or at least a restart of the unit to fix a system where the keytab file has lost that ACL.

@alexey-tikhonov alexey-tikhonov self-assigned this Aug 8, 2023
@alexey-tikhonov alexey-tikhonov removed their assignment Aug 8, 2023
@alexey-tikhonov
Copy link
Member

In this situation we have sssd_be running as the sssd user account, not as root. What if we allowed krb5_child to run with sssd as the saved set-user-ID?

This works.

Besides, when 'sssd_be' run under 'sssd' starts suid binary 'krb5_child', "Real ID" is still set to 'sssd'. This is also enough to allow 'sssd_be' to signal 'krb5_child'.

But:

  • this additional logic for PKINIT complicates things
  • I guess there are might be similar issues with all other suid childs

It would be good to have a uniform solution.

Among other things, we could replace 'suid' bit with file capability 'cap_setuid,cap_setgid+ep' and probably it's possible to lock process within this set from the very beginning using SECBIT_NOROOT etc, so even setresuid(0, 0, 0) wouldn't grant all caps.

@alexey-tikhonov
Copy link
Member

alexey-tikhonov commented Jan 9, 2024

Taking a closer look, I guess there are two points where this issue can be observed:

(1) FAST+KEYTAB path, when krb5_child forked child process to get fast_ccache and awaits its completion:
https://github.com/SSSD/sssd/blob/sssd-2-9/src/providers/krb5/krb5_child.c#L3306
-- IIUC, at this point krb5_child still runs under 0:0
This probably can be fixed via switching creds to fast_uid:fast_uid earlier (IIUC, it should be done later anyway).

(2) No-PKINIT path, when krb5_child switches to user_uid:user_gid right after reading keytab into memory.
This can be fixed by keeping save-uid==fast_uid.
The caveat here is that this allows possibility to set e-uid back to fast_uid, that might be 0 (root) if SSSD in general is configured to run under root.
But that's not worse than PKINIT path and if somebody wants hardening they need to run SSSD under SSSD_USER anyway.

@alexey-tikhonov
Copy link
Member

(1) FAST+KEYTAB path, when krb5_child forked child process to get fast_ccache and awaits its completion:
https://github.com/SSSD/sssd/blob/sssd-2-9/src/providers/krb5/krb5_child.c#L3306
-- IIUC, at this point krb5_child still runs under 0:0

But real-uid should be SSSD_USER at this point, so this actually leaves only (2)

alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Jan 30, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Jan 31, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Jan 31, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 1, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 1, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 9, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 12, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 12, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 13, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 14, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 19, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 19, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 21, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 26, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Feb 29, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Mar 7, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Mar 13, 2024
There are some known issues like SSSD#5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).
alexey-tikhonov added a commit that referenced this issue Mar 18, 2024
There are some known issues like #5536 but those have to be
solved differently. Having 'CAP_KILL' in sssd.service doesn't
help anyway (and currently isn't used anyhow).

Reviewed-by: Justin Stephenson <[email protected]>
Reviewed-by: Pavel Březina <[email protected]>
Reviewed-by: Sumit Bose <[email protected]>
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Apr 10, 2024
Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be'
running under SSSD_USER could signal it.

Resolves: SSSD#5536
@alexey-tikhonov alexey-tikhonov self-assigned this Apr 10, 2024
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Apr 10, 2024
Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be'
running under SSSD_USER could signal it.

Resolves: SSSD#5536
alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Apr 30, 2024
Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be'
running under SSSD_USER could signal it.

Resolves: SSSD#5536
@alexey-tikhonov
Copy link
Member

Pushed PR: #7285

  • master
    • 2891e74 - KRB5_CHILD: keep 'set-user-ID' in k5c_become_user()
    • b3a487a - LDAP_CHILD: replace become_user() with sss_drop_all_caps()

@alexey-tikhonov alexey-tikhonov added the Closed: Fixed Issue was closed as fixed. label Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closed: Fixed Issue was closed as fixed. Work in Progress
Projects
None yet
Development

No branches or pull requests

4 participants