Backend running as non-root user cannot kill child processes after timeout #5536

dpward · 2021-03-15T06:55:44Z

Each time the backend creates a krb5_child process, it schedules a timeout event. If the child process is still running when this timeout expires, the backend will send SIGKILL to it.

However when the backend is running as a non-root user, it does not have permission to do that:

[be[DOMAIN]] [krb5_child_timeout] (0x0040): Timeout for child [55046] reached. In case KDC is distant or network is slow you may consider increasing value of krb5_auth_timeout.
[be[DOMAIN]] [krb5_child_timeout] (0x0020): kill failed [1][Operation not permitted].
[be[DOMAIN]] [krb5_auth_done] (0x0020): child timed out!

The reason is described in the kill(2) man page.

  For a process to have permission to send a signal, it must either
  be privileged (under Linux: have the CAP_KILL capability in the
  user namespace of the target process), or the real or effective
  user ID of the sending process must equal the real or saved set-
  user-ID of the target process.

This condition is not being met. The backend does not have CAP_KILL when it is running as a non-root user. krb5_child changes both its real user ID and saved set-user-ID when it runs (although there is special handling for PKINIT).

Afterwards, the backend will proceed to launch a second krb5_child process, which actually will execute in parallel with the first one. Notice the interleaved log messages between two different PIDs here, which are both trying to accomplish the same thing:

[krb5_child[55075]] [sss_child_krb5_trace_cb] (0x4000): [55075] 1615784955.833998: Sending request (302 bytes) to DOMAIN
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814917: Initiating TCP connection to stream ADDRESS:PORT
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814918: Sending TCP request to stream ADDRESS:PORT
[krb5_child[55075]] [sss_child_krb5_trace_cb] (0x4000): [55075] 1615784955.833999: Initiating TCP connection to stream ADDRESS:PORT
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814919: Received answer (2094 bytes) from stream ADDRESS:PORT
[krb5_child[55046]] [sss_child_krb5_trace_cb] (0x4000): [55046] 1615784955.814920: Terminating TCP connection to stream ADDRESS:PORT
[krb5_child[55075]] [sss_child_krb5_trace_cb] (0x4000): [55075] 1615784955.834000: Sending TCP request to stream ADDRESS:PORT

I'm not sure which approach is preferable for fixing this.

The text was updated successfully, but these errors were encountered:

dpward · 2021-04-13T17:55:10Z

@sumit-bose @simo5 Thoughts on this?

In this situation we have sssd_be running as the sssd user account, not as root. What if we allowed krb5_child to run with sssd as the saved set-user-ID? It would be possible for the process to change its real and effective user ID to sssd — but that is different from it being able to change to root.

alexey-tikhonov · 2021-04-13T18:07:09Z

In this situation we have sssd_be running as the sssd user account, not as root.

Is this the only issue you face when trying to run sssd as non-root user?

Unfortunately, this feature isn't "production ready" in general.

For example, see https://lists.fedorahosted.org/archives/list/[email protected]/thread/CRMNX6NYDKBXY2ACKARU4LMOJRTNZ265/

dpward · 2021-04-13T18:39:49Z

Actually, yes, this has generally been working (RHEL 8). I have run it this way for a month or so.

This is the one issue I have encountered so far that is related to running SSSD as a non-root user, although I may be missing others. I see the reports you linked to; but being able to run SSSD in a container with no root account seems like a very specialized use case. There was a request to automate SSSD configuration for running as a non-root user, but these configuration changes can be applied manually right now.

This particular issue is different. The code here needs adjusting somehow in order to handle running as a non-root user, in what is otherwise a very typical deployment. But there is more than one approach that would resolve this, and it's not obvious which one might be preferred. It would also be possible to add CAP_KILL to the sssd_be process instead of what I described earlier; but my assumption is that it would be preferable not to grant the process that capability.

sumit-bose · 2021-04-14T06:45:23Z

Hi,

I think there are two items consider. First, if you are using FAST or Kerberos ticket validation krb5_child must be able to read a keytab, typically /etc/krb5.keytab. So in this case you either have to make sure that either /etc/krb5.keytab can be read by the SSSD user or use a dedicated keytab file readable by the SSSD user. Reading the keytab at runtime when running as SSSD user is an issue at other places as well, so we have to find implement some better solution here, using gssproxy might be a way to solve this.

Second, krb5_child will try to become the user trying to log in to make sure the credential cache is written with the right ownership and permissions. If I understand it correctly having the set-UID as SSSD user should be sufficient for krb5_child to receive the signal in this case, but it should be tested as well.

HTH

bye,
Sumit

simo5 · 2021-04-14T17:05:29Z

In theory we could transfer the krbtgt received by krb5_child back to pam_sss via the pam conversation back and let pam_sss it write it out as the user, but there are several chicken/egg issues that may arise from such a change in workflow as well.

simo5 · 2021-04-14T17:05:54Z

Access to the keytab is the real blocking issue.

dpward · 2021-04-14T18:31:15Z

Access to the keytab is the real blocking issue.

Would this be a reasonable solution?

%pre common
setfacl -m u:sssd:r /etc/krb5.keytab

%postun common
setfacl -x u:sssd /etc/krb5.keytab

I don't know if that is oversimplifying things, or if there would be benefits to using gssproxy instead.

simo5 · 2021-04-14T19:39:14Z

No, for a few reasons.

First of all at install time there may be no keytab, often systems are joined to a domain after sssd has already been installed.

Secondarily if the system (or the admin) makes a change to the keytab (like obtaining a new one) and doesn't carefully preserve that permission, you suddenly get users locked out, unable to login, and figuring out why will be painful.

Additionally, of course, giving access to sssd to the keytab directly gives another avenue to the sssd user to circle around and elevate privileges by forging a ticket and logging into the system. But that is already the case as sssd can forge a password check anyway, so perhaps not hugely important.

Perhaps having a systemd preexec script run by the sssd unit that fixes those permissions could help, but it would still require a reboot, or at least a restart of the unit to fix a system where the keytab file has lost that ACL.

alexey-tikhonov · 2023-08-29T12:55:43Z

In this situation we have sssd_be running as the sssd user account, not as root. What if we allowed krb5_child to run with sssd as the saved set-user-ID?

This works.

Besides, when 'sssd_be' run under 'sssd' starts suid binary 'krb5_child', "Real ID" is still set to 'sssd'. This is also enough to allow 'sssd_be' to signal 'krb5_child'.

But:

this additional logic for PKINIT complicates things
I guess there are might be similar issues with all other suid childs

It would be good to have a uniform solution.

Among other things, we could replace 'suid' bit with file capability 'cap_setuid,cap_setgid+ep' and probably it's possible to lock process within this set from the very beginning using SECBIT_NOROOT etc, so even setresuid(0, 0, 0) wouldn't grant all caps.

alexey-tikhonov · 2024-01-09T18:25:46Z

Taking a closer look, I guess there are two points where this issue can be observed:

(1) FAST+KEYTAB path, when krb5_child forked child process to get fast_ccache and awaits its completion:
https://github.com/SSSD/sssd/blob/sssd-2-9/src/providers/krb5/krb5_child.c#L3306
-- IIUC, at this point krb5_child still runs under 0:0
This probably can be fixed via switching creds to fast_uid:fast_uid earlier (IIUC, it should be done later anyway).

(2) No-PKINIT path, when krb5_child switches to user_uid:user_gid right after reading keytab into memory.
This can be fixed by keeping save-uid==fast_uid.
The caveat here is that this allows possibility to set e-uid back to fast_uid, that might be 0 (root) if SSSD in general is configured to run under root.
But that's not worse than PKINIT path and if somebody wants hardening they need to run SSSD under SSSD_USER anyway.

alexey-tikhonov · 2024-01-10T08:56:51Z

(1) FAST+KEYTAB path, when krb5_child forked child process to get fast_ccache and awaits its completion:
https://github.com/SSSD/sssd/blob/sssd-2-9/src/providers/krb5/krb5_child.c#L3306
-- IIUC, at this point krb5_child still runs under 0:0

But real-uid should be SSSD_USER at this point, so this actually leaves only (2)

There are some known issues like SSSD#5536 but those have to be solved differently. Having 'CAP_KILL' in sssd.service doesn't help anyway (and currently isn't used anyhow).

There are some known issues like #5536 but those have to be solved differently. Having 'CAP_KILL' in sssd.service doesn't help anyway (and currently isn't used anyhow). Reviewed-by: Justin Stephenson <[email protected]> Reviewed-by: Pavel Březina <[email protected]> Reviewed-by: Sumit Bose <[email protected]>

Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be' running under SSSD_USER could signal it. Resolves: SSSD#5536

alexey-tikhonov · 2024-04-30T14:05:57Z

Pushed PR: #7285

master
- 2891e74 - KRB5_CHILD: keep 'set-user-ID' in k5c_become_user()
- b3a487a - LDAP_CHILD: replace become_user() with sss_drop_all_caps()

mtorromeo mentioned this issue Mar 31, 2021

Fix python headers detection with recent autoconf (Fixes #5336) #5495

Closed

alexey-tikhonov self-assigned this Aug 8, 2023

alexey-tikhonov added the Future work label Aug 8, 2023

alexey-tikhonov removed their assignment Aug 8, 2023

alexey-tikhonov mentioned this issue Sep 5, 2023

IPA backend and DBus system bus #5888

Closed

alexey-tikhonov mentioned this issue Feb 15, 2024

Improve ability to run under non-root user; revoke unneeded capabilities; run under 'sssd' starting F41 by default. #7193

Closed

alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Apr 10, 2024

KRB5_CHILD: keep 'set-user-ID' in k5c_become_user()

64c64a2

Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be' running under SSSD_USER could signal it. Resolves: SSSD#5536

alexey-tikhonov self-assigned this Apr 10, 2024

alexey-tikhonov added Work in Progress and removed Future work labels Apr 10, 2024

alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Apr 10, 2024

KRB5_CHILD: keep 'set-user-ID' in k5c_become_user()

f55460b

Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be' running under SSSD_USER could signal it. Resolves: SSSD#5536

alexey-tikhonov added a commit to alexey-tikhonov/sssd that referenced this issue Apr 30, 2024

KRB5_CHILD: keep 'set-user-ID' in k5c_become_user()

7bfe4f5

Keep saved set-user-ID in `k5c_become_user()` so that 'sssd_be' running under SSSD_USER could signal it. Resolves: SSSD#5536

alexey-tikhonov closed this as completed in 2891e74 Apr 30, 2024

alexey-tikhonov added the Closed: Fixed Issue was closed as fixed. label Apr 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backend running as non-root user cannot kill child processes after timeout #5536

Backend running as non-root user cannot kill child processes after timeout #5536

dpward commented Mar 15, 2021 •

edited

Loading

dpward commented Apr 13, 2021 •

edited

Loading

alexey-tikhonov commented Apr 13, 2021 •

edited

Loading

dpward commented Apr 13, 2021 •

edited

Loading

sumit-bose commented Apr 14, 2021

simo5 commented Apr 14, 2021

simo5 commented Apr 14, 2021

dpward commented Apr 14, 2021 •

edited

Loading

simo5 commented Apr 14, 2021

alexey-tikhonov commented Aug 29, 2023

alexey-tikhonov commented Jan 9, 2024 •

edited

Loading

alexey-tikhonov commented Jan 10, 2024

alexey-tikhonov commented Apr 30, 2024

Backend running as non-root user cannot kill child processes after timeout #5536

Backend running as non-root user cannot kill child processes after timeout #5536

Comments

dpward commented Mar 15, 2021 • edited Loading

dpward commented Apr 13, 2021 • edited Loading

alexey-tikhonov commented Apr 13, 2021 • edited Loading

dpward commented Apr 13, 2021 • edited Loading

sumit-bose commented Apr 14, 2021

simo5 commented Apr 14, 2021

simo5 commented Apr 14, 2021

dpward commented Apr 14, 2021 • edited Loading

simo5 commented Apr 14, 2021

alexey-tikhonov commented Aug 29, 2023

alexey-tikhonov commented Jan 9, 2024 • edited Loading

alexey-tikhonov commented Jan 10, 2024

alexey-tikhonov commented Apr 30, 2024

dpward commented Mar 15, 2021 •

edited

Loading

dpward commented Apr 13, 2021 •

edited

Loading

alexey-tikhonov commented Apr 13, 2021 •

edited

Loading

dpward commented Apr 13, 2021 •

edited

Loading

dpward commented Apr 14, 2021 •

edited

Loading

alexey-tikhonov commented Jan 9, 2024 •

edited

Loading