What should happen when an FD is closed while a thread is blocked on it? #4051

RalfJung · 2024-11-22T06:42:23Z

For both epoll and eventfd, the question arises what happens when an FD is closed while a thread is blocked on it. Currently, we throw an "unsupported" error. That's not a great solution -- we should figure out what POSIX guarantees in that case, and implement that.

So, what does POSIX say?

Cc @tiif

oli-obk · 2024-11-22T07:11:32Z

POSIX says nothing on this. In theory it counts as a race condition, as you can't ever say another thread has blocked on this FD already (there's no general way to ensure that has happened yet, it may be right before the thread actually executes the blocking op).

In practice, once actually blocked, the thread just never wakes.

So... beyond going to reporting a race condition error instead of an unsupported error, I see no good course of action. We picked unsupported as it's unspecified what the behaviour is, and in practice it's definitely not "return the same result as if the FD were closed before the read/write call was done"

RalfJung · 2024-11-22T07:37:34Z

you can't ever say another thread has blocked on this FD already (there's no general way to ensure that has happened yet, it may be right before the thread actually executes the blocking op).

I can't parse this, what do you mean?

reporting a race condition error

That would mean making it UB. The question is, is it UB?

RalfJung · 2024-11-22T07:46:12Z

@the8472 do you know if there is any required behavior from POSIX here?

"Never wake up the thread" does seem like a reasonable implementation we could copy. Maybe we should issue some sort of warning though? We don't have much in the way of warnings in Miri currently, but after all this is almost certainly a bug.

oli-obk · 2024-11-22T08:05:36Z

I can't parse this, what do you mean?

When thread A is running on a code path that will inevitably block on FD, and thread B is closing FD, we are in an unclear state wrt what happens to thread A. Depending on the scheduler it

is already blocked
it will invoke the blocking op after thread B closes FD
it's already accessing FD internals while B is closing FD

It's unspecified if the third option can happen.

The first option leads to the thread staying blocked forever in practice (this may also depend on the type of FD, pipes seem in theory more likely to unblock than e.g. a file)

The second option is well specified

oli-obk · 2024-11-22T08:10:25Z

I decided we should pick unsupported as this is always a bug according to all stackoverflow and other forum answers I could find. These often state that it's a race condition, but without any references. On a more high level, a FD should only be closed if there are no outstanding uses of it. The docs of close say the word "race condition"

   Multithreaded processes and close()
   It is probably unwise to close file descriptors while they may be
   in use by system calls in other threads in the same process.
   Since a file descriptor may be reused, there are some obscure
   race conditions that may cause unintended side effects.

   Furthermore, consider the following scenario where two threads
   are performing operations on the same file descriptor:

   (1)  One thread is blocked in an I/O system call on the file
        descriptor.  For example, it is trying to write(2) to a pipe
        that is already full, or trying to read(2) from a stream
        socket which currently has no available data.

   (2)  Another thread closes the file descriptor.

   The behavior in this situation varies across systems.  On some
   systems, when the file descriptor is closed, the blocking system
   call returns immediately with an error.

   On Linux (and possibly some other systems), the behavior is
   different: the blocking I/O system call holds a reference to the
   underlying open file description, and this reference keeps the
   description open until the I/O system call completes.  (See
   open(2) for a discussion of open file descriptions.)  Thus, the
   blocking system call in the first thread may successfully
   complete after the close() in the second thread.

https://www.man7.org/linux/man-pages/man2/close.2.html

RalfJung · 2024-11-22T08:25:07Z

"unsupported" indicates a missing feature in Miri, not a bug in the program. The error text even explicitly says that. So according to what you said we should emit a different error.

bjorn3 · 2024-11-22T09:18:31Z

The epoll man page says the following:

Will closing a file descriptor cause it to be removed from all epoll interest lists?

Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see open(2)). Whenever a file descriptor is duplicated via dup(2), dup2(2), fcntl(2)
F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all file descriptors referring to it have
been closed.

A file descriptor is removed from an interest list only after all the file descriptors referring to the underlying open file description have been closed. This means that even after a file
descriptor that is part of an interest list has been closed, events may be reported for that file descriptor if other file descriptors referring to the same underlying file description remain open.
To prevent this happening, the file descriptor must be explicitly removed from the interest list (using epoll_ctl(2) EPOLL_CTL_DEL) before it is duplicated. Alternatively, the application must
ensure that all file descriptors are closed (which may be difficult if file descriptors were duplicated behind the scenes by library functions that used dup(2) or fork(2)).

In other words the fd will be removed from the interest list if there is no dup of the fd still around. If it is the last fd in the interest list, epoll_wait will block until another thread inserts a new fd in the interest list and this new fd triggers an event I think, just like when calling epoll_wait when the interest list was empty from the start.

RalfJung · 2024-11-22T10:23:35Z

The epoll case I was talking about above is about what happens when the epoll FD itself is closed while someone blocks on it.

What happens when an FD in the interest list is closed, is a separate question, but I think that is already handled properly. epoll identifies interest by the pair of FD number and underlying file description object; if the object is closed then even if the number gets reused the interest will never be triggered.

RalfJung · 2024-11-22T10:25:18Z

@tiif do we have a test for this situation of "the FD registered with epoll got closed but there's a duplicate FD for the same file description so events are still reported"? I guess that means events are reported with the old FD number, which is actually quite confusing since it won't refer to the relevant object any more...

the8472 · 2024-11-22T12:10:52Z

@the8472 do you know if there is any required behavior from POSIX here?

epoll is a linux-specific api. poll is the posix standard one, but I haven't seen any explicit documentation whether a wakeup will be triggered by a concurrent close or not.

The epoll case I was talking about above is about what happens when the epoll FD itself is closed while someone blocks on it.

File descriptions are refcounted resources (on linux). So I assume that closing the file descriptor will only decrease the count by one but the ongoing polling syscall will own another token and so the underlying description will remain open at least until the syscall returns.

Since epollfd is linux-specific the portability of those semantics shouldn't be relevant here.

oli-obk · 2024-11-22T12:54:59Z

I really don't want to replicate per-FD-kind behaviour on close while blocked, with platform dependent shenanigans. It's why I opted for the unsupported, since we do not support the (potentially guaranteed) platform or FD specific behaviour. The situations are always an edge case that you probably don't want to encounter anyway. We could add a new Undesirable error to clearly signal that difference to regular unsupported which usually mean we invite adding an impl for. A race condition error signals too much that this is UB, when it may be defined for some platforms and some FDs

RalfJung · 2024-11-22T14:03:39Z

I don't think we should have per-FD-type logic, either.

We could raise a "deadlock" error, since it does mean that this thread will definitely never be woken up.

So I assume that closing the file descriptor will only decrease the count by one but the ongoing polling syscall will own another token and so the underlying description will remain open at least until the syscall returns.

So that would mean even if the epoll FD is closed, future wakeups on threads waiting on that FD still happen, since the underlying data is still present. Makes sense. We should probably implement that, then.

tiif · 2024-11-22T15:21:12Z

do we have a test for this situation of "the FD registered with epoll got closed but there's a duplicate FD for the same file description so events are still reported"?

yes

miri/tests/pass-dep/libc/libc-epoll-no-blocking.rs

Lines 339 to 383 in 158973f

    
           // When a certain file descriptor registered with epoll is closed, but the underlying file description 
        
           // is not closed, notification should still be provided. 
        
           // 
        
           // This is a quirk of epoll being described in https://man7.org/linux/man-pages/man7/epoll.7.html 
        
           // A file descriptor is removed from an interest list only after all the file descriptors 
        
           // referring to the underlying open file description have been closed. 
        
           fn test_not_fully_closed_fd() { 
        
               // Create an epoll instance. 
        
               let epfd = unsafe { libc::epoll_create1(0) }; 
        
               assert_ne!(epfd, -1); 
        
               // Create an eventfd instance. 
        
               let flags = libc::EFD_NONBLOCK | libc::EFD_CLOEXEC; 
        
               let fd = unsafe { libc::eventfd(0, flags) }; 
        
               // Dup the fd. 
        
               let newfd = unsafe { libc::dup(fd) }; 
        
               assert_ne!(newfd, -1); 
        
               // Register eventfd with EPOLLIN | EPOLLOUT | EPOLLET 
        
               let mut ev = libc::epoll_event { events: EPOLL_IN_OUT_ET, u64: u64::try_from(fd).unwrap() }; 
        
               let res = unsafe { libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) }; 
        
               assert_eq!(res, 0); 
        
               // Close the original fd that being used to register with epoll. 
        
               let res = unsafe { libc::close(fd) }; 
        
               assert_eq!(res, 0); 
        
               // Notification should still be provided because the file description is not closed. 
        
               let expected_event = u32::try_from(libc::EPOLLOUT).unwrap(); 
        
               let expected_value = fd as u64; 
        
               check_epoll_wait::<1>(epfd, &[(expected_event, expected_value)]); 
        
               // Write to the eventfd instance to produce notification. 
        
               let sized_8_data: [u8; 8] = 1_u64.to_ne_bytes(); 
        
               let res = unsafe { libc::write(newfd, sized_8_data.as_ptr() as *const libc::c_void, 8) }; 
        
               assert_eq!(res, 8); 
        
               // Close the dupped fd. 
        
               let res = unsafe { libc::close(newfd) }; 
        
               assert_eq!(res, 0); 
        
               // No notification should be provided. 
        
               check_epoll_wait::<1>(epfd, &[]); 
        
           }

tiif · 2024-11-22T16:21:46Z

So that would mean even if the epoll FD is closed, future wakeups on threads waiting on that FD still happen, since the underlying data is still present. Makes sense. We should probably implement that, then.

If we plan to support "closing fd while blocking on it" operation for epoll, should we do the same for socketpair/pipe and eventfd?

RalfJung · 2024-11-22T16:55:38Z

Those are different situations though -- one is blocking on read/write, the other is blocking on a dedicated operation (epoll_wait) that merely uses the FD for bookkeeping. epoll_wait does not get woken up by actions on its own FD, it gets woken up by actions on other FDs.

tiif · 2024-11-22T17:09:10Z

epoll_wait does not get woken up by actions on its own FD, it gets woken up by actions on other FDs.

I expect it'd be similar for socketpair/pipe too. If the thread blocked because it tried to read from an empty socketpair/pipe, writing to it's peer fd would unblock it.

RalfJung · 2024-11-22T17:29:22Z

Yeah but we are not talking about a closed peer fd here.

tiif · 2024-11-23T08:19:25Z

The blocked fd could be closed too, the case that I am thinking of is:

thread1 blocks on read for one of the socketpair fd, fd1
thread2 tries to close fd1
thread3 writes to the peer fd of fd1, and this can unblock thread1

which of the following should happen here?

throw error for the close in step2
allow the close and allow thread1 to be unblocked
throw miri error as we don't want to support this behavior

RalfJung · 2024-11-23T09:23:51Z

Since in practice the thread will not be unblocked, we should not do 2. So my proposal would be to throw a "deadlocked" error either when the FD is closed with a non-empty wait list, or when the thread wakes up and finds that its FD is gone.

tiif · 2024-11-23T13:37:40Z

Sure, this makes sense. It would be nice if we could throw a more specific error than deadlock if we encounter this scenario. The reason is if someone manages to make this work on a certain platform in a real system, but it fails in miri with a deadlock error, it'd be helpful to know that it encountered this specific scenario. This way, we can rule out the possibility of it being a miri implementation bug early on.

RalfJung · 2024-11-23T18:00:20Z

Yeah ideally the error would point at both the "close", and the thread that can now no longer wake up. And then for epoll - we should probably make the Weak ref into an Rc so that the unblock callback can definitely access the epoll instance?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What should happen when an FD is closed while a thread is blocked on it? #4051

What should happen when an FD is closed while a thread is blocked on it? #4051

RalfJung commented Nov 22, 2024

oli-obk commented Nov 22, 2024

RalfJung commented Nov 22, 2024

RalfJung commented Nov 22, 2024

oli-obk commented Nov 22, 2024

oli-obk commented Nov 22, 2024

RalfJung commented Nov 22, 2024 via email

bjorn3 commented Nov 22, 2024

RalfJung commented Nov 22, 2024

RalfJung commented Nov 22, 2024

the8472 commented Nov 22, 2024 •

edited

Loading

oli-obk commented Nov 22, 2024

RalfJung commented Nov 22, 2024

tiif commented Nov 22, 2024

tiif commented Nov 22, 2024

RalfJung commented Nov 22, 2024

tiif commented Nov 22, 2024

RalfJung commented Nov 22, 2024 via email

tiif commented Nov 23, 2024

RalfJung commented Nov 23, 2024 via email

tiif commented Nov 23, 2024

RalfJung commented Nov 23, 2024 via email

What should happen when an FD is closed while a thread is blocked on it? #4051

What should happen when an FD is closed while a thread is blocked on it? #4051

Comments

RalfJung commented Nov 22, 2024

oli-obk commented Nov 22, 2024

RalfJung commented Nov 22, 2024

RalfJung commented Nov 22, 2024

oli-obk commented Nov 22, 2024

oli-obk commented Nov 22, 2024

RalfJung commented Nov 22, 2024 via email

bjorn3 commented Nov 22, 2024

RalfJung commented Nov 22, 2024

RalfJung commented Nov 22, 2024

the8472 commented Nov 22, 2024 • edited Loading

oli-obk commented Nov 22, 2024

RalfJung commented Nov 22, 2024

tiif commented Nov 22, 2024

tiif commented Nov 22, 2024

RalfJung commented Nov 22, 2024

tiif commented Nov 22, 2024

RalfJung commented Nov 22, 2024 via email

tiif commented Nov 23, 2024

RalfJung commented Nov 23, 2024 via email

tiif commented Nov 23, 2024

RalfJung commented Nov 23, 2024 via email

the8472 commented Nov 22, 2024 •

edited

Loading