
End endpoint driver on manual close #1828

Conversation

@divagant-martian commented Apr 20, 2024

Currently, when calling quinn::Endpoint::close, the socket resources are not freed until all references to the endpoint are dropped. This change frees the socket on close instead.

Any suggestions on the best way to test this are welcome

@Ralith (Collaborator) commented Apr 20, 2024

I don't think this is harmful, but I'm not sure it's useful either. What's the motivation? If you're trying to make the socket available for reuse, this still won't be reliable, because it takes some time for connections to close (i.e. for endpoint.connections to become empty).

@divagant-martian (Author)

The motivation is to mitigate the fact that, as a user, calling close is a bit of a "fire and forget" operation. There is no way to know for sure when graceful shutdown operations are complete, so this tries to speed that up.

@Ralith (Collaborator) commented Apr 22, 2024

There is no way to know for sure when graceful shutdown operations are complete

That's Endpoint::wait_idle.

So this tries to speed it up

How does a possibly earlier, but still unpredictable, drop help? What's the use case that this is addressing?
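
For reference, a minimal sketch of the graceful-shutdown sequence wait_idle enables, assuming the stock quinn API; the error code and reason bytes are placeholders:

// Minimal sketch (assumes an existing quinn::Endpoint named `endpoint`).
async fn shutdown(endpoint: quinn::Endpoint) {
    // Ask every connection to close with an application error code and reason.
    endpoint.close(0u32.into(), b"shutting down");
    // Resolves once all connections have finished their close handshakes.
    endpoint.wait_idle().await;
}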

@divagant-martian (Author)

I realise this is hard to see from quinn's point of view. Let me try to explain a bit better, at least from my (potentially poor) understanding of what's going on underneath.

wait_idle waits for the remaining connections to finish their shutdown operations, but it still holds an EndpointRef (since it takes &self). EndpointDriver only finishes after all references are dropped, so even after this call the socket is still being polled, even though close was called before.

This means there is no way to actively wait for the resources held by the socket to be freed. To put it very specifically, as far as I can tell there is no async quinn::Endpoint::close_and_free_socket that guarantees the socket has been dropped once it has been awaited.

This is observable from an implementation of AsyncUdpSocket, and that is exactly our use case. To ensure a clean socket shutdown I want to observe, via some notification, that the socket has been dropped and has performed all of its shutdown operations. Dropping the EndpointDriver on close at least makes this possible: since quinn::Endpoint is Clone, I want to be able to ensure that after a single close call the socket is dropped, regardless of how many clones of the endpoint exist.

Sorry for the convoluted explanation, but I hope this helps

@Ralith (Collaborator) commented Apr 22, 2024

Thanks, those additional details are helpful. Putting it in my own words, you have a custom AsyncUdpSocket whose Drop you wish to run in a timely manner after Endpoint::close, and you want to guard against accidentally delaying this indefinitely by holding on to a quinn::Endpoint somewhere obscure.

Unfortunately, this change will not address your use case. Every quinn::Endpoint shares ownership of the underlying quinn::endpoint::State which owns the Arc<dyn AsyncUdpSocket>. If your driver is still alive due to a lingering quinn::Endpoint, then your AsyncUdpSocket will be as well, even if you modify the driver to terminate early.

Consider using the Endpoint::rebind API to replace your socket with a dummy after calling Endpoint::close and awaiting Endpoint::wait_idle. Alternatively, you could send a message to your custom socket logic at that time.

@divagant-martian (Author) commented Apr 22, 2024

Thanks, those additional details are helpful. Putting it in my own words, you have a custom AsyncUdpSocket whose Drop you wish to run in a timely manner after Endpoint::close, and you want to guard against accidentally delaying this indefinitely by holding on to a quinn::Endpoint somewhere obscure.

Correct, thanks for bearing with me

Unfortunately, this change will not address your use case. Every quinn::Endpoint shares ownership of the underlying quinn::endpoint::State which owns the Arc<dyn AsyncUdpSocket>. If your driver is still alive due to a lingering quinn::Endpoint, then your AsyncUdpSocket will be as well, even if you modify the driver to terminate early.

Thanks for pointing out that dropping the socket won't happen even in this scenario.

Consider using the Endpoint::rebind API to replace your socket with a dummy after calling Endpoint::close and awaiting Endpoint::wait_idle. Alternatively, you could send a message to your custom socket logic at that time.

Since the objective is to free resources, rebind is not really what I'm looking for. Informing the socket of the close over a side channel is, I think, the best shot, but without this change there is still an undesirable observable behaviour. Maybe our AsyncUdpSocket implementation is wrong (I think it should be alright, but I'm open to differing opinions).

Consider an AsyncUdpSocket that contains the underlying socket; for simplicity let's assume:

struct MySocket<I: AsyncUdpSocket>(Arc<Mutex<Option<I>>>);

struct Super<I: AsyncUdpSocket> {
    socket: MySocket<I>,
    endpoint: quinn::Endpoint,
}

with a MySocket::expensive_close method that will Option::take the underlying I, and that returns an IO error (io::ErrorKind::NotConnected) from poll_send and poll_recv once closed/taken,

and a Super::close that looks like this

self.endpoint.close(code, reason);
self.endpoint.wait_idle().await;
self.socket.expensive_close().await;
// underlying socket is already dropped, and it was dropped after idle

Since the socket is still being polled after close, the IO error returned from poll_send/poll_recv will be bubbled up here.

Besides being annoying, this feels wrong. After close and wait_idle the endpoint won't be able to send or receive, so I don't think it should still be polled, unless we are misunderstanding the semantics of closing the endpoint and its connections.
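
To ground the scenario, a rough sketch of the hypothetical MySocket::expensive_close described above; everything here is illustrative rather than quinn API, and the AsyncUdpSocket trait methods are omitted since their exact signatures vary between quinn versions:

use std::sync::{Arc, Mutex};

// Hypothetical wrapper from the example above; `I` stands in for the real socket type.
struct MySocket<I>(Arc<Mutex<Option<I>>>);

impl<I> MySocket<I> {
    // Take the inner socket out so its teardown/Drop runs now. Afterwards, a
    // poll_send/poll_recv implementation that locks the mutex and finds `None`
    // would return io::ErrorKind::NotConnected, as described in the comment.
    async fn expensive_close(&self) {
        let inner = self.0.lock().expect("socket mutex poisoned").take();
        drop(inner); // any expensive shutdown work would happen around here
    }
}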

@Ralith (Collaborator) commented Apr 22, 2024

Since the objective is to free resources rebind is not really what I'm looking for.

Specifically rebinding with a dummy socket (e.g. a ZST that stubs all trait methods and owns no resources) would serve your needs.

since the socket is still being polled after close, the returned io error from poll_send/poll_recv will be bubbled up

An easy solution is to silently no-op rather than returning an error.

After close and wait_idle the endpoint won't be able to send or receive, so I don't think it should still be polled, unless we are misunderstanding the semantics of closing the endpoint and its connections

QUIC endpoints may perform I/O even when there are no connections. For example, I/O is required to reject incoming connection attempts gracefully. It's not obvious that Endpoint::close should suppress this; there's no standard definition of a "closed endpoint". Quinn's Endpoint::close is just a convenience API intended for the presumed-common case of shutting down a process, when lingering Endpoint handles aren't going to be a concern anyway.

@divagant-martian (Author)

Specifically rebinding with a dummy socket (e.g. a ZST that stubs all trait methods and owns no resources) would serve your needs.

As far as I can tell, the only rebinding option is with a "real" socket: https://docs.rs/quinn/0.10.2/quinn/struct.Endpoint.html#method.rebind
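
For illustration, a sketch of what rebinding looks like in that version: the method only accepts a concrete std::net::UdpSocket, so a stub AsyncUdpSocket cannot be passed in (the throwaway bind address is just an example):

use std::net::UdpSocket;

// Sketch: quinn's rebind takes a real OS socket, not an AsyncUdpSocket impl.
fn rebind_to_throwaway(endpoint: &quinn::Endpoint) -> std::io::Result<()> {
    let throwaway = UdpSocket::bind("127.0.0.1:0")?; // real socket, merely unused
    endpoint.rebind(throwaway)
}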

@Ralith (Collaborator) commented Apr 22, 2024

Good point. We should probably expose a lower-level version, since Endpoint::new_with_abstract_socket is already public.

@djc (Member) commented Apr 23, 2024

Good point. We should probably expose a lower-level version, since Endpoint::new_with_abstract_socket is already public.

See #1831.

@flub (Contributor) commented Apr 23, 2024

IIUC the change as proposed might result in the CONNECTION_CLOSE frames not being sent and thus the peer not receiving the close reason and error code. That's probably legal, but not very nice.

As far as I understand, what we are really after with the custom AsyncUdpSocket shutdown is knowing when Quinn is done with the socket, which we currently only know when it is dropped.

The thing that seems to be missing is that an application can never really know when Quinn is done. It spawns various tasks in the background which hold on to some resources. IIUC this problem would also be solved by knowing when an endpoint has released all its resources, i.e. all the tasks are done. Maybe this could be in the guise of some sort of tokio::task::JoinSet (I know this is not runtime-independent) which the application could take ownership of if it wants, and then it can wait on those tasks or even abort them. The above shutdown scenario would then become:

let tasks = self.endpoint.take_task_set();
self.endpoint.close(code, reason);
tasks.await;
self.socket.expensive_close().await;
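
One note on the sketch above: tokio's JoinSet is not directly awaitable, so the tasks.await step would in practice be a drain loop along these lines (illustrative only; take_task_set does not exist in quinn today):

// Standalone illustration of draining a tokio::task::JoinSet of background tasks.
async fn drain(mut tasks: tokio::task::JoinSet<()>) {
    while let Some(result) = tasks.join_next().await {
        // Each finished task surfaces panics or cancellation as a JoinError here.
        if let Err(err) = result {
            eprintln!("endpoint task failed: {err}");
        }
    }
}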

@Ralith (Collaborator) commented Apr 23, 2024

That's already provided by the resources in question getting dropped.

@divagant-martian is #1831 an adequate solution for your use case?

@flub (Contributor) commented May 1, 2024

That's already provided by the resources in question getting dropped.

This is true for someone who implements an AsyncUdpSocket, but that's a lot to ask of normal library users. They would still benefit from knowing when all resources are freed. I realise that's not where this PR started from, but knowing when all resources are released is at the root of the reason this PR was started.

Also, relying on drop is not that simple; it is only possible if everything relies on drop. n0-computer/iroh#2227 (comment) describes why doing so is not feasible when you need to support an explicit .close() or .shutdown() or the like.

So I still think there might be some benefit to having a mechanism to know when Quinn is done with all its resources.

@Ralith (Collaborator) commented May 1, 2024

They would still benefit from knowing when all resources are freed.

What is the concrete use case?

@flub (Contributor) commented May 2, 2024

They would still benefit from knowing when all resources are freed.

What is the concrete use case?

The obvious answer is that the above use case of having to close the socket would have been trivially simple if something like this had existed. I consider the current solution of rebinding with a stubbed-out socket to be rather a weird hack.

Another use case is tests: ideally, at least for successful tests, the resources would be cleaned up, which helps when running a large test suite in parallel.

Though in general I'm rather uncomfortable with a library that does not allow closing its resources entirely. Sooner or later this results in a race condition where something tries to wait for things to finish, but they don't.

@djc (Member) commented May 2, 2024

Maybe it's time for iroh to drop quinn and drop down to the quinn-proto level?
