Solution #1 for client halt bug #56
Conversation
@jacobkaufmann: @emhane ping for review
I pulled your fix; I can send at least 10 MB no problem now. Could you link the code lines you base your assumptions on? It helps to argue your case by speeding up the process for readers trying to recreate your train of thought. I'm interested in "There is already code that handles windowing packets", because many times there are reasons to manage congestion control on multiple levels.

Right now multiplexing works like this: one congestion controller exists per stream (connection acceptor and initiator), so the congestion controller accounts for the concurrent sending and receiving of packets on that stream, but it isn't managing congestion control across concurrent streams, at least not in an informed way, since that would require the congestion controller to know the state of the other congestion controllers (other streams). We only have one UDP socket to receive and send packets on for many streams, and we want those streams to progress concurrently, so congestion control across streams does matter. Therefore we may need the check.

Nonetheless, this fix doesn't let me send 100 MB, which it should in order to be stable. Running your fix on my error-handling branch I get this informative error:

Running this test for 10 MB without your fix, Kolby, I get the same idle timeout error, but from the initiator side, which in my test only sends data:

From this information, I think the use of the idle timeout is funky and needs to be inspected. Great work!
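For readers trying to picture that layout, here is a minimal sketch, using hypothetical types (not the library's actual structs), of one congestion controller per stream multiplexed over a single shared UDP socket:

```rust
use std::collections::HashMap;
use std::net::UdpSocket;

// Per-stream congestion state: it only knows about its own stream.
struct CongestionController {
    window: usize,    // bytes this stream may have in flight
    in_flight: usize, // bytes currently unacknowledged on this stream
}

// One congestion controller per stream (acceptor or initiator side).
struct Stream {
    id: u16,
    congestion: CongestionController,
}

// All streams are multiplexed over a single UDP socket, but no stream's
// controller sees the state of the other controllers.
struct Multiplexer {
    socket: UdpSocket,
    streams: HashMap<u16, Stream>,
}

fn main() -> std::io::Result<()> {
    let _mux = Multiplexer {
        socket: UdpSocket::bind("127.0.0.1:0")?,
        streams: HashMap::new(),
    };
    Ok(())
}
```

Because each controller only tracks its own stream, pacing across streams has to come from somewhere else, which is the concern raised above.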
Good detective work. I will debug with 100 MB now. Receiving a packet should reset the idle timer in theory, so maybe I will look at that.
It would probably just be quicker to Ctrl+F send_buffer and then Ctrl+F receive_buffer. The limit shouldn't affect congestion control, but I guess I should look into it more.
The 100 MB failure is caused by a timeout bug, because when I run Wireshark the data is being sent fine. So this PR fixes a stall bug; I will try to find the timeout bug.
This is not a bug, and the limit on the send buffer is deliberate. There is a
I am not saying having a limit is a bug. I am saying:

```rust
// Write as much data as possible into send buffer.
while let Some((data, ..)) = self.pending_writes.front() {
    if data.len() <= send_buf.available() {
        let (data, tx) = self.pending_writes.pop_front().unwrap();
        send_buf.write(&data).unwrap();
        let _ = tx.send(Ok(data.len()));
        self.writable.notify_one();
    } else {
        break;
    }
}
```

If this else condition is hit, the client will stall, which doesn't seem to be intentional? Hence the stall is unintentional?
So the limit itself isn't the problem; it is how it is handled. I proposed this as a solution because, the way the library is architected, there isn't an easy way to resolve this differently. Also, pending_writes and the send buffer are practically the same thing. I don't see how this isn't a bug, given that it will halt your client and kill your connection.
Here is an alternate solution that solves the halt problem, #59, and gives the responsibility to the user (this also aligns with your comment). I think this solution is flawed though, since if someone writes 1.5 MB and then 0.2 MB, the data will end up unordered. A way to solve this would be to re-push the data onto pending writes, but that has the downside of more complexity, since we would need to remember that we already pushed 1 MB for the rx channel. So a solution to that would be to remove pending writes altogether, since it is redundant, and have the user interact like User -> send buffer instead of User -> pending_writes -> send buffer. This would resolve the complexity and give the responsibility to the user. Why do we need two write buffers?
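For illustration, a minimal sketch of that "User -> send buffer" shape, using hypothetical names (SendBuffer and its fields) rather than the crate's API: write accepts only as much as currently fits and reports it back, leaving re-submission and ordering to the caller.

```rust
use std::collections::VecDeque;

struct SendBuffer {
    buf: VecDeque<u8>,
    capacity: usize,
}

impl SendBuffer {
    fn available(&self) -> usize {
        self.capacity - self.buf.len()
    }

    // Accept as much of `data` as currently fits and report how much was taken;
    // the caller keeps the remainder, so ordering stays the caller's problem.
    fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.available());
        self.buf.extend(&data[..n]);
        n
    }
}

fn main() {
    let mut sb = SendBuffer { buf: VecDeque::new(), capacity: 1_000_000 };
    let payload = vec![0u8; 1_500_000];
    let written = sb.write(&payload);
    // Only the first 1 MB is accepted; the trailing 0.5 MB must be resubmitted
    // by the user after the buffer drains to keep the stream's bytes in order.
    println!("accepted {} of {} bytes", written, payload.len());
}
```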
Yeah, sorry, I think I misinterpreted your comment. What I think should happen is that an attempt to write returns an error when the send buffer (or the sum of the send buffer and pending writes) would exceed the capacity. The capacity exists to limit resource consumption. Relatedly, the capacity should be set to a value that is greater than or equal to the largest conceivable write your program will perform. The separation between pending writes and the send buffer exists so that the send buffer can stay quite simple. I'm not saying a simpler design isn't available, but the solution here, where the capacity of the send buffer is removed, is not in the right direction.
I think that is fair. The short-term solution would then be to return an error like you said, and then possibly add a TODO to simplify that.
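For illustration only, a minimal sketch of the return-an-error-on-overflow behaviour discussed above; the names (Writer, WriteTooLarge) are hypothetical, and this is not the actual change that landed in #61.

```rust
#[derive(Debug)]
struct WriteTooLarge {
    requested: usize,
    capacity: usize,
}

struct Writer {
    // Capacity should be >= the largest conceivable write the program performs.
    capacity: usize,
    // Bytes currently held across the send buffer and pending writes.
    buffered: usize,
}

impl Writer {
    fn write(&mut self, data: &[u8]) -> Result<usize, WriteTooLarge> {
        if self.buffered + data.len() > self.capacity {
            // Fail fast instead of parking the write where it may never drain.
            return Err(WriteTooLarge {
                requested: data.len(),
                capacity: self.capacity,
            });
        }
        self.buffered += data.len();
        Ok(data.len())
    }
}

fn main() {
    let mut w = Writer { capacity: 1_000_000, buffered: 0 };
    assert!(w.write(&vec![0u8; 500_000]).is_ok());
    // A write that cannot fit is reported immediately rather than stalling silently.
    assert!(w.write(&vec![0u8; 2_000_000]).is_err());
}
```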
Closed in favor of #61
fixes #54 and #44
The issue / how to replicate the bug
If data.len() is > send_buf.available(), the code will "lock" and stall the data from ever being sent. In other words, the data will never be added to the send buffer, resulting in a timeout with 0 packets sent.
Pictures of Wireshark testing of this bug can be seen here: #54 (comment)
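To make the failure mode concrete, here is a minimal standalone sketch with simplified types (not the crate's code): a single queued write larger than the send buffer's total capacity can never pass the availability check, so the drain loop never makes progress.

```rust
use std::collections::VecDeque;

struct SendBuf {
    len: usize,
    capacity: usize,
}

impl SendBuf {
    fn available(&self) -> usize {
        self.capacity - self.len
    }
}

fn main() {
    // An empty send buffer with 1 MB capacity and a single queued 1.5 MB write.
    let send_buf = SendBuf { len: 0, capacity: 1_000_000 };
    let mut pending_writes: VecDeque<Vec<u8>> = VecDeque::new();
    pending_writes.push_back(vec![0u8; 1_500_000]);

    // The drain loop quoted earlier hits its `else { break }` branch here and can
    // never make progress: even a fully drained buffer cannot admit the chunk,
    // so nothing is sent and the connection eventually idles out.
    if let Some(data) = pending_writes.front() {
        assert!(data.len() > send_buf.available());
    }
}
```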
where did this bug come from?
I believe it is because the receive buffer was written first, where having a limit makes sense (for sending the final packet), but then the code was copy-pasted to create the send buffer. The limit was kept even though it had no use case there, yet it still had checks, which then caused the stall.
How was the bug fixed?
By removing the availability/size limit from the send buffer, and hence the unneeded checks, resolving the stalled-send bug.
To reiterate, there isn't a reason why we should have that limit for the send buffer. There is already code that handles windowing packets.
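For illustration, a minimal sketch of the fix as described here, with simplified types rather than the actual diff: the drain loop moves every pending write into the send buffer unconditionally and leaves pacing to the existing packet-windowing code.

```rust
use std::collections::VecDeque;

// Send buffer without a fixed capacity: it only stages bytes, while the
// existing windowing logic decides how much actually goes on the wire.
struct SendBuf {
    data: Vec<u8>,
}

impl SendBuf {
    fn write(&mut self, data: &[u8]) {
        self.data.extend_from_slice(data);
    }
}

fn drain(pending_writes: &mut VecDeque<Vec<u8>>, send_buf: &mut SendBuf) {
    // No availability check and no `else { break }`, so an oversized write can
    // no longer stall the drain loop.
    while let Some(data) = pending_writes.pop_front() {
        send_buf.write(&data);
    }
}

fn main() {
    let mut pending: VecDeque<Vec<u8>> = VecDeque::new();
    pending.push_back(vec![0u8; 1_500_000]);
    let mut send_buf = SendBuf { data: Vec::new() };
    drain(&mut pending, &mut send_buf);
    assert_eq!(send_buf.data.len(), 1_500_000);
}
```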