Solution #1 for client halt bug #56
Conversation
@jacobkaufmann: @emhane ping for review
I pulled your fix; I can send at least 10 MB no problem now. Could you link the code lines you base your assumptions on? It helps to argue your case by speeding up the process for readers trying to recreate your train of thought. I'm interested in "There is already code that handles windowing packets", because many times there are reasons to manage congestion control on multiple levels.

Right now multiplexing works like this: one congestion controller exists per stream (connection acceptor and initiator), so the congestion controller accounts for the concurrent sending and receiving of packets on that stream, but it isn't managing congestion control across concurrent streams, at least not in an informed way, since that would require the congestion controller to know the state of the other congestion controllers (other streams). We only have one UDP socket to receive and send packets on for many streams, and we want those streams to progress concurrently, so congestion control across streams does matter. Therefore we may need the check.

Nonetheless, this fix doesn't let me send 100 MB, which it should in order to be stable. Running your fix on my error-handling branch I get this informative error:

Running this test for 10 MB without your fix, Kolby, I get the same idle timeout error, but from the initiator side, which in my test only sends data:

From this information, I think the use of the idle timeout is funky and needs to be inspected. Great work!
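For readers trying to picture that layout, here is a minimal sketch, using hypothetical types (not the library's actual structs), of one congestion controller per stream multiplexed over a single shared UDP socket:

```rust
use std::collections::HashMap;
use std::net::UdpSocket;

// Per-stream congestion state: it only knows about its own stream.
struct CongestionController {
    window: usize,    // bytes this stream may have in flight
    in_flight: usize, // bytes currently unacknowledged on this stream
}

// One congestion controller per stream (acceptor or initiator side).
struct Stream {
    id: u16,
    congestion: CongestionController,
}

// All streams are multiplexed over a single UDP socket, but no stream's
// controller sees the state of the other controllers.
struct Multiplexer {
    socket: UdpSocket,
    streams: HashMap<u16, Stream>,
}

fn main() -> std::io::Result<()> {
    let _mux = Multiplexer {
        socket: UdpSocket::bind("127.0.0.1:0")?,
        streams: HashMap::new(),
    };
    Ok(())
}
```

Because each controller only tracks its own stream, pacing across streams has to come from somewhere else, which is the concern raised above.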
Good detective work. I will debug with 100 MB now. Receiving a packet should reset the idle timer in theory, so maybe I will look at that.
It would probably just be quicker to Ctrl+F send_buffer and then Ctrl+F receive_buffer. The limit shouldn't affect congestion control, but I guess I should look into it more.
The 100 MB failure is caused by a timeout bug, because when I run Wireshark the data is being sent fine. So this PR fixes a stall bug; I will try to find the timeout bug.
This is not a bug, and the limit on the send buffer is deliberate. There is a
I am not saying having a limit is a bug. I am saying:

```rust
// Write as much data as possible into send buffer.
while let Some((data, ..)) = self.pending_writes.front() {
    if data.len() <= send_buf.available() {
        let (data, tx) = self.pending_writes.pop_front().unwrap();
        send_buf.write(&data).unwrap();
        let _ = tx.send(Ok(data.len()));
        self.writable.notify_one();
    } else {
        break;
    }
}
```

If this else condition is hit, the client will stall, which doesn't seem to be intentional? Hence the stall is unintentional?
So the limit itself isn't the problem; it is how it is handled. I proposed this as a solution because, the way the library is architected, there isn't an easy way to resolve this differently. Also, pending_writes and the send buffer are practically the same thing. I don't see how this isn't a bug, given that it will halt your client and kill your connection.
Here is an alternate solution that solves the halt problem, #59, and gives the responsibility to the user (this also aligns with your comment). I think this solution is flawed though, since if someone writes 1.5 MB and then 0.2 MB, the data will end up unordered. A way to solve this would be to re-push the data onto pending writes, but that has the downside of more complexity, since we would need to remember that we already pushed 1 MB for the rx channel. So a solution to that would be to remove pending writes altogether, since it is redundant, and have the user interact like User -> send buffer instead of User -> pending_writes -> send buffer. This would resolve the complexity and give the responsibility to the user. Why do we need two write buffers?
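For illustration, a minimal sketch of that "User -> send buffer" shape, using hypothetical names (SendBuffer and its fields) rather than the crate's API: write accepts only as much as currently fits and reports it back, leaving re-submission and ordering to the caller.

```rust
use std::collections::VecDeque;

struct SendBuffer {
    buf: VecDeque<u8>,
    capacity: usize,
}

impl SendBuffer {
    fn available(&self) -> usize {
        self.capacity - self.buf.len()
    }

    // Accept as much of `data` as currently fits and report how much was taken;
    // the caller keeps the remainder, so ordering stays the caller's problem.
    fn write(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.available());
        self.buf.extend(&data[..n]);
        n
    }
}

fn main() {
    let mut sb = SendBuffer { buf: VecDeque::new(), capacity: 1_000_000 };
    let payload = vec![0u8; 1_500_000];
    let written = sb.write(&payload);
    // Only the first 1 MB is accepted; the trailing 0.5 MB must be resubmitted
    // by the user after the buffer drains to keep the stream's bytes in order.
    println!("accepted {} of {} bytes", written, payload.len());
}
```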
Yeah, sorry, I think I misinterpreted your comment. What I think should happen is that an attempt to write returns an error when the send buffer (or the sum of the send buffer and pending writes) would exceed the capacity. The capacity exists to limit resource consumption. Relatedly, the capacity should be set to a value that is greater than or equal to the largest conceivable write your program will perform. The separation between pending writes and the send buffer exists so that the send buffer can stay quite simple. I'm not saying a simpler design isn't available, but the solution here, where the capacity of the send buffer is removed, is not in the right direction.
I think that is fair. The short-term solution would then be to return an error like you said, and then possibly add a TODO to simplify that.
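For illustration only, a minimal sketch of the return-an-error-on-overflow behaviour discussed above; the names (Writer, WriteTooLarge) are hypothetical, and this is not the actual change that landed in #61.

```rust
#[derive(Debug)]
struct WriteTooLarge {
    requested: usize,
    capacity: usize,
}

struct Writer {
    // Capacity should be >= the largest conceivable write the program performs.
    capacity: usize,
    // Bytes currently held across the send buffer and pending writes.
    buffered: usize,
}

impl Writer {
    fn write(&mut self, data: &[u8]) -> Result<usize, WriteTooLarge> {
        if self.buffered + data.len() > self.capacity {
            // Fail fast instead of parking the write where it may never drain.
            return Err(WriteTooLarge {
                requested: data.len(),
                capacity: self.capacity,
            });
        }
        self.buffered += data.len();
        Ok(data.len())
    }
}

fn main() {
    let mut w = Writer { capacity: 1_000_000, buffered: 0 };
    assert!(w.write(&vec![0u8; 500_000]).is_ok());
    // A write that cannot fit is reported immediately rather than stalling silently.
    assert!(w.write(&vec![0u8; 2_000_000]).is_err());
}
```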
Closed in favor of #61
fixes #54 and #44
The issue / how to replicate the bug
If data.len() is > send_buf.available(), the code will "lock" and stall the data from ever being sent. In other words, the data will never be added to the send buffer, resulting in a timeout with 0 packets sent.
Pictures of Wireshark testing of this bug can be seen here: #54 (comment)
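To make the failure mode concrete, here is a minimal standalone sketch with simplified types (not the crate's code): a single queued write larger than the send buffer's total capacity can never pass the availability check, so the drain loop never makes progress.

```rust
use std::collections::VecDeque;

struct SendBuf {
    len: usize,
    capacity: usize,
}

impl SendBuf {
    fn available(&self) -> usize {
        self.capacity - self.len
    }
}

fn main() {
    // An empty send buffer with 1 MB capacity and a single queued 1.5 MB write.
    let send_buf = SendBuf { len: 0, capacity: 1_000_000 };
    let mut pending_writes: VecDeque<Vec<u8>> = VecDeque::new();
    pending_writes.push_back(vec![0u8; 1_500_000]);

    // The drain loop quoted earlier hits its `else { break }` branch here and can
    // never make progress: even a fully drained buffer cannot admit the chunk,
    // so nothing is sent and the connection eventually idles out.
    if let Some(data) = pending_writes.front() {
        assert!(data.len() > send_buf.available());
    }
}
```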
where did this bug come from?
I believe it is because the receive buffer was written first, where having a limit makes sense (for sending the final packet), but then the code was copy-pasted to create the send buffer. The limit was kept even though it had no use case there, yet it still had checks, which then caused the stall.
How was the bug fixed?
By removing the availability/size limit from the send buffer, and hence the unneeded checks, resolving the stalled-send bug.
To reiterate, there isn't a reason why we should have that limit for the send buffer. There is already code that handles windowing packets.
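For illustration, a minimal sketch of the fix as described here, with simplified types rather than the actual diff: the drain loop moves every pending write into the send buffer unconditionally and leaves pacing to the existing packet-windowing code.

```rust
use std::collections::VecDeque;

// Send buffer without a fixed capacity: it only stages bytes, while the
// existing windowing logic decides how much actually goes on the wire.
struct SendBuf {
    data: Vec<u8>,
}

impl SendBuf {
    fn write(&mut self, data: &[u8]) {
        self.data.extend_from_slice(data);
    }
}

fn drain(pending_writes: &mut VecDeque<Vec<u8>>, send_buf: &mut SendBuf) {
    // No availability check and no `else { break }`, so an oversized write can
    // no longer stall the drain loop.
    while let Some(data) = pending_writes.pop_front() {
        send_buf.write(&data);
    }
}

fn main() {
    let mut pending: VecDeque<Vec<u8>> = VecDeque::new();
    pending.push_back(vec![0u8; 1_500_000]);
    let mut send_buf = SendBuf { data: Vec::new() };
    drain(&mut pending, &mut send_buf);
    assert_eq!(send_buf.data.len(), 1_500_000);
}
```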