Full send_buf prevents sending of FIN #44
can you share calling code to reproduce?
Yup, I basically just created a new cargo project and this is the code:

```rust
// imports reconstructed for completeness; module paths may differ by utp crate version
use std::env;
use std::net::SocketAddr;

use utp_rs::conn::ConnectionConfig;
use utp_rs::socket::UtpSocket;

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::TRACE)
        .init();

    let args: Vec<String> = env::args().collect();
    match args[1].as_str() {
        "receiver" => receiver().await,
        "sender" => sender().await,
        _ => panic!("invalid argument"),
    }
}

async fn sender() {
    // bind a standard UDP socket. (transport is over a `tokio::net::UdpSocket`.)
    let socket_addr = SocketAddr::from(([127, 0, 0, 1], 3400));
    let udp_based_socket = UtpSocket::bind(socket_addr).await.unwrap();
    tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;

    // connect to a remote peer over uTP.
    let remote = SocketAddr::from(([127, 0, 0, 1], 3401));
    let config = ConnectionConfig::default();
    let mut stream = udp_based_socket.connect(remote, config).await.unwrap();

    // write data to the remote peer over the stream.
    let data = vec![0xef; 100_000];
    let _n = stream.write(data.as_slice()).await.unwrap();
}

async fn receiver() {
    // bind a standard UDP socket. (transport is over a `tokio::net::UdpSocket`.)
    let socket_addr = SocketAddr::from(([127, 0, 0, 1], 3401));
    let udp_based_socket = UtpSocket::bind(socket_addr).await.unwrap();

    // accept a connection from a remote peer.
    let config = ConnectionConfig::default();
    let mut stream = udp_based_socket.accept(config).await.unwrap();

    // read data from the remote peer until the peer indicates there is no data left to write.
    let mut data = vec![];
    let _n = stream.read_to_eof(&mut data).await.unwrap();
}
```

I run the experiment by running the sender via …
huh, well when I substitute 100k here, things still appear to work correctly. one thing I can think of is whether your code is running with the multi-threaded runtime. to answer your question about the send buffer, the internal representation is a `VecDeque<Vec<u8>>`.
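For intuition, here is a toy sketch (my own illustration, not the library's actual code) of a send buffer kept as a `VecDeque<Vec<u8>>`: each write is queued as its own chunk, and outgoing packet payloads are carved off the front.

```rust
use std::collections::VecDeque;

// Toy illustration of a send buffer kept as VecDeque<Vec<u8>> (not the library's
// actual code): each write is queued as its own chunk, and outgoing packet
// payloads are carved off the front.
struct SendBuf {
    chunks: VecDeque<Vec<u8>>,
}

impl SendBuf {
    fn write(&mut self, data: &[u8]) {
        self.chunks.push_back(data.to_vec());
    }

    // take up to `max` bytes for the next outgoing packet
    fn next_payload(&mut self, max: usize) -> Option<Vec<u8>> {
        let mut chunk = self.chunks.pop_front()?;
        if chunk.len() > max {
            // put the remainder back at the front so ordering is preserved
            let rest = chunk.split_off(max);
            self.chunks.push_front(rest);
        }
        Some(chunk)
    }

    fn is_empty(&self) -> bool {
        self.chunks.is_empty()
    }
}

fn main() {
    let mut buf = SendBuf { chunks: VecDeque::new() };
    buf.write(&[0xef; 2500]);
    while let Some(payload) = buf.next_payload(1000) {
        println!("packet payload of {} bytes", payload.len());
    }
    assert!(buf.is_empty());
}
```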
👍 on send buffer. Yeah, I also substituted 100k into that test and everything worked. I was running with the multi-threaded runtime. In #45 I explained some weird behavior in my terminal that I observed. And since in that pr we have a single test to verify against, which is easier to debug, I'll revisit this issue once we resolve whatever's going on in that pr.
I believe your fix to this test shows that utp manages to transfer 100k bytes without problem, right? @njgheorghita
There is a stall bug I found in the library which prevents all writes if a large enough vector is submitted. Here is a PR which resolves it: #56
another curious thing with this …
@emhane IIUC, rust runs all tests concurrently.... this could be the source of the issue? If so I do think that would confirm the flakiness of this library w/ handling concurrent txs. In trin we use the serial test library in situations where we want to control the order in which tests are run. It'd be interesting to know whether the second test still fails if you run them serially.
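A minimal sketch of that, assuming "the serial test library" refers to the `serial_test` crate (test names here are placeholders):

```rust
// Sketch: the #[serial] attribute forces these tests to run one after the other
// instead of in parallel, which helps rule out cross-test interference.
#[cfg(test)]
mod tests {
    use serial_test::serial;

    #[tokio::test]
    #[serial]
    async fn utp_transfer_test_one() {
        // ... first uTP transfer scenario ...
    }

    #[tokio::test]
    #[serial]
    async fn utp_transfer_test_two() {
        // ... second uTP transfer scenario ...
    }
}
```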
I am running them separately. Clone the branch and you tell me what's up? @njgheorghita
I believe the read is the issue: the read loop blocks.
Idk.. cloned it and saw the same behavior you're describing. However, in test 2, if I … This could be evidence that … Furthermore, I've only experienced bugs when trin uses … No coherent thoughts here, just thinking out loud.
I am running the test suite. This is my test branch (nothing is modified except adding a few debug logs and the test suite).
The issue appears to be an incorrect handling of overflow. The important part to look for is how …

Here is a demo of the overflow bug: 65535 passes and 65536 fails.
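To illustrate the failure mode (my own sketch, not the library's code): a plain `start <= ack < end` comparison on `u16` sequence numbers works until the counter wraps at 2^16, after which it starts rejecting valid acks, while wrapping arithmetic keeps the check correct.

```rust
// Illustrative sketch (not the library's actual code) of a range check on u16
// sequence numbers breaking once the counter wraps at 2^16.
fn naive_in_range(start_seq: u16, next_seq: u16, ack: u16) -> bool {
    start_seq <= ack && ack < next_seq // only valid while no wrap-around has happened
}

fn wrapping_in_range(start_seq: u16, next_seq: u16, ack: u16) -> bool {
    // distances computed with wrapping arithmetic stay correct across the wrap
    ack.wrapping_sub(start_seq) < next_seq.wrapping_sub(start_seq)
}

fn main() {
    // 65535 packets starting at seq 0: both checks agree
    assert!(naive_in_range(0, 65535, 65534));
    assert!(wrapping_in_range(0, 65535, 65534));

    // once the sequence number wraps (window starts near 65535 and continues at 0, 1, ...),
    // the naive comparison rejects a perfectly valid ack
    assert!(!naive_in_range(65530, 5, 2));
    assert!(wrapping_in_range(65530, 5, 2));
}
```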
I will echo a comment I left on #56: the send buffer capacity should be configurable to account for the maximum payload length that you could conceivably transfer in your program. this is why I asked nick in #45 about the size of the payloads. also mentioned there, the send buffer currently has a fixed capacity. that capacity should be made configurable. any attempted writes beyond that capacity will fail due to the issue described in #56.
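As a sketch of that suggestion (field name and default are hypothetical, not the current API), the capacity would become a knob on the connection config, sized to the largest payload the application expects to write:

```rust
// Hypothetical sketch only; the current ConnectionConfig does not expose this field.
// The idea is that the send buffer capacity becomes a tunable, and writes beyond it
// fail loudly instead of stalling the connection.
pub struct ConnectionConfigSketch {
    pub max_send_buf_bytes: usize,
    // ... the existing knobs (timeouts, window sizes, etc.) would remain here
}

impl Default for ConnectionConfigSketch {
    fn default() -> Self {
        // assumption: 1 MiB default, large enough for typical payloads
        Self { max_send_buf_bytes: 1024 * 1024 }
    }
}
```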
I believe the root of the problem is that most of the code is sync, i.e. blocking, and sometimes called kind of recursively, like `process_writes` calling `process_writes` again through the indirection. What happens now is that …

If the sender and the receiver are on different machines, the program won't go into a deadlock for unidirectional streams so long as packets aren't lost too often, i.e. as long as the sender gets the ACKs so it can finish writing and move on with the next thing. The first fix may hence be to make sure a failed connection is handled properly, so that another connection is established within the same stream if the connection fails (as in an unreliable network), without losing the stream state.

To allow concurrent streams to work smoothly, each stream can be run in a blocking thread, which means all the streams can make progress all the way to the end concurrently. Otherwise we get some weird idle timeouts: each stream will write/read till it's done before another can proceed, so when running many streams concurrently in the same program, some unlucky streams will always wait too long for the other streams to finish a write or a read and will time out.

Another way, as most functions called from, for example, … Possibly we want to try some of these approaches out before making the whole code called from Connection …
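A rough sketch of the "one blocking thread per stream" idea, with placeholder types standing in for the real stream and its write/read loop (this is not the library's code):

```rust
use tokio::task;

// Placeholder standing in for the real uTP stream type (assumption, not the library API).
struct StreamPlaceholder(u32);

// Placeholder per-stream work; in the real library this would be the stream's
// write/read loop running to completion.
async fn handle_stream(s: StreamPlaceholder) {
    println!("stream {} finished", s.0);
}

// Sketch of "run each stream on its own blocking thread" so one stream's progress
// doesn't gate the others on a shared executor.
async fn serve_streams(streams: Vec<StreamPlaceholder>) {
    let handle = tokio::runtime::Handle::current();
    let mut joins = Vec::new();
    for stream in streams {
        let handle = handle.clone();
        // spawn_blocking moves the work onto tokio's blocking thread pool;
        // block_on is acceptable there because it is not an async worker thread
        joins.push(task::spawn_blocking(move || handle.block_on(handle_stream(stream))));
    }
    for join in joins {
        let _ = join.await;
    }
}

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    serve_streams((0u32..4).map(StreamPlaceholder).collect()).await;
}
```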
This seems quite promising! Nice digging! I just ran some local tests and observed that the bug in #55 occurs when the window is …
I was under the impression that a stream and a connection have a 1:1 relationship. @emhane Given the observed fact that trin -> fluffy utp txs work flawlessly, how would you reconcile that with this hypothesis? It would appear to me as though the blocking (or at least the problematic blocking) occurs on the receiver side, in some way preventing them from sending ACKs. Curious to hear @jacobkaufmann's thoughts on this.
it will take some more time for me to go through emilia's analysis, but I can confirm there is a 1:1 relationship between stream and connection. the connection is essentially the state machine, and the stream is the interface to the underlying connection.
@njgheorghita yes precisely, 1-1 rn stream-connection, I didn't state otherwise but opened for exploration of 1-*. this is my understanding of the relationship between components. that fluffy-trin works but not trin-trin is definitely interesting, how much data are you sending in those tests? great, @jacobkaufmann, I'm sure with all our heads together we can send 1 GB within weeks.
this is also quite curious and also points in the direction that there is something blocking with the receiving: when I pass the expected eof to a read, the second test that failed for us @njgheorghita works...
What do you mean by send 1 GB? Send 1 GB during a connection, or send 1 GB in total across multiple connections?
@KolbyML on one stream, so that would mean chunking any larger data into chunks of 1 MB and opening a new connection for each chunk, serially I guess (see the sketch below), but sending them all in parallel from one stream would be cool - then the data would arrive in no time 😎

as for why the first test works and the second not: a stream must be manually shut down or dropped for a FIN to be sent and a write and read to be successful.
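A sketch of the serial chunking idea, reusing the socket/connect API from the repro code earlier in the thread (imports as in that snippet); treating `close()` as the call that triggers the FIN is an assumption based on the note above that a stream must be shut down or dropped:

```rust
// Sketch only: chunk a large payload and open one connection per chunk, serially.
const CHUNK_SIZE: usize = 1024 * 1024; // 1 MB per connection

async fn send_in_chunks(socket: &UtpSocket<SocketAddr>, remote: SocketAddr, data: &[u8]) {
    for chunk in data.chunks(CHUNK_SIZE) {
        let config = ConnectionConfig::default();
        let mut stream = socket.connect(remote, config).await.unwrap();
        stream.write(chunk).await.unwrap();
        // shut the stream down so the FIN goes out and the receiver's read_to_eof returns
        // (assumed API; dropping the stream is the alternative mentioned above)
        stream.close().await.unwrap();
    }
}
```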
😎 We can do that after we resolve the halt/deadlock bug I found and after we fix the overflow bug I found (I am working on a PR to fix both). I see you proposed a solution for the halt/deadlock bug, and I left a comment with an issue I think your solution has. But yeah, it should be doable to send 1 GB soon after the overflow bug is fixed. I think we should avoid making new connections serially, and just fix the bug, but yeah! Parallel connections probably wouldn't make transfers faster and would add complexity
I found all the same bugs on my end, naturally, because they are the bugs blocking sending more data.
sure, I like your attitude. let's squash 'em all.
it would speed up the transfer, for example, when other data is transferred in one other window too, as then the transfer would be allowed 2/3 separate windows instead of 1/2 windows. but this is not where we start, you're absolutely right. but if all connections do that, try to double, triple, etc. their windows, then there is no gain, you're quite right.
a little bit of complexity but nothing new. like the congestion control model used again for multiple streams more or less, to sort the streams at destination. however it would require all clients to interpret the data in a special way, which is a protocol change, and those take time to get consensus on, so nvm for this case. #62, how will that buffer increase the amount available to transfer? does it mask the seq numbers or smthg like that so we get more values from a u16? I couldn't find the reference implementation.
Of course people using the same buggy library would experience the same bugs
These are considered the reference implementations: … If you want something simpler to read but highly similar to the reference implementations, read fluffy's implementation. For the current problem, I think diagrams will help explain this 😎
I think to understand this, it will help for me to explain how this implementation and the reference are different.

### how ethereum/utp works

This implementation works by starting a vector with the first data packet we sent and adding every outgoing packet to this vector till we have stored 2^16 sent packets. We can check if a packet we sent was acked because we store that in every packet struct as data (we shouldn't do this). We can also check if the ack_num the receiver sent is correct by checking if it is within the range …

I will now do a demo image of how ethereum/utp works:
(demo image) Our outgoing_buffer (`sent_packets.packets` in the code) will start here ^
### Problems?

…

### how reference uTP implementations and TCP work
We don't need more values, since our outgoing buffer only has to track the cur_send_window worth of packets: if a packet is confirmed received we no longer need to track it, instead of growing a list of everything ever sent. The reference implementations use a sliding window, which is a window of packets not yet confirmed to be received. This is all the information we need. Let's say there are currently 2k sent but unacked data packets at all times. Imagine this is our u16 sequence number index and we happened to start at 0 (this is random normally):
(demo image) Our sliding window will start here ^. The picture above is us sending 2^16 + a few data packets, but as we could imagine this could go on forever (this is how TCP/uTP works).

### Final Notes

I hope I explained things well; if not, please ask more questions :). The sliding window is what I want to implement; this will allow us to send infinite data just like a TCP connection would. I hope this answers what #62 is trying to achieve (of course it will be a multi-part PR). Increasing the size of …
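A minimal sliding-window sketch along those lines (illustrative only, not the planned implementation): only sent-but-unacked packets are kept, and the `u16` sequence number wraps, so a connection is no longer capped at 2^16 packets in total.

```rust
use std::collections::VecDeque;

// Illustrative sliding window: only sent-but-unacked packets are stored, and the
// u16 sequence number wraps instead of capping the connection at 2^16 packets.
struct SlidingWindow {
    next_seq: u16,                       // sequence number for the next packet to send
    in_flight: VecDeque<(u16, Vec<u8>)>, // sent-but-unacked packets, oldest first
}

impl SlidingWindow {
    fn send(&mut self, payload: Vec<u8>) -> u16 {
        let seq = self.next_seq;
        self.in_flight.push_back((seq, payload));
        self.next_seq = self.next_seq.wrapping_add(1); // wraps past 65535 by design
        seq
    }

    // a cumulative ack releases every in-flight packet up to and including ack_num
    fn on_ack(&mut self, ack_num: u16) {
        while let Some((seq, _)) = self.in_flight.front() {
            // wrapping distance keeps the comparison valid across the 65535 -> 0 wrap
            if ack_num.wrapping_sub(*seq) < u16::MAX / 2 {
                self.in_flight.pop_front();
            } else {
                break;
            }
        }
    }

    fn in_flight_count(&self) -> usize {
        self.in_flight.len()
    }
}

fn main() {
    let mut window = SlidingWindow { next_seq: 65530, in_flight: VecDeque::new() };
    // send well past the u16 boundary: the window only ever holds the unacked tail
    for i in 0u32..70_000 {
        let seq = window.send(vec![0u8; 10]);
        if i % 2000 == 1999 {
            window.on_ack(seq); // ack everything sent so far
        }
    }
    println!("in flight after 70k sends: {}", window.in_flight_count());
}
```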
I've spent some time playing around with this library. Basically, I've been running a bare simple app locally that creates a sender & receiver utp socket and tries to tx 100000 bytes. This transfer has failed consistently - while it might be user error, I'm fairly confident I'm using the library in the way it's intended...
The repeated behavior I'm seeing is that the sender/initiator processes the tx successfully and closes the stream - however the recipient never fully processes the stream (via `read_to_eof`) and hangs until it times out.

It appears to me as though a `fin` is never sent from the sender -> recipient. This is because the sender never clears its `send_buf`. The evaluations here and here always fail since `send_buf` is never emptied. So the fin is never sent. Something appears to be happening here where the `send_buf` is not properly cleared.

I still have some understanding to improve upon, so I'm not sure if these thoughts make any sense.

- `send_buf` and `pending_writes`: …
- the `pending_writes` queue seems to empty correctly, but the `send_buf` never fully clears... porque?
- why is `send_buf` a `VecDeque<Vec<u8>>` and not just `VecDeque<u8>`?

But from what I can tell from experimenting with this library, it's consistent behavior that's preventing the sending of the `fin` packet. Mostly just leaving this here so that I can try and organize my thoughts & I'll pick it back up after the weekend, but if anybody has any clues / ideas / pointers on what I'm misunderstanding - that's definitely appreciated.