-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HyStart implementation Linux vs. picoquic #1694
Comments
No need to be sorry for posting issues! About the 'filtered min'. The idea there is to remove the delay jitter common in wireless links, see https://www.privateoctopus.com/2019/11/11/implementing-cubic-congestion-control-in-quic/. So instead of comparing max to the "recent max", it is compared to the "recent min", trying to minimize the effect of jitter. The threshold is a bit arbitrary. Neither 1/4th or 1/8th is based on science. Too low and you get false positives and exit too early, too high and you wait too long and cause queue overflow. I guess what is really needed is to implement Hystart++. |
I have been thinking about this "hystart++" problem for a while. Of course, the first step would be to implement RFC 9406. But if the goal is "proceed safely and find the limit", there may be a bit more research to do. My main concern is that when the node probes for 25% larger window, it will only see the results after 2 RTT, by which time another 25% increase will have been started. During that second RTT, we might get a window 50% larger than ideal. That will probably not cause packet losses if the buffers are ready to absorb at least 1 BDP, but it will be creating queues. Maybe we could lift some ideas from BBR and prevent that. |
I already have implemented RFC 9406 in one of my branches. (https://github.com/hfstco/picoquic/tree/hystart%2B%2B) I want to compare HyStart and HyStart++ for high BDP connections. |
@hfstco Any progress? Do you need help? |
"Short" update from my side: RFC 9406 is currently implemented and works in CUBIC and NEWRENO. But I'm not entirely satisfied with the results. Short explanation of my code:
Check whether a transition from CSS to SS and vice versa is necessary.
Following the procedure for every ACK:
My observations:These are all just individual tests and my interpretation of them. Unfortunately, I don't currently have the time to run through and evaluate several iterations.
But I cannot confirm this at the moment. Several iterations may give more reliable results.
To discuss:
Ok, enough for today. Maybe it is a good place to start discussing. |
First, that's great work. Yes, the performance comparison between Hystart and Hystart++ is inconclusive, but that in itself is an interesting result. You should find a way to get this published -- at a minimum an internal report, but maybe in a place that takes practical papers. CCA? Or one of the SIGCOM conferences? My goal with "a better slow start" is to find a way to reach adequate performance quickly, while also avoiding building large queues and causing packet losses. The basic Hystart does a pretty good job here, once we filter the RTT samples to take out the jitter, but all exponential schemes suffer from the lag effect: Suppose that at time X we have a window that's "just right", but we do not know that yet. The delay measurements up to X + 1 RTT will reflect the old value of the congestion window (or pacing rate), which what less than the "just right" value. No queue is being built yet. The measurements between X+1 and X+2 will reflect the increased pressure on the bottleneck: delay increases, maybe we get L4S style ECN/CE marks. The sooner the sender notices that, the better, but in any case, the delay increases will be observed until X+2, which probably arrives 3 min-RTT after X because of the increased queues. At that point, the delay is 2 min RTT, and the CE/ECN marks may be coming in at a fast pace. If the bottleneck buffer is not larger than the BDP, packets will be lost. What we want is, start reducing the window as soon as we detect something wrong. Of course, there is a tradeoff. Stopping soon means reacting on weaker signals than waiting for the delay to climb, so there is always a risk of stopping too soon. That's why I like the idea behind Hystart++: stop increasing quickly, but then do a safe growth to perform needed corrections. What you found out is that the spec in RFC 9406 does not really deliver that. Reading that RFC, it is unnecessarily complicated. It uses time constants, divisors, etc. I think a simpler spec would be:
There are a couple of signals that I would like to throw in. One is bandwidth measurement (same idea as BBR). In slow start, this is particularly useful to cope with limited senders. The bandwidth measurement let us compute a "min BDP" as: minRTT times max bandwidth measurement. In Slow Start, we should cap the max CWND at 2 times minBDP. In controlled growth, cap at 1.25 times minBDP. BBR adds to that "exit if the bandwidth measurement does not increase for 3 RTT", but that has interesting interactions with limited senders -- in practice, measuring delays provides an equivalent signal, without the limited sender noise. Everything gets easier if we have an idea of the underlying rate or underlying BDP. We can in particular exit slow start if the CWND is larger than 80 or 90% of the target bandwidth, and let controlled growth finish that. |
On your specific questions:
|
We just compared the Linux and picoquic HyStart implementation.
https://github.com/torvalds/linux/blob/ea5f6ad9ad9645733b72ab53a98e719b460d36a6/net/ipv4/tcp_cubic.c#L427-L445
picoquic/picoquic/cc_common.c
Line 138 in c306ab4
We wondered to which extend the picoquic implementation is aligned with the Linux code.
Two specific questions:
picoquic/picoquic/cc_common.c
Lines 150 to 151 in c306ab4
Why is
rtt_track->rtt_filtered_min
compared/set tortt_track->sample_max
and notrtt_track->sample_min
?In Linux, the delay threshold is
ca->delay_min >> 3
while in picoquic there isdelta_max = rtt_track->rtt_filtered_min / 4
. We wondered why these dividers are different.PS: Sorry for posting three issues in a row :-)
The text was updated successfully, but these errors were encountered: