
header/timeout has none or little sense #15

Open
libor-peltan-cznic opened this issue May 30, 2024 · 4 comments

Comments

@libor-peltan-cznic

"timeout": {
    "description": "The network timeout used in the generator, in fractional seconds",
    "type": "number"
}

What does this actually mean? Unless the transport is UDP, there are various timeouts that can be used, and they usually differ, e.g. handshake timeout, I/O timeout, idle timeout...

I suggest either removing this or extending it to accommodate various kinds of timeout information.

@nicki-krizek

I think the most important timeout is the user timeout, i.e. the amount of time from the moment the client starts processing a query until it receives the answer.

But I agree other timeouts might be useful as well. DNS Shotgun also uses:

  • handshake_timeout: the amount of time after which an attempt to finish a handshake is abandoned
  • idle_timeout: the amount of time for which the established connection remains open even if there are no queries sent (applies to stateful protocols)

Please note that the handshake and idle timeouts are parameters which might differ for each DNS Metrics object. For example, one subid sender might use an idle_timeout of 0 to simulate clients which aggressively close connections as soon as they get an answer, while another subid sender could use a value like 10 to simulate better-behaved clients.

Putting these into a header might be too limiting. Perhaps the header could contain definitions of various timeout configurations which could then be referenced in the DNS Metrics object?
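A minimal sketch of that idea, assuming a hypothetical `timeout_profiles` mapping in the header and a hypothetical `timeout_profile` field in each DNS Metrics object that names one of those profiles. None of these names exist in the current schema; the `idle_timeout` values 0 and 10 follow the example above, while the other numbers are purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TimeoutProfile:
    """One named timeout configuration; all values in fractional seconds."""
    user_timeout: float                        # query start until the answer is received
    handshake_timeout: Optional[float] = None  # None: not applicable (e.g. plain UDP)
    idle_timeout: Optional[float] = None       # stateful protocols only


@dataclass
class Header:
    # Hypothetical field: profiles are defined once, in the header.
    timeout_profiles: Dict[str, TimeoutProfile]


@dataclass
class DnsMetrics:
    subid: str
    timeout_profile: str  # hypothetical: name of a profile defined in the header


def resolve_profile(header: Header, metrics: DnsMetrics) -> TimeoutProfile:
    """Return the timeout profile referenced by a DNS Metrics object."""
    try:
        return header.timeout_profiles[metrics.timeout_profile]
    except KeyError:
        raise ValueError(f"unknown timeout profile {metrics.timeout_profile!r}") from None


header = Header(timeout_profiles={
    # idle_timeout values follow the discussion; the other numbers are illustrative.
    "aggressive": TimeoutProfile(user_timeout=2.0, handshake_timeout=1.0, idle_timeout=0.0),
    "well_behaved": TimeoutProfile(user_timeout=2.0, handshake_timeout=1.0, idle_timeout=10.0),
})
senders = [DnsMetrics("a", "aggressive"), DnsMetrics("b", "well_behaved")]
for m in senders:
    print(m.subid, resolve_profile(header, m).idle_timeout)
```

Defining each profile once and referencing it by name avoids repeating the full set of timeouts in every DNS Metrics object, while still letting different subid senders behave differently.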

@libor-peltan-cznic
Author

In practice, for any kind of latency we measure, we can equivalently impose a timeout (i.e. a latency ceiling beyond which the operation counts as some kind of failure).
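As a rough illustration of that equivalence, here is a sketch assuming hypothetical phase names (`handshake`, `query`) and per-phase ceilings in seconds: each measured latency becomes a timeout simply by attaching a ceiling to it.

```python
from typing import Dict, Optional


def first_timed_out_phase(latencies: Dict[str, float],
                          ceilings: Dict[str, float]) -> Optional[str]:
    """Return the first phase whose measured latency exceeded its ceiling, or None.

    Phases are checked in the order given in `latencies` (dicts keep insertion order).
    A phase without a configured ceiling never times out.
    """
    for phase, latency in latencies.items():
        ceiling = ceilings.get(phase)
        if ceiling is not None and latency > ceiling:
            return phase
    return None


# Hypothetical measurements for one query, in seconds.
measured = {"handshake": 0.120, "query": 0.045}
print(first_timed_out_phase(measured, {"handshake": 1.0, "query": 2.0}))  # None
print(first_timed_out_phase(measured, {"handshake": 0.1, "query": 2.0}))  # handshake
```

The point is that a timeout adds no new kind of measurement: once the set of measured latencies is well defined, each timeout is just a threshold on one of them.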

What I think we could do is to remove this header/timeout for now, first design a robust system of measured latencies, and only after that think of a system for declaring timeouts. Does anyone have a better idea?

@pspacek

pspacek commented May 30, 2024

Generally I agree, but I don't have a good idea of how to express it without endless repetition.

My view: The "timeout" value is mostly used for interpreting the data. If I have data like this:

  • requests sent = 1000
  • latency histogram
    • under 50 ms = 990 responses
    • under 100 ms = 9 responses
    • one request is unaccounted for, and the timeout value of 1000 tells me that it was either a packet drop or a response slower than 1000 time units
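The arithmetic behind that interpretation is simple; here is a minimal sketch assuming the histogram is a mapping from bucket upper bound (ms) to response count, a representation chosen only for illustration:

```python
def unaccounted(requests_sent: int, histogram: dict) -> int:
    """Requests with no response recorded in any latency bucket."""
    answered = sum(histogram.values())
    if answered > requests_sent:
        raise ValueError("more responses than requests")
    return requests_sent - answered


# Bucket upper bounds in ms -> response counts, as in the example above.
hist = {50: 990, 100: 9}
timeout = 1000  # same time units as the buckets
missing = unaccounted(1000, hist)
print(f"{missing} request(s) dropped or slower than {timeout}")
# prints "1 request(s) dropped or slower than 1000"
```

Without the timeout value, the missing request could only be described as "no answer recorded"; with it, the reader knows the upper bound below which any answer would have been counted.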

Timeouts are a can of worms because it's also debatable when you start the timer, etc. For example, if a load simulator like Shotgun generates "traffic like from real users", it probably wants to measure end-to-end latency for individual queries, i.e. start the timer when "the user wanted to send the query" and stop it only after receiving a response. In this case the latency/timeout would include potential TCP/TLS/DoH session setup, etc.

Another tool might want to measure DNS request/response latency and exclude connection setup from that. 🤯
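To make the difference concrete, here is a sketch assuming a hypothetical per-query timeline with three timestamps from one monotonic clock; the timestamp names and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class QueryTimeline:
    # Hypothetical timestamps in seconds, all from one monotonic clock.
    wanted: float             # the simulated user decided to send the query
    query_sent: float         # query written to an established connection
    response_received: float  # DNS response arrived

    def end_to_end(self) -> float:
        """Includes any TCP/TLS/DoH session setup before the query could be sent."""
        return self.response_received - self.wanted

    def request_response(self) -> float:
        """Excludes connection setup: DNS request to DNS response only."""
        return self.response_received - self.query_sent


t = QueryTimeline(wanted=0.000, query_sent=0.080, response_received=0.100)
print(round(t.end_to_end(), 3), round(t.request_response(), 3))  # 0.1 0.02
```

With the same 50 ms timeout, this one query would count as timed out under the end-to-end definition (100 ms) but as a fast answer under the request/response definition (20 ms), which is why a bare timeout value is hard to interpret.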

@nicki-krizek

The "timeout" value is mostly used for interpreting the data.

It shouldn't be; at least that's not the case in Shotgun. Timed-out queries should be accounted for by being counted in the very last latency bucket.
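A minimal sketch of that accounting, assuming sorted bucket upper bounds and `None` as a hypothetical marker for a query that never got an answer. It only illustrates the idea and is not Shotgun's actual code: unanswered queries share the final overflow bucket with answers slower than the last bound, so the bucket counts always add up to the number of queries sent.

```python
import bisect
from typing import List, Optional, Sequence


def build_histogram(latencies: Sequence[Optional[float]],
                    bounds: Sequence[float]) -> List[int]:
    """Count latencies into buckets [0, b0], (b0, b1], ..., plus a final overflow bucket.

    `bounds` must be sorted ascending. `None` marks a query that timed out; it is
    counted in the final bucket together with answers slower than the last bound.
    """
    counts = [0] * (len(bounds) + 1)
    for lat in latencies:
        if lat is None:
            counts[-1] += 1
        else:
            # bisect_left puts a latency equal to a bound into that bound's bucket.
            counts[bisect.bisect_left(bounds, lat)] += 1
    return counts


samples = [10.0] * 990 + [70.0] * 9 + [None]  # latencies in ms; None = timed out
hist = build_histogram(samples, [50.0, 100.0])
print(hist, sum(hist))  # [990, 9, 1] 1000
```

Because nothing falls outside the histogram, the reader does not need a separate timeout value to reconcile the totals.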

What I think we could do is to remove this header/timeout for now, first design a robust system of measured latencies, and only after that think of a system for declaring timeouts.

Agreed. It's better to have no way of representing timeouts for now than to have a vague unclear value in a place where it might not belong.

Timeouts are a can of worms because it's also debatable when you start the timer, etc.

Right, the responses in latency buckets might mean different things for different tools. I can't think of a way around that other than to have some optional metadata field in the header which would specify what the query latencies actually represent.
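One way such an optional field could look, sketched with a hypothetical `latency_start` header key and two hypothetical values matching the two interpretations discussed above. Neither the key nor the values are part of the schema.

```python
from enum import Enum
from typing import Optional


class LatencyStart(Enum):
    """Hypothetical values for an optional header field saying when the timer starts."""
    USER_REQUEST = "user_request"  # end-to-end: includes TCP/TLS/DoH session setup
    QUERY_SENT = "query_sent"      # DNS request/response only: excludes connection setup


def parse_latency_start(header: dict) -> Optional[LatencyStart]:
    """Read the optional field; absent means "unspecified", unknown values are rejected."""
    value = header.get("latency_start")
    if value is None:
        return None
    return LatencyStart(value)  # raises ValueError for unrecognised values


print(parse_latency_start({"latency_start": "query_sent"}))  # LatencyStart.QUERY_SENT
print(parse_latency_start({}))                               # None
```

Keeping the field optional lets existing producers stay valid, while a closed set of values stops each tool from inventing its own free-form description.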

3 participants