Add deterministic simulation testing #90

caspark · 2024-12-14T14:11:56Z

What

Add deterministic simulation testing to verify correctness of ggrs logic with (ideally) all possible combinations of parameters - or at least with commonly used sets of parameters.

Why

This library has a lot of fiddly logic, yet it needs to be 100% correct to avoid crashes or desyncs - any failure basically breaks a multiplayer game session.

While there are some automated tests, the current automated test coverage is pretty minimal (probably partly because writing automated tests for this stuff is a pain), so when hacking on this library I've been doing a lot of manual testing.

But it's hard to remember the test cases for manual testing, and the number of available options and network conditions results in a combinatorial explosion of possibilities to test (or at least consider):

- network conditions: zero/low/high latency, none/some/high packet loss, out-of-order message delivery
- various player counts (I usually test with 1, 2, 3 - and sometimes 8), single or multiple local players
- p2p, synctest, spectator sessions
- the various configuration options that impact logic: input delay, prediction window size/lockstep mode, sparse saving, desync detection (at different intervals)
- edge cases like disconnect-caused-by-network, disconnect-caused-by-api, etc
- adversarial input: bad clients specifically trying to crash other clients' games

On the other hand, the observable behavior for each session should always obey certain obvious and easy to check invariants - so it seems like if we can model the network-quality driven events (message delays, loss and reordering) then we should be able to write automated tests that verify those invariants are upheld - which in turn would make it easier to add features and fix bugs.

How

Here's how I'd go about it:

First, remove nondeterminism from core ggrs logic. Currently the only sources are:

- rand, for setting the per-client magic (and doing the synchronizing process).
  - It should be possible to (optionally) pass in 6 random numbers at session construction time, so that rand is not called. Or maybe we can parameterize the rand impl via the GgrsConfig, which would also allow folks to swap out the random implementation with another one (such as the lighter-weight oorandom).
- instant, used for fetching the current time. That in turn is used for figuring out when to send quality reports to clients, working out when clients should be marked disconnected, and shutting down the udp socket when we manually disconnect a player.
  - Needs investigation to figure out how best to make this deterministic. Should an elapsed duration be passed in at the top level of advance_frame()? Should GgrsConfig gain a Clock associated type parameter? Should sessions use an internal 'tick-based' clock that advances automatically whenever advance_frame() is called? Etc.
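To make the clock options a bit more concrete, here is a minimal sketch of the "inject a clock" variant, using only the standard library. Every name in it (Clock, SystemClock, ManualClock, IntervalTimer) is hypothetical and not part of the ggrs API; IntervalTimer just stands in for any time-driven decision such as "send a quality report every N milliseconds".

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical abstraction over "how much time has passed?".
pub trait Clock {
    /// Time elapsed since the clock was created.
    fn elapsed(&self) -> Duration;
}

/// Production clock backed by the OS; this is the nondeterministic one.
pub struct SystemClock {
    start: Instant,
}

impl SystemClock {
    pub fn new() -> Self {
        Self { start: Instant::now() }
    }
}

impl Clock for SystemClock {
    fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }
}

/// Test clock that only moves when the test tells it to.
pub struct ManualClock {
    now: Cell<Duration>,
}

impl ManualClock {
    pub fn new() -> Self {
        Self { now: Cell::new(Duration::ZERO) }
    }

    /// Move simulated time forward by `by`.
    pub fn advance(&self, by: Duration) {
        self.now.set(self.now.get() + by);
    }
}

impl Clock for ManualClock {
    fn elapsed(&self) -> Duration {
        self.now.get()
    }
}

/// Hypothetical stand-in for a time-driven decision inside a session.
pub struct IntervalTimer {
    interval: Duration,
    last_fired: Duration,
}

impl IntervalTimer {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_fired: Duration::ZERO }
    }

    /// Returns true (and re-arms) once at least `interval` has passed
    /// since the last time this returned true.
    pub fn poll(&mut self, clock: &impl Clock) -> bool {
        let now = clock.elapsed();
        if now.saturating_sub(self.last_fired) >= self.interval {
            self.last_fired = now;
            true
        } else {
            false
        }
    }
}
```

With a ManualClock, a test decides exactly when the timer fires, so the same sequence of advance() calls always produces the same behavior. A tick-based clock driven by advance_frame() would look much the same, with the session itself calling advance().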

Then, pick a property-based testing framework and implement some tests! proptest is probably the best bet, but other options are arbtest (using Arbitrary might allow fuzzing down the line?) or quickcheck (generates inputs faster than proptest but doesn't shrink inputs as nicely).

Thirdly, implement:

- the appropriate Arbitrary trait for the 3 ggrs session types as well as some player input (pretty easy)
- a mocked network socket whose characteristics/reliability can be predetermined via an Arbitrary (e.g. "base latency X milliseconds", "drops 3rd packet and 2 packets thereafter", etc - somewhat tricky)
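As a rough illustration of the mocked network idea, here is a std-only sketch of a one-directional link whose behavior is fixed up front by a plan of per-packet fates. The names (Fate, SimLink) are made up for this example, and it deliberately does not implement any ggrs socket trait; wiring it into a real session and generating the plan from an Arbitrary are left out.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// What the simulated network does with one sent packet (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fate {
    /// Deliver the packet after this many ticks.
    Deliver { delay: u64 },
    /// Silently lose the packet.
    Drop,
}

/// A one-directional simulated link driven by a predetermined plan.
pub struct SimLink {
    plan: Vec<Fate>,
    sent: usize,
    now: u64,
    // Min-heap on (delivery tick, send order, payload); the send order
    // breaks ties so that delivery order is fully deterministic.
    in_flight: BinaryHeap<Reverse<(u64, usize, Vec<u8>)>>,
}

impl SimLink {
    pub fn new(plan: Vec<Fate>) -> Self {
        Self { plan, sent: 0, now: 0, in_flight: BinaryHeap::new() }
    }

    /// Send a packet. Its fate is the next entry of the plan; packets sent
    /// after the plan runs out are delivered with a delay of one tick.
    pub fn send(&mut self, payload: Vec<u8>) {
        let fate = self
            .plan
            .get(self.sent)
            .copied()
            .unwrap_or(Fate::Deliver { delay: 1 });
        if let Fate::Deliver { delay } = fate {
            self.in_flight
                .push(Reverse((self.now + delay, self.sent, payload)));
        }
        self.sent += 1;
    }

    /// Advance simulated time by one tick and return every packet now due.
    pub fn tick(&mut self) -> Vec<Vec<u8>> {
        self.now += 1;
        let mut due = Vec::new();
        while let Some(Reverse((at, _, _))) = self.in_flight.peek() {
            if *at > self.now {
                break;
            }
            let Reverse((_, _, payload)) = self.in_flight.pop().unwrap();
            due.push(payload);
        }
        due
    }
}
```

Because each packet carries its own delay, reordering falls out for free: a packet sent later with a shorter delay arrives first. A plan is plain data, so a property-testing framework could generate and shrink it like any other input.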

The most basic actual test for a p2p session would be something like each player sends a bool as input, the "game's" advance frame simply accumulates how many trues it has received from each player, and the test verifies that the count of bools from each player exactly matches what it has submitted to ggrs.

(Of course the devil is in the details: when you consider that a player could have been disconnected from a long sequence of missed packets it becomes apparent that the network simulation needs to be carefully modeled in terms of expected failure modes so we can modify tests' assertions appropriately - but I reckon we could start with a basic case and add the other edge cases later.)
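The basic bool-counting test described above could start from a harness shaped roughly like this. It is only a skeleton under stated assumptions: there is no ggrs session in it (inputs are handed straight to the game at the line marked PLACEHOLDER), a tiny xorshift generator stands in for a property-testing framework, and all names (XorShift64, CountingGame, run_case) are invented for the example.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Self(seed.max(1))
    }

    pub fn next_bool(&mut self) -> bool {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 63) == 1
    }
}

/// The "game": counts how many `true` inputs each player has contributed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CountingGame {
    pub trues: Vec<u64>,
}

impl CountingGame {
    pub fn new(players: usize) -> Self {
        Self { trues: vec![0; players] }
    }

    /// Advance one frame given exactly one input per player.
    pub fn advance_frame(&mut self, inputs: &[bool]) {
        for (count, &input) in self.trues.iter_mut().zip(inputs) {
            *count += u64::from(input);
        }
    }
}

/// Runs one generated case and returns
/// (trues submitted per player, trues observed by the game per player).
pub fn run_case(seed: u64, players: usize, frames: usize) -> (Vec<u64>, Vec<u64>) {
    let mut rng = XorShift64::new(seed);
    let mut submitted = vec![0u64; players];
    let mut game = CountingGame::new(players);
    for _ in 0..frames {
        let inputs: Vec<bool> = (0..players).map(|_| rng.next_bool()).collect();
        for (count, &input) in submitted.iter_mut().zip(&inputs) {
            *count += u64::from(input);
        }
        // PLACEHOLDER: a real test would submit `inputs` to the sessions under
        // test and advance the game only in response to their requests.
        game.advance_frame(&inputs);
    }
    (submitted, game.trues)
}
```

The invariant is then simply that both returned vectors are equal for every generated case. Once real sessions sit in the middle, the counters would presumably have to be part of the saved and loaded game state, so that rollbacks and re-simulation do not double count.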

If we're happy with an approach along these lines(?), I can have a shot at it when I next have some time to burn on open source contributions - though probably after #89 is sorted.


caspark added the enhancement New feature or request label Dec 14, 2024
