update docs on dataformat
inodentry committed Aug 15, 2024
1 parent 49c7ac1 commit 12b8eb0
Showing 7 changed files with 210 additions and 237 deletions.
8 changes: 5 additions & 3 deletions doc/src/SUMMARY.md
@@ -26,6 +26,8 @@
- [Session Configuration](./server/sessions.md)
- [Security](./server/security.md)
- [Runtime Management](./server/management.md)
- [Technical Documentation](./tech.md)
- [Player Stream Format](./tech/dataformat-player.md)
- [Spectator/Replay Stream Format](./tech/dataformat-spectator.md)
- [MineWars DataFormat](./dataformat/intro.md)
- [File Structure](./dataformat/file.md)
- [Initialization Sequence (IS)](./dataformat/is.md)
- [Game Updates and Framing](./dataformat/frames.md)
- [Game Update Messages](./dataformat/msgs.md)
45 changes: 45 additions & 0 deletions doc/src/dataformat/file.md
@@ -0,0 +1,45 @@
# File Structure

A MineWars File contains the following, in order:
- [File Header](#file-header)
- [Initialization Sequence](./is.md)
- [Frames of Game Updates](./frames.md)

## File Header

The file header has the following structure:
- `[u64; 3]`: checksums
- `u32`: length of compressed frame data in bytes
- `u32`: length of uncompressed frame data in bytes

If the compressed length equals the uncompressed length, the frame data is stored uncompressed.

If the compressed length is less than the uncompressed length, all frames are compressed together as a single LZ4 block.
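A minimal sketch of reading this header with only the Rust standard library. `FileHeader` and `parse_file_header` are hypothetical names for illustration, not the `mw_dataformat` API. Rejecting a compressed length larger than the uncompressed length is an assumption, since only the equal and less-than cases are defined above.

```rust
/// Size of the file header: three u64 checksums plus two u32 lengths.
pub const FILE_HEADER_LEN: usize = 3 * 8 + 4 + 4;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileHeader {
    pub checksums: [u64; 3],
    /// Length of the frame data as stored in the file.
    pub compressed_len: u32,
    /// Length of the frame data after decompression.
    pub uncompressed_len: u32,
}

impl FileHeader {
    /// True if the frames are stored as a single LZ4 block.
    pub fn frames_compressed(&self) -> bool {
        self.compressed_len < self.uncompressed_len
    }
}

fn read_u64(b: &[u8], at: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[at..at + 8]);
    u64::from_be_bytes(a)
}

fn read_u32(b: &[u8], at: usize) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[at..at + 4]);
    u32::from_be_bytes(a)
}

/// Parse the file header; all fields are big-endian.
pub fn parse_file_header(buf: &[u8]) -> Result<FileHeader, String> {
    if buf.len() < FILE_HEADER_LEN {
        return Err(format!("need {} bytes, got {}", FILE_HEADER_LEN, buf.len()));
    }
    let checksums = [read_u64(buf, 0), read_u64(buf, 8), read_u64(buf, 16)];
    let compressed_len = read_u32(buf, 24);
    let uncompressed_len = read_u32(buf, 28);
    // Assumption: compressed > uncompressed is not a defined case, so reject it.
    if compressed_len > uncompressed_len {
        return Err("compressed length exceeds uncompressed length".to_string());
    }
    Ok(FileHeader { checksums, compressed_len, uncompressed_len })
}
```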

### Checksums

The file begins with 3 SeaHash checksums.

The first checksum covers:
- the remainder of the file header, incl. the following 2 checksums
- the header part of the [Initialization Sequence](./is.md)

The second checksum covers:
- the data of the Initialization Sequence (everything after the header)

The third checksum covers:
- all the frames data
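The byte ranges covered by each checksum can be sketched as absolute file offsets. This assumes a 32-byte file header and a 16-byte IS header (the sums of the field sizes listed above and in the [IS header](./is.md)), and that `frames_len` is the compressed length, meaning the number of frame bytes actually stored. `checksum_ranges` is a hypothetical helper; each range would then be fed to a SeaHash implementation such as the `seahash` crate.

```rust
use std::ops::Range;

/// File header: three u64 checksums followed by two u32 lengths.
const FILE_HEADER_LEN: usize = 32;
/// IS header: 4 version bytes, 4 single-byte fields, one u32 and two u16s.
const IS_HEADER_LEN: usize = 16;

/// Absolute byte ranges hashed by checksums 1, 2 and 3 respectively.
pub fn checksum_ranges(is_data_len: usize, frames_len: usize) -> [Range<usize>; 3] {
    // Checksum 1 starts right after itself (offset 8), so it covers
    // checksums 2 and 3, both length fields, and the IS header.
    let is_header_end = FILE_HEADER_LEN + IS_HEADER_LEN;
    let is_data_end = is_header_end + is_data_len;
    [
        8..is_header_end,
        is_header_end..is_data_end,
        is_data_end..is_data_end + frames_len,
    ]
}
```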

## Initialization Sequence

After the File Header follows the [Initialization Sequence](./is.md).

## Frames

After the IS follow the [frames of game updates](./frames.md).

Note: neither the length of the IS nor the start offset of the frame data
is encoded in the file header. The IS Header must be parsed to compute them.

It is thus impossible to read the frames from a MineWars file without
decoding the IS first.
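A sketch of locating the frame data from the IS header fields. It assumes the IS data consists of exactly the compressed map data, one `(y, x)` pair per city, the city names data and the rules data, and that the city names length already includes each name's length byte. `IsLengths` is a hypothetical type.

```rust
/// IS header fields that determine the size of the IS data section.
pub struct IsLengths {
    pub num_cities: u8,
    /// Compressed length of the map data, as stored.
    pub map_data_len: u32,
    pub rules_len: u16,
    pub city_names_len: u16,
}

const FILE_HEADER_LEN: usize = 32;
const IS_HEADER_LEN: usize = 16;

impl IsLengths {
    /// Length of everything in the IS after its header.
    pub fn data_len(&self) -> usize {
        self.map_data_len as usize
            + 2 * self.num_cities as usize // one (y, x) location per city
            + self.city_names_len as usize
            + self.rules_len as usize
    }

    /// Absolute offset at which the frame data begins in a MineWars file.
    pub fn frames_offset(&self) -> usize {
        FILE_HEADER_LEN + IS_HEADER_LEN + self.data_len()
    }
}
```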
@@ -1,61 +1,4 @@
# Spectator/Replay Stream Format

The Spectator Protocol is essentially a container format that multiplexes
multiple [player protocol](./dataformat-player.md) streams (one for each player in
the game, representing their view of the world) together, along with a global
"spectator view" stream (also in the same format) providing a global view
of the game world.

This is used to give spectator clients all the data they need to simultaneously
follow all participants in the game. This is also the file format used for
replay files.

## Stream Structure

The contents of the stream/file appear in this order:

- File Header (file only)
- Initialization Sequence
- [... frames ...]

## File Header

In the case of a replay file, a header is prepended.

The file header has the following structure:
- `[u64; 3]`: checksums
- `u32`: length of compressed frame data in bytes
- `u32`: length of uncompressed frame data in bytes

If compressed length == uncompressed length, the frames data is stored uncompressed.

If compressed length < uncompressed length, all the frames are compressed as a single big LZ4 block.

## Checksums

Checksums are only used in the case of replay files. Network streams do
not have checksums. In that case, the transport protocol is assumed to be
responsible for data integrity.

The file begins with 3 SeaHash checksums.

The first checksum covers:
- the remainder of the file header, incl. the following 2 checksums
- the header part of the Initialization Sequence

The second checksum covers:
- the data of the Initialization Sequence (everything after the header)

The third checksum covers:
- all the frames data

## Initialization Sequence

This is the same as described in the [player protocol documentation](./dataformat-player.md).

However, starting item positions should be encoded inside the map data.

## Frames
# Game Updates and Framing

A Frame is a collection of game updates that happen together at the same time.
It encodes the point of view of every player in the game who is involved + a
@@ -64,13 +7,13 @@ the frame.

Note: it is not a requirement that *all* game update messages from the same
timestamp are encoded together. They may be fragmented into multiple frames.
-Subsequent frames would just have the timestamp field set to zero.
+Subsequent frames would just have their time offset set to zero.

Such fragmentation is necessary if the frame payload exceeds 256 bytes in length.
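A sketch of the fragmentation rule, assuming messages are never split across frames and each encoded message is at most 256 bytes. The frame header encoding is not shown here, so `RawFrame` is a hypothetical (time offset, payload) pair with the time offset as a plain number.

```rust
/// Maximum payload length of one frame, per the text above.
const MAX_FRAME_PAYLOAD: usize = 256;

/// A frame before header encoding (the real header layout is not modeled).
#[derive(Debug, PartialEq)]
pub struct RawFrame {
    pub time_offset: u32,
    pub payload: Vec<u8>,
}

/// Pack encoded update messages into frames of at most 256 payload bytes,
/// never splitting a message. Only the first frame carries the time offset;
/// follow-up frames use 0. Returns None if one message alone is too long.
pub fn fragment(time_offset: u32, messages: &[Vec<u8>]) -> Option<Vec<RawFrame>> {
    let mut payloads: Vec<Vec<u8>> = Vec::new();
    let mut current: Vec<u8> = Vec::new();
    for msg in messages {
        if msg.len() > MAX_FRAME_PAYLOAD {
            return None;
        }
        if current.len() + msg.len() > MAX_FRAME_PAYLOAD {
            payloads.push(std::mem::take(&mut current));
        }
        current.extend_from_slice(msg);
    }
    if !current.is_empty() {
        payloads.push(current);
    }
    Some(
        payloads
            .into_iter()
            .enumerate()
            .map(|(i, payload)| RawFrame {
                time_offset: if i == 0 { time_offset } else { 0 },
                payload,
            })
            .collect(),
    )
}
```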

There are three kinds of frame encodings: Homogenous, Heterogenous, Keepalive.

-### Homogenous Frames
+## Homogenous Frames

Homogenous frames are frames where every participant gets the same data. The data is
only encoded once and assumed to apply to all participating streams.
@@ -94,7 +37,7 @@ Initialization Sequence. `u8` if `max_plid <= 7`, `u16` if `max_plid >= 8`.
The data payload is the [player protocol update messages](./dataformat-player.md#gameplay-messages).
All of the players listed in the participation mask must receive the entire identical data payload.
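A sketch of building the participation mask. The bit assignment is an assumption: bit 0 is taken to be the global spectator view (the spectator-view notes mention bit 0 for the spectator stream) and bit `n` is taken to be PlayerId `n`. The function name is hypothetical.

```rust
/// Build a participation mask as big-endian bytes: one byte if
/// max_plid <= 7, two bytes otherwise. Assumes bit 0 = spectator view
/// and bit n = PlayerId n (1..=15).
pub fn encode_participation_mask(max_plid: u8, spectator: bool, players: &[u8]) -> Option<Vec<u8>> {
    if max_plid > 15 {
        return None;
    }
    let mut mask: u16 = if spectator { 1 } else { 0 };
    for &plid in players {
        if plid == 0 || plid > max_plid {
            return None; // not a valid PlayerId for this game
        }
        mask |= 1u16 << plid;
    }
    if max_plid <= 7 {
        Some(vec![mask as u8])
    } else {
        Some(mask.to_be_bytes().to_vec())
    }
}
```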

-### Heterogenous Frames
+## Heterogenous Frames

Heterogenous frames are frames where each participant gets different data. The data
for each participating stream is included in the frame.
@@ -128,7 +71,7 @@ messages](./dataformat-player.md#gameplay-messages) for that view.
The total length of the data payload is the sum of the lengths of each view's
data, as given in the Heterogenous Frame Header described above.
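Given the per-view lengths from the Heterogenous Frame Header (its exact layout is not part of this excerpt), the payload can be split with a sketch like this one. `split_views` is hypothetical and assumes the views appear in header order.

```rust
/// Split a heterogeneous frame payload into per-view slices using the view
/// lengths from the frame header. Returns None if the lengths do not sum
/// to the payload length.
pub fn split_views<'a>(payload: &'a [u8], view_lens: &[usize]) -> Option<Vec<&'a [u8]>> {
    let total: usize = view_lens.iter().sum();
    if total != payload.len() {
        return None;
    }
    let mut views = Vec::with_capacity(view_lens.len());
    let mut offset = 0;
    for &len in view_lens {
        views.push(&payload[offset..offset + len]);
        offset += len;
    }
    Some(views)
}
```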

-### Keepalive Frames
+## Keepalive Frames

Keepalive frames are to be used if the time delta since the last frame is too
long to be represented in a single frame header. It is an empty frame with no
@@ -141,34 +84,3 @@ Keepalive frames have the following structure:
- `u16`: `-111111111111111`

Note: there is no participation mask, no data length field, no data payload
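A sketch of recognizing a keepalive frame. It assumes the leading `-` in the bit pattern means "ignored" (as `-` does in the bit tables elsewhere in these docs), so only the low 15 bits are checked. `is_keepalive` is a hypothetical name.

```rust
/// Bits that must all be set in a keepalive frame's u16. The leading `-`
/// bit of the documented pattern is treated as "don't care" (assumption).
const KEEPALIVE_BITS: u16 = 0x7FFF;

/// Check whether a frame (starting with a big-endian u16) is a keepalive.
pub fn is_keepalive(frame: &[u8]) -> bool {
    if frame.len() < 2 {
        return false;
    }
    let word = u16::from_be_bytes([frame[0], frame[1]]);
    (word & KEEPALIVE_BITS) == KEEPALIVE_BITS
}
```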

## The Global Spectator View

The global spectator view behaves somewhat differently from the player views.

- No fog of war must be displayed
- Digits are to be calculated by the client, from known mine locations

To accommodate this, the spectator stream format has some special provisions
that differ from the player stream format.

The initialization sequence encodes mine positions inside the map data.

The global spectator view is controlled using the same update message format
as player views, but some message types are used differently:
- "Digit Update" and "Capture + Digits" must have the tile owner inferred
from the participation mask. The mask must encode only one PlayerID
(other than bit 0 for the spectator stream).

## Compression Dictionary

A special dictionary is prepared to help improve compression of the update
frames. It is to be generated from the data in the initialization sequence.

It is constructed by concatenating the following data:

- Every mountain coordinate on the map, in sorted order.
- Every land coordinate on the map, in sorted order.

This effectively pre-seeds the compression algorithm with data sequences
likely to occur early-game.
56 changes: 56 additions & 0 deletions doc/src/dataformat/intro.md
@@ -0,0 +1,56 @@
# MineWars Data Format

The MineWars Data Format is the file format and encoding used to store MineWars
game data. Specifically, this is the format of `*.minewars` files that can
store maps and replays. It is implemented in the `mw_dataformat` Rust crate.

The Data Format is capable of storing:
- Map Data
- Other parameters and metadata of the game session
- A stream of gameplay updates for all players in the game, multiplexed together,
with timing information, to allow for watching a replay of a game.

It is not to be confused with the MineWars Player Protocol, which is what is used
over-the-wire for communication between the Game Client App and Host Server for
networked multiplayer gameplay.

The Player Protocol does internally use the Data Format for some purposes, such as:
- Transmitting the map data and configuration metadata to start a game session (Initialization Sequence).
- Encoding of most gameplay updates/events during gameplay (Game Update Messages).
- Multiplexing the PoVs of all the players in the game for sending to spectators (Framing).

However, the Player Protocol also does a lot more. The full Player Protocol
is proprietary and not publicly documented.

The Player Protocol and the Data Format are versioned separately (and separately from
the MineWars client and server software), but both of their versions are important for
compatibility.

Reusing the encoding of map data and gameplay updates between all of these use cases
(live gameplay, spectation, replay files) makes it easier to implement all of this
functionality in MineWars. That is the design goal of the Data Format.

## General Properties of the Data Format

This is a custom purpose-built compact binary format.

All multi-byte values are encoded as **big endian**, with no alignment padding.

All **coordinates** are encoded as `(row: u8, col: u8)` (note (Y,X) order).
In places where a sequence of multiple coordinates is listed, it is recommended
to encode them in sorted order. This helps compression.
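A sketch of encoding a coordinate list as `(row, col)` byte pairs. It assumes "sorted order" means lexicographic (Y, X) order; `encode_coords` is a hypothetical helper.

```rust
/// Encode coordinates as consecutive (row, col) byte pairs, sorted by row
/// and then by column, as recommended above to help compression.
pub fn encode_coords(coords: &[(u8, u8)]) -> Vec<u8> {
    let mut sorted = coords.to_vec();
    sorted.sort_unstable(); // tuples compare row first, then col: (Y, X) order
    let mut out = Vec::with_capacity(sorted.len() * 2);
    for (row, col) in sorted {
        out.push(row);
        out.push(col);
    }
    out
}
```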

Some places use a special encoding for **time durations**:

|Bits |Meaning |
|----------|-------------------------|
|`0xxxxxxx`| `x` milliseconds |
|`10xxxxxx`| (`x` + 13) centiseconds |
|`11xxxxxx`| (`x` + 8) deciseconds |
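A sketch of decoding and exactly encoding this duration byte. `decode_duration_ms` and `encode_duration_ms` are hypothetical helpers; the docs do not say how encoders should handle durations with no exact encoding, so the encoder here returns `None` for them.

```rust
/// Decode a one-byte time duration into milliseconds.
pub fn decode_duration_ms(b: u8) -> u32 {
    let x = (b & 0x3F) as u32; // low 6 bits, used by the two prefixed forms
    match b >> 6 {
        0b00 | 0b01 => b as u32,   // 0xxxxxxx: x milliseconds
        0b10 => (x + 13) * 10,     // 10xxxxxx: (x + 13) centiseconds
        _ => (x + 8) * 100,        // 11xxxxxx: (x + 8) deciseconds
    }
}

/// Encode a duration in milliseconds, if it has an exact encoding.
pub fn encode_duration_ms(ms: u32) -> Option<u8> {
    if ms < 128 {
        Some(ms as u8)
    } else if ms % 10 == 0 && (13..=76).contains(&(ms / 10)) {
        Some(0x80 | (ms / 10 - 13) as u8)
    } else if ms % 100 == 0 && (8..=71).contains(&(ms / 100)) {
        Some(0xC0 | (ms / 100 - 8) as u8)
    } else {
        None
    }
}
```

With this table, 128 and 129 ms and 770, 780 and 790 ms have no exact encoding, and the longest single duration is 7100 ms.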

**PlayerId**: a value from 1 to 15 inclusive.

**PlayerSubId**: a value from 0 to 14 inclusive.

You will also need to bring an LZ4 implementation supporting **raw blocks**.
The `lz4_flex` Rust crate is perfect. :)
95 changes: 95 additions & 0 deletions doc/src/dataformat/is.md
@@ -0,0 +1,95 @@
# Initialization Sequence (IS)

The IS sets the general configuration and metadata of the game and
encodes the initial state of the map that the game will be played on.

## Header

It begins with a header:
- (`u8`,`u8`,`u8`,`u8`): Data Format Version
- `u8`: flags
- `u8`: map size (radius)
- `u8`: `max_plid` (bits 0-3), `max_sub_plid` (bits 4-7)
- `u8`: number of cities/regions
- `u32`: length of compressed map data in bytes
- `u16`: length of the Rules data
- `u16`: length of the City names data

The `flags` field is encoded as follows:

|Bits |Meaning |
|----------|----------------------------|
|`----0---`| Game uses a hexagonal grid |
|`----1---`| Game uses a square grid |
|`xxx--xxx`|(reserved bits) |
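A sketch of parsing the 16-byte IS header, assuming bits are numbered from the least significant bit, so the grid flag is bit 3 (`0b0000_1000`). `IsHeader`, `Grid` and `parse_is_header` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Grid {
    Hex,
    Square,
}

#[derive(Debug, PartialEq)]
pub struct IsHeader {
    pub version: [u8; 4],
    pub grid: Grid,
    pub radius: u8,
    pub max_plid: u8,
    pub max_sub_plid: u8,
    pub num_cities: u8,
    pub map_data_len: u32,
    pub rules_len: u16,
    pub city_names_len: u16,
}

pub const IS_HEADER_LEN: usize = 16;

/// Parse the IS header; multi-byte fields are big-endian.
pub fn parse_is_header(b: &[u8]) -> Option<IsHeader> {
    if b.len() < IS_HEADER_LEN {
        return None;
    }
    let flags = b[4];
    Some(IsHeader {
        version: [b[0], b[1], b[2], b[3]],
        // Bit 3 of flags: 0 = hexagonal grid, 1 = square grid.
        grid: if (flags & 0b0000_1000) == 0 { Grid::Hex } else { Grid::Square },
        radius: b[5],
        max_plid: b[6] & 0x0F,   // bits 0-3
        max_sub_plid: b[6] >> 4, // bits 4-7
        num_cities: b[7],
        map_data_len: u32::from_be_bytes([b[8], b[9], b[10], b[11]]),
        rules_len: u16::from_be_bytes([b[12], b[13]]),
        city_names_len: u16::from_be_bytes([b[14], b[15]]),
    })
}
```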

## Map Data

Then follows the map data.

If compressed length < uncompressed length, the data is LZ4 compressed.

If compressed length == uncompressed length, the data is raw/uncompressed.

The compressed length is stored in the header. The uncompressed length must
be computed from the map radius.

First, the map data is encoded as one byte per tile:

|Bits |Meaning |
|----------|----------------------------|
|`----xxxx`| Tile Kind |
|`xxxx----`| Item Kind |

Tile Kind: same encoding as the "Tile Kind Update" message in [Game Update Messages](./msgs.md).
Item Kind: same encoding as the "Reveal Item" message in [Game Update Messages](./msgs.md).

The Item Kind is useful for spectators and replay files, so that they don't
need to start with a long sequence of "Reveal Item" messages at tick 0 for
all the initial items on the map. Other use cases (such as "map files")
may just always set it to zero.
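Splitting and packing a tile byte per the table above can be sketched as follows. The numeric kind values themselves come from the update messages and are not modeled; both helper names are hypothetical.

```rust
/// Split a map-data byte into (tile kind, item kind):
/// the low nibble is the Tile Kind, the high nibble is the Item Kind.
pub fn split_tile_byte(b: u8) -> (u8, u8) {
    (b & 0x0F, b >> 4)
}

/// Inverse of `split_tile_byte`; both kinds must fit in 4 bits.
pub fn join_tile_byte(tile_kind: u8, item_kind: u8) -> Option<u8> {
    if tile_kind > 0x0F || item_kind > 0x0F {
        return None;
    }
    Some((item_kind << 4) | tile_kind)
}
```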

The tiles are encoded in concentric-ring order, starting from the center of
the map. The map data ends when all rings up until the map radius specified in
the header have been encoded.

Each ring starts from the lowest (Y,X) coordinate and follows the +X direction first:

Square example:
```
654
7.3
012
```

Hex example:
```
4 3
5 . 2
0 1
```

(`0` is the starting position, assuming +X points right and +Y points up)
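A sketch of the square-grid ring order as `(dy, dx)` offsets from the center, with +Y up. The mapping from these offsets to absolute `u8` coordinates is not specified in this excerpt, and the hex case is omitted because it depends on the hex coordinate convention; `square_ring` is hypothetical.

```rust
/// Offsets (dy, dx) of the tiles in square ring `k`, in encoding order:
/// start at the lowest (Y, X) corner, then walk +X, +Y, -X, -Y.
/// Ring 0 is the single center tile.
pub fn square_ring(k: i32) -> Vec<(i32, i32)> {
    if k == 0 {
        return vec![(0, 0)];
    }
    let mut out = Vec::with_capacity(8 * k as usize);
    let (mut y, mut x) = (-k, -k);
    // Each side contributes 2k tiles; directions are +X, +Y, -X, -Y.
    for (dy, dx) in [(0, 1), (1, 0), (0, -1), (-1, 0)] {
        for _ in 0..2 * k {
            out.push((y, x));
            y += dy;
            x += dx;
        }
    }
    out
}
```

For `k = 1` this yields the positions `0` through `7` of the square example above.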

After the tile bytes, regions are encoded the same way: one byte per tile, in
concentric-ring order. The byte is the city/region ID for that tile.

If the number of cities/regions is 0, this part of the map data is skipped.
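The uncompressed map data length can then be computed from the radius. This sketch assumes radius `r` means rings `0..=r`, with ring `k >= 1` holding `8k` tiles on a square grid and `6k` on a hex grid (matching the ring-1 examples above), and that region bytes follow the tile bytes only when the city/region count is non-zero. The names are hypothetical.

```rust
#[derive(Clone, Copy)]
pub enum Grid {
    Hex,
    Square,
}

/// Number of tiles in rings 0..=radius.
pub fn tile_count(grid: Grid, radius: u32) -> u32 {
    let r = radius;
    match grid {
        Grid::Square => (2 * r + 1) * (2 * r + 1), // 1 + 8 * (1 + 2 + ... + r)
        Grid::Hex => 3 * r * (r + 1) + 1,          // 1 + 6 * (1 + 2 + ... + r)
    }
}

/// Expected uncompressed map data length: one tile byte per tile, plus one
/// region byte per tile when at least one city/region is declared.
pub fn uncompressed_map_len(grid: Grid, radius: u32, num_regions: u8) -> u32 {
    let tiles = tile_count(grid, radius);
    if num_regions == 0 { tiles } else { 2 * tiles }
}
```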

## City Info

First, locations for each city on the map:
- `(u8, u8)`: (y, x) location

Then, names for each city on the map:
- `u8`: length in bytes
- …: phonemes

The name uses a special Phoneme encoding (undocumented, see source code),
which can be rendered/localized based on client language.
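A sketch of parsing the City Info section, assuming all locations come first and the names follow in the same order as the locations. The phoneme bytes are kept raw since their encoding is undocumented; `City` and `parse_cities` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub struct City {
    pub y: u8,
    pub x: u8,
    /// Raw phoneme bytes (encoding not documented here).
    pub name: Vec<u8>,
}

/// Parse `n` (y, x) locations followed by `n` length-prefixed names.
/// Returns the cities and the number of bytes consumed.
pub fn parse_cities(buf: &[u8], n: usize) -> Option<(Vec<City>, usize)> {
    let locs = buf.get(..2 * n)?;
    let mut pos = 2 * n;
    let mut cities = Vec::with_capacity(n);
    for i in 0..n {
        let len = *buf.get(pos)? as usize;
        let name = buf.get(pos + 1..pos + 1 + len)?.to_vec();
        pos += 1 + len;
        cities.push(City { y: locs[2 * i], x: locs[2 * i + 1], name });
    }
    Some((cities, pos))
}
```

If the header's city names length includes each name's length byte (an assumption), the consumed length minus `2 * n` should equal it, which gives a cheap consistency check.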

## Game Parameters / Rules

Then follow the parameters of the game rules used in this game.

// TODO