book: frequently asked questions and answers
anwayde authored and mmcgee-jump committed Jun 14, 2024
1 parent 4905b6c commit d4ac2c4
Showing 7 changed files with 304 additions and 3 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -40,3 +40,4 @@ deps-bundle.tar.zst
/book/.vitepress/cache
/book/.vitepress/dist
/book/node_modules
/book/bun.lockb
4 changes: 4 additions & 0 deletions book/.vitepress/config.mts
@@ -35,6 +35,10 @@ export default defineConfig({
{ text: 'Getting Started', link: 'getting-started' },
{ text: 'Configuring', link: 'configuring' },
{ text: 'Initializing', link: 'initializing' },
{ text: 'Frequently Asked Questions', link: 'faq' },
{ text: 'Monitoring', link: 'monitoring' },
{ text: 'Troubleshooting', link: 'troubleshooting' },
{ text: 'Tuning', link: 'tuning' },
]
}
] },
14 changes: 11 additions & 3 deletions book/guide/configuring.md
@@ -50,7 +50,14 @@ for different commands may cause them to fail.
## Logging
By default, Firedancer maintains two logs: a permanent log, which is
written to a file, and an ephemeral log for fast visual inspection, which
is written to stderr. The Agave runtime and consensus components also
output logs, which are included in Firedancer's logs. You can increase
the verbosity of the ephemeral log in the configuration TOML.

```toml
[log]
level_stderr = "INFO"
```

## Layout
One way that Firedancer is fast is that it pins a dedicated thread to
@@ -73,10 +80,11 @@ should be started.

```toml
[layout]
affinity = "1-18"
quic_tile_count = 2
verify_tile_count = 4
bank_tile_count = 4
solana_labs_affinity = "19-31"
```

It is suggested to run as many tiles as possible and tune the tile
65 changes: 65 additions & 0 deletions book/guide/faq.md
@@ -0,0 +1,65 @@
# Frequently Asked Questions

::: details What hardware do I need to run Frankendancer?

The current Frankendancer hardware requirements are the same
as those of an Agave validator. Refer to the [Hardware](./getting-started.md#hardware-requirements)
section in the [Getting Started](./getting-started.md) guide
for more details.

:::

::: details How can I obtain the Frankendancer binaries?

Frankendancer does not currently provide pre-built binaries.
It is recommended to build the binaries on the same host where
you are planning to run the validator. Frankendancer detects
system properties and tries to build a binary tuned for the
particular host. Take a look at the [getting started](./getting-started.md)
guide for requirements and instructions.

:::

::: details What branch or tag should I build from?

You can always check out the `v0.1` tag, which points to the
latest release. For more information, refer to the [releases](./getting-started.md#releases)
section.

:::

::: details How do I resolve errors encountered while starting up Frankendancer?

The Frankendancer binary `fdctl` tries to provide helpful error
messages that identify the problem, and sometimes even suggests
solutions. Take a look at the [troubleshooting](./troubleshooting.md)
guide for easy steps that can mitigate common issues.

:::

::: details Can Agave and Frankendancer use the same ledger and snapshots?

Yes, Frankendancer is fully compatible with both the snapshot
and the ledger formats of the Agave validator.

:::

::: details How can I monitor the status of my Frankendancer node?

You can use most of the regular monitoring tools and commands
that you typically would use with an Agave validator to monitor
Frankendancer as well. Refer to the [monitoring](./monitoring.md)
guide for some helpful commands.

:::

::: details Why is my node still delinquent?

There could be several reasons, including the validator being unable
to catch up or not voting properly. Take a look at the [tuning](./tuning.md)
guide for tips on how to configure Frankendancer to increase the
performance of the replay stage so the validator catches up faster.
Also make sure that your node is staked and the stake is active.

:::
69 changes: 69 additions & 0 deletions book/guide/monitoring.md
@@ -0,0 +1,69 @@
# Monitoring

The Frankendancer validator can be monitored in much the same way as
an Agave validator.

## Prerequisites

Be sure to build the `solana` binary, i.e. specify `solana` as a
target to the `make` command. The binary should be in the same
directory as `fdctl`. If you have not added that directory to the
`PATH` environment variable, replace `solana` with the full path
to the binary in the following commands.

::: tip NOTE

The commands listed below are not exhaustive. Some commands may not
work without RPC enabled on your validator. Check out the
comments in the `rpc` section of the `default.toml` file to
configure it according to your needs.

:::

## Solana Commands

* Ensure the validator has joined gossip

```sh [bash]
solana -ut gossip | grep <PUBKEY>
```

* Ensure the validator is caught up

```sh [bash]
solana -ut catchup --our-localhost
```

* Ensure the validator is voting

```sh [bash]
solana -ut validators | grep <PUBKEY>
```

* Ensure the validator is producing blocks

```sh [bash]
solana -ut block-production | grep <PUBKEY>
```
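
Three of the checks above share the same shape (run a `solana` subcommand, then `grep` for the validator identity), so they can be wrapped in a loop. The following is a minimal sketch, assuming `solana` is on `PATH` and the node is on testnet (`-ut`). `check_validator` is a hypothetical helper name, not something shipped with Frankendancer or Agave. The `catchup` check is left out because it does not filter by public key.

```bash
#!/usr/bin/env bash
# Sketch only: check_validator is a hypothetical helper, not an fdctl or solana feature.
# Usage: check_validator <PUBKEY>
check_validator() {
  local pubkey="$1" cmd rc=0
  # Each subcommand comes from the list above; -ut selects testnet.
  for cmd in gossip validators block-production; do
    if solana -ut "$cmd" | grep -qF -- "$pubkey"; then
      echo "ok: $pubkey found in '$cmd' output"
    else
      echo "MISSING: $pubkey not found in '$cmd' output"
      rc=1
    fi
  done
  # Non-zero exit status if any check did not find the public key.
  return "$rc"
}
```

A missing entry under `block-production` may simply mean the validator has not had a leader slot yet, so treat that line as a hint rather than a failure.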

::: tip NOTE

You can also use the `agave-validator --ledger <PATH> monitor`
command with Frankendancer. For that, you need to build the
`agave-validator` binary from the `agave` repository.

:::

## Frankendancer Metrics

* Look at the Prometheus metrics (on the same host)

```sh [bash]
curl http://localhost:7999/metrics
```

* Run the Frankendancer monitor

```sh [bash]
fdctl monitor --config ~/config.toml
```
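
The metrics endpoint returns many lines, so it can help to pull out a single metric. The sketch below filters Prometheus text-format output by metric name with `awk`. `metric_value` and `some_metric_name` are hypothetical names used for illustration; substitute a metric name that actually appears in your node's output. It assumes samples carry no trailing timestamp field.

```bash
#!/usr/bin/env bash
# metric_value NAME: read Prometheus text exposition format on stdin and print
# the value of every sample whose metric name equals NAME.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                     # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)        # drop the {label="..."} part, if any
      if (metric == name) print $NF   # assumes the value is the last field
    }'
}

# Example (hypothetical metric name):
#   curl -s http://localhost:7999/metrics | metric_value some_metric_name
```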
73 changes: 73 additions & 0 deletions book/guide/troubleshooting.md
@@ -0,0 +1,73 @@
# Troubleshooting

This page collects common troubleshooting steps for errors that operators
encounter while building and running Frankendancer. If these do
not address the problem, send a message in the `#firedancer-operators`
channel on the Solana Tech Discord or file an issue on GitHub.

## Building

### General Recommendations

* It is always a good idea to retry building everything from scratch.
Do a fresh clone of the repository, following the instructions in the
[Getting Started](./getting-started.md#prerequisites) guide. Remember to
check if you're using a supported compiler and to run `./deps.sh`!

* If you're updating an existing repository clone, be sure to update
the solana submodule _after_ pulling the latest changes. For example:

```sh [bash]
~/firedancer $ git fetch
~/firedancer $ git checkout v0.1
~/firedancer $ git submodule update
```

### Specific Errors

* Missing `cargo` binary from Rust toolchain

```sh [bash]
error: the 'cargo' binary, normally provided by the 'cargo' component, is not applicable to the '1.75.0-x86_64-unknown-linux-gnu' toolchain
+ exec cargo +1.75.0 build --profile=release-with-debug --lib -p solana-validator
error: the 'cargo' binary, normally provided by the 'cargo' component, is not applicable to the '1.75.0-x86_64-unknown-linux-gnu' toolchain
make: *** [src/app/fdctl/Local.mk:107: cargo-validator] Error 1
```

This typically happens due to a race condition between installing the
correct version of the Rust toolchain and using it. Re-installing the
toolchain separately fixes it (replace `1.75.0` with the appropriate version):

```sh [bash]
rustup toolchain uninstall 1.75.0-x86_64-unknown-linux-gnu
rustup toolchain install 1.75.0-x86_64-unknown-linux-gnu
```

## Configuring

### General Recommendations

* If there are errors during `fdctl configure init all --config
~/config.toml`, consider running `fdctl configure fini all --config
~/config.toml` to remove all existing configuration and try the `init`
command again. You can also re-run a specific configure stage, for
example, `fdctl configure init workspace --config ~/config.toml`.

* Make sure the `config.toml` passed to this command is the
same as the one passed to the `run` command. Also make sure
that its content is valid TOML.

* Read the output of the command carefully. `fdctl` often prints
a helpful message that suggests how to resolve the error.
Be sure to try the suggestions!

## Running

### General Recommendations

* Always run `fdctl configure init all --config ~/config.toml` before
running `fdctl run --config ~/config.toml`. If using a systemd unit,
specify both commands together for starting Frankendancer.

* Make sure the `~/config.toml` being used is the same in the `configure`
and `run` commands.
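
One way to follow both recommendations above is to start the validator through a small wrapper that takes the config path once and passes it to both commands. This is a minimal sketch, assuming `fdctl` is on `PATH`. `start_frankendancer` is a hypothetical helper name and not part of Frankendancer.

```bash
#!/usr/bin/env bash
# Sketch only: start_frankendancer is a hypothetical helper name.
# Runs the configure stage and then the validator with the same config file.
# The && means `fdctl run` is skipped if `fdctl configure` fails.
start_frankendancer() {
  local config="$1"
  fdctl configure init all --config "$config" && fdctl run --config "$config"
}

# Example:
#   start_frankendancer ~/config.toml
```

Because the path is a single argument, the `configure` and `run` steps cannot drift apart to different files.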
81 changes: 81 additions & 0 deletions book/guide/tuning.md
@@ -0,0 +1,81 @@
# Tuning

## Tiles

To stay caught up with the cluster, the replay stage needs enough
cores and processing power. If you see your validator falling
behind with the default configuration, consider trying out the
following:

### Increase Shred Tiles

Example Original Config:

```toml
[layout]
affinity = "1-18"
quic_tile_count = 2
verify_tile_count = 4
bank_tile_count = 4
solana_labs_affinity = "19-31"
```

Example New Config:

```toml
[layout]
affinity = "1-18"
quic_tile_count = 2
verify_tile_count = 5
bank_tile_count = 2
shred_tile_count = 2
solana_labs_affinity = "19-31"
```

This takes a core from a `bank` tile (transaction execution) and
gives it to an additional `shred` tile (turbine and shred processing). It
takes a second core from another `bank` tile and gives it to an additional
`verify` tile (signature verification).

### Increase Cores for Solana Labs

Example Original Config:

```toml
[layout]
affinity = "1-18"
quic_tile_count = 2
verify_tile_count = 5
bank_tile_count = 2
shred_tile_count = 2
solana_labs_affinity = "19-31"
```

Example New Config:

```toml
[layout]
affinity = "1-16"
quic_tile_count = 1
verify_tile_count = 4
bank_tile_count = 2
shred_tile_count = 2
solana_labs_affinity = "17-31"
```

This takes one core from a `quic` tile and another from a `verify`
tile, and gives them both to the Solana Labs threads (where the replay
stage runs).
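
When moving cores between `affinity` and `solana_labs_affinity`, it is easy to miscount or to leave the two ranges overlapping. In the examples above the two ranges are disjoint. The sketch below counts the cores in each range and reports any overlap, assuming you want to keep them disjoint. `expand_cores` and `overlap_count` are hypothetical helper names, and only simple `A-B` ranges or single indices are handled.

```bash
#!/usr/bin/env bash
# expand_cores RANGE: print one core index per line for "A-B" or a single "A".
# Other affinity syntax is not handled by this sketch.
expand_cores() {
  local range="$1" first last
  first="${range%%-*}"   # text before the first '-'
  last="${range##*-}"    # text after the last '-'
  seq "$first" "$last"
}

# overlap_count RANGE_A RANGE_B: print how many core indices are in both ranges.
overlap_count() {
  { expand_cores "$1"; expand_cores "$2"; } | sort -n | uniq -d | wc -l | tr -d ' '
}

# Values taken from the last "Example New Config" above.
echo "tile cores:  $(expand_cores "1-16" | wc -l | tr -d ' ')"
echo "agave cores: $(expand_cores "17-31" | wc -l | tr -d ' ')"
echo "overlapping: $(overlap_count "1-16" "17-31")"
```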

## QUIC

There is a lot of QUIC traffic in the cluster. If the validator is
having a hard time establishing QUIC connections, it might end up
receiving fewer transactions. Two parameters can be tuned to address
this, and they must be set to the same value:

```toml
[tiles.quic]
max_concurrent_connections = 2048
max_concurrent_handshakes = 2048
```
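
Since the two values must match, a quick check before restarting can catch a typo. The sketch below is a line-based check and not a real TOML parser: it assumes each key sits on its own line as `key = value`, with spaces around `=`, inside the `[tiles.quic]` table, and that both keys are set explicitly. `quic_value` and `check_quic_limits` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# quic_value FILE KEY: print the value of KEY inside the [tiles.quic] table.
quic_value() {
  awk -v key="$2" '
    # Any table header switches the flag on or off.
    /^[[:space:]]*\[/ { in_quic = ($0 ~ /^[[:space:]]*\[tiles\.quic\][[:space:]]*$/); next }
    in_quic && $1 == key && $2 == "=" { print $3 }
  ' "$1"
}

# check_quic_limits FILE: succeed only if both limits are set and equal.
check_quic_limits() {
  local conns handshakes
  conns="$(quic_value "$1" max_concurrent_connections)"
  handshakes="$(quic_value "$1" max_concurrent_handshakes)"
  if [ -n "$conns" ] && [ "$conns" = "$handshakes" ]; then
    echo "ok: both set to $conns"
  else
    echo "mismatch: connections='$conns' handshakes='$handshakes'"
    return 1
  fi
}

# Example:
#   check_quic_limits ~/config.toml
```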
