Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling tracing by usage of opentelemetry #99

Closed
wants to merge 30 commits into from

Conversation

rem1-dev
Copy link

I've added tracing to the project by introducing mainly telemetry, opentelemetry and Tokyo crates. Having tracing allows analysis of bottleneck functions (functions that take majority of execution time).

@RCasatta RCasatta changed the title Enabling tracing by usage of opentelemtry Enabling tracing by usage of opentelemetry Aug 15, 2024
@RCasatta
Copy link
Collaborator

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

@philippem
Copy link
Collaborator

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

  • prometheus gives us metrics ( example there is a histogram of times spent making RPC calls to bitcoind). Many alerts can be directly built on prometheus metrics.
  • opentelemtry (OTLP) gives us tracing within and between processes (for example we can look at a mempool update operation and see where the time is being spent)

They both enhance observability but are complementary.

@shesek
Copy link
Collaborator

shesek commented Aug 24, 2024

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

...
They both enhance observability but are complementary.

I'm still not quite sure I understand the differences. Prometheus can and is being used to track function execution times, for example in these functions (via _timer and its Drop) where opentelemetry was added too:

+    #[instrument(skip_all, name="Mempool::utxo")]
     pub fn utxo(&self, scripthash: &[u8]) -> Vec<Utxo> {
         let _timer = self.latency.with_label_values(&["utxo"]).start_timer();
         let entries = match self.history.get(scripthash) {
@@ -209,6 +216,7 @@ impl Mempool {
             .collect()
     }
 
+    #[instrument(skip_all, name="Mempool::stats")]
     // @XXX avoid code duplication with ChainQuery::stats()?
     pub fn stats(&self, scripthash: &[u8]) -> ScriptStats {
         let _timer = self.latency.with_label_values(&["stats"]).start_timer();
@@ -258,12 +266,14 @@ impl Mempool {
         stats
     }
 
+    #[instrument(skip_all, name="Mempool::txids")]
     // Get all txids in the mempool
     pub fn txids(&self) -> Vec<&Txid> {
         let _timer = self.latency.with_label_values(&["txids"]).start_timer();
         self.txstore.keys().collect()
     }

src/bin/electrs.rs Outdated Show resolved Hide resolved
src/electrum/server.rs Outdated Show resolved Hide resolved
src/config.rs Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
@shesek
Copy link
Collaborator

shesek commented Aug 24, 2024

It's not critical, but it could aid review if the addition of #[instrument()] attributes was separated to its own commit

junderw pushed a commit to junderw/electrs that referenced this pull request Sep 4, 2024
…-type

Add support for anchor output type
src/daemon.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
@shesek
Copy link
Collaborator

shesek commented Sep 20, 2024

How do you feel about defining a no-op #[instrument] macro when the tracing feature is disabled, so that we don't have to wrap it in #[cfg_attr(feature = "tracing",... )) every time?

@rem1-dev
Copy link
Author

rem1-dev commented Oct 1, 2024

How do you feel about defining a no-op #[instrument] macro when the tracing feature is disabled, so that we don't have to wrap it in #[cfg_attr(feature = "tracing",... )) every time?

I found out that instead of defining my own procedural macro I can just use tracing/max_level_off to turn off tracing by default and make #[instrument] macro no-op by default. I implemented it.

@rem1-dev
Copy link
Author

rem1-dev commented Oct 1, 2024

Tracing is turned off by default and can be turned on by providing following params for both cargo build and cargo run commands:

--no-default-features --features otlp-tracing

@shesek
Copy link
Collaborator

shesek commented Oct 1, 2024

It would be nice if it was possible to avoid the additional dependencies altogether, not just disable tracing

@rem1-dev
Copy link
Author

rem1-dev commented Oct 3, 2024

It would be nice if it was possible to avoid the additional dependencies altogether, not just disable tracing

I guess that needs, as you suggested, a new macro that is no-op in case of otlp-tracing feature flag is off. You're right that we still have dependency on new crates. But by default the only new crates we will depend on is just one: tracing - so that we can turn off it's macro that we used. Only if we turn on future flag we pull other dependencies in addition to tracing:
"tracing-subscriber",
"opentelemetry",
"tracing-opentelemetry",
"opentelemetry-otlp",
"opentelemetry-semantic-conventions"

so we add 1 new dependency only (or 6 if one decides to use tracing). Is it still not acceptable @shesek ?

@shesek
Copy link
Collaborator

shesek commented Oct 15, 2024

I had a preference to avoid additional dependencies altogether, and thought this should be as simple as

#[not(cfg(feature = "otlp"))]
#[proc_macro_attribute]
pub fn instrument(_attr: TokenStream, item: TokenStream) -> TokenStream { item }

However I realized that this would require making this a proc-macro crate would would prevent us from exporting anything other than proc macros, or make a separate crate just for this, which overly complicates things... So yeah, let's stick with the one extra dependency.

Cargo.toml Outdated Show resolved Hide resolved
@rem1-dev rem1-dev force-pushed the trace-otlp branch 2 times, most recently from 1dd5bc8 to e871af5 Compare October 21, 2024 12:06
Copy link
Collaborator

@shesek shesek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion and some nits (with apologies for the newline pedantry 😅)

It's kind of unfortunate that we have to use a negative feature, which requires using --no-default-features to disable no-otlp-tracing when enabling otlp-tracing. It'll become more annoying if we ever add other default features, as they'll have to be re-enabled manually. But I don't see a way around it... is there? (/cc @RCasatta)

src/daemon.rs Show resolved Hide resolved
src/electrum/server.rs Show resolved Hide resolved
src/electrum/server.rs Show resolved Hide resolved
src/new_index/query.rs Show resolved Hide resolved
src/new_index/schema.rs Show resolved Hide resolved
src/new_index/schema.rs Show resolved Hide resolved
src/new_index/schema.rs Show resolved Hide resolved
src/new_index/schema.rs Show resolved Hide resolved
src/rest.rs Show resolved Hide resolved
src/bin/electrs.rs Show resolved Hide resolved
@shesek
Copy link
Collaborator

shesek commented Oct 22, 2024

It's kind of unfortunate that we have to use a negative feature ...

One thing we could do to make this less of a footgun is to add some explicit useful error messages when --no-default-features is used without re-enabling either otlp features, or when otlp-tracing is enabled without using --no-default-features:

#[cfg(not(any(feature = "otlp-tracing", feature = "no-otlp-tracing")))]
compile_error!("Must enable one of the 'otlp-tracing' or 'no-otlp-tracing' features");
#[cfg(all(feature = "otlp-tracing", feature = "no-otlp-tracing"))]
compile_error!("Cannot enable both the 'otlp-tracing' and 'no-otlp-tracing' features");

@RCasatta
Copy link
Collaborator

RCasatta commented Oct 22, 2024

It's kind of unfortunate that we have to use a negative feature, which requires using --no-default-features to disable no-otlp-tracing when enabling otlp-tracing. It'll become more annoying if we ever add other default features, as they'll have to be re-enabled manually. But I don't see a way around it... is there? (/cc @RCasatta)

By looking here https://github.com/tokio-rs/tracing/blob/bdbaf8007364ed2a766cca851d63de31b7c47e72/tracing/src/level_filters.rs#L68 it seems that maximum tracing is the default so we don't need to opt-in for it? Thus, it seems to me we can avoid the positive feaure?
It would require to use code like not(feature = "no_otlp_tracing") which doesn't look great but at least we would need only 1 feature and continue to guarantee that features are additive.

@shesek
Copy link
Collaborator

shesek commented Oct 22, 2024

Thus, it seems to me we can avoid the positive feature?

The original goal for using the feature was to make (most) of the additional dependencies optional, which (I think?) we can't do with just a negative feature

@RCasatta
Copy link
Collaborator

RCasatta commented Oct 23, 2024

I see... I understand the issue of having too many deps, but I think it's greater in "money-handling" libs such as wallets than in electrs, so I would bite the bullet and build some deps even though they are not used instead of adding complexity.

@rem1-dev
Copy link
Author

Thus, it seems to me we can avoid the positive feature?

The original goal for using the feature was to make (most) of the additional dependencies optional, which (I think?) we can't do with just a negative feature

Yes, that was the motivation behind this, keep additional dependencies at minimum and also make #instrument[] annotations no-op by default. I couldn't find other solution than --no-default-features --features otlp-tracing and turning off tracing in default features.

src/bin/electrs.rs Outdated Show resolved Hide resolved
mariusz-reichert and others added 26 commits November 6, 2024 16:47
- Avoid unnecessary copying of prev outpoints

- When looking for both mempool and on-chain txos, accumulate the set of
  outpoints that remain to be looked up to avoid re-checking for them
  later again in the found set

- Refactored lookup_txos() to use lookup_txo() internally rather than
  the other way around, which was less efficient

- Lookup txos in mempool first, then on-chain

- ChainQuery::lookup_txos() now returns a Result instead of panicking
  when outpoints are missing

- Removed ChainQuery::lookup_avail_txos() and allow_missing, which are
  no longer neceesary
add binLiquid to the inherit list to enable specific builds
Using criterion so that we don't need nightly to run benchmark.

The "bench" features is also introduced as a way to bench private
methods via public methods feature gated via this feature.
on my machine:

add_blocks              time:   [4.7562 ms 4.7754 ms 4.7972 ms]
                        change: [-15.059% -14.650% -14.169%] (p = 0.00 < 0.05)
                        Performance has improved.
add_blocks              time:   [3.9458 ms 3.9586 ms 3.9717 ms]
                        change: [-17.564% -17.103% -16.660%] (p = 0.00 < 0.05)
                        Performance has improved.
add_blocks              time:   [2.9788 ms 2.9938 ms 3.0101 ms]
                        change: [-24.810% -24.373% -23.897%] (p = 0.00 < 0.05)
                        Performance has improved.
Now on nix env we provide the env vars of the executables needed for
integration testing, so we can enable tests.

To be coherent with the electrum nix version used, upgrade also the
autodownloaded one.

Note we have to change a test assertion, it seems electrum behaviour
changed, upgrading balances before confirmation.
to instead have logs like before use `RUST_LOG=debug cargo test`
By doing so simply `cargo run` launch the server, while now it is
required `cargo run --bin electrs`
While compaction during initial sync makes thing much slower, it may be
preferred to not require to size the disk the double of the final
required space.
it is less interesting to see how many rows are written in the db and
more interesting knowing the last height indexed
This actually hurts performance because the batched response has to be
bueffered on the bitcoind side, as @TheBlueMatt explains at
romanz#373 (comment)

Instead, send multiple individual RPC requests in parallel using a
thread pool, with a separate RPC TCP connection for each thread.

Also see romanz#374
The indexing process was adding transactions into the store so that
prevouts funded & spent within the same batch could be looked up via
Mempool::lookup_txos(). If the indexing process later failed for any
reason, these transactions would remain in the store.

With this change, we instead explicitly look for prevouts funded within
the same batch, then look for the rest in the chain/mempool indexes and
fail if any are missing, without keeping the transactions in the store.
Previously, if any of the mempool transactions were not available
because they were evicted between getting the mempool txids and txs
themselves, the mempool syncing operation would be aborted and tried
again from scratch.

With this change, we instead keep whatever transactions we were able to
fetch, then get the updated list of mempool txids and re-try fetching
missing ones continuously until we're able to get a full snapshot.
- Reduce logging level for bitcoind's JSONRPC response errors

  These can happen pretty often for missing mempool txs and do not
  warrant warn-level logging. Unexpected RPC errors will bubble up and
  be handled appropriately.

- Add more verbose logging for mempool syncing
Keep RPC TCP connections open between sync runs and reuse them,
to reduce TCP connection initialization overhead.
@shesek
Copy link
Collaborator

shesek commented Dec 15, 2024

--> #128

@shesek shesek closed this Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants