
Search tx optimization experiments (user_commands_aggregated) #58

Merged
merged 13 commits into from
Oct 31, 2024

Conversation

@piotr-iohk (Collaborator) commented Oct 25, 2024

Summary

This PR introduces optimizations for the /search/transactions endpoint. The main feature is a new table, user_commands_aggregated, which improves query performance by trading normalization for speed at the cost of increased storage. A function and trigger are also added to keep the table up to date.
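As a rough sketch of the idea (column names here are illustrative; the actual definition lives in the PR's migration SQL), the aggregated table denormalizes the join between user_commands and blocks_user_commands so searches no longer pay for it at query time:

```sql
-- Hypothetical sketch: materialize the user_commands ⋈ blocks_user_commands
-- join once, so /search/transactions can filter without joining per query.
CREATE TABLE IF NOT EXISTS user_commands_aggregated AS
SELECT
  u.*,                 -- all user_commands columns
  buc.block_id,        -- block linkage from the join table
  buc.status,
  buc.failure_reason
FROM
  user_commands AS u
  INNER JOIN blocks_user_commands AS buc ON u.id = buc.user_command_id;
```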

Additionally, a new mina-mesh search-tx-optimizations command has been added for managing these optimizations (--apply or --drop them, or --check their status), and the serve command now includes a --use-search-tx-optimizations flag to enable them. These features are intended for testing and may be temporary.


New Commands

The mina-mesh tool now supports the following:

$ mina-mesh search-tx-optimizations
Error: You must specify either --apply, --drop, or --check.

$ mina-mesh search-tx-optimizations --apply
Applying search transaction optimizations on Archive Database (this may take few minutes)...
Optimizations applied successfully.

Once optimizations are applied, you can run the Mesh server with optimizations enabled:

$ mina-mesh serve --use-search-tx-optimizations
2024-10-24T08:05:41.349348Z  INFO mina_mesh::commands::serve: listening on http://0.0.0.0:3000

Performance Test Results

The following table shows a comparison of query performance between Rosetta, the default Mesh server, and the optimized Mesh server:

| QUERY | Rosetta | mina-mesh serve | mina-mesh serve --use-search-tx-optimizations (35c54dd) | mina-mesh serve --use-search-tx-optimizations (c1332d2) |
|---|---|---|---|---|
| `"address":"B62qiburnzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzmp7r7UN6X", "limit":5000` | 12-16s | 12-14s | ~7.5s | 3.8-4.1s |
| `"address":"B62qpXXYbzeZkXrpa3EuZcXgqFSsBsSWrrvi16GJnXLhaqELBSfbnGF"` | 10-12s | 10-12s | 6s | 3.1s |
| `"address":"B62qowpMhZ2Ww7b8xQxcK7rrpfsL5Nt5Yz5uxaizUBKqpeZUqBETa31","status":"applied","limit":100` | 10-12s | 11-14s | 6-8s | 3.1s |
| `"max_block":394837,"status":"failed","limit":1000` | 2.4s | 2.7s | 1.5s | 0.9s |
| `"transaction_identifier":{"hash":"5JvFj6DJh1dnMnLPki9ZnmgbxcgfNZCc6hRs8FhvVhaautt84EpY"}` | 0.46s | 0.47s | 0.046s | 0.1s |

Notes on performance test:

  • executed against a mainnet setup (archive DB + archive node + mina daemon + rosetta + mesh) on a local machine (specs: Linux Pop!_OS, 13th Gen Intel® Core™ i9-13900HX × 32, 64GB RAM)
  • timings were measured with the time tool, using the following query template:
$ time curl -L -X POST 'http://localhost:3000/search/transactions' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' -d '{
  "network_identifier": {
    "blockchain": "mina",
    "network": "mainnet"
  },
  <QUERY>
}'
  • each query was run 3-4 times against each server; the table reports the resulting time range

Storage Overhead

The storage overhead on the Postgres server is around 3GB, which is the size of the table including its indexes:

SELECT pg_size_pretty(pg_total_relation_size('user_commands_aggregated')) AS total_size;
"3126 MB"
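To see how that ~3GB splits between the table itself and its indexes, the standard Postgres sizing functions can be used (a generic query, not part of this PR):

```sql
-- Break the total relation size into heap vs. index contributions.
SELECT
  pg_size_pretty(pg_relation_size('user_commands_aggregated'))       AS table_size,
  pg_size_pretty(pg_indexes_size('user_commands_aggregated'))        AS indexes_size,
  pg_size_pretty(pg_total_relation_size('user_commands_aggregated')) AS total_size;
```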

The storage overhead on a database dump (taken with the pg_dump tool) is around 2GB:

$ du -h archive-*
3,1G	archive-no-optimizations.sql
4,9G	archive-optimizations.sql

Dev Notes

This PR introduces a new table, and this table needs to be available in the development schema at compile time (sqlx needs it for its checks). Therefore, new deno tasks are provided for applying and dropping these optimizations:

- pg:apply_optimizations
    deno run -A ./tasks/search_tx_optimizations.ts --apply
- pg:drop_optimizations
    deno run -A ./tasks/search_tx_optimizations.ts --drop

Comment on lines 118 to 121
CREATE OR REPLACE TRIGGER trigger_add_to_user_commands_aggregated
AFTER INSERT ON blocks_user_commands FOR EACH ROW
EXECUTE FUNCTION add_to_user_commands_aggregated ();
piotr-iohk (Collaborator, Author):

So we're only inserting a new row into user_commands_aggregated when there is a new row in blocks_user_commands. There is no logic for AFTER INSERT on user_commands, or for AFTER UPDATE/AFTER DELETE on either user_commands or blocks_user_commands.

Since the underlying query for user_commands_aggregated has this join:

SELECT
 *
FROM
  user_commands AS u
  INNER JOIN blocks_user_commands AS buc ON u.id=buc.user_command_id;

so I was thinking that such a trigger (and its add function) is enough... but the question is: do we also need other triggers, e.g.:

  • AFTER UPDATE, AFTER DELETE on blocks_user_commands
  • AFTER INSERT, AFTER UPDATE or AFTER DELETE on user_commands

@joaosreis, could you weigh in on this?
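For context, a minimal sketch of what the add function could look like (the real implementation is in the PR's migration SQL; this version is illustrative and assumes the aggregated table's columns mirror the join above):

```sql
-- Hypothetical sketch of the trigger function: on each new
-- blocks_user_commands row, copy the joined result into the aggregate.
CREATE OR REPLACE FUNCTION add_to_user_commands_aggregated () RETURNS trigger AS $$
BEGIN
  INSERT INTO user_commands_aggregated
  SELECT u.*, NEW.block_id, NEW.status, NEW.failure_reason
  FROM user_commands AS u
  WHERE u.id = NEW.user_command_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```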

joaosreis (Member):

I'm almost certain the archive node doesn't perform UPDATE or DELETE operations, so we can safely ignore those.

AFTER INSERT on user_commands should also not be needed, as entries in blocks_user_commands require that the respective user_commands entry already exists (and we get that data from the JOIN). Also, a new block with user commands ensures that a new blocks_user_commands entry is created, which might not happen with user_commands (if the metadata of the user commands is the same as a previous one's).

These assumptions also apply to internal and zkapps commands.

Comment on lines 48 to 51
AND (
b.chain_status='canonical'
OR b.chain_status='pending'
)
@piotr-iohk (Collaborator, Author) commented Oct 25, 2024

It seems that adding this, while dropping the following CTEs (from this query as well as others):

  canonical_blocks AS (
    SELECT
      *
    FROM
      blocks
    WHERE
      chain_status='canonical'
  ),
  max_canonical_height AS (
    SELECT
      max(HEIGHT) AS max_height
    FROM
      canonical_blocks
  ),
  pending_blocks AS (
    SELECT
      b.*
    FROM
      blocks AS b,
      max_canonical_height AS m
    WHERE
      b.height>m.max_height
      AND b.chain_status='pending'
  ),
  blocks AS (
    SELECT
      *
    FROM
      canonical_blocks
    UNION ALL
    SELECT
      *
    FROM
      pending_blocks
  )

improves performance by another ~50%. I've updated the performance test table in the PR summary accordingly.
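A condensed illustration of the change (table and column names follow the quoted queries; the join column is assumed for illustration):

```sql
-- Before: "blocks" was rebuilt via canonical_blocks/pending_blocks CTEs.
-- After: filter chain_status inline in a single predicate, which avoids
-- materializing intermediate CTE results for every query.
SELECT uca.*
FROM user_commands_aggregated AS uca
JOIN blocks AS b ON b.id = uca.block_id
WHERE b.chain_status IN ('canonical', 'pending');
```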

Contributor:

Let's break apart the sql directory into two dirs: queries and migrations.

Contributor:

Should we instead just utilize the sqlx CLI?

piotr-iohk (Collaborator, Author):

If the optimizations become a permanent requirement for MinaMesh functionality, and we decide to keep the "migration" scripts here rather than in the Archive DB, then integrating them into a migration workflow with sqlx migrate run would be the way to go, I think. For now this is still a bit experimental, so I thought having this command would be more sensible: it allows applying the optimizations and testing them out as an option, while no one is required to use them.

src/error.rs (resolved)
const connectionString = Deno.env.get("DATABASE_URL");
assertExists(connectionString);

const args = Deno.args;
Contributor:

I can't quite grok: what is the overlap between this script and search_tx_optimizations.rs? Also, let's utilize @std/cli's parseArgs util for argument parsing.

piotr-iohk (Collaborator, Author):

I can't quite grok: what is the overlap between this script and the search_tx_optimizations.rs?

Well, you need to compile mina-mesh to be able to use search_tx_optimizations.rs, but you cannot compile mina-mesh without the additional user_commands_aggregated table. The deno task is therefore intended for setting up the dev environment so that you can compile the project. The new command (search_tx_optimizations.rs), on the other hand, is a user-facing command for applying the optimizations to the archive DB. As I mentioned in the PR summary, these (search-tx-optimizations and --use-search-tx-optimizations) may be temporary (for instance, if we decide to move the optimizations into the archive DB schema directly), but for now this lives as a utility in mina-mesh while we're testing/researching this option.

Also, let's utilize the the @std/cli's parseArgs util for argument parsing.

👍

@harrysolovay mentioned this pull request Oct 29, 2024

/// Apply optimizations
#[arg(long, conflicts_with = "drop")]
apply: bool,
Contributor:

Could use a nested subcommand instead of the boolean flags + conflicts_with. That being said, it's not super important; let's merge for now.

println!("Applying search transaction optimizations on Archive Database (this may take few minutes)...");

// Load and execute the SQL from the file
let sql = include_str!("../../sql/migrations/apply_search_tx_optimizations.sql");
Contributor:

For cases like this, we should probably use the following (in case we want to move the file or refactor the line elsewhere, we don't need to worry about the relative reference):

include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/root-relative/path/to/the/file.sql"))

println!("Dropping search transaction optimizations from Archive Database...");

// Load and execute the SQL from the file
let sql = include_str!("../../sql/migrations/drop_search_tx_optimizations.sql");
Contributor:

Same thing here

@piotr-iohk merged commit f7960bb into main Oct 31, 2024
6 checks passed
@piotr-iohk deleted the search-tx-optimization-experiments branch October 31, 2024 15:01