Search tx optimization experiments (`user_commands_aggregated`) #58

piotr-iohk · 2024-10-25T08:16:44Z

Summary

This PR introduces optimizations for the /search/transactions endpoint. The main feature is a new table, user_commands_aggregated, which improves query performance by trading normalization (and additional storage) for speed. A function and a trigger are also added to keep the table up to date.

Additionally, a new command has been added to the Mesh tool for managing these optimizations (mina-mesh search-tx-optimizations, with --apply, --drop, or --check to check their status), and the serve command now has a --use-search-tx-optimizations flag that enables them. These features are intended for testing and may be temporary.

New Commands

The mina-mesh tool now supports the following:

$ mina-mesh search-tx-optimizations
Error: You must specify either --apply, --drop, or --check.

$ mina-mesh search-tx-optimizations --apply
Applying search transaction optimizations on Archive Database (this may take few minutes)...
Optimizations applied successfully.
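A minimal std-only sketch of the flag validation shown above follows. It assumes the three flags have already been parsed into booleans. `OptimizationAction` and `resolve_action` are hypothetical names. The PR's actual command uses clap (`#[arg(long, conflicts_with = ...)]`), so the conflict message here is illustrative and not the CLI's real output.

```rust
/// The three mutually exclusive actions of `mina-mesh search-tx-optimizations`.
#[derive(Debug, PartialEq)]
pub enum OptimizationAction {
    Apply,
    Drop,
    Check,
}

/// Hypothetical validation: exactly one of the three flags must be set.
/// The "no flag" message mirrors the CLI output shown above; the conflict
/// message is illustrative only (the real command lets clap report conflicts).
pub fn resolve_action(apply: bool, drop: bool, check: bool) -> Result<OptimizationAction, String> {
    match (apply, drop, check) {
        (true, false, false) => Ok(OptimizationAction::Apply),
        (false, true, false) => Ok(OptimizationAction::Drop),
        (false, false, true) => Ok(OptimizationAction::Check),
        (false, false, false) => {
            Err("You must specify either --apply, --drop, or --check.".to_string())
        }
        // Any combination with two or more flags set.
        _ => Err("--apply, --drop and --check are mutually exclusive.".to_string()),
    }
}
```

Resolving the flags into an enum up front is essentially what a nested subcommand provides directly, because illegal flag combinations then cannot be expressed at all.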

Once optimizations are applied, you can run the Mesh server with optimizations enabled:

$ mina-mesh serve --use-search-tx-optimizations
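The following sketch shows what the flag changes, assuming it only switches which SQL the server runs. The default path joins the normalized tables, while the optimized path reads the pre-joined user_commands_aggregated table. `ServeConfig`, `user_commands_sql` and both SQL strings are simplified, hypothetical stand-ins. The real queries are considerably larger (filters, limits, internal and zkApp commands).

```rust
/// Hypothetical default query: join the normalized archive tables on every request.
const DEFAULT_USER_COMMANDS_SQL: &str = "SELECT * FROM user_commands AS u \
     INNER JOIN blocks_user_commands AS buc ON u.id = buc.user_command_id";

/// Hypothetical optimized query: read the pre-joined, denormalized table instead.
const OPTIMIZED_USER_COMMANDS_SQL: &str = "SELECT * FROM user_commands_aggregated";

/// Simplified stand-in for the serve command's options.
pub struct ServeConfig {
    pub use_search_tx_optimizations: bool,
}

impl ServeConfig {
    /// Pick the query variant according to --use-search-tx-optimizations.
    pub fn user_commands_sql(&self) -> &'static str {
        if self.use_search_tx_optimizations {
            OPTIMIZED_USER_COMMANDS_SQL
        } else {
            DEFAULT_USER_COMMANDS_SQL
        }
    }
}
```

Because the optimized variant reads a table that only exists after --apply, the flag is only meaningful on a database where the optimizations have been applied.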
2024-10-24T08:05:41.349348Z  INFO mina_mesh::commands::serve: listening on http://0.0.0.0:3000

Performance Test Results

The following table compares query performance between Rosetta, the default Mesh server, and the optimized Mesh server at two commits:

| Query (request body fields) | Rosetta | `mina-mesh serve` | `mina-mesh serve --use-search-tx-optimizations` (`35c54dd`) | `mina-mesh serve --use-search-tx-optimizations` (`c1332d2`) |
|---|---|---|---|---|
| `"address":"B62qiburnzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzmp7r7UN6X", "limit":5000` | 12-16s | 12-14s | ~7.5s | 3.8-4.1s |
| `"address":"B62qpXXYbzeZkXrpa3EuZcXgqFSsBsSWrrvi16GJnXLhaqELBSfbnGF"` | 10-12s | 10-12s | 6s | 3.1s |
| `"address":"B62qowpMhZ2Ww7b8xQxcK7rrpfsL5Nt5Yz5uxaizUBKqpeZUqBETa31","status":"applied","limit":100` | 10-12s | 11-14s | 6-8s | 3.1s |
| `"max_block":394837,"status":"failed","limit":1000` | 2.4s | 2.7s | 1.5s | 0.9s |
| `"transaction_identifier":{"hash":"5JvFj6DJh1dnMnLPki9ZnmgbxcgfNZCc6hRs8FhvVhaautt84EpY"}` | 0.46s | 0.47s | 0.046s | 0.1s |

Notes on the performance tests:

- Executed against a mainnet setup (archive DB + archive node + Mina daemon + Rosetta + Mesh) on a local machine (Specs: Linux PopOS, 13th Gen Intel® Core™ i9-13900HX × 32, 64GB RAM).
- Measured with the time tool, using the following query template (<QUERY> is replaced by the query fields listed in the table):

$ time curl -L -X POST 'http://localhost:3000/search/transactions' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' -d '{
  "network_identifier": {
    "blockchain": "mina",
    "network": "mainnet"
  },
  <QUERY>
}'

- Each query was run 3-4 times against each server; the table reports the range of observed times.
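The time ranges described above can be turned into rough speedup bounds. In this small sketch the helper names are hypothetical and samples are in seconds. The pessimistic bound divides the fastest baseline run by the slowest optimized run. The optimistic bound divides the slowest baseline run by the fastest optimized run.

```rust
/// Min/max of repeated wall-clock measurements, in seconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeRange {
    pub min: f64,
    pub max: f64,
}

/// Collapse repeated runs of one query into the range reported in the table.
/// Returns None when there are no samples.
pub fn summarize(samples: &[f64]) -> Option<TimeRange> {
    let first = *samples.first()?;
    let (min, max) = samples
        .iter()
        .fold((first, first), |(lo, hi), &s| (lo.min(s), hi.max(s)));
    Some(TimeRange { min, max })
}

/// (pessimistic, optimistic) speedup of `optimized` relative to `baseline`.
pub fn speedup_bounds(baseline: TimeRange, optimized: TimeRange) -> (f64, f64) {
    (baseline.min / optimized.max, baseline.max / optimized.min)
}
```

Take the first row of the table: Rosetta runs in 12-16s and the optimized server at c1332d2 in 3.8-4.1s. For that row, speedup_bounds gives roughly 2.9x to 4.2x.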

Storage Overhead

The storage overhead on the Postgres server is around 3 GB, which is the size of the table including its indexes:

SELECT pg_size_pretty(pg_total_relation_size('user_commands_aggregated')) AS total_size;
"3126 MB"

The storage overhead in a database dump (taken with the pg_dump tool) is around 2 GB (4.9 GB vs. 3.1 GB):

$ du -h archive-*
3,1G	archive-no-optimizations.sql
4,9G	archive-optimizations.sql

Dev Notes

This PR introduces a new table, and this table needs to be available in the development schema at compile time (sqlx needs it for its compile-time query checks). Therefore, new deno tasks are provided for applying and dropping these optimizations:

- pg:apply_optimizations
    deno run -A ./tasks/search_tx_optimizations.ts --apply
- pg:drop_optimizations
    deno run -A ./tasks/search_tx_optimizations.ts --drop

piotr-iohk · 2024-10-25T09:23:58Z

sql/apply_search_tx_optimizations.sql

+CREATE
+OR REPLACE trigger trigger_add_to_user_commands_aggregated
+AFTER insert ON blocks_user_commands FOR each ROW
+EXECUTE function add_to_user_commands_aggregated ();


So we only insert a new row into user_commands_aggregated when a new row is inserted into blocks_user_commands. There is no logic for AFTER INSERT on user_commands, or for AFTER UPDATE/AFTER DELETE on either user_commands or blocks_user_commands.

Since the underlying query for user_commands_aggregated has this join:

SELECT * FROM user_commands AS u INNER JOIN blocks_user_commands AS buc ON u.id=buc.user_command_id;

I was thinking that this trigger (and the add function) is enough... but the question is whether we also need other triggers, e.g.:

AFTER UPDATE, AFTER DELETE on blocks_user_commands

AFTER INSERT, AFTER UPDATE or AFTER DELETE on user_commands

@joaosreis could you weigh in on this?

joaosreis · 2024-11-07

I'm almost certain the archive node doesn't perform UPDATE or DELETE operations, so we can safely ignore those.

AFTER INSERT on user_commands should also not be needed, as entries in blocks_user_commands require that the respective user_commands entry already exists (and we get that data from the JOIN). Also, a new block with user commands ensures that a new blocks_user_commands entry is created, which might not happen with user_commands (if the user command's metadata is the same as a previous one).

These assumptions also apply to internal and zkApp commands.
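The insert-only reasoning above can be modeled in memory. This is a simplified sketch, not the PR's SQL function. `ArchiveModel` and its methods are hypothetical, and rows carry only an id and a placeholder hash. The foreign-key check stands in for the constraint that a blocks_user_commands row needs an existing user_commands row. Inserting into user_commands alone produces no aggregated row. Each blocks_user_commands insert (the trigger's job) produces exactly one, even when the same user command appears in several blocks.

```rust
use std::collections::HashMap;

/// Simplified `user_commands` row (only the columns this sketch needs).
#[derive(Clone, Debug)]
pub struct UserCommand {
    pub id: u64,
    pub hash: String,
}

/// Simplified `user_commands_aggregated` row: one per (block, user command) pair.
#[derive(Clone, Debug, PartialEq)]
pub struct AggregatedRow {
    pub block_id: u64,
    pub user_command_id: u64,
    pub hash: String,
}

/// Hypothetical in-memory stand-in for the relevant archive tables.
#[derive(Default)]
pub struct ArchiveModel {
    user_commands: HashMap<u64, UserCommand>,
    pub aggregated: Vec<AggregatedRow>,
}

impl ArchiveModel {
    /// INSERT INTO user_commands: no trigger fires, so nothing is aggregated.
    pub fn insert_user_command(&mut self, cmd: UserCommand) {
        self.user_commands.insert(cmd.id, cmd);
    }

    /// INSERT INTO blocks_user_commands followed by the AFTER INSERT trigger.
    /// The foreign-key check guarantees the joined user command exists.
    pub fn insert_block_user_command(
        &mut self,
        block_id: u64,
        user_command_id: u64,
    ) -> Result<(), String> {
        let cmd = self.user_commands.get(&user_command_id).ok_or_else(|| {
            format!("foreign key violation: user_command {user_command_id} does not exist")
        })?;
        // Trigger body: append the joined row to the aggregated table.
        self.aggregated.push(AggregatedRow {
            block_id,
            user_command_id,
            hash: cmd.hash.clone(),
        });
        Ok(())
    }
}
```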

piotr-iohk · 2024-10-25T15:32:45Z

sql/indexer_internal_commands_optimized.sql

+      AND (
+        b.chain_status='canonical'
+        OR b.chain_status='pending'
+      )


It seems that adding this filter, while dropping the following CTEs (from this query as well as the others):

canonical_blocks AS (
  SELECT * FROM blocks WHERE chain_status='canonical'
),
max_canonical_height AS (
  SELECT max(HEIGHT) AS max_height FROM canonical_blocks
),
pending_blocks AS (
  SELECT b.* FROM blocks AS b, max_canonical_height AS m
  WHERE b.height>m.max_height AND b.chain_status='pending'
),
blocks AS (
  SELECT * FROM canonical_blocks
  UNION ALL
  SELECT * FROM pending_blocks
)

improves performance by roughly another 50%... I've updated the performance test table in the PR summary.
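The two block filters are not identical in general. The CTE version keeps only pending blocks above the highest canonical block, while the simplified predicate keeps every pending block. The sketch below compares them over in-memory blocks. The types are hypothetical, and the chain_status values follow the archive schema's canonical/pending/orphaned. It shows that the filters agree when no pending block sits at or below the highest canonical height. Whether that situation arises in a live archive DB is not established here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChainStatus {
    Canonical,
    Pending,
    Orphaned,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Block {
    pub id: u64,
    pub height: u64,
    pub status: ChainStatus,
}

/// Original CTE chain: canonical blocks plus pending blocks above max canonical height.
/// With no canonical block, max() is NULL in SQL and no pending block passes.
pub fn cte_filter(blocks: &[Block]) -> Vec<u64> {
    let max_canonical = blocks
        .iter()
        .filter(|b| b.status == ChainStatus::Canonical)
        .map(|b| b.height)
        .max();
    let mut ids: Vec<u64> = blocks
        .iter()
        .filter(|b| match b.status {
            ChainStatus::Canonical => true,
            ChainStatus::Pending => matches!(max_canonical, Some(m) if b.height > m),
            ChainStatus::Orphaned => false,
        })
        .map(|b| b.id)
        .collect();
    ids.sort_unstable();
    ids
}

/// Simplified predicate: chain_status='canonical' OR chain_status='pending'.
pub fn status_filter(blocks: &[Block]) -> Vec<u64> {
    let mut ids: Vec<u64> = blocks
        .iter()
        .filter(|b| matches!(b.status, ChainStatus::Canonical | ChainStatus::Pending))
        .map(|b| b.id)
        .collect();
    ids.sort_unstable();
    ids
}
```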

harrysolovay · 2024-10-28T13:33:45Z

sql/apply_search_tx_optimizations.sql

Let's break apart the sql directory into two dirs: queries and migrations.

harrysolovay · 2024-10-28T13:36:43Z

src/commands/search_tx_optimizations.rs

Should we instead just utilize the sqlx CLI?

piotr-iohk · 2024-10-30

If the optimizations become a permanent requirement for MinaMesh functionality and we decide to keep the "migration" scripts here rather than in the Archive DB, integrating them into a migration workflow with sqlx migrate run would be the way to go, I think. For now it is still somewhat experimental, so I thought a dedicated command would be more sensible: it allows applying the optimizations and testing them as an option, while nobody is required to use/apply them.

src/error.rs

harrysolovay · 2024-10-28T13:40:13Z

tasks/search_tx_optimizations.ts

+const connectionString = Deno.env.get("DATABASE_URL");
+assertExists(connectionString);
+
+const args = Deno.args;


I can't quite grok: what is the overlap between this script and search_tx_optimizations.rs? Also, let's use the parseArgs util from @std/cli for argument parsing.

piotr-iohk · 2024-10-30

> I can't quite grok: what is the overlap between this script and the search_tx_optimizations.rs?

Well, you need to compile mina-mesh to be able to use search_tx_optimizations.rs, but you cannot compile mina-mesh without the additional user_commands_aggregated table. The deno task is therefore intended for setting up the dev environment so that you're able to compile the project. The new command (search_tx_optimizations.rs), on the other hand, is a user-facing command for applying the optimizations on the archive DB. As I mentioned in the PR summary, these (search-tx-optimizations and --use-search-tx-optimizations) may be temporary (for instance, if we decide to move the optimizations into the archive DB schema directly), but for now they live as a utility in mina-mesh while we're testing/researching this option.

> Also, let's utilize the @std/cli's parseArgs util for argument parsing.

👍

harrysolovay · 2024-10-31T13:33:36Z

src/commands/search_tx_optimizations.rs

+
+  /// Apply optimizations
+  #[arg(long, conflicts_with = "drop")]
+  apply: bool,


Could use a nested subcommand instead of the boolean flags + conflicts_with. That being said, it's not super important; let's merge for now.

harrysolovay · 2024-10-31T13:36:25Z

src/commands/search_tx_optimizations.rs

+    println!("Applying search transaction optimizations on Archive Database (this may take few minutes)...");
+
+    // Load and execute the SQL from the file
+    let sql = include_str!("../../sql/migrations/apply_search_tx_optimizations.sql");


For cases like this, we should probably use the following, so that if we move the file or refactor this line elsewhere, we don't need to worry about the relative reference:

include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/root-relative/path/to/the/file.sql"))

harrysolovay · 2024-10-31T13:38:39Z

src/commands/search_tx_optimizations.rs

+    println!("Dropping search transaction optimizations from Archive Database...");
+
+    // Load and execute the SQL from the file
+    let sql = include_str!("../../sql/migrations/drop_search_tx_optimizations.sql");


Same thing here

piotr-iohk added 8 commits October 23, 2024 08:23

user_command_aggregated_info playing around

23a1241

queries

9fff045

user_commands_aggregated sqls

04c6d98

search tx optimizations command

ec51cdf

use search tx optimizations flag

61d33c6

apply/drop optimizations deno tasks

3129b30

idx_user_commands_aggregated_hash

dcac749

minor updates

a3d438f

piotr-iohk self-assigned this Oct 25, 2024

piotr-iohk requested review from harrysolovay, joaosreis and johnmarcou as code owners October 25, 2024 08:16

piotr-iohk added 2 commits October 25, 2024 10:58

update query cache

f1f5e80

satisfy clippy

35c54dd

piotr-iohk commented Oct 25, 2024


internal/zkapp optimization foundations

c1332d2

piotr-iohk commented Oct 25, 2024


harrysolovay suggested changes Oct 28, 2024


harrysolovay mentioned this pull request Oct 29, 2024

Dev Command #61

Closed

piotr-iohk added 2 commits October 30, 2024 10:07

sort out queries into subdirs

d687abd

use parseArgs

f6012fe

harrysolovay reviewed Oct 31, 2024


harrysolovay approved these changes Oct 31, 2024


piotr-iohk merged commit f7960bb into main Oct 31, 2024
6 checks passed

piotr-iohk deleted the search-tx-optimization-experiments branch October 31, 2024 15:01

piotr-iohk mentioned this pull request Nov 6, 2024

More search tx optimizations (internal_commands_aggregated, zkapp_commands_aggregated) #63

Merged
