Search tx optimization experiments (user_commands_aggregated) #58
CREATE
OR REPLACE trigger trigger_add_to_user_commands_aggregated
AFTER insert ON blocks_user_commands FOR each ROW
EXECUTE function add_to_user_commands_aggregated ();
So we only insert a new row into `user_commands_aggregated` when there is a new row in `blocks_user_commands`. There is no logic for `AFTER INSERT` on `user_commands`, or for `AFTER UPDATE`/`AFTER DELETE` on either `user_commands` or `blocks_user_commands`.
Since the underlying query for `user_commands_aggregated` has this join:
SELECT
*
FROM
user_commands AS u
INNER JOIN blocks_user_commands AS buc ON u.id=buc.user_command_id;
I was thinking that this trigger (and the add function) is enough, but the question is whether we also need other triggers, e.g.:
`AFTER UPDATE`, `AFTER DELETE` on `blocks_user_commands`
`AFTER INSERT`, `AFTER UPDATE`, or `AFTER DELETE` on `user_commands`
@joaosreis could you weigh in on this?
I'm almost certain the archive node doesn't perform `UPDATE` or `DELETE` operations, so we can safely ignore those.
`AFTER INSERT` on `user_commands` should also not be needed, as entries in `blocks_user_commands` require that the respective `user_commands` entry already exists (and we get that data from the `JOIN`). Also, a new block with user commands ensures that a new `blocks_user_commands` entry is created, which might not happen with `user_commands` (if the metadata of the user command is the same as a previous one).
These assumptions also apply to internal and zkApp commands.
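The reasoning above can be checked on a toy model. The following is only a sketch in plain Rust, not the actual trigger: the `ArchiveDb` struct, its fields, and the reduction of each table to bare ids are all hypothetical simplifications. It assumes, as stated in the comment, that the archive only ever inserts rows and that a `blocks_user_commands` row cannot exist without its `user_commands` row.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the archive tables (hypothetical; real tables have many more columns).
#[derive(Default)]
struct ArchiveDb {
    /// `user_commands`, reduced to ids.
    user_commands: BTreeSet<u32>,
    /// `blocks_user_commands`, reduced to (block_id, user_command_id) pairs.
    blocks_user_commands: BTreeSet<(u32, u32)>,
    /// `user_commands_aggregated`, written only by the simulated trigger.
    aggregated: BTreeSet<(u32, u32)>,
}

impl ArchiveDb {
    /// INSERT into `user_commands`; no trigger fires here.
    fn insert_user_command(&mut self, id: u32) {
        self.user_commands.insert(id);
    }

    /// INSERT into `blocks_user_commands`, followed by the simulated AFTER INSERT trigger.
    /// Fails when the referenced user command is missing (the assumed foreign-key rule).
    fn insert_block_user_command(&mut self, block_id: u32, user_command_id: u32) -> Result<(), String> {
        if !self.user_commands.contains(&user_command_id) {
            return Err(format!("user_command {} does not exist", user_command_id));
        }
        if self.blocks_user_commands.insert((block_id, user_command_id)) {
            // Simulated trigger body: copy the joined row into the aggregated table.
            self.aggregated.insert((block_id, user_command_id));
        }
        Ok(())
    }

    /// The INNER JOIN that defines what the aggregated table is supposed to contain.
    fn inner_join(&self) -> BTreeSet<(u32, u32)> {
        self.blocks_user_commands
            .iter()
            .filter(|(_, uc)| self.user_commands.contains(uc))
            .copied()
            .collect()
    }
}
```

Under those assumptions, `aggregated` stays equal to `inner_join()` after any sequence of inserts, including user commands that are never linked to a block. If the archive did issue `UPDATE` or `DELETE` statements, the invariant would no longer hold and the extra triggers would be needed.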
AND (
b.chain_status='canonical'
OR b.chain_status='pending'
)
It seems that adding this while dropping the following CTEs (from this query as well as others):
canonical_blocks AS (
SELECT
*
FROM
blocks
WHERE
chain_status='canonical'
),
max_canonical_height AS (
SELECT
max(HEIGHT) AS max_height
FROM
canonical_blocks
),
pending_blocks AS (
SELECT
b.*
FROM
blocks AS b,
max_canonical_height AS m
WHERE
b.height>m.max_height
AND b.chain_status='pending'
),
blocks AS (
SELECT
*
FROM
canonical_blocks
UNION ALL
SELECT
*
FROM
pending_blocks
)
improves performance by another ~50%. I updated the performance test table in the PR summary.
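One thing that may be worth verifying is whether the plain `chain_status` filter returns the same rows as the dropped CTEs, since the CTE version only keeps pending blocks above the maximum canonical height. The sketch below models both selections in plain Rust; the `Block` struct, the function names, and the `Orphaned` variant are assumptions made for illustration, not code from this PR.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChainStatus {
    Canonical,
    Pending,
    Orphaned, // assumed third status; anything that is neither canonical nor pending
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Block {
    height: u64,
    status: ChainStatus,
}

/// Mirrors the dropped CTEs: every canonical block, plus pending blocks whose
/// height is above the maximum canonical height. Rows are returned in input order.
fn blocks_via_ctes(blocks: &[Block]) -> Vec<Block> {
    let max_canonical = blocks
        .iter()
        .filter(|b| b.status == ChainStatus::Canonical)
        .map(|b| b.height)
        .max();
    blocks
        .iter()
        .copied()
        .filter(|b| match b.status {
            ChainStatus::Canonical => true,
            // With no canonical block, SQL's max() is NULL and no pending row matches.
            ChainStatus::Pending => max_canonical.map_or(false, |m| b.height > m),
            ChainStatus::Orphaned => false,
        })
        .collect()
}

/// Mirrors the new predicate: chain_status = 'canonical' OR chain_status = 'pending'.
fn blocks_via_status_filter(blocks: &[Block]) -> Vec<Block> {
    blocks
        .iter()
        .copied()
        .filter(|b| matches!(b.status, ChainStatus::Canonical | ChainStatus::Pending))
        .collect()
}
```

In this model the two functions agree whenever no pending block sits at or below the maximum canonical height, and differ otherwise. Whether such pending blocks can occur in a real archive database is not something this sketch establishes.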
Let's break apart the `sql` directory into two dirs: `queries` and `migrations`.
Should we instead just utilize the `sqlx` CLI?
If the optimizations become a permanent requirement for MinaMesh functionality and we decide to keep the "migration" scripts here and not in the Archive DB, integrating them into a migration workflow with `sqlx migrate run` would be the way to go, I think. For now it is still a bit experimental, so I thought that having this command would be more sensible. It lets you apply the optimizations and test them as an option, without being required to use them.
tasks/search_tx_optimizations.ts
Outdated
const connectionString = Deno.env.get("DATABASE_URL");
assertExists(connectionString);

const args = Deno.args;
I can't quite grok: what is the overlap between this script and `search_tx_optimizations.rs`? Also, let's utilize the `@std/cli`'s `parseArgs` util for argument parsing.
> I can't quite grok: what is the overlap between this script and the search_tx_optimizations.rs?

Well, you need to compile mina-mesh to be able to use `search_tx_optimizations.rs`, but you cannot compile mina-mesh without the additional table `user_commands_aggregated`. The `deno` task is therefore intended for setting up the dev environment so that you can compile the project. The new command (`search_tx_optimizations.rs`), on the other hand, is a user-facing command for applying the optimizations to the archive DB. As I mentioned in the PR summary, these (`search-tx-optimizations` and `--use-search-tx-optimizations`) may be temporary (for instance, if we decide to move the optimizations into the archive DB schema directly), but for now this lives as a utility in mina-mesh while we test and research this option.

> Also, let's utilize the @std/cli's parseArgs util for argument parsing.

👍
/// Apply optimizations
#[arg(long, conflicts_with = "drop")]
apply: bool,
Could use a nested subcommand instead of the boolean flags + `conflicts_with`. That being said, it's not super important; let's merge for now.
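The suggestion amounts to modelling the mutually exclusive actions as a single enum rather than several booleans that must be guarded against each other. The sketch below shows that shape with the standard library only, so it runs without dependencies; with clap, the analogous step would be deriving `Subcommand` on such an enum. The `OptimizationAction` type, the lowercase action names, and the `describe` helper are hypothetical and not taken from the PR.

```rust
use std::str::FromStr;

/// One value per action, so "apply and drop at once" cannot be represented at all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OptimizationAction {
    Apply,
    Drop,
    Check,
}

impl FromStr for OptimizationAction {
    type Err = String;

    /// Parses a subcommand name; any other input is rejected with a message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "apply" => Ok(OptimizationAction::Apply),
            "drop" => Ok(OptimizationAction::Drop),
            "check" => Ok(OptimizationAction::Check),
            other => Err(format!("unknown action: {}", other)),
        }
    }
}

/// Exhaustive match: adding a new action forces every caller to handle it.
fn describe(action: OptimizationAction) -> &'static str {
    match action {
        OptimizationAction::Apply => "apply the optimizations",
        OptimizationAction::Drop => "drop the optimizations",
        OptimizationAction::Check => "report whether the optimizations are applied",
    }
}
```

For instance, `"apply".parse::<OptimizationAction>()` yields `Ok(OptimizationAction::Apply)`, while an unrecognised name yields an `Err`, so no `conflicts_with` bookkeeping is needed.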
println!("Applying search transaction optimizations on Archive Database (this may take few minutes)...");

// Load and execute the SQL from the file
let sql = include_str!("../../sql/migrations/apply_search_tx_optimizations.sql");
For cases like this, we should probably use the following (in case we want to move the file elsewhere or refactor the line elsewhere, we don't need to worry about the relative reference):
include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/root-relative/path/to/the/file.sql"))
println!("Dropping search transaction optimizations from Archive Database...");

// Load and execute the SQL from the file
let sql = include_str!("../../sql/migrations/drop_search_tx_optimizations.sql");
Same thing here
Summary
This PR introduces optimizations for the `/search/transactions` endpoint. The main feature is a new table, `user_commands_aggregated`, designed to improve query performance by trading off normalization for speed and increased storage usage. A function and trigger are also added to keep the table updated.
Additionally, a new command has been added to the Mesh tool for managing these optimizations (`mina-mesh search-tx-optimizations --apply`, `--drop`, or `--check` their status), and the `serve` command now includes a `--use-search-tx-optimizations` flag to enable these optimizations. These features are intended for testing and may be temporary.
New Commands
The `mina-mesh` tool now supports `search-tx-optimizations` with the `--apply`, `--drop`, and `--check` flags described above.
Once optimizations are applied, you can run the Mesh server with optimizations enabled via `mina-mesh serve --use-search-tx-optimizations`.
Performance Test Results
The following table shows a comparison of query performance between Rosetta, the default Mesh server, and the optimized Mesh server:
Configurations compared:
mina-mesh serve
mina-mesh serve --use-search-tx-optimizations (35c54dd)
mina-mesh serve --use-search-tx-optimizations (c1332d2)
Queries:
"address":"B62qiburnzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzmp7r7UN6X", "limit":5000
"address":"B62qpXXYbzeZkXrpa3EuZcXgqFSsBsSWrrvi16GJnXLhaqELBSfbnGF"
"address":"B62qowpMhZ2Ww7b8xQxcK7rrpfsL5Nt5Yz5uxaizUBKqpeZUqBETa31","status":"applied","limit":100
"max_block":394837,"status":"failed","limit":1000
"transaction_identifier":{"hash":"5JvFj6DJh1dnMnLPki9ZnmgbxcgfNZCc6hRs8FhvVhaautt84EpY"}
Notes on performance test:
`time` tool using the query template as follows:
Storage Overhead
The storage overhead on the Postgres server is around 3 GB, which is the size of the table with its indexes:
The storage overhead on the database dump (taken with the `pg_dump` tool) is around 2 GB:
Dev Notes
This PR introduces a new table, and this table needs to be available in the development schema at compile time (as `sqlx` needs it for checks). Therefore, new `deno` tasks are provided for applying and dropping these optimizations: