Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1545 without dos and stats cleanups #2046

Open
wants to merge 309 commits into
base: v4.0.0
Choose a base branch
from

Conversation

kladkogex
Copy link
Collaborator

@kladkogex kladkogex commented Nov 25, 2024

State locking/ performance improvements for SKALEd.

  1. Use LevedlDB read-only snaps for all read-only JSON-RPC calls

In the past we had to constantly lock/unlock LevelDB state because it was used both during block processing
(write access) and JSON-RPC calls (read access)

Now we use the actual LevelDB only during block processing, so we do not need to lock it, since LevelDB will always be used by a single thread.

For read-only calls through JSON-RPC we use snapshots created at the end of each block.
This significantly improved performance and made calls much simpler, because we do not do state locking anymore.

image

Added SnapshotManager class that manages snapshots and closes them once they are not used anymore.

Since we already have a feature to reopen LevelDB during execution, added mechanics to close anbd reopen snapshot objects during reopen of the database (if you want to close database handle, the snapshot objects created for this handle need to be closed first)

  1. Simplified pending queue/broadcast and removed unnecessary locks.

Now when a new transaction comes from JSON-RPC it is immediately broadcast to other nodes and added to the pending queue. Previously a transaction would not be added to the blockproposal until it was broadcast, which was making things unnecessary complex and delaying transaction inclusion in the block.

  1. Increased default ZMQ outgoing queue size to 1024 transactions.

It was only 16 transactions before, which caused the queue to overfill and lose transactions during broadcast.

  1. Removed unnecessary transaction cache on transaction arrival. The cash did not improve performance but caused high memory consumption and memory leaks because it was never cleaned.

  2. Fixed a bug in partial catchup receipts storage, which caused large blocks processing to stuck for seconds and also could process the same block twice. Now for each transaction once it is processed the receipt is immediately written to LevelDB. If there is a crash and restart, only the unprocessed transactions in the block are processed. Added a test for partial catchup (previously there was no test)

  3. Added a set of parallel performance tests to test_eth that test ERC-20 token tranfers, native token transfers and all of major lots of JSON-RPC calls

  4. Changed m_rawData field in Transaction to shared pointer to speed up Transaction copying and save memory. Skaled copies Transaction objects all the time, shared pointer guarantees that copies do not consume additional memory.

  5. Did some small fixes to JSON-RPC API error handling to make it similar to Geth.

@kladkogex kladkogex self-assigned this Nov 25, 2024
@kladkogex kladkogex added this to the SKALE 4.0 milestone Nov 25, 2024
@@ -48,6 +48,7 @@ DEV_SIMPLE_EXCEPTION( NotEnoughAvailableSpace );
DEV_SIMPLE_EXCEPTION( NotEnoughCash );
DEV_SIMPLE_EXCEPTION( GasPriceTooLow );
DEV_SIMPLE_EXCEPTION( SameNonceAlreadyInQueue );
DEV_SIMPLE_EXCEPTION( NonceTooMuchInTheFuture );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it for? it is not used anywhere in the code and geth doesn't have such exception

@@ -1046,8 +1050,9 @@ int main( int argc, char** argv ) try {

setupLogging( loggingOptions );

const size_t nCpuCount = skutils::tools::cpu_count();
size_t nDispatchThreads = nCpuCount * 2;
// we do not really use these threads anymore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where were these threads used? should we remove them completely if we don't use them?

} catch ( InvalidNonce& ) {
if ( !client()->chainParams().sChain.multiTransactionMode ) {
// make it similar to what geth does
throw std::runtime_error( "Nonce in the future." );
Copy link
Contributor

@olehnikolaiev olehnikolaiev Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need any specific error message in this case?

@@ -90,7 +90,8 @@ void* ZmqBroadcaster::server_socket() const {
if ( !m_zmq_server_socket ) {
m_zmq_server_socket = zmq_socket( m_zmq_context, ZMQ_PUB );

int val = 16;
// it was 16 before, this caused messages lost during performance tests
int val = 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should measure memory consumption for this particular change

if ( FAIL_AT_TX_NUM ) {
if ( transactionCount == std::stoi( FAIL_AT_TX_NUM ) ) {
// fail hard for test
cerror << "Test: crashing skaled on purpose after processing " << i
Copy link
Contributor

@olehnikolaiev olehnikolaiev Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove this before merging? I think it fails if the env variable was not passed

@@ -38,6 +38,7 @@

#include "libweb3jsonrpc/Eth.h"
#include "libweb3jsonrpc/JsonHelper.h"
#include "test/tools/libtestutils/FixedClient.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should it be included in this file?

void OverlayDB::clearPartialTransactionReceipts() {
dev::eth::BlockReceipts blockReceipts;
setPartialTransactionReceipts( blockReceipts.rlp() );
m_db_face->insert( skale::slicing::toSlice( key ), skale::slicing::toSlice( _newReceipt ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why cant we do this->lastExecutedTransactionReceipt = _newRecipt; here and then commit this data during the regular commit stage? in this way we will do twice less inserts to the db per block

}

Transactions TransactionQueue::topTransactions_WITH_LOCK(
unsigned _limit, int _maxCategory, int _setCategory ) {
Transactions TransactionQueue::topTransactions_WITH_LOCK( unsigned _limit ) {
MICROPROFILE_SCOPEI( "TransactionQueue", "topTransactions_WITH_LOCK_cat", MP_PAPAYAWHIP );

Transactions top_transactions;
std::vector< PriorityQueue::node_type > found;

VerifiedTransaction dummy = VerifiedTransaction( Transaction() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this variable is not used later, lets remove it

@@ -230,6 +222,17 @@ Transactions TransactionQueue::topTransactions_WITH_LOCK(
return top_transactions;
}

// note - this function is currently only used when tracing is enabled
Copy link
Contributor

@olehnikolaiev olehnikolaiev Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find the place where this function is used. maybe something is missing?

std::pair< u256, ExecutionResult > estimateGas( Address const& _from, u256 _value,
Address _dest, bytes const& _data, int64_t _maxGas, u256 _gasPrice,
GasEstimationCallback const& _callback = GasEstimationCallback() ) override;
std::pair< bool, ExecutionResult > estimateGasStep( int64_t _gas, Block& _latestBlock,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this method be public?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 1799 to 1864
BOOST_REQUIRE( receipt["status"].asString() == "0x0" );
BOOST_REQUIRE( receipt["gasUsed"].asString() == "0x61cb" );

BOOST_REQUIRE( receipt["status"].asString() == "0x1" );
BOOST_REQUIRE( receipt["gasUsed"].asString() == "0x13ef4" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction should fail as it was before

test/unittests/libweb3jsonrpc/jsonrpc.cpp Outdated Show resolved Hide resolved
libskale/State.h Outdated
}
return oss.str();
}
void initStateWithRandomTestDb() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is never used, let's remove it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 431 to 446
try {
return toJS( client()->importTransaction( t, TransactionBroadcast::BroadcastToAll ) );
} catch ( PendingTransactionAlreadyExists& ) {
throw std::runtime_error( "Transaction with same nonce already exists in the queue." );
} catch ( TransactionAlreadyInChain& ) {
// make it similar to what geth does
throw std::runtime_error( "Nonce too low." );
} catch ( InvalidNonce& ) {
if ( !client()->chainParams().sChain.multiTransactionMode ) {
// make it similar to what geth does
throw std::runtime_error( "Nonce in the future." );
} else {
// make it similar to what geth does
throw std::runtime_error( "Invalid nonce." );
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling for eth_sendRawTransaction is already implemented here:
https://github.com/skalenetwork/skaled/blob/develop/libweb3jsonrpc/rapidjson_handlers.cpp#L55

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 435 to 436
// BOOST_CHECK_EQUAL( gasBefore - gas, _expectedGasConsumed );
// BOOST_CHECK_EQUAL( extVm.sub.refunds, _expectedRefund );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these lines should be commented?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

There are test eth tests that fail if this is left. These are tests that are not executed by Github Actions

We need to fix these tests later.

BOOST_CHECK_EQUAL(
fixture.sendingRawShouldFail( signedTx["raw"].asString() ), "Invalid transaction nonce." );

fixture.sendingRawShouldFail( signedTx["raw"].asString() );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to check the error message as it was before

Comment on lines 731 to 753
BOOST_CHECK_EQUAL( fixture.sendingRawShouldFail( signedTx["raw"].asString() ),
"Same transaction already exists in the pending transaction queue." );
fixture.sendingRawShouldFail( signedTx["raw"].asString() );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to check the error message as it was before

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleanup state locking and make JSON RPC calls like eth_call faster
3 participants