RAM growth while snapshot hash computation #1588

Closed
dimalit opened this issue Jul 11, 2023 · 7 comments · Fixed by #1589 or #1787

@dimalit
Collaborator

dimalit commented Jul 11, 2023

So far, this has only been reproduced on skaled's stability tests.

  1. Limit transaction queue size to 100 txns.
  2. Run 16-node stability test (it runs load test as well).
  3. While the snapshot hash is being calculated (every hour), RSS grows by approximately 650 MB:
2023-07-11 14:00:02.709529   "/data_dir/snapshots/1421/a5cf2af8/12041/state" hash is: f4415c534cf5a2c0addfa1a7e5dbe6565112ef5686e23160902a2261c5e0e538
2023-07-11 14:00:40.375903   "/data_dir/snapshots/1421/a5cf2af8/blocks_and_extras/0.db" hash is: 419d9d3624db028ca27e2f4c9199ebda986a7038140527fdc693575a4381938c
2023-07-11 14:00:40.420298   "/data_dir/snapshots/1421/a5cf2af8/blocks_and_extras/1.db" hash is: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-07-11 14:00:40.425411   "/data_dir/snapshots/1421/a5cf2af8/blocks_and_extras/2.db" hash is: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-07-11 14:00:40.431677   "/data_dir/snapshots/1421/a5cf2af8/blocks_and_extras/3.db" hash is: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
2023-07-11 14:00:49.327736   "/data_dir/snapshots/1421/a5cf2af8/blocks_and_extras/4.db" hash is: e3c9f1ab5b5ebcdfdec465604206e1d44370441256768f9b1040653d9b951152
2023-07-11 14:00:49.420597   Latest price hash is: 3bb78535cc9555ff19fe3556aaa41c78a0a45c64d49ba2bc564507648a8e77a1
2023-07-11 14:00:49.525538   Computed hash for snapshot 1421: #c1014dab…

ram
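
For reference, a minimal sketch of hashing a file in fixed-size chunks, so that memory use stays bounded regardless of file size. This is not skaled's implementation (skaled is written in C++ and its hashing code is not shown in this thread); `hash_file` and `CHUNK_SIZE` are hypothetical names chosen for illustration. One detail the log does allow us to check: the value logged for 1.db, 2.db and 3.db is the SHA-256 digest of empty input, which suggests that no data was fed into the hash for those databases.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; hypothetical value, not taken from skaled

# SHA-256 of empty input; identical to the hash logged for 1.db, 2.db and 3.db.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def hash_file(path: str) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this approach the peak buffer size is `CHUNK_SIZE`, independent of how large the hashed file is.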

@DmytroNazarenko DmytroNazarenko added bug Something isn't working release:2.2 epic:HPD labels Jul 12, 2023
@DmytroNazarenko DmytroNazarenko moved this to Ready For Pickup in SKALE Engineering 🚀 Jul 12, 2023
@olehnikolaiev olehnikolaiev moved this from Ready For Pickup to In Progress in SKALE Engineering 🚀 Jul 12, 2023
@PolinaKiporenko PolinaKiporenko added this to the 2.2 milestone Jul 12, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Ready For Release Candidate in SKALE Engineering 🚀 Jul 12, 2023
@dimalit
Collaborator Author

dimalit commented Jul 17, 2023

Partially resolved
image

@dimalit dimalit reopened this Jul 17, 2023
@github-project-automation github-project-automation bot moved this from Ready For Release Candidate to Ready For Pickup in SKALE Engineering 🚀 Jul 17, 2023
@PolinaKiporenko
Contributor

Partly solved in 2.2.

@PolinaKiporenko PolinaKiporenko removed this from the 2.2 milestone Jul 24, 2023
@oleksandrSydorenkoJ

oleksandrSydorenkoJ commented Dec 7, 2023

Update:
Under load, skaled may crash without any apparent reason or stack trace when the swap file is fully used and there is no free memory left to compute the snapshot hash, which leads to #1094.

Example:

============================================
Node 1:
============================================
docker inspect skale_schain_ill-informed-friendly-haedi | grep "StartedAt\|FinishedAt" &&\
docker inspect skale_schain_hungry-formal-ascella | grep "StartedAt\|FinishedAt" &&\
docker inspect skale_schain_rural-colossal-cebalrai | grep "StartedAt\|FinishedAt"

skale_schain_ill-informed-friendly-haedi
            "StartedAt": "2023-12-03T17:00:55.395322791Z",
            "FinishedAt": "2023-12-03T17:00:42.068534Z"
skale_schain_hungry-formal-ascella << not crashed 
            "StartedAt": "2023-11-22T17:45:45.56355484Z",
            "FinishedAt": "0001-01-01T00:00:00Z" 
skale_schain_rural-colossal-cebalrai
            "StartedAt": "2023-12-05T21:00:45.509193177Z",
            "FinishedAt": "2023-12-05T21:00:40.267984395Z"

3_17_1_skaled_crashes_during_snapshhot_hash_calculations.txt
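
The timestamps above can be turned into restart gaps with a short script. This is only a sketch: `parse_docker_time` and `restart_gap_seconds` are hypothetical helper names, and the values are copied from the output above. It relies on Docker reporting `0001-01-01T00:00:00Z` as `FinishedAt` for a container that has never stopped. Both recorded stops fall within the first minute after the hour, which is at least consistent with the hourly snapshot hash computation described earlier in this issue.

```python
from datetime import datetime, timezone

# Zero value Docker reports for FinishedAt when the container has never stopped.
NEVER_FINISHED = "0001-01-01T00:00:00Z"


def parse_docker_time(value: str) -> datetime:
    """Parse a UTC timestamp from `docker inspect`, truncating to microseconds."""
    body = value.rstrip("Z")
    if "." in body:
        seconds, fraction = body.split(".")
        # Docker prints up to nanosecond precision; datetime keeps 6 digits.
        body = f"{seconds}.{fraction[:6].ljust(6, '0')}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt).replace(tzinfo=timezone.utc)


def restart_gap_seconds(started_at: str, finished_at: str):
    """Seconds between the last stop and the current start; None if never stopped."""
    if finished_at == NEVER_FINISHED:
        return None
    delta = parse_docker_time(started_at) - parse_docker_time(finished_at)
    return delta.total_seconds()


# (StartedAt, FinishedAt) pairs copied from the docker inspect output above.
containers = {
    "skale_schain_ill-informed-friendly-haedi": (
        "2023-12-03T17:00:55.395322791Z", "2023-12-03T17:00:42.068534Z"),
    "skale_schain_hungry-formal-ascella": (
        "2023-11-22T17:45:45.56355484Z", "0001-01-01T00:00:00Z"),
    "skale_schain_rural-colossal-cebalrai": (
        "2023-12-05T21:00:45.509193177Z", "2023-12-05T21:00:40.267984395Z"),
}

for name, (started, finished) in containers.items():
    gap = restart_gap_seconds(started, finished)
    status = "never stopped" if gap is None else f"restarted after {gap:.1f} s"
    print(f"{name}: {status}")
```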

@DmytroNazarenko DmytroNazarenko modified the milestones: SKALE 2.4, 2.3, SKALE 2.3 Dec 7, 2023
@DmytroNazarenko DmytroNazarenko moved this to Ready For Pickup in SKALE Engineering 🚀 Dec 7, 2023
@olehnikolaiev olehnikolaiev moved this from Ready For Pickup to In Progress in SKALE Engineering 🚀 Dec 15, 2023
@oleksandrSydorenkoJ

Update:
skaled does not use swap when only one chain is under load, but RSS consumption and swap usage grow when at least two chains are under a load of 60-80 TPS.

image
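
One way to track both numbers per process on Linux is to read the `VmRSS` and `VmSwap` fields from `/proc/<pid>/status`. The sketch below is not part of skaled or its test tooling; `parse_vm_fields` and `read_vm_fields` are hypothetical helper names.

```python
import re


def parse_vm_fields(status_text: str) -> dict:
    """Extract VmRSS and VmSwap (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for name in ("VmRSS", "VmSwap"):
        match = re.search(rf"^{name}:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
        if match:
            fields[name] = int(match.group(1))
    return fields


def read_vm_fields(pid: int) -> dict:
    """Read the current RSS and swap usage of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

Calling `read_vm_fields(pid)` periodically for each skaled process would show whether swap usage starts to climb only once a second chain is under load.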

@oleksandrSydorenkoJ

oleksandrSydorenkoJ commented Dec 27, 2023

This may be useful:

https://github.com/google/leveldb/releases/tag/1.23

Sync MANIFEST before closing in db_impl when creating a new DB. Add logging with debugging information when failing to load a version set.

The MANIFEST files grow with every snapshot creation.
Before snapshot creation:

Node 1:
29M     /mnt/schains-ill-informed-friendly-haedi/28e07f34/12041/state/MANIFEST-27877107
75M     /mnt/schains-hungry-formal-ascella/28e07f34/12041/state/MANIFEST-33707922
33M     /mnt/schains-rural-colossal-cebalrai/28e07f34/12041/state/MANIFEST-26992082

After a few snapshots:

33M     /mnt/schains-ill-informed-friendly-haedi/28e07f34/12041/state/MANIFEST-27877107
85M     /mnt/schains-hungry-formal-ascella/28e07f34/12041/state/MANIFEST-33707922
37M     /mnt/schains-rural-colossal-cebalrai/28e07f34/12041/state/MANIFEST-26992082
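
The growth per chain can be computed directly from the two listings. This is a small sketch that assumes the listings are `du -h`-style output with sizes in whole megabytes; `parse_du_line` and `manifest_growth` are hypothetical helper names. Note that the MANIFEST file names are identical before and after, so it is the same file that keeps growing.

```python
def parse_du_line(line: str):
    """Split a line such as '29M   /path/MANIFEST-1' into (size_in_mb, path)."""
    size, path = line.split(maxsplit=1)
    if not size.endswith("M"):
        raise ValueError(f"expected a size in megabytes, got {size!r}")
    return int(size[:-1]), path.strip()


def manifest_growth(before_text: str, after_text: str) -> dict:
    """Return {path: growth in MB} for every path present in both listings."""
    before = {p: s for s, p in map(parse_du_line, before_text.splitlines())}
    after = {p: s for s, p in map(parse_du_line, after_text.splitlines())}
    return {p: after[p] - before[p] for p in before if p in after}


# Listings copied from the comment above.
BEFORE = """\
29M     /mnt/schains-ill-informed-friendly-haedi/28e07f34/12041/state/MANIFEST-27877107
75M     /mnt/schains-hungry-formal-ascella/28e07f34/12041/state/MANIFEST-33707922
33M     /mnt/schains-rural-colossal-cebalrai/28e07f34/12041/state/MANIFEST-26992082"""

AFTER = """\
33M     /mnt/schains-ill-informed-friendly-haedi/28e07f34/12041/state/MANIFEST-27877107
85M     /mnt/schains-hungry-formal-ascella/28e07f34/12041/state/MANIFEST-33707922
37M     /mnt/schains-rural-colossal-cebalrai/28e07f34/12041/state/MANIFEST-26992082"""

for path, delta in manifest_growth(BEFORE, AFTER).items():
    print(f"+{delta} MB  {path}")
```

For the data above, the three files grow by 4 MB, 10 MB and 4 MB respectively.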

olehnikolaiev added a commit that referenced this issue Jan 8, 2024
@olehnikolaiev olehnikolaiev linked a pull request Jan 17, 2024 that will close this issue
olehnikolaiev added a commit that referenced this issue Jan 18, 2024
@olehnikolaiev olehnikolaiev moved this from In Progress to Code Review in SKALE Engineering 🚀 Jan 18, 2024
@olehnikolaiev olehnikolaiev moved this from Code Review to Ready For Release Candidate in SKALE Engineering 🚀 Jan 18, 2024
@PolinaKiporenko PolinaKiporenko moved this from Ready For Release Candidate to In Progress in SKALE Engineering 🚀 Jan 19, 2024
@olehnikolaiev olehnikolaiev moved this from In Progress to Ready For Release Candidate in SKALE Engineering 🚀 Jan 22, 2024
@DmytroNazarenko
Collaborator

skaled: 3.18.0-beta.0

@DmytroNazarenko DmytroNazarenko moved this from Ready For Release Candidate to Merged To Release Candidate in SKALE Engineering 🚀 Jan 23, 2024
@EvgeniyZZ EvgeniyZZ moved this from Merged To Release Candidate to QA in SKALE Engineering 🚀 Jan 23, 2024
@oleksandrSydorenkoJ

Verified on the Legacy network.

skaled 3.18.0:
RSS grows up to 100-200 MB and decreases over time, without using the swap file.
image
image

skaled 3.17.1:
RSS grows up to 1200 MB and decreases over time, using the swap file.
image
image

@EvgeniyZZ EvgeniyZZ moved this from QA to Done in SKALE Engineering 🚀 Mar 25, 2024