Ethereum Node Syncing got Stuck #12837

Open
shivraj001 opened this issue Nov 22, 2024 · 9 comments
Labels
imp2 Medium importance

Comments

@shivraj001

Erigon Version:
v2.60.0

Network Type:
Mainnet Node

Environment Details:

Operating System: Ubuntu 20.04
Machine Specs:
CPU: Intel, 8 CPUs
RAM: 32 GB
Disk: 30 TB
Deployment Method: Binary

Description of the Issue:
My Ethereum archive node running Erigon has stopped syncing at block 18999999 on Mainnet. The issue has persisted for several weeks: syncing does not progress beyond this block, even after restarting the node.

Steps to Reproduce:

1. Deploy Erigon v2.60.0 as an Ethereum archive node on Mainnet.
2. Start syncing the node.
3. Observe that syncing halts at block number 18999999.

Command used to run node:
erigon --datadir /datadisk/node/erigon --chain=mainnet --http.api=eth,debug,net,trace,web3 --http.addr=0.0.0.0 --http.port=8545 --http.vhosts=* --metrics --metrics.addr=0.0.0.0 --log.console.verbosity=dbug --log.dir.path=/datadisk/node/logs --txpool.disable --internalcl

Expected Behavior:
The node should continue syncing blocks beyond 18999999 without interruption.

Actual Behavior:
The node remains stuck at block 18999999, with no further progress.

Logs:
Logs captured during the issue are included below:

000000000000, Merge Netsplit: , Shanghai: 1681338455, Cancun: 1710338135, Prague: , Osaka: , Engine: ethash, NoPruneContracts: map[0x00000000219ab540356cBB839Cbe05303d7705Fa:true]}" genesis=0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
DBUG[11-22|04:55:22.446] [db] open label=downloader sizeLimit=16GB pageSize=4096
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-015000-015500-bodies.seg local=b73b367baaf013d8385a76a865bdbad2809b385f known=7255de209286291b92536dc39011bbc412b0a768
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-014500-015000-bodies.seg local=7e952b86719fd8c795abb11c63503a9427f32c4f known=ee52f47d12eed45f7be6f1946e44e72902305a20
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018200-018300-bodies.seg local=d5043dfedec41bceb922468df50090b70b1c4c8a known=6cd5dbd8a51bc687362f178be3bb1a0700ca1151
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018300-018400-bodies.seg local=cf8eb6732cc1755293a6a02dcd92dc1dd9cf6cb2 known=44f4bb367c2794872ac62dd8637ef408572713a0
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018400-018500-bodies.seg local=fe3ac642940cb55367217ba6400b44fdadc40ed1 known=416a2274d0101e56c9e47b2e3bc5a6469de5df54
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-016000-016500-bodies.seg local=11fb22b8b2415668b26841174e85727951772937 known=910ed8afd69ce09e1543487825f3a90f755f79df
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-017000-017500-bodies.seg local=eb0b538a5d7c1eacfc0bc8cf1214c2e381ce3087 known=0e0ea3be565df0821c70a78bfa6cbe135a67e64b
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-017500-018000-bodies.seg local=656969e6a2cf1e704b973e20af4382ac966835ae known=b725c77a8afdf536b0b72966b2dee1c1e2d88e0c
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-015500-016000-bodies.seg local=1f559cbc16d5e01ba1e31d892bdfc7baf25cac53 known=12ded631880b051c700abb700d4b48155e68b08e
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-016500-017000-bodies.seg local=b72b80e722e8c4dd8c393dc5d1f4e69d8be3d312 known=d098d8f7c9d3fdcff92feabe01cdc2f27f539c76
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018000-018100-bodies.seg local=99a4b4d8382c86c7c745c81f6647672b65d2db9b known=17693397d915b4aac755f11e54306e8966ce1c40
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018100-018200-bodies.seg local=dd1f25eaca5afdeb5f4b37d8da6f89d2106270eb known=9ba1712859a78aaabd934686c1b3934358ac88ff
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018500-018600-bodies.seg local=73ad11574bdb5345373a9edab8b949ae1a564d48 known=7c4c95a98046e56ce0045a550c551592693d281c
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018600-018700-bodies.seg local=731a3579ca9e86608ea1ba668aaffb2a107a081c known=c10c7138967149cea4a9da9eb779e0a71f223ead
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018900-019000-bodies.seg local=6883d2501de0ca385cccaa1a2bab726e47176902 known=c94c56736466d2bdbee5c88d7bf027d3681c9110
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018700-018800-bodies.seg local=2501ab81654fd0b234c819fead66ab197f4c0438 known=2a008a1054c43fd6df76e6845d04d54d7cd2ba60
DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-018800-018900-bodies.seg local=55266ef559f6f0f188d398b9a77ee1e9dd7157de known=93ce47fc00a0d834d98080157d4f25cb8ce58fa9
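
The mismatch lines above are easier to review as a sorted list. What follows is a minimal Python sketch, assuming the log layout shown in this issue (key=value fields named file, local and known); the name mismatched_snapshots is made up for illustration and is not part of Erigon. One observation, not a diagnosis: if the numbers in the file names are block ranges in thousands, v1-018900-019000-bodies.seg would end at block 18999999, the block where the node is reported stuck.

```python
import re

# Matches the downloader's mismatch lines as they appear in this issue's log.
MISMATCH = re.compile(
    r"local file hash does not match known\s+"
    r"file=(?P<file>\S+)\s+local=(?P<local>[0-9a-f]+)\s+known=(?P<known>[0-9a-f]+)"
)

def mismatched_snapshots(lines):
    """Return (file, local_hash, known_hash) tuples, sorted by file name."""
    found = []
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            found.append((m["file"], m["local"], m["known"]))
    return sorted(found)

# Three lines copied from the log in this issue; the last one is not a mismatch.
sample = [
    "DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-015000-015500-bodies.seg local=b73b367baaf013d8385a76a865bdbad2809b385f known=7255de209286291b92536dc39011bbc412b0a768",
    "DBUG[11-22|04:55:22.486] [downloader] local file hash does not match known file=v1-014500-015000-bodies.seg local=7e952b86719fd8c795abb11c63503a9427f32c4f known=ee52f47d12eed45f7be6f1946e44e72902305a20",
    "DBUG[11-22|04:57:52.593] Received block via gossip slot=10452190",
]

for name, local, known in mismatched_snapshots(sample):
    print(name, "local", local[:8], "known", known[:8])
```

Feeding the whole log file to the function (for example, open(path) as the lines argument) would list every affected segment in block order.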

DBUG[11-22|04:57:52.593] Received block via gossip slot=10452190
DBUG[11-22|04:57:52.593] Block scheduled for later processing block=10452190
DBUG[11-22|04:57:52.696] import operations time=1.272µs
DBUG[11-22|04:57:53.623] Error in DNS random node sync tree=all.mainnet.ethdisco.net err="lookup MG6ATRV2DSOEX2QGPW7RHHRSPE.all.mainnet.ethdisco.net on 127.0.0.53:53: no such host"
DBUG[11-22|04:58:01.358] Received blob sidecar via gossip index=3 size=128KB
DBUG[11-22|04:58:01.358] Received block via gossip slot=10452288
DBUG[11-22|04:58:01.358] Block scheduled for later processing block=10452288
DBUG[11-22|04:58:01.359] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|04:58:01.385] Received blob sidecar via gossip index=0 size=128KB
DBUG[11-22|04:58:01.647] Received blob sidecar via gossip index=1 size=128KB
DBUG[11-22|04:58:06.666] Error in DNS random node sync tree=all.mainnet.ethdisco.net err="lookup YIK7J6QF5HLFLUGMDBCJWNU5RU.all.mainnet.ethdisco.net on 127.0.0.53:53: no such host"
DBUG[11-22|04:58:12.440] Received block via gossip slot=10452289
DBUG[11-22|04:58:12.440] Block scheduled for later processing block=104522

DBUG[11-22|05:02:21.552] [downloader] Collecting... from=21241053 to=21241053 len=1
DBUG[11-22|05:02:21.552] [downloader] posAnchor is nil
INFO[11-22|05:02:21.974] Node is still syncing... downloading past blocks app=caplin stage=DownloadHistoricalBlocks slot=10447100 blockNumber=21235864 blk/sec=7.8 snapshots=0
DBUG[11-22|05:02:22.463] [downloader] Collecting... from=21241053 to=21241053 len=1
DBUG[11-22|05:02:22.463] [downloader] posAnchor is nil
DBUG[11-22|05:02:24.365] Received block via gossip slot=10452310
DBUG[11-22|05:02:24.365] Block scheduled for later processing block=10452310
DBUG[11-22|05:02:31.174] Error in DNS random node sync tree=all.mainnet.ethdisco.net err="lookup YIXI2ANEAWCPNDH2ZMDXDUANEI.all.mainnet.ethdisco.net on 127.0.0.53:53: no such host"
DBUG[11-22|05:02:37.727] Received blob sidecar via gossip index=5 size=128KB
DBUG[11-22|05:02:37.746] Received block via gossip slot=10452311
DBUG[11-22|05:02:37.746] Block scheduled for later processing block=10452311
DBUG[11-22|05:02:37.786] Received blob sidecar via gossip index=0 size=128KB
DBUG[11-22|05:02:37.809] Received blob sidecar via gossip index=1 size=128KB
DBUG[11-22|05:02:37.839] Received blob sidecar via gossip index=4 size=128KB
DBUG[11-22|05:02:37.866] [downloader] Collecting... from=13773036 to=137

Any kind of support is appreciated.

@AskAlexSharov
Collaborator

Add the --internalcl flag.

@shivraj001
Author

@AskAlexSharov the --internalcl flag is already included:

erigon --datadir /datadisk/node/erigon --chain=mainnet --http.api=eth,debug,net,trace,web3 --http.addr=0.0.0.0 --http.port=8545 --http.vhosts=* --metrics --metrics.addr=0.0.0.0 --log.console.verbosity=dbug --log.dir.path=/datadisk/node/logs --txpool.disable --internalcl

AskAlexSharov reopened this Nov 22, 2024
@AskAlexSharov
Collaborator

Please post the log with the debug lines filtered out: grep -v DBUG
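
For anyone who wants to post-process the log further, the same filter can be sketched in Python. This is only an illustration of what grep -v DBUG does (drop every line containing the text DBUG); without_debug is a hypothetical helper name, and the sample lines are copied from the log posted later in this thread.

```python
def without_debug(lines):
    """Return the lines that do not contain 'DBUG', like `grep -v DBUG`."""
    return [line for line in lines if "DBUG" not in line]

sample = [
    "INFO[11-22|22:33:52.385] Node is still syncing... downloading past blocks app=caplin stage=DownloadHistoricalBlocks slot=9006359 blockNumber=19802918 blk/sec=2.1 snapshots=0",
    "DBUG[11-22|22:34:16.595] Received blob sidecar via gossip index=0 size=128KB",
    "INFO[11-22|22:34:18.845] P2P app=caplin peers=68",
]

for line in without_debug(sample):
    print(line)
```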

@lystopad
Member

@shivraj001, could you please clarify which Erigon version you are running?
Also, could you please try the latest release, 2.60.10?

@shivraj001
Author

shivraj001 commented Nov 23, 2024

@AskAlexSharov @lystopad for some reason the process gets killed every day. The log leading up to it is below.

DBUG[11-22|22:33:36.699] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|22:33:46.692] [p2p] Dial scheduler protocol=68 peers=84/33 tried=26978 static=0 i/o timeout=4151 connect: connection refused=196 connect: no route to host=67 connect: connection reset by peer=10
DBUG[11-22|22:33:46.613] [p2p] Server protocol=67 peers=32 trusted=0 inbound=0 too many peers=62992 EOF=6446 closed by remote=10650 i/o timeout=7443 already connected=256
DBUG[11-22|22:33:47.130] [p2p] Discovery table protocol=68 version=v4 len=180 live=170 unsol=500 ips=279 db=0 reval=12662 RPC timeout=412 invalid ID in response record=9 invalid IP in response record: LAN address from WAN host=39 unknown node=18 unsolicited reply=197 expired=3
DBUG[11-22|22:33:48.029] [p2p] Discovery table protocol=67 version=v4 len=189 live=182 unsol=500 ips=298 db=0 reval=12665 RPC timeout=364 invalid IP in response record: loopback address from non-loopback host=2 invalid IP in response record: LAN address from WAN host=34 invalid ID in response record=7 unsolicited reply=1244 unknown node=165 expired=20
DBUG[11-22|22:33:49.470] [p2p] Server protocol=68 peers=85 trusted=0 inbound=53 i/o timeout=493 ecies: invalid message=120 already connected=9 unexpected EOF=3 invalid node identity=1 too many peers=13521 EOF=1284 closed by remote=2380
DBUG[11-22|22:33:49.741] [p2p] Dial scheduler protocol=67 peers=32/33 tried=127715 static=0 i/o timeout=16022 connect: connection refused=927 connect: no route to host=238 connect: connection reset by peer=1
INFO[11-22|22:33:52.385] Node is still syncing... downloading past blocks app=caplin stage=DownloadHistoricalBlocks slot=9006359 blockNumber=19802918 blk/sec=2.1 snapshots=0
DBUG[11-22|22:34:16.595] Received blob sidecar via gossip index=0 size=128KB
DBUG[11-22|22:34:16.809] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|22:34:18.587] Received blob sidecar via gossip index=2 size=128KB
DBUG[11-22|22:34:18.817] [p2p] Discovery table protocol=any version=v5 len=185 live=181 unsol=0 ips=269 db=0 reval=12672 RPC timeout=808 0 nodes in response for distance zero=2
INFO[11-22|22:34:18.845] P2P app=caplin peers=68
DBUG[11-22|22:34:19.917] Received blob sidecar via gossip index=3 size=128KB
DBUG[11-22|22:34:21.343] Received blob sidecar via gossip index=3 size=128KB
INFO[11-22|22:34:43.718] [p2p] GoodPeers eth67=30 eth68=83 eth66=2
DBUG[11-22|22:34:46.640] [p2p] Server protocol=68 peers=83 trusted=0 inbound=52 too many peers=13526 EOF=1292 closed by remote=2380 i/o timeout=494 ecies: invalid message=120 already connected=9 unexpected EOF=3 invalid node identity=1
DBUG[11-22|22:34:46.932] [p2p] Discovery table protocol=68 version=v4 len=185 live=168 unsol=500 ips=284 db=0 reval=12666 RPC timeout=414 invalid ID in response record=9 invalid IP in response record: LAN address from WAN host=39 unsolicited reply=205 expired=3 unknown node=18
DBUG[11-22|22:34:46.671] [p2p] Discovery table protocol=67 version=v4 len=189 live=180 unsol=500 ips=297 db=0 reval=12668 invalid IP in response record: LAN address from WAN host=34 invalid ID in response record=7 RPC timeout=366 invalid IP in response record: loopback address from non-loopback host=2 unsolicited reply=1257 unknown node=165 expired=27
DBUG[11-22|22:34:46.689] [p2p] Server protocol=67 peers=32 trusted=0 inbound=0 too many peers=62996 EOF=6447 closed by remote=10651 i/o timeout=7444 already connected=256
DBUG[11-22|22:34:46.768] [p2p] Dial scheduler protocol=68 peers=83/33 tried=27000 static=0 i/o timeout=4155 connect: connection refused=196 connect: no route to host=67 connect: connection reset by peer=13
DBUG[11-22|22:34:48.083] [p2p] Dial scheduler protocol=67 peers=32/33 tried=127725 static=0 i/o timeout=16023 connect: connection refused=927 connect: no route to host=238 connect: connection reset by peer=2
INFO[11-22|22:34:53.116] [mem] memory stats Rss=14.3GB Size=0B Pss=14.3GB SharedClean=4.0KB SharedDirty=0B PrivateClean=4.3MB PrivateDirty=14.3GB Referenced=14.1GB Anonymous=14.3GB Swap=0B alloc=13.8GB sys=14.5GB
DBUG[11-22|22:35:10.862] Received block via gossip slot=10457567
DBUG[11-22|22:35:12.464] Block scheduled for later processing block=10457567
DBUG[11-22|22:35:19.966] [p2p] Discovery table protocol=any version=v5 len=184 live=180 unsol=0 ips=266 db=0 reval=12676 RPC timeout=811 0 nodes in response for distance zero=2
INFO[11-22|22:35:22.933] P2P app=caplin peers=60
Killed
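
A bare "Killed" with no Go stack trace is what a shell prints when a process receives SIGKILL. On Linux one common cause is the kernel's out-of-memory killer, though nothing in this thread confirms that; the kernel log would settle it. The last memory stats line before the kill can be put in context with a short Python sketch. It assumes the field layout shown above and that sizes use 1024-based units (an assumption, not something the log states); parse_mem_stats is an illustrative name.

```python
import re

# Assumption: sizes are printed with binary (1024-based) unit prefixes.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
FIELD = re.compile(r"(\w+)=([0-9.]+)(B|KB|MB|GB|TB)\b")

def parse_mem_stats(line):
    """Return {field: size in bytes} for a '[mem] memory stats' line, else None."""
    if "[mem] memory stats" not in line:
        return None
    return {name: float(num) * UNITS[unit] for name, num, unit in FIELD.findall(line)}

# The memory stats line copied from the log above.
line = ("INFO[11-22|22:34:53.116] [mem] memory stats Rss=14.3GB Size=0B Pss=14.3GB "
        "SharedClean=4.0KB SharedDirty=0B PrivateClean=4.3MB PrivateDirty=14.3GB "
        "Referenced=14.1GB Anonymous=14.3GB Swap=0B alloc=13.8GB sys=14.5GB")

TOTAL_RAM_BYTES = 32 * 1024**3  # 32 GB, from the machine specs in this issue

stats = parse_mem_stats(line)
print(f"Rss is {stats['Rss'] / TOTAL_RAM_BYTES:.0%} of RAM")
```

Running this over every memory stats line in the log, rather than only the last one, would show whether resident memory climbs steadily before each kill.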

@shivraj001
Author

@AskAlexSharov @lystopad I have updated to the latest version, 2.60.10, but I am still facing the syncing issue.

@AskAlexSharov
Collaborator

Again, please post the log filtered with: grep -v DBUG

@AskAlexSharov
Collaborator

AskAlexSharov commented Nov 25, 2024

Try setting GOGC=50.
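
For context: GOGC is an environment variable read by the Go runtime (Erigon is written in Go), usually set in front of the command, as in GOGC=50 erigon followed by the usual flags. It controls how much the heap may grow over the live heap before the next garbage collection starts; the default is 100. The arithmetic can be sketched in Python. This uses the simplified form of the rule from the Go GC guide and ignores GC roots, and the 8 GB live heap is a hypothetical figure, not a measurement from this node.

```python
def heap_target_gb(live_heap_gb, gogc):
    """Approximate heap size (GB) at which the Go runtime starts the next GC cycle.

    Simplified rule: target = live heap * (1 + GOGC / 100), ignoring GC roots.
    """
    return live_heap_gb * (1 + gogc / 100)

live = 8.0  # hypothetical live heap in GB, for illustration only

for gogc in (100, 50):
    print(f"GOGC={gogc}: next GC at about {heap_target_gb(live, gogc)} GB")
```

With the same live heap, a lower GOGC trades more frequent collections (more CPU) for a lower peak heap, which is presumably why it is suggested for a process that keeps getting killed.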

@AskAlexSharov
Collaborator

Could you also share a heap profile:
go tool pprof -alloc_objects -png http://127.0.0.1:6060/debug/pprof/heap > mem.png

yperbasis added the imp2 (Medium importance) label Nov 29, 2024

4 participants