Slow store queries may block fleet nodes #1017
Comments
cc @kaiserd
See #982 under efficiency issue. Sparse filters are relatively slow. This combined with the fact that some operations are blocking (see todo list in #982) may lead to store nodes becoming unresponsive. I ran into this issue with toy chat and a large DB (1.6GB, >800k messages) that only contained a few toy chat messages. I will be on it after making more progress on vacp2p/research#104
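To illustrate why a sparse filter can be slow on a large table, here is a minimal sketch using Python's `sqlite3` module. The table, column, index, and topic names are hypothetical and are not the nwaku schema. The index is only there to contrast the two query plans; it is not a claim about how the issue was fixed.

```python
import sqlite3

SPARSE_TOPIC = "/example/sparse-topic"  # hypothetical content topic
QUERY = "SELECT id FROM message WHERE content_topic = ? ORDER BY id LIMIT 10"

def build_db(n_total=50_000, n_match=5):
    """Create an in-memory table in which only a few rows match the filter."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for this illustration.
    conn.execute(
        "CREATE TABLE message (id INTEGER PRIMARY KEY, content_topic TEXT, payload BLOB)"
    )
    step = n_total // n_match
    rows = (
        (SPARSE_TOPIC if i % step == 0 else f"/example/other-{i % 100}", b"x")
        for i in range(n_total)
    )
    conn.executemany("INSERT INTO message (content_topic, payload) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def query_plan(conn):
    """Return SQLite's plan for the sparse query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (SPARSE_TOPIC,)).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail

def run_query(conn):
    return [r[0] for r in conn.execute(QUERY, (SPARSE_TOPIC,))]

def demo():
    conn = build_db()
    plan_before = query_plan(conn)  # expected to be a full table scan
    ids = run_query(conn)
    conn.execute("CREATE INDEX idx_message_topic ON message (content_topic)")
    plan_after = query_plan(conn)   # expected to be an index search
    return plan_before, plan_after, ids, run_query(conn)
```

Note that the `LIMIT` does not bound the work in the unindexed case: when fewer rows match than the limit asks for, the scan has to visit every row before it can stop.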
DB size is not huge:
Thank you!
There is a known issue in `v0.10` where certain store queries can slow down nodes: waku-org/nwaku#1017 A possible (partial?) fix has been merged and deployed to `wakuv2.test`: waku-org/nwaku#1018 But the SQLite store can be disabled on `prod` in the meantime. Signed-off-by: Jakub Sokołowski <[email protected]>
@LNSD not sure how relevant this issue still is. Based on this comment: https://discord.com/channels/864066763682218004/1011953262586507354/1012036459097767946 I think there is still some investigation necessary, but likely unrelated to the issue that was reported (and addressed) here.
Closing this issue as the changes merged as part of #1120 have improved the store query time significantly.
Otherwise nodes try to load all messages into memory at startup. And the issue that originally caused this to be disabled is closed: waku-org/nwaku#1017 waku-org/nwaku#1018 Signed-off-by: Jakub Sokołowski <[email protected]>
With the SQLite-only store it seems as if nodes can become unresponsive when building responses to sparse queries.
This has been observed on the `wakuv2.prod` fleet for `node-01.gc-us-central1-a.wakuv2.prod` around 8 UTC on 2022-06-23, though the issue may have been present quite some time before that. New connection attempts to the node time out and consul reports metrics reporting failures.
Successive backtraces on the running node reveal that the node is stuck answering a store query:
The last query in the node's logs was:
which is known to be a slow query.
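The symptom described above (new connections timing out while one query is being answered) is what one would expect if a slow, synchronous call runs on a single-threaded event loop. The following is a rough Python `asyncio` sketch of that effect, not nwaku code: `slow_query` is a hypothetical stand-in that simply sleeps, and the assumption that the node serves requests from one event loop is mine rather than something stated in this issue.

```python
import asyncio
import time

def slow_query(seconds=0.2):
    """Hypothetical stand-in for a slow, synchronous store query."""
    time.sleep(seconds)
    return "response"

async def heartbeat(counter, interval=0.01):
    """Count how often the event loop gets a chance to run other work."""
    while True:
        await asyncio.sleep(interval)
        counter[0] += 1

async def measure(offload):
    counter = [0]
    task = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)  # give the heartbeat task a chance to start
    if offload:
        # Run the query in a worker thread; the loop keeps servicing other tasks.
        await asyncio.to_thread(slow_query)
    else:
        # Run the query directly; nothing else can run until it returns.
        slow_query()
    ticks = counter[0]
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks

def run_demo():
    blocking_ticks = asyncio.run(measure(offload=False))
    offloaded_ticks = asyncio.run(measure(offload=True))
    return blocking_ticks, offloaded_ticks
```

In the blocking case the heartbeat never advances while the query runs, which is analogous to connection attempts and metrics reporting stalling. Offloading only keeps the loop responsive; it does not make the query itself any faster.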