No details for the LMDB crashes mentioned in the README #216
Comments
Are the bugs in LMDB itself or in the Rust bindings? Are they triggered by single-instance use, by concurrent reads and writes, or only under heavy contention? Is sled more reliable?
I saw this issue has been open without activity for a while. A brief search on Bugzilla turned up a related issue. It seems the LMDB crashes are caused by attempting to load a corrupt database file. No other failure modes were mentioned, and the report indicated that the crashes are an issue in the upstream LMDB project, which the LMDB authors, Symas, have said they won't fix. The crash seems to come from a panic within the LMDB API when opening a corrupted DB file, so a possible workaround might be to catch this specific panic while opening the DB file. However, I'm still onboarding to Rust, so I'm not sure how appropriate something like … would be. Could one of the recent maintainers (@badboy, @saschanaz) confirm whether the above failure mode is the one referenced ambiguously in the README?
Not a maintainer, but I've since come across an LMDB failure mode in Baloo (not rkv-based): database corruption followed by uncontrolled crashes when opening the corrupt database. The bug report thread is at https://bugs.kde.org/show_bug.cgi?id=434926. The crash is a SIGBUS on Linux (untested on Windows), so you can't catch it with `catch_unwind` alone.
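The difference between the two comments above (a Rust panic during open can be caught, a SIGBUS cannot) can be illustrated with a minimal sketch. `open_db` and `open_guarded` are hypothetical names, not part of rkv or LMDB; the corrupt-header check only stands in for whatever might panic while a database is being opened.

```rust
use std::panic;

// Hypothetical stand-in for opening a database file (not rkv's or LMDB's
// API): it panics when the header looks corrupt, mimicking a Rust-level
// panic raised during open.
fn open_db(header: &[u8]) -> usize {
    assert!(header.starts_with(b"DB"), "corrupt database header");
    header.len()
}

// Turn a panic during open into an ordinary error value.
// Caveats: this only intercepts unwinding Rust panics, so it does nothing
// when built with `panic = "abort"`, and it cannot intercept a SIGBUS the
// kernel raises when a bad memory-mapped page is touched. A signal is not
// a panic, so the process still dies.
fn open_guarded(header: &[u8]) -> Result<usize, String> {
    panic::catch_unwind(move || open_db(header)).map_err(|payload| {
        // Panic payloads are usually a &'static str or a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

fn main() {
    for header in [&b"DB-ok"[..], &b"garbage"[..]] {
        match open_guarded(header) {
            Ok(n) => println!("opened ({n} header bytes)"),
            Err(e) => println!("open failed: {e}"),
        }
    }
}
```

Note that the default panic hook still prints the panic message to stderr even when `catch_unwind` recovers from it.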
We used to have links to the crashes we saw (bug 1538539, bug 1538541); not sure why or when we removed them. These are directly in LMDB (as @nyanpasu64 also mentioned), so … Further, we're not using LMDB mode anymore (or are moving away from it in the few places where we still have it enabled), so we have neither more or newer crash data nor any attempts to fix it.
That seems to be a bit of a mischaracterization. LMDB 1.0 (https://github.com/LMDB/lmdb/tree/mdb.master3) supports per-page checksums and will return an error for corrupted pages. We can't roll this feature out in LMDB 0.9, since it requires an on-disk format change (to leave space for storing the checksums). Aside from that, we were never pointed at anything that could help identify the cause of the corruption in the first place. Given the code coverage and everything else that is tested in the LMDB codebase, there's no indication that LMDB itself mis-wrote any pages, and it has literally millions of hours of reliable use in countless other projects that have never encountered similar issues. PS: we attempted to build Baloo to investigate, but executables built from source always crashed for us before even touching any LMDB code.
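The general idea behind per-page checksums can be sketched roughly as follows: stamp a checksum when a page is written, then verify it when the page is read, so that damage surfaces as an error instead of a crash. This is an illustration under assumptions, not LMDB 1.0's actual on-disk format: the 64-byte page, the checksum stored in the last four bytes, FNV-1a as the hash, and the names `seal_page`, `read_page`, and `PageError` are all hypothetical choices made for brevity.

```rust
// Hypothetical page size and layout; LMDB 1.0's real format differs.
const PAGE_SIZE: usize = 64;
const DATA_LEN: usize = PAGE_SIZE - 4; // last 4 bytes hold the checksum

#[derive(Debug, PartialEq)]
enum PageError {
    ChecksumMismatch { page: usize, stored: u32, computed: u32 },
}

// 32-bit FNV-1a, chosen only because it fits in a few lines.
fn fnv1a(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in data {
        hash ^= b as u32;
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

// On write: copy the payload (truncated to DATA_LEN) into a zeroed page
// and stamp the checksum of the data area into the trailing 4 bytes.
fn seal_page(payload: &[u8]) -> [u8; PAGE_SIZE] {
    let mut page = [0u8; PAGE_SIZE];
    let n = payload.len().min(DATA_LEN);
    page[..n].copy_from_slice(&payload[..n]);
    let sum = fnv1a(&page[..DATA_LEN]);
    page[DATA_LEN..].copy_from_slice(&sum.to_le_bytes());
    page
}

// On read: recompute and compare, returning an error instead of handing
// back (and later trusting) a damaged page.
fn read_page(pgno: usize, page: &[u8; PAGE_SIZE]) -> Result<&[u8], PageError> {
    let stored = u32::from_le_bytes(<[u8; 4]>::try_from(&page[DATA_LEN..]).unwrap());
    let computed = fnv1a(&page[..DATA_LEN]);
    if stored == computed {
        Ok(&page[..DATA_LEN])
    } else {
        Err(PageError::ChecksumMismatch { page: pgno, stored, computed })
    }
}

fn main() {
    let mut page = seal_page(b"hello, page");
    println!("intact page ok: {}", read_page(7, &page).is_ok());
    page[2] ^= 0x01; // simulate a single flipped bit on disk
    match read_page(7, &page) {
        Ok(_) => println!("page accepted"),
        Err(e) => println!("rejected: {e:?}"),
    }
}
```

Because the checksum occupies space inside every page, files written without that reserved space cannot simply be reinterpreted, which is consistent with the on-disk format change described in the comment.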
Thank you for chiming in, @hyc! I looked into one of the Bugzilla tickets linked above and saw you replied there as well, but didn't get any tangible feedback from the team (at least not on that ticket). I'm now wondering why the original ticket I found described the data corruption as an unrecoverable fault in LMDB rather than a potential usage error. For example, one of the tickets above mentioned that LMDB's maximum key size may have been violated, leading to the corruption; on a super quick check of … Regardless, because the issues here and on Bugzilla present conflicting views of the situation, it would be great if these could at least be cleared up explicitly in this repo's documentation. As a potential user, I'd love to use the Mozilla-maintained …
LMDB will always reject attempts to use a too-large key with … Remember that LMDB is a single-writer database, and writer serialization is enforced with a simple mutex. As such, it is impossible for writer concurrency to cause race conditions or other memory corruption in LMDB. However, if you violate the 1:1 association between threads and transactions, you can easily corrupt LMDB's data structures. That is apparently what happened in the Mozilla codebase, though we never got sufficient information to identify the root cause. If you want a well-supported Rust wrapper, I suggest https://github.com/meilisearch/heed.
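A toy model can make the two rules in this comment concrete: one mutex serializes all writers, and a write transaction cannot leave the thread that opened it. Everything here (`Env`, `WriteTxn`, `begin_write`, the in-memory `BTreeMap` store) is hypothetical and is not rkv's, heed's, or LMDB's API. The sketch relies on the standard library's `MutexGuard` being `!Send` to get the thread-affinity check from the compiler.

```rust
use std::collections::BTreeMap;
use std::sync::{Mutex, MutexGuard, RwLock};
use std::thread;

type Map = BTreeMap<Vec<u8>, Vec<u8>>;

// Hypothetical toy environment, not a real binding.
struct Env {
    writer: Mutex<()>,      // the single-writer lock
    committed: RwLock<Map>, // last committed state, locked only briefly
}

// A write transaction owns the writer lock until commit or drop.
// MutexGuard is !Send, so WriteTxn is !Send too: the compiler refuses to
// move an open transaction to another thread.
struct WriteTxn<'env> {
    env: &'env Env,
    _writer: MutexGuard<'env, ()>,
    pending: Vec<(Vec<u8>, Vec<u8>)>,
}

impl Env {
    fn new() -> Self {
        Env { writer: Mutex::new(()), committed: RwLock::new(Map::new()) }
    }

    // Blocks until any other writer has committed or dropped its txn.
    fn begin_write(&self) -> WriteTxn<'_> {
        let guard = self.writer.lock().expect("writer lock poisoned");
        WriteTxn { env: self, _writer: guard, pending: Vec::new() }
    }

    // Readers only see committed data and never take the writer lock.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.committed.read().expect("committed map poisoned").get(key).cloned()
    }
}

impl WriteTxn<'_> {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.pending.push((key.to_vec(), value.to_vec()));
    }

    // Publish pending writes; the writer lock is released when the
    // remaining fields of `self` are dropped at the end of this call.
    // Dropping a WriteTxn without calling commit discards its writes.
    fn commit(self) {
        let mut map = self.env.committed.write().expect("committed map poisoned");
        for (k, v) in self.pending {
            map.insert(k, v);
        }
    }
}

fn main() {
    let env = Env::new();
    thread::scope(|s| {
        for i in 0..4u8 {
            let env = &env;
            // Each thread opens, uses, and commits its own transaction.
            s.spawn(move || {
                let mut txn = env.begin_write();
                txn.put(&[i], &[i * 10]);
                txn.commit();
            });
        }
    });
    // By contrast, `let txn = env.begin_write(); s.spawn(move || txn.commit());`
    // does not compile, because WriteTxn is not Send.
    println!("{:?}", env.get(&[3]));
}
```

The compile-time check is one design choice among several; real bindings may enforce the thread/transaction rule differently.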
I've been looking for a Rust binding to LMDB, and I thought rkv might be a good candidate given that it is under Mozilla.
However, the README's recommendations for production use are quite conservative (full DB in memory and synced transactions), and it also refers to LMDB crashes that still need to be fixed.
It would be great if the README provided references to what these crashes are, since that statement made me question my positive view of LMDB's stability.