Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process missed blocks on startup #20
Process missed blocks on startup #20
Changes from 16 commits
c74ef48
1280b4a
551b90e
5ca0bd0
ba3f2fb
39fdc21
a1aab00
6913fb9
505b337
de7a347
349ac1e
162ca0d
7d07d82
9a0baa0
dc7ae3c
860f595
aa8dd8b
ac5d57d
9a51727
16af742
2bb9605
1727f90
e49395b
ebdfc79
532094c
f4d2c83
17879c0
4e5419d
cf9a0e7
9fdd275
32146a4
ce66b82
e012c4e
44384ca
e9c0b01
878d4c6
85cf76c
813a8e9
fde1675
c4e36bd
b8902c5
9bda673
48ec689
4ed7671
4cdc487
55bc03c
232f6fe
5b0cc63
995d204
e30a4ac
1e9891d
8870d8b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we return []byte instead of string directly? When we write in to the file, we convert []byte to string. Also, when we use the result, we convert []byte to string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline, but for visibility, I chose
[]byte
in the interface so that the keys and values could be as general as possible. The underlying JSON databse uses strings so that the resulting file is human readable.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit counter intuitive that
Put
would first involve reading the whole DB file from disk. Should we consider adding an in-memory cache/copy of the current values such that we don't need to read before writing?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I added a field for the current chain state, which Put writes to before writing the whole object to disk. We can of course use the cached state in
Get
as well, but I think we should move that to its own issue.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a problem in our use case, but by releasing the lock in between the read and write, concurrent Puts can result in "lost updates" where one or more writes are "lost" (essentially overwritten by a previous snapshot).
If this can be a problem, then we should grab the lock before the read and release it after the write. We can do that by doing one of the following: move the locking out of read and write (and perform them in the functions that call them), use recursive locks (I don't think they are supported by default in golang), or add a flag to read/write to tell them that a lock is not required for this call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our use case, only under extreme circumstances would there be concurrent writes to the same JSON file in such a way that could cause issues. In order for that to happen, we'd have to have logs be processed in such a way that subsequent blocks would overlap in the DB write. Even if that were to occur, the worst that would happen would be a log being processed multiple times, which wouldn't be an issue for correctness. #23 would guard against this as well.
That all being said, having the writes be atomic is a good idea, that eliminates this class of bugs altogether, so I've moved the locking behavior into
Put
/Get
as you suggested.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for a future PR: If we want to keep a few old versions of the file, we can implement "log rotation" by renaming the existing file to a backup name (same name but with an added suffix) before overwriting it. Probably not necessary for our use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We may want to consider to performing a sync on the file if we want to be sure it is on disk (and not just in the writeback buffer of the file cache). Not sure if it is necessary in our use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of this until you pointed it out, but Go
File
s do not perform buffered writes, so as soon asFile.Write
is called (as is the case internally in all of theos
function calls here), the changes are written to disk. See here for a discussion on this: https://stackoverflow.com/a/10890222There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect golang's File object is a light wrapper around a file descriptor (created using an open syscall). Writes to it are not buffered at the application level (unlike files opened with fopen) so they do not need to be flushed, but they are still written to the file system's buffercache first and later written back to disk.
An explicit sync is not necessary if the system does not abruptly power down. However, if there is a power outage, data in the buffercache that hasn't been written to disk can be lost. Most applications don't bother with calling sync as it hurts performance. However, databases often do, and we are building a type of database here.
With that said, a sync here might not be necessary if we assume that the filesystem will writeback the inode update (for the rename) after the writes to the file. This is probably a good assumption for most filesystems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would
os.ErrNotExist
catch? In the comments and it mentions returning a*PathError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and how should we handle if an error is returned but it's not
os.ErrNoExist
? Right now we continue to read fileThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.Stat
returns a*PathError
that wraps aos.ErrNotExist
if the file is not found. See here for an example: https://go.dev/play/p/qFwPDKFqSZS