-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new rcur
read-only cursor type, adapt code to use it.
#45
base: master
Are you sure you want to change the base?
Conversation
This code was really confusing! 1. We're using single-char lookahead to speed up the parsing, use a case. 2. Every success path is the same, so use a goto. 3. There was a bug in cursor_skip which failed at the end of a buffer. 4. It manually manipulated cursor instead of using cursor_skip. Signed-off-by: Rusty Russell <[email protected]>
For now it looks almost identical to `struct cursor`, for ease of transition. But it has several nice properties: 1. Once invalidated, it's a noop, for much easier error handling. 2. It's hard to abuse, since it's const. 3. Has some nice functions and constructors. It's only inline for those constructors: the optimizer should deal very nicely with returning a structure by copy in this case. Signed-off-by: Rusty Russell <[email protected]>
This shows what it looks like to use rbuf. The key compare doesn't care what happens if there's an error (as long as it doesn't crash), so we can mainly just bulldoze though. Signed-off-by: Rusty Russell <[email protected]>
This shows several things: 1. The rcur functions map naturally onto usage (e.g. returning NULL). 2. You don't need to worry about testing every return, just test if it was all OK at the end. As a general rule: 1. Avoid `inline` inside .c files, since compiler will do that. 2. bool is clearer than int if it's really boolean. Signed-off-by: Rusty Russell <[email protected]>
Fairly simple transform, now we have 'rcur_pull_word'. Note that return value was unused, so changed to void. Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
1. rcur_peek returns a pointer, clearer than cursor's peek_char. 2. parse functions which leave untouched on failure renamed to "parse_if_xxx". 3. consume_until_end_url "or_end" is always true, remove it. 4. Don't look backwards for '(' before URL, pass in from caller. I had to change parse_if_mention_index, since GCC didn't understand d3 couldn't be a digit unless d2 was: ``` In function ‘parse_if_mention_index’, inlined from ‘ndb_parse_content’ at src/content_parser.c:518:8: src/content_parser.c:57:40: error: ‘d2’ may be used uninitialized [-Werror=maybe-uninitialized] 57 | ind = (d1 * 100) + (d2 * 10) + d3; | ~~~~^~~~~ src/content_parser.c: In function ‘ndb_parse_content’: src/content_parser.c:42:17: note: ‘d2’ was declared here 42 | int d1, d2, d3, ind; | ^~ cc1: all warnings being treated as errors ``` Signed-off-by: Rusty Russell <[email protected]>
Straightforward, but note that `parse_nostr_tlv` fails (*without* invaliding rcur) if it's at EOF. This is because it's used to iterate. Many functions return bool, or void. The bool is sometimes gratuitous (the caller can check if the rcur is still valid), but convenient in the existing code. Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
pull_block_type used to restore the cursor on failure, but the callers don't care, so don't bother (they could trivially do it if they wanted). Signed-off-by: Rusty Russell <[email protected]>
It's convenient now we've converted the code. Signed-off-by: Rusty Russell <[email protected]>
A few places where we reached into rcur directly can now be removed. This means we save the pointer to the initial content, and have helpers to transfer into ndb_str_block. URL parsing is a bit awkward, but it always was. parse_nostr_bech32 (only used in test code) is altered too. Signed-off-by: Rusty Russell <[email protected]>
Now we've gotten rid of the remaining direct accesses, we can simplify into a more natural ptr/len pair. And get rid of the cursor shims. Signed-off-by: Rusty Russell <[email protected]>
The content parser refactor looks much better! I always disliked that code. I'll test this in notedeck and see if anything crops up. |
return rcur_pull(rcur, len) != NULL; | ||
} | ||
|
||
unsigned char rcur_pull_byte(struct rcur *rcur) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very hot function (especially when used in ndb_text_search_key_compare
> rcur_pull_varint
) where lmdb can call this many times when walking the btree during search queries.
it seems this hot code path now calls (rcur_pull
> rcur_peek
> rcur_peek_remainer
+memcpy
for a single byte)
when before it was just:
static inline int cursor_pull_byte(struct cursor *cursor, unsigned char *c)
{
if (unlikely(cursor->p >= cursor->end))
return 0;
*c = *cursor->p;
cursor->p++;
return 1;
}
this makes me a bit nervous, but maybe the compiler will optimize this all away. will try to do some search benchmarking and compare
this seems to break something with search. I try this is my fault somewhat since many of these core things don't have much test coverage... |
some other tests: I tried to import data with this tells me this branch is not only reading, but writing the indices differently. or there could be some other issue, not sure. we have |
The result is cleaner, I think, but the transition is kinda painful!
An
rcur
can be safely be pulled from indefinitely (giving zeros), and then at the end you can check if it was all OK. This makes for very robust error handling, and if you want to peek, rather than pull, you can save and restorercur
easily (though we don't often do that!).After this, I have a PR series which replaces the remaining cursor parts with
wbuf
.