Add a new `rcur` read-only cursor type, adapt code to use it. #45

rustyrussell · 2024-08-31T07:29:03Z

The result is cleaner, I think, but the transition is kinda painful!

An rcur can be safely be pulled from indefinitely (giving zeros), and then at the end you can check if it was all OK. This makes for very robust error handling, and if you want to peek, rather than pull, you can save and restore rcur easily (though we don't often do that!).

After this, I have a PR series which replaces the remaining cursor parts with wbuf.

This code was really confusing! 1. We're using single-char lookahead to speed up the parsing, use a case. 2. Every success path is the same, so use a goto. 3. There was a bug in cursor_skip which failed at the end of a buffer. 4. It manually manipulated cursor instead of using cursor_skip. Signed-off-by: Rusty Russell <[email protected]>

For now it looks almost identical to `struct cursor`, for ease of transition. But it has several nice properties: 1. Once invalidated, it's a noop, for much easier error handling. 2. It's hard to abuse, since it's const. 3. Has some nice functions and constructors. It's only inline for those constructors: the optimizer should deal very nicely with returning a structure by copy in this case. Signed-off-by: Rusty Russell <[email protected]>

This shows what it looks like to use rbuf. The key compare doesn't care what happens if there's an error (as long as it doesn't crash), so we can mainly just bulldoze though. Signed-off-by: Rusty Russell <[email protected]>

This shows several things: 1. The rcur functions map naturally onto usage (e.g. returning NULL). 2. You don't need to worry about testing every return, just test if it was all OK at the end. As a general rule: 1. Avoid `inline` inside .c files, since compiler will do that. 2. bool is clearer than int if it's really boolean. Signed-off-by: Rusty Russell <[email protected]>

Fairly simple transform, now we have 'rcur_pull_word'. Note that return value was unused, so changed to void. Signed-off-by: Rusty Russell <[email protected]>

Signed-off-by: Rusty Russell <[email protected]>

1. rcur_peek returns a pointer, clearer than cursor's peek_char. 2. parse functions which leave untouched on failure renamed to "parse_if_xxx". 3. consume_until_end_url "or_end" is always true, remove it. 4. Don't look backwards for '(' before URL, pass in from caller. I had to change parse_if_mention_index, since GCC didn't understand d3 couldn't be a digit unless d2 was: ``` In function ‘parse_if_mention_index’, inlined from ‘ndb_parse_content’ at src/content_parser.c:518:8: src/content_parser.c:57:40: error: ‘d2’ may be used uninitialized [-Werror=maybe-uninitialized] 57 | ind = (d1 * 100) + (d2 * 10) + d3; | ~~~~^~~~~ src/content_parser.c: In function ‘ndb_parse_content’: src/content_parser.c:42:17: note: ‘d2’ was declared here 42 | int d1, d2, d3, ind; | ^~ cc1: all warnings being treated as errors ``` Signed-off-by: Rusty Russell <[email protected]>

Straightforward, but note that `parse_nostr_tlv` fails (*without* invaliding rcur) if it's at EOF. This is because it's used to iterate. Many functions return bool, or void. The bool is sometimes gratuitous (the caller can check if the rcur is still valid), but convenient in the existing code. Signed-off-by: Rusty Russell <[email protected]>

Signed-off-by: Rusty Russell <[email protected]>

pull_block_type used to restore the cursor on failure, but the callers don't care, so don't bother (they could trivially do it if they wanted). Signed-off-by: Rusty Russell <[email protected]>

It's convenient now we've converted the code. Signed-off-by: Rusty Russell <[email protected]>

A few places where we reached into rcur directly can now be removed. This means we save the pointer to the initial content, and have helpers to transfer into ndb_str_block. URL parsing is a bit awkward, but it always was. parse_nostr_bech32 (only used in test code) is altered too. Signed-off-by: Rusty Russell <[email protected]>

Now we've gotten rid of the remaining direct accesses, we can simplify into a more natural ptr/len pair. And get rid of the cursor shims. Signed-off-by: Rusty Russell <[email protected]>

jb55 · 2024-08-31T10:59:47Z

The content parser refactor looks much better! I always disliked that code.

I'll test this in notedeck and see if anything crops up.

jb55 · 2024-08-31T11:17:56Z

src/rcur.c

+    return rcur_pull(rcur, len) != NULL;
+}
+
+unsigned char rcur_pull_byte(struct rcur *rcur)


this is a very hot function (especially when used in ndb_text_search_key_compare > rcur_pull_varint) where lmdb can call this many times when walking the btree during search queries.

it seems this hot code path now calls (rcur_pull > rcur_peek > rcur_peek_remainer+memcpy for a single byte)

when before it was just:

static inline int cursor_pull_byte(struct cursor *cursor, unsigned char *c) { if (unlikely(cursor->p >= cursor->end)) return 0; *c = *cursor->p; cursor->p++; return 1; }

this makes me a bit nervous, but maybe the compiler will optimize this all away. will try to do some search benchmarking and compare

jb55 · 2024-08-31T13:21:46Z

this seems to break something with search. I try ndb search bitcoin and it returns 0 results on my notedeck db with 122,748 notes in it. this could happen if any of the text search key compare functions have behavioral changes.

this is my fault somewhat since many of these core things don't have much test coverage...

jb55 · 2024-08-31T13:50:24Z

some other tests:

I tried to import data with ndb import with this branch, search still did not work. when I switched back and tried search, it still did not work. I had to re-import with the data with master for it to work again.

this tells me this branch is not only reading, but writing the indices differently. or there could be some other issue, not sure. we have ndb print-search-tags which doesn't show any obvious issue. will have to step through ndb search and see why its not returning anything.

rustyrussell added 13 commits August 27, 2024 18:22

nostrdb.c: use rcur for text_search_key_compare.

0b55c52

This shows what it looks like to use rbuf. The key compare doesn't care what happens if there's an error (as long as it doesn't crash), so we can mainly just bulldoze though. Signed-off-by: Rusty Russell <[email protected]>

nostrdb.c: use rcur inside ndb_parse_words().

48c6226

Fairly simple transform, now we have 'rcur_pull_word'. Note that return value was unused, so changed to void. Signed-off-by: Rusty Russell <[email protected]>

content_parser.c: consume_url_path doesn't fail, return void.

b8c8747

Signed-off-by: Rusty Russell <[email protected]>

invoice.c: use rcur.

79e1ee6

Signed-off-by: Rusty Russell <[email protected]>

block.c: use rcur.

d8191d9

pull_block_type used to restore the cursor on failure, but the callers don't care, so don't bother (they could trivially do it if they wanted). Signed-off-by: Rusty Russell <[email protected]>

block.c: make iterator use an rcur, not a simple pointer.

0a784d1

It's convenient now we've converted the code. Signed-off-by: Rusty Russell <[email protected]>

rcur: simplify.

c67ae4e

Now we've gotten rid of the remaining direct accesses, we can simplify into a more natural ptr/len pair. And get rid of the cursor shims. Signed-off-by: Rusty Russell <[email protected]>

jb55 reviewed Aug 31, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a new `rcur` read-only cursor type, adapt code to use it. #45

Add a new `rcur` read-only cursor type, adapt code to use it. #45

rustyrussell commented Aug 31, 2024

jb55 commented Aug 31, 2024

jb55 Aug 31, 2024

jb55 commented Aug 31, 2024 •

edited

Loading

jb55 commented Aug 31, 2024

Add a new rcur read-only cursor type, adapt code to use it. #45

Are you sure you want to change the base?

Add a new rcur read-only cursor type, adapt code to use it. #45

Conversation

rustyrussell commented Aug 31, 2024

jb55 commented Aug 31, 2024

jb55 Aug 31, 2024

Choose a reason for hiding this comment

jb55 commented Aug 31, 2024 • edited Loading

jb55 commented Aug 31, 2024

Add a new `rcur` read-only cursor type, adapt code to use it. #45

Add a new `rcur` read-only cursor type, adapt code to use it. #45

jb55 commented Aug 31, 2024 •

edited

Loading