Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new rcur read-only cursor type, adapt code to use it. #45

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

rustyrussell
Copy link
Contributor

The result is cleaner, I think, but the transition is kinda painful!

An rcur can be safely be pulled from indefinitely (giving zeros), and then at the end you can check if it was all OK. This makes for very robust error handling, and if you want to peek, rather than pull, you can save and restore rcur easily (though we don't often do that!).

After this, I have a PR series which replaces the remaining cursor parts with wbuf.

This code was really confusing!

1. We're using single-char lookahead to speed up the parsing, use a case.
2. Every success path is the same, so use a goto.
3. There was a bug in cursor_skip which failed at the end of a buffer.
4. It manually manipulated cursor instead of using cursor_skip.

Signed-off-by: Rusty Russell <[email protected]>
For now it looks almost identical to `struct cursor`, for ease of
transition.

But it has several nice properties:
1. Once invalidated, it's a noop, for much easier error handling.
2. It's hard to abuse, since it's const.
3. Has some nice functions and constructors.

It's only inline for those constructors: the optimizer should
deal very nicely with returning a structure by copy in this case.

Signed-off-by: Rusty Russell <[email protected]>
This shows what it looks like to use rbuf.  The key compare doesn't
care what happens if there's an error (as long as it doesn't crash),
so we can mainly just bulldoze though.

Signed-off-by: Rusty Russell <[email protected]>
This shows several things:
1. The rcur functions map naturally onto usage (e.g. returning NULL).
2. You don't need to worry about testing every return, just test if it
   was all OK at the end.

As a general rule:
1. Avoid `inline` inside .c files, since compiler will do that.
2. bool is clearer than int if it's really boolean.

Signed-off-by: Rusty Russell <[email protected]>
Fairly simple transform, now we have 'rcur_pull_word'.

Note that return value was unused, so changed to void.

Signed-off-by: Rusty Russell <[email protected]>
1. rcur_peek returns a pointer, clearer than cursor's peek_char.
2. parse functions which leave untouched on failure renamed to "parse_if_xxx".
3. consume_until_end_url "or_end" is always true, remove it.
4. Don't look backwards for '(' before URL, pass in from caller.

I had to change parse_if_mention_index, since GCC didn't understand
d3 couldn't be a digit unless d2 was:

```
In function ‘parse_if_mention_index’,
    inlined from ‘ndb_parse_content’ at src/content_parser.c:518:8:
src/content_parser.c:57:40: error: ‘d2’ may be used uninitialized [-Werror=maybe-uninitialized]
   57 |                 ind = (d1 * 100) + (d2 * 10) + d3;
      |                                    ~~~~^~~~~
src/content_parser.c: In function ‘ndb_parse_content’:
src/content_parser.c:42:17: note: ‘d2’ was declared here
   42 |         int d1, d2, d3, ind;
      |                 ^~
cc1: all warnings being treated as errors
```

Signed-off-by: Rusty Russell <[email protected]>
Straightforward, but note that `parse_nostr_tlv` fails (*without*
invaliding rcur) if it's at EOF.  This is because it's used to
iterate.

Many functions return bool, or void.  The bool is sometimes gratuitous
(the caller can check if the rcur is still valid), but convenient in
the existing code.

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
pull_block_type used to restore the cursor on failure, but the callers
don't care, so don't bother (they could trivially do it if they wanted).

Signed-off-by: Rusty Russell <[email protected]>
It's convenient now we've converted the code.

Signed-off-by: Rusty Russell <[email protected]>
A few places where we reached into rcur directly can now be removed.

This means we save the pointer to the initial content, and have
helpers to transfer into ndb_str_block.

URL parsing is a bit awkward, but it always was.

parse_nostr_bech32 (only used in test code) is altered too.

Signed-off-by: Rusty Russell <[email protected]>
Now we've gotten rid of the remaining direct accesses, we can simplify
into a more natural ptr/len pair.  And get rid of the cursor shims.

Signed-off-by: Rusty Russell <[email protected]>
@jb55
Copy link
Contributor

jb55 commented Aug 31, 2024

The content parser refactor looks much better! I always disliked that code.

I'll test this in notedeck and see if anything crops up.

return rcur_pull(rcur, len) != NULL;
}

unsigned char rcur_pull_byte(struct rcur *rcur)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very hot function (especially when used in ndb_text_search_key_compare > rcur_pull_varint) where lmdb can call this many times when walking the btree during search queries.

it seems this hot code path now calls (rcur_pull > rcur_peek > rcur_peek_remainer+memcpy for a single byte)

when before it was just:

static inline int cursor_pull_byte(struct cursor *cursor, unsigned char *c)
{
        if (unlikely(cursor->p >= cursor->end))
                return 0;

        *c = *cursor->p;
        cursor->p++;

        return 1;
}

this makes me a bit nervous, but maybe the compiler will optimize this all away. will try to do some search benchmarking and compare

@jb55
Copy link
Contributor

jb55 commented Aug 31, 2024

this seems to break something with search. I try ndb search bitcoin and it returns 0 results on my notedeck db with 122,748 notes in it. this could happen if any of the text search key compare functions have behavioral changes.

this is my fault somewhat since many of these core things don't have much test coverage...

@jb55
Copy link
Contributor

jb55 commented Aug 31, 2024

some other tests:

I tried to import data with ndb import with this branch, search still did not work. when I switched back and tried search, it still did not work. I had to re-import with the data with master for it to work again.

this tells me this branch is not only reading, but writing the indices differently. or there could be some other issue, not sure. we have ndb print-search-tags which doesn't show any obvious issue. will have to step through ndb search and see why its not returning anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants