Make live queries aware of reindexed (decrypted) values #236

Closed
wants to merge 1 commit

Conversation

arj03 (Member) commented Dec 5, 2022

With this, ssbc/ssb-db2#410 is now working. Clearly this needs a test here as well, but first I was wondering whether this would be a good approach or not?

github-actions bot commented Dec 5, 2022

Benchmark results

| Part | Speed | Heap Change | Samples |
|---|---|---|---|
| Count 1 big index (3rd run) | 0.77ms ± 0.53ms | -1.07 kB ± 32.83 kB | 46 |
| Create an index twice concurrently | 839.81ms ± 6.23ms | 17.39 kB ± 34.36 kB | 63 |
| Load core indexes | 1.62ms ± 0.02ms | 97.92 B ± 467.83 B | 4838 |
| Load two indexes concurrently | 741.71ms ± 10.82ms | 307.26 kB ± 299.81 kB | 16 |
| Paginate 10 results | 19.71ms ± 0.15ms | -38.23 kB ± 65.6 kB | 27 |
| Paginate 20000 msgs with pageSize=5 | 7703.63ms ± 247.84ms | 458.48 kB ± 3571.47 kB | 5 |
| Paginate 20000 msgs with pageSize=500 | 551.15ms ± 5.13ms | -1.83 kB ± 861.2 kB | 20 |
| Query 1 big index (1st run) | 823.8ms ± 5.68ms | 3.09 kB ± 80.43 kB | 65 |
| Query 1 big index (2nd run) | 394.56ms ± 4.63ms | 44.9 kB ± 27.58 kB | 36 |
| Query 3 indexes (1st run) | 1257.85ms ± 12.84ms | -34.55 kB ± 96.23 kB | 42 |
| Query 3 indexes (2nd run) | 272.31ms ± 1.59ms | -21.77 kB ± 45.64 kB | 48 |
| Query a prefix map (1st run) | 326.73ms ± 3.82ms | -46.78 kB ± 158.29 kB | 21 |
| Query a prefix map (2nd run) | 10.29ms ± 0.72ms | 79.94 kB ± 76.94 kB | 23 |

staltz (Member) commented Dec 6, 2022

Thanks for tackling this!

What do you think about a new API for this? I'm wondering if adding these decrypted msgs into the live() stream will break some assumptions about our APIs. Two assumptions: (1) every message appears in the stream only once, (2) the live stream delivers only msgs recently added to the log.
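
For illustration, a minimal sketch (not from the PR) of consumer code that leans on assumption (1); the source and message shapes below are invented. A consumer like this would silently discard the decrypted re-delivery, or double-handle it if the guard were missing:

```js
const pull = require('pull-stream')

// Invented stand-in for the live() stream; it emits the same key twice to
// mimic a record being re-delivered once it gets decrypted.
const liveSource = pull.values([
  { key: '%a', value: { content: 'box2...' } },                   // still encrypted
  { key: '%b', value: { content: { type: 'post' } } },
  { key: '%a', value: { content: { type: 'group/add-member' } } } // same msg, now decrypted
])

const seen = new Set()

pull(
  liveSource,
  // Written against assumption (1): dedupe by key, which drops the
  // decrypted version of %a.
  pull.filter((msg) => {
    if (seen.has(msg.key)) return false
    seen.add(msg.key)
    return true
  }),
  pull.drain((msg) => console.log('live:', msg.key, msg.value.content))
)
```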

arj03 (Member, Author) commented Dec 6, 2022

That is a good point. I considered that, but as you can see this only applies to live. So for this to happen you would have to receive a message and the key at almost the same time. I fear that a new API might be a gotcha.

staltz (Member) commented Dec 6, 2022

Wouldn't it be possible to read the same msg twice from the live stream, even if the receive time is very different from the decryption time?

Like if I subscribe to this live stream at 13:00, then receive a box2-encrypted msg at 13:02, and then later I'm added to a group at 14:00 and thus get the decrypted msg at 14:00.

What I'm worried about is that we'd make a breaking change to live() and then create new issues somewhere downstream. It's easier to make a non-breaking change with a new API.

arj03 (Member, Author) commented Dec 6, 2022

It's an interesting question about semantics 🙂 like what did the original live() promise, and how does that relate to box2, which is a bit different in that you can now suddenly read older messages.

Thinking about this some more, maybe a good compromise would be to have two streams as you say, and then people can combine them if needed. If it turns out that people do this pattern all the time, we can add a helper. In any case it will be really important to document that live() does not return newly decrypted messages, but that comes quite naturally when adding the new API to the docs.
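
A rough sketch of that combine-if-needed pattern, assuming a second, hypothetical source (`reindexedSource`) for newly decrypted records alongside the existing live stream; only the pull-stream plumbing is real, both sources here are placeholders:

```js
const pull = require('pull-stream')
const many = require('pull-many')

// Placeholders: liveSource stands in for the existing live() stream,
// reindexedSource for the proposed stream of newly decrypted records.
const liveSource = pull.values([{ key: '%new', via: 'append' }])
const reindexedSource = pull.values([{ key: '%old', via: 'reindex' }])

pull(
  many([liveSource, reindexedSource]), // interleave both sources into one feed
  pull.drain((msg) => console.log(msg.key, 'arrived via', msg.via))
)
```

If this pattern did show up everywhere, the helper mentioned above could simply wrap the `many([...])` call.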

arj03 (Member, Author) commented Dec 6, 2022

As for your question: it's true that if you did a query on the author only, then you would get the message twice. Good point.

staltz (Member) commented Dec 8, 2022

> maybe a good compromise would be to have two streams as you say and then people can combine them if needed.

I like that compromise.

> It's an interesting question about semantics 🙂 like what did the original live promise

From our readme, it says (bold is my emphasis):

> Will setup a pull stream and this in cb. The pull stream will emit new values **as they are added to the underlying log**. This is meant to run after paginate or all.

arj03 (Member, Author) commented Dec 8, 2022

Yeah, I'm working on that solution. It's almost ready, I just have to get all the tests working. It's a pity that live has this `old: true` flag, in that when we add another operator (like decrypted()) it gets a bit muddy what these mean when you combine them. But the diff is pretty short. Will share later when I get everything running properly.

staltz (Member) commented Dec 8, 2022

Also, if we split the stream, then shouldn't this be implemented in ssb-db2? We can pick only the records that have been box2-decrypted and pass them into some pull-stream.

I actually don't understand how this PR #236 works, because it's adding the lowest offset of the range being reindexed, but we should be streaming all the offsets that have been decrypted within that range.
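
A sketch of what "pass them into some pull-stream" could look like on the ssb-db2 side; pull-pushable is a real module, but the hook name and record shape here are assumptions, not ssb-db2 internals:

```js
const pull = require('pull-stream')
const Pushable = require('pull-pushable')

// Source that ssb-db2 could expose for records whose content was just
// box2-decrypted during reindexing.
const reindexed = Pushable()

// Hypothetical hook, imagined as being called from reindexEncrypted for
// every record that was just decrypted.
function onRecordDecrypted(record) {
  reindexed.push(record)
}

// Consumers subscribe like to any other pull-stream source.
pull(
  reindexed,
  pull.drain((record) => console.log('decrypted record at offset', record.offset))
)

// Simulate a decryption happening during reindexing:
onRecordDecrypted({ offset: 1234, value: { content: { type: 'post' } } })
```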

staltz (Member) commented Dec 8, 2022

Sorry for yet another comment, but

> I actually don't understand how this PR #236 works, because it's adding the lowest offset of the range being reindexed, but we should be streaming all the offsets that have been decrypted within that range.

I see now how it works, because https://github.com/ssbc/ssb-db2/blob/b095c33cde3be59ce6e6e69be47ce80fdaf8116e/core.js#L1015 is called for all the just-decrypted records. BUT shouldn't we change this implementation so that we only call jitdb.reindex for the minimum offset, such that this one offset covers the whole range of decrypted records? See e.g. the Airtable task about optimizing reindexEncrypted, and ssbc/ssb-db2#407.
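
A minimal sketch of that suggestion: collect the offsets decrypted in one reindexEncrypted pass and hand only the smallest one to jitdb.reindex, so a single call covers the whole range. The callback-style signature is assumed from the linked call site:

```js
// decryptedOffsets: log offsets whose records were just box2-decrypted.
// Instead of one jitdb.reindex call per record, call it once with the
// minimum offset, which covers every decrypted record at or after it.
function reindexDecryptedRange(jitdb, decryptedOffsets, cb) {
  if (decryptedOffsets.length === 0) return cb()
  const minOffset = Math.min(...decryptedOffsets)
  jitdb.reindex(minOffset, cb) // signature assumed from the core.js call site
}
```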

arj03 (Member, Author) commented Dec 9, 2022

Thanks for the input, working on a better solution now.
