-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace JSON-based marshal with dense binary encoding #2589
Comments
We should also switch before this to a more efficient text representation. That makes integration and debugging in browsers and other web infrastructure much easier. There's no reason not to have a format that is more readable and incidentally more efficient than the JSON encoding. |
There are plenty of reasons. Whether this is the right thing to do or not, it is a tradeoff. For one thing, any text format other than JSON will need a parser written in JS, so it will be hard to beat |
I'm struggling to understand the problem statement. All I can find is "they're awfully fluffy". What are the symptoms of this problem? What suggests these symptoms are going to have sufficient impact that it's worth a change? |
Wow, that is much more different than I expected! Definitely something we should look at closely. |
prompted by #6038 |
@dtribble also had an idea:
|
Provided there is an outer novel qclass record to prevent misinterpretation across the version skew, I endorse @dtribble 's idea. Both in absolute terms and in comparison to the current state of |
See endojs/endo#1349 . Well, except the replacing part. It will co-exist with smallcaps, not replace it. |
Hopefully fixed by endojs/endo#1594 Though still, except the replacing part. It will co-exist with smallcaps, not replace it. |
I am wondering more and more if direct CBOR serialization is not a better approach to avoid all the string manipulation logic, even with more efficient passable encodings, and approaches like endojs/endo#1478 which effectively defers some serialization with marshal only taking care of encoding into a serializable object. CBOR seems to support streaming and can easily nest pre-serialized data by wrapping it (which can be handled using a list of ArrayBuffers in the related APIs). The main problem is that the story for "frozen" ArrayBuffer in JS is still not good at all (and handling of binary data still very primitive). |
If we ever get around to using the compact-order encoding from endojs/endo#1594, updating liveslots might be complex. Besides migrating existing data in legacy encoding, the legacy encoding had the property that all output sorted before |
Endo issue: endojs/endo#2600 |
What is the Problem Being Solved?
(I thought we had a ticket for this already, but I couldn't find it.. feel free to close this as a duplicate if you find one)
Our
marshal
library currently uses a JSON-based encoding scheme to convert capability-bearing object graphs into pure data ("capdata", which is an object with abody
string andslots
array of object reference strings). It works by building a tree of objects from the original object graph, then using the built-inJSON.stringify
to convert the tree into a single string. Thedeserialize
function does the same thing in reverse.The swingset comms vat has a related encoding scheme to convert messages to/from the "comms protocol", which adds additional control messages to the mix (deliver, notify, dropImport, and sequence numbers). It basically takes the two-part capdata object and encodes it into a single string.
The resulting output formats are pretty easy to specify, and nicely neutral (it would be pretty straightforward to port them to other languages than JavaScript), and convenient for humans to read the encoded output. But they're awfully fluffy. Common JS objects like
undefined
andBigInt(1)
expand to 23- and 33- byte strings respectively.#1661 touches on shrinking the representation of
undefined
, but points out that it would be a distraction: the real task is to replace the encoding entirely, with some dense binary format.Given our heritage and community, CapnProto is a likely choice. #1827 explores the idea (in a different context). We might also consider CBOR (RFC 8949), which I think is the encoding of preference in the IPFS world.
This touches on the question of how we want to represent binary data in our system: currently we can use the JS
ByteArray
in vat code, but they are mutable, andmarshal
can't handle them, so we're left in the awkward position of recommending base64-encoded strings, eww. #46 ("large string device") touches on this, as the data it manages might be arbitrary bytes.Compatibility Issues
Every vat within a single kernel should probably use the same encoding scheme, because the messages created by liveslots in one vat are sent into the kernel, (sometimes) stored in kernel promise queues, and delivered to other vats. The kernel should not be in the business of interpreting the
body
of capdata, so I don't think I'd want it responsible for conversion. User-level vat code does not generally see the output ofmarshal
, because liveslots and/or the virtual objects layer abstracts that away, although I believe userspace is given access toserialize
(but notdeserialize
) for some reason I don't currently understand.Converting an existing kernel to the new format could be tricky. We'd need to convert all vats' liveslots code at the same time, and simultaneously convert anything on the run-queue and the various unresolved-promise message queues (which assumes we could write a function that converts an old-format message to a new-format message, which at least sounds plausible).
Two different swingsets, connected by a comms connection, have a similar problem. We might lean more heavily on that hypothetical format-conversion function, and make the comms vat tolerant of the other version (upconverting upon receipt, downconverting on transmit).
The text was updated successfully, but these errors were encountered: