Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(marshal): Add a smaller and more efficient passable encoding #1594

Merged
merged 14 commits into from
Feb 5, 2024

Conversation

gibson042
Copy link
Contributor

@gibson042 gibson042 commented May 23, 2023

Fixes #1349

Open questions:

  • [REJECTED] Should we push for extra compactness, e.g. bfalsebf/btruebt (booleans) and/or p1:0o (0n)?
  • [REJECTED] Should we allow custom remotable/promise/error encoding to use sequences and/or characters that would not otherwise be allowed, such as raw control characters U+0000 through U+001F or raw string escape characters !/_?
  • [REPLACE ALL C0 CONTROLS] For that matter, should we replace the full U+0000 through U+001F range in strings, or just a subset? The full range is necessary if U+0020 SPACE is the array element separator, but that could alternatively be U+0009 TAB and escapes could be limited to code points at or before that. This is really a tradeoff around expectations of character frequency inside strings, limiting blowup when embedding encodings in JSON-serialized containers such as current CapData communication, and (to some extent) human comprehensibility. There's not really a performance difference and at most a linear compactness difference, so it's really a question of aesthetics—for example, JSON-serializing the encoding of { "metasyntactic variables": ["foo", "bar", "baz"], tab: "\t", null: "\0" } would change as follows if we used tab-termination rather than space-termination:
    -"~(^^stab snull smetasyntactic!_variables  ^s!* s!! ^sfoo sbar sbaz   "
    +"~(^^stab\tsnull\tsmetasyntactic variables\t\t^s\n9\ts\n0\t^sfoo\tsbar\tsbaz\t\t\t"

@gibson042 gibson042 requested a review from erights May 23, 2023 20:24
@gibson042 gibson042 marked this pull request as draft May 23, 2023 20:24
packages/marshal/src/encodePassable.js Show resolved Hide resolved
@@ -328,11 +451,17 @@ export const makeEncodePassable = (encodeOptions = {}) => {
encodeRemotable = (rem, _) => Fail`remotable unexpected: ${rem}`,
encodePromise = (prom, _) => Fail`promise unexpected: ${prom}`,
encodeError = (err, _) => Fail`error unexpected: ${err}`,
xxx = false,
Copy link
Contributor Author

@gibson042 gibson042 May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a name for the smaller encoding and this property opting in to it, and ideally also a name for the current encoding. I don't have any good suggestions at the moment, so 🚲🏠 away!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts:

  • oldAndBusted vs newHawtness
  • tired vs wired

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • escapism vs realism

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacyOrdered vs compactOrdered ?

Copy link
Contributor Author

@gibson042 gibson042 Aug 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I don't think there's a fundamental aspect to the new encoding other than translation of rank ordering into lexicographic ordering (a feature which it shares with the current encoding). And the key detail differentiating the two is that the current encoding applies escaping to array elements (causing blowup for deep structures) while the new one applies escaping to strings and transitively to symbols, which are the only types that can contain characters confusable with array encoding delimiters and as scalars are immune to the blowup issue.

"Compact" is accurate in a relative sense, but keep in mind that we're still using text rather than raw octets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So legacyOrdered vs compactOrdered is fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think just "legacy" vs. "compact", e.g. makeEncodePassable({ format: "compact" }). What significance of "ordered" am I missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither smallcaps nor its predecessor are order preserving. The main feature of both the formats we're naming here is that they are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so IIUC you're thinking about something like makeMarshal(convertValToSlot, convertSlotToVal, { serializeBodyFormat: "compactOrdered" }) and wanting "compactOrdered" to be used everywhere. 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly.

packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
packages/marshal/src/encodePassable.js Show resolved Hide resolved
@erights
Copy link
Contributor

erights commented Jun 23, 2023

Hopefully fixes Agoric/agoric-sdk#2589 , yes?

Copy link
Contributor

@erights erights left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this took so long!

Looking forward to using this! I think I can start making use of this soon.

packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
@@ -328,11 +451,17 @@ export const makeEncodePassable = (encodeOptions = {}) => {
encodeRemotable = (rem, _) => Fail`remotable unexpected: ${rem}`,
encodePromise = (prom, _) => Fail`promise unexpected: ${prom}`,
encodeError = (err, _) => Fail`error unexpected: ${err}`,
xxx = false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacyOrdered vs compactOrdered ?

packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
packages/marshal/src/encodePassable.js Show resolved Hide resolved
packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
Comment on lines 455 to 468
const encodeString = xxx
? encodeStringWithEscapes
: encodeStringWithoutEscapes;
const encodeArray = xxx ? encodeArrayWithoutEscapes : encodeArrayWithEscapes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially very confused until I realized that encodeStringWithEscapes goes with encodeArrayWithoutEscapes and encodeStringWithoutEscapes goes with encodeArrayWithEscapes. Once we have a name for xxx, then perhaps use that in these function names so it is less surprising which goes with which.

packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
packages/marshal/src/encodePassable.js Show resolved Hide resolved
packages/marshal/src/encodePassable.js Show resolved Hide resolved
packages/marshal/src/encodePassable.js Outdated Show resolved Hide resolved
@erights
Copy link
Contributor

erights commented Aug 18, 2023

Btw @gibson042 , I reviewed it as if it is Ready For Review. Is there a reason it is still Draft?

@erights
Copy link
Contributor

erights commented Aug 18, 2023

Any idea why CI didn't complete?

@gibson042
Copy link
Contributor Author

Hopefully fixes Agoric/agoric-sdk#2589 , yes?

Yes, I think we can make that claim.

Btw @gibson042 , I reviewed it as if it is Ready For Review. Is there a reason it is still Draft?

I consider it a draft until we settle on a name.

Any idea why CI didn't complete?

Pushed to kick CI, and I can also rebase on master if that won't disrupt review.

@erights
Copy link
Contributor

erights commented Aug 26, 2023

and I can also rebase on master if that won't disrupt review

Please do rebase to master, thanks.

@gibson042 gibson042 force-pushed the gibson-1349-encodepassable-escaping branch from 0cee48b to 402bb08 Compare August 27, 2023 16:12
gibson042 added a commit that referenced this pull request Aug 29, 2023
This provides better differention w.r.t. smallcaps, cf.
#1594 (comment)
gibson042 added a commit that referenced this pull request Sep 7, 2023
This provides better differention w.r.t. smallcaps, cf.
#1594 (comment)
@gibson042 gibson042 force-pushed the gibson-1349-encodepassable-escaping branch from c97046e to f2c1df6 Compare September 7, 2023 21:23
@gibson042
Copy link
Contributor Author

Updated to use "legacyOrdered" and "compactOrdered" names—so this PR is no longer a draft!

@gibson042 gibson042 marked this pull request as ready for review September 7, 2023 22:13
Copy link
Contributor

@erights erights left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

(If possible, would be nice to see this merged before tomorrow's ocapn mtg)

@gibson042 gibson042 marked this pull request as draft September 12, 2023 19:59
@gibson042
Copy link
Contributor Author

gibson042 commented Sep 12, 2023

Returning to draft because I found that the new format needs more handling for output from functions provided to makeEncodePassable:

import "@endo/init";
import { Far } from "@endo/far";
import { makeMarshal, makeEncodePassable } from "@endo/marshal";
const r0 = Far("Foo");
const data = harden([r0, r0, "."]);
const safeEncodeRemotable = r => "r0";
const unsafeEncodeRemotable = r => "r 0";
const baseOptions = { format: "compactOrdered" };
console.log({
  safeEncoding: makeEncodePassable({ ...baseOptions, encodeRemotable: safeEncodeRemotable })(data),
  unsafeEncoding: makeEncodePassable({ ...baseOptions, encodeRemotable: unsafeEncodeRemotable })(data),
});
/* =>
{
  safeEncoding: '~^r0 r0 s. ',
  unsafeEncoding: '~^r 0 r 0 s. '
}
*/

@gibson042
Copy link
Contributor Author

Returning to draft because I found that the new format needs more handling for output from functions provided to makeEncodePassable

Resolved now; @erights please review the updates at 9c9d1e3 .

@gibson042 gibson042 marked this pull request as ready for review September 12, 2023 22:42
@@ -346,7 +567,7 @@ export const makeEncodePassable = (encodeOptions = {}) => {
return encodeBinary64(passable);
}
case 'string': {
return `s${passable}`;
return `s${encodeStringSuffix(passable)}`;
}
case 'boolean': {
return `b${passable}`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're making changes, should we also replace "btrue" and "bfalse" with e.g. "bt" and "bf"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b1 and b0?

@erights erights self-requested a review September 13, 2023 02:33
@erights
Copy link
Contributor

erights commented Sep 13, 2023 via email

@gibson042 gibson042 force-pushed the gibson-1349-encodepassable-escaping branch from 85a996c to a99748a Compare January 12, 2024 16:33
@warner
Copy link
Contributor

warner commented Jan 17, 2024

Any chance we could get a neighboring .md document that explains the format?

This document would also be a good place to mention that the format: option limits the decoder's expectations.. if I read this correctly, there's no option to build a decoder that can handle both new- and old- format data. To use this in liveslots, we'll either need:

  • 1: Set a vat-wide flag to say what format we're using for all virtual collection keys. If we're launching a brand new vat, we can set it to use the new format forever. If the new-format-capable liveslots is loaded in a pre-existing vat (ie vat upgrade), and it doesn't see this flag, it must stick with the old format forever.
  • or 2: Introduce some sort of prefix in the vatstore records to indicate whether each record is new- or old- format. The current vatstore key is like vc.5.sfoo (where vc means "virtual collection", 5 means collection number 5, and sfoo is the passable-encoded string key foo). If there's a format key (s/n/p/b/r/etc) that's guaranteed to not appear in the old format, we could use that as a per-record indicator, but it would break sort order between old- and new- format keys of the same category. So that doesn't leave us much/any room for a per-record format indicator, but maybe we could do a per-collection indicator, and at least allow collections created after the upgrade to use the new format.

@mhofman
Copy link
Contributor

mhofman commented Jan 17, 2024

  • Introduce some sort of prefix in the vatstore records to indicate whether each record is new- or old- format.

I expect this to be per collection to preserve sort order between entries, so it could be in the vc metadata.

gibson042 added a commit that referenced this pull request Jan 18, 2024
This provides better differention w.r.t. smallcaps, cf.
#1594 (comment)
@gibson042 gibson042 force-pushed the gibson-1349-encodepassable-escaping branch from a99748a to 8f93653 Compare January 18, 2024 01:04
gibson042 added a commit that referenced this pull request Jan 25, 2024
This provides better differention w.r.t. smallcaps, cf.
#1594 (comment)
@gibson042 gibson042 force-pushed the gibson-1349-encodepassable-escaping branch from 8f93653 to 17601ac Compare January 25, 2024 06:31
This provides better differention w.r.t. smallcaps, cf.
#1594 (comment)
…ble/promise/error compactOrdered encodings

(rather than forcing everything through string encoding)
… remotable/promise/error compactOrdered encodings
* Introduce a short-circuiting `getSuffix` (for skip=0).
* Extend "skip" to `innerDecode`, `decodeBinary64`, `decodeRecord`, and `decodeTagged`.
@gibson042 gibson042 force-pushed the gibson-1349-encodepassable-escaping branch from 17601ac to 70bcca3 Compare February 5, 2024 16:58
@gibson042 gibson042 enabled auto-merge February 5, 2024 16:59
@gibson042 gibson042 merged commit f721c3a into master Feb 5, 2024
14 checks passed
@gibson042 gibson042 deleted the gibson-1349-encodepassable-escaping branch February 5, 2024 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change ordered binary encoding to fix bad nesting complexity.
5 participants