
Send output item data as VSBuffer #125819

Closed
Tracked by #130950
jrieken opened this issue Jun 9, 2021 · 5 comments · Fixed by #130452
Assignees
Labels
debt Code quality issues insiders-released Patch has been released in VS Code Insiders notebook
Milestone

Comments

@jrieken
Member

jrieken commented Jun 9, 2021

Currently, output items send their data as a number array. That's not ideal and we should send things as a VSBuffer, for which the IPC layer is fast. This is a bigger change because it requires us to stop using IOutputItemDto from extHost.protocol.ts

@jrieken jrieken self-assigned this Jun 9, 2021
@jrieken jrieken added debt Code quality issues notebook labels Jun 9, 2021
@jrieken
Member Author

jrieken commented Jun 9, 2021

kinda blocked by #125668

@RandomFractals

RandomFractals commented Jul 26, 2021

@jrieken even the current output data array is chopped off: RandomFractals/vscode-data-table#30 (comment)

It would be nice to read the whole binary array data from the cell output. Are you truncating it for text data display only?

(attached screenshot: data-array-loading-error)

@mjbvz
Collaborator

mjbvz commented Aug 5, 2021

I looked into this a little since I'd done something similar for webviews. I think there are two potential approaches to optimize NotebookOutputItemDto:

Pass the output data buffers as top-level arguments to ipc calls

For any ipc call that deals with a cell output, pass the list of output VSBuffers as top-level arguments:

$tryApplyEdits(editorId: string, modelVersionId: number, cellEdits: ICellEditOperationDto[], ...outputBuffers: VSBuffer[]): Promise<boolean>

This ensures that they are efficiently serialized by our existing ipc implementation

Then for NotebookOutputItemDto, we use a reference to tell us which output buffer from the list of buffers to use:

export interface NotebookOutputItemDto {
	readonly mime: string;
	readonly bufferIndex: number;
}

Making this change is sort of a pain because it requires updating a lot of to/from conversion functions to take an array of output buffers
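To illustrate, a conversion function under this approach might resolve the DTO's bufferIndex against the buffer list like this (a rough sketch with hypothetical names, not the actual VS Code code):

```typescript
// Minimal stand-in for vs/base/common/buffer's VSBuffer.
interface VSBuffer {
	readonly buffer: Uint8Array;
}

interface NotebookOutputItemDto {
	readonly mime: string;
	readonly bufferIndex: number;
}

interface NotebookOutputItem {
	readonly mime: string;
	readonly data: VSBuffer;
}

// Hypothetical from-DTO conversion: every such helper would need the
// outputBuffers array (the top-level rpc arguments) threaded through to it.
function toOutputItem(dto: NotebookOutputItemDto, outputBuffers: VSBuffer[]): NotebookOutputItem {
	const data = outputBuffers[dto.bufferIndex];
	if (!data) {
		throw new Error(`No output buffer at index ${dto.bufferIndex}`);
	}
	return { mime: dto.mime, data };
}
```

This threading of the buffer array through every conversion call site is exactly the "sort of a pain" part.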

Add ipc support for efficiently serializing VSBuffers inside of objects

First update NotebookOutputItemDto to use a VS Buffer

export interface NotebookOutputItemDto {
	readonly mime: string;
	readonly buffer: VSBuffer;
}

On its own, this would result in the buffer being very inefficiently serialized. So we'd also need to update our IPC implementation to deeply detect VSBuffers inside of passed-in arguments and then serialize them more efficiently

This efficient serialization would likely be similar to what we do for webviews:

  • While serializing the arguments, walk each object to extract all buffers and replace them with references (just indexes into the list of extracted buffers)
  • Then write the serialized object and buffers to the wire
  • On the receiving side, while deserializing objects, walk through them again and replace all buffer references with the corresponding buffer

I feel this approach is probably better since it is not specific to notebooks. However, I'm not sure if it would significantly degrade performance for all ipc calls. I believe we already walk objects during serialization/deserialization to handle URIs, so it may be ok. We could also make the behavior opt-in and only enable it for specific calls that need it
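The extract/restore walk described in the steps above can be sketched roughly as follows (illustrative names only, not the actual ipc implementation):

```typescript
interface VSBuffer {
	readonly buffer: Uint8Array;
}

function isVSBuffer(x: unknown): x is VSBuffer {
	return !!x && typeof x === 'object' && (x as VSBuffer).buffer instanceof Uint8Array;
}

// Sender side: recursively replace each buffer with an index reference,
// collecting the extracted buffers into `buffers`.
function extractBuffers(value: any, buffers: VSBuffer[]): any {
	if (isVSBuffer(value)) {
		buffers.push(value);
		return { $vsbufferRef: buffers.length - 1 };
	}
	if (Array.isArray(value)) {
		return value.map(v => extractBuffers(v, buffers));
	}
	if (value && typeof value === 'object') {
		const out: any = {};
		for (const key of Object.keys(value)) {
			out[key] = extractBuffers(value[key], buffers);
		}
		return out;
	}
	return value;
}

// Receiver side: swap each reference back for the corresponding buffer.
function restoreBuffers(value: any, buffers: VSBuffer[]): any {
	if (value && typeof value === 'object' && typeof value.$vsbufferRef === 'number') {
		return buffers[value.$vsbufferRef];
	}
	if (Array.isArray(value)) {
		return value.map(v => restoreBuffers(v, buffers));
	}
	if (value && typeof value === 'object') {
		const out: any = {};
		for (const key of Object.keys(value)) {
			out[key] = restoreBuffers(value[key], buffers);
		}
		return out;
	}
	return value;
}
```

The buffer-free object can then go through the normal JSON path while the collected buffers are written to the wire in the fast binary format.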

@RandomFractals

RandomFractals commented Aug 5, 2021

I have not looked at what constitutes a VSBuffer. Would it be possible to instead use application/octet-stream mime type for cell outputs with binary data, and allow renderers to read them with a standard ReadableStream? See Using Readable Streams and Using Writable Streams.

NodeJS has much better documentation on streams. I assume VSBuffer does some of it. However, I don't like the proposed api above. I'd prefer to see cell outputs and renderers use standard JS APIs for data streaming and piping data to make them easier to test, etc.

I liked that the current cell output API exposes blob(), but I was not able to read the whole data blob either in my tests.

This goes back to the original discussion on cell data outputs we started here: #123884 (comment)

@mjbvz mjbvz self-assigned this Aug 10, 2021
mjbvz added a commit that referenced this issue Aug 10, 2021
Fixes #125819

This PR lets us transfer the notebook output as a VSBuffer instead of as an array of numbers

To implement this, I reworked our rpc logic to support serializing VSBuffers inside of objects. Previously, rpc calls could only efficiently serialize VSBuffers that appear as top-level arguments

To support transferring VSBuffers, during serialization we use the `replacer` argument on `JSON.stringify` to extract and replace all buffers in the arguments with special reference objects. During de-serialization, when we encounter one of these reference objects we simply replace it with the corresponding restored buffer.

While implementing the new rpc logic, the separation between 'mixed' and 'simple' rpc arguments became less clear. Therefore I decided to try to combine rpc calls to use a single arguments format. This simplified the code, but I may revisit this to see if I can improve the performance. Right now serialization is up to 2x slower for large objects (usually it's less than that), while deserialization is about the same as it was previously. I believe the root cause is that we now potentially make two function calls per argument value: one for the outer replacer and one for the inner `JSONStringifyReplacer`.
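The replacer/reviver mechanics the commit message describes can be sketched like so (hypothetical names, not the actual rpc code):

```typescript
interface VSBuffer {
	readonly buffer: Uint8Array;
}

// Serialize: the JSON.stringify replacer swaps each buffer for a small
// reference object and collects the real buffers for binary transfer.
function serializeArgs(args: any[]): { json: string; buffers: VSBuffer[] } {
	const buffers: VSBuffer[] = [];
	const json = JSON.stringify(args, (_key, value) => {
		if (value && typeof value === 'object' && value.buffer instanceof Uint8Array) {
			buffers.push(value as VSBuffer);
			return { $vsbufferRef: buffers.length - 1 };
		}
		return value;
	});
	return { json, buffers };
}

// De-serialize: the JSON.parse reviver restores each reference object to
// the corresponding buffer received over the wire.
function deserializeArgs(json: string, buffers: VSBuffer[]): any[] {
	return JSON.parse(json, (_key, value) => {
		if (value && typeof value === 'object' && typeof value.$vsbufferRef === 'number') {
			return buffers[value.$vsbufferRef];
		}
		return value;
	});
}
```

The extra per-value function call of the replacer on top of the existing URI-handling replacer is the likely source of the 2x serialization slowdown mentioned above.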
mjbvz added a commit that referenced this issue Aug 11, 2021
Fixes #125819
mjbvz added a commit that referenced this issue Aug 11, 2021
Fixes #125819
mjbvz added a commit that referenced this issue Aug 12, 2021
Fixes #125819

This PR lets us transfer the notebook output as a VSBuffer instead of as an array of numbers. This significantly reduces the rpc message size for notebook outputs and also speeds up serialization and deserialization of notebook outputs.

To implement this, I reworked our RPC logic to optionally support serializing VSBuffers inside of objects. Previously, RPC calls could only efficiently serialize VSBuffers that appeared as top-level arguments
@mjbvz mjbvz added this to the August 2021 milestone Aug 13, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Oct 1, 2021