We are publishing a protocol for bidirectional streaming RPCs, informed by our experience building GenAI applications. Our goal is to create a protocol that can support a wide range of existing, planned, and unforeseen use cases in the long run, while being immediately useful for generative AI use cases today.
We would appreciate any feedback on the design of the protocol, compelling use cases, and things we haven't considered. To reach out to us, either use the Github issues tool or send us an email at [email protected].
Evergreen is a protocol representing a logical session between a client and a server. Session is a medium where nodes and actions are both produced and consumed.
Each leaf node (node without children) is a typed sequence of bytes. The bytes are split up into chunks and can be incrementally delivered over a transport like bidirectional streaming gRPC. The chunk boundaries here do not have semantic meaning--a node where all chunks in the same leaf have been concatenated into a single chunk represents the same data as the original node.
Non-leaf nodes allow us to deliver a sequence of leaf nodes, concurrently and in an out-of-order fashion. For example, a Wikipedia page can be sent as a root node, with several text, image and video child nodes. Text and image nodes would all be leaf nodes containing their respective chunk. In contrast, each video node is then further transformed into a sequence of leaf nodes, each containing the chunk for a single video frame.
Conceptually, each input or output is a chunk of data, or a list of chunks. However, instead of sending chunks over the wire as a simple list, we use a tree of nodes for input and output to (a) share common content across different actions, and (b) stream different chunks out of order. The input and output of an action is the result of flattening the respective tree of nodes into a list of chunks.
Note that, even though conceptually similar to trees, nodes can form a directed acyclic graph (DAG) due to aliasing and sharing nodes in the hierarchy. In this doc, we use the term tree to describe a hierarchy of nodes instead of DAG for simplicity.
Nodes are identified using a unique ID within the session. The ID is determined by the producer. In addition to ID, each chunk has a mime type indicating how the payload should be interpreted.
Both nodes and chunks can be streamed.
Actions are a predefined set of functions (e.g., GENERATE
) that can be
executed over a set of nodes, and produce nodes. The inputs and outputs are
named, similar to
named parameters in programming
languages. The acceptable input and output parameters, the type of outputs, and
the configuration fields are defined by the action. We do not impose any
limitations on that.
Nodes are identified using a unique ID within the session. These IDs are determined by the producers. Nodes and chunks cannot be shared across sessions. However, a chunk can reference external storage, which can implicitly result in sharing state across sessions.
IDs should not bear semantics: requests should remain valid and semantics should not change if each ID is replaced with a randomly picked unique string. Input/output names should be used for semantic representation instead.
Nodes are bound to sessions and they will be deleted as soon as the session ends. Within a session, nodes and chunks will persist for the length of the session. As soon as a session expires, all nodes and chunks within that session are expired too.
Children of a node can be generated and/or received in any order. The ordering
within a node is denoted using a 0-based sequence number. Sequence numbers
should be unique within a node. If a consumer receives more than one
NodeFragment
message with the same sequence number with the same parent node,
the consumer should accept the first received NodeFragment
message with that
seq
number, and ignore the rest. The consumer should not produce an error, to
facilitate easier implementation of retries. If nodes must be processed in an
ordered fashion, it is the responsibility of the consumer to order them upon
receipt.
A node can reference nodes that are not received yet as their children. The consumer should form proper data dependency barriers to support that. A node can be a descendant of more than one node in a session, and a named node can be used as an input to more than one action.
While nodes cannot transitively include themselves, we still support deep nesting. Implementations should have a reasonable limit for the maximum nesting level.
Implementations are encouraged but not required to send actions before input nodes, and parent nodes before their children, to enable potential pipelining.1 Regardless of when a node is received, the receiver must hold on to the given node until the end of the session. Note that a session may be unilaterally aborted by the receiver, which will result in dropping all nodes in that session.
Messages for a single node contain a continued
flag to indicate that the node
is not finalized (there are more fragments of the same node). If the receiver
receives any NodeFragment
message with a sequence number greater than the one
of the final message of the node (a NodeFragment
message in which continued
is False), the session will be aborted with an error.
Since the continued
flag is by default false, any NodeFragment
message
received without this flag is considered final.
Each chunk has a mime type, which can inform the consumer how to
interpret/decode the bytes in that chunk. We require the message with seq=0
to
populate the metadata field, and that no other chunks contain metadata. In the
same node, we assume all messages have the same metadata as the one sent in
sequence number 0. If a message other than sequence 0 contains metadata, the
session will be aborted.
Note that the payload of a chunk can be either an inline data or a reference to an external source. For external sources, mime type indicates the type of the content in the external storage.
Non-leaf nodes can directly or transitively include leaf nodes with different mimetypes. Note that nodes themselves do not have a payload, and hence no mime type.
Chunks can have inline binary data
or an external ref
as their payloads.
External references are URIs. The supported URIs depend on the implementation.
Regardless of having inline or external payload, leaf nodes are built out of
concatenation of chunks, ordered by sequence number.
For each action, the producer must provide a list of inputs and list of outputs.
Each input and output entry is a pair of (parameter name, node ID). The
parameter name is the name of an input or output parameter of the invoked
action. This is analogous to named arguments in programming languages, without
any particular ordering. For example, to generate a video from text, the user
can invoke generate(input=[(text=prompt1)], output=[(video=frames1)]
which
uses the prompt1
node as the text input, and puts the model's output video in
the frames1
node.
The producer should generate chunks with the root node IDs provided in the
output NamedParameters
. The provided node IDs in the output must be new and
must not be reused. If the sender does not provide a NamedParameter
corresponding to a supported output on the action type, it signals that the
sender will not use that particular field of the output and it need not be
populated by the server.
When an action is received, the consumer can start processing an action at any time (e.g., it can wait for all input nodes to be received, or can start processing the action immediately). Outputs from previous actions can be used as inputs without re-sending them.
Actions may not successfully execute for various reasons, including malformed or missing input parameters. When an action fails, we abort a session.
There are other cases where the implementation may prefer to abort the session. For example, when the user sends invalid chunks, or when the implementation detects abuse.
Note that we expect servers (including models) to provide different actions. The action name, input names and types, output names and types, and accepted configurations should be agreed upon between the producer and the consumer.
For existing applications, we believe most generative models will provide a
GENERATE
method. We envision that, in the near future, agent infrastructure
will provide various actions as a menu of functions.
- Sequence number (
seq
) is 0 if sequence number is not provided. continued
is false if not provided.
We may provide thick libraries wrapping the bidirectional streaming protocol to avoid exposing users to the complexity of the base protocol.
We may consider a DROP(node)
action that will drop the node from the session.
There may be a SET_TTL(node)
action, which sets the TTL of node and all
descendent chunks. Then servers must hold on to the given node until the TTL of
the node expires or the session goes away. (Note that a session may be
unilaterally aborted by the receiver.)
In the future, we may extend the protocol to allow partial failures within a session, where some action failures do not turn into session failures. For now, the session is aborted upon any action failures.
We may provide a way to cancel ongoing actions.
Client sends:
action {
name: "GENERATE"
input {name: "prompt", id: "prompt_1"}
output {name: "response", id: "response_1"}
}
node_fragment {
id: "prompt_1"
child_ids: "question_1"
child_ids: "video_1"
}
node_fragment {
id: "question_1"
chunk_fragment: {
metadata { mimetype: "text/plain" }
data: "Write a summary of this video: "
}
}
node_fragment {
id: "video_1"
seq: 0
continued: true
chunk_fragment: {
metadata { mimetype: "video/mp4" }
ref: "file://path/to/file/part1"
}
}
node_fragment {
id: "video_1"
seq: 1
chunk_fragment: {
# NOTE: Metadata in the second chunk can be omitted.
metadata { mimetype: "video/mp4" }
# the content of the video at /part2 is concatenated
# with the content of the video at /part1 to build the
# full video in the buffer.
ref: "file://path/to/file/part2"
}
}
Server sends:
node_fragment {
id: "response_1"
seq: 0
continued: true
chunk_fragment {
metadata { mimetype: "text/plain" }
data: "It is a translation of an "
}
}
node_fragment {
id: "response_1"
seq: 1
chunk_fragment {
metadata { mimetype: "text/plain" }
data: "F1 race. "
}
}
Client sends:
action {
name: "GENERATE"
input {name: "prompt", id: "prompt_2"}
output {name: "response", id: "response_2"}
}
node_fragment {
id: "prompt_2"
child_ids: "prompt_1"
child_ids: "response_1"
child_ids: "question_2"
}
node_fragment {
id: "question_2"
chunk_fragment {
metadata { mimetype: "text/plain" }
data: "Who's winning?"
}
}
Server sends:
node_fragment {
id: "response_2"
chunk_fragment {
metadata { mimetype: "text/plain" }
data: "Ayrton Senna."
}
}
Client sends:
action {
name: "GENERATE"
config: [...proto.Any wrapping GenerateConfig...]
input {name: "text", id: "prompt_1"}
output {name: "text", id: "response_1"}
}
node_fragment {
id: "prompt_1"
child_ids: "prompt_1_text"
child_ids: "prompt_1_eot"
}
node_fragment {
id: "prompt_1_text"
chunk_fragment {
metadata { mimetype: "text/plain" }
data: "Write a heroic novel about a half-eaten jam doughnut."
}
}
node_fragment {
id: "prompt_1_eot"
chunk_fragment {
metadata { mimetype: "application/x-protobuf; type=EndOfTurn" }
}
}
Client sends:
action {
name: "GENERATE"
config: [...proto.Any wrapping GenerateConfig...]
input {name: "text", id: "prompt_1"}
output {name: "text", id: "response_1"}
}
node_fragment {
id: "prompt_1"
child_ids: "prompt_1_text"
continued: true
// `seq: 0` is implicit.
}
node_fragment {
id: "prompt_1_text"
chunk_fragment {
metadata { mimetype: "text/plain" }
data: "Write a heroic novel about a half-eaten jam doughnut."
}
}
// `prompt_1` is streamed by sending more chunks belonging to the
// `prompt_1` tree.
node_fragment {
id: "prompt_1"
child_ids: "prompt_1_eot"
seq: 1
// `continued: false` is implicit.
}
node_fragment {
id: "prompt_1_eot"
chunk_fragment {
metadata { mimetype: "application/x-protobuf; type=EndOfTurn" }
}
}
1 If a chain or buffer is received before the action, the receiver has no choice but to buffer it. If however the action is known already, the buffer can start being processed as its contents arrive.
Copyright 2024 DeepMind Technologies Limited.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.