Skip to content

Commit

Permalink
chore(docs): developer docs on plugin system communication
Browse files Browse the repository at this point in the history
  • Loading branch information
j-lanson committed Dec 17, 2024
1 parent a02ceea commit b0997a3
Show file tree
Hide file tree
Showing 3 changed files with 197 additions and 0 deletions.
196 changes: 196 additions & 0 deletions site/content/docs/contributing/developer-docs/plugin-query-system.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,196 @@
---
title: The Hipcheck Plugin Query System
weight: 2
---

# The Hipcheck Plugin Query System

This document describes the control flow through Hipcheck core, down to gRPC,
into the Rust SDK, and back to the core during a plugin system query. This
document assumes the plugins are already started and configured, and that we
have established a gRPC stream with them over which to send and receive
messages defined by our `hipcheck-common/proto` protobuf schema.

#![image info](static/images/developer_docs_plugin_grpc.png)

## Overview and Design Requirements

Hipcheck plugins are child processes of the Hipcheck core process that it
communicates with over distinct gRPC channels. Each Hipcheck plugin defines a
set of query endpoints that act as remote functions. They each receive a JSON-
serialized key and return a JSON-serialized result. During a query endpoint's
execution, it may need to invoke another plugin's query endpoint(s). All
communicaton between plugins goes through Hipcheck core, so just as Hipcheck
core issues a request to a given endpoint, the endpoint can request Hipcheck
core to issue another request to a different endpoint and report to the original
endpoint the response, so that the original endpoint can complete its own
behavior.

Hipcheck expects each plugin to be able to handle multiple active "sessions,"
such that if a queried endpoint is waiting for it's own request to be responded
to by Hipcheck core, that plugin is not blocked from receiving and starting new
queries. Since each plugin only has one gRPC channel with Hipcheck core, each
query to an endpoint is associated with a session ID to disambiguate the session
to which session it belongs.

In Rust, the gRPC channel is accessed with using a `mpsc::{Sender, Receiver}`
pair. The `Sender` can be cloned many times, meaning that many threads can
`send` messages on the channel without needing exclusive access to any resource.
However, a `Receiver` cannot be shared. The Rust SDK addresses this restriction
and the above multiple-session, single-channel requirement by having a
`HcSessionSocket` object that has the exclusive `Receiver` to the plugin's gRPC
channel with Hipcheck core. The `HcSessionSocket` is responsible for tracking
the set of live sessions, and detecting whether a new message from the gRPC
channel should be forwarded to a live session, or constitues an entirely new
session. Each session is represented by a `PluginEngine` instance. The
`HcSessionSocket` sets up its own `mpsc` channel with the new `PluginEngine`
and gives it a clone of the gRPC channel `Sender`. In summation, all gRPC
messages received by the Rust SDK must go through `HcSessionSocket` for
demultiplexing, but each `PluginEngine` can send messages on the gRPC channel
directly.

### `hipcheck-common` and Chunking

The actual type that we can send over our gRPC channel to the live plugin is
called `PluginQuery`, and is automatically defined by the Rust code generated
from the protobuf definitions in `hipcheck-common/proto`. We choose to define
this high-level `Query` object to allow us to control the Hipcheck-facing struct
definition. For instance, `PluginQuery`'s `state` field is an `i32`, but for
`Query` we can make `state` a custom `enum` and translate from
`PluginQuery.state` to improve readability.

An additional complexity is that gRPC has a maximum per-message size of 4MB. To
abstract this reality from users, the `hipcheck-common` crate defines a chunking
algorithm used by both Hipcheck core and the Rust SDK. Each code-facing `Query`
object is chunked into one or more `PluginQuery` objects before being sent on
the wire, and on the listening side the message is de-fragemented with a
`hipcheck-common::QuerySyntesizer`.

## Part 1: Sending a request to a plugin

The plugin query system begins with a call to `score_results()`, which
iterates through all the policy file's top-level analyses one-by-one.
For each, `score_results()` calls `HcEngine::query()`, which is the
entrypoint for all queries to plugins. `HcEngine::query()` is memo-ized
using the `salsa` crate, so the running `hc` core binary caches all
queries and responses sent through `HcEngine::query()`. If later in
execution `HcEngine::query()` is called again for the same set of
parameters, it will return the cached output value without involving
the plugin process.

As described in the Overview, Hipcheck core has a unique gRPC channel with each
running plugin, so the first thing `HcEngine::query()` must do is find the
appropriate plugin channel handle for the target plugin. The `HcPluginCore`
object that powers `HcEngine` under the hood (set with `HcEngine::set_core()`)
has a map containing all the plugin handles. `HcEngine::query()` keys this map
using the target publisher/plugin pair to get the appropriate plugin handle,
which is an object of type `ActivePlugin`. It then forwards the target
query endpoint and key to `ActivePlugin::query()`.

Now that we have the active plugin handle, and therefore the right gRPC channel
for this query, we can formulate a query message. `ActivePlugin::query()`
formulates the high-level `Query` object and forwards it to the `query()`
function of the contained `PluginTransport` type. `ActivePlugin` is merely a
thin wrapper around `PluginTransport` with some additional state tracking
the next session ID to use.

Inside `PluginTransport::query()` is where the `Query` object gets chunked into
a `Vec<PluginQuery>` and each one gets sent over the gRPC channel. We have now
successfully sent out a query.

## Part 2 - Receiving Queries from gRPC

Meanwhile, the plugin process (if using the Rust SDK), has been
listening on the gRPC channel with `HcSessionSocket.rx::recv()`. As mentioned in
the Overview, there is one `HcSessionSocket` instance that receives
all `PluginQuery` messages off the wire. Each message is returned
to the `HcSessionSocket::listen()` function, which determines if the message's
session ID matches its list of active sessions. If not, this newly-received `PluginQuery`
object marks the start of a new session, so the `HcSessionSocket` creates and
initializes a `PluginEngine` instance to handle it. `HcSessionSocket` creates
a one-way `mpsc` channel for it to forward `PluginQuery` objects with the
appropriate session ID to this `PluginEngine`. Thus, when a `PluginEngine`
called `recv()` on its channel that it shares with `HcSessionSocket`, it can be sure that all messages
have the same session ID. The last thing `HcSessionSocket::listen()` does
is forward the `PluginQuery` over this channel, then goes back to listening
for gRPC messages.

The `PluginQuery` travels up through `PluginEngine::recv_raw()` into
`PluginEngine::recv()`, where it is de-fragmentized with zero or more
other messages to produce a software-facing `Query` object.

If this is the first `Query` to a new `PluginEngine`, the object is
received by `PluginEngine::handle_session_fallible()`. The `PluginEngine`
doesn't yet know which query endpoint to call, so it has to match
`Query.name` against the output of `Plugin.queries()` to find the right
one. Once we have the right endpoint, we take the key (the argument) from
`Query.key` and call the endpoint with it.

## Part 3 - Querying other plugins

Now we are actually executing query endpoint code. Over the course of its
execution, the endpoint may need information from another plugin. To enable
the query endpoint to do so, each query endpoint is provided a handle to
its associated `PluginEngine` along with the query key. The endpoint can then
call `PluginEngine::query()` with the plugin publisher and name, the target
query endpoint name, and the query key. Within `PluginEngine::query()`, these
parameters are formulated into a `Query` object and forwarded to
`PluginEngine::send()`. The `send()` function uses the chunking algorithm from
`hipcheck-common` to produce a `Vec<PluginQuery>` and send them out over the
gRPC channel `Sender` with `PluginEngine.tx::send()`. As a reminder, this does
not go back through the `HcSessionSocket`, the `PluginEngine` can send messages
to Hipcheck core directly.

## Part 4 - Receiving and Interpreting Messages from Plugins

When we last left the Hipcheck core, it had just sent its `Vec<PluginQuery>`
over gRPC with `PluginTransport.tx::send()`. Note that this is just one thread
of execution in Hipcheck core. Just as a plugin process must be able to handle
multiple live sessions, the Hipcheck core may have multiple tasks each executing
independent queries. Thus, Hipcheck has the same issue of ensuring messages
received from the gRPC channel make it to the correct `PluginTransport` objects,
but it solves this problem differently than the Rust SDK does.

Each `PluginTransport` object shares a `Mutex` that guards the
`MultiplexedQueryReceiver` object. While the `PluginTransport` waits for a
message from the `PluginEngine` session that was spawned remotely to handle its
request, it enters a loop. In each iteration of the loop, it blocks until it can
acquire the `MulitplexedQueryReceiver`. Once it has acquired the receiver, it
checks the receiver's backlog for any messages matching its target session ID.
If none are found, it listens on the gRPC wire directly for the next message. If
the next message matches our session, we take the message, otherwise we put it
in the backlog to save it for the `PluginTransport` that does want that message.
After this, we drop our lock on the `Mutex<MultiplexedQueryReceiver>` and
restart the loop. The reason we drop and re-acquire the lock is so that one
`PluginTransport` that spends a very long time waiting for its message(s) does
not prevent other `PluginTransport` instances from receiving their messages. By
dropping and trying to re-acquire the `Mutex` lock, we give other
`PluginTransport` instances a chance to acquire the receiver.

The `PluginTransport` continues this loop until it has received all the
`PluginQuery` objects it needs to de-fragment into a `Query` object. It then
returns the `Query` to the caller, which is `ActivePlugin::query()`. This
function does the job of converting `Query` into a Hipcheck core-specific type
called `PluginResponse`. Until now, the Hipcheck core has not really checked the
content of the `Query`, but now it needs to decide whether the `Query` is the
query endpoint returning a value or requesting additional information. The
`PluginResponse` enum separates these two possibilities, plus an additional
error variant.

`ActivePlugin::query()` returns the `PluginResponse` up to the caller, namely
`HcEngine::query()`. Here, if the `PluginResponse` was `Completed`, we have
finished the query and return its output value that was stored as a field in
`Completed`. Otherwise, we have to recursively call `HcEngine::query()` with the
query information stored in `PluginResponse::AwaitingResult`.

Once this recursive call completes, we must forward the output of that query to
forward to our original query endpoint who asked for it. We do this by passing
that output to `ActivePlugin::resume_query()`. One of the main differences of
this function is that the generated `Query` object uses an existing session ID
instead of a newly-generated one, since this `Query` is part of an ongoing
session.

The original query endpoint may return a `PluginResponse::AwaitingResult` zero
or more times, but eventually we will get a `PluginResponse::Completed`, and by
passing the contained output up to the calling function, we have completed a
query using the plugin system!
1 change: 1 addition & 0 deletions site/static/assets/developer_docs_plugin_grpc.drawio
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
<mxfile host="charts.mitre.org" modified="2024-12-16T18:24:52.210Z" agent="5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0" version="20.8.5" etag="K-q-bwHREtJwMxqPlC3R" type="device"><diagram id="kli-Tog3HTVBQzar_-YL" name="Page-1">7Vxrc+K2Gv41zLQfyPgCBj4mJDl7prd0t+e0/ZQRtsDqGotIdoD99dXVNxnigsHebHdmJ7Ysy+LV+7x3aeDO17v/ELAJf8IBjAaOFewG7v3AcRxrPGN/eMtetkwd1bAiKJBNdt7wCX2BqtFSrSkKIC11TDCOErQpN/o4jqGflNoAIXhb7rbEUfmrG7CCRsMnH0Rm6+8oSEL1K8ZW3v4BolWov2xb6ska6M6qgYYgwNtCk/swcOcE40RerXdzGHHiabrI9x4PPM0mRmCcNHnh6Zffv4DlyPdmT788v/rYI3QxHI/kMK8gStUvVrNN9poEBKdxAPko1sC924YogZ82wOdPt2zRWVuYrCN2Z7PLJYqiOY4wEe+6wRhOgxFrpwnBn2HhydRZuJ7HnqgJQJLA3cGfZmcEY5wG8RomZM+6aDbzFI0Vk9kjdb/Nl2yi1yEsLNdYvwgUm6yysXNKsgtFzH9AWHt6YcICOF36dYT1/ClcLNsh7DDj3iOUHU2vSlnPIOz/KOS/iuBXxMnJLtNFhGgICf9+LFqidIViPnS8xEKGDPgv8yI23bsF6+etEkFk0dvHhHBhwn4Sez2C4j3+VsgvKeRP8JIP7ifoFdYOJb9IsxksMdkCwq8GnvWSQkYJTiL1lF1/hqqFzc29NZgFBkwuqVtMkhCvcAyih7z1rsxOeZ8fMd4oJvoLJsleCVmQ8g8VWUx+k3/oOLuweeGU+PAY+ytRDcgKJsfkTz37ERgBQdmSOG8dowYnwZitGbwhlK+Ae/vBfxAN8k4s2nffnwfjFkA5KkPSmZqQdKwaSI4uhUhTh/w3gQQkHBj4VaAQAj8UrL0ZRvBV2AiAseaeIlqLngAyKAosF4C3wRHy9xlmmJKOOkIKWymy/4O/fzPWt3+q4cTN/a50t1d3LSLMaYiwA7x0HYQ5BmdQJltzgIm7ZwJpGiX0XGhVNORs9vg4m10EcrbXNeRmBmF/S0ksfxNgn4CECuwIPQfY/1+FxmF4XPwlNFsVcFUlJV5EzKYmAmIeWHNqxwu6ySj6zUFu3BByky4hNzY4Q9s+j2scZMi7FZbLk3zUZ/U2srrGmnZFiySFhEFlzZVQmMafUbzi0IprcCaBxGzTIPXrDUWOzv9DX5uUcknKwzAszzO1R6H4IzVqdTAcQ/1NqTOZHQvRq5xgZgSz6fiQUq2ev1kdOmmqQ89VourVJ8wkcs7pdoXVvSoLy4mptypcnE3jdMaeHBYVyX4DaSYsTD6TXPobATHdsLXvtQQZO51LENPTOJXSN8lOmU5MDPSP1l5NfODKtG4Q0aIh2PDLZQR3tzxIKIRMoC7v/QhQimRwBZBEN8dcuJboVyY23KGkIM/Y3Z9a1LHrXJrxGy3MijYr83lAKlb/kGi03xCNYrqPiBOsbVFpN/XonWk9vxQYwq5jiKzxTKFajQWOZxVOkz/VEKrGQCO3PNDEbiadGbuAfaHbhnegzbXAZGQdnZc7PdqfXcgZtKoqbNOsXH18mgsDCDAvIaq1IH4EC/akBBkQoVXMMcY4kJsed1z6IObK36oHaxQE0sCAFH0BCzEeZ2ZFRzb4+G4wvj8mvlQyQL08yIKYRcY/IjwOh0OtG2tmTUrkd1vhWrs86FDf6xHwcinCju0bAbbpSjLV85LytMTdB7TxQ+h/Zs/nzFPPH5h6Ksy6PlLiG+zAlii5HCdw65JKQ9M2YuJKbhdFrWpqQf8Nq6GBGnfFtq+pAB0zQI5kOI4Kr4AprRWPxQ085RDccLsNQR5/0fFnx5KR8HyRC2upeWCBSQDJ0Jc0vRWfJN8Nh8X27yWv5FwjeGUNEj+UPsnPYA0Dw2liztXFPq4/Jj0jEWNcEsx9udAXsoPzufK1SCoi+CipFW8XdpDqbYMWdbqWXW/r9E7D9HqahvX8XAnXS1O5GLKvEVMiofNMmfuLcPy8BFGEmGTpgSXtVPR6D+L6JuERD1G8KASh9SaCa/bzGBOI4AeBLykiQs4EIAE1+TUJNBDjRGbntBs0F6SiVLybxynyPB7vYI4XM+khn+k55S0SwTyr1mFK7dQgx+Wx3zSY6XpHuda68Txvcp4NdHkjZzT6d/FPWvxZl4J/ZAaotiGMC2hn1qKfRlr8+JgLJGHnzMUkk5Rw/V2VGh9FvmnIpfhG6H/5gWNi4qvzZEbHPRkG25kzKvsc7XgybmlQ172WH+OYHirT7JltIK0+ccmMuh6q++5zio5ZslRvZx0KUj6obrsbZlu/9pDG3UcnXdOkysRUMQsk8zjcC7LW0iEp5YWMtI9OEzH0goJHI3NADSI1HWnCtwKal9eEs6aa0OlSEzpmkOYUaCr5x8D5TMC2hwDtPlXjmqmaHKA8VUtRvIoMQOahCwXAg75PXyB5IPNQm8PI0ha1OYxTwd0mjKcNYeyca9DWGz0TU9HcWIV/TnnApimI6rBj7+iwBxIS13S0XNMKC+BQFEnkpbB0HzMU8HhtffGf0IOq4ijKZZjWffLBSmoCXeNUftjDOqXOVd3ZGqxZOsp1rEZc2RrHOS2rxh6qxe5roEa2QZRvOYDiek3Nxk5rAUemR8fL1qSf8JJCmhgCtFmBKH1nwZI3cnw8yGlVoiXqm+fmfUuDXi3p65o5wtOFZl/KvjLnoT8JlBrddGrl6HXLRoup0F5Ui15epDctBnUPFDhdR6S7h4o2T0FuT2oIq8DtQRSgnVCo7JPsbnpK6O7joXrgf+3KQr7tGnblSaWK2bZ37ZdUt5xX+nvO0f6XKVUcmyH23OilGxxT2MTqbbXm6F0bz1Y9K+bG89Stljcqu+jcUt/SoFfLNGqQvvN6pInXN3N6bHotOdQ+psKb/XT/w7EqVRrwAlUi+lbo+14rVMeTcWkd6ypU605wuNgyeg2OxrjMDo22d1sU1zAfr7oL45R9IW3aFE1zI7bl1jPStbZuTCsSx5tWWLBp3qRqdxgDtbR1o/odna85NK9hRqn6Fy5jEHmmvnqPeze8A375e967oTOARraatS14kb3IsJkBosP1I6Lynk2dYh+BRFTHZeW+QW64dp9nu7gvpsRhI8HZ7TE21qEC+RO2l5LdszrzqE8JsaFj9S1yYeuEQPFwkyyxXQVcHpEtHMphGKe6tMQIBavdKBravStn7zy5rSDYCKyd1jRnE313u+4NjHYfxrWtmr1wXBkOPAXPjzoiJHa+oThg5kxyoCxFlZeLWvN5hsVC7bg8korvVumLOuxBaZZtNT9l40Ak6Uz7zLC6Mya8UomKrTfPX5cljvqnFZe56rNeQWA3ZotuxbWZb/sqDgk0pHH34TvbMgM/zNvjP7PO5smOLqsX1lXprGS3TKhvQ6g2GKqTA/WOwhDQzJDizo0xijDR9EYiYXnRrRhI+EWoUDojvlR9X0xaO0zmNse+6IVWgdw0Z6Ww1B2UD8Xtv5qj0QxQd18YaNecjvbGGY8W/6KBnMtL1cr5kPx0yLbOhzQW5pK7uXgCIzs1XNoo+dnr7sPf</diagram></mxfile>
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit b0997a3

Please sign in to comment.