diff --git a/site/content/docs/contributing/_index.md b/site/content/docs/contributing/_index.md index d76135bd..8f721195 100644 --- a/site/content/docs/contributing/_index.md +++ b/site/content/docs/contributing/_index.md @@ -30,4 +30,8 @@ A walkthrough of running Hipcheck for the first time. A walkthrough of running Hipcheck for the first time. {% end %} +{% waypoint(title="Developer Docs", path="@/docs/contributing/developer-docs/_index.md") %} +Documentation for Hipcheck developers. +{% end %} + diff --git a/site/content/docs/contributing/developer-docs/_index.md b/site/content/docs/contributing/developer-docs/_index.md new file mode 100644 index 00000000..e381d3fe --- /dev/null +++ b/site/content/docs/contributing/developer-docs/_index.md @@ -0,0 +1,21 @@ +--- +title: Developer Docs +template: docs.html +sort_by: weight +page_template: docs_page.html +weight: 3 +--- + +# Hipcheck Developer Docs + +
+ +{% waypoint(title="Architecture", path="@/docs/contributing/developer-docs/architecture.md") %} +Hipcheck's distributed architecture and how plugins get started. +{% end %} + +{% waypoint(title="Query System", path="@/docs/contributing/developer-docs/plugin-query-system.md") %} +The life of a plugin query from inception, through gRPC, to SDK, and back. +{% end %} + +
diff --git a/site/content/docs/contributing/developer-docs/architecture.md b/site/content/docs/contributing/developer-docs/architecture.md new file mode 100644 index 00000000..1a5cd41c --- /dev/null +++ b/site/content/docs/contributing/developer-docs/architecture.md @@ -0,0 +1,56 @@ +--- +title: The Hipcheck Architecture +weight: 2 +--- + +# The Hipcheck Architecture and Plugin Startup + +This document describes the distributed architecture of Hipcheck and how plugins +get started. + +Hipcheck is a relatively simple multiprocessed tool that follows a star +topology. Users invoke the main Hipcheck binary, often referred to as "Hipcheck +core" or `hc`, on the command line, and provide a [policy file][policy_file] +which specifies the set of top-level plugins to use during analysis. Once +Hipcheck resolves these plugins and their dependencies, it starts each plugin in +a separate child process. Once all plugins are started and initialized, Hipcheck +core enters the analysis phase. During this phase it acts as a simple hub for +querying top-level plugins and relaying queries between plugins, as plugins are +intended to only communicate with each other through the core. + +## Plugin Startup + +Hipcheck core uses the `plugins/manager.rs::PluginExecutor` struct to start +plugins. The `PluginExecutor` has fields like `max_spawn_attempts` and +`backoff_interval` for controlling the startup process. These fields can be +configured using the `Exec.kdl` file. + +The main function in `PluginExecutor` is `start_plugin()`, which takes a +description of a plugin on file and returns a `Result` containing a handle to +the plugin process, called `PluginContext`. + +In `start_plugin()`, once the `PluginExecutor` has done the work of locating the +plugin entrypoint binary on disk, it moves into a loop of attempting to start +the plugin, at most `max_spawn_attempts` times. 
For each spawn attempt, it will +call `PluginExecutor::get_available_port()` to get a valid local port for the +plugin to listen on. The executor creates a +`std::process::Command` object for the child process, with `stdout` and `stderr` +forwarded to Hipcheck core. + +Within each spawn attempt, `PluginExecutor` will try to connect to the port on +which the plugin should be listening. Since process startup and port +initialization can take differing amounts of time, the executor makes a series of +up to `max_conn_attempts` connection attempts. After each failed connection, +the executor waits for a backoff period that starts at `backoff_interval` and grows linearly with the number of +failed connections. The calculated backoff is modulated by a random `jitter` +between 0 and `jitter_percent`. + +Overall, the sleep duration between failed connections is equal to + + (backoff * conn_attempts) * (1.0 +/- jitter) + +As soon as `PluginExecutor::start_plugin()` successfully starts and connects to the child +process, it stores the process and plugin information in a `PluginContext` and +returns it to the caller. If every one of the `max_spawn_attempts` spawn attempts fails, it returns an error instead. + +[policy_file]: @/docs/guide/config/policy-file.md diff --git a/site/content/docs/contributing/developer-docs/plugin-query-system.md b/site/content/docs/contributing/developer-docs/plugin-query-system.md new file mode 100644 index 00000000..737b5661 --- /dev/null +++ b/site/content/docs/contributing/developer-docs/plugin-query-system.md @@ -0,0 +1,203 @@ +--- +title: The Hipcheck Plugin Query System +weight: 2 +--- + +# The Hipcheck Plugin Query System + +This document describes the control flow through [Hipcheck core][hc_core], down +to gRPC, into the [Rust SDK][rust_sdk], and back to the core during a plugin +system query. 
This document assumes the plugins are already started and +configured, and that we have established a gRPC stream with them over which to +send and receive messages defined by our `hipcheck-common/proto` protobuf +schema. + +{{ image(path="images/developer_docs_plugin_grpc.png") }} + +## Overview and Design Requirements + +Hipcheck plugins are child processes of the Hipcheck core process that it +communicates with over distinct gRPC channels. Each Hipcheck plugin defines a +set of query endpoints that act as remote functions. They each receive a +JSON-serialized key and return a JSON-serialized result. During a query endpoint's +execution, it may need to invoke another plugin's query endpoint(s). All +communication between plugins goes through Hipcheck core, so just as Hipcheck +core issues a request to a given endpoint, the endpoint can ask Hipcheck +core to issue another request to a different endpoint and report the response +back to the original endpoint, which can then complete its own +behavior. + +A "session" describes the series of messages between Hipcheck core and a query +endpoint needed to complete a single query. This includes those queries made by +the endpoint to other plugins as part of answering the original query. Hipcheck +expects each plugin to be able to handle multiple active "sessions," such that +if a queried endpoint is waiting for Hipcheck core to respond to its own +request, the plugin process is not blocked from receiving and handling new +queries, including to the same query endpoint. Thus, each query object sent to +and from the plugin has a session ID field to associate it with a particular +session. + +In Rust, the gRPC channel is accessed using an `mpsc::{Sender, Receiver}` +pair. The `Sender` can be cloned many times, meaning that many threads can +`send` messages on the channel without needing exclusive access to any resource. +However, a `Receiver` cannot be shared. 
The Rust SDK addresses this restriction +and the above multiple-session, single-channel requirement by having an +`HcSessionSocket` object that has the exclusive `Receiver` to the plugin's gRPC +channel with Hipcheck core. The `HcSessionSocket` is responsible for tracking +the set of live sessions, and detecting whether a new message from the gRPC +channel should be forwarded to a live session, or constitutes an entirely new +session. Each session is represented by a `PluginEngine` instance. The +`HcSessionSocket` sets up its own `mpsc` channel with the new `PluginEngine` +and gives it a clone of the gRPC channel `Sender`. In summary, all gRPC +messages received by the Rust SDK must go through `HcSessionSocket` for +demultiplexing, but each `PluginEngine` can send messages on the gRPC channel +directly. + +### `hipcheck-common` and Chunking + +The actual type that we can send over our gRPC channel to the live plugin is +called `PluginQuery`, and is automatically defined by the Rust code generated +from the protobuf definitions in `hipcheck-common/proto`. We also define +a higher-level `Query` type so that we control the Hipcheck-facing struct +definition. For instance, `PluginQuery`'s `state` field is an `i32`, but for +`Query` we can make `state` a custom `enum` and translate from +`PluginQuery.state` to improve readability. + +An additional complexity is that gRPC has a default maximum per-message size of 4MB. To +hide this limit from users, the `hipcheck-common` crate defines a chunking +algorithm used by both Hipcheck core and the Rust SDK. Each code-facing `Query` +object is chunked into one or more `PluginQuery` objects before being sent on +the wire, and on the listening side the message is de-fragmented with a +`hipcheck-common::QuerySynthesizer`. + +## Part 1 - Sending a Request to a Plugin + +The plugin query system begins with a call to `score_results()`, which +iterates through all the policy file's top-level analyses one-by-one. 
+For each, `score_results()` calls `HcEngine::query()`, which is the +entrypoint for all queries to plugins. `HcEngine::query()` is memoized +using the `salsa` crate, so the running `hc` core binary caches all +queries and responses sent through `HcEngine::query()`. If later in +execution `HcEngine::query()` is called again with the same set of +parameters, it will return the cached output value without involving +the plugin process. + +As described in the Overview, Hipcheck core has a unique gRPC channel with each +running plugin, so the first thing `HcEngine::query()` must do is find the +appropriate channel handle for the target plugin. The `HcPluginCore` object that +powers `HcEngine` under the hood (set with `HcEngine::set_core()`) has a map +containing all the plugin handles. `HcEngine::query()` keys this map using the +target publisher/plugin pair to get the appropriate plugin handle, which is an +object of type `ActivePlugin`. It then forwards the target query endpoint and +key to `ActivePlugin::query()`. + +Now that we have the active plugin handle, and therefore the right gRPC channel +for this query, we can formulate a query message. `ActivePlugin::query()` +formulates the high-level `Query` object and forwards it to the `query()` +function of the contained `PluginTransport` type. `ActivePlugin` is merely a +thin wrapper around `PluginTransport` with some additional state that tracks +the next session ID to use. + +`PluginTransport::query()` is where the `Query` object gets chunked into +a `Vec<PluginQuery>`, each element of which is sent over the gRPC channel. We have now +successfully sent out a query. + +## Part 2 - Receiving Queries from gRPC + +Meanwhile, the plugin process (if using the Rust SDK) has been +listening on the gRPC channel with `HcSessionSocket.rx::recv()`. As mentioned in +the Overview, there is one `HcSessionSocket` instance that receives +all `PluginQuery` messages off the wire. 
Each message is returned +to the `HcSessionSocket::listen()` function, which determines if the message's +session ID matches one of its active sessions. If not, this newly-received `PluginQuery` +object marks the start of a new session, so the `HcSessionSocket` creates and +initializes a `PluginEngine` instance to handle it. `HcSessionSocket` creates +a one-way `mpsc` channel over which it forwards `PluginQuery` objects with the +appropriate session ID to this `PluginEngine`. Thus, when a `PluginEngine` +calls `recv()` on the channel it shares with `HcSessionSocket`, it can be sure that all messages +have the same session ID. Finally, `HcSessionSocket::listen()` +forwards the `PluginQuery` over this channel and goes back to listening +for gRPC messages. + +The `PluginQuery` travels up through `PluginEngine::recv_raw()` into +`PluginEngine::recv()`, where it is de-fragmented with zero or more +other messages to produce a code-facing `Query` object. + +If this is the first `Query` to a new `PluginEngine`, the object is +received by `PluginEngine::handle_session_fallible()`. The `PluginEngine` +doesn't yet know which query endpoint to call, so it has to match +`Query.name` against the output of `Plugin.queries()` to find the right +one. Once we have the right endpoint, we take the key (the argument) from +`Query.key` and call the endpoint with it. + +## Part 3 - Querying Other Plugins + +Now we are actually executing query endpoint code. Over the course of its +execution, the endpoint may need information from another plugin. To enable +the query endpoint to do so, each query endpoint is provided a handle to +its associated `PluginEngine` along with the query key. The endpoint can then +call `PluginEngine::query()` with the plugin publisher and name, the target +query endpoint name, and the query key. Within `PluginEngine::query()`, these +parameters are formulated into a `Query` object and forwarded to +`PluginEngine::send()`. 
The `send()` function uses the chunking algorithm from +`hipcheck-common` to produce a `Vec<PluginQuery>` and sends each element out over the +gRPC channel `Sender` with `PluginEngine.tx::send()`. As a reminder, this does +not go back through the `HcSessionSocket`; the `PluginEngine` can send messages +to Hipcheck core directly. + +## Part 4 - Receiving and Interpreting Messages from Plugins + +When we last left the Hipcheck core, it had just sent its `Vec<PluginQuery>` +over gRPC with `PluginTransport.tx::send()`. Note that this is just one thread +of execution in Hipcheck core. Just as a plugin process must be able to handle +multiple live sessions, the Hipcheck core may have multiple tasks each executing +independent queries. Thus, Hipcheck has the same issue of ensuring messages +received from the gRPC channel make it to the correct `PluginTransport` objects, +but it solves this problem differently than the Rust SDK does. + +Each `PluginTransport` object shares a `Mutex` that guards the +`MultiplexedQueryReceiver` object. While the `PluginTransport` waits for a +message from the `PluginEngine` session that was spawned remotely to handle its +request, it enters a loop. In each iteration of the loop, it blocks until it can +acquire the `MultiplexedQueryReceiver`. Once it has acquired the receiver, it +checks the receiver's backlog for any messages matching its target session ID. +If none are found, it listens on the gRPC wire directly for the next message. If +the next message matches our session, we take the message; otherwise we put it +in the backlog to save it for the `PluginTransport` that does want that message. +After this, we drop our lock on the `Mutex` and +restart the loop. The reason we drop and re-acquire the lock is so that one +`PluginTransport` that spends a very long time waiting for its message(s) does +not prevent other `PluginTransport` instances from receiving their messages. 
By +dropping and trying to re-acquire the `Mutex` lock, we give other +`PluginTransport` instances a chance to acquire the receiver. + +The `PluginTransport` continues this loop until it has received all the +`PluginQuery` objects it needs to de-fragment into a `Query` object. It then +returns the `Query` to the caller, which is `ActivePlugin::query()`. This +function does the job of converting `Query` into a Hipcheck core-specific type +called `PluginResponse`. Until now, the Hipcheck core has not really checked the +content of the `Query`, but now it needs to decide whether the `Query` is the +query endpoint returning a value or requesting additional information. The +`PluginResponse` enum separates these two possibilities, plus an additional +error variant. + +`ActivePlugin::query()` returns the `PluginResponse` up to the caller, namely +`HcEngine::query()`. Here, if the `PluginResponse` was `Completed`, we have +finished the query and return its output value that was stored as a field in +`Completed`. Otherwise, we have to recursively call `HcEngine::query()` with the +query information stored in `PluginResponse::AwaitingResult`. + +Once this recursive call completes, we must forward the output of that query +to the original query endpoint that asked for it. We do this by passing +that output to `ActivePlugin::resume_query()`. One of the main differences from +`ActivePlugin::query()` is that the generated `Query` object uses an existing session ID +instead of a newly-generated one, since this `Query` is part of an ongoing +session. + +The original query endpoint may return a `PluginResponse::AwaitingResult` zero +or more times, but eventually we will get a `PluginResponse::Completed`, and by +passing the contained output up to the calling function, we have completed a +query using the plugin system! 
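The resolve-and-resume loop described in Part 4 can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not Hipcheck's actual code: the `Plugin` trait, the `Upper` and `Greeter` mock plugins, the free functions `query()`, `resolve()` and `demo()`, and the field layout of `PluginResponse` are all hypothetical stand-ins; only the variant names `Completed` and `AwaitingResult` and the idea of resuming with an existing session ID come from the text above. gRPC, chunking, memoization and the error variant are omitted.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the core's `PluginResponse` enum
/// (the real type also has an error variant and different fields).
#[derive(Debug, Clone, PartialEq)]
enum PluginResponse {
    /// The endpoint finished and returned an output value.
    Completed(String),
    /// The endpoint needs the result of another plugin's query first.
    AwaitingResult { session_id: u32, plugin: String, key: String },
}

/// Hypothetical plugin handle, loosely playing the role of `ActivePlugin`.
trait Plugin {
    /// Start a new session for `key`.
    fn query(&mut self, key: &str) -> PluginResponse;
    /// Continue an existing session with the answer it was waiting for.
    fn resume_query(&mut self, session_id: u32, answer: &str) -> PluginResponse;
}

/// Mock plugin that completes immediately.
struct Upper;

impl Plugin for Upper {
    fn query(&mut self, key: &str) -> PluginResponse {
        PluginResponse::Completed(key.to_uppercase())
    }
    fn resume_query(&mut self, _session_id: u32, answer: &str) -> PluginResponse {
        PluginResponse::Completed(answer.to_string())
    }
}

/// Mock plugin that must query `upper` before it can answer.
struct Greeter {
    next_session: u32,
}

impl Plugin for Greeter {
    fn query(&mut self, key: &str) -> PluginResponse {
        let session_id = self.next_session;
        self.next_session += 1;
        PluginResponse::AwaitingResult {
            session_id,
            plugin: "upper".to_string(),
            key: key.to_string(),
        }
    }
    fn resume_query(&mut self, _session_id: u32, answer: &str) -> PluginResponse {
        PluginResponse::Completed(format!("hello, {}", answer))
    }
}

type Plugins = HashMap<String, Box<dyn Plugin>>;

/// Entry point for a query, loosely analogous to `HcEngine::query()`.
/// Returns `None` if the target plugin is unknown.
fn query(plugins: &mut Plugins, target: &str, key: &str) -> Option<String> {
    let first = plugins.get_mut(target)?.query(key);
    resolve(plugins, target, first)
}

/// Keep answering the target's sub-queries until it reports `Completed`.
fn resolve(plugins: &mut Plugins, target: &str, response: PluginResponse) -> Option<String> {
    match response {
        PluginResponse::Completed(output) => Some(output),
        PluginResponse::AwaitingResult { session_id, plugin, key } => {
            // Recursively run the query the endpoint asked for...
            let answer = query(plugins, &plugin, &key)?;
            // ...then resume the original session using its existing ID.
            let next = plugins.get_mut(target)?.resume_query(session_id, &answer);
            resolve(plugins, target, next)
        }
    }
}

/// Wire up the two mock plugins and run one top-level query.
fn demo() -> Option<String> {
    let mut plugins: Plugins = HashMap::new();
    plugins.insert("upper".to_string(), Box::new(Upper));
    plugins.insert("greeter".to_string(), Box::new(Greeter { next_session: 0 }));
    query(&mut plugins, "greeter", "world")
}
```

Calling `demo()` sends one query to the mock `greeter` plugin, which asks for a query to `upper` and is then resumed with that answer on the same session ID. Routing every sub-query through the same `query()` function mirrors the star topology, where plugins never talk to each other directly.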
+ +[hc_core]: @/docs/contributing/developer-docs/architecture.md +[rust_sdk]: @/docs/guide/making-plugins/rust-sdk.md diff --git a/site/static/assets/developer_docs_plugin_grpc.drawio b/site/static/assets/developer_docs_plugin_grpc.drawio new file mode 100755 index 00000000..fb114031 --- /dev/null +++ b/site/static/assets/developer_docs_plugin_grpc.drawio @@ -0,0 +1 @@ +7Vxrc+K2Gv41zLQfyPgCBj4mJDl7prd0t+e0/ZQRtsDqGotIdoD99dXVNxnigsHebHdmJ7Ysy+LV+7x3aeDO17v/ELAJf8IBjAaOFewG7v3AcRxrPGN/eMtetkwd1bAiKJBNdt7wCX2BqtFSrSkKIC11TDCOErQpN/o4jqGflNoAIXhb7rbEUfmrG7CCRsMnH0Rm6+8oSEL1K8ZW3v4BolWov2xb6ska6M6qgYYgwNtCk/swcOcE40RerXdzGHHiabrI9x4PPM0mRmCcNHnh6Zffv4DlyPdmT788v/rYI3QxHI/kMK8gStUvVrNN9poEBKdxAPko1sC924YogZ82wOdPt2zRWVuYrCN2Z7PLJYqiOY4wEe+6wRhOgxFrpwnBn2HhydRZuJ7HnqgJQJLA3cGfZmcEY5wG8RomZM+6aDbzFI0Vk9kjdb/Nl2yi1yEsLNdYvwgUm6yysXNKsgtFzH9AWHt6YcICOF36dYT1/ClcLNsh7DDj3iOUHU2vSlnPIOz/KOS/iuBXxMnJLtNFhGgICf9+LFqidIViPnS8xEKGDPgv8yI23bsF6+etEkFk0dvHhHBhwn4Sez2C4j3+VsgvKeRP8JIP7ifoFdYOJb9IsxksMdkCwq8GnvWSQkYJTiL1lF1/hqqFzc29NZgFBkwuqVtMkhCvcAyih7z1rsxOeZ8fMd4oJvoLJsleCVmQ8g8VWUx+k3/oOLuweeGU+PAY+ytRDcgKJsfkTz37ERgBQdmSOG8dowYnwZitGbwhlK+Ae/vBfxAN8k4s2nffnwfjFkA5KkPSmZqQdKwaSI4uhUhTh/w3gQQkHBj4VaAQAj8UrL0ZRvBV2AiAseaeIlqLngAyKAosF4C3wRHy9xlmmJKOOkIKWymy/4O/fzPWt3+q4cTN/a50t1d3LSLMaYiwA7x0HYQ5BmdQJltzgIm7ZwJpGiX0XGhVNORs9vg4m10EcrbXNeRmBmF/S0ksfxNgn4CECuwIPQfY/1+FxmF4XPwlNFsVcFUlJV5EzKYmAmIeWHNqxwu6ySj6zUFu3BByky4hNzY4Q9s+j2scZMi7FZbLk3zUZ/U2srrGmnZFiySFhEFlzZVQmMafUbzi0IprcCaBxGzTIPXrDUWOzv9DX5uUcknKwzAszzO1R6H4IzVqdTAcQ/1NqTOZHQvRq5xgZgSz6fiQUq2ev1kdOmmqQ89VourVJ8wkcs7pdoXVvSoLy4mptypcnE3jdMaeHBYVyX4DaSYsTD6TXPobATHdsLXvtQQZO51LENPTOJXSN8lOmU5MDPSP1l5NfODKtG4Q0aIh2PDLZQR3tzxIKIRMoC7v/QhQimRwBZBEN8dcuJboVyY23KGkIM/Y3Z9a1LHrXJrxGy3MijYr83lAKlb/kGi03xCNYrqPiBOsbVFpN/XonWk9vxQYwq5jiKzxTKFajQWOZxVOkz/VEKrGQCO3PNDEbiadGbuAfaHbhnegzbXAZGQdnZc7PdqfXcgZtKoqbNOsXH18mgsDCDAvIaq1IH4EC/akBBkQoVXMMcY4kJsed1z6IObK36oHaxQE0sCAFH0BCzEeZ2ZFRzb4+G4wvj8mvlQyQL08yIKYRcY/IjwOh0OtG2tmTUrkd1vhWrs86FD
f6xHwcinCju0bAbbpSjLV85LytMTdB7TxQ+h/Zs/nzFPPH5h6Ksy6PlLiG+zAlii5HCdw65JKQ9M2YuJKbhdFrWpqQf8Nq6GBGnfFtq+pAB0zQI5kOI4Kr4AprRWPxQ085RDccLsNQR5/0fFnx5KR8HyRC2upeWCBSQDJ0Jc0vRWfJN8Nh8X27yWv5FwjeGUNEj+UPsnPYA0Dw2liztXFPq4/Jj0jEWNcEsx9udAXsoPzufK1SCoi+CipFW8XdpDqbYMWdbqWXW/r9E7D9HqahvX8XAnXS1O5GLKvEVMiofNMmfuLcPy8BFGEmGTpgSXtVPR6D+L6JuERD1G8KASh9SaCa/bzGBOI4AeBLykiQs4EIAE1+TUJNBDjRGbntBs0F6SiVLybxynyPB7vYI4XM+khn+k55S0SwTyr1mFK7dQgx+Wx3zSY6XpHuda68Txvcp4NdHkjZzT6d/FPWvxZl4J/ZAaotiGMC2hn1qKfRlr8+JgLJGHnzMUkk5Rw/V2VGh9FvmnIpfhG6H/5gWNi4qvzZEbHPRkG25kzKvsc7XgybmlQ172WH+OYHirT7JltIK0+ccmMuh6q++5zio5ZslRvZx0KUj6obrsbZlu/9pDG3UcnXdOkysRUMQsk8zjcC7LW0iEp5YWMtI9OEzH0goJHI3NADSI1HWnCtwKal9eEs6aa0OlSEzpmkOYUaCr5x8D5TMC2hwDtPlXjmqmaHKA8VUtRvIoMQOahCwXAg75PXyB5IPNQm8PI0ha1OYxTwd0mjKcNYeyca9DWGz0TU9HcWIV/TnnApimI6rBj7+iwBxIS13S0XNMKC+BQFEnkpbB0HzMU8HhtffGf0IOq4ijKZZjWffLBSmoCXeNUftjDOqXOVd3ZGqxZOsp1rEZc2RrHOS2rxh6qxe5roEa2QZRvOYDiek3Nxk5rAUemR8fL1qSf8JJCmhgCtFmBKH1nwZI3cnw8yGlVoiXqm+fmfUuDXi3p65o5wtOFZl/KvjLnoT8JlBrddGrl6HXLRoup0F5Ui15epDctBnUPFDhdR6S7h4o2T0FuT2oIq8DtQRSgnVCo7JPsbnpK6O7joXrgf+3KQr7tGnblSaWK2bZ37ZdUt5xX+nvO0f6XKVUcmyH23OilGxxT2MTqbbXm6F0bz1Y9K+bG89Stljcqu+jcUt/SoFfLNGqQvvN6pInXN3N6bHotOdQ+psKb/XT/w7EqVRrwAlUi+lbo+14rVMeTcWkd6ypU605wuNgyeg2OxrjMDo22d1sU1zAfr7oL45R9IW3aFE1zI7bl1jPStbZuTCsSx5tWWLBp3qRqdxgDtbR1o/odna85NK9hRqn6Fy5jEHmmvnqPeze8A375e967oTOARraatS14kb3IsJkBosP1I6Lynk2dYh+BRFTHZeW+QW64dp9nu7gvpsRhI8HZ7TE21qEC+RO2l5LdszrzqE8JsaFj9S1yYeuEQPFwkyyxXQVcHpEtHMphGKe6tMQIBavdKBravStn7zy5rSDYCKyd1jRnE313u+4NjHYfxrWtmr1wXBkOPAXPjzoiJHa+oThg5kxyoCxFlZeLWvN5hsVC7bg8korvVumLOuxBaZZtNT9l40Ak6Uz7zLC6Mya8UomKrTfPX5cljvqnFZe56rNeQWA3ZotuxbWZb/sqDgk0pHH34TvbMgM/zNvjP7PO5smOLqsX1lXprGS3TKhvQ6g2GKqTA/WOwhDQzJDizo0xijDR9EYiYXnRrRhI+EWoUDojvlR9X0xaO0zmNse+6IVWgdw0Z6Ww1B2UD8Xtv5qj0QxQd18YaNecjvbGGY8W/6KBnMtL1cr5kPx0yLbOhzQW5pK7uXgCIzs1XNoo+dnr7sPf \ No newline at end of file diff --git a/site/static/images/developer_docs_plugin_grpc.png 
b/site/static/images/developer_docs_plugin_grpc.png new file mode 100755 index 00000000..3df47e63 Binary files /dev/null and b/site/static/images/developer_docs_plugin_grpc.png differ diff --git a/site/templates/shortcodes/image.html b/site/templates/shortcodes/image.html new file mode 100644 index 00000000..e40f38f2 --- /dev/null +++ b/site/templates/shortcodes/image.html @@ -0,0 +1,2 @@ +{% set image_url = get_url(path=path) %} +