What if everything was a process? #59

bkolobara · 2021-09-02T15:17:35Z

bkolobara
Sep 2, 2021
Maintainer

Lunatic, similar to Erlang and Elixir, encourages application designs around processes as the primitive building block. Processes in lunatic are extremely lightweight and spawning a process could be somewhat compared to creating a new object in object oriented languages. Even Alan Kay, the inventor of the OOP term, envisioned a more process like design when discussing object oriented patterns:

I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning -- it took a while to see how to do messaging in a programming language efficiently enough to be useful). ~ Alan Kay

In the rest of this post I'm going to explore some of the new features in lunatic and the relationship between message passing and function/method invocations.

1. Permissions through processes

Lunatic has a much stronger sandboxing model around individual processes than Erlang. It allows you to spawn processes and limit their CPU & memory usage and access to certain host functions.

Host functions in lunatic are similar to syscalls in lower level languages. They allow you to open files, read from the network or even spawn new lunatic processes.

This characteristic is really important in a world where we depend on so many 3rd party dependencies that it becomes impossible to audit them all, but if we can sandbox them inside processes without any permissions, we can limit their impact. A pattern where you spawn a process with specific permissions just to call a library function inside of it could be a viable option in an environment where processes are cheap. Or even a more common use case, answering each web requests in a separate sandboxed process.

There are efforts in other runtimes to solve similar problems, like deno's permissions. However in deno's case the permissions affect all code that is running and is more intended for endusers running scripts than for developers giving specific permissions to dependencies. I feel like inverting the power here and giving the power to developers to set permissions on parts of their code base is a much more powerful concept. Somehow endusers always end up "clicking OK" until the script they are trying to run works.

Forbidding a process to access the network or the filesystem can be useful, but sometimes it's not enough. I would like to show here how we can use processes to augment host functions.

Recently two new exciting features landed in lunatic:

The Registry allows you to register processes under a name and version.
Tags allow for selective receiving of messages.

With them we can create new interesting designs where many host functions are replaced or augmented by processes and messages.

One interesting use case would be changing host functions behaviour by wrapping them into processes. Let's look at an example of creating a wrapper process that changes the behaviour of the DNS resolve host call by limiting the lookup to only "*.example.com" subdomains:

pub fn resolve(mailbox: Mailbox<Request<String, SocketAddr>>) {
    loop {
        match mailbox.receive() {
            Ok(request) => {
                let lookup_name = request.data();
                // Only allow *.example.com lookups.
                if lookup_name.ends_with(".example.com") {
                    let ip = lunatic::resolve(lookup_name).unwrap().next().unwrap();
                    request.reply(ip)
                }
            }
            Err(_) => (), // ignore invalid requests
        }
    }
}

This process runs forever in a loop and answers requests, imitating a lunatic::resolve function, but only if the request is for "*.example.com" subdomains. It actually internally uses lunatic::resolve and adds additional behaviour to it.

To use this resolve process we would create a new Envrionment and register the process under the name "resolve" and version "1.0":

// Spawn a new resolve process.
let resolve = process::spawn(resolve);
// Create a new environment without access to the "lunatic::net::resolve" function.
let mut config = Config::new(1_200_000, Some(1));
config.allow_namespace("lunatic::process");
config.allow_namespace("lunatic::message");
let mut env = Environment::new(config).unwrap();
// Register the resolve process inside the environment under a well known name.
env.register("resolve", "1.0", resolve);
// Add the currently running module to the environment.
let module = env.add_this_module().unwrap();
// Spawn a child process inside of the limited environment.
module.spawn(child);

Now the child process spawned inside of env can resolve domain names by first looking up the resolve process and sending requests to it:

pub fn child(_: Mailbox<()>) {
    let resolve = lunatic::lookup("resolve", "^1.0").unwrap();
    let _ip = resolve.request("test.example.com").unwrap();
}

Notice that only two host function namespaces are allowed inside the environment lunatic::process & lunatic::message, but not the lunatic::net::* one containing the original lunatic::resolve host function. That way the child process can't circumvent our limitation and directly use the resolve host function. That's a great way to implement some higher level logic around permissions inside of child processes. You basically block the original host function and re-expose it through a process that has a well known name and version.

The resolve process could be considered an object and can have some internal state. It could for example keep track of lookups and rate limit them depending on the caller. Being able to change the behaviour of host functions programatically by wrapping them into processes become a superpower with unlimited possibilities.

The code is quite verbose and uses some of the lower level APIs for educational purposes, but these features can always be wrapped into higher level types, abstracting away details from developers. For simplicity reasons I didn't include any error handling in the examples, but you will probably want to send a reply back in case of an error, instead of just not sending anything and have the client wait forever on the response. Also, I didn't try to compile the code.

2. Host functions as processes

The previous example demonstrates well how messaging is not that different from function calls, a request is made (with arguments) and a response is received (return value). We could take this a step further and replace most of our host functions with native processes. Using the versioning ability of the registry we could even provide backwards compatibility in the runtime by shipping for example a [email protected] that has a different signature/behaviour, but not removing [email protected] from the VM. All this "processes" could be implemented as native processes using the Process trait in the runtime, but instead of using a message queue they would be function calls without much of additional overhead.

One downside of this approach would be that we lose the ability to get instantiation time errors when loading the module, on host function type mismatches. Only during the runtime can we tell if the request was a correct message or not. This burden could always be moved onto the library developers that wrap the message sending into correctly typed functions. In this case we have truly come full circle and have again "host functions", but this time on the library and not VM level.

3. Resources as processes

We can even take this ideas further and treat resources as processes.

A good example would be UDP connections. A processes subscribes itself to the UDP process and gets all datagrams from the UPD connection or can send data to it. Another would be a stdio process, sending messages to it writes to the standard output and if there is new input the process forwards it to us.

One more complicated example would be a GPU process representing a GPU resource. You could send messages to it containing data that is copied from the local memory into the GPU memory or commands that instruct the GPU what operations to perform on that data. This way of using GPUs maps well onto how modern GPUs actually work. Sending a big data message is usually considered bad practice in Erlang (and lunatic), because the process you are sending it to could live on a different machine and sending a lot of data over the network could slow everything down. Modern GPUs have separate memories and sending a lot of data to them requires copying them to a different memory and should be avoided when possible. Usually you want to transfer the data only once and later just perform operations on it. This is a good demonstration of how also the system limitations are communicated well by using a process abstraction.

TCP streams

Even TCP streams may seem to be in a similar basket as UDP when it comes to modelling them with processes, they are quite different. Usually you will end up having a message based protocol built on top of the TCP streams, but the stream is not a message based protocol itself. To construct theses higher level messages you want the freedom to pull data out of the stream and put it into a message. You could take the same approach as UDP, but it just feels more natural to me to pull exact chunks of data out, instead of someone pushing arbitrary sized chunks into your message queue when they arrive.

With the UDP process there was a concept of subscribing to it and receiving new UDP datagrams, but with TCP I would use a different approach more similar to a function call. The main problem with TCP streams is that the decision of what represents a message/request/replay is defined by some higher level agreement and can't be decided without reading parts of the message first. In many cases this would be reading a few bytes that contain the size of the message and then the whole message once we have the size. This could be modelled by sending a request to the TCP stream process with the data size we want to read and waiting on a message containing the data.

This approach is almost identical to doing a read() function call and doesn't have any other benefits. In this case I would actually prefer just to use the host function directly and model the (message based) higher level protocols around processes and messages.

Conclusion

I was never a big fan of Java's "everything is an object" or Unix's "everything is a file" approach. Some concepts just don't map well onto files or objects and the API around them feels somewhat awkward or it negatively impacts performance. And probably modelling your application architecture with an "everything is a process" approach would have a similar result.

However, I hope I managed to get you a bit excited about using processes to model different parts of your application, and to show how versatile processes actually are. Like an object, a process can represent many different things too.

I would love to hear from other developers some ideas that could push processes/actors into new and interesting use-cases. Please leave a comment if you have any thoughts!

gaolaowai · 2021-10-05T14:56:38Z

gaolaowai
Oct 5, 2021

I've been considering files as a process, where file objects live for limited duration (from minutes to days) and need to be accessible to multiple other hosts within a network. I know REDIS has the ability to do something similar, but being written in C/C++, it has newly reported vulnerabilities discovered every other day...

As a use-case, this appeals to me. We get the in-memory nature of processes that store binary data until they expire, isolation between those processes, with memory safety guarantees of Rust (plus WASM sandboxing).

1 reply

teymour-aldridge Oct 29, 2021

That idea sounds awesome (and a nice way to handle files in a distributed system declaratively)!

(Assuming there isn't some hidden caveat of course)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What if everything was a process? #59

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

What if everything was a process? #59

bkolobara Sep 2, 2021 Maintainer

1. Permissions through processes

2. Host functions as processes

3. Resources as processes

TCP streams

Conclusion

Replies: 1 comment · 1 reply

gaolaowai Oct 5, 2021

teymour-aldridge Oct 29, 2021

bkolobara
Sep 2, 2021
Maintainer

Replies: 1 comment 1 reply

gaolaowai
Oct 5, 2021