Sockets and Shared Memory (10,000X speedup)

@arendsee arendsee released this 02 Jan 17:31
· 7 commits to master since this release

This release includes a full rewrite of the morloc backend. In prior releases,
every cross-language call would require writing all arguments to files on the
disk and calling a morloc-generated executable for the given language with the
arguments as temporary files. For interpreted languages like R and Python,
starting these executables meant initializing the interpreter, at a cost of
~300ms for R and ~50ms for Python. Cross-language calls were therefore very
expensive.
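
The cold-start cost described above can be checked on any machine with a rough timing loop. This is only a sketch: the helper name cold_start_seconds is made up for illustration, it times a bare interpreter launch rather than a real morloc-generated executable, and the numbers will vary by machine and installed packages.

```python
import subprocess
import sys
import time


def cold_start_seconds(cmd, repeats=5):
    """Return the fastest wall-clock time (in seconds) to run cmd to completion."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best


# Launch the current Python interpreter, do nothing, and exit.
python_startup = cold_start_seconds([sys.executable, "-c", "pass"])
print(f"Python cold start: {python_startup * 1000:.1f} ms")
```

Every cross-language call in the old backend paid at least this startup cost, before any file I/O for the arguments.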

This release replaces file-based communication with shared memory and cold calls
to executables with UNIX domain socket messages between daemons. Compiling a
morloc module creates a nexus executable that serves as the command-line
interface to the exported functions. The nexus accepts function arguments as raw
JSON, JSON files or MessagePack binary files. Calling a specific function will
first initialize a daemon for every language the morloc function uses. Each
daemon listens over a UNIX domain socket for commands (either from the nexus or
another language daemon). When the nexus or a language daemon makes a cross-language call,
arguments are converted to a generic binary form in a shared memory pool. The
relative pointers to these arguments are sent to the downstream daemon via a
message over a UNIX domain socket. The downstream daemon performs a computation,
writes the result back to the shared memory, and returns a message over the
socket telling the caller where to find the result. These messages also encode
error status, allowing error messages and possibly other metadata to propagate
between languages and ultimately back to the user.
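
The call sequence above can be sketched in a few lines of Python. This is a toy model and not morloc's actual implementation: the message layout (one status byte, an offset, and an element count), the names MSG, daemon, and call_inc, and the use of a thread in place of a separate daemon process are all assumptions made for illustration. It assumes a Unix-like system.

```python
import socket
import struct
import threading
from multiprocessing import shared_memory

# Hypothetical message layout: status byte, relative offset, element count.
MSG = struct.Struct("<BQQ")
OK = 0


def daemon(conn, shm_name):
    """Toy downstream daemon: increments the int64 values the message points to."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        _, offset, count = MSG.unpack(conn.recv(MSG.size, socket.MSG_WAITALL))
        values = struct.unpack_from(f"<{count}q", shm.buf, offset)
        # Write the result into shared memory, right after the arguments.
        out_offset = offset + 8 * count
        struct.pack_into(f"<{count}q", shm.buf, out_offset, *(v + 1 for v in values))
        # Tell the caller where to find the result.
        conn.sendall(MSG.pack(OK, out_offset, count))
    finally:
        shm.close()
        conn.close()


def call_inc(xs):
    """Toy caller: place xs in shared memory and send only a relative pointer."""
    shm = shared_memory.SharedMemory(create=True, size=max(1, 16 * len(xs)))
    caller, callee = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    worker = threading.Thread(target=daemon, args=(callee, shm.name))
    worker.start()
    try:
        struct.pack_into(f"<{len(xs)}q", shm.buf, 0, *xs)
        caller.sendall(MSG.pack(OK, 0, len(xs)))
        status, offset, count = MSG.unpack(caller.recv(MSG.size, socket.MSG_WAITALL))
        if status != OK:
            raise RuntimeError("downstream daemon reported an error")
        return list(struct.unpack_from(f"<{count}q", shm.buf, offset))
    finally:
        caller.close()
        worker.join()
        shm.close()
        shm.unlink()


print(call_inc([1, 2, 3]))
```

Only the 17-byte message crosses the socket; the argument and result data stay in the shared memory segment, which is why the per-call overhead does not grow with a file write or a process launch.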

Cross-language communication now has a constant overhead of a few microseconds
to send a message over a socket, plus the time required to convert argument
data to and from the generic binary form in shared memory. As a simple test,
the morloc function map inc xs -- where map is a C++ loop, inc is a Python
function that increments an integer, and xs is a list of integers -- runs
at under 3 microseconds per integer. This is roughly a 10,000- to 20,000-fold
improvement over the past cost of ~50ms per call to Python.
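
The speedup estimate follows directly from the two figures quoted above. The short calculation below uses 3 microseconds as the new per-call cost, which the text gives only as an upper bound, so the true ratio may be somewhat higher. The constant and function names are illustrative.

```python
OLD_COST_SECONDS = 50e-3  # ~50ms per cold call to Python (previous releases)
NEW_COST_SECONDS = 3e-6   # under 3 microseconds per integer (this release)


def speedup(old_seconds, new_seconds):
    """Ratio of the old per-call cost to the new per-call cost."""
    return old_seconds / new_seconds


ratio = speedup(OLD_COST_SECONDS, NEW_COST_SECONDS)
print(f"approximate speedup: {ratio:,.0f}x")
```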

None of these changes to the backend have any effect on the code the morloc
programmer writes. The type annotations that were added in the past are
sufficient for the morloc compiler to convert all types from all languages to
and from generic binary structures.