Building CosmicOS Messages

Paul Fitzpatrick edited this page Aug 1, 2016 · 3 revisions

Hello, good evening, and welcome to the build system from beyond the moon. This document assumes you made it through all the hurdles of building the CosmicOS repo (see README) and now want to understand how it works in order to change it.

Message assembly

The message is assembled from parts. The parts are listed in src/README.cmake, which looks like this:

set(COSMIC_DEPENDS
  COS_License
  COS_Intro
  COS_Compare
  COS_NotLogic
  COS_Unary
  ...)

The message is constructed by going through each of these in order and finding a matching file in the src directory. For example, take COS_Compare. In the src directory, we look for any file called COS_Compare.*. We find (at the time of writing) COS_Compare.js, a .js file. Different parts of the message are assembled using different languages or notations. To see what will happen to a .js file, we look in transform/cmake/convert_js.cmake:

...
execute_process(COMMAND node ${base}/${in} OUTPUT_FILE ${out} RESULT_VARIABLE result)
...

So basically we just run the file using node and store the output, which gets appended directly to the message. The same is true for *.pl files. *.java files get compiled by javac and then have their byte-code inserted into the message (there's a minimal interpreter described within the message). *.scm files are copied directly into the message without any processing. This was me being lazy: using the .scm extension cued my editor to do basically the right syntax highlighting and parenthesis matching. *.gate files are circuit descriptions that get compiled into a few representations.
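As a rough illustration of the lookup and dispatch just described, here is a minimal Python sketch. It is not the real build logic (that lives in the CMake scripts); the names HANDLERS, find_part and plan_assembly are made up for this example, and the handler descriptions simply restate the text above.

```python
from pathlib import Path

# Hypothetical table: file extension -> what the text says happens to it.
HANDLERS = {
    ".js": "run with node, append output",
    ".pl": "run the script, append output",
    ".java": "compile with javac, insert byte-code",
    ".scm": "copy verbatim",
    ".gate": "compile circuit description",
}


def find_part(src_dir, part):
    """Return the first file named <part>.<ext> with a known extension, or None."""
    for path in sorted(Path(src_dir).glob(part + ".*")):
        if path.suffix in HANDLERS:
            return path
    return None


def plan_assembly(src_dir, parts):
    """Return (part, file name, handling) tuples in message order."""
    plan = []
    for part in parts:
        path = find_part(src_dir, part)
        if path is None:
            raise FileNotFoundError(f"no source file found for {part}")
        plan.append((part, path.name, HANDLERS[path.suffix]))
    return plan
```

Calling plan_assembly("src", ["COS_Intro", "COS_Compare"]) would list, in order, which file backs each part and how it would be processed.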

The output of assembly is a file, build/transform/assem.txt, that looks like this (at the time of writing):

== COS_License ==
#   Author: 
...

== COS_Intro ==
# MATH introduce numbers (in unary notation)
# Here we count up from zero, go through some primes, etc. There is some
# syntax around the numbers, but that doesn't need to be understood at
# this point. We give numbers in a tweaked unary format, rather than the
# encoding used in the main body of the message.
intro is:int;
is:int | unary 0;
is:int | unary 1 0;
is:int | unary 1 1 0;
is:int | unary 1 1 1 0;
is:int | unary 1 1 1 1 0;
is:int | unary 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0;
is:int | unary 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0;
intro is:square;
is:square | unary 0;
is:square | unary 1 0;
is:square | unary 1 1 1 1 0;
is:square | unary 1 1 1 1 1 1 1 1 1 0;
...
  • Lines of the form == section name == mark the beginning of sections. The notion of sections is meta-information not intended for inclusion in the message itself.
  • Lines that begin with # are comments, for use during development, and not intended for inclusion in the message itself.
  • Lines ending with a ; are textual versions of material intended for inclusion in the message. The actual way the line is encoded in the message is handled via a later compilation process.
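The three line types above can be told apart with a few string checks. The following Python sketch shows one way to do it; classify_line and the "other" fallback are assumptions made for illustration, not part of the CosmicOS tooling.

```python
import re

# Section markers look like "== COS_Intro ==".
SECTION_RE = re.compile(r"^== (.+) ==$")


def classify_line(line):
    """Return (kind, payload) for one line of assem.txt."""
    text = line.rstrip()
    match = SECTION_RE.match(text)
    if match:
        return ("section", match.group(1))
    if text.startswith("#"):
        return ("comment", text)
    if text.endswith(";"):
        return ("code", text)
    # Anything else (blank lines, continuation lines) is left undecided here.
    return ("other", text)
```

For example, classify_line("== COS_Intro ==") gives ("section", "COS_Intro"), and classify_line("is:int | unary 1 0;") gives ("code", "is:int | unary 1 0;").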

Wrinkles of the assembled message

  • Lines of the form (....); should now have their outer parentheses removed, becoming ....; - this was part of a reform to strip away any unnecessary nesting, but not all sections have been rewritten to reflect this. This should eventually become consistent.
  • Lines of the form demo | ....; will currently be replaced with = <val> | ....; in the actual message, where <val> is the result of evaluating the statement after demo. This feature was handy in a section or two, but isn't by any means essential, so maybe this wrinkle should get removed.
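Both wrinkles amount to small text rewrites. The Python sketch below shows what they could look like. The function names are hypothetical, and evaluate is a stand-in supplied by the caller: the real evaluation is done by the CosmicOS interpreter, and the exact format of <val> is not specified here.

```python
def strip_outer_parens(line):
    """Turn "(....);" into "....;" when the first "(" matches the final ")"."""
    s = line.strip()
    if not (s.startswith("(") and s.endswith(");")):
        return s
    body = s[:-1]  # drop the trailing ";"
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(body) - 1:
                return s  # the first "(" closes early, e.g. "(a) (b);"
    if depth != 0:
        return s  # unbalanced, leave untouched
    return body[1:-1].strip() + ";"


def rewrite_demo(line, evaluate):
    """Turn "demo | ....;" into "= <val> | ....;" using a caller-supplied evaluator."""
    prefix = "demo | "
    s = line.strip()
    if not (s.startswith(prefix) and s.endswith(";")):
        return s
    expr = s[len(prefix):-1]
    return f"= {evaluate(expr)} | {expr};"
```

The depth check matters because a line like (a) (b); starts with ( and ends with ); without being wrapped in a single outer pair.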

Encoding the message

The format of assem.txt evolved over time and is a bit awkward to work with. To get to a firmer starting point for processing the message, assem.txt is parsed into assem.json, which looks like this:

[
...
  {
    "role": "comment",
    "lines": [
      "# note the appearance of new in the next line --",
      "# this is the only difference to previous version"
    ]
  },
  {
    "role": "code",
    "lines": [
      "define point1 (point new 1 11);"
    ]
  },
  {
    "role": "code",
    "lines": [
      "define point2 (point new 2 22);"
    ]
  },
...
]
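To make the shape of that list concrete, here is a hedged Python sketch that groups lines of assem.txt into elements with a role and lines. It is a guess at the grouping rules based only on the example above: consecutive comment lines share one element, and each statement ending in ; gets its own code element. The contents of a file element are an assumption, and gate elements are not handled at all.

```python
def group_elements(lines):
    """Group assem.txt lines into a list of {"role", "lines"} dicts."""
    elements = []
    comment = None  # comment element currently being extended
    code = None     # code element still waiting for its closing ";"
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if code is None and line.startswith("== ") and line.endswith(" =="):
            comment = None
            # Assumption: a section marker becomes a "file" element.
            elements.append({"role": "file", "lines": [line]})
        elif code is None and line.startswith("#"):
            if comment is None:
                comment = {"role": "comment", "lines": []}
                elements.append(comment)
            comment["lines"].append(line)
        else:
            comment = None
            if code is None:
                code = {"role": "code", "lines": []}
                elements.append(code)
            code["lines"].append(line)
            if line.endswith(";"):
                code = None  # statement complete
    return elements
```

Passing the result to json.dumps(elements, indent=2) would produce output in the same layout as the excerpt above.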

We now have a list of elements, where each element has a clear role (currently comment, code, file, and gate, where file marks a section). Once we have assem.json, we start enriching the code elements, saving the results in the creatively-named build/transform/assem2.json. A code element might expand to, for example:

  {
    "role": "code",
    "lines": [
      "is:int | unary 1 1 1 1 1 0;"
    ],
    "code": "210110111302321111111132132132132132132032233",
    "parse": [
      "is:int",
      [
        -1,
        "unary",
        1,
        1,
        1,
        1,
        1,
        0
      ]
    ],
    "stanza": 10
  },

The terribly named code subfield holds this element's text in the default encoding of the message.
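The parse subfield in the example can be reproduced with a small recursive parser. The Python sketch below is inferred from that single example, so treat its rules as assumptions: parentheses open a nested list, | opens a nested list that runs to the end of the enclosing expression and is tagged with a leading -1, and whole numbers become integers. The function names are hypothetical, and the sketch does not attempt to produce the code subfield.

```python
import re

INT_RE = re.compile(r"-?\d+")


def tokenize(statement):
    """Split a statement into tokens, dropping the trailing ";"."""
    s = statement.strip()
    if s.endswith(";"):
        s = s[:-1]
    for sym in "()|":
        s = s.replace(sym, f" {sym} ")
    return s.split()


def parse_tokens(tokens, pos=0, marker=None):
    """Parse from pos up to ")" or the end; return (nested list, next position)."""
    out = [] if marker is None else [marker]
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == ")":
            return out, pos  # the caller that saw "(" consumes this
        if tok == "(":
            inner, pos = parse_tokens(tokens, pos + 1)
            pos += 1  # skip the matching ")"
            out.append(inner)
        elif tok == "|":
            # Assumption: "|" nests everything up to the end of this expression.
            inner, pos = parse_tokens(tokens, pos + 1, marker=-1)
            out.append(inner)
        else:
            out.append(int(tok) if INT_RE.fullmatch(tok) else tok)
            pos += 1
    return out, pos


def parse_statement(statement):
    """Return the nested-list form of one statement."""
    return parse_tokens(tokenize(statement))[0]
```

With these assumptions, parse_statement("is:int | unary 1 1 1 1 1 0;") yields ["is:int", [-1, "unary", 1, 1, 1, 1, 1, 0]], matching the parse field shown above.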