A lot of code in prism's repository is templated from a single configuration file, config.yml. This file is used to generate the following files:
- ext/prism/api_node.c - for defining how to build Ruby objects for the nodes out of C structs
- include/prism/ast.h - for defining the C structs that represent the nodes
- javascript/src/deserialize.js - for defining how to deserialize the nodes in JavaScript
- javascript/src/nodes.js - for defining the nodes in JavaScript
- java/org/prism/AbstractNodeVisitor.java - for defining the visitor interface for the nodes in Java
- java/org/prism/Loader.java - for defining how to deserialize the nodes in Java
- java/org/prism/Nodes.java - for defining the nodes in Java
- lib/prism/compiler.rb - for defining the compiler for the nodes in Ruby
- lib/prism/dispatcher.rb - for defining the dispatch visitors for the nodes in Ruby
- lib/prism/dot_visitor.rb - for defining the dot visitor for the nodes in Ruby
- lib/prism/dsl.rb - for defining the DSL for the nodes in Ruby
- lib/prism/mutation_compiler.rb - for defining the mutation compiler for the nodes in Ruby
- lib/prism/node.rb - for defining the nodes in Ruby
- lib/prism/serialize.rb - for defining how to deserialize the nodes in Ruby
- lib/prism/visitor.rb - for defining the visitor interface for the nodes in Ruby
- src/node.c - for defining how to free the nodes in C and calculate the size in memory in C
- src/prettyprint.c - for defining how to prettyprint the nodes in C
- src/serialize.c - for defining how to serialize the nodes in C
- src/token_type.c - for defining the names of the token types
Whenever the structure of the nodes changes, you can run rake templates to regenerate these files. Alternatively, tasks like rake test should pick up on these changes automatically. Every file that is templated will include a comment at the top indicating that it was generated and that changes should be made to the template and not the generated file.
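To make the templating idea concrete, here is a minimal sketch using Ruby's standard ERB and YAML libraries. It is not the repository's actual template code: the trimmed-down configuration, the template text, and the wording of the generated-file comment are all assumptions made for illustration.

```ruby
require "erb"
require "yaml"

# Hypothetical, trimmed-down stand-in for config.yml.
config = YAML.safe_load(<<~YAML)
  nodes:
    - name: AliasNode
    - name: AndNode
YAML

# Hypothetical template. The first line plays the role of the "this file was
# generated" comment that every templated file carries at the top.
template = ERB.new(<<~TEMPLATE, trim_mode: "-")
  /* This file is generated from config.yml. Edit the template, not this file. */
  <%- nodes.each do |node| -%>
  /* node: <%= node["name"] %> */
  <%- end -%>
TEMPLATE

# Render the template with the node list taken from the configuration.
generated = template.result_with_hash(nodes: config["nodes"])
puts generated
```

Because the output is derived entirely from the configuration, editing the generated text by hand would be lost on the next regeneration, which is why changes belong in the template.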
config.yml has a couple of top-level fields, which we'll describe below.
The tokens field is a list of tokens to be used by the lexer. It is defined here so that it can be templated out into both an enum and a debugging function that returns the name of a token.
Each token is expected to have a name key and a comment key (both strings). Optionally, a token can have a value key (an integer), which is used as its value in the enum.
In C these tokens will be templated out with the prefix PM_TOKEN_. For example, if you have a name key with the value PERCENT, you can access this in C through PM_TOKEN_PERCENT.
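The naming rule for tokens can be sketched in a few lines of Ruby. The two token entries, the token_enum_name helper, and the token_type_sketch_t enum name are hypothetical and exist only for this example; only the PM_TOKEN_ prefix and the name, comment, and value keys come from the description above.

```ruby
require "yaml"

# Hypothetical token entries in the shape described above.
config = YAML.safe_load(<<~YAML)
  tokens:
    - name: EOF
      value: 1
      comment: final token in the file
    - name: PERCENT
      comment: "%"
YAML

# Build the C identifier for a token by applying the PM_TOKEN_ prefix.
def token_enum_name(token)
  "PM_TOKEN_#{token.fetch("name")}"
end

# Emit one enum entry per token, using the optional value key when present.
enum_lines = config["tokens"].map do |token|
  name = token_enum_name(token)
  entry = token.key?("value") ? "#{name} = #{token["value"]}" : name
  "    /** #{token["comment"]} */\n    #{entry},"
end

# token_type_sketch_t is a made-up enum name for illustration only.
enum_source = "typedef enum {\n#{enum_lines.join("\n")}\n} token_type_sketch_t;"
puts enum_source
```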
Sometimes we need to communicate more information in the tree than can be represented by the types of the nodes themselves. For example, we need to represent the flags passed to a regular expression or the type of call that a call node is performing. In these circumstances, it's helpful to reference a bitset of flags. The flags field is a list of flags that can be used in the nodes.
Each flag is expected to have a name key (a string) and a values key (an array). Each value in the values key should be an object that contains both a name key (a string) that represents the name of the flag and a comment key (a string) that represents the comment for the flag.
In C these flags will get templated out with a PM_ prefix, then an upper snake-case version of the flag name, then the name of the value itself. For example, if you have a flag with the name RegularExpressionFlags and a value with the name IGNORE_CASE, you can access this in C through PM_REGULAR_EXPRESSION_FLAGS_IGNORE_CASE.
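The flag naming rule can be illustrated with a short Ruby sketch. The snake_upper and flag_constant helpers are hypothetical names, and the regular expression used for the case conversion is an assumption about how such a conversion might be done, not the repository's own code. The comment strings in the sample entry are also made up.

```ruby
require "yaml"

# Hypothetical flags entry in the shape described above.
config = YAML.safe_load(<<~YAML)
  flags:
    - name: RegularExpressionFlags
      values:
        - name: IGNORE_CASE
          comment: "i - ignores the case of characters when matching"
        - name: EXTENDED
          comment: "x - ignores whitespace and allows comments"
YAML

# Convert a CamelCase name such as RegularExpressionFlags to upper snake case.
def snake_upper(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').upcase
end

# Combine the PM_ prefix, the converted flag name, and the value name.
def flag_constant(flag_name, value_name)
  "PM_#{snake_upper(flag_name)}_#{value_name}"
end

constants = config["flags"].flat_map do |flag|
  flag["values"].map { |value| flag_constant(flag["name"], value["name"]) }
end

puts constants
```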
Every node in the tree is defined in config.yml, under the nodes field. Each node is expected to have a name key (a string) and a comment key (a string). By convention, the comment key uses the multi-line syntax of : | because the newlines will get templated into the comments of various files.
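The effect of the multi-line syntax can be checked with Ruby's standard YAML library. The node entry and its comment text below are invented for illustration, and the way the comment is wrapped into a C-style block comment is only one assumed possibility for how a template might use it.

```ruby
require "yaml"

# Hypothetical node entry using the block scalar syntax for its comment.
node = YAML.safe_load(<<~YAML)
  name: AndNode
  comment: |
    Represents the use of the && operator.

        left && right
YAML

# The block scalar keeps its newlines and the extra indentation of the
# example line, so the comment can be copied line by line into a template.
comment_lines = node["comment"].each_line.map { |line| " * #{line}".rstrip }
c_comment = ["/**", *comment_lines, " */"].join("\n")
puts c_comment
```

With a plain single-line string the blank line and the indented example would be folded away, which is why the block form is the convention.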
Optionally, every node can define a child_nodes key that is an array. This array represents each part of the node that isn't communicated through the type and location of the node itself. Within the child_nodes key, each entry should be an object with a name key (a string) and a type key (a string). The name key represents the name of the child node and the type is used to determine how it should be represented in each language.
The available values for type are:
- node - A child node that is a node itself. This is a pm_node_t * in C.
- node? - A child node that is optionally present. This is also a pm_node_t * in C, but can be NULL.
- node[] - A child node that is an array of nodes. This is a pm_node_list_t in C.
- string - A child node that is a string. For example, this is used as the name of the method in a call node, since it cannot directly reference the source string (as in @- or foo=). This is a pm_string_t in C.
- constant - A variable-length integer that represents an index in the constant pool. This is a pm_constant_id_t in C.
- constant[] - A child node that is an array of constants. This is a pm_constant_id_list_t in C.
- location - A child node that is a location. This is a pm_location_t in C.
- location? - A child node that is a location that is optionally present. This is a pm_location_t in C, but if the value is not present then the start and end fields will be NULL.
- uint32 - A child node that is a 32-bit unsigned integer. This is a uint32_t in C.
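The list above amounts to a lookup table from the type key to a C type. The following Ruby sketch encodes that table and formats a struct member for each child. The C_TYPES constant, the c_member helper, and the sample child_nodes entries are hypothetical; only the type names and C types are taken from the list.

```ruby
# Mapping from the type key in config.yml to the C type listed above.
C_TYPES = {
  "node" => "pm_node_t *",
  "node?" => "pm_node_t *",
  "node[]" => "pm_node_list_t",
  "string" => "pm_string_t",
  "constant" => "pm_constant_id_t",
  "constant[]" => "pm_constant_id_list_t",
  "location" => "pm_location_t",
  "location?" => "pm_location_t",
  "uint32" => "uint32_t"
}.freeze

# Format one child as a C struct member. Raises KeyError on an unknown type.
def c_member(child)
  type = C_TYPES.fetch(child.fetch("type"))
  separator = type.end_with?("*") ? "" : " "
  "#{type}#{separator}#{child.fetch("name")};"
end

# Hypothetical child_nodes entries for a call-like node.
child_nodes = [
  { "name" => "receiver", "type" => "node?" },
  { "name" => "name", "type" => "string" },
  { "name" => "message_loc", "type" => "location?" }
]

puts child_nodes.map { |child| c_member(child) }
```

Note that the optional variants map to the same C type as their required counterparts. The difference is only in what values are allowed at runtime (NULL for node?, NULL start and end fields for location?).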
If the type is node or node? then the entry also accepts an optional kind key (a string). This key is expected to match the name of another node type within config.yml. This changes a couple of places where code is templated out to use the more specific struct name instead of the generic pm_node_t. For example, with kind: StatementsNode the pm_node_t * in C becomes a pm_statements_node_t *.
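As a rough sketch of how kind narrows the C type, the Ruby below derives the struct name from the node name. The struct_type_for and c_type helpers are hypothetical, and the case conversion is an assumption that happens to reproduce the StatementsNode example above.

```ruby
# Derive the specific struct pointer type from a node name such as
# StatementsNode (assumed rule: pm_ prefix, snake case, _t suffix).
def struct_type_for(kind)
  snake = kind.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "pm_#{snake}_t *"
end

# Use the specific struct when kind is given, else the generic node pointer.
def c_type(child)
  child["kind"] ? struct_type_for(child["kind"]) : "pm_node_t *"
end

# Hypothetical child entries, one with a kind and one without.
with_kind = { "name" => "statements", "type" => "node?", "kind" => "StatementsNode" }
without_kind = { "name" => "predicate", "type" => "node" }

puts c_type(with_kind)
puts c_type(without_kind)
```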