Skip to content

Latest commit

 

History

History
435 lines (336 loc) · 21.1 KB

File metadata and controls

435 lines (336 loc) · 21.1 KB

Fidl Compiler Frontend

This document serves as an overview of the internal workings of the FIDL compiler fidlc. fidlc is a command line tool that takes in a number of FIDL files, and outputs FIDL JSON IR.

Overview

The main inputs to the compiler are the arguments to the --files flags, which describe a list of files grouped by library. The compiler parses each file individually to obtain a parse tree for each file:

  1. Each file is loaded into a SourceFile, which owns the buffer backing the file
  2. The compiler initializes a Lexer, which takes the SourceFile as an argument to its constructor. This class exposes a Lex() method, which returns the next Token in the file; it can be called repeatedly to get the sequence of Tokens in the file.
  3. The compiler uses the Lexer to initialize a Parser and then calls the Parse() method, which constructs a parse tree. The code refers to parse trees as a raw AST. This function returns a raw::File, which is the class representing the root node of the raw AST.
  4. Once the compiler has parsed each file into a parse tree, it then groups the parse trees into a single AST (referred to in the code as a flat AST) for each library. The root of this flat AST is a flat::Library.
    • For each library, the compiler traverses every raw::File parse tree corresponding to that library, converting raw:: nodes to their flat:: counterparts. For example, a raw::StructDeclaration becomes a flat::Struct, and a raw::Constant becomes a flat::Constant. The converted flat:: AST nodes are then stored under a single flat::Library. Initially these flat AST nodes contain the same information as the raw AST nodes, but they also contain fields such as a value field for values and a typeshape field for types that are later set during the compilation step.
  5. Once the AST has been fully initialized, the compiler evaluates constants and determines the memory alignment and size information for the declared types.

Note: One of the key distinction between raw and flat ASTs is that each raw tree represents a single file, whereas each flat tree represents a single FIDL library that may contain multiple files.

Finally, we end up with a flat AST that is processed and ready for backend generation either to C bindings or to a JSON IR.

Lexing {#lexing}

The Lexer works primarily by keeping track of two char * pointer into the file data. The current_ pointer marks the current location that the class is at. Thetoken_start_ pointer marks the start of the lexeme that is currently being worked on. Each time the Lex() method is called, the current_ pointer is advanced until a complete lexeme has been traversed. Then, a Token is constructed using the data in between the two pointers.

The lexer also keeps track of a third previous_end_ pointer so it can get the data between lexemes (generally whitespace) when it constructs the Token. This example shows of how the pointers change during a call to Lex() on the short FIDL snippet const bool flag = true;:

Initial state after lexing the const keyword:

 const bool flag = true;
      ^current_
      ^token_start_
      ^previous_end_

Whitespaces are skipped until the next lexeme:

const bool flag = true;
      ^current_
      ^token_start_
     ^previous_end_

The current_ pointer is updated until the end of the lexeme:

const bool flag = true;
          ^current_
      ^token_start_
     ^previous_end_

At this point, the next Token that gets returned is ready to be constructed. A Token is created with its previous_end_ argument set to the data between previous_end_ and token_start_. The location_ is set to the data between token_start_ and current_. The kind is set to Identifier. Before returning, the pointers are reset and end up in a state similar to the initial state. This process can then be repeated for the next token:

 const bool flag = true;
           ^current_
           ^token_start_
           ^previous_end_

Internally the two pointers are manipulated with these main methods:

  • Skip(). Skips over any unnecessary characters (e.g. whitespace) by moving both the current_ and the token_start_ pointers forward.
  • Consume(). Returns the current character and advances current_.
  • Reset(). Returns the data between token_start_ and current_. Then, sets token_start_ to the value of current_.

Parsing {#parsing}

The Parser's goal is to convert a Token stream into a parse tree (the raw:: AST) with the Parse() method. The Token stream is generated by calling Lex repeatedly on a FIDL file. This parser is implemented with recursive descent. Each node of the raw AST has a corresponding ParseFoo() method that consumes Tokens from the Lexer and returns a unique_ptr to an instance of that node. In case there is a failure, a nullptr is returned.

Note: A node of the raw AST is a start and end token pair with pointers to any children. The start and end tokens of each child are contained within the start and end tokens of its parent. The root node, which is a raw::File, has a start token equal to the first token of the file and an end token equal to the last token of the file.

The Parser keeps track of the:

  • Current nodes that are being built using a stack of SourceElements, which are stored in active_ast_scopes_.
  • Current and previous Tokens that are being processed. last_token_ and previous_token_, respectively.
  • Its own state machine inside of the Parser::Lex method. The current token (last_token_) is always the next token that is about to be consumed, which effectively gives the parser a one token lookahead (i.e. LL(1)).

The Parser determines what kind of node the current Token belongs to based on the Token::Kind of last_token_, using the Peek() method. Then, the Parser updates its state and constructs the nodes through the use of the ASTScope class as well as the ConsumeToken and MaybeConsumeToken helper methods.

This example shows a simple non-recursive case line by line. The parser method looks like this:

std::unique_ptr<raw::StringLiteral> Parser::ParseStringLiteral() {
    ASTScope scope(this);
    ConsumeToken(OfKind(Token::Kind::kStringLiteral));
    if (!Ok())
        return Fail();

    return std::make_unique<raw::StringLiteral>(scope.GetSourceElement());
}

It consumes a single token and returns a leaf node, raw::StringLiteral:

class StringLiteral : public SourceElement {
public:
    explicit StringLiteral(SourceElement const& element) : SourceElement(element) {}

    virtual ~StringLiteral() {}
}

The method starts by creating a new ASTScope, which initializes the SourceElement that will later be used to create the returned node by pushing it onto active_ast_scopes_. The start of the SourceElement gets set to the first token that is consumed and the end gets set in the call to GetSourceElement() before returning.

The new SourceElement for the StringLiteral gets initialized inside the ASTScope constructor:

parser_->active_ast_scopes_.push_back(raw::SourceElement(Token(), Token()));

Then, ConsumeToken is called to process the next token. This method takes a predicate constructed with OfKind() and calls that predicate on last_token_'s kind or subkind. OfKind() returns a function that checks that its input matches the given kind or subkind. If the predicate fails (in this case, if the current token is not a string literal), the error gets stored in the class and the method returns. The Ok() call catches the error and stops the compiler with a parsing error. If the predicate succeeds, the start token of any SourceElements in the stack that are uninitialized are set to the current token:

for (auto& scope : active_ast_scopes_) {
    if (scope.start_.kind() == Token::Kind::kNotAToken) {
        scope.start_ = token;
    }
}

In this example, the start token is set to the top element of the stack since it was initialized at the start of the method. The Parser then advances to the next token by setting previous_token_ = last_token_ and last_token_ = Lex().

Finally, the resulting StringLiteral node is returned using scope.GetSourceElement(). This sets the end token of the SourceElement at the top of the stack to the previous_token_, and then returns the SourceElement. The final node has the same start and end token because StringLiterals are only ever a single token long, but other methods may consume many tokens before calling GetSourceElement(). When the method returns, the destructor of ASTScope pops the top element off of active_ast_scopes_.

Compilation

At this point, the compiler has successfully constructed a raw::File for each input file and proceeds to compile these raw ASTs into flat ASTs in three steps:

  1. Consumption: where the raw ASTs (whose representation tries to match the FIDL grammar) are grouped by FIDL library and de-sugared into flat ASTs (whose representation tries to match the FIDL language semantics). This step converts one raw::File per file into one flat::Library per library.
  2. Topological Sorting: to determine the order in which to resolve the flat AST nodes (see next step).
  3. Resolution: which performs name resolution, type resolution, and type shape and size calculations. The resolution process is done node by node, and the resulting information is stored into the flat:: node itself.

Note: Within the code, the term "compilation" may refer to these three steps as a whole (for example, all three steps are performed within the Library::Compile method), or it could refer specifically to the "resolution" step (for example, the logic for performing resolution is contained in the Library::CompileDecl method ). In general this document tries to use "resolution" when referring to the third step to disambiguate between the two possible meanings.

Consumption

After each file is parsed into a raw::File and an empty AST (flat::Library) has been initialized for each library, the AST needs to be updated with all of data from the raw::Files that correspond to it. This is done recursively with the ConsumeFoo() methods. Each ConsumeFoo() method generally takes the corresponding raw AST node as input, updates the Library class, and then return a bool to indicate success or failure. These methods are responsible for:

  • Validating the placement of attributes; for example checking that the Transport attribute is only specified for protocols.
  • Checking for any undefined library dependencies (e.g. using textures.Foo will error if the textures library was not imported)
  • Converting the raw AST nodes into their flat AST node equivalents, and storing them in the Library's foo_declarations_ attribute. Initially the values of the flat AST nodes are unset but they are calculated later during compilation.
  • Registering each declaration by adding them to the declarations_ vector. Const declarations and enum/bit fields (which declare a value) are also added to the constants_ vector, whereas all other declarations (which declare a type) get their corresponding type template added to the library's typespace.

Topological Sorting

Once all of the declarations for a given Library have been added to the declarations_ vector, the compiler can proceed to resolve each individual declaration. However, it must do this in the correct order (so that any dependencies of a declaration are resolved before it); this is accomplished by first sorting the declarations into a separate declarations_order_ vector, and then iterating through it to compile each declaration. The sorting is done in the SortDeclarations() method, and makes use of DeclDependencies() to determine the dependencies for a given declaration.

Resolution

Given the sorted declarations, the compilation happens through the CompileFoo methods, generally corresponding to the AST nodes (e.g. CompileStruct, CompileConst), with CompileDecl as the entrypoint. The main purpose of CompileDecl is for:

Once this step is complete, the flat::Library contains all the necessary information for any code generation. The FIDL compiler can directly generate C bindings, or can generate a JSON IR that can be consumed by a separate backend

Additional Checks

Before marking compilation as successful, the FIDL compiler also does a few additional checks: for example, it will check that any constraints on attributes are satisfied, and that all declared library dependencies were actually used.

Backend generation

The FIDL compiler emits a JSON intermediate representation (IR). The JSON IR is consumed by a separate program, named the back-end, that generates the language bindings from the JSON IR.

The officially supported FIDL language back-ends are:

C Bindings

For historical reasons, C bindings are directly generated from the FIDL compiler - the C bindings do not support all features and types that are used with the C bindings need to be annotated with the Layout = "Simple" attribute.

C Family Runtime

fidlc is also responsible for generating "coding tables", which are instances of fidl_type_t used to represent the FIDL messages at runtime and are used by all of the bindings for the C family of languages (C, LLCPP, HLCPP). To do this, the flat AST is converted to an intermediate representation called the "coded" AST with the CodedTypesGenerator, which iterates through the flat::Decls in a given flat::Library and converts each raw::Decl node to the corresponding coded::Type node (e.g. flat::Struct becomes a coded::StructType).

The coding tables are then generated by the TablesGenerator, which will generate C code for each coded::Type that constructs the equivalent fidl_type_t type. For example, for a coded::StructType called MyStruct, the TablesGenerator would write out C code that constructs an equivalent fidl::FidlCodedStruct, something like:

const fidl_type_t MyStruct = fidl_type_t(::fidl::FidlCodedStruct(
    my_struct_fields, 1u, 32u, "mylibrary/MyStruct"));

Coding tables for FIDL libraries in //sdk/fidl are generated into the out directory under fidling/gen/sdk/fidl/LIBRARY/LIBRARY.fidl.tables.c. For example out/default/fidling/gen/sdk/fuchsia.sys/fuchsia.sys.fidl.tables.c. The README gives additional context. The fidl_type_t definitions (such as for FidlCodedStruct) can be found in internal.h.

Glossary

Decl {#decl}

The Decl is the base of all flat AST nodes, just like SourceElement is the base of all parser tree nodes, and corresponds to all possible declarations that a user can make in a FIDL file. There are two types of Decls:

  • Consts, which declare a value, and have a value attribute that gets resolved during compilation, and
  • TypeDecls, which declare a message type or interface and have a typeshape attribute that gets set during compilation.

TypeDecls representing an aggregate type (e.g. structs, tables, and unions) also have a static Shape() method, which contains the logic for determining the Typeshape of that given type.

FieldShape {#fieldshape}

A FieldShape describes the offset and padding information for members of an aggregate type like a struct or union. Generally these fields will require both a Typeshape as well as a FieldShape

Name {#name}

A Name represents a scope variable name, and consists of the library the name belongs to (or nullptr for global names), and the variable name itself as a string (for anonymous names) or a SourceLocation. Names can also optionally include a member_name_ field when referring to the field of a declared variable (for example MyEnum.MyField).

SourceElement {#sourceelement}

A SourceElement represents a block of code inside a fidl file and is parameterized by a start_ and end_ Token. All parser tree ("raw" AST) nodes inherit from this class.

SourceFile {#sourcefile}

Wrapper around a file, which is responsible for owning the data in that file. Also see Virtual SourceFile

SourceLocation {#sourcelocation}

Wrapper around a StringView and the SourceFile it comes from. It provides methods to get the surrounding line of the StringView as well as its location in the form of a "[filename]:[line]:[col]" string

SourceManager {#sourcemanager}

Wrapper around a set of SourceFiles that all relate to a single Library.

Token {#token}

A token is essentially a lexeme (in the form of a SourceLocation stored as the location_ attribute), enhanced with two other pieces information that are useful to the parser during the later stages of compilation:

  • previous_end_. A SourceLocation, which starts at the end of the previous token and ends at the start of this token. It contains data that is uninteresting to the parser such as whitespace.
  • A kind and subkind that, together, classify the lexeme. The possible kinds are:
    • The special characters such as Kind::LeftParen, Kind::Dot, Kind::Comma, etc...
    • String and numeric constants
    • Identifiers. Tokens that are keywords (e.g. const, struct) are considered identifiers, but also have a subkind defined to identify which keyword it is (e.g. Subkind::Const, Subkind::Struct). All other tokens have a subkind of None.
    • Uninitialized tokens have a kind of kNotAToken.

Type {#type}

A struct representing an instance of a type. For example, the vector<int32>:10? type corresponds to an instance of the VectorType TypeDecl with max_size_ = 10 and maybe_arg_type set to the Type corresponding to int32. Built-in types all have a static Shape() method that returns the Typeshape given the parameters for an instance of that type. User defined types (e.g. structs or unions) will all have a type of IdentifierType - the corresponding TypeDecl, like Struct provides the static Shape() method instead.

TypeDecl {#typedecl}

See Decl

TypeShape {#typeshape}

Information about how objects of a type should be laid out in memory, including their size, alignment, depth, etc.

Typespace {#typespace}

The typespace is a map from Type names to a TypeTemplate for that Type. During compilation, the typespace is initialized to include all of the built-in types (e.g. "vector" maps to VectorTypeTemplate), whereas user-defined types get added during the compilation process. This also ensures that a single type template instance exists per type and avoids name collisions/shadowing of types (e.g. fxbug.dev/7646).

TypeTemplate {#typetemplate}

Instances of TypeTemplates provide a Create() method to create a new instance of a specific Type - therefore there is a TypeTemplate subclass for each built-in FIDL type (e.g. ArrayTypeTemplate, PrimitiveTypeTemplate, etc.) as well as a single class for all user defined types (TypeDeclTypeTemplate), and one for type aliases (TypeAliasTypeTemplate). Create() takes as parameters the possible type parameters: argument type, nullability, and size.

For example, to create an object representing the type vector<int32>:10? the compiler would call the Create() method of the VectorTypeTemplate with an argument type of int32, max size of 10, and a nullability of types::Nullability::kNullable. This call returns an instance of a VectorType with those parameters. Note that not all 3 of these parameters apply to all of the types (e.g. PrimitiveTypes, like int32 have none of the 3). The Create() method of the type template for each type automatically checks that only the relevant parameters are passed.

The concrete type for user defined types is the IdentifierType, which gets generated by the TypeDeclTypeTemplate.

Virtual SourceFile {#virtualsourcefile}

A subclass of SourceFile that has a fake "filename" and is initialized with no backing data. It exposes an AddLine() method to add data to the file, and is used as a backing SourceFile for content that does not appear directly in any of the input SourceFiles, like for generated anonymous Names.