Skip to content

Latest commit

 

History

History
117 lines (83 loc) · 4.97 KB

large-inputs-and-stack-overflows.md

File metadata and controls

117 lines (83 loc) · 4.97 KB
title category categoryindex index
Large inputs
Compiler Internals
200
500

Processing large inputs without stack overflows

The compiler accepts large inputs such as:

  • Large literals, such as let str = "a1" + "a2" + ... + "a1000"
  • Large array expressions
  • Large list expressions
  • Long lists of sequential expressions
  • Long lists of bindings, such as let v1 = e1 in let v2 = e2 in ....
  • Long sequences of if .. then ... else expressions
  • Long sequences of match x with ... | ... expressions
  • Combinations of these

The compiler performs constant folding for large constants so there are no costs to using them at runtime. However, this is subject to a machine's stack size when compiling, leading to StackOverflow exceptions if those constants are very large. The same can be observed for certain kinds of array, list, or sequence expressions. This appears to be more prominent when compiling on macOS because macOS has a smaller stack size.

Many sources of StackOverflow exceptions prior to F# 4.7 when processing these kinds of constructs were resolved by processing them on the heap via continuation passing techniques. This avoids filling data on the stack and appears to have negligible effects on overall throughput or memory usage of the compiler.

There are two techniques to deal with this

  1. Linearizing processing of specific input shapes, keeping stacks small
  2. Using stack guards to simply temporarily move to a new thread when a certain threshold is reached.

Linearizing processing if certain inputs

Aside from array expressions, most of the previously-listed inputs are called "linear" expressions. This means that there is a single linear hole in the shape of expressions. For example:

  • expr :: HOLE (list expressions or other right-linear constructions)
  • expr; HOLE (sequential expressions)
  • let v = expr in HOLE (let expressions)
  • if expr then expr else HOLE (conditional expression)
  • match expr with pat[vs] -> e1[vs] | pat2 -> HOLE (for example, match expr with Some x -> ... | None -> ...)

Processing these constructs with continuation passing is more difficult than a more "natural" approach that would use the stack.

For example, consider the following contrived example:

and remapLinearExpr g compgen tmenv expr contf =
    match expr with
    | Expr.Let (bind, bodyExpr, m, _) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenvinner bodyExpr (contf << (fun bodyExpr' ->
            ...))

    | Expr.Sequential (expr1, expr2, dir, spSeq, m)  ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenv expr2 (contf << (fun expr2' ->
            ...))

    | LinearMatchExpr (spBind, exprm, dtree, tg1, expr2, sp2, m2, ty) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenv expr2 (contf << (fun expr2' ->  ...))

    | LinearOpExpr (op, tyargs, argsFront, argLast, m) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenv argLast (contf << (fun argLast' -> ...))

    | _ -> contf (remapExpr g compgen tmenv e)

and remapExpr (g: TcGlobals) (compgen:ValCopyFlag) (tmenv:Remap) expr =
    match expr with
    ...
    | LinearOpExpr _
    | LinearMatchExpr _
    | Expr.Sequential _
    | Expr.Let _ -> remapLinearExpr g compgen tmenv expr (fun x -> x)

The remapExpr operation becomes two functions, remapExpr (for non-linear cases) and remapLinearExpr (for linear cases). remapLinearExpr uses tailcalls for constructs in the HOLE positions mentioned previously, passing the result to the continuation.

Some common aspects of this style of programming are:

  • The tell-tale use of contf (continuation function)
  • The processing of the body expression e of a let-expression is tail-recursive, if the next construct is also a let-expression.
  • The processing of the e2 expression of a sequential-expression is tail-recursive
  • The processing of the second expression in a cons is tail-recursive

The previous example is considered incomplete, because arbitrary combinations of let and sequential expressions aren't going to be dealt with in a tail-recursive way. The compiler generally tries to do these combinations as well.

Stack Guards

The StackGuard type is used to count synchronous recursive processing and move to a new thread if a limit is reached. Compilation globals are re-installed. Sample:

let TcStackGuardDepth = StackGuard.GetDepthOption "Tc"

...
   stackGuard = StackGuard(TcMaxStackGuardDepth)

let rec ....

and TcExpr cenv ty (env: TcEnv) tpenv (expr: SynExpr) =

    // Guard the stack for deeply nested expressions
    cenv.stackGuard.Guard <| fun () ->

    ...

Note stack guarding doesn't result in a tailcall so will appear in recursive stack frames, because a counter must be decremented after the call. This is used systematically for recursive processing of:

  • SyntaxTree SynExpr
  • TypedTree Expr

We don't use it for other inputs.