-
Notifications
You must be signed in to change notification settings - Fork 1
/
basics.tex
169 lines (116 loc) · 12.3 KB
/
basics.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
\chapter{Basics and related work}\label{sec:basics}
% Compiler
\section{Compiler}\label{sec:basics:compiler}
The primary function of a compiler is to automatically convert high-level code created by a developer into (optimized) machine code.
As a compiler is an inherently large software project, an architecture needs to be chosen that allows for extensions and modifications.
Modern compilers mostly follow a layered architecture style: They are each comprised of a front-, middle-, and back-end.
In this architecture, the front-end converts the high-level code into an abstract intermediary representation, which is then used by the middle-end for optimizations and transformations.
Lastly, the back-end is responsible for converting the optimized intermediary code into instructions for the target system architecture (e.g., RISC-V, x86, ARM, or similar).
% Basic blocks
\section{Basic blocks and control-flow}\label{sec:basics:bb-cf}
To better handle code and give it a logical structure, most compilers divide code up into so-called \textit{basic blocks}.
(Basic) blocks are sets of consecutive operations that do not contain jumps or targets thereof, but rather only jumps connecting them.
Therefore, a basic block is either executed entirely or not executed at all.
A usual way to represent this in a human-readable form is to output it as a control-flow-graph (CFG).
CFGs depict basic blocks as nodes and jumps between basic blocks as edges.
Furthermore, it is a convention in these graphs to have precisely one start-node and one end-node.
We note that CFGs, in general, are cyclical graphs.
They are non-cyclical graphs if the original code does not contain any jumps going backward in the control-flow.
Another important concept of CFGs is dominance.
To explain this concept, we define a starting block $S$, and assume there are two (not necessarily different) blocks, present in the CFG, $N_1$, and $N_2$.
With this information, we define dominance as follows:
$$N_1~\text{dominates}~N_2 \Longleftrightarrow \forall p \in \text{Paths}(S, N_2):~N_1 \in p$$
In lucid terms, this means $N_1$ dominates $N_2$, iff to get to $N_2$ from $S$ you have to visit $N_1$ on the way.
It is important to note that a block always dominates itself.
% Lifetime
\section{Loops}\label{sec:basics:loops}
We define a loop to be a set of blocks that are all in a cyclical control-flow structure.
Formally this can be expressed as:
$$L~\text{is a loop} \Longleftrightarrow L \neq \emptyset \wedge \forall n_1, n_2 \in L:~\exists Path(n_1, n_2) \subseteq L$$
Henceforth let $L$ be a loop.
If a loop is completely contained inside another loop, it is said to be nested.
$$L'~\text{is nested in}~L \Longleftrightarrow L' \subsetneq L \wedge L'~\text{is a loop}$$
If a loop has no nested loops inside of it, we call it an \textit{innermost} loop.
$$L~\text{is innermost loop} \Longleftrightarrow \nexists L': L'~\text{is a nested loop in}~L$$
Loops can furthermore have a header, which is the sole entry point into a loop~\cite{aebi18bachelorarbeit} and defined as follows:
$$H~\text{is header of}~L \Longleftrightarrow H \in L \wedge \forall n \in L: H~\text{dominates}~n$$
N.B.: Not all loops have to have a header.
If a loop has a header, its body is the set containing all blocks in the loop, except for the header.
$$B~\text{is body of}~L \Longleftrightarrow B = L \backslash \{H\}, H~\text{is header of}~L$$
% SSA
\section{Single-Static-Assignment (SSA)}\label{sec:basics:ssa}
The \textit{single-static-assignment} (\textit{SSA}) form is a property of intermediary representations, that requires each value to only be assigned exactly once.
Moreover, every value has to be assigned before it is being used~\cite{cytron91}.
This property mainly implies that the block, which declares a given value $v$, has to dominate all blocks that use $v$.
This declaration point will be unambiguous across all possible usages.
The SSA form is used to simplify optimizations in the regard that one can be sure that a set point in the code currently defines a given value in use.
\Cref{fig:basics:SSA-simple} shows an example program in SSA form.
While in the base code $x$ is assigned twice, in the code transformed into SSA form, simply a new value was defined to make sure that each variable is only defined once.
\input{fig/simple-ssa-example.tex}
In a loop or a conditional statement, a scenario might arise where a given value could be assigned at multiple locations.
In cases like these we can use a \textit{$\Phi$-function}.
A $\Phi$-function is a theoretical construct that returns the correct value depending on the control-flow predecessor.
\Cref{fig:basics:SSA-phi} shows an example of the use of a $\Phi$-function, where depending on the control-flow, either $m_1$ or $m_2$ are selected.
\input{fig/max-ssa.tex}
% LCSAA
\section{Loop-Closed-Single-Static-Assignment (LCSSA)}\label{sec:basics:LCSSA}
An extension to the aforementioned SSA form is the \textit{loop-closed-single-static-assignment} (\textit{LCSSA}) form.
A CFG in LCSSA form has all properties that a CFG in SSA form has, and additionally the property that each value assigned in a given loop and used outside of this loop has to be used by a $\Phi$-node in the first block after the loop.
This form is used to reduce special casing when transforming loops~\cite{aebi18bachelorarbeit} and is therefore utilized through all of~\Cref{sec:impl}.
To visualize this property,~\Cref{fig:basics:LCSSA} depicts its effect.
\input{fig/lcssa-example.tex}
% FIRM
\section{\libFIRM}\label{sec:basics:firm}
\libFIRM{} is a compiler middle- and back-end that takes a graph-based intermediate representation in SSA form, optimizes it, and produces assembly code~\cite{libfirm}.
Since 1996, Karlsruhe Institute of Technology (KIT) actively develops~\libFIRM.
A graph in \libFIRM{} contains information about basics blocks, the control-flow, and memory and data dependencies.
Basics blocks in \libFIRM{} contain further nodes that are responsible for the control-flow of the program.
These are pointed to by (other) basic blocks that are the target of these control-flow operations.
The resulting control-flow edges are represented by red edges in visualizations of \libFIRM{} graphs.
Any node operating on memory also connects to the previous node operating on memory, so that the node uses the prior state of memory and then provides a new state with its changes.
Memory is, like control flow, connected by edges, which are colored blue in graphical representations.
Lastly, \libFIRM{} has data dependency edges between nodes, which represent dependencies needed for calculations.
\Cref{fig:basics:firm} portrays an example \libFIRM{} graph of the program initially shown in~\Cref{fig:basics:SSA-phi}.
It is especially to be noted that the graph has both memory, data and control-flow edges, and is in SSA form, since it contains a $\Phi$-node.
\input{fig/max.tex}
Another set of functionality that~\libFIRM{} provides is loop information.
\libFIRM{} will not only (if applicable) be able to map blocks to their respective loops and vice-versa, but also has information on loop nesting structure.
Thus, one can quickly determine, whether a loop is an innermost loop~\cite{libfirm}.
Further,~\libFIRM{} also allows for finding the header of a loop, given that it has a header~\cite{aebi18bachelorarbeit}.
Since loop unrolling involves node duplication in the implementation that we will use~\cite{aebi18bachelorarbeit}, it is worth mentioning, that upon a node will has a field called \texttt{link} that will be set to reference the copied node and vice-versa.
This allows for easy access of the duplicated and original nodes in later algorithms.
% Unroll loops
\section{Loop unrolling}\label{sec:basics:unrolling}
Loop unrolling is a compiler optimization that attempts to duplicate the loop body and to reduce the controlling instructions, such as the loop condition or repetitive arithmetic~\cite{aho_ullman_1979}.
\Cref{fig:basics:old-loop-unrolling} shows a pseudo-code example of unrolling a simple loop with a factor (the number of times the body is copied) of four.
It is to be noted that the loop condition has to be checked less often, on account of each loop iteration being four times as long as in the original program.
Furthermore, it could also be used to vectorize the code, eliminate repeating conditions, and for many other following optimizations~\cite{fog_2018}.
A negative side effect of loop unrolling is that the binary size increases and that there could be more pressure on the code cache and registers, causing more spilled values~\cite{Sarkar2001}.
\libFIRM{} supports a restricted form of loop unrolling for loops that have static bounds and increments~\cite{helmer10studienarbeit}.
This optimization was recently improved but now requires the intermediary representation to be in LCSSA form, which means \libFIRM{}'s intermediary representation has to be converted into LCSSA form prior to the optimization running~\cite{aebi18bachelorarbeit}.
Though this optimization had no preconditions and merely duplicated the loop in the hope a later optimization would remove the duplicated headers, so that it resembles an unrolled loop of our definition, in which only the body (and not the header) is duplicated.
In this form the amount of conditional jumps does not decrease in most cases, since the later optimization removing headers would not trigger.
Hence, the application of this method showed no improvements, which caused preconditions for the later removal of excess headers to be added~\cite{libfirm-unroll-static}.
\libFIRM{} now only unrolls loops, similar to the loop in~\Cref{fig:basics:old-loop-unrolling}, with static bounds and for which a constant bit analysis can remove excess headers.
The benefits of the now changed optimization were also very slim, likely since the requirements for a loop to now be unrollable are very strict.
With these restrictions in place, only approximately 5\% of the innermost loops can be unrolled\footnote{Measured in \texttt{spec2006}}.
\input{fig/loop-unrolling.tex}
% Duff's Device
\section{Duff's device}\label{sec:basics:duffs}
A common problem with the loop unrolling shown in~\Cref{fig:basics:old-loop-unrolling} is that it requires the number of iterations to be constant and divisible by the unroll-factor.
A way to tackle this issue is to use a construct known as Duff's device: It preemptively unrolls a loop with a given factor and uses~\textit{fixup} code to ensure that the remaining iterations are completed~\cite{duff_1983}.
Mathematically this means the construct executes the loop body $\floor{\frac{M}{f}} \cdot f + M \text{ mod } F = M$ times, where $M$ is the number of total times the loop body would be executed without the transformation and where $f$ is the unroll-factor.
This is due to the fact that mod is defined as:
$$x \text{ mod }{} y = x - \floor{\frac{x}{y}} \cdot y$$
If we substitute $M$ for $x$ and $f$ for $y$ and rearrange for $M$, we get said form.
\Cref{fig:basics:duff} shows an example of unrolling a loop with a non-divisible bound using a factor of eight\footnote{The original Duff's device used special C syntax to entangle the switch statement and loop~\cite{duff_1983}}.
Duff's device copies the loop body eight times and to ensure that the number of executions is correct, the first time around the code jumps to the corresponding instruction, depending on the need for fixup code.
Many compilers, such as GCC~\cite{gcc}, use Duff's device for unrolling loops and improving performance while keeping code size relatively small.
Further \libFIRM{} previously utilized Duff's device for unrolling loops with static bounds, but for which no unroll-factor could be determined~\cite{helmer10studienarbeit}.
\input{fig/loop-unrolling-duff}
\section{Overflow detection}\label{sec:basics:overflow}
When subtracting (or adding) two numbers that are integer-like the operation might cause an over- or underflow, because the integer is of a fixed bit two's complement representation.
Due to the therefore inherent limitation to the range of possible values, this problem is unavoidable, yet detectable.
In the following, we will take $t_{min}$ and $t_{max}$ to be the lower and upper bound of an integer representation.
\Cref{alg:basics:overflow:detect}~\cite{pmg_2009} shows a way to detect whether an overflow or underflow occurs for an operation $x - a, \text{ where } x, a \in \ZInt$, by checking whether the result increased or decreased relative to the bounds and comparing it to the expectation.
\input{fig/alg-detect-overflow.tex}