doc(wiki): parallel patterns
lukasrothenberger committed Jan 10, 2024
1 parent 167aef1 commit c658a06
Showing 8 changed files with 431 additions and 11 deletions.
@@ -94,9 +94,7 @@ Stages:
Node: 1:13
Start line: 1:4
End line: 1:4
pragma: "#pragma omp task"
first private: ['i']
private: []
shared: ['d', 'in']
reduction: []
InDeps: []
8 changes: 0 additions & 8 deletions docs/data/Parallel_patterns.md

This file was deleted.

9 changes: 9 additions & 0 deletions docs/data/Parallel_patterns/Patterns.md
@@ -0,0 +1,9 @@
---
layout: default
title: Parallel patterns
parent: Data
has_children: true
permalink: /Data/Patterns
---

# Parallel Patterns
78 changes: 78 additions & 0 deletions docs/data/Parallel_patterns/doall.md
@@ -0,0 +1,78 @@
---
layout: default
title: Do-All
has_children: true
parent: Parallel patterns
grand_parent: Data
nav_order: 1
---


# Do-All Loop

## Reporting
Do-All Loops are reported in the following format:
```
Do-all at: 1:2
Start line: 1:7
End line: 1:9
pragma: "#pragma omp parallel for"
private: []
shared: []
first private: []
reduction: []
last private: []
```

## Interpretation
The reported values shall be interpreted as follows:
* `Do-all at: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the source code line of the parallelizable loop.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the parallelizable loop.
<!--
Note: Disabled, since these values are not determined correctly at the moment. Values will be added to the result once their implementations are fixed.
* `iterations: <num>` specifies the counted amount of iterations the loop has executed during the profiling.
* `instructions: <num>` specifies the summed number of instructions executed within one iteration of the loop body
* `TODO: workload: <num>` provides an arbitrary value which represents the computational weight of one iteration of the loop.
-->
* `pragma:` shows which type of OpenMP pragma shall be inserted before the target loop in order to parallelize it.
* `private: [<vars>]` lists a set of variables which have been identified as thread-`private`
* The same interpretation applies to the following values as well:
* `shared`
* `first_private`
* `last_private`
* `reduction: [<operation>:<var>]` specifies a set of identified reduction operations and variables. For `Do-All` suggestions, this list is always empty.
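
Both look-ups can be illustrated with a hypothetical `FileMapping.txt` excerpt (the ids and paths below are made up for illustration):

```
1	/home/user/project/src/main.c
2	/home/user/project/src/util.c
```

Given `Do-all at: 1:2`, the parent file would be `/home/user/project/src/main.c`, and the CU with id `1:2` could be looked up in `Data.xml`.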

## Implementation
In order to implement a suggestion, first open the source code file corresponding to `file_id` and navigate to line `Start line -> <line_num>`.
Insert `pragma` before the loop begins.
In order to ensure a valid parallelization, you need to add the following clauses to the OpenMP pragma, if the respective lists are not empty:
* `private` -> clause: `private(<vars>)`
* `shared` -> clause: `shared(<vars>)`
* `first_private` -> clause: `firstprivate(<vars>)`
* `last_private` -> clause: `lastprivate(<vars>)`
* `reduction`-> clause: `reduction(<operation>:<vars>)`
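
When several of these lists are non-empty, all resulting clauses are combined on a single pragma. The following is a minimal sketch in C, assuming a hypothetical suggestion with `private: ['i']`, `shared: ['a']` and `first private: ['offset']`; the function and variable names are made up for illustration:

```c
/* Hypothetical do-all suggestion with several non-empty lists:
 *   private: ['i'], shared: ['a'], first private: ['offset']
 * All three clauses are attached to the single inserted pragma. */
void add_offset(int *a, int n, int offset) {
    int i;
    #pragma omp parallel for private(i) shared(a) firstprivate(offset)
    for (i = 0; i < n; ++i) {
        a[i] = a[i] + offset;  /* each iteration touches a distinct element */
    }
}
```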

### Example
As an example, we will analyze the following code snippet for parallelization potential. All location and meta data will be ignored for the sake of simplicity.

```
for (int i = 0; i < 10; ++i) {
    local_array[i] += 1;
}
```

Analyzing this code snippet results in the following parallelization suggestion:

```
pragma: "#pragma omp parallel for"
private: ["i"]
shared: ["local_array"]
first private: []
reduction: []
last private: []
```


After interpreting and implementing the suggestion, the resulting, now parallel, source code could look as follows:

```
#pragma omp parallel for private(i) shared(local_array)
for (int i = 0; i < 10; ++i) {
    local_array[i] += 1;
}
```
118 changes: 118 additions & 0 deletions docs/data/Parallel_patterns/geometric_decomposition.md
@@ -0,0 +1,118 @@
---
layout: default
title: Geometric Decomposition
has_children: true
parent: Parallel patterns
grand_parent: Data
nav_order: 2
---


# Geometric Decomposition

## Reporting
Possible geometric decompositions are reported in the following format:
```
Geometric decomposition at: 1:9
Start line: 1:26
End line: 1:36
Do-All loops: ['1:11']
Reduction loops: []
Number of tasks: 24
Chunk limits: 1000
pragma: "for (i = 0; i < num-tasks; i++) #pragma omp task"
private: []
shared: []
first private: ['i']
reduction: []
last private: []
```

## Interpretation
The reported values shall be interpreted as follows:
* `Geometric decomposition at: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the first source code line of the potential geometrically decomposable code.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the suggested pattern.
* `Do-All loops: [<file_id>:<cu_id>]` specifies which [Do-All loops](doall.md) can be part of the geometric decomposition.
* `Reduction loops: [<file_id>:<cu_id>]` specifies which [Reduction loops](Reduction.md) can be part of the geometric decomposition.
* `Number of tasks: <int>` specifies the number of tasks which should or can be spawned in order to process the geometric decomposition.
* `Chunk limits: <int>` determines the size of a workload package (number of iterations) for each individual spawned task.
* `private, shared, first_private` and `last_private` indicate variables which should be mentioned within the respective OpenMP data sharing clauses.
* `reduction: [<operation>:<var>]` specifies a set of identified reduction operations and variables.
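
The interplay of `Number of tasks` and `Chunk limits` can be sketched with a small helper; `chunk_bounds` is a hypothetical function, not part of DiscoPoP. With the reported `Number of tasks: 24` and `Chunk limits: 1000`, task `t` would process the iteration range `[t*1000, (t+1)*1000)`:

```c
/* Illustrative helper: given a task id, the reported `Chunk limits`
 * value and the total iteration count, compute the half-open iteration
 * range [*start, *end) that this task should process. */
void chunk_bounds(int task_id, int chunk_limit, int total,
                  int *start, int *end) {
    *start = task_id * chunk_limit;
    *end = *start + chunk_limit;
    if (*end > total) *end = total;  /* the last chunk may be smaller */
}
```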


## Implementation
In order to implement a geometric decomposition, first open the source code file corresponding to `file_id` and navigate to line `Start line -> <line_num>`.
Insert `pragma` before each of the loops mentioned in `Do-all loops` and `Reduction loops`. Make sure to replace `num-tasks` with the specified `Number of tasks`, or insert a respective variable into the source code.
Modify the loop conditions of the original source code in order to allow a geometric decomposition. Each task should be responsible for processing a chunk of the size `Chunk limits`.
In order to ensure a valid parallelization, you need to add the following clauses to the OpenMP pragma, if the respective lists are not empty:
* `private` -> clause: `private(<vars>)`
* `shared` -> clause: `shared(<vars>)`
* `first_private` -> clause: `firstprivate(<vars>)`
* `last_private` -> clause: `lastprivate(<vars>)`
* `reduction`-> clause: `reduction(<operation>:<vars>)`

### Example
As an example, we will analyze the following code snippet for parallelization potential. Some location and meta data will be ignored for the sake of simplicity.

```
int main(void)
{
    int i;
    int d = 20, a = 22, b = 44, c = 90;
    for (i = 0; i < 100; i++) {
        a = foo(i, d);
        b = bar(a, d);
        c = delta(b, d);
    }
    a = b;
    return 0;
}
```

Analyzing this code snippet results in the following geometric decomposition suggestion:
```
Geometric decomposition at: 1:1
Start line: 1:2
End line: 1:12
Type: Geometric Decomposition Pattern
Do-All loops: ['1:3'] // line 5
Reduction loops: []
Number of tasks: 10
Chunk limits: 10
pragma: "for (i = 0; i < num-tasks; i++) #pragma omp task"
private: []
shared: []
first private: ['i']
reduction: []
last private: []
```

After interpreting and implementing the suggestion, the resulting, now parallel, source code could look as follows.
Since `i` has been used in the original source code already, the inserted `pragma` uses `x` instead.
As a last modification, the loop conditions in the original source code need to be modified slightly in order to allow the decomposition.
For a simpler interpretation of the example we have added the `chunk_size` and `tid` variables.
Note: Since the geometric decomposition relies on the identification of the thread number, the outermost `for` loop should be located inside a `parallel region`. However, depending on the specific analyzed source code, a surrounding `parallel region` might already exist or a different location for the surrounding `parallel region` may be more beneficial.

```
int main(void)
{
    int i;
    int d = 20, a = 22, b = 44, c = 90;

    #pragma omp parallel
    #pragma omp single
    for (int x = 0; x < 10; x++) {
        #pragma omp task
        {
            int tid = omp_get_thread_num();
            int chunk_size = 10; // value of Chunk limits

            for (i = tid * chunk_size; i < tid * chunk_size + chunk_size; i++) {
                a = foo(i, d);
                b = bar(a, d);
                c = delta(b, d);
            }
        }
    }

    a = b;
    return 0;
}
```
143 changes: 143 additions & 0 deletions docs/data/Parallel_patterns/pipeline.md
@@ -0,0 +1,143 @@
---
layout: default
title: Pipeline
has_children: true
parent: Parallel patterns
grand_parent: Data
nav_order: 3
---


# Pipeline

## Reporting

### Pipelines
Pipelines are reported in the following format:
```
Pipeline at: 1:11
Start line: 1:30
End line: 1:34
Stages:
<stage_1>
<stage_2>
...
```
The reported values shall be interpreted as follows:
* `Pipeline at: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the first source code line of the identified pipeline.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the pipeline loop.
* `Stages` defines a list of stages contained in the identified pipeline. The specific format of the stages is described in the following.

### Pipeline Stages
Individual stages of a pipeline are reported in the following format:
```
Node: 1:13
Start line: 1:31
End line: 1:31
pragma: "#pragma omp task"
first private: ['i']
private: []
shared: ['d', 'in']
reduction: []
InDeps: []
OutDeps: ['a']
InOutDeps: []
```

The reported values shall be interpreted as follows:
* `Node: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the first source code line of the identified pipeline stage.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the stage.
* `pragma:` shows which type of OpenMP pragma shall be inserted before the `start line`.
* `private: [<vars>]` lists a set of variables which have been identified as thread-`private`
* The same interpretation applies to the following values as well:
* `shared`
* `first_private`
* `reduction: [<operation>:<var>]` specifies a set of identified reduction operations and variables.
* `InDeps: [<vars>]` specifies `in`-dependencies according to the [OpenMP depend clause](https://www.openmp.org/spec-html/5.0/openmpsu99.html).
* `OutDeps: [<vars>]` specifies `out`-dependencies according to the [OpenMP depend clause](https://www.openmp.org/spec-html/5.0/openmpsu99.html).
* `InOutDeps: [<vars>]` specifies `inout`-dependencies according to the [OpenMP depend clause](https://www.openmp.org/spec-html/5.0/openmpsu99.html).
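
The effect of the `depend` clauses can be sketched with a minimal two-stage example; the function and values below are made up for illustration and are not part of the report format. The second task reads `a` (`InDeps: ['a']`), so OpenMP may only start it after the first task, which writes `a` (`OutDeps: ['a']`), has finished:

```c
/* Minimal sketch: depend(out: a) on the producer and depend(in: a) on
 * the consumer serialize the two tasks, even though both are spawned
 * immediately by the single thread. */
int run_stages(int d) {
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(a) firstprivate(d) depend(out: a)
        a = d + 1;          /* stage 1: OutDeps: ['a'] */
        #pragma omp task shared(a, b) depend(in: a) depend(out: b)
        b = a * 2;          /* stage 2: InDeps: ['a'], OutDeps: ['b'] */
        #pragma omp taskwait
    }
    return b;
}
```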


## Implementation
In order to implement a suggested pipeline, first navigate to the source code location specified by `Pipeline at:`.
For each individual stage, the following OpenMP pragmas and clauses need to be added to the source code, if the respective lists are not empty:
* Insert `pragma` prior to the `start line` mentioned by the stage.
* If `private` is not empty, add the clause `private(<vars>)`, where vars are separated by commas to the pragma.
* Do the same for:
* `shared` -> clause: `shared(<vars>)`
* `first_private` -> clause: `firstprivate(<vars>)`
* `reduction`-> clause: `reduction(<operation>:<vars>)`
* `InDeps` -> clause: `depend(in:<vars>)`
* `OutDeps` -> clause: `depend(out:<vars>)`
* `InOutDeps` -> clause: `depend(inout:<vars>)`


### Example
As an example, we will analyze the following code snippet for parallelization potential. Some location and meta data will be ignored for the sake of simplicity.

```
int i;
int d = 20, a = 22, b = 44, c = 90;
for (i = 0; i < 100; i++) {
    a = foo(i, d);
    b = bar(a, d);
    c = delta(b, d);
}
a = b;
```

Analyzing this code snippet results in the following parallelization suggestion:
```
Pipeline at:
Start line: 1:3
End line: 1:7
Stages:
Node: 1:13
Start line: 1:4
End line: 1:4
pragma: "#pragma omp task"
first private: ['i']
private: []
shared: ['d', 'in']
reduction: []
InDeps: []
OutDeps: ['a']
InOutDeps: []
Start line: 1:5
End line: 1:5
pragma: "#pragma omp task"
first private: []
private: []
shared: ['d', 'in']
reduction: []
InDeps: ['a']
OutDeps: ['b']
InOutDeps: []
Start line: 1:6
End line: 1:7
pragma: "#pragma omp task"
first private: []
private: ['c']
shared: ['d', 'in']
reduction: []
InDeps: ['b']
OutDeps: []
InOutDeps: []
```

After interpreting and implementing the suggestion, the resulting, now parallel, source code could look as follows:

```
int i;
int d = 20, a = 22, b = 44, c = 90;
for (i = 0; i < 100; i++) {
    #pragma omp task firstprivate(i) shared(d, in) depend(out: a)
    a = foo(i, d);
    #pragma omp task shared(d, in) depend(in: a) depend(out: b)
    b = bar(a, d);
    #pragma omp task private(c) shared(d, in) depend(in: b)
    c = delta(b, d);
}
a = b;
```
