doc(wiki): parallel patterns
lukasrothenberger committed Jan 10, 2024
1 parent 167aef1 commit c658a06
Showing 8 changed files with 431 additions and 11 deletions.
@@ -94,9 +94,7 @@ Stages:
Node: 1:13
Start line: 1:4
End line: 1:4
pragma: "#pragma omp task"
first private: ['i']
private: []
shared: ['d', 'in']
reduction: []
InDeps: []
8 changes: 0 additions & 8 deletions docs/data/Parallel_patterns.md

This file was deleted.

9 changes: 9 additions & 0 deletions docs/data/Parallel_patterns/Patterns.md
@@ -0,0 +1,9 @@
---
layout: default
title: Parallel patterns
parent: Data
has_children: true
permalink: /Data/Patterns
---

# Parallel Patterns
78 changes: 78 additions & 0 deletions docs/data/Parallel_patterns/doall.md
@@ -0,0 +1,78 @@
---
layout: default
title: Do-All
has_children: true
parent: Parallel patterns
grand_parent: Data
nav_order: 1
---


# Do-All Loop

## Reporting
Do-All Loops are reported in the following format:
```
Do-all at: 1:2
Start line: 1:7
End line: 1:9
pragma: "#pragma omp parallel for"
private: []
shared: []
first private: []
reduction: []
last private: []
```

## Interpretation
The reported values shall be interpreted as follows:
* `Do-all at: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the source code line of the parallelizable loop.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the parallelizable loop.
<!--
Note: Disabled, since these values are not determined correctly at the moment. Values will be added to the result once their implementations are fixed.
* `iterations: <num>` specifies the counted amount of iterations the loop has executed during the profiling.
* `instructions: <num>` specifies the summed number of instructions executed within one iteration of the loop body
* `TODO: workload: <num>` provides an arbitrary value which represents the computational weight of one iteration of the loop.
-->
* `pragma:` shows which type of OpenMP pragma shall be inserted before the target loop in order to parallelize it.
* `private: [<vars>]` lists a set of variables which have been identified as thread-`private`
* The same interpretation applies to the following values as well:
* `shared`
* `first_private`
* `last_private`
* `reduction: [<operation>:<var>]` specifies a set of identified reduction operations and variables. For `Do-All` suggestions, this list is always empty.
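
Both look-ups can be illustrated with a hypothetical `FileMapping.txt` excerpt (the ids and paths below are made up for illustration):

```
1	/home/user/project/src/main.c
2	/home/user/project/src/util.c
```

Given `Do-all at: 1:2`, the parent file would be `/home/user/project/src/main.c`, and the CU with id `1:2` could be looked up in `Data.xml`.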

## Implementation
In order to implement a suggestion, first open the source code file corresponding to `file_id` and navigate to line `Start line -> <line_num>`.
Insert `pragma` before the loop begins.
In order to ensure a valid parallelization, you need to add the following clauses to the OpenMP pragma, if the respective lists are not empty:
* `private` -> clause: `private(<vars>)`
* `shared` -> clause: `shared(<vars>)`
* `first_private` -> clause: `firstprivate(<vars>)`
* `last_private` -> clause: `lastprivate(<vars>)`
* `reduction`-> clause: `reduction(<operation>:<vars>)`
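
When several of these lists are non-empty, all resulting clauses are combined on a single pragma. The following is a minimal sketch in C, assuming a hypothetical suggestion with `private: ['i']`, `shared: ['a']` and `first private: ['offset']`; the function and variable names are made up for illustration:

```c
/* Hypothetical do-all suggestion with several non-empty lists:
 *   private: ['i'], shared: ['a'], first private: ['offset']
 * All three clauses are attached to the single inserted pragma. */
void add_offset(int *a, int n, int offset) {
    int i;
    #pragma omp parallel for private(i) shared(a) firstprivate(offset)
    for (i = 0; i < n; ++i) {
        a[i] = a[i] + offset;  /* each iteration touches a distinct element */
    }
}
```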

### Example
As an example, we will analyze the following code snippet for parallelization potential. All location and meta data will be ignored for the sake of simplicity.

```
for (int i = 0; i < 10; ++i) {
    local_array[i] += 1;
}
```

Analyzing this code snippet results in the following parallelization suggestion:

```
pragma: "#pragma omp parallel for"
private: ["i"]
shared: ["local_array"]
first private: []
reduction: []
last private: []
```


After interpreting and implementing the suggestion, the resulting, now parallel, source code could look as follows:

```
#pragma omp parallel for private(i) shared(local_array)
for (int i = 0; i < 10; ++i) {
    local_array[i] += 1;
}
```
118 changes: 118 additions & 0 deletions docs/data/Parallel_patterns/geometric_decomposition.md
@@ -0,0 +1,118 @@
---
layout: default
title: Geometric Decomposition
has_children: true
parent: Parallel patterns
grand_parent: Data
nav_order: 2
---


# Geometric Decomposition

## Reporting
Possible geometric decompositions are reported in the following format:
```
Geometric decomposition at: 1:9
Start line: 1:26
End line: 1:36
Do-All loops: ['1:11']
Reduction loops: []
Number of tasks: 24
Chunk limits: 1000
pragma: "for (i = 0; i < num-tasks; i++) #pragma omp task"
private: []
shared: []
first private: ['i']
reduction: []
last private: []
```

## Interpretation
The reported values shall be interpreted as follows:
* `Geometric decomposition at: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the first source code line of the potential geometrically decomposable code.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the suggested pattern.
* `Do-All loops: [<file_id>:<cu_id>]` specifies which [Do-All loops](doall.md) can be part of the geometric decomposition.
* `Reduction loops: [<file_id>:<cu_id>]` specifies which [Reduction loops](Reduction.md) can be part of the geometric decomposition.
* `Number of tasks: <int>` specifies the number of tasks which should or can be spawned in order to process the geometric decomposition.
* `Chunk limits: <int>` determines the size of a workload package (number of iterations) for each individual spawned task.
* `private, shared, first_private` and `last_private` indicate variables which should be mentioned within the respective OpenMP data sharing clauses.
* `reduction: [<operation>:<var>]` specifies a set of identified reduction operations and variables.
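
The interplay of `Number of tasks` and `Chunk limits` can be sketched with a small helper; `chunk_bounds` is a hypothetical function, not part of DiscoPoP. With the reported `Number of tasks: 24` and `Chunk limits: 1000`, task `t` would process the iteration range `[t*1000, (t+1)*1000)`:

```c
/* Illustrative helper: given a task id, the reported `Chunk limits`
 * value and the total iteration count, compute the half-open iteration
 * range [*start, *end) that this task should process. */
void chunk_bounds(int task_id, int chunk_limit, int total,
                  int *start, int *end) {
    *start = task_id * chunk_limit;
    *end = *start + chunk_limit;
    if (*end > total) *end = total;  /* the last chunk may be smaller */
}
```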


## Implementation
In order to implement a geometric decomposition, first open the source code file corresponding to `file_id` and navigate to line `Start line -> <line_num>`.
Insert `pragma` before each of the loops mentioned in `Do-all loops` and `Reduction loops`. Make sure to replace `num-tasks` with the specified `Number of tasks`, or insert a respective variable into the source code.
Modify the loop conditions of the original source code in order to allow a geometric decomposition. Each task should be responsible for processing a chunk of the size `Chunk limits`.
In order to ensure a valid parallelization, you need to add the following clauses to the OpenMP pragma, if the respective lists are not empty:
* `private` -> clause: `private(<vars>)`
* `shared` -> clause: `shared(<vars>)`
* `first_private` -> clause: `firstprivate(<vars>)`
* `last_private` -> clause: `lastprivate(<vars>)`
* `reduction`-> clause: `reduction(<operation>:<vars>)`

### Example
As an example, we will analyze the following code snippet for parallelization potential. Some location and meta data will be ignored for the sake of simplicity.

```
int main(void)
{
    int i;
    int d = 20, a = 22, b = 44, c = 90;
    for (i = 0; i < 100; i++) {
        a = foo(i, d);
        b = bar(a, d);
        c = delta(b, d);
    }
    a = b;
    return 0;
}
```

Analyzing this code snippet results in the following geometric decomposition suggestion:
```
Geometric decomposition at: 1:1
Start line: 1:2
End line: 1:12
Type: Geometric Decomposition Pattern
Do-All loops: ['1:3'] // line 5
Reduction loops: []
Number of tasks: 10
Chunk limits: 10
pragma: "for (i = 0; i < num-tasks; i++) #pragma omp task"
private: []
shared: []
first private: ['i']
reduction: []
last private: []
```

After interpreting and implementing the suggestion, the resulting, now parallel, source code could look as follows.
Since `i` has been used in the original source code already, the inserted `pragma` uses `x` instead.
As a last modification, the loop conditions in the original source code need to be modified slightly in order to allow the decomposition.
For a simpler interpretation of the example we have added the `chunk_size` and `tid` variables.
Note: Since the geometric decomposition relies on the identification of the thread number, the outermost `for` loop should be located inside a `parallel region`. However, depending on the specific analyzed source code, a surrounding `parallel region` might already exist or a different location for the surrounding `parallel region` may be more beneficial.

```
int main(void)
{
    int i;
    int d = 20, a = 22, b = 44, c = 90;

    #pragma omp parallel
    #pragma omp single
    for (int x = 0; x < 10; x++) {
        #pragma omp task
        {
            int tid = omp_get_thread_num();
            int chunk_size = 10; // value of Chunk limits

            for (i = tid * chunk_size; i < tid * chunk_size + chunk_size; i++) {
                a = foo(i, d);
                b = bar(a, d);
                c = delta(b, d);
            }
        }
    }

    a = b;
    return 0;
}
```
143 changes: 143 additions & 0 deletions docs/data/Parallel_patterns/pipeline.md
@@ -0,0 +1,143 @@
---
layout: default
title: Pipeline
has_children: true
parent: Parallel patterns
grand_parent: Data
nav_order: 3
---


# Pipeline

## Reporting

### Pipelines
Pipelines are reported in the following format:
```
Pipeline at: 1:11
Start line: 1:30
End line: 1:34
Stages:
<stage_1>
<stage_2>
...
```
The reported values shall be interpreted as follows:
* `Pipeline at: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the first source code line of the identified pipeline.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the pipeline loop.
* `Stages` defines a list of stages contained in the identified pipeline. The specific format of the stages is described in the following.

### Pipeline Stages
Individual stages of a pipeline are reported in the following format:
```
Node: 1:13
Start line: 1:31
End line: 1:31
pragma: "#pragma omp task"
first private: ['i']
private: []
shared: ['d', 'in']
reduction: []
InDeps: []
OutDeps: ['a']
InOutDeps: []
```

The reported values shall be interpreted as follows:
* `Node: <file_id>:<cu_id>`, where the respective parent file can be looked up in the `FileMapping.txt` using `file_id` and `cu_id` can be used for a look up in `Data.xml`
* `Start line: <file_id>:<line_num>`, where `line_num` refers to the first source code line of the identified pipeline stage.
* `End line: <file_id>:<line_num>`, where `line_num` refers to the last line of the stage.
* `pragma:` shows which type of OpenMP pragma shall be inserted before the `start line`.
* `private: [<vars>]` lists a set of variables which have been identified as thread-`private`
* The same interpretation applies to the following values as well:
* `shared`
* `first_private`
* `reduction: [<operation>:<var>]` specifies a set of identified reduction operations and variables.
* `InDeps: [<vars>]` specifies `in`-dependencies according to the [OpenMP depend clause](https://www.openmp.org/spec-html/5.0/openmpsu99.html).
* `OutDeps: [<vars>]` specifies `out`-dependencies according to the [OpenMP depend clause](https://www.openmp.org/spec-html/5.0/openmpsu99.html).
* `InOutDeps: [<vars>]` specifies `inout`-dependencies according to the [OpenMP depend clause](https://www.openmp.org/spec-html/5.0/openmpsu99.html).
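
The effect of the `depend` clauses can be sketched with a minimal two-stage example; the function and values below are made up for illustration and are not part of the report format. The second task reads `a` (`InDeps: ['a']`), so OpenMP may only start it after the first task, which writes `a` (`OutDeps: ['a']`), has finished:

```c
/* Minimal sketch: depend(out: a) on the producer and depend(in: a) on
 * the consumer serialize the two tasks, even though both are spawned
 * immediately by the single thread. */
int run_stages(int d) {
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(a) firstprivate(d) depend(out: a)
        a = d + 1;          /* stage 1: OutDeps: ['a'] */
        #pragma omp task shared(a, b) depend(in: a) depend(out: b)
        b = a * 2;          /* stage 2: InDeps: ['a'], OutDeps: ['b'] */
        #pragma omp taskwait
    }
    return b;
}
```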


## Implementation
In order to implement a suggested pipeline, first navigate to the source code location specified by `Pipeline at:`.
For each individual stage, the following OpenMP pragmas and clauses need to be added to the source code, if the respective lists are not empty:
* Insert `pragma` prior to the `start line` mentioned by the stage.
* If `private` is not empty, add the clause `private(<vars>)`, where vars are separated by commas to the pragma.
* Do the same for:
* `shared` -> clause: `shared(<vars>)`
* `first_private` -> clause: `firstprivate(<vars>)`
* `reduction`-> clause: `reduction(<operation>:<vars>)`
* `InDeps` -> clause: `depend(in:<vars>)`
* `OutDeps` -> clause: `depend(out:<vars>)`
* `InOutDeps` -> clause: `depend(inout:<vars>)`


### Example
As an example, we will analyze the following code snippet for parallelization potential. Some location and meta data will be ignored for the sake of simplicity.

```
int i;
int d = 20, a = 22, b = 44, c = 90;
for (i = 0; i < 100; i++) {
    a = foo(i, d);
    b = bar(a, d);
    c = delta(b, d);
}
a = b;
```

Analyzing this code snippet results in the following parallelization suggestion:
```
Pipeline at:
Start line: 1:3
End line: 1:7
Stages:
Node: 1:13
Start line: 1:4
End line: 1:4
pragma: "#pragma omp task"
first private: ['i']
private: []
shared: ['d', 'in']
reduction: []
InDeps: []
OutDeps: ['a']
InOutDeps: []
Start line: 1:5
End line: 1:5
pragma: "#pragma omp task"
first private: []
private: []
shared: ['d', 'in']
reduction: []
InDeps: ['a']
OutDeps: ['b']
InOutDeps: []
Start line: 1:6
End line: 1:7
pragma: "#pragma omp task"
first private: []
private: ['c']
shared: ['d', 'in']
reduction: []
InDeps: ['b']
OutDeps: []
InOutDeps: []
```

After interpreting and implementing the suggestion, the resulting, now parallel, source code could look as follows:

```
int i;
int d = 20, a = 22, b = 44, c = 90;
for (i = 0; i < 100; i++) {
    #pragma omp task firstprivate(i) shared(d, in) depend(out: a)
    a = foo(i, d);
    #pragma omp task shared(d, in) depend(in: a) depend(out: b)
    b = bar(a, d);
    #pragma omp task private(c) shared(d, in) depend(in: b)
    c = delta(b, d);
}
a = b;
```
