Merge pull request #215 from wpleonardo/master

Add 6 tech articles and the figures

liboyang0730 authored Dec 20, 2024
2 parents 8f026c8 + 1f1b5ce commit 39b1d3e

Showing 56 changed files with 1,007 additions and 0 deletions.
311 changes: 311 additions & 0 deletions docs/blogs/tech/parallel-execution-I.md


137 changes: 137 additions & 0 deletions docs/blogs/tech/parallel-execution-II.md
---
slug: parallel-execution-II
title: 'Mastering Parallel Execution in OceanBase Database: Part 2 - Set the DOP'
---

> The degree of parallelism (DOP) refers to the number of worker threads used for the execution of a single data flow operation (DFO). Parallel execution is designed to make full use of multi-core resources. In the parallel execution (PX) framework of OceanBase Database, you can manually specify a DOP or use the [Auto DOP](https://en.oceanbase.com/docs/common-oceanbase-database-10000000001105913) feature to allow the database to automatically select one. This article introduces how to manually specify a DOP.

This is the second article of a seven-part series on parallel execution.

* Part 1: [Introduction](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I)
* Part 2: [Set the DOP](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-II)
* Part 3: [Concurrency Control and Queuing](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-III)
* Part 4: [Parallel Execution Types](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-IV)
* Part 5: [Parallel Execution Parameters](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-V)
* Part 6: [Troubleshooting and Tuning Tips](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VI)
* Part 7: [Get Started with a PoC Test](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VII)

2.1 Manually Specify a DOP
-----------

Once you specify a DOP for a table, all scans of the table are executed in parallel.

### 2.1.1 Methods for Specifying a DOP

#### Specify a DOP by using a table attribute

The following statements respectively specify a DOP for a primary table and an index table.
![1](/img/blogs/tech/parallel-execution-II/1.png)
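The statements shown in the screenshot above can be sketched as follows. This is a hedged sketch only: the table name `t1` and index name `idx_t1_c2` are hypothetical, and the `PARALLEL` clause placement is assumed to follow OceanBase's table-attribute syntax.

```sql
-- Hypothetical sketch: set a DOP of 4 on the primary table
-- and a DOP of 2 on one of its indexes.
ALTER TABLE t1 PARALLEL 4;
CREATE INDEX idx_t1_c2 ON t1 (c2) PARALLEL 2;
```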


Assume that an SQL statement involves only one table. If the statement queries the primary table, all DFOs in the plan, not only those that scan the primary table, are executed with a DOP of 4. If the statement queries the index table, all DFOs, not only those that scan the index table, are executed with a DOP of 2.

If an SQL statement involves multiple tables, the maximum `PARALLEL` value is used as the DOP for the whole execution plan of the statement.

#### Specify a DOP by using a PARALLEL hint

You can use a global PARALLEL hint to specify a DOP for a whole SQL statement, or use a table-level PARALLEL hint to specify a DOP for a specific table. If PARALLEL hints are specified for multiple tables in an SQL statement, the DOP of the DFOs of each table is subject to the value of the corresponding table-level PARALLEL hint. If a DFO involves multiple tables, the maximum PARALLEL value is used as the DOP for the DFO.

![2](/img/blogs/tech/parallel-execution-II/2.png)
![3](/img/blogs/tech/parallel-execution-II/3.png)
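The two forms of the hint described above can be sketched like this. The table and column names are hypothetical; the hint spelling follows the article's own inline example.

```sql
-- Global hint: the whole statement runs with a DOP of 4.
SELECT /*+ PARALLEL(4) */ COUNT(*) FROM t1;

-- Table-level hints: DFOs of t1 use a DOP of 4, DFOs of t2 use a DOP of 2;
-- a DFO that involves both tables uses the maximum value, that is, 4.
SELECT /*+ PARALLEL(t1 4) PARALLEL(t2 2) */ *
FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```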


For a DML statement, the preceding hint only enables parallel execution for the query part of the statement. The write part is still executed serially. To enable parallel execution for the write part, you must also add the `ENABLE_PARALLEL_DML` hint. Here is an example:

![4](/img/blogs/tech/parallel-execution-II/4.png)
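A hedged sketch of such a statement, reusing the table names from the article's inline example below:

```sql
-- Global PARALLEL hint plus ENABLE_PARALLEL_DML: both the query part
-- and the write part of the INSERT run in parallel.
INSERT /*+ PARALLEL(3) ENABLE_PARALLEL_DML */ INTO t3
SELECT * FROM t1;
```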



Notice: You must specify a global PARALLEL hint for parallel DML. A table-level PARALLEL hint cannot enable parallel execution for the write part. For example, the following SQL statement does not enable parallel execution for the write part:

`insert /*+ parallel(t3 3) enable_parallel_dml */ into t3 select * from t1;`



#### Specify a DOP for a session

If you specify a DOP for a session, all query statements in the session are executed based on the specified DOP. Note that the specified DOP is used even for single-row query statements. This can compromise the query performance.

![5](/img/blogs/tech/parallel-execution-II/5.png)
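A hedged sketch, assuming the screenshot uses OceanBase's session variable `_FORCE_PARALLEL_QUERY_DOP` to force a session-level DOP:

```sql
-- Force every query statement in this session to run with a DOP of 3.
SET SESSION _FORCE_PARALLEL_QUERY_DOP = 3;
```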

For a DML statement, the preceding statement only enables parallel execution for the query part of the statement. The write part is still executed serially. To enable parallel execution for the write part, execute the following statement:

![6](/img/blogs/tech/parallel-execution-II/6.png)
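A hedged sketch, assuming the corresponding session variable for the write part is `_FORCE_PARALLEL_DML_DOP`:

```sql
-- Also force the write part of DML statements in this session
-- to run with a DOP of 3.
SET SESSION _FORCE_PARALLEL_DML_DOP = 3;
```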

### 2.1.2 DOP Priorities

The priorities of DOPs specified in different ways are sorted in descending order as follows: **DOP specified by a global hint > DOP specified by a table-level hint > DOP specified for a session > DOP specified for a table**.

The following example shows that when a global hint is specified, a table-level hint does not take effect.

![7](/img/blogs/tech/parallel-execution-II/7.png)
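This case can be sketched with one statement that carries both hint levels (table name hypothetical):

```sql
-- The global hint wins: the whole plan runs with a DOP of 2,
-- and the table-level PARALLEL(t1 4) hint does not take effect.
SELECT /*+ PARALLEL(2) PARALLEL(t1 4) */ COUNT(*) FROM t1;
```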


The following example shows that when a table-level hint is specified, the DOP specified for a session does not take effect.

![8](/img/blogs/tech/parallel-execution-II/8.png)

The following example shows that a DOP specified for a session takes precedence over one specified for a table through the `PARALLEL` attribute.

![9](/img/blogs/tech/parallel-execution-II/9.png)
![10](/img/blogs/tech/parallel-execution-II/10.png)
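A hedged sketch of this priority rule; the table name is hypothetical, and the session variable `_FORCE_PARALLEL_QUERY_DOP` is assumed to be the session-level mechanism shown earlier:

```sql
-- A DOP of 4 is set as a table attribute, but the session-level
-- DOP of 2 takes precedence for queries in this session.
ALTER TABLE t1 PARALLEL 4;
SET SESSION _FORCE_PARALLEL_QUERY_DOP = 2;
SELECT COUNT(*) FROM t1;  -- runs with a DOP of 2
```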



### 2.1.3 Principles for Specifying a DOP

Here are two basic principles: 1. **The purpose of setting a DOP is to make full use of the CPU resources**. 2. **A higher DOP does not guarantee better performance**. For example, when a tenant has 20 CPU cores:

* For simple single-table operations, such as scan, filtering, addition, deletion, and modification, the theoretical DOP is 20.
* For multi-table join queries and parallel DML (PDML) operations involving global indexes, the theoretical DOP is 10.
* For complex execution plans in the form of a right-deep tree, the theoretical DOP is around 7.

**Explanations** are as follows:

* For a single-table operation with only one DFO, all the 20 CPU cores can be allocated to this DFO.
* For a multi-table join, two DFOs are started at the same time to form a data pipeline, where one DFO is the producer and the other is the consumer. Each DFO can be allocated with 10 CPU cores.
* For a complex execution plan in the form of a right-deep tree, three DFOs are started at the same time. Each DFO can be allocated with around 7 CPU cores for efficient execution.

On top of the preceding **basic principles**, **tuning** is also required. Explanations are as follows:

* For a single-table operation with only one DFO, all the 20 CPU cores can be allocated to this DFO.

> If the DFO spends most of its time on I/O, setting the DOP to a value higher than 20, such as 25, can make full use of the CPU cores.
* For a multi-table join, two DFOs are started at the same time to form a data pipeline, where one DFO is the producer and the other is the consumer. Each DFO can be allocated with 10 CPU cores.

> However, it cannot be guaranteed that each DFO makes full use of the CPU cores allocated to it. Therefore, slightly increasing the DOP, to 15 for example, can make better use of the CPU cores. However, we recommend that you do not increase the DOP indefinitely. A DOP of 50 brings no benefit but increases the thread and framework scheduling overhead.
* For a complex execution plan in the form of a right-deep tree, three DFOs are started at the same time. Each DFO can be allocated with around 7 CPU cores for efficient execution.

> For an execution plan, three DFOs need to be started only in some steps. In most steps, only two DFOs need to be started at the same time. In this case, a DOP of 10 might be better than a DOP of 7.

After tuning, the situation of the tenant with 20 CPU cores may be as follows:

* For simple single-table operations, such as scan, filtering, addition, deletion, and modification, the **actual DOP** is 30.
* For multi-table join queries and PDML operations involving global indexes, the **actual DOP** is 15.
* For complex execution plans in the form of a right-deep tree, the **actual DOP** is between 10 and 15.
84 changes: 84 additions & 0 deletions docs/blogs/tech/parallel-execution-III.md
---
slug: parallel-execution-III
title: 'Mastering Parallel Execution in OceanBase Database: Part 3 - Concurrency Control and Queuing'
---

> Parallel queries may queue while waiting for threads. This article introduces thread management in parallel execution.

This is the third article of a seven-part series on parallel execution.

* Part 1: [Introduction](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-I)
* Part 2: [Set the DOP](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-II)
* Part 3: [Concurrency Control and Queuing](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-III)
* Part 4: [Parallel Execution Types](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-IV)
* Part 5: [Parallel Execution Parameters](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-V)
* Part 6: [Troubleshooting and Tuning Tips](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VI)
* Part 7: [Get Started with a PoC Test](https://oceanbase.github.io/docs/blogs/tech/parallel-execution-VII)

3.1 Concurrency Control in Parallel Execution
------------

You can use the tenant-level variable `PARALLEL_SERVERS_TARGET` to specify the number of parallel execution (PX) worker threads available for the tenant on each node. A parallel query requests worker threads from all related OBServer nodes before being executed. If any OBServer node fails to provide sufficient worker threads, the parallel query is not executed. In this case, the parallel query is put back in the queue. When it is scheduled the next time, it retries to request threads from the nodes until sufficient worker threads are obtained. After the whole query is completed, the requested worker threads are immediately released.
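Adjusting this variable can be sketched as follows; the value 64 is an arbitrary example, not a recommendation.

```sql
-- Allow up to 64 PX worker threads per node for the current tenant.
SET GLOBAL PARALLEL_SERVERS_TARGET = 64;
```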

The process of trying to request worker threads, requeuing due to insufficient thread resources, being scheduled again, and retrying to request worker threads is called **parallel query queuing**. The allocation of worker threads on all OBServer nodes is managed by a module named **PX resource manager**.

For each parallel query, the PX resource manager splits the execution plan of the query into multiple data flow operations (DFOs), simulates the DFO scheduling process, and calculates the maximum number of worker threads required for the query on each OBServer node based on the PARALLEL hint or table-level PARALLEL attribute. This group of threads is called a **resource vector**.

The resource vector is a logical concept used for concurrency control and queuing. After a parallel query requests sufficient worker threads from the PX resource manager based on the resource vector, the parallel query execution starts. During the execution, physical threads are requested and released as different DFOs are scheduled and executed. However, logical threads are not returned to the PX resource manager. The resource vector is returned to the PX resource manager only after the parallel query is completed.

When a large number of parallel queries try to request threads from the PX resource manager, threads are allocated on a first-come, first-served (FCFS) basis until no thread is left or the remaining threads are insufficient for any query. All subsequent queries wait in the queue and retry the request when they are scheduled again.



3.2 Allocation of PX Worker Threads
--------------

Each OBServer node of the tenant has a **PX thread pool** for executing parallel queries. When the threads in the thread pool are insufficient, the thread pool is dynamically scaled out. If threads in the thread pool remain idle for more than 10 minutes, the thread pool is scaled in to 10 threads. If threads in the thread pool remain idle for more than 60 minutes, the thread pool can be scaled in to 0 threads.

When a parallel query is scheduled, each DFO obtains the required PX threads from the PX thread pool on the corresponding OBServer node. By default, the number of threads allocated to a DFO on an OBServer node cannot exceed the value of `MIN_CPU` of the tenant × 10. If a DFO requests more threads than this, the thread pool allocates only `MIN_CPU` × 10 threads to the DFO.



3.3 Two-level Resource Control Model
------------

Any parallel query experiences two levels of resource control.

* Global control: The parallel query requests a resource vector with sufficient PX threads from the PX resource manager.
* Local control: The PX thread pool allocates the expected number of physical threads.

**Global control** handles resource acquisition across the distributed system, while **local control** handles only resource allocation in the thread pool on a single node. Global control ensures that a query that passes the check can be executed successfully, avoiding the situation where resources cannot be obtained at run time. Local control ensures that, in extreme circumstances, a single DFO of a query does not request an excessively large number of physical threads, avoiding waste of thread resources. A parallel query that passes the check in the global control phase can be successfully executed: sufficient physical threads are available for it regardless of the degree of parallelism (DOP).

![1705634075](/img/blogs/tech/parallel-execution-III/1705634075415.png)



3.4 View Related to PX Resource Manager
-----------------

You can query the `GV$OB_PX_TARGET_MONITOR` view for the thread usage information managed by the PX resource manager on each OBServer node of a tenant. For more information about the fields in the view, see the **GV$OB_PX_TARGET_MONITOR** topic of the OceanBase Database documentation.

![1](/img/blogs/tech/parallel-execution-III/1.png)
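A minimal sketch of querying the view (in MySQL mode the view may need to be qualified with the `oceanbase` schema; the bare name is used here for brevity):

```sql
-- Inspect PX thread usage on each OBServer node of the tenant.
SELECT * FROM GV$OB_PX_TARGET_MONITOR;
```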


The global resource usage status queried at a specific moment may differ across OBServer nodes. However, the global status is synchronized every 500 ms in the background, so the status queried on different OBServer nodes is generally consistent, without obvious deviations.