[RFC] add heterogeneous computing capabilities to UADK #638

Open: wants to merge 8 commits into master


Liulongfang (Collaborator)

In the UADK framework, the hardware acceleration function and the
software acceleration function are merged so that software instruction
acceleration and hardware offload can run at the same time, giving users
higher performance.

With the heterogeneous scheduling mode enabled in the scheduler,
the measured performance is as follows:

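| Alg | Mode (1KB) | Performance, sync (MB/s) | Performance, async (MB/s) | CPU, sync | CPU, async |
|---|---|---|---|---|---|
| sm4-ecb | init1 (HW) | 454 | 1322 | 100% | 200.00% |
| | init2 (HW+CE) | 1445.1 | 1864 | 100% | 195.00% |
| | increase | 218.30% | 41.00% | 0.00% | -2.50% |
| sm3 | init1 (HW) | 153.1 | 1481 | 99% | 199.80% |
| | init2 (HW+CE) | 431.5 | 508 | 100% | 199.80% |
| | increase | 181.84% | -65.70% | 0.91% | 0.00% |

| Alg | Mode (8KB) | Performance, sync (MB/s) | Performance, async (MB/s) | CPU, sync | CPU, async |
|---|---|---|---|---|---|
| sm4-ecb | init1 (HW) | 1407.5 | 9092 | 100% | 198.00% |
| | init2 (HW+CE) | 3626.8 | 6021 | 100% | 199.80% |
| | increase | 157.68% | -33.78% | 0.00% | 0.91% |
| sm3 | init1 (HW) | 960.4 | 5161.1 | 100% | 183.80% |
| | init2 (HW+CE) | 549.6 | 530.1 | 100% | 199.80% |
| | increase | -42.77% | -89.73% | -0.40% | 8.71% |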

Without increasing CPU usage, synchronous-mode throughput improves substantially:
+218.30% for sm4-ecb and +181.84% for sm3 at 1KB, and +157.68% for sm4-ecb at 8KB
(sm3 at 8KB is the exception, at -42.77%).
In asynchronous mode, performance mostly drops because the CPU is also used
for the software calculations; this can be solved later by creating dedicated
calculation threads.

Longfang Liu added 8 commits October 8, 2024 15:00
Synchronize the internal development code to keep the basic functional
code consistent.

Signed-off-by: Longfang Liu <[email protected]>
Synchronize the interface layer code to ensure that the basic functions
are consistent before new functions are added.

Signed-off-by: Longfang Liu <[email protected]>
Synchronize the code of the UADK Tools test tool to ensure that it
works normally before new functions are added.

Signed-off-by: Longfang Liu <[email protected]>
Add a heterogeneous scheduling function to UADK. It combines hardware
computing acceleration with software instruction acceleration and
keeps both types of acceleration effective at the same time, which
improves acceleration capability.

Signed-off-by: Longfang Liu <[email protected]>
Add a scheduler for heterogeneous computing with a dynamic scheduling
scheme that balances the load between software and hardware computing
to maintain the best performance.

Signed-off-by: Longfang Liu <[email protected]>
Add heterogeneous hybrid computing for cipher,
digest and comp.

Signed-off-by: Longfang Liu <[email protected]>
Add heterogeneous computing functions to the soft and hard computing
drivers of UADK. Adapt the drivers to ensure that different devices
can perform heterogeneous computing at the same time and provide
acceleration functions.

To keep the build working, some drivers have been adapted for hac mode
and can be compiled directly with UADK_MK.SH.

Signed-off-by: Longfang Liu <[email protected]>
In uadk tools, enable heterogeneous computing in init2 mode for cipher
and digest, so that the init2 interface can complete heterogeneous
computing directly.

Signed-off-by: Longfang Liu <[email protected]>
Liulongfang (Collaborator, Author) commented Nov 27, 2024

Performance test results of the new framework:

SM3 1024B Performance (MB/s)

| tds (threads) | init1 (HW) | init1 (HW + CE) | increase |
|---|---|---|---|
| 1 | 393.3 | 437.1 | 11.14% |
| 2 | 762.1 | 823.4 | 8.04% |
| 4 | 1508.4 | 1564.1 | 3.69% |
| 8 | 3007.4 | 3074.9 | 2.24% |
| 16 | 4851.8 | 5429.2 | 11.90% |
| 32 | 4854.1 | 8698.8 | 79.21% |

SM4 1024B Performance (MB/s)

| tds (threads) | init1 (HW) | init1 (HW + CE) | increase |
|---|---|---|---|
| 1 | 461 | 1482.5 | 221.58% |
| 2 | 914 | 2575.4 | 181.77% |
| 4 | 1699.9 | 4737.6 | 178.70% |
| 8 | 3301.5 | 7327.8 | 121.95% |
| 16 | 5837.5 | 9737.4 | 66.81% |
| 32 | 8897.7 | 10432.4 | 17.25% |
