Conversation
LGTM
I believe @zhewang1-intc implemented gemm+layernorm fusion before; is this still necessary?
@@ -4,7 +4,7 @@ project(bestla LANGUAGES CXX VERSION 0.1.0)
 file(GLOB headers ${PROJECT_NAME}/*.h ${PROJECT_NAME}/*.hpp)
 file(GLOB xbyak_headers ${PROJECT_NAME}/xbyak/*.h ${PROJECT_NAME}/xbyak/*.hpp)

-option(BTLA_USE_OPENMP "Enable OpenMP thread pool" ON)
+option(BTLA_USE_OPENMP "Enable OpenMP thread pool" OFF)
We already have a customized thread pool implemented?
It's better not to set it to ON by default. It can be set in neural_speed, which uses OMP by default.
Yes, it follows ONNX's definition: it can have scale and bias.
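For context, ONNX's LayerNormalization normalizes each row and then applies an optional per-element scale (gamma) and bias (beta). A minimal scalar sketch of that definition; the function name and signature are hypothetical, not the BesTLA API:

```cpp
#include <cmath>
#include <cstddef>

// Reference (scalar) layer norm over one row of n floats:
// y = (x - mean) / sqrt(var + eps), optionally scaled and shifted.
void layernorm_ref(const float* x, const float* scale, const float* bias,
                   float* y, size_t n, float eps = 1e-5f) {
  float mean = 0.f, var = 0.f;
  for (size_t i = 0; i < n; ++i) mean += x[i];
  mean /= n;
  for (size_t i = 0; i < n; ++i) var += (x[i] - mean) * (x[i] - mean);
  var /= n;
  const float inv_std = 1.0f / std::sqrt(var + eps);
  for (size_t i = 0; i < n; ++i) {
    float v = (x[i] - mean) * inv_std;
    if (scale) v *= scale[i];  // optional per-element scale (gamma)
    if (bias)  v += bias[i];   // optional per-element bias (beta)
    y[i] = v;
  }
}
```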
Type of Change
Implement AVX2 and AVX512F layer normalization in BesTLA (an illustrative AVX2 sketch follows the perf numbers below).
Use the BesTLA kernel when src0 is contiguous.
~3x speedup.
Before:
GPTJ: perf_total_per_op_us[ NORM] = 0.477 ms
LLAMA2: perf_total_per_op_us[ RMS_NORM] = 0.362 ms
After:
GPTJ: perf_total_per_op_us[ NORM] = 0.055 ms
LLAMA2: perf_total_per_op_us[ RMS_NORM] = 0.117 ms
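For readers unfamiliar with the vectorization, below is an illustrative single-row AVX2 layer-norm sketch. It is not the actual BesTLA kernel: the function name, signature, and three-pass structure are assumptions, and the real code only takes this path when src0 is contiguous, otherwise it keeps the existing scalar path. Compile with -mavx2 -mfma.

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstddef>

// Horizontal sum of a 256-bit float vector.
static inline float hsum256_ps(__m256 v) {
  __m128 lo = _mm256_castps256_ps128(v);
  __m128 hi = _mm256_extractf128_ps(v, 1);
  lo = _mm_add_ps(lo, hi);
  lo = _mm_hadd_ps(lo, lo);
  lo = _mm_hadd_ps(lo, lo);
  return _mm_cvtss_f32(lo);
}

// Normalize one contiguous row of n floats in place:
// x = (x - mean) / sqrt(var + eps) * scale + bias
void layernorm_avx2_row(float* x, const float* scale, const float* bias,
                        size_t n, float eps) {
  // Pass 1: mean.
  __m256 vsum = _mm256_setzero_ps();
  size_t i = 0;
  for (; i + 8 <= n; i += 8) vsum = _mm256_add_ps(vsum, _mm256_loadu_ps(x + i));
  float sum = hsum256_ps(vsum);
  for (; i < n; ++i) sum += x[i];
  const float mean = sum / n;

  // Pass 2: variance.
  const __m256 vmean = _mm256_set1_ps(mean);
  __m256 vvar = _mm256_setzero_ps();
  for (i = 0; i + 8 <= n; i += 8) {
    __m256 d = _mm256_sub_ps(_mm256_loadu_ps(x + i), vmean);
    vvar = _mm256_fmadd_ps(d, d, vvar);  // requires FMA
  }
  float var = hsum256_ps(vvar);
  for (; i < n; ++i) { float d = x[i] - mean; var += d * d; }
  const float inv_std = 1.0f / std::sqrt(var / n + eps);

  // Pass 3: normalize, then apply optional scale and bias (ONNX definition).
  const __m256 vinv = _mm256_set1_ps(inv_std);
  for (i = 0; i + 8 <= n; i += 8) {
    __m256 y = _mm256_mul_ps(_mm256_sub_ps(_mm256_loadu_ps(x + i), vmean), vinv);
    if (scale) y = _mm256_mul_ps(y, _mm256_loadu_ps(scale + i));
    if (bias)  y = _mm256_add_ps(y, _mm256_loadu_ps(bias + i));
    _mm256_storeu_ps(x + i, y);
  }
  for (; i < n; ++i) {
    float y = (x[i] - mean) * inv_std;
    if (scale) y *= scale[i];
    if (bias)  y += bias[i];
    x[i] = y;
  }
}
```

An AVX512F variant follows the same structure with 512-bit vectors, processing 16 floats per step instead of 8.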