
Optimization of Layernormalization #103

Merged: 9 commits merged into main from ort_patch on Jan 31, 2024

Conversation

@luoyu-intel (Contributor) commented on Jan 31, 2024

Type of Change

Implement AVX2 and AVX512F layer normalization kernels in BesTLA.
Use the BesTLA kernel when src0 is contiguous.
~3x speedup.

Before:
GPTJ: perf_total_per_op_us[ NORM] = 0.477 ms
LLAMA2: perf_total_per_op_us[ RMS_NORM] = 0.362 ms

After:
GPTJ: perf_total_per_op_us[ NORM] = 0.055 ms
LLAMA2: perf_total_per_op_us[ RMS_NORM] = 0.117 ms
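
For illustration, below is a minimal sketch of an AVX2 row-wise layer normalization kernel in the spirit of this change. It is a standalone example, not the BesTLA implementation; the function name, signature, and epsilon default are assumptions.

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: normalize one contiguous row of n floats with AVX2.
// y[i] = (x[i] - mean) / sqrt(var + eps)
void layernorm_avx2_row(const float* x, float* y, size_t n, float eps = 1e-5f) {
  // Pass 1: accumulate sum and sum of squares, 8 lanes at a time.
  __m256 vsum = _mm256_setzero_ps();
  __m256 vsqr = _mm256_setzero_ps();
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m256 v = _mm256_loadu_ps(x + i);
    vsum = _mm256_add_ps(vsum, v);
    vsqr = _mm256_add_ps(vsqr, _mm256_mul_ps(v, v));
  }
  // Horizontal reduction of the two accumulators, plus a scalar tail.
  alignas(32) float t[8];
  _mm256_store_ps(t, vsum);
  float sum = t[0] + t[1] + t[2] + t[3] + t[4] + t[5] + t[6] + t[7];
  _mm256_store_ps(t, vsqr);
  float sqr = t[0] + t[1] + t[2] + t[3] + t[4] + t[5] + t[6] + t[7];
  for (size_t j = i; j < n; ++j) { sum += x[j]; sqr += x[j] * x[j]; }

  const float mean = sum / static_cast<float>(n);
  const float var = sqr / static_cast<float>(n) - mean * mean;
  const float inv_std = 1.0f / std::sqrt(var + eps);

  // Pass 2: apply (x - mean) * inv_std to the row.
  const __m256 vmean = _mm256_set1_ps(mean);
  const __m256 vinv = _mm256_set1_ps(inv_std);
  for (i = 0; i + 8 <= n; i += 8) {
    __m256 v = _mm256_loadu_ps(x + i);
    _mm256_storeu_ps(y + i, _mm256_mul_ps(_mm256_sub_ps(v, vmean), vinv));
  }
  for (size_t j = i; j < n; ++j) y[j] = (x[j] - mean) * inv_std;
}
```

An AVX512F variant would typically follow the same two-pass structure with 16-lane __m512 vectors, using masked loads and stores in place of the scalar tail loops.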

@luoyu-intel requested review from airMeng and zhewang1-intc and removed the request for airMeng on January 31, 2024 07:29
@zhewang1-intc (Contributor) left a comment:

LGTM

@airMeng (Contributor) left a comment:

I believe @zhewang1-intc implemented gemm+layernorm fusion before; is it necessary?

@@ -4,7 +4,7 @@ project(bestla LANGUAGES CXX VERSION 0.1.0)
file(GLOB headers ${PROJECT_NAME}/*.h ${PROJECT_NAME}/*.hpp)
file(GLOB xbyak_headers ${PROJECT_NAME}/xbyak/*.h ${PROJECT_NAME}/xbyak/*.hpp)

-option(BTLA_USE_OPENMP "Enable OpenMP thread pool" ON)
+option(BTLA_USE_OPENMP "Enable OpenMP thread pool" OFF)
Contributor (review comment on the diff above):

We already have a customized thread pool implemented?

@luoyu-intel (Contributor Author) replied:

It's better not to set it to ON by default; it can be enabled in neural_speed, which uses OpenMP by default.

@VincyZhang merged commit 98ffee4 into main on Jan 31, 2024
11 checks passed
@luoyu-intel (Contributor Author):

> I believe @zhewang1-intc implemented gemm+layernorm fusion before; is it necessary?

Yes, that fusion is for ONNX's LayerNormalization definition, which can include a scale and a bias.
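
For context, the standard ONNX LayerNormalization definition includes scale ($\gamma$) and bias ($\beta$) inputs on top of the normalization itself (paraphrased from the ONNX operator spec, not quoted from this PR):

$$y_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \, \gamma_i + \beta_i$$

where $\mu$ and $\sigma^2$ are the mean and variance over the normalized axis and $\epsilon$ is a small constant for numerical stability.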

@luoyu-intel deleted the ort_patch branch on May 21, 2024 03:28