
DNN kernels to support GPT decoder models and additional utilities #87

Merged: 70 commits, Feb 9, 2024
6d4ab58
banshee: add dump routine to banshee to avoid missing declarations
Jan 9, 2024
ed7f3f3
sw: add loop unrolled FP32 baseline GEMM
Jan 10, 2024
68f2cc7
sw: formatting and cluster offset rename due to Occamy build err
Jan 10, 2024
a18cdd7
dnn: Refactor and verify layernorm
colluca Oct 26, 2023
b7c4fe7
layernorm: Trivial multi-cluster implementation
colluca Oct 28, 2023
bb4ab29
Extend layout utils to accept HW config as input
colluca Oct 28, 2023
69deb71
softmax: Add IPC verification
colluca Oct 30, 2023
7abcc5c
sw: remove softmax + layernorm from run script due to missing FDIV an…
Jan 12, 2024
0fcdf55
lint: fix too long line err
Jan 12, 2024
35bd898
gemm: Add multi-precision verification
colluca Nov 13, 2023
e0320da
lint: GEMM verification script
Jan 26, 2024
f5215fb
gemm: Add tiling
colluca Nov 13, 2023
da71153
sw: fix datagen script in GEMM
Jan 26, 2024
16f47ef
sw: fix build err due to datagen script in GEMM
Jan 26, 2024
60b5b8f
sw: explicit undef of BIST due to build err
Jan 26, 2024
c4d80ff
dnn: Add FlashAttention-2 layer
colluca Nov 13, 2023
eed5bbe
sw: typo fix in run script
Jan 29, 2024
1e635a5
sw: remove flashattention from verification due to stalls
Jan 29, 2024
46b5da5
lint: run script
Jan 29, 2024
68c337f
dnn: Refactor and verify GeLU
colluca Nov 8, 2023
19eef47
util/sim: Extend struct definition to include array initializers
colluca Nov 12, 2023
5447eb0
dnn: Add Concat layer
colluca Nov 12, 2023
3916793
lint: python and C sources
Jan 29, 2024
6597471
sw: Add convenience 2D tile DMA transfer functions
colluca Nov 14, 2023
3fc5d9d
lint: gemm kernel and dnn lib
Jan 29, 2024
8be9c7f
snRuntime: Add global reduction function
colluca Nov 14, 2023
82e3baa
lint: global reduction runtime
Jan 29, 2024
0748064
gemm: Add options to parallelize over K and bypass DMA
colluca Nov 14, 2023
8e4a6c1
lint: gemm based kernels and DMA header
Jan 29, 2024
0012f38
dnn: Add FusedConcatLinear layer
colluca Nov 12, 2023
4f420a2
sw: remove double concat def
Jan 29, 2024
34f9653
dnn: Restore FlashAttention-2 until GEMM no-load options are fixed
colluca Nov 15, 2023
281e7e8
math: Add safe `asuint64` and `asdouble` functions
colluca Nov 16, 2023
2f6e272
lint: flashattention-2
Jan 29, 2024
4aa5de0
libm: fix safe double and integer casts
Jan 29, 2024
c55dcab
target/cfg: Add configuration with HW FDIV unit
colluca Jan 25, 2024
ef20364
ci: Add config with hardware FDIV unit
colluca Jan 26, 2024
1fa257c
dnn: add softmax to FDIV tests and Taylor exp approx
Jan 30, 2024
281e877
dnn: add layernorm to FDIV tests
Jan 30, 2024
07b01d8
dnn: add i-GELU and move test to FDIV tests
Jan 30, 2024
34461d0
gemm: add baseline flag
Jan 31, 2024
ccf41dd
flashattention: move tensors into -1 to 1 range
Jan 31, 2024
3e6c7dd
flashattention: bug fix and preliminary exp functions due to HW bug
Jan 31, 2024
ec76c84
tests: add FA-2 to FDIV tests
Jan 31, 2024
a05e8d5
lint: FA-2 and GEMM
Jan 31, 2024
1628432
clang: remove non-existent files
Feb 2, 2024
de6a9cd
verification: move prec and data type defs to data utils
Feb 2, 2024
bbef403
lint: remove dnn datagen from excluded files
Feb 2, 2024
1d0817d
sw: remove undef of BIST in GEMM
Feb 2, 2024
3f6c5b0
sw: remove comments
Feb 2, 2024
2cf6d2d
sw: add CONV2D back to compiled apps and align with verification fram…
Feb 5, 2024
9992068
sw: add FusedConv to compiled apps w/o verification
Feb 5, 2024
3994a7a
sw: remove CONV2D from verification due to bug in script
Feb 5, 2024
b5720b3
sw: correct verification script (kernel still failing)
Feb 7, 2024
c0171db
sw: add baseline opt as struct flag for GEMM
Feb 7, 2024
97f2bb5
sw: verification framework for FusedConv (failing)
Feb 7, 2024
c908a5b
lint: data utils
Feb 7, 2024
a60890e
lint: gemm
Feb 7, 2024
7715ffc
Implement revisions
colluca Feb 8, 2024
ae1e5dc
conv2d: Use `SNRT_CLUSTER_OFFSET`
colluca Feb 8, 2024
65b9e4f
Correct linting
colluca Feb 8, 2024
431a55f
Remove broken Linear layer
colluca Feb 8, 2024
100c92b
Use `expf` from `math.h` in FlashAttention and Softmax
colluca Feb 8, 2024
7bf1fee
Fix linting
colluca Feb 8, 2024
7634ab6
docs: Add `data_utils` documentation
colluca Feb 8, 2024
630ce2a
util/container: Update Bender installation method after cargo fail
colluca Feb 8, 2024
989521b
ci: Free up disk space
colluca Feb 9, 2024
272f131
ci: Fix linting
colluca Feb 9, 2024
60c2e8c
whatever
colluca Feb 9, 2024
9de5cf0
Delete dnn/gemm
colluca Feb 9, 2024
2 changes: 1 addition & 1 deletion .clang-format
@@ -5,4 +5,4 @@
# The CI runs on `clang-format` version 10
BasedOnStyle: Google
IndentWidth: 4
IncludeBlocks: Preserve
IncludeBlocks: Preserve
12 changes: 12 additions & 0 deletions .github/workflows/build-docker.yml
@@ -13,6 +13,18 @@ jobs:
name: Deploy Docker image
runs-on: ubuntu-22.04
steps:
# Free up disk space on Github-hosted runner
- name: Disk usage
run: df -h
- uses: jlumbroso/[email protected]
with:
android: true
dotnet: true
haskell: true
large-packages: true
- name: Disk usage after freeing up space
run: df -h
# Actually build the Docker container
- uses: actions/checkout@v2
- uses: docker/setup-buildx-action@v1
- name: GHCR Log-in
24 changes: 24 additions & 0 deletions .github/workflows/ci.yml
@@ -46,6 +46,30 @@ jobs:
run: |
./run.py sw/run.yaml --simulator verilator -j

# Tests requiring hardware FDIV unit
sw-snitch-cluster-fdiv-vlt:
name: Simulate FDIV SW on Snitch Cluster w/ Verilator
runs-on: ubuntu-22.04
container:
image: ghcr.io/pulp-platform/snitch_cluster:main
steps:
- uses: actions/checkout@v2
with:
submodules: 'recursive'
- name: Build Software
working-directory: target/snitch_cluster
run: |
bender vendor init
make CFG_OVERRIDE=cfg/fdiv.hjson sw
- name: Build Hardware
working-directory: target/snitch_cluster
run: |
make CFG_OVERRIDE=cfg/fdiv.hjson bin/snitch_cluster.vlt
- name: Run Tests
working-directory: target/snitch_cluster
run: |
./run.py sw/fdiv.yaml --simulator verilator -j

#########################################
# Build SW on Snitch Cluster w/ Banshee #
#########################################
8 changes: 8 additions & 0 deletions .gitlab-ci.yml
@@ -134,3 +134,11 @@ snitch-cluster-banshee:
- cargo install --debug --path .
- cd ../target/snitch_cluster
- ./run.py sw/run.yaml --simulator banshee -j --run-dir runs/banshee

# Tests requiring hardware FDIV unit
snitch-cluster-fdiv-vsim:
script:
- cd target/snitch_cluster
- make CFG_OVERRIDE=cfg/fdiv.hjson sw
- make bin/snitch_cluster.vsim
- ./run.py sw/fdiv.yaml --simulator vsim -j --run-dir runs/vsim
1 change: 1 addition & 0 deletions docs/rm/sim/data_utils.md
@@ -0,0 +1 @@
::: data_utils
5 changes: 3 additions & 2 deletions mkdocs.yml
@@ -18,8 +18,8 @@ markdown_extensions:
- pymdownx.superfences
- pymdownx.tabbed
- pymdownx.emoji:
emoji_index: !!python/name:materialx.emoji.twemoji
emoji_generator: !!python/name:materialx.emoji.to_svg
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
plugins:
- include-markdown
- mkdocstrings:
@@ -54,6 +54,7 @@ nav:
# - Solder: rm/solder.md
- Software:
- Simulation Utilities:
- data_utils: rm/sim/data_utils.md
- sim_utils: rm/sim/sim_utils.md
- rm/sim/Simulation.md
- rm/sim/Simulator.md
1 change: 1 addition & 0 deletions python-requirements.txt
@@ -11,6 +11,7 @@ hjson
jsonref
jsonschema
mako
mkdocs-material
progressbar2
tabulate
yamllint
14 changes: 7 additions & 7 deletions sw/blas/axpy/data/datagen.py
@@ -11,8 +11,8 @@
import os

sys.path.append(os.path.join(os.path.dirname(__file__), "../../../../util/sim/"))
from data_utils import format_scalar_definition, format_vector_definition, \
format_vector_declaration, format_ifdef_wrapper # noqa: E402
from data_utils import format_scalar_definition, format_array_definition, \
format_array_declaration, format_ifdef_wrapper # noqa: E402

MIN = -1000
MAX = +1000
@@ -47,16 +47,16 @@ def main():
a = np.random.uniform(MIN, MAX, 1)
x = np.random.uniform(MIN, MAX, length)
y = np.random.uniform(MIN, MAX, length)
z = np.zeros(length)
g = golden_model(a, x, y)

# Format header file
l_str = format_scalar_definition('const uint32_t', 'l', length)
a_str = format_scalar_definition('const double', 'a', a[0])
x_str = format_vector_definition('double', 'x', x, alignment=BURST_ALIGNMENT, section=section)
y_str = format_vector_definition('double', 'y', y, alignment=BURST_ALIGNMENT, section=section)
z_str = format_vector_declaration('double', 'z', z, alignment=BURST_ALIGNMENT, section=section)
g_str = format_vector_definition('double', 'g', g)
x_str = format_array_definition('double', 'x', x, alignment=BURST_ALIGNMENT, section=section)
y_str = format_array_definition('double', 'y', y, alignment=BURST_ALIGNMENT, section=section)
z_str = format_array_declaration('double', 'z', [length],
alignment=BURST_ALIGNMENT, section=section)
g_str = format_array_definition('double', 'g', g)
g_str = format_ifdef_wrapper('BIST', g_str)
f_str = '\n\n'.join([l_str, a_str, x_str, y_str, z_str, g_str])
f_str += '\n'
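The datagen diff above swaps the old `format_vector_*` helpers for `format_array_*` variants that take a shape, so an uninitialized output buffer like `z` can be declared without materializing its contents. The following is a hypothetical sketch of what such a helper might emit; the real implementation lives in `util/sim/data_utils.py` and may differ in details.

```python
# Hypothetical sketch of format_array_declaration: emit a C array declaration
# with optional alignment and linker-section attributes. Names and output
# format are assumptions for illustration, not the repository's actual helper.

def format_array_declaration(ctype, name, shape, alignment=None, section=None):
    dims = ''.join(f'[{d}]' for d in shape)
    attrs = []
    if alignment is not None:
        attrs.append(f'aligned({alignment})')
    if section is not None:
        attrs.append(f'section("{section}")')
    suffix = f' __attribute__(({", ".join(attrs)}))' if attrs else ''
    return f'{ctype} {name}{dims}{suffix};'

print(format_array_declaration('double', 'z', [16], alignment=4096))
```

A shape argument like `[length]` is enough to reserve the `z` result buffer without passing a NumPy array of zeros, which is what the `z = np.zeros(length)` removal in the diff suggests.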
10 changes: 5 additions & 5 deletions sw/blas/axpy/verify.py
@@ -13,7 +13,7 @@
sys.path.append(str(Path(__file__).parent / '../../../util/sim/'))
import verification # noqa: E402
from elf import Elf # noqa: E402
from data_utils import bytes_to_doubles # noqa: E402
from data_utils import from_buffer # noqa: E402


ERR_THRESHOLD = 1E-10
@@ -27,16 +27,16 @@ def main():
symbols_bin=args.symbols_bin,
log=args.log,
output_uids=['z'])
z_actual = np.array(bytes_to_doubles(raw_results['z']))
z_actual = from_buffer(raw_results['z'], 'double')

# Extract input operands from ELF file
if args.symbols_bin:
elf = Elf(args.symbols_bin)
else:
elf = Elf(args.snitch_bin)
a = np.array(bytes_to_doubles(elf.get_symbol_contents('a')))
x = np.array(bytes_to_doubles(elf.get_symbol_contents('x')))
y = np.array(bytes_to_doubles(elf.get_symbol_contents('y')))
a = elf.from_symbol('a', 'double')
x = elf.from_symbol('x', 'double')
y = elf.from_symbol('y', 'double')

# Verify results
z_golden = golden_model(a, x, y)
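The verify.py diff replaces the type-specific `bytes_to_doubles` with a generic `from_buffer` that takes a C type name. A minimal sketch of that generalization, assuming a simple ctype-to-dtype mapping (the actual `data_utils` implementation may support more types and a richer interface):

```python
import numpy as np

# Sketch: map a C type name to a NumPy dtype and reinterpret raw symbol
# bytes, replacing one helper per type with a single parameterized one.
CTYPE_TO_DTYPE = {'double': np.float64, 'float': np.float32, 'uint32_t': np.uint32}

def from_buffer(buf, ctype='double'):
    return np.frombuffer(buf, dtype=CTYPE_TO_DTYPE[ctype])

z = np.array([1.5, -2.25, 3.0])
assert np.array_equal(from_buffer(z.tobytes(), 'double'), z)
```

The same idea presumably backs `elf.from_symbol(name, ctype)`, which reads a symbol's bytes from the ELF and deserializes them in one call.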
2 changes: 1 addition & 1 deletion sw/blas/gemm/Makefile
@@ -9,7 +9,7 @@ MK_DIR := $(dir $(realpath $(lastword $(MAKEFILE_LIST))))
DATA_DIR := $(realpath $(MK_DIR)/data)
SRC_DIR := $(realpath $(MK_DIR)/src)

DATA_CFG ?= $(DATA_DIR)/params.hjson
DATA_CFG ?= $(DATA_DIR)/params.json
SECTION ?=

APP ?= gemm
72 changes: 47 additions & 25 deletions sw/blas/gemm/data/datagen.py
@@ -9,13 +9,13 @@
import numpy as np
import argparse
import pathlib
import hjson
import json5
import sys
import os

sys.path.append(os.path.join(os.path.dirname(__file__), "../../../../util/sim/"))
from data_utils import emit_license, format_scalar_definition, \
format_vector_definition, format_ifdef_wrapper # noqa: E402
format_array_definition, format_ifdef_wrapper # noqa: E402


np.random.seed(42)
@@ -52,25 +52,41 @@ def emit_header(**kwargs):

# Generate random input matrices
dtype = NUMPY_TYPES[str(kwargs['prec'])]
M, N, K = kwargs['M'], kwargs['N'], kwargs['K']
m_tiles = kwargs['m_tiles']
n_tiles = kwargs['n_tiles']
k_tiles = kwargs['k_tiles']
parallelize_m = kwargs['parallelize_m']
parallelize_k = kwargs['parallelize_k']
baseline = kwargs['baseline']

assert (M % m_tiles) == 0, 'M is not an integer multiple of tile size'
assert (N % n_tiles) == 0, 'N is not an integer multiple of tile size'
assert (K % k_tiles) == 0, 'K is not an integer multiple of tile size'
frac_m = M / m_tiles
assert (frac_m % 8) == 0, 'frac_m is not an integer multiple of the number of cores per' \
'cluster'
assert not (parallelize_m and parallelize_k), 'Cannot parallelize K and M simultaneously'

if (kwargs['prec']) == 8:
# sign -1 or 1
sign_a = np.random.randint(0, 2, (kwargs['M'], kwargs['K'])).astype(dtype)
sign_a = np.random.randint(0, 2, (M, K)).astype(dtype)
# esponent < 0b01111
exponent_a = np.random.randint(0, 16, (kwargs['M'], kwargs['K'])).astype(dtype)
exponent_a = np.random.randint(0, 16, (M, K)).astype(dtype)
# mantissa can be arbitrary
mantissa_a = np.random.randint(0, 4, (kwargs['M'], kwargs['K'])).astype(dtype)
mantissa_a = np.random.randint(0, 4, (M, K)).astype(dtype)
# sign -1 or 1
sign_b = np.random.randint(0, 2, (kwargs['K'], kwargs['N'])).astype(dtype)
sign_b = np.random.randint(0, 2, (K, N)).astype(dtype)
# esponent < 0b01111
exponent_b = np.random.randint(0, 16, (kwargs['K'], kwargs['N'])).astype(dtype)
exponent_b = np.random.randint(0, 16, (K, N)).astype(dtype)
# mantissa can be arbitrary
mantissa_b = np.random.randint(0, 4, (kwargs['K'], kwargs['N'])).astype(dtype)
mantissa_b = np.random.randint(0, 4, (K, N)).astype(dtype)
# sign -1 or 1
sign_c = np.random.randint(0, 2, (kwargs['M'], kwargs['N'])).astype(dtype)
sign_c = np.random.randint(0, 2, (M, N)).astype(dtype)
# esponent < 0b01111
exponent_c = np.random.randint(0, 16, (kwargs['M'], kwargs['N'])).astype(dtype)
exponent_c = np.random.randint(0, 16, (M, N)).astype(dtype)
# mantissa can be arbitrary
mantissa_c = np.random.randint(0, 4, (kwargs['M'], kwargs['N'])).astype(dtype)
mantissa_c = np.random.randint(0, 4, (M, N)).astype(dtype)
_a = ((-1.0)**sign_a.astype(np.double))*(2.0**(exponent_a.astype(np.double)-15.0)) \
* (1.0 + mantissa_a.astype(np.double) / (2**2))
_b = ((-1.0)**sign_b.astype(np.double))*(2.0**(exponent_b.astype(np.double)-15.0)) \
@@ -82,36 +98,42 @@ def emit_header(**kwargs):
b = sign_b << 7 | exponent_b << FP8_FORMATS['fp8']['mant'] | mantissa_b
c = sign_c << 7 | exponent_c << FP8_FORMATS['fp8']['mant'] | mantissa_c
else:
a = np.random.rand(kwargs['M'], kwargs['K']).astype(dtype)
b = np.random.rand(kwargs['K'], kwargs['N']).astype(dtype)
c = np.random.rand(kwargs['M'], kwargs['N']).astype(dtype)
a = np.random.rand(M, K).astype(dtype)
b = np.random.rand(K, N).astype(dtype)
c = np.random.rand(M, N).astype(dtype)
result = golden_model(1, a, b, kwargs['beta'], c)

# Store matrices in transposed form if requested
a = a.T if kwargs['ta'] else a
b = b.T if kwargs['tb'] else b

data_str = [emit_license()]
data_str += [format_scalar_definition('uint32_t', 'M', kwargs['M'])]
data_str += [format_scalar_definition('uint32_t', 'N', kwargs['N'])]
data_str += [format_scalar_definition('uint32_t', 'K', kwargs['K'])]
data_str += [format_scalar_definition('uint32_t', 'M', M)]
data_str += [format_scalar_definition('uint32_t', 'N', N)]
data_str += [format_scalar_definition('uint32_t', 'K', K)]
data_str += [format_scalar_definition('uint32_t', 'TA', int(kwargs['ta']))]
data_str += [format_scalar_definition('uint32_t', 'TB', int(kwargs['tb']))]
data_str += [format_scalar_definition('uint32_t', 'BETA', kwargs['beta'])]
data_str += [format_scalar_definition('uint32_t', 'dtype_size', kwargs['prec']//8)]
data_str += [format_scalar_definition('uint32_t', 'expand', kwargs['expand'])]
data_str += [format_vector_definition(C_TYPES[str(kwargs['prec'])], 'a', a.flatten(),
data_str += [format_scalar_definition('uint32_t', 'm_tiles', kwargs['m_tiles'])]
data_str += [format_scalar_definition('uint32_t', 'n_tiles', kwargs['n_tiles'])]
data_str += [format_scalar_definition('uint32_t', 'k_tiles', kwargs['k_tiles'])]
data_str += [format_scalar_definition('uint32_t', 'parallelize_m', kwargs['parallelize_m'])]
data_str += [format_scalar_definition('uint32_t', 'parallelize_k', kwargs['parallelize_k'])]
data_str += [format_scalar_definition('uint32_t', 'baseline', int(baseline))]
data_str += [format_array_definition(C_TYPES[str(kwargs['prec'])], 'a', a.flatten(),
alignment=BURST_ALIGNMENT, section=kwargs['section'])]
data_str += [format_vector_definition(C_TYPES[str(kwargs['prec'])], 'b', b.flatten(),
data_str += [format_array_definition(C_TYPES[str(kwargs['prec'])], 'b', b.flatten(),
alignment=BURST_ALIGNMENT, section=kwargs['section'])]
data_str += [format_vector_definition(C_TYPES[str(kwargs['prec'])], 'c', c.flatten(),
data_str += [format_array_definition(C_TYPES[str(kwargs['prec'])], 'c', c.flatten(),
alignment=BURST_ALIGNMENT, section=kwargs['section'])]
if kwargs['prec'] == 8:
result_def = format_vector_definition(C_TYPES['64'], 'result', result.flatten())
result_def = format_array_definition(C_TYPES['64'], 'result', result.flatten())
else:
result_def = format_vector_definition(C_TYPES[str(kwargs['prec'])],
'result',
result.flatten())
result_def = format_array_definition(C_TYPES[str(kwargs['prec'])],
'result',
result.flatten())
data_str += [format_ifdef_wrapper('BIST', result_def)]
data_str = '\n\n'.join(data_str)

Expand All @@ -135,7 +157,7 @@ def main():

# Load param config file
with args.cfg.open() as f:
param = hjson.loads(f.read())
param = json5.loads(f.read())
param['section'] = args.section

# Emit header file
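For `prec == 8`, the datagen above builds FP8 test vectors by drawing sign, exponent, and mantissa fields separately and packing them into a byte. The field widths below are inferred from the diff (`mantissa / 2**2`, `sign << 7`, bias 15), i.e. 1 sign bit, 5 exponent bits, 2 mantissa bits; treat this as an illustration of the packing, not an authoritative FP8 spec.

```python
# Sketch of the FP8 bit packing used for the prec == 8 test vectors:
# 1 sign bit | 5 exponent bits (bias 15) | 2 mantissa bits.

def fp8_encode(sign, exponent, mantissa):
    # Pack the three fields into one byte, as in the datagen's
    # sign << 7 | exponent << mant_bits | mantissa expression.
    return (sign << 7) | (exponent << 2) | mantissa

def fp8_value(sign, exponent, mantissa):
    # Real value of a (normal) encoding: (-1)^s * 2^(e - bias) * (1 + m/4).
    return ((-1.0) ** sign) * 2.0 ** (exponent - 15) * (1.0 + mantissa / 4.0)

assert fp8_encode(1, 3, 2) == 0b10001110
assert fp8_value(0, 15, 0) == 1.0  # exponent equal to the bias encodes 2**0
```

Keeping the exponent below the bias (the `exponent < 0b01111` comments in the diff) bounds the magnitudes of the operands, which avoids overflow when products are accumulated in the golden model.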
@@ -12,5 +12,11 @@
ta: false,
tb: true, // must be true for SIMD
prec: 64,
expand: 0
expand: 0,
m_tiles: 2, // number of tiles in M dimension
k_tiles: 1, // number of tiles in K dimension
n_tiles: 1, // number of tiles in N dimension
parallelize_k: 0,
parallelize_m: 0,
baseline: false
}
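A config like the one above must satisfy the constraints the datagen script asserts: each tile count divides its dimension, the per-tile M fraction is a multiple of the 8 compute cores per cluster, and K- and M-parallelization are mutually exclusive. A short sketch of those checks, with hypothetical M/N/K values (the real ones are elided from this diff):

```python
# Hedged sketch of the tiling sanity checks from the GEMM datagen script.
# The M/N/K values are hypothetical examples, not the repository's config.
cfg = {'M': 192, 'N': 16, 'K': 16,
       'm_tiles': 2, 'n_tiles': 1, 'k_tiles': 1,
       'parallelize_m': 0, 'parallelize_k': 0}

assert cfg['M'] % cfg['m_tiles'] == 0, 'M is not an integer multiple of tile size'
assert cfg['N'] % cfg['n_tiles'] == 0, 'N is not an integer multiple of tile size'
assert cfg['K'] % cfg['k_tiles'] == 0, 'K is not an integer multiple of tile size'
frac_m = cfg['M'] // cfg['m_tiles']
assert frac_m % 8 == 0, 'per-tile M must be a multiple of the 8 cores per cluster'
assert not (cfg['parallelize_m'] and cfg['parallelize_k'])
```

Note the switch from hjson to json5 for parsing this file: json5 tolerates the trailing commas and `//` comments used above while staying close to plain JSON.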