Cuda heat example w quaditer #913
Draft
Abdelrahman912
wants to merge
144
commits into
Ferrite-FEM:master
from
Abdelrahman912:cuda-heat-example-w-quaditer
Commits
a979fb2
Initial ideas
KnutAM 298158c
Working implementation
KnutAM 1794db3
Merge branch 'master' into kam/QuadraturePointIterator
KnutAM 51ab4f2
Add static values version and improve interface
KnutAM 22a7377
Add dev example and test
KnutAM 18377f3
Merge branch 'master' into kam/QuadraturePointIterator
KnutAM 27a3a96
Add StaticCellValues without stored cell coordinates
KnutAM 95b5729
initial ideas
Abdelrahman912 d4e881d
minor changes
Abdelrahman912 f55b878
Merge branch 'Ferrite-FEM:master' into cuda-heat-example-w-quaditer
Abdelrahman912 c1ef6ad
add some abstractions
Abdelrahman912 394ac6a
add minor comment
Abdelrahman912 1f0df67
add z dierction for numerical integration
Abdelrahman912 3152042
add Float32
Abdelrahman912 aac5994
minor fix
Abdelrahman912 142f89a
init coloring implementation
Abdelrahman912 eaff534
init working on the assembler
Abdelrahman912 ffdc341
init gpu_assembler
Abdelrahman912 59595e8
implement naive gpu_assembler
Abdelrahman912 0e3cb21
minor fix
Abdelrahman912 687141d
use CuSparseMatrixCSC in assembler
Abdelrahman912 11d5a01
minor fix
Abdelrahman912 d5c951c
minor fix
Abdelrahman912 f4272a6
hoist dh, cellvalues, assembler outside the cuda loop
Abdelrahman912 d5cf949
add run_gpu macro
Abdelrahman912 2e52de1
init using int32 instead of int64 to reduce number of registers
Abdelrahman912 2cd0168
finish use int32
Abdelrahman912 54922ab
stupid way to circumvent rubbish values
Abdelrahman912 9406ff9
add discorse ref
Abdelrahman912 8fedba5
add ncu benchmark
Abdelrahman912 8bd417a
fix error in benchmark and add ref.
Abdelrahman912 abf11b6
set the code for debugging
Abdelrahman912 4f85cf5
init test
Abdelrahman912 4935b70
fix adapt issue
Abdelrahman912 188cceb
remove unnecessary cushow
Abdelrahman912 9c904e4
add heat equation main test set
Abdelrahman912 06432db
remove unncessary comments
Abdelrahman912 a67caaa
add nsys benchmark
Abdelrahman912 ecee17f
Merge branch 'master' into cuda-heat-example-w-quaditer
Abdelrahman912 60edda9
fix some issues regarding the merge
Abdelrahman912 063ff7a
minor fix
Abdelrahman912 9206be3
remove nsight files
Abdelrahman912 1eeb568
minor fix
Abdelrahman912 5e339a0
add comments
Abdelrahman912 204f3be
minor fix
Abdelrahman912 0f2e6b7
add comments
Abdelrahman912 7100e0a
fix for CI
Abdelrahman912 f129449
fix for CI
Abdelrahman912 618adb5
CI fix
Abdelrahman912 78f120c
ci
Abdelrahman912 4971cba
minor fix
Abdelrahman912 ea8451c
fix ci
Abdelrahman912 986c5db
remove file
Abdelrahman912 f93fdfb
add CUDA to docs project
Abdelrahman912 f442ae2
add v2 for gpu_heat_equation
Abdelrahman912 81274d5
add adapt to docs
Abdelrahman912 fbc05ed
minor fix
Abdelrahman912 506328c
init assemble per dof
Abdelrahman912 b505189
assemble global v3
Abdelrahman912 b0a94aa
minor fix
Abdelrahman912 aa3d1ae
add comment + start in v4
Abdelrahman912 c8cf6fe
add map dof to elements
Abdelrahman912 8a4523d
add 3d array for local matrices
Abdelrahman912 9617a4f
init code for v4
Abdelrahman912 427a6b0
fix bug w assemble global in v4
Abdelrahman912 bbed047
precommit fix
Abdelrahman912 85c055c
add preserve ref
Abdelrahman912 2b77613
fix precommit
Abdelrahman912 f9c70ab
fix logic error in v4
Abdelrahman912 0519016
init shared array usage
Abdelrahman912 5752676
optimize threads for dynamic shared memory threshold
Abdelrahman912 0fe023c
fix bug in dynamic shared mem
Abdelrahman912 a352612
minor fix
Abdelrahman912 2a6120a
init kernel abstractions
Abdelrahman912 67face7
add local matrix kernel
Abdelrahman912 aca8a6f
add global matrix kernel with CUDA dependency
Abdelrahman912 9e4d592
minor change
Abdelrahman912 6114495
init working KS implementation (still CUDA dependent )
Abdelrahman912 2a8abeb
remove cuda dependency
Abdelrahman912 630017c
add refrence to
Abdelrahman912 fc26670
use Atomix.jl
Abdelrahman912 ae7bc93
init v4 ks
Abdelrahman912 0e28f14
init cell cache prototype
Abdelrahman912 0eb376d
working gpu cell cache
Abdelrahman912 8f7a182
fix types
Abdelrahman912 9b1567d
init gpu cell iterator
Abdelrahman912 a08ab97
add iterator
Abdelrahman912 b34c43b
add stride kernel
Abdelrahman912 b289b69
minor fix
Abdelrahman912 b2c0347
fix blocks, threads for kernel launch
Abdelrahman912 b87d78b
minor fix for thread, blocks
Abdelrahman912 e10e2f6
Merge branch 'master' into cuda-heat-example-w-quaditer
Abdelrahman912 42a28e1
add gpu as extension
Abdelrahman912 e59b8b8
add some documentaion and remove unnecessary implementations.
Abdelrahman912 e7157e4
Merge branch 'master' into cuda-heat-example-w-quaditer
Abdelrahman912 e4b194d
init unit test
Abdelrahman912 a613107
init test for iterators
Abdelrahman912 113a7a2
Merge branch 'master' into cuda-heat-example-w-quaditer
Abdelrahman912 d1e831e
add tests in GPU/
Abdelrahman912 7f8fa3c
add test local ke and fe
Abdelrahman912 c38419c
minor fix
Abdelrahman912 190e43e
fix ci - 1
Abdelrahman912 763c6b5
fix ci-2
Abdelrahman912 1b6060d
minor edit
Abdelrahman912 d767668
fix ci
Abdelrahman912 726ea9e
ci
Abdelrahman912 8590aa4
fix ci
Abdelrahman912 39e1f0c
minor edit
Abdelrahman912 f0cd305
add validation for cuda, minor fix, seperate unit tests into multiple…
Abdelrahman912 9d4e8b9
fix precommit shit
Abdelrahman912 12f64bb
try documentation test fix
Abdelrahman912 361333b
documentation test fix
Abdelrahman912 e31c6e3
make ci happy
Abdelrahman912 626dec2
change kernel launch, init adapt test
Abdelrahman912 fbc1b4b
minor fix
Abdelrahman912 ea83925
add test_adapt, some comments
Abdelrahman912 a356d8d
fix precommit
Abdelrahman912 ee1f77c
init cpu multi threading
Abdelrahman912 fb7e1fc
Merge branch 'master' into cuda-heat-example-w-quaditer
Abdelrahman912 b38ab72
hot fix for buggy assembly logic
Abdelrahman912 adb166a
minor fix
Abdelrahman912 6300a4a
test sth
Abdelrahman912 b7301c2
precommit fix
Abdelrahman912 18f47b8
fix explicit imports
Abdelrahman912 f6e9cc6
add fillzero
Abdelrahman912 8a796de
Merge branch 'master' into cuda-heat-example-w-quaditer
Abdelrahman912 75e89ed
minor fix for gpu assembly
Abdelrahman912 a77c347
minor minor fix
Abdelrahman912 7338788
make cache mutable
Abdelrahman912 cbab665
put the coloring stuff in the init
Abdelrahman912 1c81281
minor fix
Abdelrahman912 d42bcab
code for benchmarking (to be removed)
Abdelrahman912 1ab1650
rm cpu multithreading benchmark code
Abdelrahman912 bc8ec95
init fix for higher order approximations in gpu
Abdelrahman912 c7f4b0f
add working imp for global gpu mem
Abdelrahman912 d4d5967
add some comments
Abdelrahman912 3b2196b
trying to make the ci happy
Abdelrahman912 825d257
minor fix
Abdelrahman912 6109bd1
comment gpu related stuff in eg to pass ci
Abdelrahman912 9caa60b
some review fixes
Abdelrahman912 868d559
some review fixes
Abdelrahman912 a4637b6
add allocate_matrix for CuSparseMatrix
Abdelrahman912 1619986
init first ideas for cuda mem allocator
Abdelrahman912 69eb55a
add cuda mem interface
Abdelrahman912
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
using SparseArrays | ||
using CUDA | ||
struct GPUSparseMatrixCSC{Tv,VEC <: AbstractVector{Int32} , NZVEC <: AbstractVector{Tv}} | ||
m::Int32 # Number of rows | ||
n::Int32 # Number of columns | ||
colptr::VEC # Column i is in colptr[i]:(colptr[i+1]-1) | ||
rowval::VEC # Row indices of stored values | ||
nzval::NZVEC # Stored values, typically nonzeros | ||
end | ||
|
||
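function GPUSparseMatrixCSC{Tv}(m::Int32, n::Int32, colptr::AbstractVector{Int32}, | ||
rowval::AbstractVector{Int32}, nzval::AbstractVector{Tv}) where {Tv} | ||
# `new` is only available in inner constructors, so forward to the default constructor | ||
return GPUSparseMatrixCSC{Tv, typeof(colptr), typeof(nzval)}(m, n, colptr, rowval, nzval) | ||
end | ||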
|
||
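function GPUSparseMatrixCSC{Tv}(A::SparseMatrixCSC{Tv}) where {Tv} | ||
# SparseMatrixCSC stores Int indices by default; convert sizes and indices to Int32 | ||
return GPUSparseMatrixCSC(Int32(A.m), Int32(A.n), Int32.(A.colptr), Int32.(A.rowval), A.nzval) | ||
end | ||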
|
||
|
||
function Base.getindex(A::GPUSparseMatrixCSC{Tv}, i::Int32, j::Int32) where Tv | ||
# TODO: Add bounds checking | ||
|
||
col_start = A.colptr[j] | ||
col_end = A.colptr[j + 1] - 1 | ||
|
||
for k in col_start:col_end | ||
if A.rowval[k] == i | ||
return A.nzval[k] | ||
end | ||
end | ||
|
||
return zero(Tv) | ||
end | ||
|
||
function Base.setindex!(A::GPUSparseMatrixCSC{T}, v::Float32, i::Int32, j::Int32) where T | ||
col_start = A.colptr[j] | ||
col_end = A.colptr[j + 1] - 1 | ||
|
||
for k in col_start:col_end | ||
if A.rowval[k] == i | ||
# Update the existing element | ||
A.nzval[k] = v | ||
return | ||
end | ||
end | ||
end | ||
|
||
|
||
function custom_atomic_add!(A::GPUSparseMatrixCSC{T}, v::Float32, i::Int32, j::Int32) where T | ||
col_start = A.colptr[j] | ||
col_end = A.colptr[j + 1] - 1 | ||
|
||
for k in col_start:col_end | ||
if A.rowval[k] == i | ||
# Update the existing element | ||
CUDA.@atomic A.nzval[k] += v | ||
return | ||
end | ||
end | ||
|
||
end | ||
|
||
function gpu_sparse_norm(A::GPUSparseMatrixCSC{T}, p::Real=2) where T | ||
if p == 2 # Frobenius norm (entrywise 2-norm, not the spectral norm) | ||
return sqrt(sum(abs2, A.nzval)) | ||
elseif p == 1 # induced 1-norm: maximum absolute column sum | ||
col_sums = zeros(T, A.n) | ||
for j in 1:A.n | ||
for k in A.colptr[j]:(A.colptr[j + 1] - 1) | ||
col_sums[j] += abs(A.nzval[k]) | ||
end | ||
end | ||
return maximum(col_sums) | ||
elseif p == Inf # induced ∞-norm: maximum absolute row sum | ||
row_sums = zeros(T, A.m) | ||
for j in 1:A.n | ||
for k in A.colptr[j]:(A.colptr[j + 1] - 1) | ||
i = A.rowval[k] | ||
row_sums[i] += abs(A.nzval[k]) | ||
end | ||
end | ||
return maximum(row_sums) | ||
else | ||
throw(ArgumentError("unsupported norm p = $p; expected 1, 2, or Inf")) | ||
end | ||
end | ||
|
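For reference, the column scan that `getindex`, `setindex!` and `custom_atomic_add!` in the diff above all share can be tried on the CPU without CUDA. This is only a minimal sketch: `csc_lookup` is a hypothetical helper name that is not part of the PR, and it takes plain `Vector`s rather than the `GPUSparseMatrixCSC` struct.

```julia
using SparseArrays

# Hypothetical helper (not part of the PR): scan the stored entries of column j
# for row i and return the value, or zero if (i, j) is not in the sparsity pattern.
function csc_lookup(colptr, rowval, nzval, i, j)
    for k in colptr[j]:(colptr[j + 1] - 1)
        rowval[k] == i && return nzval[k]
    end
    return zero(eltype(nzval))
end

# 3x2 matrix with stored entries (1,1) = 10, (3,1) = 30, (2,2) = 20
A = sparse([1, 3, 2], [1, 1, 2], Float32[10, 30, 20], 3, 2)

# SparseMatrixCSC uses Int indices by default; the diff above expects Int32
colptr = Int32.(A.colptr)
rowval = Int32.(A.rowval)

stored = csc_lookup(colptr, rowval, A.nzval, Int32(3), Int32(1))
missing_entry = csc_lookup(colptr, rowval, A.nzval, Int32(2), Int32(1))
```

The lookup cost is linear in the number of stored entries of column `j`, and an index outside the stored pattern yields zero on read. The `setindex!` and `custom_atomic_add!` methods in the diff silently do nothing in that case.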
I think we are missing the analogous benchmark using QuadraturePointIterator.