[WIP] Expose OpenCL optimization pass #1353

SteveBronder · 2023-08-29T17:44:31Z

Submission Checklist

Run unit tests
Documentation
- If a user-facing change was made, the documentation PR is here:
- OR, no user-facing changes were made

Summary

Creates a user-exposed flag, -fopencl, that attempts to promote the reverse-mode log_prob so that the entire thing can run on a GPU via OpenCL.

This is about halfway there, with a few things still to figure out:

1. The lower-level C++ MIR needs to be able to promote vectors and scalars so that they get moved over to the GPU. I think this means we need to add a Mem_pattern.t tag to arrays and scalars in the MIR. @WardBrian can you think of another way to do this that wouldn't require that? It would be annoying since we would have to touch a lot of code.
2. The optimization pass is all or none: either we are able to move the entire log_prob over to the GPU, or we go back to running only on the CPU. With this scheme I think I also need to add a logger so that when a parameter fails we can tell the user why their model could not be moved over to the GPU. (UPDATE: done, but it's just a writer to stderr)
3. The current scheme is
a. Do the exact same pass as the SoA optimization using the monotone framework.
b. At the end, the compiler checks whether all the parameters declared in the parameters, transformed parameters, and model blocks are able to go on the GPU. If so, we continue; otherwise we stop here and throw an error.
c. If (b) passes, we do another pass over those blocks, collecting the names of all of the data used in the model.
d. Take the data names from (c) and, in the data section, add declarations and assignments that move that data over to the GPU using to_matrix_cl, like in the code below.
```
matrix_cl<{TYPE}> {DATA_NAME}_opencl__ = to_matrix_cl({DATA_NAME});
```
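The all-or-none check in (b) and the declaration template in (d) can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the code in this PR: the `param` record, `check_all_or_none`, and `opencl_decl` are hypothetical names, and the real pass works on the MIR rather than on strings.

```ocaml
(* Hypothetical, simplified model of the all-or-none check; not stanc3's MIR. *)
type mem_pattern = AoS | SoA | OpenCL

(* [reason] is an optional explanation recorded when promotion failed. *)
type param = {name: string; mem: mem_pattern; reason: string option}

(* Step (b): every parameter must be OpenCL-eligible. Otherwise collect one
   message per failing parameter so the user can be told why. *)
let check_all_or_none (params : param list) : (unit, string list) result =
  let failures =
    List.filter_map
      (fun p ->
        match p.mem with
        | OpenCL -> None
        | AoS | SoA ->
            let why = Option.value p.reason ~default:"unknown reason" in
            Some (Printf.sprintf "%s: %s" p.name why) )
      params in
  if failures = [] then Ok () else Error failures

(* Step (d): build the declaration text for one data name, following the
   template above. The element type is passed in by the caller. *)
let opencl_decl ~typ name =
  Printf.sprintf "matrix_cl<%s> %s_opencl__ = to_matrix_cl(%s);" typ name name
```

Returning a `result` keeps the failure reasons together, so they can be written to stderr in one place before falling back or raising an error.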

I ripped out all the previous OpenCL code and my goal right now is just to get all of those tests compiling and working correctly.

I think it would be a good idea for now to leave the target on the CPU: writing the scalar from the GPU back to the CPU is a natural stopping point, where the async OpenCL code knows it needs to finish before passing that scalar back.
Mem_pattern.t now has an OpenCL type that indicates a statement or expression can be used on the GPU. For functions, every function in the math library that supports OpenCL also supports the new matrix type, but not every function that supports the new matrix type supports OpenCL. So in the table of available function signatures, if something is tagged OpenCL we assume it supports both SoA and OpenCL. For now I just tagged everything, but before we merge and start testing I need to go through the math library and see which functions actually support OpenCL.

Release notes

Add the -fopencl flag, which runs a pass over log_prob that attempts to promote the model to run on the GPU via OpenCL.

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

rok-cesnovar · 2023-08-29T18:35:16Z

Awesome stuff!! Let me know if I can help in any way.

SteveBronder · 2023-08-29T19:41:23Z

Ty! At this point it's mostly just brainstorming nice patterns for all this. If you can look at stan-dev/stan#3219, that is a PR we are waiting on before we can merge this.

andrjohns · 2023-08-30T14:26:31Z

This is great! Looking forward to this!

WardBrian

First round - really more asking questions than anything proper review-y

WardBrian · 2023-08-30T14:32:33Z

src/middle/Mem_pattern.ml


 let lub_mem_pat lst =
-  let find_soa mem_pat = mem_pat = SoA in
+  let find_soa mem_pat = match mem_pat with SoA -> true | _ -> false in


Equivalent to is_soa above

WardBrian · 2023-08-30T14:33:23Z

src/middle/Mem_pattern.ml

+type t = AoS | SoA | OpenCL [@@deriving sexp, compare, map, hash, fold, equal]

 let pp ppf = function
  | AoS -> Fmt.string ppf "AoS"
  | SoA -> Fmt.string ppf "SoA"
+  | OpenCL -> Fmt.string ppf "OpenCL"
+
+let is_soa mem = match mem with SoA -> true | _ -> false
+let is_aos mem = match mem with AoS -> true | _ -> false
+let is_opencl mem = match mem with OpenCL -> true | _ -> false


There is a relationship between these right? All "OpenCL" memory layouts are automatically SoA, right?

Should is_soa take this into account?

All "OpenCL" memory layouts are automatically SoA, right?

Yes, but I think we should keep them different. For instance, when printing the C++ we probably want to know the difference. I need to think about the right way to describe this.
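One way to keep the variants distinct while still expressing the relationship is a separate layout predicate. The sketch below is only an illustration: it copies the variant without the ppx derivers so it compiles on its own, and `has_soa_layout` is a hypothetical name that does not appear in the PR.

```ocaml
(* Stand-alone copy of the variant from the diff, without the ppx derivers. *)
type t = AoS | SoA | OpenCL

(* Strict predicates, as in the diff: each matches exactly one variant. *)
let is_soa = function SoA -> true | _ -> false
let is_opencl = function OpenCL -> true | _ -> false

(* Hypothetical helper: true for any pattern stored in an SoA layout, which
   per the discussion above includes OpenCL. *)
let has_soa_layout = function SoA | OpenCL -> true | AoS -> false
```

With this split, code that prints C++ can keep using the strict predicates, while code that only cares about layout asks `has_soa_layout`.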

WardBrian · 2023-08-30T14:36:13Z

src/middle/SizedType.ml

+let is_eigen_type st =
+  match st with
+  | (SVector (mem, _) | SRowVector (mem, _) | SMatrix (mem, _, _))
+    when Mem_pattern.is_opencl mem ->
+      false
+  | SVector _ | SRowVector _ | SMatrix _ | SComplexRowVector _
+   |SComplexVector _ | SComplexMatrix _ ->
+      true
+  | _ -> false
+


I feel a little nervous that SizedType.is_eigen_type st and UnsizedType.is_eigen_type (SizedType.to_unsized st) would return different things in some circumstances. I suppose it depends on how/where both of them are used, but I'd feel a bit more confident with an is_eigen_type which matches Unsized and using is_eigen_type st && not (is_opencl st) where needed.

What do you think?

Yeah, I agree it's very shaky. I think splitting it out is a better idea.
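A rough sketch of the split suggested here, using heavily simplified stand-ins for the real types (no dimensions, no complex types). All names other than `is_eigen_type`, `is_opencl`, and `to_unsized` are hypothetical.

```ocaml
(* Simplified stand-ins for Mem_pattern.t, UnsizedType.t and SizedType.t. *)
type mem = AoS | SoA | OpenCL
type unsized = UReal | UVector | URowVector | UMatrix | UArray of unsized

type sized =
  | SReal
  | SVector of mem
  | SRowVector of mem
  | SMatrix of mem
  | SArray of sized

let rec to_unsized = function
  | SReal -> UReal
  | SVector _ -> UVector
  | SRowVector _ -> URowVector
  | SMatrix _ -> UMatrix
  | SArray t -> UArray (to_unsized t)

let unsized_is_eigen_type = function
  | UVector | URowVector | UMatrix -> true
  | UReal | UArray _ -> false

(* Ignores the memory pattern, so it always agrees with the unsized check. *)
let is_eigen_type st = unsized_is_eigen_type (to_unsized st)

let is_opencl = function
  | SVector m | SRowVector m | SMatrix m -> m = OpenCL
  | SReal | SArray _ -> false

(* Call sites that need "Eigen and not on the GPU" say so explicitly. *)
let is_cpu_eigen_type st = is_eigen_type st && not (is_opencl st)
```

Defining the sized check through `to_unsized` means the two functions cannot drift apart, and the OpenCL exclusion is visible at the call sites that need it.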

WardBrian · 2023-08-30T14:37:54Z

src/analysis_and_optimization/Memory_patterns.ml

@@ -103,13 +126,16 @@ let query_stan_math_mem_pattern_support (name : string)
            |> Result.is_ok )
          namematches in
      let is_soa ((_ : UnsizedType.returntype), _, mem) =
-        mem = Mem_pattern.SoA in
+        match requested_mem with
+        | Mem_pattern.SoA -> mem = Mem_pattern.SoA || mem = OpenCL


This seems like one place where the SoA and OpenCL things are considered equivalent, so I wonder if the is_* functions in Mem_pattern.ml should do the same?

When checking functions, OpenCL implies SoA, but it is not true that SoA implies OpenCL.
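The one-directional relationship can be written as a small support check. This is a sketch, not the code in Memory_patterns.ml: `supports` is a hypothetical name, and the rule that an AoS request is satisfied by any signature is an assumption, since the diff only shows the SoA branch.

```ocaml
type mem = AoS | SoA | OpenCL

(* Does a signature tagged [tagged] satisfy a request for [requested]? *)
let supports ~requested ~tagged =
  match (requested, tagged) with
  (* Assumption: every signature works with plain AoS arguments. *)
  | AoS, _ -> true
  (* A signature tagged OpenCL is assumed to support SoA as well. *)
  | SoA, (SoA | OpenCL) -> true
  | SoA, AoS -> false
  (* The converse does not hold: an SoA tag says nothing about OpenCL. *)
  | OpenCL, OpenCL -> true
  | OpenCL, (AoS | SoA) -> false
```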

WardBrian · 2023-08-30T14:39:53Z

src/middle/Stan_math_signatures.ml

Are there any functions which are SoA but not OpenCL? It seems like this was a total find-and-replace, but that also seems incorrect? (The math opencl/ folder is much smaller than the others, no?)

Yes, for now I just did a find and replace, but before we merge I need to go through and separate out what is SoA and what is OpenCL.

WardBrian · 2023-08-30T14:47:35Z

src/middle/SizedType.ml

@@ -220,6 +220,15 @@ let modify_sizedtype_mem (mem_pattern : Mem_pattern.t) st =
  match mem_pattern with
  | AoS -> demote_sizedtype_mem st
  | SoA -> promote_sizedtype_mem st
+  | OpenCL -> promote_sizedtype_mem st


OpenCL is "promoted" to SoA by this function?

WardBrian · 2023-08-30T14:48:08Z

src/middle/SizedType.ml

+  | SRowVector (_, dim) -> SRowVector (mem_pattern, dim)
+  | SMatrix (_, dim1, dim2) -> SMatrix (mem_pattern, dim1, dim2)
+  | SArray (inner_type, dim) -> SArray (promote_mem mem_pattern inner_type, dim)
+  | _ -> st


Does this need to consider things inside tuples?

If we want to allow OpenCL to use tuples then yes.

Also the Decls having a mem_pattern tag won't work because of tuples either :( I think we do just need to tag all sized types as having a memory pattern

Ah, that’s too bad. You could do something like what we had to do for Autodiff level and have a tuple specific variant in the type, but I wasn’t super happy with that either.

Yeah, at the end of the day it's going to look a little odd in some places, but I think it's fine to just add it to the sized types.
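If tuples were to be supported, the recursion would need one more case. The sketch below uses a simplified stand-in for SizedType.t (integer dimensions instead of expressions, and a hypothetical `STuple of sized list` constructor), and calls the function `replace_mem` purely for illustration.

```ocaml
type mem = AoS | SoA | OpenCL

(* Simplified stand-in for SizedType.t; real dimensions are expressions. *)
type sized =
  | SReal
  | SVector of mem * int
  | SRowVector of mem * int
  | SMatrix of mem * int * int
  | SArray of sized * int
  | STuple of sized list

(* Overwrite the memory pattern everywhere it occurs, including inside
   arrays and tuples. Types that carry no pattern are returned unchanged. *)
let rec replace_mem mem st =
  match st with
  | SVector (_, d) -> SVector (mem, d)
  | SRowVector (_, d) -> SRowVector (mem, d)
  | SMatrix (_, r, c) -> SMatrix (mem, r, c)
  | SArray (inner, d) -> SArray (replace_mem mem inner, d)
  | STuple ts -> STuple (List.map (replace_mem mem) ts)
  | SReal -> st
```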

WardBrian · 2023-08-30T14:48:28Z

src/middle/SizedType.ml

@@ -220,6 +220,15 @@ let modify_sizedtype_mem (mem_pattern : Mem_pattern.t) st =
  match mem_pattern with
  | AoS -> demote_sizedtype_mem st
  | SoA -> promote_sizedtype_mem st
+  | OpenCL -> promote_sizedtype_mem st
+
+let rec promote_mem (mem_pattern : Mem_pattern.t) st =


This seems like it should be called replace_mem rather than promote

WardBrian · 2023-08-30T14:50:51Z

src/middle/Index.ml

@@ -43,6 +43,13 @@ let apply ~default ~merge op (ind : 'a t) =
  | Between (expr_top, expr_bottom) -> merge (op expr_top) (op expr_bottom)
  | MultiIndex exprs -> op exprs

+let map_expr ~f = function


We already derive map for the type t, so this is equivalent to Index.map

Yeah, this needs to be deleted.
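For readers unfamiliar with the derived function, a hand-written equivalent looks roughly like this. The constructor list is an assumption about the shape of Index.t (only Between and MultiIndex appear in the diff), and the real type gets its map from a ppx deriver rather than from code like this.

```ocaml
(* Assumed shape of Index.t; only Between and MultiIndex appear in the diff. *)
type 'a index =
  | All
  | Single of 'a
  | Upfrom of 'a
  | Between of 'a * 'a
  | MultiIndex of 'a

(* Roughly what a derived map does: apply f at every 'a position. *)
let map f = function
  | All -> All
  | Single e -> Single (f e)
  | Upfrom e -> Upfrom (f e)
  | Between (lo, hi) -> Between (f lo, f hi)
  | MultiIndex e -> MultiIndex (f e)
```

A separate `map_expr ~f` that does the same traversal would duplicate this, which is why the review suggests removing it.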

WardBrian · 2023-08-30T14:51:42Z

src/analysis_and_optimization/Optimize.ml

@@ -1281,7 +1333,8 @@ let settings_const b =
  ; lazy_code_motion= b
  ; optimize_ad_levels= b
  ; preserve_stability= not b
-  ; optimize_soa= b }
+  ; optimize_soa= b
+  ; optimize_opencl= false }


I think we'd want this to be enabled as part of all_optimizations so this should be b

idk, personally this is a very specific optimization for a piece of hardware. I'd rather the user explicitly flips it on
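The opt-in behaviour argued for here can be sketched with a cut-down settings record. The real record in Optimize.ml has many more fields; `with_opencl` is a hypothetical helper standing in for whatever the -fopencl flag does.

```ocaml
(* Cut-down version of the optimization settings; most fields are omitted. *)
type optimization_settings = {optimize_soa: bool; optimize_opencl: bool}

(* As in the diff: the blanket constructor never enables OpenCL. *)
let settings_const b = {optimize_soa= b; optimize_opencl= false}
let all_optimizations = settings_const true

(* Hypothetical: the user flips the hardware-specific pass on explicitly. *)
let with_opencl s = {s with optimize_opencl= true}
```

Under this design, turning on every optimization still leaves the model on the CPU unless the user also asks for OpenCL.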

codecov · 2023-09-05T15:21:35Z

Codecov Report

Merging #1353 (51b3ba9) into master (743d0dd) will decrease coverage by 0.44%.
Report is 2 commits behind head on master.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master    #1353      +/-   ##
==========================================
- Coverage   89.39%   88.95%   -0.44%     
==========================================
  Files          65       65              
  Lines       10607    10814     +207     
==========================================
+ Hits         9482     9620     +138     
- Misses       1125     1194      +69

| Files Changed | Coverage Δ | |
|---|---|---|
| src/frontend/Pretty_printing.ml | `91.08% <0.00%> (ø)` | |
| src/middle/Mem_pattern.ml | `40.00% <16.66%> (-26.67%)` | ⬇️ |
| src/middle/Stmt.ml | `79.47% <66.66%> (ø)` | |
| src/analysis_and_optimization/Mir_utils.ml | `77.38% <75.00%> (ø)` | |
| src/frontend/Ast_to_Mir.ml | `94.19% <75.00%> (+0.01%)` | ⬆️ |
| src/middle/SizedType.ml | `79.77% <77.27%> (-4.75%)` | ⬇️ |
| src/analysis_and_optimization/Memory_patterns.ml | `84.17% <77.88%> (-6.41%)` | ⬇️ |
| src/middle/Index.ml | `82.35% <80.00%> (-0.41%)` | ⬇️ |
| src/stan_math_backend/Transform_Mir.ml | `95.16% <87.50%> (-0.58%)` | ⬇️ |
| src/stan_math_backend/Cpp.ml | `85.78% <91.66%> (-0.19%)` | ⬇️ |
| ... and 14 more | | |

... and 1 file with indirect coverage changes

SteveBronder added 3 commits August 22, 2023 16:08

Adds base code for OpenCL optimizations. Things still broken and tests needed

0875110

Remove previous OpenCL code along with --use-opencl and instead use -fopencl everywhere. Still need to figure out how to handle data assignment. Wrote the code to parse data from log_prob and make new decls for the OpenCL data

a9f6bcb

add pass that uses the opencl data for the reverse mode log prob

4209a23

SteveBronder added 2 commits August 29, 2023 15:37

add simple warning report to SoA and OpenCL optimization engine

5e7a36b

update

1348044

WardBrian reviewed Aug 30, 2023

SteveBronder added 6 commits September 1, 2023 16:48

adds Mem_pattern to all SizedTypes. Also captures scalars to be moved over to the GPU

d261c5b

Merge remote-tracking branch 'origin/master' into optims/opencl-pass

4048069

update stancjs

93dbe06

update stancjs

9c30349

update stancjs

e4aa467

fix logic for std vectors of opencl matrices

d0de469

SteveBronder added 9 commits September 5, 2023 17:52

update jenkins to complile OpenCL

1458491

update stan math signatures to support opencl and add logic to do all or none for opencl in optimization pass

4ed4daa

update jenkinsfile

edc825b

update jenkinsfile

d1bf03d

update jenkinsfile

0d10d87

update jenkinsfile

911d457

update jenkinsfile

0adfd16

update jenkinsfile

b44de03

update jenkinsfile

51b3ba9

WardBrian mentioned this pull request Oct 9, 2023

Mark binomial logit and glm as opencl-supported #1368

Merged

3 tasks

WardBrian mentioned this pull request Jul 18, 2024

Add support for @annotations #1439

Draft

2 tasks
