This issue tracks multithreading solutions for JIT code.
Description
At the moment, Lux only targets Nim and so can make use of OpenMP for threading.
In the future, Lux will probably add a JIT solution via LLVM IR. This will reduce code size, enable code generation for specialized sizes, and allow targeting new architectures that would otherwise require complex C extensions.
Example: CUDA introduces __global__ and magic like blockIdx.x * blockDim.x + threadIdx.x that requires some gymnastics for Nim to generate code and not throw "undefined".
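For readers less familiar with CUDA, the index arithmetic in that example can be emulated in plain C. This is only a sketch: dim1, global_index and emulate_launch are hypothetical names standing in for CUDA's built-in variables and kernel launch, and the "threads" run sequentially here.

```c
/* Plain-C stand-in for CUDA's one-dimensional built-in index variables
   (hypothetical type, not part of CUDA or Lux). */
typedef struct { unsigned x; } dim1;

/* Global thread index, computed the way a CUDA kernel would:
   blockIdx.x * blockDim.x + threadIdx.x */
static unsigned global_index(dim1 block_idx, dim1 block_dim, dim1 thread_idx) {
    return block_idx.x * block_dim.x + thread_idx.x;
}

/* Emulates a launch of `grid_dim` blocks of `block_dim` threads, one after
   another, and stores each (block, thread) pair's global index in `out`.
   `out` must hold at least grid_dim * block_dim elements. */
static void emulate_launch(unsigned grid_dim, unsigned block_dim, unsigned *out) {
    for (unsigned b = 0; b < grid_dim; b++) {
        for (unsigned t = 0; t < block_dim; t++) {
            dim1 bi = { b }, bd = { block_dim }, ti = { t };
            unsigned gid = global_index(bi, bd, ti);
            out[gid] = gid;
        }
    }
}
```

In real CUDA these variables are provided by the compiler inside a __global__ function, which is exactly what a Nim-to-C backend has no native way to express.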
Unfortunately, when doing JIT on CPU we lose OpenMP support, as OpenMP is implemented in Clang and replaced by library calls in LLVM IR. So we need an alternative solution.
Solutions to explore
Reuse the Nim threadpool library.
Implement a threading library from scratch.
Wrap a C/C++ library (note that C++ will cause issues with cpuinfo with some compilers due to it using C99).
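To give an idea of the surface area the "from scratch" or "wrap a C library" options would need, here is a minimal sketch of a runtime entry point that JIT-generated code could call as an external symbol, much like Clang emits calls into the OpenMP runtime. It assumes POSIX threads and a static schedule; parallel_for, range_fn, task_t and MAX_THREADS are hypothetical names, not an existing Lux or Nim API, and a real implementation would reuse a thread pool instead of spawning threads per call.

```c
#include <pthread.h>
#include <stddef.h>

/* Signature of an outlined loop body: process iterations [lo, hi). */
typedef void (*range_fn)(size_t lo, size_t hi, void *ctx);

typedef struct {
    range_fn fn;
    size_t lo, hi;
    void *ctx;
} task_t;

/* Thread entry point: runs one chunk of the iteration space. */
static void *run_task(void *arg) {
    task_t *t = (task_t *)arg;
    t->fn(t->lo, t->hi, t->ctx);
    return NULL;
}

#define MAX_THREADS 64

/* Splits [0, n) into contiguous chunks (static schedule), runs each chunk
   on its own thread and joins them all before returning.
   Returns 0 on success, -1 on invalid arguments or thread creation failure. */
int parallel_for(size_t n, unsigned nthreads, range_fn fn, void *ctx) {
    if (nthreads == 0 || nthreads > MAX_THREADS || fn == NULL) return -1;
    pthread_t threads[MAX_THREADS];
    task_t tasks[MAX_THREADS];
    size_t chunk = (n + nthreads - 1) / nthreads; /* ceiling division */
    unsigned started = 0;
    int status = 0;
    for (unsigned i = 0; i < nthreads; i++) {
        size_t lo = (size_t)i * chunk;
        if (lo >= n) break; /* fewer chunks than threads */
        size_t hi = (lo + chunk < n) ? lo + chunk : n;
        tasks[i] = (task_t){ fn, lo, hi, ctx };
        if (pthread_create(&threads[i], NULL, run_task, &tasks[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (unsigned i = 0; i < started; i++) pthread_join(threads[i], NULL);
    return status;
}

/* Example outlined body: writes out[i] = 2 * i for its chunk. */
static void double_index(size_t lo, size_t hi, void *ctx) {
    int *out = (int *)ctx;
    for (size_t i = lo; i < hi; i++) out[i] = (int)(2 * i);
}
```

A caller would write something like parallel_for(10, 4, double_index, out) with out pointing at an array of 10 ints. The JIT would then only need to outline each parallel loop into a function with the range_fn signature and emit one call.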
extern float foo( void );

int main () {
    static int zero = 0;
    auto int gtid;
    auto float r = 0.0;
    __kmpc_begin( &loc3, 0 );
    // The gtid is not actually required in this example so could be omitted;
    // We show its initialization here because it is often required for calls into
    // the runtime and should be locally cached like this.
    gtid = __kmpc_global_thread_num( &loc3 );
    __kmpc_fork_call( &loc7, 1, main_7_parallel_3, &r );
    __kmpc_end( &loc0 );
    return 0;
}

struct main_10_reduction_t_5 { float r_10_rpr; };

static kmp_critical_name lck = { 0 };
static ident_t loc10; // loc10.flags should contain KMP_IDENT_ATOMIC_REDUCE bit set
                      // if compiler has generated an atomic reduction.

void main_7_parallel_3( int *gtid, int *btid, float *r_7_shp ) {
    auto int i_7_pr;
    auto int lower, upper, liter, incr;
    auto struct main_10_reduction_t_5 reduce;
    reduce.r_10_rpr = 0.F;
    liter = 0;
    __kmpc_dispatch_init_4( &loc7, *gtid, 35, 0, 9, 1, 1 );
    while ( __kmpc_dispatch_next_4( &loc7, *gtid, &liter, &lower, &upper, &incr ) ) {
        for( i_7_pr = lower; upper >= i_7_pr; i_7_pr++ )
            reduce.r_10_rpr += foo();
    }
    switch( __kmpc_reduce_nowait( &loc10, *gtid, 1, 4, &reduce, main_10_reduce_5, &lck ) ) {
        case 1:
            *r_7_shp += reduce.r_10_rpr;
            __kmpc_end_reduce_nowait( &loc10, *gtid, &lck );
            break;
        case 2:
            __kmpc_atomic_float4_add( &loc10, *gtid, r_7_shp, reduce.r_10_rpr );
            break;
        default:;
    }
}
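The runtime calls in the listing above have the shape of a dynamically scheduled parallel loop with a + reduction on r. A sketch of OpenMP source that would plausibly lower to them, reconstructed from the listing's constants (iterations 0 to 9, chunk size 1), so treat it as an assumption and not as the original snippet. The body of foo and the wrapper name sum_foo are placeholders added so the result can be checked.

```c
/* Placeholder for the listing's external foo(); returns 1 so the sum is easy to verify. */
static float foo(void) { return 1.0f; }

/* Sums foo() over 10 iterations. With OpenMP enabled (e.g. -fopenmp) the loop
   is shared between threads and the per-thread partial sums are combined by the
   reduction clause; without it the pragma is ignored and the loop runs serially. */
float sum_foo(void) {
    float r = 0.0f;
    int i;
    #pragma omp parallel for schedule(dynamic) reduction(+:r)
    for (i = 0; i < 10; i++) {
        r += foo();
    }
    return r;
}
```

The point for the JIT is that the pragma is a front-end feature: by the time the code reaches LLVM IR, only the outlined function and the calls into the runtime remain.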
A further option: wait for OpenMP IR to be merged in LLVM, see:
OpenMP code transformation
from https://stackoverflow.com/questions/52285368/how-does-llvm-translate-openmp-multi-threaded-code-with-runtime-library-calls
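OMP code is transformed into calls to the __kmpc_* runtime library, as in the listing above, and those library calls are what appear in the LLVM IR.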