Currently, when `-d cuda` is specified, some heuristics are used:

- All inputs are allocated on the GPU, except those fed directly to a `Reshape` op as its output shape.
- Results of the `Shape` op are stored on the host.
- If the right-hand side of `Div` is a single float value, `chainerx::AsScalar` is called internally in XCVM's `Div` (this weird hack is for `y / x.shape[0]`, where `x.shape[0]` is a batch size; see the sketch below).
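For context, this is roughly the user-level pattern that produces the `Div` hack. The fragment below is only an illustrative Chainer-style example, not code from this repository; in an exported graph such a division can show up as a `Div` whose right-hand side traces back to a `Shape` op, so one operand lives on the GPU and the other on the host:

```python
import chainer.functions as F

def mean_loss(y, x):
    # y: per-example losses living on the GPU; x: the input batch.
    # The batch size comes from the input's shape, which is host-side data,
    # so the resulting Div mixes a GPU tensor with a host scalar.
    return F.sum(y) / x.shape[0]
```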
These heuristics work as a mitigation for now, but we should design a more sophisticated device assignment. The principles would be:
- Shapes should live on the CPU.
- Make it possible to run cross-device binary ops when one of their inputs is a scalar.
- If these requirements cannot be satisfied, insert a custom op (say `OnikuxDeviceCopy`) that explicitly copies data between devices, probably showing a warning to users (see the sketch below).
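A minimal sketch of what such a pass could look like, assuming a simple graph representation. The `Node`/`Value` containers, the `is_scalar` flag, and the `expected_device` callback are hypothetical; only the `OnikuxDeviceCopy` op name comes from this issue:

```python
class Value:
    def __init__(self, name, device, is_scalar=False):
        self.name = name
        self.device = device          # e.g. "cuda" or "host"
        self.is_scalar = is_scalar    # scalars may cross devices freely

class Node:
    def __init__(self, op_type, inputs, outputs):
        self.op_type = op_type
        self.inputs = inputs
        self.outputs = outputs

def insert_device_copies(nodes, expected_device):
    """Insert explicit OnikuxDeviceCopy nodes wherever an input lives on a
    different device than its consumer expects, warning the user each time."""
    new_nodes = []
    for node in nodes:
        fixed_inputs = []
        for value in node.inputs:
            target = expected_device(node, value)
            if value.device == target or value.is_scalar:
                # Same device, or a scalar that cross-device binary ops
                # are allowed to consume directly: no copy needed.
                fixed_inputs.append(value)
                continue
            print(f"warning: copying {value.name} from {value.device} "
                  f"to {target} for {node.op_type}")
            copied = Value(value.name + "_copy", target)
            new_nodes.append(Node("OnikuxDeviceCopy", [value], [copied]))
            fixed_inputs.append(copied)
        node.inputs = fixed_inputs
        new_nodes.append(node)
    return new_nodes

# Example: a Div mixing a GPU tensor with a host-side shape value gets an
# explicit copy inserted in front of it.
x = Value("x", "cuda")
batch = Value("x_shape_0", "host")
div = Node("Div", [x, batch], [Value("out", "cuda")])
graph = insert_device_copies([div], lambda node, value: "cuda")
```

Making the copy an ordinary op in the graph keeps the device assignment visible and debuggable, rather than hiding transfers inside individual kernels the way the current `chainerx::AsScalar` hack does.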