Description

Almost certainly an XLA bug and happy to report there if so.

Consider the function:
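(The original snippet isn't preserved in this copy of the page, so the following is a hypothetical minimal sketch of the shape being described: a slice of `x` followed by an in-place update of the same buffer. The index and value are made up for illustration.)

```python
import jax.numpy as jnp

# Hypothetical sketch (the issue's actual function is not shown here):
# read a slice of x, then update the same location of x in place.
def f(x):
    y = x[0]              # slice: read the old value
    x = x.at[0].set(0.0)  # in-place update of the same buffer
    return x, y
```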
Since XLA has control over scheduling, for efficiency it should schedule the slice first and then the in-place update, to avoid an unnecessary copy. However, specifically on the CPU backend, it chooses to copy twice instead, generating HLO with two copy instructions.
(I'm not sure why it needs to make two copies here instead of just one, but the important part is that it copies at all.)
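One way to see the copies is to print the optimized HLO of the compiled function; using the sketched `f` above:

```python
import jax
import jax.numpy as jnp

# Print the post-optimization HLO on the CPU backend; the inserted
# copy instructions show up in this dump.
x = jnp.arange(8, dtype=jnp.float32)
print(jax.jit(f).lower(x).compile().as_text())
```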
By the semantics of lax.optimization_barrier, I would expect that introducing an explicit dependency of x on y would force the slice to happen first, after which liveness analysis would kick in and remove the copies.
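Concretely, with the sketched `f` above, the barrier variant would look something like this (again hypothetical, mirroring the structure described here):

```python
from jax import lax

# Tie x to y so that the in-place update cannot be scheduled
# before the slice has been computed.
def f_barrier(x):
    y = x[0]
    x, y = lax.optimization_barrier((x, y))  # x now depends on y
    x = x.at[0].set(0.0)
    return x, y
```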
However, what ends up happening is that XLA still introduces the copies and reorders the calls, so the generated code is the same as the one shown above. This seems to violate the scheduling control one expects from optimization_barrier.
Note that for this particular example, setting the XLA flag --xla_cpu_copy_insertion_use_region_analysis=true removes the copies and generates the expected copy-free code, with or without optimization_barrier. Also, using a GPU device generates the copyless code, again with or without optimization_barrier.
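For reference, one way to set the flag from Python is through the XLA_FLAGS environment variable, which has to happen before JAX initializes its backends:

```python
import os

# Must be set before jax is imported (or at least before the backend
# is initialized), otherwise the flag is ignored.
os.environ["XLA_FLAGS"] = "--xla_cpu_copy_insertion_use_region_analysis=true"

import jax  # imported after setting the flag on purpose
```

Recent JAX versions also accept per-compilation options via `jax.jit(f).lower(x).compile(compiler_options={...})`, using the debug-option name without the leading dashes; I haven't verified that this particular flag is accepted there.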
Some miscellaneous related questions: Is there a JAX interface to HloOrdering, particularly SequentialHloOrdering, or is that controlled by the XLA flag --xla_cpu_enable_concurrency_optimized_scheduler? In particular, is there a way of writing schedules manually without relying only on optimization_barrier (which is not precise enough in cases like these)?
I'm also a bit confused about why the workaround works only now, since region analysis was introduced more than 3 years ago in openxla/xla@92292d1, and the core logic of RemoveUnnecessaryCopies and TryElideCopy doesn't seem to have changed much in that time either. Rather, what has recently changed is that the flag xla_cpu_copy_insertion_use_region_analysis was added to CPU (disabled by default) and region analysis was disabled on GPU. Is there some context I'm missing?

(Originally reported in the discussion #19165.)
Yes, I think this would be better reported on the XLA GitHub issue tracker.
There's currently no JAX way to control the HLO schedule, but that's something we're actively looking into adding as a way to control communication/compute overlap.