Add end-to-end conversion of TTIR broadcast to TTNN repeat op. #1638
Conversation
@uazizTT Please provide a better PR description with some IR before and after the decomposition.
@uazizTT Approved, but have either @sdjordjevicTT, @svuckovicTT, or @azecevicTT check the changes before checking in.
Overall it doesn't look bad from a functional point of view. The SHLO lowering needs some refactoring, and there is also a double conversion during lowering, but the logic looks good.
I've also left a comment regarding the TTNN interface. I would like others to get involved in that discussion, as I see it as a problem without a clear solution at the moment; I am okay with either proposal from that comment until we finally resolve the question.
}];

let arguments = (ins AnyRankedTensor:$input,
                     AnyRankedTensor:$output,
-                    I64ArrayAttr:$dimension);
+                    I64ArrayAttr:$repeat_dimensions);
Use DenseI64ArrayAttr, it's much nicer to deal with than I64ArrayAttr.
I tried to use DenseI64ArrayAttr but found that we don't yet have infrastructure to handle it in TTNNToFlatBuffer, so I suggest we keep it as I32ArrayAttr, which is the same type as used in ReshapeOp and ReductionOp, and then refactor them all to use DenseI64ArrayAttr as a refactoring task.
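For reference, a minimal ODS sketch of what the suggested switch would look like; the surrounding op definition and accessor name are assumptions for illustration, not the exact tt-mlir code:

let arguments = (ins AnyRankedTensor:$input,
                     AnyRankedTensor:$output,
                     DenseI64ArrayAttr:$repeat_dimensions);
// The ODS-generated accessor for a DenseI64ArrayAttr (e.g. getRepeatDimensions())
// yields ArrayRef<int64_t> directly, instead of an ArrayAttr whose elements each
// have to be cast to IntegerAttr and unwrapped.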
}];

let arguments = (ins AnyRankedTensor:$input,
                     I64ArrayAttr:$shape);
The TTNN op takes Shape as a parameter. At least from our standpoint, Shape is a wrapper around std::vector<uint32_t>. We should decide whether to model it as a TTNN_ShapeAttr in the dialect. One problem I have with modeling unsigned arrays is that MLIR intentionally didn't include DenseUIXArrayAttr (and it's pretty much impossible to extend it without forking the whole project). So our options are either:
- Model it as closely as possible, which means making a TTNN_ShapeAttr that is a wrapper around an unsigned array attr. There isn't a nice interface for unsigned arrays; from my experience it means relying on the user to interpret APInt as an unsigned integer (in every place it's used).
- Ignore signedness and width, make every integer attribute int64_t, and by extension every integer array attribute DenseI64ArrayAttr.

For TTIR the 2nd option is a no-brainer. For TTNN the 1st option has the advantage that we can catch some errors at compile time, but it requires considerably more effort than the 2nd option.
We should definitely talk to folks from the Metal team. I know they used stl::span<int64_t> in some places, which is a great and ideal option; I would like to hear their rationale for using unsigned types in other places. Other than addresses (if we use them anywhere) and bitwise ops (but we are specifically talking about metadata/attrs here), I really can't think of a use-case where unsigned is a better decision than signed. I guess it's just a matter of how much effort it would take to fix.
@svuckovicTT I think we should discuss this with the Metal team before we move forward with the 1st option.
Either way, I64ArrayAttr is a bad decision here; use either DenseI64ArrayAttr or I32ArrayAttr.
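To make the two options concrete, a minimal ODS sketch; TTNN_ShapeAttr, its mnemonic, and the TTNN_Dialect handle are assumed names for illustration, not existing tt-mlir definitions:

// Option 1 (sketch): model TTNN's Shape explicitly as a dialect attribute
// wrapping an unsigned array.
def TTNN_ShapeAttr : AttrDef<TTNN_Dialect, "Shape"> {
  let mnemonic = "shape";
  let parameters = (ins ArrayRefParameter<"uint32_t">:$shape);
  let assemblyFormat = "`<` $shape `>`";
}

// Option 2 (sketch): ignore signedness/width and use the builtin dense array attr.
let arguments = (ins AnyRankedTensor:$input,
                     DenseI64ArrayAttr:$shape);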
table RepeatOp {
  in: tt.target.TensorRef;
  out: tt.target.TensorRef;
  shape: [int32];
Taking into account the comment about ttnn.repeat, this should be either [uint32] or [int64].
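Assuming the rest of the table stays as in the diff above, the schema change would just be the element type, e.g.:

table RepeatOp {
  in: tt.target.TensorRef;
  out: tt.target.TensorRef;
  shape: [int64];  // or [uint32] to mirror TTNN's Shape exactly
}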
Sorry for chiming in late, I missed this discussion as it was being held. I agree that shlo broadcast_in_dim should be lowered into multiple ops: an explicit rank change, when needed, and the actual broadcast. I would prefer that for the broadcasting operation we use the name ttir.broadcast as opposed to ttir.repeat; I think it makes it clearer to users of the dialect what we intend to do, i.e. the value will be broadcast to all relevant cores/kernels, etc. If a particular ttnn op does not support operand broadcast, that is a ttnn limitation that should not leak into the ttir dialect. In that case, we can repeat the value ahead of time and provide the full-sized input to the op. This should happen in ttnn.
Are you suggesting that we also always fold the […]? The alternative solution is to apply this limitation at the TTIR level using […].
The ttir dialect is meant to represent what we want to do with the graph, and we should prevent ttnn limitations from leaking in. My preference would be: […]
Since we don't currently have a ttnnToTtnnPipeline to do the third step, we could do it in the lowering and in the workarounds. I do think we should keep the op that does broadcasting in ttir called broadcast.
Had a quick chat with @uazizTT offline, want to summarize here for others. We both agree that folding in ttnn is probably the best approach in the long term because, presumably, different backends will have different broadcasting capabilities and the folder should be aware of that. That said, until we have the infrastructure in place to do the folding in ttnn, we can have it in ttir using […]. I'm OK with this PR as is, as long as we change the name to ttir.broadcast. Once the folding PR lands, the overall flow would be: […]
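Purely as an illustration of the folding idea (not the flow list referred to above; value names, shapes, attribute spelling, and the generic printing are all assumptions): a ttir.broadcast whose only user is an eltwise op with implicit-broadcast support could be folded away.

// Before folding (sketch):
%yb = "ttir.broadcast"(%y, %t0) {repeat_dimensions = [128 : i64, 1 : i64]}
      : (tensor<1x32xf32>, tensor<128x32xf32>) -> tensor<128x32xf32>
%z  = "ttir.add"(%x, %yb, %t1)
      : (tensor<128x32xf32>, tensor<128x32xf32>, tensor<128x32xf32>) -> tensor<128x32xf32>

// After folding (sketch): the add consumes the un-broadcast operand directly.
%z2 = "ttir.add"(%x, %y, %t1)
      : (tensor<128x32xf32>, tensor<1x32xf32>, tensor<128x32xf32>) -> tensor<128x32xf32>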
Fixes #1348
Fixes #1345
Fixes #1235
This provides a general solution to lower ttir.broadcast by splitting it into ttnn.reshape (where required) and ttnn.repeat. This should provide a general lowering for the ttir.broadcast op.
A subsequent change will provide an optimization to fold broadcast whenever possible with binary or ternary eltwise ops.
Example:
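The original snippet is not reproduced here; as an illustrative stand-in (shapes, value names, attribute spelling, and the generic printing are assumptions, not the actual PR IR), a TTIR broadcast from a rank-1 tensor might look like:

%0 = "ttir.broadcast"(%arg0, %out) {repeat_dimensions = [8 : i64, 1 : i64]}
     : (tensor<16xf32>, tensor<8x16xf32>) -> tensor<8x16xf32>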
This gets lowered to the following TTNN graph:
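Again only as a sketch (op operands and attribute spellings assumed), the lowered sequence would be a reshape for the rank change followed by a repeat:

%1 = "ttnn.reshape"(%arg0) {shape = [1 : i32, 16 : i32]}
     : (tensor<16xf32>) -> tensor<1x16xf32>
%2 = "ttnn.repeat"(%1) {shape = [8 : i32, 1 : i32]}
     : (tensor<1x16xf32>) -> tensor<8x16xf32>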