
Support for tritongpu.upcast_mxfp operation #2700

Draft

wants to merge 5 commits into main

Conversation

@etiotto (Contributor) commented Nov 13, 2024

Add initial support for the new tritongpu upcast_mxfp operation.

@etiotto etiotto self-assigned this Nov 13, 2024
@etiotto etiotto linked an issue Nov 13, 2024 that may be closed by this pull request
@etiotto (Contributor, Author) commented Nov 13, 2024

Note: Merging upstream to 1cf7b1b31cde8c62611e421becd4648c7284d76c should make this PR smaller (changes to the NVIDIA and AMD implementations of upcast_mxfp would come in from the merge).

// Convert each value, which is an int8 containing 2 packed mxfp4 values,
// into 2 standalone bf16 values, returned as a pair (high 4 bits, low 4 bits).
std::pair<Value, Value> convertMxfp4x2ToBf16x2(RewriterBase &rewriter,
                                               Location loc, Value v);
@etiotto (Contributor, Author) commented:
Note: This is identical to upstream code as of commit 1cf7b1b
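
For readers unfamiliar with the format: e2m1 (mxfp4) packs a sign bit, a 2-bit exponent (bias 1), and a 1-bit mantissa into each nibble. Below is a minimal scalar sketch of what the conversion amounts to bitwise; it is illustrative only (the real helper emits LLVM IR through the rewriter), and the function names here are hypothetical.

#include <cstdint>
#include <utility>

// Decode one e2m1 nibble into bf16 bits. E2M1: 1 sign, 2 exponent bits
// (bias 1), 1 mantissa bit; bf16: 1 sign, 8 exponent bits (bias 127),
// 7 mantissa bits.
static uint16_t e2m1ToBf16Bits(uint8_t nibble) {
  uint16_t sign = (nibble >> 3) & 0x1;
  uint16_t exp = (nibble >> 1) & 0x3;
  uint16_t man = nibble & 0x1;
  if (exp == 0 && man == 0)
    return sign << 15;                      // +/- zero
  if (exp == 0)
    return (sign << 15) | ((127 - 1) << 7); // subnormal: 0.5 = 2^-1
  // Normal values: rebias the exponent (1 -> 127); the single mantissa bit
  // becomes the top bit of bf16's 7-bit mantissa.
  return (sign << 15) | ((exp - 1 + 127) << 7) | (man << 6);
}

// Unpack an int8 holding two packed mxfp4 values into a
// (high 4 bits, low 4 bits) pair, mirroring the declaration above.
static std::pair<uint16_t, uint16_t> mxfp4x2ToBf16x2(uint8_t v) {
  return {e2m1ToBf16Bits(v >> 4), e2m1ToBf16Bits(v & 0xF)};
}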


return {v0, v1};
}
SmallVector<Value> convertMxfp4x2ToBf16x2(RewriterBase &rewriter, Location loc,
@etiotto (Contributor, Author) commented:
Note: This is identical to upstream code as of commit 1cf7b1b

@@ -19,17 +19,6 @@ using namespace mlir::triton::gpu;

namespace {

Value mxfpScaleBf16(RewriterBase &rewriter, Location loc, Value v,
@etiotto (Contributor, Author) commented:
Note: This is identical to upstream code as of commit 1cf7b1b
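
The mx scale itself is an e8m0 value: a bare 8-bit biased exponent encoding 2^(scale - 127). Since bf16 also uses exponent bias 127, dropping the scale byte into a bf16 exponent field yields exactly that power of two, so the scaling reduces to a single multiply. A minimal scalar sketch of this idea (hypothetical names; the real helper builds LLVM IR):

#include <cstdint>
#include <cstring>

// Reinterpret bf16 bits as a float by widening into the top half of an f32.
static float bf16BitsToFloat(uint16_t b) {
  uint32_t w = uint32_t(b) << 16;
  float f;
  std::memcpy(&f, &w, sizeof f);
  return f;
}

// Scale a bf16 value by an e8m0 scale, i.e. multiply by 2^(scale - 127).
// The scale byte placed in bf16's exponent field is exactly that power of
// two. (0xFF encodes NaN in e8m0; not handled in this sketch.)
static float applyMxScale(uint16_t vBf16, uint8_t scale) {
  uint16_t scaleBf16 = uint16_t(scale) << 7; // exponent-only bf16 number
  return bf16BitsToFloat(vBf16) * bf16BitsToFloat(scaleBf16);
}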

@@ -30,47 +30,6 @@ class UpcastMXFPOpPattern : public ConvertOpToLLVMPattern<UpcastMXFPOp> {
: ConvertOpToLLVMPattern<UpcastMXFPOp>(typeConverter, benefit),
targetInfo(targetInfo) {}

llvm::SmallVector<Value> unpackFP4Elements(Location loc,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 1cf7b1b

@etiotto etiotto marked this pull request as ready for review November 14, 2024 14:12
@whitneywhtsang (Contributor) commented:

> Note: Merging upstream to 1cf7b1b31cde8c62611e421becd4648c7284d76c should make this PR smaller (changes to the NVIDIA and AMD implementations of upcast_mxfp would come in from the merge).

Merging in #2707.

@victor-eds (Contributor) left a comment:

As this is just copying upstream code, LGTM. Since in previous cases we found that relying on logical bitwise operations for this kind of operation was slower, does it make sense to open a ticket to change the code in the future?

auto parentEncoding = oldEncoding.getParent();

// Note: For Intel the dot operands layout's kWidth parameter must
// match the parent's dpas layout opsPerChannel. Given that the kWidth
@chengjunlu (Contributor) commented Nov 18, 2024:

opsPerChannel is defined by the HW DPAS instruction.
I think we should align opsPerChannel with the result scalar type of the UpcastMXFPOp instead of doubling the size:
fp16/bf16 -> opsPerChannel=2

Otherwise the lowering of UpcastMXFPOp might be ambiguous.
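
As an illustration of the proposed rule (a sketch, assuming the usual Intel DPAS convention that a channel is 32 bits wide; the function name is hypothetical):

// Derive opsPerChannel from the scalar type of the UpcastMXFPOp *result*
// (bf16 after the upcast), not from the narrower packed source type.
// A 32-bit DPAS channel holds 32 / bitWidth operands:
// tf32 (32-bit) -> 1, fp16/bf16 (16-bit) -> 2, int8 (8-bit) -> 4.
unsigned opsPerChannelFor(unsigned resultBitWidth) {
  return 32 / resultBitWidth;
}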

if (fpType == ScaleDotElemType::E2M1)
xVals = LLVM::convertMxfp4x2ToBf16x2(rewriter, loc, xVals);

// Each thread owns elements of 4 mxfp vectors so we need 4 scales
@chengjunlu (Contributor) commented Nov 18, 2024:
Better still, we need to make sure that the layout is the one expected for a DotOp with DPAS as its parent, and that the layout conversion from the source operand to the destination operand is supported as well.

@etiotto etiotto marked this pull request as draft November 26, 2024 14:22
Development

Successfully merging this pull request may close these issues.

Implement support for TritonGPU::UpcastMXFPOp for Intel XPU BE