riscv-gnu-toolchain/llvm.git


author	Muzammil <55665739+Muzammiluddin-Syed-ECE@users.noreply.github.com>	2025-09-18 15:25:14 -0400
committer	GitHub <noreply@github.com>	2025-09-18 19:25:14 +0000
commit	9628061e055c9f695ff80f9a74e4f6e524b34993 (patch)
tree	ae3e8aa2b1457dfc26339db4428591463382c002 /llvm/lib/Support/CommandLine.cpp
parent	8c41859a21a4d0cfda164cc58f4a5336dbcd30d1 (diff)
download	llvm-9628061e055c9f695ff80f9a74e4f6e524b34993.zip llvm-9628061e055c9f695ff80f9a74e4f6e524b34993.tar.gz llvm-9628061e055c9f695ff80f9a74e4f6e524b34993.tar.bz2

[mlir][AMDGPU] Add canonicalization pattern to pack scales for ScaledMFMAOp (#155951)

The ScaledMFMAOp accepts scales as a vector of 4 bytes (`vector<4xf8E8M0FNU>`) that can be stored in a single register, with a particular scale selected using the `OpSel` attribute. Currently, only one byte of this 4-byte vector is used. As a result, four scales that could share one register occupy four registers, wasting three. This is fixed by identifying when single-byte extractions are performed and rewriting them into extractions of 4-byte vectors.

Example:

```
%unit = vector.extract %ScaleSrc[offsets] : f8E8M0FNU from vector<?x?x?xf8E8M0FNU>
%scale = vector.insert %unit, ... : f8E8M0FNU into vector<4xf8E8M0FNU>
amdgpu.scaled_mfma(%scale[0] * ...
```

is rewritten to

```
%reshaped = vector.shape_cast %ScaleSrc : vector<?x?x?xf8E8M0FNU> to vector<?x4xf8E8M0FNU>
%scale = vector.extract %reshaped[?] : vector<4xf8E8M0FNU> from vector<?x4xf8E8M0FNU>
amdgpu.scaled_mfma(%scale[0-3] * ...
```

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
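The rewrite above replaces a single-scale extraction at some offsets with an extraction of a whole 4-byte row of a `?x4` view, plus a byte selector. The index arithmetic behind that mapping can be sketched as follows. This is not the pattern's actual C++ implementation. The helper names (`linearize`, `packed_scale_index`, `pack_bytes`, `select_byte`) are hypothetical. The sketch assumes that `vector.shape_cast` preserves row-major element order, that the scale source's element count is a multiple of 4, and that `OpSel` value `i` selects byte `i` counting from the least significant byte. The last of these is an illustrative assumption, not a documented hardware fact.

```python
from math import prod


def linearize(offsets, shape):
    """Row-major linear index of `offsets` inside a vector of `shape`."""
    idx = 0
    for off, dim in zip(offsets, shape):
        idx = idx * dim + off
    return idx


def packed_scale_index(offsets, shape, group=4):
    """Hypothetical helper: map a single-scale extraction at `offsets`
    to (row, opsel) in the (N/group) x group view that a row-major
    shape_cast would produce."""
    if prod(shape) % group != 0:
        raise ValueError("scale source size must be a multiple of the group size")
    idx = linearize(offsets, shape)
    # The row picks which 4-byte vector to extract; the remainder picks the byte.
    return idx // group, idx % group


def pack_bytes(scale_bytes):
    """Pack four 8-bit scale values into one 32-bit word.
    Assumption: byte 0 occupies the least significant bits."""
    if len(scale_bytes) != 4:
        raise ValueError("expected exactly four scale bytes")
    word = 0
    for i, b in enumerate(scale_bytes):
        word |= (b & 0xFF) << (8 * i)
    return word


def select_byte(word, opsel):
    """Read back the scale byte chosen by `opsel` (same byte-order assumption)."""
    return (word >> (8 * opsel)) & 0xFF
```

For a `vector<2x3x4xf8E8M0FNU>` scale source, `packed_scale_index((1, 2, 3), (2, 3, 4))` returns `(5, 3)`. In this model, the scale at `[1, 2, 3]` is byte 3 of row 5 in the `6x4` view. Because one extracted row carries four scales, consecutive MFMA operations can reuse the same 32-bit value and differ only in the byte selector.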

Diffstat (limited to 'llvm/lib/Support/CommandLine.cpp')

0 files changed, 0 insertions, 0 deletions

