author | Jessica Del <50999226+OutOfCache@users.noreply.github.com> | 2023-10-30 16:23:49 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-10-30 16:23:49 +0100 |
commit | 849297c97d9e87584cae7c83fcca9686f784d54a (patch) | |
tree | be596ade81faec9c00e8841301e6db04720f84bd /llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp | |
parent | 72e6c1c70d5e07bbc8cb7cae2ed915108daf93aa (diff) | |
[AMDGPU][wmma] - Add tied wmma intrinsic (#69903)
These new intrinsics, `amdgcn_wmma_tied_f16_16x16x16_f16` and
`amdgcn_wmma_tied_f16_16x16x16_f16`,
explicitly tie the destination accumulator matrix to the input
accumulator matrix.
The `wmma_f16` and `wmma_bf16` intrinsics write only 16 bits of each
32-bit destination VGPR.
The `op_sel` argument determines which half is written. The other half of
the destination registers remains unchanged.
In some cases, however, we expect the destination to carry over the other
halves from the input accumulator,
for instance when packing two separate accumulator matrices into one.
In that case, the two matrices
are tied to the same registers but occupy separate halves, so it is
important to copy the other matrix's values
to the new destination.
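
The half-register behaviour described above can be illustrated with a small conceptual model. This is only a sketch in Python, not the intrinsic or its lowering: `write_half` and `wmma_write` are hypothetical helper names, registers are modelled as plain 32-bit integers, and the mapping of `op_sel` to the high half (set) or low half (clear) is an assumption made for the illustration.

```python
MASK16 = 0xFFFF

def write_half(dst_reg: int, value16: int, op_sel: bool) -> int:
    """Write a 16-bit value into one half of a 32-bit register.

    Assumption for this sketch: op_sel=True selects the high half,
    op_sel=False the low half. The other half is left untouched.
    """
    value16 &= MASK16
    if op_sel:
        return (dst_reg & MASK16) | (value16 << 16)
    return (dst_reg & (MASK16 << 16)) | value16

def wmma_write(dst_regs, results, op_sel):
    """Model the result write-back: only the selected halves are replaced."""
    return [write_half(d, r, op_sel) for d, r in zip(dst_regs, results)]

# Matrix A already sits in the low halves of the accumulator registers.
acc = [0x00001111, 0x00002222]
# New 16-bit results for matrix B, to be written to the high halves.
results_b = [0xAAAA, 0xBBBB]

# Tied: the destination registers are the accumulator registers,
# so the low halves still hold matrix A after the write.
tied = wmma_write(acc, results_b, op_sel=True)

# Not tied: the destination may be a different register with stale
# contents, so the low halves no longer hold matrix A.
stale = [0xDEADBEEF, 0xDEADBEEF]
untied = wmma_write(stale, results_b, op_sel=True)

print([hex(r) for r in tied])
print([hex(r) for r in untied])
```

In this model, tying the destination to the accumulator is what keeps the untouched halves meaningful; without it, those halves hold whatever the allocated destination registers contained before.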