riscv-gnu-toolchain/llvm.git

author	peterbell10 <peterbell10@live.co.uk>	2023-11-17 17:51:50 +0000
committer	GitHub <noreply@github.com>	2023-11-17 09:51:50 -0800
commit	4263b2ecf8c4b9b62094a731bb92c501197531b0 (patch)
tree	4532ed395d2d32d5fc983c898687f316d1620a4b /llvm/lib/Object/ELF.cpp
parent	bfbfd1caa4da70774547c1c298e482661822a137 (diff)

[NVPTX] Expand EXTLOAD for v8f16 and v8bf16 (#72672)

In openai/triton#2483 I've encountered a bug in the NVPTX codegen. Given
`load<8 x half>` followed by `fpext to <8 x float>` we get

```
ld.shared.v4.b16 {%f1, %f2, %f3, %f4}, [%r15+8];
ld.shared.v4.b16 {%f5, %f6, %f7, %f8}, [%r15];
```

Which loads float16 values into float registers without any conversion
and the result is simply garbage.

This PR brings `v8f16` and `v8bf16` into line with the other vector
types by expanding it to load + cvt.

cc @manman-ren @Artem-B @jlebar
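To illustrate why skipping the conversion yields garbage, here is a minimal host-side sketch in Python. It is not NVPTX code and not part of the patch. The helper names `load_without_cvt` and `load_then_cvt` are hypothetical, and the sketch assumes the 16 loaded bits end up zero-extended in the low half of a 32-bit float register.

```python
import struct


def load_without_cvt(half_bytes: bytes) -> float:
    """Model the buggy path: raw f16 bits reinterpreted as an f32.

    Assumption: the 16 bits are zero-extended to 32 bits before being
    read as a float, which is how this sketch models a b16 load into
    an f32 register.
    """
    (bits16,) = struct.unpack("<H", half_bytes)
    (as_f32,) = struct.unpack("<f", struct.pack("<I", bits16))
    return as_f32


def load_then_cvt(half_bytes: bytes) -> float:
    """Model the fixed path: load the f16, then convert it (load + cvt)."""
    (value,) = struct.unpack("<e", half_bytes)  # "e" is IEEE 754 binary16
    return value


if __name__ == "__main__":
    for v in (1.5, -2.0, 0.25, 8.0):
        raw = struct.pack("<e", v)
        print(v, load_without_cvt(raw), load_then_cvt(raw))
```

Under that assumption, every reinterpreted value lands in the float32 subnormal range, while the load + cvt path returns the original number.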

Diffstat (limited to 'llvm/lib/Object/ELF.cpp')

0 files changed, 0 insertions, 0 deletions

