riscv-gnu-toolchain/gcc.git

author	Tamar Christina <tamar.christina@arm.com>	2025-10-31 16:07:24 +0000
committer	Tamar Christina <tamar.christina@arm.com>	2025-10-31 16:07:24 +0000
commit	58ee2079230e0340187a5a9990891fb174034301 (patch)
tree	0c769564c525a6ebd55756854cd03185d2991e26 /contrib/gcc-changelog/git_commit.py
parent	82384a2eda628a11a8bb6484b229f1feab8fa97b (diff)

AArch64: support bf16 to sf extensions [PR121853]

It looks like during the upstreaming of BF16 we didn't implement the extend
optab for it.

As a result we go through soft-float emulation, which results in a massive
performance drop in projects using BF16.

As an example, for

    float convert(__bf16 value) {
        return (float)value;
    }

we generate:

    convert(__bf16):
            stp     x29, x30, [sp, -16]!
            mov     x29, sp
            bl      __extendbfsf2
            ldp     x29, x30, [sp], 16
            ret

and after this patch

    convert:
            movi    v31.4s, 0
            ext     v0.16b, v31.16b, v0.16b, #14
            ret

We generate an ext with movi because this has the same latency as a shift but
twice the throughput. The zero vector is zero latency, so in real workloads
this codegen is much better than using shifts.

As a reminder, BF16 -> FP32 is just shifting left 16 bits.

The expand pattern has to rely on generating multiple subregs due to a
restriction that subregs can't change floating point size and type at the same
time.

I've tried alternative approaches like using the EXT as SF mode, but the
paradoxical subreg of BF -> SF isn't allowed, and using an extend doesn't work
because extend is what we're defining.

gcc/ChangeLog:

	PR target/121853
	* config/aarch64/aarch64-simd.md (extendbfsf2): New.

gcc/testsuite/ChangeLog:

	PR target/121853
	* gcc.target/aarch64/pr121853_1.c: New test.
	* gcc.target/aarch64/pr121853_2.c: New test.
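To illustrate the "BF16 -> FP32 is just shifting left 16 bits" remark in the commit message, here is a minimal scalar sketch in portable C. It is not the code from the patch: the helper name bf16_bits_to_float is hypothetical, it takes the raw BF16 bit pattern as a uint16_t rather than a __bf16 value, and it assumes float is IEEE 754 binary32 (as it is on AArch64).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.
   BF16 has the same sign bit and 8-bit exponent as binary32 but keeps only
   the top 7 mantissa bits, so widening places the 16 bits in the upper half
   of a 32-bit word and zero-fills the lower half.  */
static float bf16_bits_to_float(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;  /* the "shift left 16 bits" step */
    float out;

    /* memcpy reinterprets the bits without violating strict aliasing.
       Assumes float is IEEE 754 binary32.  */
    memcpy(&out, &wide, sizeof out);
    return out;
}
```

For instance, the BF16 bit pattern 0x3F80 widens to 0x3F800000, which is 1.0f. The ext in the commit message has the same effect on the low lane of the vector register: with a zero vector as the first source and a byte index of 14, the two BF16 bytes move up by two byte positions and zeros fill in below them.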

Diffstat (limited to 'contrib/gcc-changelog/git_commit.py')

0 files changed, 0 insertions, 0 deletions

