author: Christoph Müllner <christoph.muellner@vrull.eu> 2024-05-15 12:18:20 -0600
committer: Jeff Law <jlaw@ventanamicro.com> 2024-05-15 15:14:04 -0600
commit: 4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe
tree: 84dfd769900a6b88c0978ae4d1cc1c0e8911793f
parent: 1a05332bbac98a4c002bef3fb45a3ad9d56b3a71
[v2,1/2] RISC-V: Add cmpmemsi expansion
GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
        li      a4,0
        j       .L2
.L8:
        bgeu    a4,a7,.L7
.L2:
        add     a2,a0,a4
        add     a3,a1,a4
        lbu     a5,0(a2)
        lbu     a6,0(a3)
        addi    a4,a4,1
        li      a7,15           // missed hoisting
        subw    a5,a5,a6
        andi    a5,a5,0xff      // useless
        beq     a5,zero,.L8
        lbu     a0,0(a2)        // loading again!
        lbu     a5,0(a3)        // loading again!
        subw    a0,a0,a5
        ret
.L7:
        li      a0,0
        ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
        ld      a5,0(a0)
        ld      a4,0(a1)
        bne     a5,a4,.L2
        ld      a5,8(a0)
        ld      a4,8(a1)
        slli    a5,a5,8
        slli    a4,a4,8
        bne     a5,a4,.L2
        li      a0,0
.L3:
        sext.w  a0,a0
        ret
.L2:
        rev8    a5,a5
        rev8    a4,a4
        sltu    a5,a5,a4
        neg     a5,a5
        ori     a0,a5,1
        j       .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion, this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

        * config/riscv/riscv-protos.h (riscv_expand_block_compare): New
        prototype.
        * config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
        for zero_extendhi.
        (do_load_from_addr): Add support for HI and SI/64 modes.
        (do_load): Add helper for zero-extended loads.
        (emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
        (emit_memcmp_scalar_result_calculation): Likewise.
        (riscv_expand_block_compare_scalar): Likewise.
        (riscv_expand_block_compare): New RISC-V expander for memory compare.
        * config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/cmpmemsi-1.c: New test.
        * gcc.target/riscv/cmpmemsi-2.c: New test.
        * gcc.target/riscv/cmpmemsi-3.c: New test.
        * gcc.target/riscv/cmpmemsi.c: New test.
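
For readers who prefer C to assembly, the improved 15-byte sequence described in the commit message can be modelled roughly as follows. This is only an illustrative sketch, not the code added by the patch: the names load_le, rev8, result_from_words and memcmp15_sketch are hypothetical, the loads are assembled byte by byte so the sketch is portable, and the second load reads only 7 bytes instead of emulating the aligned 8-byte ld followed by the shift that discards the 16th byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read n (<= 8) bytes as a little-endian word,
   modelling what ld yields on little-endian RISC-V.  */
static uint64_t load_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Portable stand-in for the Zbb rev8 instruction (byte reversal).  */
static uint64_t rev8(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);
        x >>= 8;
    }
    return r;
}

/* Branchless result calculation for two words known to differ:
   after byte reversal the first byte in memory order is the most
   significant one, so an unsigned compare orders the blocks.  */
static int result_from_words(uint64_t a, uint64_t b)
{
    assert(a != b);                      /* only valid on a mismatch */
    uint64_t lt = rev8(a) < rev8(b);     /* sltu: 1 if a < b, else 0 */
    return (int)(-(int64_t)lt | 1);      /* neg + ori: -1 or 1 */
}

/* Sketch of the unrolled compare of two 15-byte blocks.  */
static int memcmp15_sketch(const void *s1, const void *s2)
{
    const unsigned char *p = s1, *q = s2;
    uint64_t a = load_le(p, 8);
    uint64_t b = load_le(q, 8);
    if (a == b) {
        /* Bytes 8..14, shifted left by 8 as the slli in the
           sequence above does.  */
        a = load_le(p + 8, 7) << 8;
        b = load_le(q + 8, 7) << 8;
        if (a == b)
            return 0;
    }
    return result_from_words(a, b);
}
```

Like the generated sequence, the sketch returns only -1, 0 or 1 rather than the byte difference, which is all that memcmp callers may rely on (the sign of the result).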