author: Jan Hubicka <hubicka@ucw.cz> 2025-03-03 19:12:20 +0100
committer: Jan Hubicka <hubicka@ucw.cz> 2025-03-04 16:10:34 +0100
commit: c84be624e079cd748df93a3dc0b5168865fefee9
tree: 8c7fb4c2cedcbf27e6f437b5e5a949d973b56f17
parent: 173cf7c9b8c0d61bb2cb0bd3a9e3150b393ab59a
Make ix86_macro_fusion_pair_p and ix86_fuse_mov_alu_p match current CPUs

The current implementation of fusion predicates misses some common fusion
cases on zen and more recent cores.  I added knobs for the individual
conditionals we test.

 1) I split checks for fusing ALU with conditional operands when the ALU
    has a memory operand.  This seems to be supported by zen3+ and by
    tigerlake and cooperlake (according to Agner Fog's manual).

 2) znver4 and 5 support fusion of ALU and conditional even if the ALU has
    memory and immediate operands.  This seems to be relatively important,
    enabling 25% more fusions on gcc bootstrap.

 3) No CPU supports fusing when the ALU contains IP relative memory
    references.  I added a separate knob so we do not forget about this if
    this gets supported later.

The patch does not solve the limitation of sched that fuse pairs must be
adjacent on input and the first operation must be single-set.  Fixing
single-set is easy (I have a separate patch for this); for non-adjacent
pairs we need bigger surgery.

To verify what the CPU really does I made a simple test script.

jh@ryzen3:~> cat fuse-test.c
int b;
const int z = 0;
const int o = 1;
int
main()
{
  int a = 1000000000;
  int b;
  int z = 0;
  int o = 1;
  asm volatile ("\n"
".L1234:\n"
"nop\n"
"subl %3, %0\n"
"movl %0, %1\n"
"cmpl %2, %1\n"
"movl %0, %1\n"
"test %1, %1\n"
"nop\n"
"jne .L1234":"=a"(a),
"=m"(b)
"=r"(b)
:
"m"(z),
"m"(o),
"i"(0),
"i"(1),
"0"(a)
);
}

jh@ryzen3:~> cat fuse-test.sh
EVENT=ex_ret_fused_instr
dotest()
{
gcc -O2 fuse-test.c $* -o fuse-cmp-imm-mem-nofuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-nofuse 2>&1 | grep $EVENT
gcc -O2 fuse-test.c -DFUSE $* -o fuse-cmp-imm-mem-fuse
perf stat -e $EVENT ./fuse-cmp-imm-mem-fuse 2>&1 | grep $EVENT
}

echo ALU with immediate
dotest
echo ALU with memory
dotest -D MEM
echo ALU with IP relative memory
dotest -D MEM -D IPRELATIVE
echo CMP with immediate
dotest -D CMP
echo CMP with memory
dotest -D CMP -D MEM
echo CMP with memory and immediate
dotest -D CMP -D MEMIMM
echo CMP with IP relative memory
dotest -D CMP -D MEM -D IPRELATIVE
echo TEST
dotest -D TEST

On zen5 I get:

ALU with immediate
            20,345      ex_ret_fused_instr:u
     1,000,020,278      ex_ret_fused_instr:u
ALU with memory
            20,367      ex_ret_fused_instr:u
     1,000,020,290      ex_ret_fused_instr:u
ALU with IP relative memory
            20,395      ex_ret_fused_instr:u
            20,403      ex_ret_fused_instr:u
CMP with immediate
            20,369      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u
CMP with memory
            20,314      ex_ret_fused_instr:u
     1,000,020,341      ex_ret_fused_instr:u
CMP with memory and immediate
            20,372      ex_ret_fused_instr:u
     1,000,020,266      ex_ret_fused_instr:u
CMP with IP relative memory
            20,382      ex_ret_fused_instr:u
            20,369      ex_ret_fused_instr:u
TEST
            20,346      ex_ret_fused_instr:u
     1,000,020,301      ex_ret_fused_instr:u

IP relative memory seems to not be documented.

On zen3/4 I get:

ALU with immediate
            20,263      ex_ret_fused_instr:u
     1,000,020,051      ex_ret_fused_instr:u
ALU with memory
            20,255      ex_ret_fused_instr:u
     1,000,020,056      ex_ret_fused_instr:u
ALU with IP relative memory
            20,253      ex_ret_fused_instr:u
            20,266      ex_ret_fused_instr:u
CMP with immediate
            20,264      ex_ret_fused_instr:u
     1,000,020,052      ex_ret_fused_instr:u
CMP with memory
            20,253      ex_ret_fused_instr:u
     1,000,019,794      ex_ret_fused_instr:u
CMP with memory and immediate
            20,260      ex_ret_fused_instr:u
            20,264      ex_ret_fused_instr:u
CMP with IP relative memory
            20,258      ex_ret_fused_instr:u
            20,256      ex_ret_fused_instr:u
TEST
            20,261      ex_ret_fused_instr:u
     1,000,020,048      ex_ret_fused_instr:u

zen1 and 2 get:

ALU with immediate
            21,610      ex_ret_fus_brnch_inst:u
            21,697      ex_ret_fus_brnch_inst:u
ALU with memory
            21,479      ex_ret_fus_brnch_inst:u
            21,747      ex_ret_fus_brnch_inst:u
ALU with IP relative memory
            21,623      ex_ret_fus_brnch_inst:u
            21,684      ex_ret_fus_brnch_inst:u
CMP with immediate
            21,708      ex_ret_fus_brnch_inst:u
     1,000,021,288      ex_ret_fus_brnch_inst:u
CMP with memory
            21,689      ex_ret_fus_brnch_inst:u
     1,000,004,270      ex_ret_fus_brnch_inst:u
CMP with memory and immediate
            21,604      ex_ret_fus_brnch_inst:u
            21,671      ex_ret_fus_brnch_inst:u
CMP with IP relative memory
            21,589      ex_ret_fus_brnch_inst:u
            21,602      ex_ret_fus_brnch_inst:u
TEST
            21,600      ex_ret_fus_brnch_inst:u
     1,000,021,233      ex_ret_fus_brnch_inst:u

I tested the patch on zen3 and zen5 with spec2k17 and it seems neutral;
however, the number of fusions does go up.

Bootstrapped/regtested x86_64-linux, I plan to commit it tomorrow.

Honza

gcc/ChangeLog:

	* config/i386/i386.h (TARGET_FUSE_ALU_AND_BRANCH_MEM): New macro.
	(TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM): New macro.
	(TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New macro.
	* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Support
	non-single-set.
	(ix86_macro_fusion_pair_p): Allow ALU which only clobbers;
	be more careful about immediates; check
	TARGET_FUSE_ALU_AND_BRANCH_MEM, TARGET_FUSE_ALU_AND_BRANCH_MEM_IMM,
	TARGET_FUSE_ALU_AND_BRANCH_RIP_RELATIVE; verify that we never use
	unsigned checks with inc/dec.
	* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_MEM_IMM): New tune.
	(X86_TUNE_FUSE_ALU_AND_BRANCH_RIP_RELATIVE): New tune.
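
How the three new knobs gate fusion of a memory-operand ALU with the following conditional jump can be sketched roughly as below.  This is only a simplified model, not the code in x86-tune-sched.cc: the FUSE_* flags, struct alu_operands and alu_branch_fusible_p are hypothetical names standing in for the TARGET_FUSE_ALU_AND_BRANCH_* macros, and the sketch assumes that plain ALU and branch fusion is already enabled and that the instruction pair is otherwise eligible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three new tuning knobs.  */
enum fuse_knob
{
  FUSE_MEM = 1 << 0,          /* ALU with a memory operand may fuse.  */
  FUSE_MEM_IMM = 1 << 1,      /* ...also with memory and immediate.  */
  FUSE_RIP_RELATIVE = 1 << 2  /* ...also with IP relative memory.  */
};

/* Hypothetical summary of the operands of the first (ALU) instruction.  */
struct alu_operands
{
  bool has_mem;       /* Some operand is a memory reference.  */
  bool has_imm;       /* Some operand is an immediate.  */
  bool rip_relative;  /* The memory reference is IP relative.  */
};

/* Return true if this simplified model lets the ALU instruction fuse
   with the following conditional jump under the given knobs.  */
static bool
alu_branch_fusible_p (unsigned knobs, struct alu_operands op)
{
  /* An IP relative reference is always a memory reference.  */
  assert (!op.rip_relative || op.has_mem);

  /* Register and immediate forms need none of the new knobs.  */
  if (!op.has_mem)
    return true;
  if (!(knobs & FUSE_MEM))
    return false;
  if (op.has_imm && !(knobs & FUSE_MEM_IMM))
    return false;
  if (op.rip_relative && !(knobs & FUSE_RIP_RELATIVE))
    return false;
  return true;
}
```

With only FUSE_MEM set, the model accepts a compare of a register against memory but rejects the memory and immediate form, which is the pattern the zen3/4 measurements above show.  With no knob set it rejects every memory form, and FUSE_RIP_RELATIVE models a knob that no current CPU is expected to enable.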