author Roger Sayle <roger@nextmovesoftware.com> 2021-11-26 17:22:10 +0000
committer Roger Sayle <roger@nextmovesoftware.com> 2021-11-26 17:22:10 +0000
commit b41be002eda047093bbf4757cb65ffb4d525cc35
tree 0763139a76a5ff2083d805b6dc8628fa8209f642
parent 665f726b8a151a2685cd1804dc2ee147eb0cd0eb
ivopts: Improve code generated for very simple loops.
This patch tidies up the code that GCC generates for simple loops,
by selecting/generating a simpler loop bound expression in ivopts.
The original motivation came from looking at the following loop (from
gcc.target/i386/pr90178.c)

  int *find_ptr (int* mem, int sz, int val)
  {
    for (int i = 0; i < sz; i++)
      if (mem[i] == val)
        return &mem[i];
    return 0;
  }

which GCC currently compiles to:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        leal    -1(%rsi), %ecx
        leaq    4(%rdi,%rcx,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

Notice the relatively complex leal/leaq instructions, that result
from ivopts using the following expression for the loop bound:

inv_expr 2:     ((unsigned long) ((unsigned int) sz_8(D) + 4294967295)
                * 4 + (unsigned long) mem_9(D)) + 4

which results from NITERS being (unsigned int) sz_8(D) + 4294967295,
i.e. (sz - 1), and the logic in cand_value_at determining the bound
as BASE + NITERS*STEP at the start of the final iteration and as
BASE + NITERS*STEP + STEP at the end of the final iteration.

Ideally, we'd like the middle-end optimizers to simplify
BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially when
NITERS already has the form BOUND-1, but with type conversions and
possible overflow to worry about, the above "inv_expr 2" is the best
that can be done by fold (without additional context information).

This patch improves ivopts' cand_value_at by instead of using just
the tree expression for NITERS, passing the data structure that
explains how that expression was derived.  This allows us to peek
under the surface to check that NITERS+1 doesn't overflow, and in
this patch to use the SSA_NAME already holding the required value.
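
As a rough illustration of why fold cannot make this simplification
without extra context, the two forms of the bound can be modelled in
plain C.  This is only a sketch, not code from the patch: the names
bound_old, bound_new and check_guarded are hypothetical, and uint32_t
and uint64_t stand in for unsigned int and unsigned long on x86_64.

```c
#include <assert.h>
#include <stdint.h>

/* BASE + NITERS*STEP + STEP, with NITERS = (unsigned int) sz - 1
   computed in 32 bits and only then widened to 64 bits, mirroring
   the shape of "inv_expr 2".  STEP is 4 (sizeof (int)).  */
uint64_t bound_old (uint64_t base, int sz)
{
  uint32_t niters = (uint32_t) sz + 4294967295u;  /* sz - 1 modulo 2^32 */
  return (uint64_t) niters * 4 + base + 4;
}

/* The simpler form BASE + sz*STEP, with sz widened (sign-extended)
   before the multiplication.  */
uint64_t bound_new (uint64_t base, int sz)
{
  return (uint64_t) sz * 4 + base;
}

/* Hypothetical helper: when the loop is actually entered (sz > 0),
   sz - 1 does not wrap and the two forms agree.  */
void check_guarded (uint64_t base, int sz)
{
  assert (sz > 0);
  assert (bound_old (base, sz) == bound_new (base, sz));
}
```

For sz == 0 the 32-bit subtraction wraps to 4294967295 before it is
widened, so bound_old (0x1000, 0) yields 0x1000 + 0x400000000 while
bound_new (0x1000, 0) yields 0x1000.  The two expressions are therefore
only interchangeable given the knowledge that the loop guard excludes
sz <= 0, which is the kind of information the niter descriptor carries
and a bare tree expression does not.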

In the motivating loop above, inv_expr 2 now becomes:
(unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)

And as a result, on x86_64 we now generate:

find_ptr:
        movq    %rdi, %rax
        testl   %esi, %esi
        jle     .L4
        movslq  %esi, %rsi
        leaq    (%rdi,%rsi,4), %rcx
        jmp     .L3
.L7:    addq    $4, %rax
        cmpq    %rcx, %rax
        je      .L4
.L3:    cmpl    %edx, (%rax)
        jne     .L7
        ret
.L4:    xorl    %eax, %eax
        ret

This improvement required one minor tweak to GCC's testsuite for
gcc.dg/wrapped-binop-simplify.c, where we again generate better
code, and therefore no longer find as many optimization opportunities
in later passes (vrp2).

Previously:

void v1 (unsigned long *in, unsigned long *out, unsigned int n)
{
  int i;
  for (i = 0; i < n; i++) {
    out[i] = in[i];
  }
}

on x86_64 generated:

v1:     testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
.L3:    movq    (%rdi,%rax,8), %rcx
        movq    %rcx, (%rsi,%rax,8)
        addq    $1, %rax
        cmpq    %rax, %rdx
        jne     .L3
.L1:    ret

and now instead generates:

v1:     testl   %edx, %edx
        je      .L1
        movl    %edx, %edx
        xorl    %eax, %eax
        leaq    0(,%rdx,8), %rcx
.L3:    movq    (%rdi,%rax), %rdx
        movq    %rdx, (%rsi,%rax)
        addq    $8, %rax
        cmpq    %rax, %rcx
        jne     .L3
.L1:    ret

2021-11-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * tree-ssa-loop-ivopts.c (cand_value_at): Take a class
        tree_niter_desc* argument instead of just a tree for NITER.
        If we require the iv candidate value at the end of the final
        loop iteration, try using the original loop bound as the
        NITER for sufficiently simple loops.
        (may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite/ChangeLog
        * gcc.dg/wrapped-binop-simplify.c: Update expected test result.
        * gcc.dg/tree-ssa/ivopts-5.c: New test case.
        * gcc.dg/tree-ssa/ivopts-6.c: New test case.
        * gcc.dg/tree-ssa/ivopts-7.c: New test case.
        * gcc.dg/tree-ssa/ivopts-8.c: New test case.
        * gcc.dg/tree-ssa/ivopts-9.c: New test case.