path: root/gcc
Age  Commit message  Author  Files  Lines
2021-11-29Optimize _Float16 usage for non AVX512FP16.liuhongt5-8/+41
1. No memory is needed to move HI/HFmode between GPR and SSE registers under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o AVX512FP16. 2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant initialization could be eliminated. gcc/ChangeLog: PR target/102811 * config/i386/i386.c (inline_secondary_memory_needed): HImode move between GPR and SSE registers is supported under TARGET_SSE2 and above. * config/i386/i386.md (extendhfsf2): Optimize expander. (truncsfhf2): Ditto. * config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to align with V8HImode. gcc/testsuite/ChangeLog: * gcc.target/i386/pr102811-2.c: New test. * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new scan-assembler-times.
2021-11-29Fix regression introduced by r12-5536.liuhongt3-18/+29
There're several failures: 1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)" %vpextrw should be used in output templates. 2. ICE in get_attr_memory for movhi_internal since some alternatives are marked as TYPE_SSELOG. Use TYPE_SSELOG1 instead. Also this patch fixes a typo and some latent bugs which are related to moving HImode from/to sse register w/o TARGET_AVX512FP16. gcc/ChangeLog: PR target/102811 PR target/103463 * config/i386/i386.c (ix86_secondary_reload): Without TARGET_SSE4_1, General register is needed to move HImode from sse register to memory. * config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of pextrw in output templates. * config/i386/i386.md (movhi_internal): Ditto, also fix typo of MEM_P (operands[1]) and adjust mode/prefix/type attribute for alternatives related to sse register.
2021-11-29tree-optimization/103458 - avoid creating new loops in CD-DCERichard Biener2-2/+27
When creating forwarders in CD-DCE we have to avoid creating loops where we formerly did not consider those because of abnormal predecessors. At this point simply excuse us when there are any abnormal predecessors. 2021-11-29 Richard Biener <rguenther@suse.de> PR tree-optimization/103458 * tree-ssa-dce.c (make_forwarders_with_degenerate_phis): Do not create forwarders for blocks with abnormal predecessors. * gcc.dg/torture/pr103458.c: New testcase.
2021-11-29Restore can_be_invalidated_p semantics to before refactoringRichard Biener1-3/+5
This restores the semantics of can_be_invalidated_p to the original semantics of the function this was split out from tree-ssa-uninit.c. The current semantics only ever look at the first predicate which cannot be correct. 2021-11-26 Richard Biener <rguenther@suse.de> * gimple-predicate-analysis.cc (can_be_invalidated_p): Restore semantics to the one before the split from tree-ssa-uninit.c.
2021-11-28rs6000/test: Add emulated gather test caseKewen Lin1-0/+20
As verified, the emulated gather capability of vectorizer (r12-2733) can help to speed up SPEC2017 510.parest_r on Power8/9/10 by 5% ~ 9% with option sets Ofast unroll and Ofast lto. This patch is to add a test case similar to the one in i386 to add testing coverage for 510.parest_r hotspots. btw, different from the one in i386, this uses unsigned int as INDEXTYPE since the unpack support for unsigned int (r12-3134) also matters for the hotspots vectorization. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vect-gather-1.c: New test.
2021-11-29Daily bump.GCC Administrator3-1/+32
2021-11-28Compare guessed and feedback frequencies during profile feedback stream-inJan Hubicka1-5/+73
This patch adds simple code to dump and compare frequencies of basic blocks read from the profile feedback and frequencies guessed statically. It dumps basic blocks in the order of decreasing frequencies from feedback along with guessed frequencies and histograms. It makes it possible to spot basic blocks in hot regions that are considered cold by guessed profile or vice versa. I am trying to figure out how realistic our profile estimate is compared to the read one on exchange2 (looking again into PR98782). There IRA now places spills into hot regions of code while with older (and worse) profile it did not. The catch is that the function is very large and has 9 nested loops, so it is hard to figure out how to improve the profile estimate and/or IRA. gcc/ChangeLog: 2021-11-28 Jan Hubicka <hubicka@ucw.cz> * profile.c: Include sreal.h (struct bb_stats): New. (cmp_stats): New function. (compute_branch_probabilities): Output bb stats.
2021-11-28Improve -fprofile-reportJan Hubicka5-124/+269
Profile-report was never properly updated after switch to new profile representation. This patch fixes the way profile mismatches are calculated: we used to collect separately count and freq mismatches, while now we have only counts & probabilities. So we verify - in count: that total count of incoming edges is close to actual count of the BB - out prob: that total sum of outgoing edge probabilities is close to 1 (except for BB containing noreturn calls or EH). Moreover I added dumping of absolute data which is useful to plot them: with Martin Liska we plan to set up regular testing so we keep optimizers' profile updates a bit under control. Finally I added both static and dynamic stats about mismatches - static one is simply number of inconsistencies in the cfg while dynamic is scaled by the profile - I think in order to keep an eye on optimizers the first number is quite relevant, while when tracking why code quality regressed the second number matters more. 2021-11-28 Jan Hubicka <hubicka@ucw.cz> * cfghooks.c: Include sreal.h, profile.h. (profile_record_check_consistency): Fix checking of count consistency; record also dynamic mismatches. * cfgrtl.c (rtl_account_profile_record): Similarly. * tree-cfg.c (gimple_account_profile_record): Likewise. * cfghooks.h (struct profile_record): Remove num_mismatched_freq_in, num_mismatched_freq_out, turn time to double, add dyn_mismatched_prob_out, dyn_mismatched_count_in, num_mismatched_prob_out; remove num_mismatched_count_out. * passes.c (account_profile_1): New function. (account_profile_in_list): New function. (pass_manager::dump_profile_report): Rewrite. (execute_one_ipa_transform_pass): Check profile consistency after running all passes. (execute_all_ipa_transforms): Remove cfun test; record all transform methods. (execute_one_pass): Fix collecting of profile stats.
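The two consistency checks described in this message can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct edge_info`, `count_in_consistent`, `prob_out_consistent` and the tolerance parameters are hypothetical names, not the profile_record code in cfghooks.c, and real GCC uses its own profile_count and profile_probability types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CFG edge; not GCC's real type. */
struct edge_info {
  uint64_t count;  /* execution count carried by the edge */
  double prob;     /* probability of taking the edge, 0.0 .. 1.0 */
};

/* "in count" check: the sum of incoming edge counts should be close to
   the count of the basic block itself.  Returns 1 when consistent.  */
int count_in_consistent(const struct edge_info *preds, size_t n,
                        uint64_t bb_count, uint64_t tolerance)
{
  uint64_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += preds[i].count;
  uint64_t diff = sum > bb_count ? sum - bb_count : bb_count - sum;
  return diff <= tolerance;
}

/* "out prob" check: the outgoing edge probabilities should sum to
   roughly 1.  Returns 1 when consistent.  */
int prob_out_consistent(const struct edge_info *succs, size_t n, double eps)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += succs[i].prob;
  double diff = sum - 1.0;
  if (diff < 0.0)
    diff = -diff;
  return diff <= eps;
}
```

Counting how many blocks fail either check would correspond to the static statistic; weighting each failure by the block's count would approximate the dynamic one.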
2021-11-28d: fix thinko in optimize attr parsingMartin Liska1-1/+1
gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Fix thinko.
2021-11-28Daily bump.GCC Administrator4-1/+36
2021-11-27jit: Change printf specifiers for size_t to %zuPetter Tomner1-2/+2
Change four occurrences of %ld specifier for size_t to %zu for clean 32bit builds. Signed-off-by 2021-11-27 Petter Tomner <tomner@kth.se> gcc/jit/ * libgccjit.c: %ld -> %zu
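For readers unfamiliar with the issue: `%ld` expects a `long`, which only happens to match `size_t` on some 64-bit targets, while `%zu` is the C99 conversion for `size_t` everywhere. A minimal sketch (the helper name `format_size` is made up for illustration and is not part of libgccjit):

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t portably.  "%zu" matches size_t on both 32-bit and
   64-bit targets, whereas "%ld" mismatches wherever size_t is not long.  */
int format_size(char *buf, size_t buflen, size_t value)
{
  return snprintf(buf, buflen, "%zu", value);
}
```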
2021-11-27x86: Fix up x86_{,64_}sh{l,r}d patterns [PR103431]Jakub Jelinek2-42/+281
The following testcase is miscompiled because the x86_{,64_}sh{l,r}d patterns don't properly describe what the instructions do. One thing is left out, in particular that there is initial count &= 63 for sh{l,r}dq and initial count &= 31 for sh{l,r}d{l,w}. And another thing not described properly, in particular the behavior when count (after the masking) is 0. The pattern says it is e.g. res = (op0 << op2) | (op1 >> (64 - op2)) but that triggers UB on op1 >> 64. For op2 0 we actually want res = (op0 << op2) | 0 When constants are propagated to these patterns during RTL optimizations, both such problems trigger wrong-code issues. This patch represents the patterns as e.g. res = (op0 << (op2 & 63)) | (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63))) so there is both the initial masking and op2 == 0 behavior results in zero being ored. The patch introduces alternate patterns for constant op2 where simplify-rtx.c will fold those expressions into simple numbers, and define_insn_and_split pre-reload splitter for how the patterns looked before into the new form, so that it can pattern match during combine even computations that assumed the shift amount will be in the range of 1 .. bitsize-1. 2021-11-27 Jakub Jelinek <jakub@redhat.com> PR middle-end/103431 * config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd, x86_shrd): Change insn pattern to accurately describe the instructions. (*x86_64_shld_1, *x86_shld_1, *x86_64_shrd_1, *x86_shrd_1): New define_insn patterns. (*x86_64_shld_2, *x86_shld_2, *x86_64_shrd_2, *x86_shrd_2): New define_insn_and_split patterns. (*ashl<dwi>3_doubleword_mask, *ashl<dwi>3_doubleword_mask_1, *<insn><dwi>3_doubleword_mask, *<insn><dwi>3_doubleword_mask_1, ix86_rotl<dwi>3_doubleword, ix86_rotr<dwi>3_doubleword): Adjust splitters for x86_{,64_}sh{l,r}d pattern changes. * gcc.dg/pr103431.c: New test.
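The corrected semantics can be modelled in plain C. The sketch below is an assumption-laden illustration, not the RTL from i386.md: it shows the 32-bit variants (count masked with 31), with `uint64_t` playing the role that the 128-bit type plays for the 64-bit instructions in the message above. The function names `shld32` and `shrd32` are invented for this example.

```c
#include <stdint.h>

/* Model of 32-bit SHLD: shift op0 left by the masked count and fill the
   vacated low bits from the top of op1.  Widening op1 to 64 bits makes
   the count == 0 case well defined: op1 >> 32 contributes zero.  */
uint32_t shld32(uint32_t op0, uint32_t op1, unsigned count)
{
  unsigned c = count & 31;                           /* initial masking */
  uint32_t high = (uint32_t)(op0 << c);
  uint32_t low = (uint32_t)((uint64_t)op1 >> (32 - c));
  return high | low;
}

/* Model of 32-bit SHRD: shift op0 right and fill the vacated high bits
   from the bottom of op1.  For count == 0 the truncation drops op1.  */
uint32_t shrd32(uint32_t op0, uint32_t op1, unsigned count)
{
  unsigned c = count & 31;
  uint32_t low = op0 >> c;
  uint32_t high = (uint32_t)((uint64_t)op1 << (32 - c));
  return low | high;
}
```

Writing the fill term as a shift of the widened operand avoids the undefined `op1 >> 32` on a 32-bit value while still producing zero for a masked count of 0.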
2021-11-27bswap: Fix UB in find_bswap_or_nop_finalize [PR103435]Jakub Jelinek1-2/+8
On gcc.c-torture/execute/pr103376.c in the following code we trigger UB in the compiler. n->range is 8 because it is 64-bit load and rsize is 0 because it is a bswap sequence with load and known to be 0: /* Find real size of result (highest non-zero byte). */ if (n->base_addr) for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++); else rsize = n->range; The shifts then shift uint64_t by 64 bits. For this case mask is 0 and we want both *cmpxchg and *cmpnop as 0, the operation can be done as both nop and bswap and callers will prefer nop. 2021-11-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/103435 * gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that case.
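The hazard here is the general C rule that shifting a 64-bit value by 64 or more bits is undefined. A small sketch of the size computation quoted above, plus a guarded shift: `real_size` mirrors the quoted loop, while `shift_out_bytes` is a hypothetical helper written for this illustration and is not the code in find_bswap_or_nop_finalize.

```c
#include <stdint.h>

#define BITS_PER_MARKER 8

/* Number of bytes up to and including the highest non-zero marker byte,
   following the loop quoted in the commit message.  */
unsigned real_size(uint64_t n)
{
  unsigned rsize = 0;
  for (uint64_t tmpn = n; tmpn; tmpn >>= BITS_PER_MARKER)
    rsize++;
  return rsize;
}

/* Shift v right by a whole number of marker bytes.  A shift by the full
   width of the type would be undefined, so that case is answered
   directly with 0 (every byte is shifted out).  */
uint64_t shift_out_bytes(uint64_t v, unsigned bytes)
{
  if (bytes >= sizeof (uint64_t))
    return 0;
  return v >> (bytes * BITS_PER_MARKER);
}
```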
2021-11-27[Committed] Fix new ivopts-[89].c test cases for -m32.Roger Sayle2-2/+2
2021-11-27 Roger Sayle <roger@nextmovesoftware.com> gcc/testsuite/ChangeLog * gcc.dg/tree-ssa/ivopts-8.c: Fix new test case for -m32. * gcc.dg/tree-ssa/ivopts-9.c: Likewise.
2021-11-27Daily bump.GCC Administrator6-1/+146
2021-11-27ipa: Fix CFG fix-up in IPA-CP transform phase (PR 103441)Martin Jambor1-10/+8
I forgot that IPA passes before ipa-inline must not return TODO_cleanup_cfg from their transformation function because ordinary CFG cleanup does not remove call graph edges associated with removed call statements but must use delete_unreachable_blocks_update_callgraph instead. This patch fixes that error. gcc/ChangeLog: 2021-11-26 Martin Jambor <mjambor@suse.cz> PR ipa/103441 * ipa-prop.c (ipcp_transform_function): Call delete_unreachable_blocks_update_callgraph instead of returning TODO_cleanup_cfg.
2021-11-26Fortran: improve check of arguments to the RESHAPE intrinsicHarald Anlauf4-37/+41
gcc/fortran/ChangeLog: PR fortran/103411 * check.c (gfc_check_reshape): Improve check of size of source array for the RESHAPE intrinsic against the given shape when pad is not given, and shape is a parameter. Try other simplifications of shape. gcc/testsuite/ChangeLog: PR fortran/103411 * gfortran.dg/pr68153.f90: Adjust test to improved check. * gfortran.dg/reshape_7.f90: Likewise. * gfortran.dg/reshape_9.f90: New test.
2021-11-26tree-object-size: Abstract object_sizes arraySiddhesh Poyarekar1-79/+98
Put all accesses to object_sizes behind functions so that we can add dynamic capability more easily. gcc/ChangeLog: * tree-object-size.c (object_sizes_grow, object_sizes_release, object_sizes_unknown_p, object_sizes_get, object_size_set_force, object_sizes_set): New functions. (addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, unknown_object_size, merge_object_sizes, plus_stmt_object_size, cond_expr_object_size, collect_object_sizes_for, check_for_plus_in_loops_1, init_object_sizes, fini_object_sizes): Adjust. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2021-11-26tree-object-size: Replace magic numbers with enumsSiddhesh Poyarekar1-25/+34
A simple cleanup to allow inserting dynamic size code more easily. gcc/ChangeLog: * tree-object-size.c: New enum. (object_sizes, computed, addr_object_size, compute_builtin_object_size, expr_object_size, call_object_size, merge_object_sizes, plus_stmt_object_size, collect_object_sizes_for, init_object_sizes, fini_object_sizes, object_sizes_execute): Replace magic numbers with enums. Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
2021-11-26ivopts: Improve code generated for very simple loops.Roger Sayle7-7/+106
This patch tidies up the code that GCC generates for simple loops, by selecting/generating a simpler loop bound expression in ivopts. The original motivation came from looking at the following loop (from gcc.target/i386/pr90178.c)

    int *find_ptr (int* mem, int sz, int val)
    {
      for (int i = 0; i < sz; i++)
        if (mem[i] == val)
          return &mem[i];
      return 0;
    }

which GCC currently compiles to:

    find_ptr:
            movq    %rdi, %rax
            testl   %esi, %esi
            jle     .L4
            leal    -1(%rsi), %ecx
            leaq    4(%rdi,%rcx,4), %rcx
            jmp     .L3
    .L7:
            addq    $4, %rax
            cmpq    %rcx, %rax
            je      .L4
    .L3:
            cmpl    %edx, (%rax)
            jne     .L7
            ret
    .L4:
            xorl    %eax, %eax
            ret

Notice the relatively complex leal/leaq instructions, that result from ivopts using the following expression for the loop bound:

    inv_expr 2: ((unsigned long) ((unsigned int) sz_8(D) + 4294967295) * 4 + (unsigned long) mem_9(D)) + 4

which results from NITERS being (unsigned int) sz_8(D) + 4294967295, i.e. (sz - 1), and the logic in cand_value_at determining the bound as BASE + NITERS*STEP at the start of the final iteration and as BASE + NITERS*STEP + STEP at the end of the final iteration.

Ideally, we'd like the middle-end optimizers to simplify BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially when NITERS already has the form BOUND-1, but with type conversions and possible overflow to worry about, the above "inv_expr 2" is the best that can be done by fold (without additional context information).

This patch improves ivopts' cand_value_at by instead of using just the tree expression for NITERS, passing the data structure that explains how that expression was derived. This allows us to peek under the surface to check that NITERS+1 doesn't overflow, and in this patch to use the SSA_NAME already holding the required value.

In the motivating loop above, inv_expr 2 now becomes:

    (unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)

And as a result, on x86_64 we now generate:

    find_ptr:
            movq    %rdi, %rax
            testl   %esi, %esi
            jle     .L4
            movslq  %esi, %rsi
            leaq    (%rdi,%rsi,4), %rcx
            jmp     .L3
    .L7:
            addq    $4, %rax
            cmpq    %rcx, %rax
            je      .L4
    .L3:
            cmpl    %edx, (%rax)
            jne     .L7
            ret
    .L4:
            xorl    %eax, %eax
            ret

This improvement required one minor tweak to GCC's testsuite for gcc.dg/wrapped-binop-simplify.c, where we again generate better code, and therefore no longer find as many optimization opportunities in later passes (vrp2).

Previously:

    void v1 (unsigned long *in, unsigned long *out, unsigned int n)
    {
      int i;
      for (i = 0; i < n; i++)
        {
          out[i] = in[i];
        }
    }

on x86_64 generated:

    v1:
            testl   %edx, %edx
            je      .L1
            movl    %edx, %edx
            xorl    %eax, %eax
    .L3:
            movq    (%rdi,%rax,8), %rcx
            movq    %rcx, (%rsi,%rax,8)
            addq    $1, %rax
            cmpq    %rax, %rdx
            jne     .L3
    .L1:
            ret

and now instead generates:

    v1:
            testl   %edx, %edx
            je      .L1
            movl    %edx, %edx
            xorl    %eax, %eax
            leaq    0(,%rdx,8), %rcx
    .L3:
            movq    (%rdi,%rax), %rdx
            movq    %rdx, (%rsi,%rax)
            addq    $8, %rax
            cmpq    %rax, %rcx
            jne     .L3
    .L1:
            ret

2021-11-26 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
    * tree-ssa-loop-ivopts.c (cand_value_at): Take a class tree_niter_desc* argument instead of just a tree for NITER. If we require the iv candidate value at the end of the final loop iteration, try using the original loop bound as the NITER for sufficiently simple loops.
    (may_eliminate_iv): Update (only) call to cand_value_at.

gcc/testsuite/ChangeLog
    * gcc.dg/wrapped-binop-simplify.c: Update expected test result.
    * gcc.dg/tree-ssa/ivopts-5.c: New test case.
    * gcc.dg/tree-ssa/ivopts-6.c: New test case.
    * gcc.dg/tree-ssa/ivopts-7.c: New test case.
    * gcc.dg/tree-ssa/ivopts-8.c: New test case.
    * gcc.dg/tree-ssa/ivopts-9.c: New test case.
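At the source level, the effect on find_ptr is roughly that of replacing the index with a pointer induction variable whose end bound is computed directly as `mem + sz`. The hand-written variant below, `find_ptr_iv`, is an illustrative approximation of that shape and an assumption of this note, not output produced by GCC.

```c
#include <stddef.h>

/* The loop from the commit message, as written in the source.  */
int *find_ptr(int *mem, int sz, int val)
{
  for (int i = 0; i < sz; i++)
    if (mem[i] == val)
      return &mem[i];
  return 0;
}

/* Hand-written approximation of the loop after induction variable
   optimization.  The bound is BASE + BOUND*STEP (mem + sz), computed
   once, rather than BASE + (BOUND-1)*STEP + STEP.  */
int *find_ptr_iv(int *mem, int sz, int val)
{
  if (sz <= 0)
    return 0;
  int *end = mem + sz;
  for (int *p = mem; p != end; p++)
    if (*p == val)
      return p;
  return 0;
}
```

Both functions return the same pointer for every input; only the form of the loop bound differs.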
2021-11-26d: fix ASAN in option processingMartin Liska1-1/+3
Fixes: ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178 READ of size 4 at 0x00000666ca5c thread T0 #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855 #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916 #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887 #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829 #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427 #5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346 #6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967 #7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808 #8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146 for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d. gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options.
2021-11-26Minor ipa-modref tweaksJan Hubicka1-11/+13
To make dumps easier to read modref now dumps cgraph_node name rather than cfun name in function being analysed and I also fixed minor issue with ECF flags merging when updating inline summary. gcc/ChangeLog: 2021-11-26 Jan Hubicka <hubicka@ucw.cz> * ipa-modref.c (analyze_function): Drop parameter F and dump cgraph node name rather than cfun name. (modref_generate): Update. (modref_summaries::insert): Update. (modref_summaries_lto::insert): Update. (pass_modref::execute): Update. (ipa_merge_modref_summary_after_inlining): Improve combining of ECF_FLAGS.
2021-11-26Fix failure in inline-9.c testcaseJan Hubicka1-1/+1
gcc/testsuite/ChangeLog: 2021-11-26 Jan Hubicka <hubicka@ucw.cz> * gcc.dg/ipa/inline-9.c: Update template.c
2021-11-26Fix handling of in_flags in update_escape_summary_1Jan Hubicka1-1/+1
update_escape_summary_1 has a thinko where it computes proper min_flags but then stores original value (ignoring the fact whether there was a dereference in the escape point). PR ipa/102943 * ipa-modref.c (update_escape_summary_1): Fix handling of min_flags.
2021-11-26c++: Fix up taking address of an immediate function diagnostics [PR102753]Jakub Jelinek9-37/+165
On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote: > or an unevaluated operand, or a subexpression of an immediate invocation. > > Hmm...that suggests that in consteval23.C, bar(foo) should also be OK, The following patch handles that by removing the diagnostics about taking address of immediate function from cp_build_addr_expr_1, and instead diagnoses it in cp_fold_r. To do that with proper locations, the patch attempts to ensure that ADDR_EXPRs of immediate functions get EXPR_LOCATION set and adds a PTRMEM_CST_LOCATION for PTRMEM_CSTs. Also, evaluation of std::source_location::current() is moved from genericization to cp_fold. 2021-11-26 Jakub Jelinek <jakub@redhat.com> PR c++/102753 * cp-tree.h (struct ptrmem_cst): Add locus member. (PTRMEM_CST_LOCATION): Define. * tree.c (make_ptrmem_cst): Set PTRMEM_CST_LOCATION to input_location. (cp_expr_location): Return PTRMEM_CST_LOCATION for PTRMEM_CST. * typeck.c (build_x_unary_op): Overwrite PTRMEM_CST_LOCATION for PTRMEM_CST instead of calling maybe_wrap_with_location. (cp_build_addr_expr_1): Don't diagnose taking address of immediate functions here. Instead when taking their address make sure the returned ADDR_EXPR has EXPR_LOCATION set. (expand_ptrmemfunc_cst): Copy over PTRMEM_CST_LOCATION to ADDR_EXPR's EXPR_LOCATION. (convert_for_assignment): Use cp_expr_loc_or_input_loc instead of EXPR_LOC_OR_LOC. * pt.c (tsubst_copy): Use build1_loc instead of build1. Ensure ADDR_EXPR of immediate function has EXPR_LOCATION set. * cp-gimplify.c (cp_fold_r): Diagnose taking address of immediate functions here. For consteval if don't walk THEN_CLAUSE. (cp_genericize_r): Move evaluation of calls to std::source_location::current from here to... (cp_fold): ... here. Don't assert calls to immediate functions must be source_location_current_p, instead only constant evaluate calls to source_location_current_p. * g++.dg/cpp2a/consteval20.C: Add some extra tests. * g++.dg/cpp2a/consteval23.C: Likewise. 
* g++.dg/cpp2a/consteval25.C: New test. * g++.dg/cpp2a/srcloc20.C: New test.
2021-11-26i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]konglin15-11/+83
Add define_insn extendhfsf2 and truncsfhf2 for target_f16c. gcc/ChangeLog: PR target/102811 * config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data in XMM register for TARGET_SSE2. * config/i386/i386.md (extendhfsf2): Add extendhfsf2 for TARGET_F16C. (extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only. (*extendhf<mode>2): Rename from extendhf<mode>2. (truncsfhf2): Likewise. (truncdfhf2): Likewise. (*trunc<mode>2): Likewise. gcc/testsuite/ChangeLog: PR target/102811 * gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw. * gcc.target/i386/pr90773-23.c: Ditto. * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
2021-11-26Fix typo in r12-5486.liuhongt1-8/+8
gcc/ChangeLog: PR middle-end/103419 * match.pd: Fix typo, use the type of second parameter, not first one.
2021-11-26Daily bump.GCC Administrator5-1/+223
2021-11-25Remove forgotten early return in ipa_value_range_from_jfuncJan Hubicka2-1/+33
gcc/ChangeLog: * ipa-cp.c (ipa_value_range_from_jfunc): Remove forgotten early return. gcc/testsuite/ChangeLog: * gcc.dg/ipa/inline10.c: New test.
2021-11-25PR middle-end/103406: Check for Inf before simplifying x-x.Roger Sayle2-1/+17
This is a simple one line fix to the regression PR middle-end/103406, where x - x is being folded to 0.0 even when x is +Inf or -Inf. In GCC 11 and previously, we'd check whether the type honored NaNs (which implicitly covered the case where the type honors infinities), but my patch to test whether the operand could potentially be NaN failed to also check whether the operand could potentially be Inf. 2021-11-25 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/103406 * match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p. gcc/testsuite/ChangeLog PR middle-end/103406 * gcc.dg/pr103406.c: New test case.
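The reason the fold is invalid is plain IEEE 754 arithmetic: an infinity minus itself is NaN, not 0.0. A minimal C demonstration, assuming default floating-point semantics (no -ffast-math or -ffinite-math-only); the helper names are invented for this example.

```c
#include <math.h>

/* x - x is 0.0 only for finite x; for +Inf or -Inf the result is NaN.  */
double minus_self(double x)
{
  return x - x;
}

/* NaN is the only value that compares unequal to itself.  */
int is_nan(double v)
{
  return v != v;
}
```

A compiler that replaced `minus_self` with a constant 0.0 would therefore change the result for infinite inputs, which is the regression described above.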
2021-11-25ipa: Teach IPA-CP transformation about IPA-SRA modifications (PR 103227)Martin Jambor8-24/+216
PR 103227 exposed an issue with ordering of transformations of IPA passes. IPA-CP can create clones for constants passed by reference and at the same time IPA-SRA can also decide that the parameter does not need to be a pointer (or an aggregate) and plan to convert it into (a) simple scalar(s). Because no intermediate clone is created just for the purpose of ordering the transformations and because IPA-SRA transformation is implemented as part of clone materialization, the IPA-CP transformation happens only afterwards, reversing the order of the transformations compared to the ordering of analyses. IPA-CP transformation looks at planned substitutions for values passed by reference or in aggregates but finds that all the relevant parameters no longer exist. Currently it subsequently simply gives up, leading to clones created for no good purpose (and huge regression of 548.exchange_r). This patch teaches it to recognize the situation, look up the new scalarized parameter and perform value substitution on it. On my desktop this has recovered the lost exchange2 run-time (and some more). I have disabled IPA-SRA in a Fortran testcase so that the dumping from the transformation phase can still be matched in order to verify that IPA-CP understands the IL after verifying that it does the right thing also with IPA-SRA. gcc/ChangeLog: 2021-11-23 Martin Jambor <mjambor@suse.cz> PR ipa/103227 * ipa-prop.h (ipa_get_param): New overload. Move bits of the existing one to the new one. * ipa-param-manipulation.h (ipa_param_adjustments): New member function get_updated_index_or_split. * ipa-param-manipulation.c (ipa_param_adjustments::get_updated_index_or_split): New function. * ipa-prop.c (adjust_agg_replacement_values): Reimplement, add capability to identify scalarized parameters and perform substitution on them. (ipcp_transform_function): Create descriptors earlier, handle new return values of adjust_agg_replacement_values. 
gcc/testsuite/ChangeLog: 2021-11-23 Martin Jambor <mjambor@suse.cz> PR ipa/103227 * gcc.dg/ipa/pr103227-1.c: New test. * gcc.dg/ipa/pr103227-3.c: Likewise. * gcc.dg/ipa/pr103227-2.c: Likewise. * gfortran.dg/pr53787.f90: Disable IPA-SRA.
2021-11-25path solver: Revert computation of ranges in gimple order.Aldy Hernandez2-23/+11
Revert the patch below, as it may slow down compilation with large CFGs. commit 8acbd7bef6edbf537e3037174907029b530212f6 Author: Aldy Hernandez <aldyh@redhat.com> Date: Wed Nov 24 09:43:36 2021 +0100 path solver: Compute ranges in path in gimple order. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::compute_ranges_defined): Remove. (path_range_query::compute_ranges_in_block): Revert to bitmap order. * gimple-range-path.h: Remove compute_ranges_defined.
2021-11-25amdgcn: Fix ICE generating CFI [PR103396]Andrew Stubbs1-1/+1
gcc/ChangeLog: PR target/103396 * config/gcn/gcn.c (move_callee_saved_registers): Ensure that the number of spilled registers is counted correctly.
2021-11-25Add the testcase for this PR to the testsuite.Andrew MacLeod1-0/+21
Various ranger-enabled patches like threading and VRP2 can do this now, so add the testcase for posterity. gcc/testsuite/ PR tree-optimization/102648 * gcc.dg/pr102648.c: New.
2021-11-25Initialize node_is_self_scc in ipa_node_params::ipa_node_paramsJan Hubicka1-2/+2
gcc/ChangeLog: 2021-11-25 Jan Hubicka <hubicka@ucw.cz> * ipa-prop.h (ipa_node_params::ipa_node_params): Initialize node_is_self_scc.
2021-11-25Check for equivalences between PHI argument and def.Andrew MacLeod2-0/+37
If a PHI argument on an edge is equivalent with the DEF, then it doesn't provide any new information, defer processing it unless they are all equivalences. PR tree-optimization/103359 gcc/ * gimple-range-fold.cc (fold_using_range::range_of_phi): If arg is equivalent to def, don't initially include its range. gcc/testsuite/ * gcc.dg/pr103359.c: New.
2021-11-25Do not check gimple_static_chain in ref_maybe_used_by_call_p_1Jan Hubicka1-3/+1
gcc/ChangeLog: 2021-11-25 Jan Hubicka <hubicka@ucw.cz> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check gimple_static_chain.
2021-11-25Remove dead code and functionRichard Biener1-15/+1
The only use of get_alias_symbol is gated by a gcc_unreachable (), so the following patch gets rid of it. 2021-11-24 Richard Biener <rguenther@suse.de> * cgraphunit.c (symbol_table::output_weakrefs): Remove unreachable init. (get_alias_symbol): Remove now unused function.
2021-11-25Continue RTL verifying in rtl_verify_fallthruRichard Biener1-3/+2
One case used fatal_insn which does not return which isn't intended as can be seen by the following erro = 1. The following change refactors this to inline the relevant parts of fatal_insn instead and continue validating the RTL IL. 2021-11-25 Richard Biener <rguenther@suse.de> * cfgrtl.c (rtl_verify_fallthru): Do not stop verifying with fatal_insn. (skip_insns_after_block): Remove unreachable break and continue.
2021-11-25Remove never looping loop in label_rtx_for_bbRichard Biener1-18/+6
This refactors the IL "walk" in a way to avoid the loop which will never iterate. 2021-11-25 Richard Biener <rguenther@suse.de> * cfgexpand.c (label_rtx_for_bb): Remove dead loop construct.
2021-11-25Introduce REG_SET_EMPTY_PRichard Biener2-2/+4
This avoids a -Wunreachable-code diagnostic with EXECUTE_IF_* in case the first iteration will exit the loop. For the case in thread_jump using bitmap_empty_p looks preferable so this adds REG_SET_EMPTY_P to make that available for register sets. 2021-11-25 Richard Biener <rguenther@suse.de> * regset.h (REG_SET_EMPTY_P): New macro. * cfgcleanup.c (thread_jump): Use REG_SET_EMPTY_P.
2021-11-25docs: Add missing @option keyword.Martin Liska1-2/+2
gcc/ChangeLog: * doc/invoke.texi: Use @option for -Wuninitialized.
2021-11-25path solver: Move boolean import code to compute_imports.Aldy Hernandez1-13/+12
In a follow-up patch I will be pruning the set of exported ranges within blocks to avoid unnecessary work. In order to do this, all the interesting SSA names must be in the internal import bitmap ahead of time. I had already abstracted them out into compute_imports, but I missed the boolean code. This fixes the oversight. There's a net gain of 25 threadable paths, which is unexpected but welcome. Tested on x86-64 & ppc64le Linux. gcc/ChangeLog: PR tree-optimization/103254 * gimple-range-path.cc (path_range_query::compute_ranges): Move exported boolean code... (path_range_query::compute_imports): ...here.
2021-11-25path solver: Compute ranges in path in gimple order.Aldy Hernandez2-11/+23
Andrew's patch for this PR103254 papered over some underlying performance issues in the path solver that I'd like to address. We are currently solving the SSA's defined in the current block in bitmap order, which amounts to random order for all purposes. This is causing unnecessary recursion in gori. This patch changes the order to gimple order, thus solving dependencies before uses. There is no change in threadable paths with this change. Tested on x86-64 & ppc64le Linux. gcc/ChangeLog: PR tree-optimization/103254 * gimple-range-path.cc (path_range_query::compute_ranges_defined): New (path_range_query::compute_ranges_in_block): Move to compute_ranges_defined. * gimple-range-path.h (compute_ranges_defined): New.
2021-11-25match.pd: Fix up the recent bitmask_inv_cst_vector_p simplification [PR103417]Jakub Jelinek2-13/+20
The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10 changes. The simplification triggers on (x & 4294967040U) >= 0U and turns it into: x <= 255U which is incorrect, it should fold to 1 because unsigned >= 0U is always true and normally the /* Non-equality compare simplifications from fold_binary */ (if (wi::to_wide (cst) == min) (if (cmp == GE_EXPR) { constant_boolean_node (true, type); }) simplification folds that, but this simplification was done earlier. The simplification correctly doesn't include lt which has the same reason why it shouldn't be handled, we'll fold it to 0 elsewhere. But, IMNSHO while it isn't incorrect to handle le and gt there, it is unnecessary. Because (x & cst) <= 0U and (x & cst) > 0U should never appear, again in /* Non-equality compare simplifications from fold_binary */ we have a simplification for it: (if (cmp == LE_EXPR) (eq @2 @1)) (if (cmp == GT_EXPR) (ne @2 @1)))) This is done for (cmp (convert?@2 @0) uniform_integer_cst_p@1) and so should be done for both integers and vectors. As the bitmask_inv_cst_vector_p simplification only handles eq and ne for signed types, I think it can be simplified to just following patch. 2021-11-25 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/103417 * match.pd ((X & Y) CMP 0): Only handle eq and ne. Commonalize common tests. * gcc.c-torture/execute/pr103417.c: New test.
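The miscompilation is easy to see with concrete values. In the sketch below (helper names are made up for illustration), the masked comparison against zero is always true for unsigned operands, whereas the rewritten form `x <= 255U` is not. Only the equality form is genuinely equivalent to a range test.

```c
#include <stdint.h>

/* Generic unsigned >= comparison, kept in a helper so the always-true
   comparison below is not spelled out literally.  */
static int ge_u32(uint32_t a, uint32_t b)
{
  return a >= b;
}

/* (x & 4294967040U) >= 0U: always 1, since any unsigned value is >= 0.  */
int masked_ge_zero(uint32_t x)
{
  return ge_u32(x & 4294967040U, 0U);
}

/* The incorrect rewrite: x <= 255U is false for x = 256.  */
int le_255(uint32_t x)
{
  return x <= 255U;
}

/* The eq form is the one that really matches x <= 255U.  */
int masked_eq_zero(uint32_t x)
{
  return (x & 4294967040U) == 0U;
}
```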
2021-11-25bswap: Improve perform_symbolic_merge [PR103376]Jakub Jelinek2-5/+64
Thinking more about it, perhaps we could do more for BIT_XOR_EXPR. We could allow masked1 == masked2 case for it, but would need to do something different than the n->n = n1->n | n2->n; we do on all the bytes together. In particular, for masked1 == masked2 if masked1 != 0 (well, for 0 both variants are the same) and masked1 != 0xff we would need to clear corresponding n->n byte instead of setting it to the input as x ^ x = 0 (but if we don't know what x and y are, the result is also don't know). Now, for plus it is much harder, because not only for non-zero operands we don't know what the result is, but it can modify upper bytes as well. So perhaps only if current's byte masked1 && masked2 set the resulting byte to 0xff (unknown) iff the byte above it is 0 and 0, and set that resulting byte to 0xff too. Also, even for | we could instead of return NULL just set the resulting byte to 0xff if it is different, perhaps it will be masked off later on. This patch just punts on plus if both corresponding bytes are non-zero, otherwise implements the above. 2021-11-25 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/103376 * gimple-ssa-store-merging.c (perform_symbolic_merge): For BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN. For BIT_XOR_EXPR similarly and if masked1 == masked2 and the byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to 0. * gcc.dg/optimize-bswapsi-7.c: New test.
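The per-byte rule in the ChangeLog entry can be sketched as follows. This is a simplified model under stated assumptions: `merge_markers` and its `is_xor` flag are hypothetical, each byte of the 64-bit value is one marker, 0 means a known zero byte and 0xff means unknown. The real perform_symbolic_merge also handles ranges, base addresses and the plus case, none of which is modelled here.

```c
#include <stdint.h>

#define BITS_PER_MARKER 8
#define MARKER_MASK 0xffu
#define MARKER_BYTE_UNKNOWN 0xffu

/* Merge two symbolic byte-marker words for | (is_xor == 0) or ^
   (is_xor != 0), one marker byte at a time.  */
uint64_t merge_markers(uint64_t n1, uint64_t n2, int is_xor)
{
  uint64_t result = 0;
  for (unsigned i = 0; i < sizeof (uint64_t); i++)
    {
      unsigned shift = i * BITS_PER_MARKER;
      uint64_t masked1 = (n1 >> shift) & MARKER_MASK;
      uint64_t masked2 = (n2 >> shift) & MARKER_MASK;
      uint64_t byte;

      if (masked1 && masked2 && masked1 != masked2)
        byte = MARKER_BYTE_UNKNOWN;   /* two different sources: unknown */
      else if (is_xor && masked1 && masked1 == masked2
               && masked1 != MARKER_BYTE_UNKNOWN)
        byte = 0;                     /* x ^ x == 0 */
      else
        byte = masked1 | masked2;     /* at most one source, or x | x */

      result |= byte << shift;
    }
  return result;
}
```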
2021-11-25c++: Return early in apply_late_template_attributes if there are no late attribs [PR101180]Jakub Jelinek1-0/+3
The r12-299-ga0fdff3cf33f7284 change can result in cplus_decl_attributes being called even if there are no late attributes (but at least one early attribute) in apply_late_template_attributes. This patch fixes that, so that we return early if there are no late attrs, only arrange for TYPE_ATTRIBUTES to get the early attribute list. 2021-11-25 Jakub Jelinek <jakub@redhat.com> PR c++/101180 * pt.c (apply_late_template_attributes): Return early if there are no dependent attributes.
2021-11-25c++: Implement C++23 P2128R6 - Multidimensional subscript operator [PR102611]Jakub Jelinek22-118/+796
The following patch implements the C++23 Multidimensional subscript operator P2128R6 paper. C++20 and older only allow a single expression in between []s (albeit for C++20 with a deprecation warning if it is a comma expression), and I think that even in C++23 and for the coming years the vast majority of subscript expressions will still have a single expression. That case is also quite special even in C++23, as e.g. the builtin operator requires exactly one assignment expression. The patch therefore attempts to optimize for that case and, if possible, not to slow down that common case (or use more memory for it). So, already during parsing it differentiates between that (uses a single index_exp tree in that case) and the new cases (zero or two+ expressions in the list), for which it sets index_exp to NULL_TREE and uses a releasing_vec instead, similarly to how e.g. finish_call_expr uses it. In call.c it introduces new functions build_op_subscript{,_1} which are something in between build_new_op{,_1} and build_op_call{,_1}. The former requires a fixed number of arguments (and the patch still uses it for the common case of subscript with exactly one index expression), the latter handles a variable number of arguments but is too CALL_EXPR specific and handles various cases that are unnecessary for the subscript. Right now the subscript for 0 or 2+ expressions doesn't need to deal with builtin candidates and so is quite simple. As discussed in the paper, for backwards compatibility, if for 2+ index expressions build_op_subscript fails (called with tf_none) and the expressions together form a valid comma expression (again checked with tf_none), the comma expression is used in that C++20-ish way with a pedwarn about it, but if even that fails, build_op_subscript is called again with standard complain flags to diagnose it in the new way. And similarly for the builtin case. The -Wcomma-subscript warning used to be enabled by default unless -Wno-deprecated. 
Since the C/C++98..20 behavior is no longer deprecated, but ill-formed or changed meaning, it is now for C++23 enabled by default regardless of -Wno-deprecated and controls the pedwarn (but not the errors emitted if something wasn't valid before and isn't valid in C++23 either). 2021-11-25 Jakub Jelinek <jakub@redhat.com> PR c++/102611 gcc/ * doc/invoke.texi (-Wcomma-subscript): Document that for -std=c++20 the option isn't enabled by default with -Wno-deprecated but for -std=c++23 it is. gcc/c-family/ * c-opts.c (c_common_post_options): Enable -Wcomma-subscript by default for C++23 regardless of warn_deprecated. * c-cppbuiltin.c (c_cpp_builtins): Predefine __cpp_multidimensional_subscript=202110L for C++23. gcc/cp/ * cp-tree.h (build_op_subscript): Implement P2128R6 - Multidimensional subscript operator. Declare. (class releasing_vec): Add release method. (grok_array_decl): Remove bool argument, add vec<tree, va_gc> ** and tsubst_flags_t arguments. (build_min_non_dep_op_overload): Declare another overload. * parser.c (cp_parser_parenthesized_expression_list_elt): New function. (cp_parser_postfix_open_square_expression): Mention C++23 syntax in function comment. For C++23 parse zero or more than one initializer clauses in expression list, adjust grok_array_decl caller. (cp_parser_parenthesized_expression_list): Use cp_parser_parenthesized_expression_list_elt. (cp_parser_builtin_offsetof): Adjust grok_array_decl caller. * decl.c (grok_op_properties): For C++23 don't check number of arguments of operator[]. * decl2.c (grok_array_decl): Remove decltype_p argument, add index_exp_list and complain arguments. If index_exp is NULL, handle *index_exp_list as the subscript expression list. * tree.c (build_min_non_dep_op_overload): New overload. * call.c (add_operator_candidates, build_over_call): Adjust comments for removal of build_new_op_1. (build_op_subscript): New function. * pt.c (tsubst_copy_and_build_call_args): New function. 
(tsubst_copy_and_build) <case ARRAY_REF>: If second operand is magic CALL_EXPR with ovl_op_identifier (ARRAY_REF) as CALL_EXPR_FN, tsubst CALL_EXPR arguments including expanding pack expressions in it and call grok_array_decl instead of build_x_array_ref. <case CALL_EXPR>: Use tsubst_copy_and_build_call_args. * semantics.c (handle_omp_array_sections_1): Adjust grok_array_decl caller. gcc/testsuite/ * g++.dg/cpp2a/comma1.C: Expect different diagnostics for C++23. * g++.dg/cpp2a/comma3.C: Likewise. * g++.dg/cpp2a/comma4.C: Expect diagnostics for C++23. * g++.dg/cpp2a/comma5.C: Expect different diagnostics for C++23. * g++.dg/cpp23/feat-cxx2b.C: Test __cpp_multidimensional_subscript predefined macro. * g++.dg/cpp23/subscript1.C: New test. * g++.dg/cpp23/subscript2.C: New test. * g++.dg/cpp23/subscript3.C: New test. * g++.dg/cpp23/subscript4.C: New test. * g++.dg/cpp23/subscript5.C: New test. * g++.dg/cpp23/subscript6.C: New test.
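From the user's side, the feature looks roughly like the following. This is a minimal sketch, not one of the subscript*.C tests; Grid, at, read_cell and check_grid_subscript are hypothetical names. It is guarded by the __cpp_multidimensional_subscript macro mentioned in the ChangeLog, so it still compiles in older standard modes by falling back to a named member function.

```cpp
#include <cassert>
#include <cstddef>

struct Grid {
    int cells[2][3];

#if defined(__cpp_multidimensional_subscript) && \
    __cpp_multidimensional_subscript >= 202110L
    // C++23 (P2128R6): operator[] may take zero or more than one argument.
    int &operator[](std::size_t row, std::size_t col) {
        return cells[row][col];
    }
#endif

    // Portable accessor that works in any standard mode.
    int &at(std::size_t row, std::size_t col) { return cells[row][col]; }
};

// Reads one cell, using the two-argument subscript where it is available.
int read_cell(Grid &g, std::size_t row, std::size_t col) {
#if defined(__cpp_multidimensional_subscript) && \
    __cpp_multidimensional_subscript >= 202110L
    return g[row, col];  // calls operator[](row, col), not a comma expression
#else
    return g.at(row, col);
#endif
}

// Small self-check: write through the accessor, read through read_cell.
void check_grid_subscript() {
    Grid g = {};
    g.at(1, 2) = 7;
    assert(read_cell(g, 1, 2) == 7);
    assert(read_cell(g, 0, 0) == 0);
}
```

When the operator is declared with two parameters, g[row, col] in C++23 mode selects it directly. The comma-expression fallback described in the message only matters for types whose operator[] still takes a single argument.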
2021-11-24pr103194-5.c: Replace long with int64_tH.J. Lu1-1/+2
Replace long with int64_t to work with -mx32. * gcc.target/i386/pr103194-5.c: Include <stdint.h>. Replace long with int64_t.
2021-11-25Daily bump.GCC Administrator5-1/+238