|
1. No memory is needed to move HI/HFmode between GPR and SSE registers
under TARGET_SSE2 and above; pinsrw/pextrw are used for them even
without AVX512FP16.
2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
initialization can be eliminated (a minimal example follows below).
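A minimal illustration, assuming a target with SSE2 and _Float16 support:
conversions like the ones below are where the 16-bit value can now stay in
registers, moved between GPR and XMM with pinsrw/pextrw instead of bouncing
through memory.
float widen (_Float16 x) { return (float) x; }
_Float16 narrow (float x) { return (_Float16) x; }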
gcc/ChangeLog:
PR target/102811
* config/i386/i386.c (inline_secondary_memory_needed): HImode
move between GPR and SSE registers is supported under
TARGET_SSE2 and above.
* config/i386/i386.md (extendhfsf2): Optimize expander.
(truncsfhf2): Ditto.
* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
align with V8HImode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr102811-2.c: New test.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
scan-assembler-times.
|
|
There are several failures:
1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)";
%vpextrw should be used in the output templates.
2. ICE in get_attr_memory for movhi_internal since some alternatives
are marked as TYPE_SSELOG; use TYPE_SSELOG1 instead.
This patch also fixes a typo and some latent bugs related to moving
HImode from/to SSE registers without TARGET_AVX512FP16.
gcc/ChangeLog:
PR target/102811
PR target/103463
* config/i386/i386.c (ix86_secondary_reload): Without
TARGET_SSE4_1, a general register is needed to move HImode from
an SSE register to memory.
* config/i386/sse.md (*vec_extracthf): Use %vpextrw instead of
pextrw in output templates.
* config/i386/i386.md (movhi_internal): Ditto; also fix a typo of
MEM_P (operands[1]) and adjust the mode/prefix/type attributes for
alternatives related to SSE registers.
|
|
When creating forwarders in CD-DCE we have to avoid creating loops in
cases we formerly did not consider because of abnormal predecessors.
For now, simply punt when there are any abnormal predecessors.
2021-11-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/103458
* tree-ssa-dce.c (make_forwarders_with_degenerate_phis): Do not
create forwarders for blocks with abnormal predecessors.
* gcc.dg/torture/pr103458.c: New testcase.
|
|
This restores the semantics of can_be_invalidated_p to those the
function had before it was split out of tree-ssa-uninit.c.
The current semantics only ever look at the first predicate, which
cannot be correct.
2021-11-26 Richard Biener <rguenther@suse.de>
* gimple-predicate-analysis.cc (can_be_invalidated_p):
Restore semantics to the one before the split from
tree-ssa-uninit.c.
|
|
As verified, the emulated gather capability of the vectorizer
(r12-2733) can help to speed up SPEC2017 510.parest_r on
Power8/9/10 by 5%~9% with the option sets Ofast unroll and
Ofast lto.
This patch adds a test case similar to the one in i386
to provide testing coverage for the 510.parest_r hotspots.
Note that, unlike the i386 one, this uses unsigned int
as INDEXTYPE, since the unpack support for unsigned int
(r12-3134) also matters for vectorizing the hotspots.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vect-gather-1.c: New test.
|
|
|
|
This patch adds simple code to dump and compare frequencies of basic blocks
read from the profile feedback and frequencies guessed statically.
It dumps basic blocks in the order of decreasing frequencies from feedback
along with guessed frequencies and histograms.
It makes it possible to spot basic blocks in hot regions that are considered
cold by the guessed profile, or vice versa.
I am trying to figure out how realistic our profile estimate is compared to
the read one on exchange2 (looking again into PR98782). There IRA now places
spills into hot regions of code, while with the older (and worse) profile it
did not. The catch is that the function is very large and has 9 nested loops,
so it is hard to figure out how to improve the profile estimate and/or IRA.
gcc/ChangeLog:
2021-11-28 Jan Hubicka <hubicka@ucw.cz>
* profile.c: Include sreal.h
(struct bb_stats): New.
(cmp_stats): New function.
(compute_branch_probabilities): Output bb stats.
|
|
Profile-report was never properly updated after the switch to the new profile
representation. This patch fixes the way profile mismatches are calculated:
we used to collect count and freq mismatches separately, while now we have
only counts & probabilities. So we verify
- in count: that the total count of incoming edges is close to the actual
count of the BB
- out prob: that the total sum of outgoing edge probabilities is close
to 1 (except for BBs containing noreturn calls or EH).
Moreover I added dumping of absolute data, which is useful for plotting: with
Martin Liska we plan to set up regular testing so we keep optimizers' profile
updates a bit under control.
Finally I added both static and dynamic stats about mismatches - the static
one is simply the number of inconsistencies in the CFG, while the dynamic one
is scaled by the profile. I think that in order to keep an eye on optimizers
the first number is quite relevant, while when tracking why code quality
regressed the second number matters more.
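A rough outline of the two per-block checks, written as generic C++ with
hypothetical types rather than the actual GCC API:
#include <cmath>
#include <vector>

struct bb_info
{
  double count;                      // profile count of the block
  std::vector<double> in_counts;     // counts of incoming edges
  std::vector<double> out_probs;     // probabilities of outgoing edges
  bool noreturn_or_eh;               // ends in a noreturn call or EH
};

bool
consistent_p (const bb_info &bb, double tol = 0.01)
{
  double in = 0, out = 0;
  for (double c : bb.in_counts) in += c;     // "in count" check
  for (double p : bb.out_probs) out += p;    // "out prob" check
  bool count_ok = std::fabs (in - bb.count) <= tol * bb.count;
  bool prob_ok = bb.noreturn_or_eh || std::fabs (out - 1.0) <= tol;
  return count_ok && prob_ok;
}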
2021-11-28 Jan Hubicka <hubicka@ucw.cz>
* cfghooks.c: Include sreal.h, profile.h.
(profile_record_check_consistency): Fix checking of count consistency;
record also dynamic mismatches.
* cfgrtl.c (rtl_account_profile_record): Similarly.
* tree-cfg.c (gimple_account_profile_record): Likewise.
* cfghooks.h (struct profile_record): Remove num_mismatched_freq_in,
num_mismatched_freq_out, turn time to double, add
dyn_mismatched_prob_out, dyn_mismatched_count_in,
num_mismatched_prob_out; remove num_mismatched_count_out.
* passes.c (account_profile_1): New function.
(account_profile_in_list): New function.
(pass_manager::dump_profile_report): Rewrite.
(execute_one_ipa_transform_pass): Check profile consistency after
running all passes.
(execute_all_ipa_transforms): Remove cfun test; record all transform
methods.
(execute_one_pass): Fix collecting of profile stats.
|
|
gcc/d/ChangeLog:
* d-attribs.cc (parse_optimize_options): Fix thinko.
|
|
|
|
Change four occurrences of the %ld specifier for size_t to %zu for clean 32-bit builds.
Signed-off-by
2021-11-27 Petter Tomner <tomner@kth.se>
gcc/jit/
* libgccjit.c: %ld -> %zu
|
|
The following testcase is miscompiled because the x86_{,64_}sh{l,r}d
patterns don't properly describe what the instructions do. One thing
that is left out is that there is an initial count &= 63 for
sh{l,r}dq and an initial count &= 31 for sh{l,r}d{l,w}. Another thing
that isn't described properly is the behavior when the count (after the
masking) is 0. The pattern says it is e.g.
res = (op0 << op2) | (op1 >> (64 - op2))
but that triggers UB on op1 >> 64. For op2 == 0 we actually want
res = (op0 << op2) | 0
When constants are propagated to these patterns during RTL optimizations,
both problems trigger wrong-code issues.
This patch represents the patterns as e.g.
res = (op0 << (op2 & 63)) | (unsigned long long) ((uint128_t) op1 >> (64 - (op2 & 63)))
so that the initial masking is present and the op2 == 0 case results in
zero being ORed in.
The patch introduces alternate patterns for constant op2, where
simplify-rtx.c will fold those expressions into simple numbers,
and a define_insn_and_split pre-reload splitter from how the patterns
looked before into the new form, so that combine can still pattern-match
even computations that assumed the shift amount will be in
the range of 1 .. bitsize-1.
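A hedged sketch of the semantics the fixed 64-bit shld pattern is meant to
model (illustrative C++, not the RTL pattern itself): the count is masked
first, and the op2 == 0 case simply ORs in zero because the 128-bit shift is
well defined.
#include <cstdint>

uint64_t
shld64 (uint64_t op0, uint64_t op1, unsigned op2)
{
  unsigned c = op2 & 63;        // the hardware masks the count first
  return (op0 << c)
	 | (uint64_t) ((unsigned __int128) op1 >> (64 - c));  // 0 when c == 0
}
With the old formula (op0 << op2) | (op1 >> (64 - op2)), propagating the
constant op2 == 0 hits the undefined op1 >> 64.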
2021-11-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/103431
* config/i386/i386.md (x86_64_shld, x86_shld, x86_64_shrd, x86_shrd):
Change insn pattern to accurately describe the instructions.
(*x86_64_shld_1, *x86_shld_1, *x86_64_shrd_1, *x86_shrd_1): New
define_insn patterns.
(*x86_64_shld_2, *x86_shld_2, *x86_64_shrd_2, *x86_shrd_2): New
define_insn_and_split patterns.
(*ashl<dwi>3_doubleword_mask, *ashl<dwi>3_doubleword_mask_1,
*<insn><dwi>3_doubleword_mask, *<insn><dwi>3_doubleword_mask_1,
ix86_rotl<dwi>3_doubleword, ix86_rotr<dwi>3_doubleword): Adjust
splitters for x86_{,64_}sh{l,r}d pattern changes.
* gcc.dg/pr103431.c: New test.
|
|
In the following code we trigger UB in the compiler on
gcc.c-torture/execute/pr103376.c. n->range is 8 because it is a 64-bit
load, and rsize is 0 because it is a bswap sequence with a load whose
value is known to be 0:
/* Find real size of result (highest non-zero byte). */
if (n->base_addr)
for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER, rsize++);
else
rsize = n->range;
The shifts then shift a uint64_t by 64 bits. For this case mask is 0
and we want both *cmpxchg and *cmpnop to be 0; the operation can be done
as both a nop and a bswap, and callers will prefer the nop.
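A generic sketch of the fix idea, with hypothetical trimming details (this is
not the GCC code): the case that would need a shift by 64 bits, i.e.
n->range - rsize == 8, is handled explicitly by clearing both markers.
#include <cstdint>

static void
trim_markers (unsigned drop_bytes /* n->range - rsize */,
	      uint64_t &cmpnop, uint64_t &cmpxchg)
{
  if (drop_bytes >= 8)                  // shifting uint64_t by 64 bits is UB
    cmpnop = cmpxchg = 0;               // nothing is left of either marker
  else
    {
      cmpnop &= (UINT64_C (1) << drop_bytes * 8) - 1;   // hypothetical trimming
      cmpxchg >>= drop_bytes * 8;
    }
}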
2021-11-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/103435
* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Avoid UB if
n->range - rsize == 8, just clear both *cmpnop and *cmpxchg in that
case.
|
|
2021-11-27 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/ivopts-8.c: Fix new test case for -m32.
* gcc.dg/tree-ssa/ivopts-9.c: Likewise.
|
|
|
|
I forgot that IPA passes before ipa-inline must not return
TODO_cleanup_cfg from their transformation function, because ordinary
CFG cleanup does not remove call graph edges associated with removed
call statements; they must use
delete_unreachable_blocks_update_callgraph instead. This patch fixes
that error.
gcc/ChangeLog:
2021-11-26 Martin Jambor <mjambor@suse.cz>
PR ipa/103441
* ipa-prop.c (ipcp_transform_function): Call
delete_unreachable_blocks_update_callgraph instead of returning
TODO_cleanup_cfg.
|
|
gcc/fortran/ChangeLog:
PR fortran/103411
* check.c (gfc_check_reshape): Improve check of size of source
array for the RESHAPE intrinsic against the given shape when pad
is not given, and shape is a parameter. Try other simplifications
of shape.
gcc/testsuite/ChangeLog:
PR fortran/103411
* gfortran.dg/pr68153.f90: Adjust test to improved check.
* gfortran.dg/reshape_7.f90: Likewise.
* gfortran.dg/reshape_9.f90: New test.
|
|
Put all accesses to object_sizes behind functions so that we can add
dynamic capability more easily.
gcc/ChangeLog:
* tree-object-size.c (object_sizes_grow, object_sizes_release,
object_sizes_unknown_p, object_sizes_get, object_size_set_force,
object_sizes_set): New functions.
(addr_object_size, compute_builtin_object_size,
expr_object_size, call_object_size, unknown_object_size,
merge_object_sizes, plus_stmt_object_size,
cond_expr_object_size, collect_object_sizes_for,
check_for_plus_in_loops_1, init_object_sizes,
fini_object_sizes): Adjust.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
|
|
A simple cleanup to allow inserting dynamic size code more easily.
gcc/ChangeLog:
* tree-object-size.c: New enum.
(object_sizes, computed, addr_object_size,
compute_builtin_object_size, expr_object_size, call_object_size,
merge_object_sizes, plus_stmt_object_size,
collect_object_sizes_for, init_object_sizes, fini_object_sizes,
object_sizes_execute): Replace magic numbers with enums.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
|
|
This patch tidies up the code that GCC generates for simple loops,
by selecting/generating a simpler loop bound expression in ivopts.
The original motivation came from looking at the following loop (from
gcc.target/i386/pr90178.c)
int *find_ptr (int* mem, int sz, int val)
{
for (int i = 0; i < sz; i++)
if (mem[i] == val)
return &mem[i];
return 0;
}
which GCC currently compiles to:
find_ptr:
movq %rdi, %rax
testl %esi, %esi
jle .L4
leal -1(%rsi), %ecx
leaq 4(%rdi,%rcx,4), %rcx
jmp .L3
.L7: addq $4, %rax
cmpq %rcx, %rax
je .L4
.L3: cmpl %edx, (%rax)
jne .L7
ret
.L4: xorl %eax, %eax
ret
Notice the relatively complex leal/leaq instructions, that result
from ivopts using the following expression for the loop bound:
inv_expr 2: ((unsigned long) ((unsigned int) sz_8(D) + 4294967295)
* 4 + (unsigned long) mem_9(D)) + 4
which results from NITERS being (unsigned int) sz_8(D) + 4294967295,
i.e. (sz - 1), and the logic in cand_value_at determining the bound
as BASE + NITERS*STEP at the start of the final iteration and as
BASE + NITERS*STEP + STEP at the end of the final iteration.
Ideally, we'd like the middle-end optimizers to simplify
BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially
when NITERS already has the form BOUND-1, but with type conversions
and possible overflow to worry about, the above "inv_expr 2" is the
best that can be done by fold (without additional context information).
This patch improves ivopts' cand_value_at by passing, instead of just
the tree expression for NITERS, the data structure that
explains how that expression was derived. This allows us to peek
under the surface to check that NITERS+1 doesn't overflow, and in
this patch to use the SSA_NAME already holding the required value.
In the motivating loop above, inv_expr 2 now becomes:
(unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D)
And as a result, on x86_64 we now generate:
find_ptr:
movq %rdi, %rax
testl %esi, %esi
jle .L4
movslq %esi, %rsi
leaq (%rdi,%rsi,4), %rcx
jmp .L3
.L7: addq $4, %rax
cmpq %rcx, %rax
je .L4
.L3: cmpl %edx, (%rax)
jne .L7
ret
.L4: xorl %eax, %eax
ret
This improvement required one minor tweak to GCC's testsuite for
gcc.dg/wrapped-binop-simplify.c, where we again generate better
code, and therefore no longer find as many optimization opportunities
in later passes (vrp2).
Previously:
void v1 (unsigned long *in, unsigned long *out, unsigned int n)
{
int i;
for (i = 0; i < n; i++) {
out[i] = in[i];
}
}
on x86_64 generated:
v1: testl %edx, %edx
je .L1
movl %edx, %edx
xorl %eax, %eax
.L3: movq (%rdi,%rax,8), %rcx
movq %rcx, (%rsi,%rax,8)
addq $1, %rax
cmpq %rax, %rdx
jne .L3
.L1: ret
and now instead generates:
v1: testl %edx, %edx
je .L1
movl %edx, %edx
xorl %eax, %eax
leaq 0(,%rdx,8), %rcx
.L3: movq (%rdi,%rax), %rdx
movq %rdx, (%rsi,%rax)
addq $8, %rax
cmpq %rax, %rcx
jne .L3
.L1: ret
2021-11-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* tree-ssa-loop-ivopts.c (cand_value_at): Take a class
tree_niter_desc* argument instead of just a tree for NITER.
If we require the iv candidate value at the end of the final
loop iteration, try using the original loop bound as the
NITER for sufficiently simple loops.
(may_eliminate_iv): Update (only) call to cand_value_at.
gcc/testsuite/ChangeLog
* gcc.dg/wrapped-binop-simplify.c: Update expected test result.
* gcc.dg/tree-ssa/ivopts-5.c: New test case.
* gcc.dg/tree-ssa/ivopts-6.c: New test case.
* gcc.dg/tree-ssa/ivopts-7.c: New test case.
* gcc.dg/tree-ssa/ivopts-8.c: New test case.
* gcc.dg/tree-ssa/ivopts-9.c: New test case.
|
|
Fixes:
==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
#0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
#1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
#2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
#3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
#4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
#5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
#6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
#7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
#8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146
for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.
gcc/d/ChangeLog:
* d-attribs.cc (parse_optimize_options): Check index before
accessing cl_options.
|
|
To make dumps easier to read, modref now dumps the cgraph_node name rather
than the cfun name for the function being analysed, and I also fixed a minor
issue with ECF flags merging when updating the inline summary.
gcc/ChangeLog:
2021-11-26 Jan Hubicka <hubicka@ucw.cz>
* ipa-modref.c (analyze_function): Drop parameter F and dump
cgraph node name rather than cfun name.
(modref_generate): Update.
(modref_summaries::insert): Update.
(modref_summaries_lto::insert): Update.
(pass_modref::execute): Update.
(ipa_merge_modref_summary_after_inlining): Improve combining of
ECF_FLAGS.
|
|
gcc/testsuite/ChangeLog:
2021-11-26 Jan Hubicka <hubicka@ucw.cz>
* gcc.dg/ipa/inline-9.c: Update template.
|
|
update_escape_summary_1 has a thinko where it computes the proper min_flags
but then stores the original value (ignoring whether there was a dereference
in the escape point).
PR ipa/102943
* ipa-modref.c (update_escape_summary_1): Fix handling of min_flags.
|
|
On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote:
> or an unevaluated operand, or a subexpression of an immediate invocation.
>
> Hmm...that suggests that in consteval23.C, bar(foo) should also be OK,
The following patch handles that by removing the diagnostics about taking
the address of an immediate function from cp_build_addr_expr_1 and instead
diagnosing it in cp_fold_r. To do that with proper locations, the patch
attempts to
ensure that ADDR_EXPRs of immediate functions get EXPR_LOCATION set and
adds a PTRMEM_CST_LOCATION for PTRMEM_CSTs. Also, evaluation of
std::source_location::current() is moved from genericization to cp_fold.
2021-11-26 Jakub Jelinek <jakub@redhat.com>
PR c++/102753
* cp-tree.h (struct ptrmem_cst): Add locus member.
(PTRMEM_CST_LOCATION): Define.
* tree.c (make_ptrmem_cst): Set PTRMEM_CST_LOCATION to input_location.
(cp_expr_location): Return PTRMEM_CST_LOCATION for PTRMEM_CST.
* typeck.c (build_x_unary_op): Overwrite PTRMEM_CST_LOCATION for
PTRMEM_CST instead of calling maybe_wrap_with_location.
(cp_build_addr_expr_1): Don't diagnose taking address of
immediate functions here. Instead when taking their address make
sure the returned ADDR_EXPR has EXPR_LOCATION set.
(expand_ptrmemfunc_cst): Copy over PTRMEM_CST_LOCATION to ADDR_EXPR's
EXPR_LOCATION.
(convert_for_assignment): Use cp_expr_loc_or_input_loc instead of
EXPR_LOC_OR_LOC.
* pt.c (tsubst_copy): Use build1_loc instead of build1. Ensure
ADDR_EXPR of immediate function has EXPR_LOCATION set.
* cp-gimplify.c (cp_fold_r): Diagnose taking address of immediate
functions here. For consteval if don't walk THEN_CLAUSE.
(cp_genericize_r): Move evaluation of calls to
std::source_location::current from here to...
(cp_fold): ... here. Don't assert calls to immediate functions must
be source_location_current_p, instead only constant evaluate
calls to source_location_current_p.
* g++.dg/cpp2a/consteval20.C: Add some extra tests.
* g++.dg/cpp2a/consteval23.C: Likewise.
* g++.dg/cpp2a/consteval25.C: New test.
* g++.dg/cpp2a/srcloc20.C: New test.
|
|
with -mf16c [PR 102811]
Add define_insn extendhfsf2 and truncsfhf2 for TARGET_F16C.
gcc/ChangeLog:
PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit data in XMM register
for TARGET_SSE2.
* config/i386/i386.md (extendhfsf2): Add extendhfsf2 for TARGET_F16C.
(extendhfdf2): Restrict extendhfdf2 to TARGET_AVX512FP16 only.
(*extendhf<mode>2): Rename from extendhf<mode>2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc<mode>2): Likewise.
gcc/testsuite/ChangeLog:
PR target/102811
* gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
|
|
gcc/ChangeLog:
PR middle-end/103419
* match.pd: Fix typo, use the type of second parameter, not
first one.
|
|
|
|
gcc/ChangeLog:
* ipa-cp.c (ipa_value_range_from_jfunc): Remove forgotten early return.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/inline10.c: New test.
|
|
This is a simple one-line fix for the regression PR middle-end/103406,
where x - x is being folded to 0.0 even when x is +Inf or -Inf.
In GCC 11 and previously, we'd check whether the type honored NaNs
(which implicitly covered the case where the type honors infinities),
but my patch to test whether the operand could potentially be NaN
failed to also check whether the operand could potentially be Inf.
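A hedged reproducer sketch: when infinities are honoured, Inf - Inf is NaN,
so the fold to 0.0 must also be guarded by tree_expr_maybe_infinite_p.
double
sub_self (double x)
{
  return x - x;   // must not fold to 0.0: sub_self (__builtin_inf ()) is NaN
}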
2021-11-25 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/103406
* match.pd (minus @0 @0): Check tree_expr_maybe_infinite_p.
gcc/testsuite/ChangeLog
PR middle-end/103406
* gcc.dg/pr103406.c: New test case.
|
|
PR 103227 exposed an issue with ordering of transformations of IPA
passes. IPA-CP can create clones for constants passed by reference
and at the same time IPA-SRA can also decide that the parameter does
not need to be a pointer (or an aggregate) and plan to convert it
into (a) simple scalar(s). Because no intermediate clone is created
just for the purpose of ordering the transformations and because
IPA-SRA transformation is implemented as part of clone
materialization, the IPA-CP transformation happens only afterwards,
reversing the order of the transformations compared to the ordering of
analyses.
IPA-CP transformation looks at planned substitutions for values passed
by reference or in aggregates but finds that all the relevant
parameters no longer exist. Currently it subsequently simply gives
up, leading to clones created for no good purpose (and a huge regression
of 548.exchange_r). This patch teaches it to recognize the situation,
look up the new scalarized parameter and perform value substitution on
it. On my desktop this has recovered the lost exchange2 run-time (and
some more).
I have disabled IPA-SRA in a Fortran testcase so that the dumping from
the transformation phase can still be matched, in order to verify that
IPA-CP understands the IL, after I had verified that it also does the
right thing with IPA-SRA.
gcc/ChangeLog:
2021-11-23 Martin Jambor <mjambor@suse.cz>
PR ipa/103227
* ipa-prop.h (ipa_get_param): New overload. Move bits of the existing
one to the new one.
* ipa-param-manipulation.h (ipa_param_adjustments): New member
function get_updated_index_or_split.
* ipa-param-manipulation.c
(ipa_param_adjustments::get_updated_index_or_split): New function.
* ipa-prop.c (adjust_agg_replacement_values): Reimplement, add
capability to identify scalarized parameters and perform substitution
on them.
(ipcp_transform_function): Create descriptors earlier, handle new
return values of adjust_agg_replacement_values.
gcc/testsuite/ChangeLog:
2021-11-23 Martin Jambor <mjambor@suse.cz>
PR ipa/103227
* gcc.dg/ipa/pr103227-1.c: New test.
* gcc.dg/ipa/pr103227-3.c: Likewise.
* gcc.dg/ipa/pr103227-2.c: Likewise.
* gfortran.dg/pr53787.f90: Disable IPA-SRA.
|
|
Revert the patch below, as it may slow down compilation with large CFGs.
commit 8acbd7bef6edbf537e3037174907029b530212f6
Author: Aldy Hernandez <aldyh@redhat.com>
Date: Wed Nov 24 09:43:36 2021 +0100
path solver: Compute ranges in path in gimple order.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::compute_ranges_defined): Remove.
(path_range_query::compute_ranges_in_block): Revert to bitmap order.
* gimple-range-path.h: Remove compute_ranges_defined.
|
|
gcc/ChangeLog:
PR target/103396
* config/gcn/gcn.c (move_callee_saved_registers): Ensure that the
number of spilled registers is counted correctly.
|
|
Various ranger-enabled patches like threading and VRP2 can do this now, so add the testcase for posterity.
gcc/testsuite/
PR tree-optimization/102648
* gcc.dg/pr102648.c: New.
|
|
gcc/ChangeLog:
2021-11-25 Jan Hubicka <hubicka@ucw.cz>
* ipa-prop.h (ipa_node_params::ipa_node_params): Initialize
node_is_self_scc.
|
|
If a PHI argument on an edge is equivalent with the DEF, then it doesn't
provide any new information, defer processing it unless they are all
equivalences.
PR tree-optimization/103359
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): If arg is
equivalent to def, don't initially include its range.
gcc/testsuite/
* gcc.dg/pr103359.c: New.
|
|
gcc/ChangeLog:
2021-11-25 Jan Hubicka <hubicka@ucw.cz>
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not check
gimple_static_chain.
|
|
The only use of get_alias_symbol is gated by a gcc_unreachable (),
so the following patch gets rid of it.
2021-11-24 Richard Biener <rguenther@suse.de>
* cgraphunit.c (symbol_table::output_weakrefs): Remove
unreachable init.
(get_alias_symbol): Remove now unused function.
|
|
One case used fatal_insn, which does not return, which isn't
intended, as can be seen by the following err = 1. The following
change refactors this to inline the relevant parts of fatal_insn
instead and continue validating the RTL IL.
2021-11-25 Richard Biener <rguenther@suse.de>
* cfgrtl.c (rtl_verify_fallthru): Do not stop verifying
with fatal_insn.
(skip_insns_after_block): Remove unreachable break and continue.
|
|
This refactors the IL "walk" in a way that avoids the loop which will
never iterate.
2021-11-25 Richard Biener <rguenther@suse.de>
* cfgexpand.c (label_rtx_for_bb): Remove dead loop construct.
|
|
This avoids a -Wunreachable-code diagnostic with EXECUTE_IF_*
in case the first iteration will exit the loop. For the case
in thread_jump, using bitmap_empty_p looks preferable, so this
adds REG_SET_EMPTY_P to make that available for register sets.
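A guess at the shape of the new macro, assuming register sets are plain
bitmaps underneath (a sketch, not necessarily the exact regset.h definition):
#define REG_SET_EMPTY_P(TO) bitmap_empty_p (TO)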
2021-11-25 Richard Biener <rguenther@suse.de>
* regset.h (REG_SET_EMPTY_P): New macro.
* cfgcleanup.c (thread_jump): Use REG_SET_EMPTY_P.
|
|
gcc/ChangeLog:
* doc/invoke.texi: Use @option for -Wuninitialized.
|
|
In a follow-up patch I will be pruning the set of exported ranges
within blocks to avoid unnecessary work. In order to do this, all the
interesting SSA names must be in the internal import bitmap ahead of
time. I had already abstracted them out into compute_imports, but I
missed the boolean code. This fixes the oversight.
There's a net gain of 25 threadable paths, which is unexpected but
welcome.
Tested on x86-64 & ppc64le Linux.
gcc/ChangeLog:
PR tree-optimization/103254
* gimple-range-path.cc (path_range_query::compute_ranges): Move
exported boolean code...
(path_range_query::compute_imports): ...here.
|
|
Andrew's patch for PR103254 papered over some underlying
performance issues in the path solver that I'd like to address.
We are currently solving the SSA names defined in the current block in
bitmap order, which amounts to random order for all practical purposes.
This causes unnecessary recursion in gori. This patch changes the order
to gimple order, thus solving dependencies before uses.
There is no change in threadable paths with this change.
Tested on x86-64 & ppc64le Linux.
gcc/ChangeLog:
PR tree-optimization/103254
* gimple-range-path.cc (path_range_query::compute_ranges_defined): New.
(path_range_query::compute_ranges_in_block): Move to
compute_ranges_defined.
* gimple-range-path.h (compute_ranges_defined): New.
|
|
The following testcase is miscompiled since the r12-5489-g0888d6bbe97e10
changes.
The simplification triggers on
(x & 4294967040U) >= 0U
and turns it into:
x <= 255U
which is incorrect; it should fold to 1 because unsigned >= 0U is always
true, and normally the
/* Non-equality compare simplifications from fold_binary */
(if (wi::to_wide (cst) == min)
(if (cmp == GE_EXPR)
{ constant_boolean_node (true, type); })
simplification folds that, but this simplification was done earlier.
The simplification correctly doesn't include lt, which shouldn't be
handled for the same reason; we'll fold it to 0 elsewhere.
But, IMNSHO, while it isn't incorrect to handle le and gt there, it is
unnecessary, because (x & cst) <= 0U and (x & cst) > 0U should
never appear; again in
/* Non-equality compare simplifications from fold_binary */
we have a simplification for it:
(if (cmp == LE_EXPR)
(eq @2 @1))
(if (cmp == GT_EXPR)
(ne @2 @1))))
This is done for
(cmp (convert?@2 @0) uniform_integer_cst_p@1)
and so should be done for both integers and vectors.
As the bitmask_inv_cst_vector_p simplification only handles
eq and ne for signed types, I think it can be simplified to just
following patch.
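A reproducer sketch for the wrong fold: an unsigned >= 0U comparison is
always true, so the whole expression must fold to 1 rather than to x <= 255U.
int
always_true (unsigned x)
{
  return (x & 4294967040U) >= 0U;   // must be 1 for every x
}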
2021-11-25 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/103417
* match.pd ((X & Y) CMP 0): Only handle eq and ne. Commonalize
common tests.
* gcc.c-torture/execute/pr103417.c: New test.
|
|
Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
We could allow the masked1 == masked2 case for it, but would need to
do something different than the
n->n = n1->n | n2->n;
we do on all the bytes together.
In particular, for masked1 == masked2, if masked1 != 0 (well, for 0
both variants are the same) and masked1 != 0xff, we would need to
clear the corresponding n->n byte instead of setting it to the input,
as x ^ x = 0 (but if we don't know what x and y are, the result is
also unknown). Now, for plus it is much harder, because not only do we
not know what the result is for non-zero operands, it can modify upper
bytes as well. So perhaps, only if the current byte has masked1 &&
masked2, set the resulting byte to 0xff (unknown) iff the byte above it
is 0 and 0, and set that resulting byte to 0xff too.
Also, even for | we could, instead of returning NULL, just set the
resulting byte to 0xff if it is different; perhaps it will be masked
off later on.
This patch just punts on plus if both corresponding bytes are non-zero,
and otherwise implements the above.
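A simplified per-byte sketch of the merging rules described above, assuming
MARKER_BYTE_UNKNOWN is 0xff (illustrative only, not the
gimple-ssa-store-merging.c code); -1 means the whole merge punts.
#include <cstdint>

int
merge_byte (char code, uint8_t m1, uint8_t m2)
{
  if (m1 == 0) return m2;                 // only one side contributes
  if (m2 == 0) return m1;
  if (code == '|')
    return m1 == m2 ? m1 : 0xff;          // conflicting sources: unknown
  if (code == '^')
    {
      if (m1 != m2) return 0xff;          // conflicting sources: unknown
      return m1 != 0xff ? 0 : 0xff;       // x ^ x == 0; unknown stays unknown
    }
  return -1;                              // '+': both bytes non-zero, punt
}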
2021-11-25 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/103376
* gimple-ssa-store-merging.c (perform_symbolic_merge): For
BIT_IOR_EXPR, if masked1 && masked2 && masked1 != masked2, don't
punt, but set the corresponding result byte to MARKER_BYTE_UNKNOWN.
For BIT_XOR_EXPR similarly and if masked1 == masked2 and the
byte isn't MARKER_BYTE_UNKNOWN, set the corresponding result byte to
0.
* gcc.dg/optimize-bswapsi-7.c: New test.
|
|
attribs [PR101180]
The r12-299-ga0fdff3cf33f7284 change can result in cplus_decl_attributes being called
even if there are no late attributes (but at least one early attribute) in
apply_late_template_attributes. This patch fixes that, so that we return early
if there are no late attrs and only arrange for TYPE_ATTRIBUTES to get the
early attribute list.
2021-11-25 Jakub Jelinek <jakub@redhat.com>
PR c++/101180
* pt.c (apply_late_template_attributes): Return early if there are no
dependent attributes.
|
|
The following patch implements the C++23 Multidimensional subscript operator
P2128R6 paper.
As C++20 and older only allow a single expression in between []s (albeit
for C++20 with a deprecation warning if it is a comma expression) and even
in C++23 and for the coming years I think the vast majority of subscript
expressions will still have a single expression and even in C++23 it is
quite special, as e.g. the builtin operator requires exactly one
assignment expression, the patch attempts to optimize for that case and
if possible not to slow down that common case (or use more memory for it).
So, already during parsing it differentiates between that (uses a single
index_exp tree in that case) and the new cases (zero or two+ expressions
in the list), for which it sets index_exp to NULL_TREE and uses a
releasing_vec instead similarly to how e.g. finish_call_expr uses it.
In call.c it introduces new functions build_op_subscript{,_1} which are
something in between build_new_op{,_1} and build_op_call{,_1}.
The former requires fixed number of arguments (and the patch still uses
it for the common case of subscript with exactly one index expression),
the latter handles variable number of arguments but is too CALL_EXPR specific
and handles various cases that are unnecessary for the subscript.
Right now the subscript for 0 or 2+ expressions doesn't need to deal with
builtin candidates and so is quite simple.
As discussed in the paper, for backwards compatibility, if for 2+ index
expressions build_op_subscript fails (called with tf_none) and the
expressions together form a valid comma expression (again checked with
tf_none), it is used that C++20-ish way with a pedwarn about it, but if
even that fails, build_op_subscript is called again with standard complain
flags to diagnose it in the new way. And similarly for the builtin case.
The -Wcomma-subscript warning used to be enabled by default unless
-Wno-deprecated. Since the C/C++98..20 behavior is no longer deprecated,
but ill-formed or changed meaning, it is now for C++23 enabled by
default regardless of -Wno-deprecated and controls the pedwarn (but not the
errors emitted if something wasn't valid before and isn't valid in C++23
either).
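A small usage sketch of what P2128R6 enables in C++23: operator[] can take
any number of arguments, and md[1, 2] is an overloaded subscript rather than
a comma expression.
struct matrix
{
  int data[3][3] {};
  int &operator[] (int r, int c) { return data[r][c]; }   // C++23 operator[]
};

int
main ()
{
  matrix m;
  m[1, 2] = 42;            // calls matrix::operator[](int, int)
  return m[1, 2] - 42;     // 0
}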
2021-11-25 Jakub Jelinek <jakub@redhat.com>
PR c++/102611
gcc/
* doc/invoke.texi (-Wcomma-subscript): Document that for
-std=c++20 the option isn't enabled by default with -Wno-deprecated
but for -std=c++23 it is.
gcc/c-family/
* c-opts.c (c_common_post_options): Enable -Wcomma-subscript by
default for C++23 regardless of warn_deprecated.
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_multidimensional_subscript=202110L for C++23.
gcc/cp/
* cp-tree.h (build_op_subscript): Implement P2128R6
- Multidimensional subscript operator. Declare.
(class releasing_vec): Add release method.
(grok_array_decl): Remove bool argument, add vec<tree, va_gc> **
and tsubst_flags_t arguments.
(build_min_non_dep_op_overload): Declare another overload.
* parser.c (cp_parser_parenthesized_expression_list_elt): New function.
(cp_parser_postfix_open_square_expression): Mention C++23 syntax in
function comment. For C++23 parse zero or more than one initializer
clauses in expression list, adjust grok_array_decl caller.
(cp_parser_parenthesized_expression_list): Use
cp_parser_parenthesized_expression_list_elt.
(cp_parser_builtin_offsetof): Adjust grok_array_decl caller.
* decl.c (grok_op_properties): For C++23 don't check number
of arguments of operator[].
* decl2.c (grok_array_decl): Remove decltype_p argument, add
index_exp_list and complain arguments. If index_exp is NULL,
handle *index_exp_list as the subscript expression list.
* tree.c (build_min_non_dep_op_overload): New overload.
* call.c (add_operator_candidates, build_over_call): Adjust comments
for removal of build_new_op_1.
(build_op_subscript): New function.
* pt.c (tsubst_copy_and_build_call_args): New function.
(tsubst_copy_and_build) <case ARRAY_REF>: If second
operand is magic CALL_EXPR with ovl_op_identifier (ARRAY_REF)
as CALL_EXPR_FN, tsubst CALL_EXPR arguments including expanding
pack expressions in it and call grok_array_decl instead of
build_x_array_ref.
<case CALL_EXPR>: Use tsubst_copy_and_build_call_args.
* semantics.c (handle_omp_array_sections_1): Adjust grok_array_decl
caller.
gcc/testsuite/
* g++.dg/cpp2a/comma1.C: Expect different diagnostics for C++23.
* g++.dg/cpp2a/comma3.C: Likewise.
* g++.dg/cpp2a/comma4.C: Expect diagnostics for C++23.
* g++.dg/cpp2a/comma5.C: Expect different diagnostics for C++23.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_multidimensional_subscript
predefined macro.
* g++.dg/cpp23/subscript1.C: New test.
* g++.dg/cpp23/subscript2.C: New test.
* g++.dg/cpp23/subscript3.C: New test.
* g++.dg/cpp23/subscript4.C: New test.
* g++.dg/cpp23/subscript5.C: New test.
* g++.dg/cpp23/subscript6.C: New test.
|
|
Replace long with int64_t to work with -mx32.
* gcc.target/i386/pr103194-5.c: Include <stdint.h>.
Replace long with int64_t.
|
|
|