gcc/testsuite/
* gcc.c-torture/execute/20021120-1.c: Skip if not size20plus or -Os.
* gcc.dg/fixed-point/convert-float-4.c: Require size20plus.
* gcc.dg/torture/pr112282.c: Skip if -O0 unless size20plus.
* g++.dg/lookup/pr21802.C: Require size20plus.
|
|
In g:9d20529d94b23275885f380d155fe8671ab5353a, I'd extended
insn_propagation to handle simple cases of hard-reg mode punning.
The punned "to" value was created using simplify_subreg rather
than simplify_gen_subreg, on the basis that hard-coded subregs
aren't generally useful after RA (where hard-reg propagation is
expected to happen).
This PR is about a case where the subreg gets pushed into the
operands of a plus, but the subreg on one of the operands
cannot be simplified. Specifically, we have to generate
(subreg:SI (reg:DI sp) 0) rather than (reg:SI sp), since all
references to the stack pointer must be via stack_pointer_rtx.
However, code in x86 (reasonably) expects no subregs of registers
to appear after RA, except for special cases like strict_low_part.
This leads to an awkward situation where we can't ban subregs of sp
(because of the strict_low_part use), can't allow direct references
to sp in other modes (because of the stack_pointer_rtx requirement),
and can't allow rvalue uses of the subreg (because of the "no subregs
after RA" assumption). It all seems a bit of a mess...
I sat on this for a while in the hope that a clean solution might
become apparent, but in the end, I think we'll just have to check
manually for nested subregs and punt on them.
gcc/
PR rtl-optimization/115881
* recog.cc: Include rtl-iter.h.
(insn_propagation::apply_to_rvalue_1): Check that the result
of simplify_subreg does not include nested subregs.
gcc/testsuite/
PR rtl-optimization/115881
* gcc.c-torture/compile/pr115881.c: New test.
|
|
PR ipa/111613
* gcc.c-torture/pr111613.c: Rename to...
* gcc.c-torture/execute/pr111613.c: ...this.
|
|
PR middle-end/115277
* gcc.c-torture/compile/pr115277.c: Rename to...
* gcc.c-torture/execute/pr115277.c: ...this.
|
|
Hi,
this patch fixes wrong code in case store-merging introduces load of function
parameter that was previously write-only (which happens for bitfields).
Without this, the whole store-merged area is considered to be killed.
PR ipa/111613
gcc/ChangeLog:
* ipa-modref.cc (analyze_parms): Do not preserve EAF_NO_DIRECT_READ and
EAF_NO_INDIRECT_READ from past flags.
gcc/testsuite/ChangeLog:
* gcc.c-torture/pr111613.c: New test.
|
|
function call parameters
modref_eaf_analysis::analyze_ssa_name misinterprets EAF flags. If a dereferenced
parameter is passed (to map_iterator in the testcase), it can be returned
indirectly, which in turn makes it escape into the next function call.
PR ipa/115033
gcc/ChangeLog:
* ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Fix checking of
EAF flags when analysing values dereferenced as function parameters.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr115033.c: New test.
|
|
unadjusted_ptr_and_unit_offset accidentally throws away the offset computed by
get_addr_base_and_unit_offset. Instead of passing extra_offset it passes offset.
PR ipa/114207
gcc/ChangeLog:
* ipa-prop.cc (unadjusted_ptr_and_unit_offset): Fix accounting of offsets in ADDR_EXPR.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr114207.c: New test.
|
|
Hi,
this testcase shows another problem with missing comparators for metadata
in ICF. With value ranges available to loop optimizations during early
opts we can estimate the number of iterations based on a guarding condition
that can be split away by the fnsplit pass. This patch disables ICF when
the number of iterations does not match.
Bootstrapped/regtested x86_64-linux, will commit it shortly
gcc/ChangeLog:
PR ipa/115277
* ipa-icf-gimple.cc (func_checker::compare_loops): compare loop
bounds.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr115277.c: New test.
|
|
In the fix for PR115928, I'd failed to notice that "root" was used
later in the function, so needed to be updated.
gcc/
PR rtl-optimization/116009
* rtl-ssa/accesses.cc (function_info::add_def): Set the root
local variable after removing the old clobber group.
gcc/testsuite/
PR rtl-optimization/116009
* gcc.c-torture/compile/pr116009.c: New test.
|
|
According to the IEEE standard, for conversions from floating point to
integer: when a NaN or infinite operand cannot be represented in the
destination format and this cannot otherwise be indicated, the invalid
operation exception shall be signaled. When a numeric operand would
convert to an integer outside the range of the destination format, the
invalid operation exception shall be signaled if this situation cannot
otherwise be indicated.
The patch prevents simplification of the conversion from floating point
to integer for NAN/INF/out-of-range constants when flag_trapping_math.
gcc/ChangeLog:
PR rtl-optimization/100927
PR rtl-optimization/115161
PR rtl-optimization/115115
* simplify-rtx.cc (simplify_const_unary_operation): Prevent
simplification of FIX/UNSIGNED_FIX for NAN/INF/out-of-range
constant when flag_trapping_math.
* fold-const.cc (fold_convert_const_int_from_real): Don't fold
for overflow value when flag_trapping_math.
gcc/testsuite/ChangeLog:
* gcc.dg/pr100927.c: New test.
* c-c++-common/Wconversion-1.c: Add -fno-trapping-math.
* c-c++-common/dfp/convert-int-saturate.c: Ditto.
* g++.dg/ubsan/pr63956.C: Ditto.
* g++.dg/warn/Wconversion-real-integer.C: Ditto.
* gcc.c-torture/execute/20031003-1.c: Ditto.
* gcc.dg/Wconversion-complex-c99.c: Ditto.
* gcc.dg/Wconversion-real-integer.c: Ditto.
* gcc.dg/c90-const-expr-11.c: Ditto.
* gcc.dg/overflow-warn-8.c: Ditto.
|
|
__builtin{add,sub}c [PR108789]
The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third argument points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR. Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p cases, which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
2024-06-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.
* gcc.c-torture/execute/pr108789.c: New test.
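To make the aliasing hazard concrete, here is a minimal sketch (a hypothetical
illustration, not the PR testcase; add_in_place is an invented name) in which
the result pointer aliases the first operand. If the internal call were
evaluated once for the real part and again for the overflow flag, the second
evaluation would read the already-updated operand.

```c
#include <limits.h>

/* Hypothetical helper: the third argument aliases the first operand,
   so the addition has to be evaluated exactly once.  */
int
add_in_place (int *acc, int delta)
{
  return __builtin_add_overflow (*acc, delta, acc);
}
```

With a correct compiler, starting from INT_MAX - 1 and adding 1 reports no
overflow and stores INT_MAX; adding 1 again reports overflow and stores the
wrapped value.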
|
|
The problem here is the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except for signed 1bit types,
where `a * -1` is produced, which is undefined.
So disable this pattern for 1bit signed types.
Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around the undefinedness except when
`-fsanitize=undefined` is turned on; this is why I added a testcase for that.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/115154
gcc/ChangeLog:
* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.
gcc/testsuite/ChangeLog:
* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
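As a rough illustration of why the signed 1bit case is special (a sketch with
invented names, not one of the new tests): such a field can hold only -1 and 0,
so negating -1 yields a value the type cannot represent. At the C source level
the arithmetic is done in int after promotion, so the sketch itself has no
undefined behaviour; the problem described above arises when the multiplication
is carried out in the 1bit type.

```c
/* A signed 1-bit field can hold only -1 and 0.  */
struct flags
{
  signed int bit : 1;
};

/* Return the value read back from a field initialized to -1.  */
int
field_min (void)
{
  struct flags f = { -1 };
  return f.bit;
}

/* Return 1 if -V is still representable in a signed 1-bit type.  */
int
negation_fits (int v)
{
  int negated = -v;   /* computed in int, so well defined here */
  return negated == -1 || negated == 0;
}
```

Here negation_fits (0) holds, while negation_fits (-1) does not, because +1 is
outside the range {-1, 0}.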
|
|
The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/115143
gcc/ChangeLog:
* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
TARGET_MEM_REF can be used to offset constant base into a memory object (to
produce lea instruction). This confuses points_to_local_or_readonly_memory_p
which treats the constant address as a base of the access.
Bootstrapped/regtested x86_64-linux, committed.
Honza
gcc/ChangeLog:
PR ipa/113787
* ipa-fnsummary.cc (points_to_local_or_readonly_memory_p): Do not
look into TARGET_MEM_REFS with constant operand 0.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr113787.c: New test.
|
|
The optimize_range_tests_to_bit_test optimization normally emits a range
test first:
if (entry_test_needed)
{
tem = build_range_check (loc, optype, unshare_expr (exp),
false, lowi, high);
if (tem == NULL_TREE || is_gimple_val (tem))
continue;
}
so during the bit test we already know that exp is in the [lowi, high]
range, but skips it if we have range info which tells us this isn't
necessary.
Also, normally it emits shifts by exp - lowi counter, but has an
optimization to use just exp counter if the mask isn't a more expensive
constant in that case and lowi is > 0 and high is smaller than prec.
The following testcase is miscompiled because the two abnormal cases
are triggered. The range of exp is [43, 43][48, 48][95, 95], so we on
64-bit arch decide we don't need the entry test, because 95 - 43 < 64.
And we also decide to use just exp as counter, because the range test
tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
Because 95 is in the exp range, we can't do that, we'd either need to
do a range test first, i.e.
if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1)
or need to subtract lowi from the shift counter, i.e.
if ((1UL << (exp - 43)) & mask2)
but can't do both unless r.upper_bound () is < prec.
The following patch ensures that.
2024-05-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114965
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
optimize away exp - lowi subtraction from shift count unless entry
test is emitted or unless r.upper_bound () is smaller than prec.
* gcc.c-torture/execute/pr114965.c: New test.
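The two valid forms can be written out in plain C. This is only a sketch of
the arithmetic under the ranges quoted above (exp in {43, 48, 95}, 64-bit
masks); the function names and mask constants are illustrative and are not
taken from tree-ssa-reassoc.cc.

```c
#include <stdint.h>

#define LOWI 43u
#define HIGH 48u

/* Reference: the condition being optimized.  */
int
ref_test (unsigned exp)
{
  return exp == 43 || exp == 48;
}

/* Form 1: range test first, then shift by exp itself.  The range test
   guarantees exp <= 48, so the shift count stays below 64.  */
int
range_then_shift (unsigned exp)
{
  const uint64_t mask1 = (UINT64_C (1) << 43) | (UINT64_C (1) << 48);
  if (exp - LOWI <= HIGH - LOWI)
    return ((UINT64_C (1) << exp) & mask1) != 0;
  return 0;
}

/* Form 2: no range test, shift by exp - lowi.  Valid only while
   exp - 43 is below 64, which holds for the range [43, 95].  */
int
shift_by_offset (unsigned exp)
{
  const uint64_t mask2 = (UINT64_C (1) << 0) | (UINT64_C (1) << 5);
  return ((UINT64_C (1) << (exp - LOWI)) & mask2) != 0;
}
```

Dropping both the range test and the subtraction would shift by 95 for the
third value in the range, which is out of bounds for a 64-bit operand.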
|
|
The problem is that the `!a?b:c` pattern will create a COND_EXPR with a 1bit signed integer
which breaks patterns like `a?~t:t`. This rejects when we have a signed operand for
both patterns.
Note for GCC 15, I am going to look at the canonicalization of `a?~t:t` where t
was a constant since I think keeping it a COND_EXPR might be more canonical and
is what VRP produces from the same IR; if anything expand should handle which one
is better.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/114666
gcc/ChangeLog:
* match.pd (`!a?b:c`): Reject signed types for the condition.
(`a?~t:t`): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/bitfld-signed1-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62
"testsuite: Fix up lra effective target", we get for nvptx target:
-PASS: gcc.c-torture/compile/asmgoto-2.c -O0 (test for excess errors)
+ERROR: gcc.c-torture/compile/asmgoto-2.c -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } "
Etc.
However, nvptx appears to support 'asm goto' with outputs, including the
new execution test case:
PASS: gcc.dg/pr107385.c execution test
Therefore, generally use new effective-target 'asm_goto_with_outputs' instead
of 'lra'. One exception is 'gcc.dg/pr110079.c', which doesn't use 'asm goto'
with outputs, and continues using effective-target 'lra', with special-casing
nvptx target, to avoid ERROR for 'lra'.
gcc/
* doc/sourcebuild.texi (Effective-Target Keywords): Document
'asm_goto_with_outputs'. Add comment to 'lra'.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_lra): Add
comment.
(check_effective_target_asm_goto_with_outputs): New.
* gcc.c-torture/compile/asmgoto-2.c: Use it.
* gcc.c-torture/compile/asmgoto-5.c: Likewise.
* gcc.c-torture/compile/asmgoto-6.c: Likewise.
* gcc.c-torture/compile/pr98096.c: Likewise.
* gcc.dg/pr100590.c: Likewise.
* gcc.dg/pr107385.c: Likewise.
* gcc.dg/pr108095.c: Likewise.
* gcc.dg/pr97954.c: Likewise.
* gcc.dg/torture/pr100329.c: Likewise.
* gcc.dg/torture/pr100398.c: Likewise.
* gcc.dg/torture/pr100519.c: Likewise.
* gcc.dg/torture/pr110422.c: Likewise.
* gcc.dg/pr110079.c: Special-case nvptx target.
|
|
r13-990 added optimizations in multiple spots to optimize during
expansion storing of constant initializers into targets.
In the load_register_parameters and expand_expr_real_1 cases,
it checks it has a tree as the source and so knows we are reading
that whole decl's value, so the code is fine as is, but in the
emit_push_insn case it checks for a MEM from which something
is pushed and checks for SYMBOL_REF as the MEM's address, but
still assumes the whole object is copied, which as the following
testcase shows might not always be the case. In the testcase,
k is 6 bytes, then 2 bytes of padding, then another 4 bytes,
while the emit_push_insn wants to store just the 6 bytes.
The following patch simply verifies it is the whole initializer
that is being stored. I think that is the best thing to do so late
in the GCC 14 cycle, as well as for backporting.
For GCC 15, perhaps the code could stop requiring it must be at offset zero,
nor that the size is equal, but could use
get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to actually
extract just part of the initializer if we e.g. push just some subset
(of course, still verify that it is a subset). For sizes which are power
of two bytes and we have some integer modes, we could use as type for
fold_ctor_reference corresponding integral types, otherwise dunno, punt
or use some structure (e.g. try to find one in the initializer?), whatever.
But even in the other spots it could perhaps handle loading of
COMPONENT_REFs or MEM_REFs from the .rodata vars.
2024-04-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114552
* expr.cc (emit_push_insn): Only use store_constructor for
immediate_const_ctor_p if int_expr_size matches size.
* gcc.c-torture/execute/pr114552.c: New test.
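For readers unfamiliar with the layout described above, the following sketch
shows one way a 6-byte member, 2 bytes of padding and a 4-byte member can
arise. The struct is hypothetical and is not the one from the PR testcase; the
padding value assumes an ABI where int is 4-byte aligned.

```c
#include <stddef.h>

enum { HEAD_BYTES = 6 };

/* Hypothetical layout, not taken from the PR testcase.  */
struct k_like
{
  char head[HEAD_BYTES];   /* 6 bytes */
  int tail;                /* typically 4 bytes, aligned to 4 */
};

/* Number of padding bytes between HEAD and TAIL.  */
size_t
padding_after_head (void)
{
  return offsetof (struct k_like, tail) - HEAD_BYTES;
}
```

Copying only the first 6 bytes of such an object is therefore not the same as
storing its whole initializer, which is the distinction the new size check
enforces.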
|
|
The testcase in the patch ICEs with
--- gcc/tree-scalar-evolution.cc
+++ gcc/tree-scalar-evolution.cc
@@ -3881,7 +3881,7 @@ final_value_replacement_loop (class loop *loop)
/* Propagate constants immediately, but leave an unused initialization
around to avoid invalidating the SCEV cache. */
- if (CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
+ if (0 && CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt))
replace_uses_by (rslt, def);
/* Create the replacement statements. */
(the addition of the above made the ICE latent), because profile_count
addition doesn't check for overflows and if unlucky, we can even overflow
into the uninitialized value.
Getting really huge profile counts is very easy even when not using
recursive inlining in loops, e.g.
__attribute__((noipa)) void
bar (void)
{
__builtin_exit (0);
}
__attribute__((noipa)) void
foo (void)
{
for (int i = 0; i < 1000; ++i)
for (int j = 0; j < 1000; ++j)
for (int k = 0; k < 1000; ++k)
for (int l = 0; l < 1000; ++l)
for (int m = 0; m < 1000; ++m)
for (int n = 0; n < 1000; ++n)
for (int o = 0; o < 1000; ++o)
for (int p = 0; p < 1000; ++p)
for (int q = 0; q < 1000; ++q)
for (int r = 0; r < 1000; ++r)
for (int s = 0; s < 1000; ++s)
for (int t = 0; t < 1000; ++t)
for (int u = 0; u < 1000; ++u)
for (int v = 0; v < 1000; ++v)
for (int w = 0; w < 1000; ++w)
for (int x = 0; x < 1000; ++x)
for (int y = 0; y < 1000; ++y)
for (int z = 0; z < 1000; ++z)
for (int a = 0; a < 1000; ++a)
for (int b = 0; b < 1000; ++b)
bar ();
}
int
main ()
{
foo ();
}
reaches the maximum count already on the 11th loop.
Some other methods of profile_count like apply_scale already
do use MIN (val, max_count) before assignment to m_val, this patch
just extends that to operator{+,+=} methods.
Furthermore, one overload of apply_probability wasn't using
safe_scale_64bit and so could very easily overflow as well
- prob is required to be [0, 10000] and if m_val is near the max_count,
it can overflow even with multiplications by 8.
2024-03-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112303
* profile-count.h (profile_count::operator+): Perform
addition in uint64_t variable and set m_val to MIN of that
val and max_count.
(profile_count::operator+=): Likewise.
(profile_count::operator-=): Formatting fix.
(profile_count::apply_probability): Use safe_scale_64bit
even in the int overload.
* gcc.c-torture/compile/pr112303.c: New test.
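The clamping idea can be sketched in C as follows. The constants are an
assumption made for illustration: a 61-bit counter whose largest value is
reserved as the "uninitialized" marker and whose second largest is the
maximum count. The names are invented and this is not the profile-count.h
code.

```c
#include <stdint.h>

/* Assumed layout: 61-bit counter, top value reserved.  */
#define N_BITS 61
#define MAX_COUNT ((UINT64_C (1) << N_BITS) - 2)
#define UNINITIALIZED_COUNT ((UINT64_C (1) << N_BITS) - 1)

/* Add two counts, both assumed to be at most MAX_COUNT, and clamp.  */
uint64_t
saturating_add (uint64_t a, uint64_t b)
{
  /* Both inputs are below 2^61, so the 64-bit sum cannot wrap.  */
  uint64_t sum = a + b;
  return sum < MAX_COUNT ? sum : MAX_COUNT;
}
```

Without the clamp, MAX_COUNT + 1 would equal the reserved marker, which is
the "overflow into the uninitialized value" case mentioned above.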
|
|
This testcase was made latent by r14-4089 and got fixed both
on the trunk and 13 branch with PR113372 fix.
Adding testcase to the testsuite and will close the PR as a dup.
2024-03-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109925
* gcc.c-torture/execute/pr109925.c: New test.
|
|
Apparently I've somehow screwed up the adjustments of the originally tested
testcase, tweaked it so that in the second/third cases it actually sees
a MAX_EXPR rather than the COND_EXPR the MAX_EXPR has been optimized into,
and didn't update the expected value.
2024-03-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111151
PR testsuite/114486
* gcc.c-torture/execute/pr111151.c (main): Fix up expected value for
f.
|
|
As I've tried to explain in the comments, the extract_muldiv_1
MIN/MAX_EXPR optimization is wrong for code == MULT_EXPR.
If the multiplication is done in unsigned type or in signed
type with -fwrapv, it is fairly obvious that max (a, b) * c
in many cases isn't equivalent to max (a * c, b * c) (or min if c is
negative) due to overflows, but even for signed with undefined overflow,
the optimization could turn something without UB in it (where
say a * c invokes UB, but max (or min) picks the other operand where
b * c doesn't) into something that does invoke UB.
As for division/modulo, I think it is in most cases safe, except if
the problematic INT_MIN / -1 case could be triggered, but we can
just punt for MAX_EXPR because for MIN_EXPR if one operand is INT_MIN,
we'd pick that operand already. It is just for completeness, match.pd
already has an optimization which turns x / -1 into -x, so the division
by zero is mostly theoretical. That is also why in the testcase the
i case isn't actually miscompiled without the patch, while the c and f
cases are.
2024-03-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111151
* fold-const.cc (extract_muldiv_1) <case MAX_EXPR>: Punt for
MULT_EXPR altogether, or for MAX_EXPR if c is -1.
* gcc.c-torture/execute/pr111151.c: New test.
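A small counterexample for the unsigned (wrapping) case, written as a sketch
with invented function names and 32-bit unsigned arithmetic; it is not the
testcase added by the patch.

```c
#include <stdint.h>

static uint32_t
max_u32 (uint32_t a, uint32_t b)
{
  return a > b ? a : b;
}

/* max (a, b) * c, with wrapping multiplication.  */
uint32_t
mul_of_max (uint32_t a, uint32_t b, uint32_t c)
{
  return max_u32 (a, b) * c;
}

/* max (a * c, b * c), the form the optimization would produce.  */
uint32_t
max_of_mul (uint32_t a, uint32_t b, uint32_t c)
{
  return max_u32 (a * c, b * c);
}
```

With a = 0x80000000, b = 1 and c = 2, the first form wraps to 0 while the
second yields 2, so the two are not interchangeable.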
|
|
Also fixed a typo in the testcase.
gcc/testsuite/ChangeLog:
PR tree-optimization/114396
* gcc.target/i386/pr114396.c: Move to...
* gcc.c-torture/execute/pr114396.c: ...here.
|
|
Excerpt from gcc.sum:
[...]
PASS: gcc.c-torture/execute/20101011-1.c -O0 (test for excess errors)
FAIL: gcc.c-torture/execute/20101011-1.c -O0 execution test
PASS: gcc.c-torture/execute/20101011-1.c -O1 (test for excess errors)
FAIL: gcc.c-torture/execute/20101011-1.c -O1 execution test
[ ... ]
This is because H8 MCUs do not throw a "divide by zero" exception.
gcc/testsuite
* gcc.c-torture/execute/20101011-1.c: Do not test on H8 series.
|
|
WORD_REGISTER_OPERATIONS [PR113010]
The sign-bit-copies of a sign-extending load cannot be known until runtime on
WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM
load. See the fix for PR112758.
gcc/
PR rtl-optimization/113010
* combine.cc (simplify_comparison): Simplify a SUBREG on
WORD_REGISTER_OPERATIONS targets only if it is a zero-extending
MEM load.
gcc/testsuite
* gcc.c-torture/execute/pr113010.c: New test.
|
|
[PR61159]
gcc.c-torture/compile/pr61159.c currently FAILs on 32 and 64-bit
Solaris/x86 with the native assembler:
FAIL: gcc.c-torture/compile/pr61159.c -O0 (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c -O1 (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c -O2 (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c -O2 -flto (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c -O2 -flto -flto-partition=none (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c -O3 -g (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c -Os (test for excess errors)
Excess errors:
Assembler: pr61159.c
"/var/tmp//ccRtFPva.s", line 5 : Cannot set a weak symbol to a common symbol
This is a bug/limitation in the native assembler. Given that this
hasn't seen fixes for a long time, this patch xfails the test.
Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.
2024-02-24 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
PR ipa/61159
* gcc.c-torture/compile/pr61159.c: xfail on Solaris/x86 with as.
|
|
PR tree-optimization/111054
gcc/ChangeLog:
* tree-ssa-loop-split.cc (split_loop): Check for profile being present.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr111054.c: New test.
|
|
2024-02-11 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/ieee/cdivchkf.c: Use ilogb and
__builtin_fmax instead of ilogbf and __builtin_fmaxf.
|
|
On the following testcase we emit invalid stmt:
error: type mismatch in ‘widen_mult_plus_expr’
6 | foo (int c, int b)
| ^~~
unsigned long
int
unsigned int
unsigned long
_31 = WIDEN_MULT_PLUS_EXPR <b_5(D), 2, _30>;
The recent PR113560 r14-8680 changes tweaked convert_mult_to_widen,
but didn't change convert_plusminus_to_widen for the
TREE_TYPE (rhsN) != typeN cases, but looking at this, it was already
before that change quite weird.
Earlier in those functions it determines actual_precision and from_unsignedN
and wants to use that precision and signedness for the operands and
it used build_and_insert_cast for that (which emits a cast stmt, even for
INTEGER_CSTs) and later on for INTEGER_CST arguments fold_converted them
to typeN (which is unclear to me why, because it seems to have assumed
that TREE_TYPE (rhsN) is typeN, for the actual_precision or from_unsignedN
cases it would be wrong except that build_and_insert_cast forced a SSA_NAME
and so it doesn't trigger anymore).
Now, since r14-8680 it is possible that rhsN also has some type other than
typeN and we again want to cast.
The following patch changes this, so that for the differences in
actual_precision and/or from_unsignedN we actually update typeN and then use
it as the type to convert the arguments to if it isn't useless, for
INTEGER_CSTs by just fold_converting, otherwise using build_and_insert_cast.
And uses useless_type_conversion_p test so that we don't convert unless
necessary. Plus by doing that effectively also doing the important part of
the r14-8680 convert_mult_to_widen changes in convert_plusminus_to_widen.
2024-02-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113759
* tree-ssa-math-opts.cc (convert_mult_to_widen): If actual_precision
or from_unsignedN differs from properties of typeN, update typeN
to build_nonstandard_integer_type. If TREE_TYPE (rhsN) is not
uselessly convertible to typeN, convert it using fold_convert or
build_and_insert_cast depending on if rhsN is INTEGER_CST or not.
(convert_plusminus_to_widen): Likewise.
* gcc.c-torture/compile/pr113759.c: New test.
|
|
[PR111059, PR111911]
C front-end bugs 111059 and 111911 both report ICEs with conversions
to boolean of expressions with integer constant operands that can
appear in an integer constant expression as long as they are not
evaluated (such as division by zero).
The issue is a nested C_MAYBE_CONST_EXPR, with the inner one generated
in build_binary_op to indicate that a subexpression has been fully
folded and should not be folded again, and the outer one in
build_c_cast to indicate that the expression has integer constant
operands. To avoid the inner one from build_binary_op,
c_objc_common_truthvalue_conversion should be given an argument
properly marked as having integer constant operands rather than that
information having been removed by the caller - but because c_convert
would then also wrap a C_MAYBE_CONST_EXPR with a NOP_EXPR converting
to boolean, it seems most convenient to have
c_objc_common_truthvalue_conversion produce the NE_EXPR directly in
the desired type (boolean in this case), before generating any
C_MAYBE_CONST_EXPR there, rather than it always producing a comparison
in integer_type_node and doing a conversion to boolean in the caller.
The same issue as in those PRs also applies for conversion to enums
with a boolean fixed underlying type; that case is also fixed and
tests added for it. Note that not all the tests added failed before
the patch (in particular, the issue was specific to casts and did not
apply for implicit conversions, but some tests of those are added as
well).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/111059
PR c/111911
gcc/c/
* c-tree.h (c_objc_common_truthvalue_conversion): Add third
argument.
* c-convert.cc (c_convert): For conversions to boolean, pass third
argument to c_objc_common_truthvalue_conversion rather than
converting here.
* c-typeck.cc (build_c_cast): Ensure arguments with integer
operands are marked as such for conversion to boolean.
(c_objc_common_truthvalue_conversion): Add third argument TYPE.
gcc/testsuite/
* gcc.c-torture/compile/pr111059-1.c,
gcc.c-torture/compile/pr111059-2.c,
gcc.c-torture/compile/pr111059-3.c,
gcc.c-torture/compile/pr111059-4.c,
gcc.c-torture/compile/pr111059-5.c,
gcc.c-torture/compile/pr111059-6.c,
gcc.c-torture/compile/pr111059-7.c,
gcc.c-torture/compile/pr111059-8.c,
gcc.c-torture/compile/pr111059-9.c,
gcc.c-torture/compile/pr111059-10.c,
gcc.c-torture/compile/pr111059-11.c,
gcc.c-torture/compile/pr111059-12.c,
gcc.c-torture/compile/pr111911-1.c,
gcc.c-torture/compile/pr111911-2.c: New tests.
|
|
For something like:
void
foo (void)
{
int *ptr;
asm volatile ("%0" : "=w" (ptr));
asm volatile ("%0" :: "m" (*ptr));
}
early-ra would allocate ptr to an FPR for the first asm, thus
leaving an FPR address in the second asm. The address was then
reloaded by LRA to make it valid.
But early-ra shouldn't be allocating at all in that kind of
situation. Doing so caused the ICE in the PR (with LDP fusion).
Fixed by making sure that we record address references as
GPR references.
gcc/
PR target/113623
* config/aarch64/aarch64-early-ra.cc (early_ra::preprocess_insns):
Mark all registers that occur in addresses as needing a GPR.
gcc/testsuite/
PR target/113623
* gcc.c-torture/compile/pr113623.c: New test.
|
|
Since r10-2101-gb631bdb3c16e85f35d3 handle_store uses
count_nonzero_bytes{,_addr} which (more recently limited to statements
with the same vuse) can walk earlier statements feeding the rhs
of the store and call get_stridx on it.
Unlike most of the other functions where get_stridx is called first on
rhs and only later on lhs, handle_store calls get_stridx on the lhs before
the count_nonzero_bytes* call and does some si->nonzero_bytes comparison
on it.
Now, strinfo structures are refcounted and it is important not to screw
it up.
What happens on the following testcase is that we call get_strinfo on the
destination idx's base (g), which returns a strinfo at that moment
with refcount of 2, one copy referenced in bb 2 final strinfos, one in bb 3
(the vector of strinfos was unshared from the dominator there because some
other strinfo was added) and finally we process a store in bb 6.
Now, count_nonzero_bytes is called and that sees &g[1] in a PHI and
calls get_stridx on it, which in turn calls get_stridx_plus_constant
because &g + 1 address doesn't have stridx yet. This creates a new
strinfo for it:
si = new_strinfo (ptr, idx, build_int_cst (size_type_node, nonzero_chars),
basesi->full_string_p);
set_strinfo (idx, si);
and the latter call, because it is the first one in bb 6 that needs it,
unshares the stridx_to_strinfo vector (so refcount of the g strinfo becomes
3).
Now, get_stridx_plus_constant needs to chain the new strinfo of &g[1] in
between the related strinfos, so after the g record. Because the strinfo
is now shared between the current bb and 2 other bbs, it needs to
unshare_strinfo it (creating a new strinfo which can be modified as a copy
of the old one, decrementing refcount of the old shared one and setting
refcount of the new one to 1):
if (strinfo *nextsi = get_strinfo (chainsi->next))
{
nextsi = unshare_strinfo (nextsi);
si->next = nextsi->idx;
nextsi->prev = idx;
}
chainsi = unshare_strinfo (chainsi);
if (chainsi->first == 0)
chainsi->first = chainsi->idx;
chainsi->next = idx;
Now, the bug is that the caller of this a couple of frames above,
handle_store, holds on to a pointer to this g strinfo (but doesn't know
about the unsharing, so the pointer is to the old strinfo with refcount
of 2), and later needs to update it, so it calls
si = unshare_strinfo (si);
and modifies some fields in it.
This creates a new strinfo (with refcount of 1 which is stored into
the vector of the current bb) based on the old strinfo for g and
decrements refcount of the old one to 1. So, now we are in an inconsistent
state, because the old strinfo for g is referenced in bb 2 and bb 3
vectors, but has just refcount of 1, and then we have one strinfo (the one
created by unshare_strinfo (chainsi) in get_stridx_plus_constant) which
has refcount of 1 but isn't referenced from anywhere anymore.
Later on when we free one of the bb 2 or bb 3 vectors (forgot which)
that decrements refcount from 1 to 0 and poisons the strinfo/returns it to
the pool, but then maybe_invalidate when looking at the other bb's pointer
to it ICEs.
The following patch fixes it by calling get_strinfo again, it is guaranteed
to return non-NULL, but could be an unshared copy instead of the originally
fetched shared one.
I believe we only need to do this refetching for the case where get_strinfo
is called on the lhs before get_stridx is called on other operands, because
we should be always modifying (apart from the chaining changes) the strinfo
for the destination of the statements, not other strinfos just consumed in
there.
2024-01-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113603
* tree-ssa-strlen.cc (strlen_pass::handle_store): After
count_nonzero_bytes call refetch si using get_strinfo in case it
has been unshared in the meantime.
* gcc.c-torture/compile/pr113603.c: New test.
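The refetch pattern can be modelled outside GCC with a tiny copy-on-write
sketch. All types and functions below are hypothetical stand-ins (struct info
for a strinfo, struct slot for the current block's table entry); they only
mirror the shape of the fix and are not the tree-ssa-strlen.cc code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted strinfo.  */
struct info
{
  int refcount;
  int value;
};

/* Hypothetical stand-in for the current block's table entry.  */
struct slot
{
  struct info *cur;
};

/* Copy-on-write: if SI is shared, drop one reference and install a
   private copy in SLOT; otherwise return SI unchanged.  */
static struct info *
unshare (struct slot *slot, struct info *si)
{
  if (si->refcount == 1)
    return si;
  struct info *copy = malloc (sizeof *copy);
  if (!copy)
    abort ();
  *copy = *si;
  copy->refcount = 1;
  si->refcount--;
  slot->cur = copy;
  return copy;
}

/* A callee that unshares the slot behind the caller's back.  */
static void
callee (struct slot *slot)
{
  unshare (slot, slot->cur)->value += 1;
}

/* The shape of the fix: refetch from the slot after the callee.
   Returns how much the slot's value changed.  */
int
update_with_refetch (struct slot *slot)
{
  struct info *si = slot->cur;
  int before = si->value;
  callee (slot);
  si = slot->cur;            /* refetch: may now be a private copy */
  si = unshare (slot, si);   /* no-op if already private */
  si->value += 10;
  return si->value - before;
}
```

If the refetch line were omitted, the second unshare call would operate on
the stale shared pointer, decrement its refcount a second time and orphan the
first private copy, which is the inconsistent state described above.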
|
|
newlib-src/libc/include/sys/fenv.h doesn't define the FE_* macros that
libgcc expects to enable decimal float support. Only after newlib is
configured and built does an overriding header that defines those
macros become available in objdir/<target>/newlib/targ-include/, but
by then, libgcc has already been built without dfp and libbid.
This has exposed a number of tests that attempt to link dfp programs
without requiring a dfprt effective target.
dfp.exp already skips if dfp support is missing altogether, and sets
the default to compile rather than run if dfp support is present in
the compiler but missing in the runtime libraries.
However, some of the dfp tests override the default without requiring
dfprt. Drop the overriders where reasonable, and add the explicit
requirement elsewhere.
for gcc/testsuite/ChangeLog
* c-c++-common/dfp/pr36800.c: Drop dg-do overrider.
* c-c++-common/dfp/pr39034.c: Likewise.
* c-c++-common/dfp/pr39035.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise.
* gcc.dg/dfp/builtin-snan-1.c: Likewise.
* gcc.dg/dfp/builtin-tgmath-dfp.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-4.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-5.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-6.c: Likewise.
* gcc.dg/dfp/c23-float-dfp-7.c: Likewise.
* gcc.dg/dfp/pr108068.c: Likewise.
* gcc.dg/dfp/pr97439.c: Likewise.
* g++.dg/compat/decimal/pass-1_main.C: Require dfprt.
* g++.dg/compat/decimal/pass-2_main.C: Likewise.
* g++.dg/compat/decimal/pass-3_main.C: Likewise.
* g++.dg/compat/decimal/pass-4_main.C: Likewise.
* g++.dg/compat/decimal/pass-5_main.C: Likewise.
* g++.dg/compat/decimal/pass-6_main.C: Likewise.
* g++.dg/compat/decimal/return-1_main.C: Likewise.
* g++.dg/compat/decimal/return-2_main.C: Likewise.
* g++.dg/compat/decimal/return-3_main.C: Likewise.
* g++.dg/compat/decimal/return-4_main.C: Likewise.
* g++.dg/compat/decimal/return-5_main.C: Likewise.
* g++.dg/compat/decimal/return-6_main.C: Likewise.
* g++.dg/eh/dfp-1.C: Likewise.
* g++.dg/eh/dfp-2.C: Likewise.
* g++.dg/eh/dfp-saves-aarch64.C: Likewise.
* gcc.c-torture/execute/pr80692.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-1.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise.
* gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.
|
|
The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop. In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.
This patch fixes that by introducing a general mechanism to avoid this
sort of problem. We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value. It then
pre-computes the next iterator value at all iterations, so it doesn't
matter if the original iterator got invalidated during the loop body, we
can still move safely to the next iteration.
We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616. While doing this, I
remembered that cleanup_tombstones () had the same problem. I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.
While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast<set_info *> to as_a<set_info *>,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.
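The idea can be sketched outside of GCC as follows. This is only an illustration, not the code added to iterator-utils.h: the names safe_iter_sketch, safe_range_sketch, iterate_safely_sketch and erase_even_while_iterating are hypothetical, and std::list stands in for the RTL-SSA use list (erasing a list element invalidates only iterators to that element).

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical stand-in for the wrapper described above: it holds the
// current position, the precomputed next position and the end position.
template <typename Iter>
class safe_iter_sketch
{
public:
  safe_iter_sketch (Iter cur, Iter end)
    : m_cur (cur), m_next (cur == end ? end : std::next (cur)), m_end (end) {}

  decltype (auto) operator* () const { return *m_cur; }

  safe_iter_sketch &operator++ ()
  {
    // Advance using the position computed before the loop body ran, so it
    // does not matter whether m_cur was invalidated in the meantime.
    m_cur = m_next;
    if (m_next != m_end)
      ++m_next;
    return *this;
  }

  bool operator!= (const safe_iter_sketch &other) const
  { return m_cur != other.m_cur; }

private:
  Iter m_cur, m_next, m_end;
};

// Adapts a [begin, end) pair into a range of safe_iter_sketch.
template <typename Iter>
struct safe_range_sketch
{
  Iter m_begin, m_end;
  safe_iter_sketch<Iter> begin () const { return { m_begin, m_end }; }
  safe_iter_sketch<Iter> end () const { return { m_end, m_end }; }
};

template <typename Container>
auto iterate_safely_sketch (Container &c)
{
  return safe_range_sketch<decltype (c.begin ())> { c.begin (), c.end () };
}

// Removes every even element while iterating over the list (values are
// assumed distinct); returns the elements visited, in order.
inline std::vector<int> erase_even_while_iterating (std::list<int> &l)
{
  std::vector<int> visited;
  for (int &v : iterate_safely_sketch (l))
    {
      int value = v;  // Copy first: v dangles once its node is removed.
      visited.push_back (value);
      if (value % 2 == 0)
        l.remove (value);
    }
  return visited;
}
```

With a plain range-based for loop, removing the current element would leave the loop incrementing an invalidated iterator; here every element is still visited exactly once.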
gcc/ChangeLog:
PR target/113616
* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
Use iterate_safely when iterating over debug uses.
(fixup_debug_uses): Likewise.
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
over nondebug insns instead of manually maintaining the next insn.
* iterator-utils.h (class safe_iterator): New.
(iterate_safely): New.
gcc/testsuite/ChangeLog:
PR target/113616
* gcc.c-torture/compile/pr113616.c: New test.
|
|
On the following testcase we emit an invalid range of [2, 1] due to
UB in the source. Older VRP code silently swapped the boundaries and
made [1, 2] range out of it, but newer code just ICEs on it.
The reason for pdata->minlen 2 is that we see a memcpy in this case
setting both elements of the array to non-zero value, so strlen (a)
can't be smaller than 2. The reason for pdata->maxlen 1 is that in
char a[2] array without UB there can be at most 1 non-zero character
because there needs to be '\0' termination in the buffer too.
IMHO we shouldn't create invalid ranges like that and even creating
for that case a range [1, 2] looks wrong to me, so the following patch
just doesn't set maxlen in that case to the array size - 1, matching
what will really happen at runtime when triggering such UB (strlen will
be at least 2, perhaps more or will crash).
This is what the second hunk of the patch does.
The first hunk fixes a fortunately harmless thinko.
If the strlen pass knows the string length (i.e. get_string_length
function returns non-NULL), we take a different path, we get to this
only if all we know is that there are a certain number of non-zero
characters but we don't know what it is followed with, whether further
non-zero characters or zero termination or either of that.
If we know exactly how many non-zero characters it is, such as
char a[42];
...
memcpy (a, "01234567890123456789", 20);
then we take an earlier if for the INTEGER_CST case and set correctly
just pdata->minlen to 20 in that case, but if we have something like
int len;
...
if (len < 15 || len > 32) return;
memcpy (a, "0123456789012345678901234567890123456789", len);
then we have [15, 32] range for the nonzero_chars and we set pdata->minlen
correctly to 15, but incorrectly set also pdata->maxlen to 32. That is
not what the above implies, it just means that in some cases we know that
there are at least 32 non-zero characters, followed by something we don't
know. There is no guarantee that there is '\0' right after it, so it
means nothing.
The reason this is harmless, just confusing, is that the code a few lines
later fortunately overwrites this incorrect pdata->maxlen value with
something different (either array length - 1 or all ones etc.).
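A rough model of the intended behaviour, written as standalone C++ rather than the actual get_range_strlen_dynamic code: len_range_sketch, unbounded_len and strlen_range_sketch are hypothetical names, and using SIZE_MAX for "no known upper bound" is an assumption standing in for the "all ones" value mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the pdata->minlen / pdata->maxlen pair.
struct len_range_sketch
{
  std::size_t minlen;
  std::size_t maxlen;
};

// Assumed stand-in for "no known upper bound" (all ones).
constexpr std::size_t unbounded_len = SIZE_MAX;

// min_nonzero: number of leading characters known to be non-zero.
// array_size: size in bytes of the char array holding the string.
inline len_range_sketch
strlen_range_sketch (std::size_t min_nonzero, std::size_t array_size)
{
  len_range_sketch r { min_nonzero, unbounded_len };
  // Without UB the array holds at most array_size - 1 non-zero characters
  // plus the terminating '\0'.  Only use that bound when it keeps the
  // range valid, i.e. when minlen <= maxlen still holds.
  if (array_size > 0 && min_nonzero <= array_size - 1)
    r.maxlen = array_size - 1;
  return r;
}
```

For the char a[2] case with two known non-zero bytes this yields a minimum of 2 with no upper bound instead of the invalid [2, 1]; for char a[42] with 20 known non-zero bytes it yields [20, 41].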
2024-01-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110603
* tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect
setting of pdata->maxlen to vr.upper_bound (which is unconditionally
overwritten anyway). Avoid creating invalid range with minlen
larger than maxlen. Formatting fix.
* gcc.c-torture/compile/pr110603.c: New test.
|
|
As the PR shows, we were missing code to update debug uses in the
load/store pair fusion pass. This patch fixes that.
The patch tries to give a complete treatment of the debug uses that will
be affected by the changes we make, and in particular makes an effort to
preserve debug info where possible, e.g. when re-ordering an update of
a base register by a constant over a debug use of that register. When
re-ordering loads over a debug use of a transfer register, we reset the
debug insn. Likewise when re-ordering stores over debug uses of mem.
While doing this I noticed that try_promote_writeback used a strange
choice of move_range for the pair insn, in that it chose the previous
nondebug insn instead of the insn itself. Since the insn is being
changed, these move ranges are equivalent (at least in terms of nondebug
insn placement as far as RTL-SSA is concerned), but I think it is more
natural to choose the pair insn itself. This is needed to avoid
incorrectly updating some debug uses.
gcc/ChangeLog:
PR target/113089
* config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New.
(fixup_debug_use): New.
(fixup_debug_uses_trailing_add): New.
(fixup_debug_uses): New. Use it ...
(ldp_bb_info::fuse_pair): ... here.
(try_promote_writeback): Call fixup_debug_uses_trailing_add to
fix up debug uses of the base register that are affected by
folding in the trailing add insn.
gcc/testsuite/ChangeLog:
PR target/113089
* gcc.c-torture/compile/pr113089.c: New test.
|
|
The PR shows two different cases where try_promote_writeback produces an
RTL pattern which isn't recognized. Currently this leads to an ICE, as
we assert recog success, but I think it's better just to back out of the
changes gracefully if recog fails (as we do in the main fuse_pair case).
In theory since we check the ranges here recog shouldn't fail (which is
why I had the assert in the first place), but the PR shows an edge case
in the patterns where if we form a pre-writeback pair where the
writeback offset is exactly -S, where S is the size in bytes of one
transfer register, we fail to match the expected pattern as the patterns
look explicitly for plus operands in the mems. I think fixing this
would require adding at least four new special-case patterns to
aarch64.md for what doesn't seem to be a particularly useful variant of
the insns. Even if we were to do that, I think it would be GCC 15
material, and it's better to just punt for GCC 14.
The ILP32 case in the PR is a bit different, as that shows us trying to
combine a pair with DImode base register operands in the mems together
with an SImode trailing update of the base register. This leads to us
forming an RTL pattern which references the base register in both SImode
and DImode, which also fails to recog. Again, I think it's best just to
take the missed optimization for now. If we really want to make this
(try_promote_writeback) work for ILP32, we can try to do it for GCC 15.
gcc/ChangeLog:
PR target/113114
* config/aarch64/aarch64-ldp-fusion.cc (try_promote_writeback):
Don't assert recog success, just punt if the writeback pair
isn't recognized.
gcc/testsuite/ChangeLog:
PR target/113114
* gcc.c-torture/compile/pr113114.c: New test.
* gcc.target/aarch64/pr113114.c: New test.
|
|
This testcase was fixed with r13-1695-gb0f02eeb906b63 which
added an Ada testcase for the issue but adding a C testcase
is a good idea and that is what this does.
Committed after making sure it passes on x86_64-linux-gnu.
PR ipa/110705
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr110705-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
[PR113221]
So the problem here is that aarch64_ldp_reg_operand will allow all subregs, even a subreg of lo_sum.
When LRA tries to fix that up, all things break. So the fix is to change the check to only
allow reg and subreg of regs.
Note the tendency here is to use register_operand, but that checks the mode of the register,
and we need to allow mismatched modes for this predicate for now.
Built and tested for aarch64-linux-gnu with no regressions
(Also tested with the LD/ST pair pass back on).
PR target/113221
gcc/ChangeLog:
* config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg,
only allow REG operands instead of allowing all.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr113221-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This testcase started to hang at -O3 with r13-4208 and got fixed
with r14-2097.
2024-01-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110251
* gcc.c-torture/compile/pr110251.c: New test.
|
|
The following patch adds a quick workaround to bugs in VAR_DECL
partitioning.
The problem is that there is no dependency between ADDR_EXPRs of local
decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
(including ivopts integral variants thereof), which can break
add_scope_conflicts discovery of what variables are actually used
in a certain region.
E.g. we can have
ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.40_3
...
bitint.6 ={v} {CLOBBER(eos)};
...
ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.28_43
before VN (such as dom3), which the add_scope_conflicts code identifies as 2
independent uses of bitint.6 variable (which is correct), but then VN
determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
even in the second region; at that point add_scope_conflicts thinks the
bitint.6 variable is not used in that region anymore.
The following patch does a simple single def-stmt check for such ADDR_EXPRs
(rather than say trying to do a full propagation of what SSA_NAMEs can
contain ADDR_EXPRs of local variables), which seems to work around all 4 PRs.
In addition to this patch I've used the attached one to gather statistics
on the total size of all variable partitions in a function, and it seems that,
besides the new testcases, nothing is really affected compared to no patch (I've
actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
it looks the same except that it never triggers). The comparison wasn't
perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
it more similar between the two bootstraps/regtests), current_function_name
and the total size of all variable partitions if any, because I didn't
record e.g. the optimization options and so e.g. torture tests which iterate
over options could have different partition sizes even in one compiler when
BITS_PER_WORD, main_input_filename and current_function_name are all equal.
So had to write an awk script to check if the first triple in the second
build appeared in the first one and the quadruple in the second build
appeared in the first one too, otherwise print result and that only
triggered in the new tests.
Also, the cc1plus binary according to objdump -dr is identical between the
two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
2024-01-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113372
PR middle-end/90348
PR middle-end/110115
PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): New function.
(add_scope_conflicts_1): Use it.
* gcc.dg/torture/bitint-49.c: New test.
* gcc.c-torture/execute/pr90348.c: New test.
* gcc.c-torture/execute/pr110115.c: New test.
* gcc.c-torture/execute/pr111422.c: New test.
|
|
The problem here is after the recent vectorizer improvements, we end up
with a comparison against a vector bool 0 which then tries expand_single_bit_test
which is not expecting vector comparisons at all.
The IR was:
vector(4) <signed-boolean:1> mask_patt_5.13;
_Bool _12;
mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 };
_12 = mask_patt_5.13_44 == { 0, 0, 0, 0 };
and we tried to call expand_single_bit_test for the last comparison.
Rejecting the vector comparison is needed.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/113322
gcc/ChangeLog:
* expr.cc (do_store_flag): Don't try single bit tests with
comparison on vector types.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr113322-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Like r14-2293-g11350734240dba and r14-2289-gb083203f053f16,
reassociation can combine across a few bbs, and one of the uses
can be an uninitialized variable; going from a conditional
use to an unconditional use can then cause wrong code.
This uses maybe_undef_p like other passes where this can happen.
Note if-to-switch uses the function (init_range_entry) provided
by reassociation, so we need to call mark_ssa_maybe_undefs there;
otherwise we assume almost all ssa names are uninitialized.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/112581
* gimple-if-to-switch.cc (pass_if_to_switch::execute): Call
mark_ssa_maybe_undefs.
* tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized
variables can not be reassociated.
(init_range_entry): Check for uninitialized variables too.
(init_reassoc): Call mark_ssa_maybe_undefs.
gcc/testsuite/ChangeLog:
PR tree-optimization/112581
* gcc.c-torture/execute/pr112581-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
[PR113210]
On the following testcase e.g. on riscv64 or aarch64 (latter with
-O3 -march=armv8-a+sve ) we ICE, because while NITERS is INTEGER_CST,
NITERSM1 is a complex expression like
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0
where a.0_1 is unsigned char. The condition is never true, so the above
is equivalent to just 0, but only when trying to fold the above with
PLUS_EXPR 1 we manage to simplify it (first
~(short unsigned int) (a.0_1 + 255)
to
-(short unsigned int) (a.0_1 + 255)
and then
(short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1
to
(short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1
and only at this point we fold the condition to be false).
But the vectorizer seems to assume that if NITERS is known (i.e. suitable
INTEGER_CST) then NITERSM1 also is, so the following hack ensures that if
NITERS folds into INTEGER_CST NITERSM1 will be one as well.
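The claim that the condition is never true can be checked exhaustively. The sketch below is not vectorizer code; it assumes the addition a.0_1 + 255 is carried out in unsigned char (so it wraps modulo 256 before being widened), and the function names are made up for illustration.

```cpp
#include <cassert>

// (short unsigned int) (a.0_1 + 255), with the sum assumed to be computed
// in unsigned char, so it wraps modulo 256 before it is widened.
inline unsigned short widened_sum (unsigned char a)
{
  return static_cast<unsigned short> (static_cast<unsigned char> (a + 255));
}

// NITERSM1 as written in the expression above.
inline unsigned short nitersm1_sketch (unsigned char a)
{
  unsigned short x = widened_sum (a);
  bool cond = static_cast<unsigned short> (x + 1) > 256;
  return cond ? static_cast<unsigned short> (~x) : 0;
}

// Returns true if NITERSM1 evaluates to 0 for every value of a.
inline bool nitersm1_always_zero ()
{
  for (int a = 0; a <= 255; ++a)
    if (nitersm1_sketch (static_cast<unsigned char> (a)) != 0)
      return false;
  return true;
}

// Returns true if ~x + 1 == -x holds for every unsigned short x, which is
// the first folding step mentioned above.
inline bool not_plus_one_is_negation ()
{
  for (unsigned int i = 0; i <= 65535; ++i)
    {
      unsigned short x = static_cast<unsigned short> (i);
      if (static_cast<unsigned short> (~x + 1)
          != static_cast<unsigned short> (-x))
        return false;
    }
  return true;
}
```

Since the widened sum is at most 255, adding 1 gives at most 256, so the comparison against 256 can never succeed and NITERSM1 is always 0, matching NITERS being the constant 1.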
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113210
* tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST
value in *number_of_iterationsm1 PLUS_EXPR 1 is folded into
INTEGER_CST, recompute *number_of_iterationsm1 as the INTEGER_CST
minus 1.
* gcc.c-torture/compile/pr113210.c: New test.
|
|
The following testcase ICEs during regimplification since the addition of
(convert (eqne zero_one_valued_p@0 INTEGER_CST@1))
simplification. That simplification is novel in the sense that in
gimplify_expr it can turn an expression (comparison in particular) into
an SSA_NAME. Normally when gimplify_expr originally sees an SSA_NAME, it does
case SSA_NAME:
/* Allow callbacks into the gimplifier during optimization. */
ret = GS_ALL_DONE;
break;
and doesn't try to recalculate side effects because of that, but in this
case gimplify_expr normally enters the:
default:
switch (TREE_CODE_CLASS (TREE_CODE (*expr_p)))
{
case tcc_comparison:
then does
*expr_p = gimple_boolify (*expr_p);
and then
*expr_p = fold_convert_loc (input_location,
org_type, *expr_p);
with this new match.pd simplification turns that tcc_comparison class
into SSA_NAME. Unlike the outer SSA_NAME handling though, this falls
through into
recalculate_side_effects (*expr_p);
dont_recalculate:
break;
but unfortunately recalculate_side_effects doesn't handle SSA_NAME and ICEs
on it.
SSA_NAMEs don't ever have TREE_SIDE_EFFECTS set on those, so the following
patch fixes it by handling it similarly to the tcc_constant case.
2024-01-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113228
* gimplify.cc (recalculate_side_effects): Do nothing for SSA_NAMEs.
* gcc.c-torture/compile/pr113228.c: New test.
|
|
gcc/testsuite/
PR testsuite/52641
* gcc.c-torture/compile/attr-complex-method-2.c [target=avr]: Check
for "divsc3" as double = float per default.
* gcc.c-torture/compile/pr106537-1.c: Use __INTPTR_TYPE__ instead of
hard-coded "long".
* gcc.c-torture/compile/pr106537-2.c: Same.
* gcc.c-torture/compile/pr106537-3.c: Same.
* gcc.c-torture/execute/20230630-3.c: Use __INT32_TYPE__ for bit-field
wider than 16 bits.
* gcc.c-torture/execute/20230630-4.c: Same.
* gcc.c-torture/execute/pr109938.c: Require int32plus.
* gcc.c-torture/execute/pr109986.c: Same.
* gcc.dg/fold-ior-4.c: Same.
* gcc.dg/fold-ior-5.c: Same.
* gcc.dg/fold-parity-5.c: Same.
* gcc.dg/fold-popcount-5.c: Same.
* gcc.dg/builtin-bswap-13.c [sizeof(int) < 4]: Use __INT32_TYPE__
instead of int.
* gcc.dg/builtin-bswap-14.c: Use __INT32_TYPE__ instead of int where
required by code.
* gcc.dg/c23-constexpr-9.c: Require large_double.
* gcc.dg/c23-nullptr-1.c [target=avr]: xfail.
* gcc.dg/loop-unswitch-10.c: Require size32plus.
* gcc.dg/loop-unswitch-14.c: Same.
* gcc.dg/loop-unswitch-11.c: Require int32.
* gcc.dg/pr101836.c: Use __SIZEOF_INT__ instead of hard-coded 4.
* gcc.dg/pr101836_1.c: Same.
* gcc.dg/pr101836_2.c: Same.
* gcc.dg/pr101836_3.c: Same.
|
|
The following testcase ICEs when rslt is SSA_NAME_OCCURS_IN_ABNORMAL_PHI
and we call replace_uses_by with an INTEGER_CST def, where it ICEs on:
if (e->flags & EDGE_ABNORMAL
&& !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (val))
because val is not an SSA_NAME. One way would be to add
&& TREE_CODE (val) == SSA_NAME
check in between the above 2 lines in replace_uses_by.
And/or the following patch just punts propagating constants to
SSA_NAME_OCCURS_IN_ABNORMAL_PHI rslt uses.
Or we could punt somewhere earlier in final value replacement (but dunno
where).
2024-01-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113201
* tree-scalar-evolution.cc (final_value_replacement_loop): Don't call
replace_uses_by on SSA_NAME_OCCURS_IN_ABNORMAL_PHI rslt.
* gcc.c-torture/compile/pr113201.c: New test.
|
|
gcc/testsuite
* gcc.c-torture/compile/mipscop-1.c: Include stdio.h.
* gcc.c-torture/compile/mipscop-2.c: Ditto.
* gcc.c-torture/compile/mipscop-3.c: Ditto.
* gcc.c-torture/compile/mipscop-4.c: Ditto.
|
|
|
|
WORD_REGISTER_OPERATIONS targets [PR112758]
As discussed in the PR, the following testcase is miscompiled on RISC-V
64-bit, because num_sign_bit_copies in one spot pretends the bits in
a paradoxical SUBREG beyond SUBREG_REG SImode are all sign bit copies:
  /* For paradoxical SUBREGs on machines where all register operations
     affect the entire register, just look inside.  Note that we are
     passing MODE to the recursive call, so the number of sign bit
     copies will remain relative to that mode, not the inner mode.

     This works only if loads sign extend.  Otherwise, if we get a
     reload for the inner part, it may be loaded from the stack, and
     then we lose all sign bit copies that existed before the store
     to the stack.  */
  if (WORD_REGISTER_OPERATIONS
      && load_extend_op (inner_mode) == SIGN_EXTEND
      && paradoxical_subreg_p (x)
      && MEM_P (SUBREG_REG (x)))
and then optimizes based on that in one place, but then the
r7-1077 optimization triggers and treats all the upper bits in the
paradoxical SUBREG as undefined and performs another optimization
based on that. The r7-1077 optimization is done only if SUBREG_REG
is either a REG or MEM; from the discussions in the PR it seems that if
it is a REG, the upper bits in paradoxical SUBREG on
WORD_REGISTER_OPERATIONS targets aren't really undefined, but we can't
tell what values they have because we don't see the operation which
computed that REG, and for MEM it depends on load_extend_op - if
it is SIGN_EXTEND, the upper bits are sign bit copies and so something
not really usable for the optimization, if ZERO_EXTEND, they are zeros
and it is usable for the optimization, for UNKNOWN I think it is better
to punt as well.
So, the following patch basically disables the r7-1077 optimization
on WORD_REGISTER_OPERATIONS unless we know it is still ok for sure,
which is either if sub_width is >= BITS_PER_WORD because then the
WORD_REGISTER_OPERATIONS rules don't apply, or load_extend_op on a MEM
is ZERO_EXTEND.
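The resulting rule can be restated as a small predicate. This is only a restatement of the prose above, not the combine.cc change itself; the enums and the function and_of_subreg_opt_ok_sketch are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical mirror of the possible load_extend_op results.
enum class load_extend_sketch { unknown, sign_extend, zero_extend };

// Hypothetical description of what SUBREG_REG is.
enum class inner_kind_sketch { reg, mem };

// Returns true if the AND-of-paradoxical-SUBREG optimization may still be
// applied under the rule described above.
inline bool
and_of_subreg_opt_ok_sketch (bool word_register_operations,
                             unsigned sub_width, unsigned bits_per_word,
                             inner_kind_sketch inner, load_extend_sketch ext)
{
  // Targets without WORD_REGISTER_OPERATIONS are unaffected.
  if (!word_register_operations)
    return true;
  // The WORD_REGISTER_OPERATIONS rules do not apply to word-sized or
  // wider inner values.
  if (sub_width >= bits_per_word)
    return true;
  // Otherwise only a zero-extending MEM load has known-zero upper bits.
  // A REG, a sign-extending load or an unknown extension all punt.
  return inner == inner_kind_sketch::mem
         && ext == load_extend_sketch::zero_extend;
}
```

For example, with 64-bit words an SImode MEM on a sign-extending target is rejected, while the same MEM on a zero-extending target is accepted.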
2023-12-22 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/112758
* combine.cc (make_compound_operation_int): Optimize AND of a SUBREG
based on nonzero_bits of SUBREG_REG and constant mask on
WORD_REGISTER_OPERATIONS targets only if it is a zero extending
MEM load.
* gcc.c-torture/execute/pr112758.c: New test.
|