|
There are two new tests that are dependent on logical-op-non-short-circuit
settings. The BZ is reported against ppc64 and ppc64le, but also applies to a
goodly number of the other targets.
The "regression" fix is trivial: just add the appropriate param to force the
behavior we're expecting. I'm committing that fix momentarily. It's been
verified on ppc64, ppc64le and x86_64 as well as the various embedded targets
in my tester where many FAILS flip to PASS.
I'm leaving the bug open without the regression marker as Jakub has noted a
couple of improvements that we can and probably should make.
PR target/116860
gcc/testsuite
* gcc.dg/tree-ssa/fold-xor-and-or.c: Set logical-op-non-short-circuit.
* gcc.dg/tree-ssa/fold-xor-or.c: Similarly.
|
|
Zhendong Su and Michal Jireš found out that our gimple DSE pass can,
under fairly specific conditions, remove a noreturn call which then
leaves behind a "normal" BB with no successor edges which following
passes do not expect. This patch simply tells the pass to leave such
calls alone even when they otherwise appear to be dead.
Interestingly, our CFG verifier does not report this. I'll put on my
todo list to add a test for it in the next stage 1.
gcc/ChangeLog:
2025-01-28 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117892
* tree-ssa-dse.cc (dse_optimize_call): Leave control-altering
noreturn calls alone.
gcc/testsuite/ChangeLog:
2025-01-27 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117892
* gcc.dg/tree-ssa/pr117892.c: New test.
* gcc.dg/tree-ssa/pr118517.c: Likewise.
co-authored-by: Michal Jireš <mjires@suse.cz>
|
|
When we expand BIT_FIELD_REF <x_2(D), 8, 8> we can end up creating
a stack local, running into the fix. But get_object_alignment
will return 8 for any SSA_NAME because that's not an "object" we
handle. Deal with handled components on registers by singling out
SSA_NAME bases, using their type alignment instead of
get_object_alignment (I considered "robustifying" get_object_alignment,
but decided not to at this point).
This fixes an ICE on gcc.dg/pr41123.c on arm as reported by the CI.
PR middle-end/118684
* expr.cc (expand_expr_real_1): When creating a stack local
during expansion of a handled component, when the base is
a SSA_NAME use its type alignment and avoid calling
get_object_alignment.
* gcc.dg/pr118684.c: Require automatic_stack_alignment.
|
|
The following fixes a not properly aligned stack temporary created
during RTL expansion of a MEM_REF that we handle as a BIT_FIELD_REF
whose base was allocated to a register but which was originally
aligned to allow a larger load not trapping. While probably UB
in C the vectorizer creates aligned accesses that might overread
a (static) allocation because it is then known not to trap.
PR middle-end/118684
* expr.cc (expand_expr_real_1): When expanding a reference
based on a register and we end up needing a MEM make sure
that's aligned as the original reference required.
* gcc.dg/pr118684.c: New testcase.
|
|
gcc/ChangeLog:
PR other/118675
* diagnostic-format-sarif.cc: Define INCLUDE_STRING.
(escape_braces): New.
(set_string_property_escaping_braces): New.
(sarif_builder::make_message_object): Escape braces in the "text"
property.
(sarif_builder::make_message_object_for_diagram): Likewise, and
for the "markdown" property.
(sarif_builder::make_multiformat_message_string): Likewise for the
"text" property.
(selftest::test_message_with_braces): New.
(selftest::diagnostic_format_sarif_cc_tests): Call it.
gcc/testsuite/ChangeLog:
PR other/118675
* gcc.dg/sarif-output/bad-binary-op.py: Update expected output for
escaping of braces in message text.
* gcc.dg/sarif-output/missing-semicolon.py: Likewise.
* gcc.dg/sarif-output/multiple-outputs.py: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The following addresses a bug in tree_could_trap_p leading to
hoisting of a possibly trapping, because of out-of-bound, access.
We only ensured the first accessed byte is within a decl there,
the patch makes sure the whole base of the reference is within it.
This is pessimistic if a handled component would then subset to
a sub-object within the decl but upcasting of a decl to larger
types should be uncommon, questionable, and wrong without
-fno-strict-aliasing.
The testcase is a bit fragile, but I could not devise a (portable)
way to ensure an out-of-bound access to a decl would fault.
PR tree-optimization/117424
* tree-eh.cc (tree_could_trap_p): Verify the base is
fully contained within a decl.
* gcc.dg/tree-ssa/ssa-lim-25.c: New testcase.
|
|
The OpenACC reduction clause on compute construct implies a copy clause
for each reduction variable [1]. This patch adds tests to check if the
implied copy is being generated. The check covers various types and
operators as described in the specification.
[1] OpenACC 2.7 Specification section 2.5.13
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/implied-copy-1.c: New test.
* c-c++-common/goacc/implied-copy-2.c: New test.
* g++.dg/goacc/implied-copy.C: New test.
* gcc.dg/goacc/implied-copy.c: New test.
* gfortran.dg/goacc/implied-copy-1.f90: New test.
* gfortran.dg/goacc/implied-copy-2.f90: New test.
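For orientation, the kind of construct these tests exercise can be sketched as below. This is a minimal illustration, not one of the new tests: the names sum_to and check_sum_to are hypothetical, and the pragma only takes effect when compiling with -fopenacc (without it the loop simply runs serially and gives the same result).

```c
#include <assert.h>

/* Hypothetical example: `sum` appears in no explicit data clause, so the
   reduction clause on the compute construct implies a copy clause for it.  */
int sum_to(int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}

/* The value must have been copied back to the host after the region.  */
void check_sum_to(void)
{
  assert(sum_to(100) == 5050);
}
```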
|
|
element type [PR116357]
In the following testcase we error on the first case because it is
trying to construct an array from overaligned type, but if there are
qualifiers, we accept it silently (unlike in C++ which diagnoses all 3).
The problem is that grokdeclarator if TYPE_QUALS (element_type) is
non-zero just uses TYPE_MAIN_VARIANT; that loses not just the qualifiers
but also attributes, alignment etc.
The following patch uses c_build_qualified_type with TYPE_UNQUALIFIED instead,
which will be in the common case the same as TYPE_MAIN_VARIANT if the
checks are satisfied for it, but if not, will look up different unqualified
type or even create it if there is none.
2025-01-28 Jakub Jelinek <jakub@redhat.com>
PR c/116357
* c-decl.cc (grokdeclarator): Use c_build_qualified_type with
TYPE_UNQUALIFIED instead of TYPE_MAIN_VARIANT.
* gcc.dg/pr116357.c: New test.
|
|
After testing on the BPI (4.2% improvement for x264 input 1, 4.4% for
input 2) and the discussion in PR117173 I figured it's best to disable
the two-source permutes by default for now.
The patch adds a parameter "riscv-two-source-permutes" which restores
the old behavior.
PR target/117173
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_generic_patterns): Only
support single-source permutes by default.
* config/riscv/riscv.opt: New param "riscv-two-source-permutes".
gcc/testsuite/ChangeLog:
* gcc.dg/fold-perm-2.c: Run with two-source permutes.
* gcc.dg/pr54346.c: Ditto.
|
|
The checking code didn't take into account debug uses.
PR tree-optimization/118653
* tree-vect-loop.cc (vectorizable_live_operation): Also allow
out-of-loop debug uses.
* gcc.dg/vect/pr118653.c: New testcase.
|
|
The following fixes an issue in the RTL combiner where we correctly
combine two vector sign-extends with a vector load
Trying 7, 9 -> 10:
7: r106:V4QI=[r119:DI]
REG_DEAD r119:DI
9: r108:V4HI=sign_extend(vec_select(r106:V4QI#0,parallel))
10: r109:V4SI=sign_extend(vec_select(r108:V4HI#0,parallel))
REG_DEAD r108:V4HI
to
modifying insn i2 9: r109:V4SI=sign_extend([r119:DI])
but since r106 is used we wrongly materialize it using a subreg:
modifying insn i3 10: r106:V4QI=r109:V4SI#0
which of course does not work for modes with more than one component,
specifically vector and complex modes.
PR rtl-optimization/118662
* combine.cc (try_combine): When re-materializing a load
from an extended reg by a lowpart subreg make sure we're
not dealing with vector or complex modes.
* gcc.dg/torture/pr118662.c: New testcase.
|
|
When RTL expansion of an out-of-bound access of a register falls
back to a BIT_FIELD_REF we have to ensure that's valid. The
following avoids negative offsets by expanding through a stack
temporary.
PR middle-end/118643
* expr.cc (expand_expr_real_1): Avoid falling back to BIT_FIELD_REF
expansion for negative offset.
* gcc.dg/pr118643.c: New testcase.
|
|
When we get a zero distance vector we still have to check for the
situation of a common inner loop with zero distance. But we can
still allow a zero distance for the loop we distribute
(gcc.dg/tree-ssa/ldist-33.c is such a case). This is because
zero distances in non-outermost loops are a misrepresentation
of dependence by dependence analysis.
Note that test coverage of loop distribution of loop nests is
very low.
PR tree-optimization/112859
PR tree-optimization/115347
* tree-loop-distribution.cc
(loop_distribution::pg_add_dependence_edges): For a zero
distance vector still make sure to not have an inner
loop with zero distance.
* gcc.dg/torture/pr112859.c: New testcase.
* gcc.dg/torture/pr115347.c: Likewise.
|
|
[PR118637]
We already do this canonicalization in
simplify_using_ranges::simplify_div_or_mod_using_ranges, but that means
that it is not done at -O1 or when vrp is otherwise disabled, and that
it can be done too late in some cases when e.g. the r8-2064
"X / C1 op C2 into a simple range test." optimization triggers first.
Note, for unsigned modulo we already have
(simplify
(mod @0 (convert? (power_of_two_cand@1 @2)))
(if ((TYPE_UNSIGNED (type) || tree_expr_nonnegative_p (@0))
...
optimization which duplicates what
simplify_using_ranges::simplify_div_or_mod_using_ranges
does in case ranges aren't needed.
For GCC 16 I think we should improve the niters pattern recognition
and handle even what r8-2064 comes with, after all as I've tried to show
in the PR the user could have written it that way.
I've guarded this optimization on #if GIMPLE just in case this would stand
in any way to the various divmult etc. simplification, guess that can be
lifted for GCC 16 too. In the modulo case we also handle
unsigned % (power_of_two << n), but not really sure if we could do that
for the division, because unsigned / (power_of_two << n) is not simple
unsigned >> (log2 (power_of_two) + n), one can shift the bit out and then
it becomes just 0.
2025-01-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118637
* match.pd: Canonicalize unsigned division by power of two to
right shift.
* gcc.dg/tree-ssa/pr118637.c: New test.
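The equivalence behind the canonicalization, and the reason the power_of_two << n divisor form is left alone, can be sketched at the source level. This is only an illustration with hypothetical function names; it says nothing about how the match.pd pattern itself is written.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by a constant power of two ...  */
uint32_t div_by_8(uint32_t x) { return x / 8u; }

/* ... gives the same result as a logical right shift by log2 of it.  */
uint32_t shr_by_3(uint32_t x) { return x >> 3; }

/* A divisor of the form power_of_two << n can have its only set bit
   shifted out, in which case the divisor is 0 rather than a power of two.  */
uint32_t shifted_divisor(uint32_t n) { return (uint32_t) (UINT32_C(16) << n); }

void check_div_as_shift(void)
{
  const uint32_t samples[] = { 0u, 1u, 7u, 8u, 9u, 1000u, UINT32_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(div_by_8(samples[i]) == shr_by_3(samples[i]));
  /* 16 << 28 is 2^32, which wraps to 0 in 32 bits.  */
  assert(shifted_divisor(28) == 0u);
}
```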
|
|
This patch fixes the ICE caused when comparing log or exp of a constant with
another constant.
The transform is now restricted to cases where the resultant
log/exp (CST) can be constant folded.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
PR target/118490
* match.pd: Added ! to verify that log/exp (CST) can be constant folded.
gcc/testsuite/ChangeLog:
PR target/118490
* gcc.dg/pr118490.c: New test.
|
|
When comparing a signed narrow variable with a wider constant that has
the bit corresponding to the variable's sign bit set, we would check
that the constant is a sign-extension from that sign bit, and conclude
that the compare fails if it isn't.
When the signed variable is masked without getting the [lr]l_signbit
variable set, or when the sign bit itself is masked out, we know the
sign-extension bits from the extended variable are going to be zero,
so the constant will only compare equal if it is a zero- rather than
sign-extension from the narrow variable's precision, therefore, check
that it satisfies this property, and yield a false compare result
otherwise.
for gcc/ChangeLog
PR tree-optimization/118572
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
unsigned the variables whose extension bits are masked out.
for gcc/testsuite/ChangeLog
PR tree-optimization/118572
* gcc.dg/field-merge-24.c: New.
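The difference between the two extensions can be seen with plain C integer promotions. The sketch below uses hypothetical function names and only models the source-level semantics; it is not taken from field-merge-24.c or from the gimple-fold.cc code.

```c
#include <assert.h>

/* Unmasked: sc is sign-extended to int, so it can never equal 0xff.  */
int eq_unmasked(signed char sc) { return sc == 0xff; }

/* Masked: the extension bits are known to be zero, so the zero-extended
   constant 0xff can match ...  */
int eq_masked_zext(signed char sc) { return (sc & 0xff) == 0xff; }

/* ... while the sign-extended constant -1 can never match.  */
int eq_masked_sext(signed char sc) { return (sc & 0xff) == -1; }

void check_extension_compares(void)
{
  assert(eq_unmasked(-1) == 0);
  assert(eq_masked_zext(-1) == 1);
  assert(eq_masked_sext(-1) == 0);
}
```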
|
|
Check that BIT_FIELD_REFs of DECLs are in range before deciding they
don't trap.
Check that a replacement bitfield load is as trapping as the replaced
load.
for gcc/ChangeLog
PR tree-optimization/118514
* tree-eh.cc (bit_field_ref_in_bounds_p): New.
(tree_could_trap_p) <BIT_FIELD_REF>: Call it.
* gimple-fold.cc (make_bit_field_load): Check trapping status
of replacement load against original load.
for gcc/testsuite/ChangeLog
PR tree-optimization/118514
* gcc.dg/field-merge-23.c: New.
|
|
rtl-ssa uses degenerate phis to maintain an RPO list of
accesses in which every use is of the RPO-previous definition.
Thus, if it finds that a phi is always equal to a particular
value V, it sometimes needs to keep the phi and make V the
single input, rather than replace all uses of the phi with V.
The code to do that rerouted the phi's first input to the single
value V. But as this PR shows, it failed to unlink the uses of
the other inputs.
The specific problem in the PR was that we had:
x = PHI<x(a), V(b)>
The code replaced the first input with V and removed the second
input from the phi, but it didn't unlink the use of V associated
with that second input.
gcc/
PR rtl-optimization/118562
* rtl-ssa/blocks.cc (function_info::replace_phi): When converting
to a degenerate phi, make sure to remove all uses of the previous
inputs.
gcc/testsuite/
PR rtl-optimization/118562
* gcc.dg/torture/pr118562.c: New test.
|
|
The fold_builtin_frexp folding for NaN/Inf just returned the first argument
while evaluating the second argument's side-effects, rather than storing something
to what the second argument points to.
The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so not storing there
anything can result in UB if the value isn't initialized and is read later.
glibc and newlib store there 0, musl apparently doesn't store anything.
The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
arg = builtin_save_expr (arg);
return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.
* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.
|
|
Most baremetal toolchains will not have an implementation for alarm and
sigaction as they are target specific.
For arm-none-eabi with newlib, function signatures are exposed, but
there is no implementation and thus the test cases cause an undefined
symbol link error.
gcc/testsuite/ChangeLog:
* gcc.dg/pr78185.c: Remove dg-do and replace
with dg-require-effective-target of signal and alarm.
* gcc.dg/pr116906-1.c: Likewise.
* gcc.dg/pr116906-2.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* lib/target-supports.exp(check_effective_target_alarm): New.
gcc/ChangeLog:
* doc/sourcebuild.texi (Effective-Target Keywords): Document
'alarm'.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
There are calls to dr_misalignment left that do not correct for the
offset (which is vector type dependent) when the stride is negative.
Notably vect_known_alignment_in_bytes doesn't allow to pass through
such offset which the following adds (computing the offset in
vect_known_alignment_in_bytes would be possible as well, but the
offset can be shared as seen). Eventually this function could go away.
This leads to peeling for gaps not considered, nor shortening of the
access applied which is what fixes the testcase on x86_64.
PR tree-optimization/118558
* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
through offset to dr_misalignment.
* tree-vect-stmts.cc (get_group_load_store_type): Compute
offset applied for negative stride and use it when querying
alignment of accesses.
(vectorizable_load): Likewise.
* gcc.dg/vect/pr118558.c: New testcase.
|
|
This prevents the gcc driver erroneously accepting -nostdlib++ when it
should not when Ada was enabled.
Also, similarly, -nostdinc* (where * is nonempty) is unhandled by either
the Ada or D compiler, so the spec should not substitute those
either (thanks for pointing that out, Jakub).
Brought to my attention by Michał Górny <mgorny@gentoo.org>.
gcc/ada/ChangeLog:
* gcc-interface/lang-specs.h: Replace %{nostdinc*} %{nostdlib*}
with %{nostdinc} %{nostdlib}.
gcc/d/ChangeLog:
* lang-specs.h: Replace %{nostdinc*} with %{nostdinc}.
gcc/testsuite/ChangeLog:
* gcc.dg/driver-nostdlibstar.c: New test.
|
|
This improves this pattern in two ways:
* Allow for an optional convert, similar to how the few other
`a OP ~a` patterns also allow for an optional convert.
* Use bitwise_inverted_equal_p/maybe_bit_not instead of directly
matching bit_not. Just like the other patterns do too.
Note pr118483-2.c used to be optimized for aarch64-linux-gnu with GCC 4.9.4
on the RTL level even though the gimple level was missing it.
PR tree-optimization/118483
gcc/ChangeLog:
* match.pd (`x ==/!= ~x`): Allow for an optional convert
and use bitwise_inverted_equal_p/maybe_bit_not instead of
directly matching bit_not.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr118483-1.c: New test.
* gcc.dg/tree-ssa/pr118483-2.c: New test.
* gcc.dg/tree-ssa/pr118483-3.c: New test.
* gcc.dg/tree-ssa/pr118483-4.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
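As a source-level illustration of why `x ==/!= ~x` folds to a constant: every bit of ~x differs from the corresponding bit of x, so the two can never be equal, and a same-width conversion does not change that. The function names below are hypothetical and the exact forms used in the new tests are not shown here.

```c
#include <assert.h>
#include <limits.h>

/* Always 0: x and ~x differ in every bit.  */
int eq_bitnot(unsigned x) { return x == ~x; }

/* Always 1, for the same reason.  */
int ne_bitnot(unsigned x) { return x != ~x; }

/* Same comparison with a conversion on both operands.  */
int eq_bitnot_conv(int x) { return (unsigned) x == (unsigned) ~x; }

void check_bitnot_compares(void)
{
  const unsigned samples[] = { 0u, 1u, 0x55u, 0x8000u, UINT_MAX };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      assert(eq_bitnot(samples[i]) == 0);
      assert(ne_bitnot(samples[i]) == 1);
    }
  assert(eq_bitnot_conv(0) == 0);
  assert(eq_bitnot_conv(-1) == 0);
}
```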
|
|
Test case is valid even if size of int is more than 32 bits.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr117546.c: Require effective target int32plus.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
The fix for this PR has been committed without a testcase.
The following testcase would take at least 15 minutes to compile
on a fast machine (powerpc64-linux both -m32 or -m64), now it takes
100ms.
2025-01-21 Jakub Jelinek <jakub@redhat.com>
PR target/118560
* gcc.dg/dfp/pr118560.c: New test.
|
|
[PR118211]: update 'gcc.dg/vect/vect-switch-search-line-fast.c' for GCN
PR tree-optimization/118211
PR tree-optimization/116126
gcc/testsuite/
* gcc.dg/vect/vect-switch-search-line-fast.c: Update for GCN.
|
|
The following amends the previous fix to mark all of the loop BBs
as needing to be scanned for new LC PHI uses when its nesting parents
changed, noticing one caller of fix_loop_placement was already
doing that. So the following moves this code into fix_loop_placement,
covering both callers now.
PR tree-optimization/118569
* cfgloopmanip.cc (fix_loop_placement): When the loops
nesting parents changed, mark all blocks to be scanned
for LC PHI uses.
(fix_bb_placements): Remove code moved into fix_loop_placement.
* gcc.dg/torture/pr118569.c: New testcase.
|
|
The test uses floats, not fp16 so it should use arm_v8_3a_complex_neon
instead of arm_v8_3a_fp16_complex_neon.
This makes it PASS on arm-linux-gnueabihf instead of being UNRESOLVED.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Use
arm_v8_3a_complex_neon.
|
|
These two testcases have twice the same dg-add-options
arm_v8_3a_complex_neon, the patch removes one of them.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/complex/complex-operations-run.c: Remove duplicate
dg-add-options arm_v8_3a_complex_neon.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Likewise.
|
|
When unrolling changes the nesting relationship of loops we fail to
mark blocks as needing to be scanned for the LC SSA update. Specifically
the LC SSA PHI on a former inner loop exit might be misplaced
if that loop becomes a sibling of its outer loop.
PR tree-optimization/118552
* cfgloopmanip.cc (fix_loop_placement): Properly mark
exit source blocks as to be scanned for LC SSA update when
the loops nesting relationship changed.
(fix_loop_placements): Adjust.
(fix_bb_placements): Likewise.
* gcc.dg/torture/pr118552.c: New testcase.
|
|
As reported by Dimitar, this should have been a multiplication, but wasn't
caught because in the test (~(__SIZE_TYPE__) 0) / 2 is the largest accepted
size and so adding 3 to it also resulted in "overflow".
The following patch adds one subtest to really verify it is a multiplication
and fixes the operation.
2025-01-20 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118224
* tree-ssa-dce.cc (is_removable_allocation_p): Multiply a1 by a2
instead of adding it.
* gcc.dg/pr118224.c: New test.
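Why the original test could not tell the two operations apart can be modelled as follows. This is a simplified sketch with hypothetical names (LIMIT, too_large_sum, too_large_product); it uses the GCC overflow builtins and is not the code in is_removable_allocation_p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest accepted size, as in the test: (~(__SIZE_TYPE__) 0) / 2.  */
#define LIMIT (SIZE_MAX / 2)

/* The buggy variant: adds the two arguments.  */
int too_large_sum(size_t a1, size_t a2)
{
  size_t total;
  return __builtin_add_overflow(a1, a2, &total) || total > LIMIT;
}

/* The intended variant: multiplies them.  */
int too_large_product(size_t a1, size_t a2)
{
  size_t total;
  return __builtin_mul_overflow(a1, a2, &total) || total > LIMIT;
}

void check_size_limit(void)
{
  /* Both variants reject LIMIT and 3, so this case hides the bug.  */
  assert(too_large_sum(LIMIT, 3) && too_large_product(LIMIT, 3));
  /* Only the product rejects LIMIT / 2 and 3, so this case exposes it.  */
  assert(!too_large_sum(LIMIT / 2, 3));
  assert(too_large_product(LIMIT / 2, 3));
}
```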
|
|
This test fails on AVR.
Debugging the test on x86 host, I noticed that u in function s sometimes
has value 16128. The "t <= 3 * u" expression in the same function
results in signed integer overflow for targets with sizeof(int)=2.
Fix by requiring int32 effective target.
Also add return statement for the main function.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr117546.c: Require effective target int32.
(main): Add return statement.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
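The arithmetic behind the failure can be checked directly. The helper names below are hypothetical; the product is evaluated in long, which is at least 32 bits wide, so the sketch itself does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Does 3 * u fit in a 16-bit int (sizeof(int) == 2)?  */
int fits_int16(long u) { return 3 * u <= INT16_MAX; }

/* Does 3 * u fit in a 32-bit int?  */
int fits_int32(long u) { return 3 * u <= INT32_MAX; }

void check_three_times_u(void)
{
  /* 3 * 16128 is 48384, which exceeds 32767 but is far below 2^31 - 1.  */
  assert(!fits_int16(16128));
  assert(fits_int32(16128));
}
```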
|
|
symtab_node::get_dump_name uses node order to identify nodes.
Order is no longer unique because of Incremental LTO patches.
This patch moves uid from cgraph_node node to symtab_node,
so get_dump_name can use uid instead and get back unique dump names.
In inlining passes, uid is replaced with more appropriate (more compact
for indexing) summary id.
Bootstrapped/regtested on x86_64-linux.
Ok for trunk?
gcc/ChangeLog:
* cgraph.cc (symbol_table::create_empty):
Move uid to symtab_node.
(test_symbol_table_test): Change expected dump id.
* cgraph.h (struct cgraph_node):
Move uid to symtab_node.
(symbol_table::register_symbol): Likewise.
* dumpfile.cc (test_capture_of_dump_calls):
Change expected dump id.
* ipa-inline.cc (update_caller_keys):
Use summary id instead of uid.
(update_callee_keys): Likewise.
* symtab.cc (symtab_node::get_dump_name):
Use uid instead of order.
gcc/testsuite/ChangeLog:
* gcc.dg/live-patching-1.c: Change expected dump id.
* gcc.dg/live-patching-4.c: Likewise.
|
|
The last case of this optimization assumes that if 2 integral types
have same precision and TYPE_UNSIGNED, then they are uselessly convertible.
While that is very likely the case for GIMPLE, it is not the case for
GENERIC, so the following patch adds there a convert so that the
optimization produces also valid GENERIC. Without it we got
(int) p == b where b had _BitInt(32) type, so incompatible types.
2025-01-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118522
* match.pd ((FTYPE) N CMP (FTYPE) M): Add convert, as in GENERIC
integral types with the same precision and sign might actually not
be compatible types.
* gcc.dg/bitint-120.c: New test.
|
|
The following makes niter analysis recognize a loop with an exit
condition scanning over a STRING_CST. This is done by enhancing
the force-evaluation code rather than by recognizing, for example,
strlen (s) as the number of iterations, because that allows handling
some more cases.
STRING_CSTs are easy to handle since nothing can write to them, also
processing those should be cheap. I've refrained from handling
anything besides char8_t.
Note to avoid the -Warray-bounds diagnostic we have to either early unroll
the loop (there's no final value replacement done, there's a PR
for doing this as part of CD-DCE when possibly eliding a loop),
or create a canonical IV so we can DCE the loads. The latter is what
the patch does, also avoiding to repeatedly force-evaluate niters.
This also makes final value replacement work again since now ivcanon
is after it.
There are some testsuite adjustments needed, in particular we now
unroll some loops early, causing messages to appear in different
passes and vectorization to no longer happen on
outer loops. The changes mitigate that.
PR tree-optimization/92539
* tree-ssa-loop-ivcanon.cc (tree_unroll_loops_completely_1):
Also try force-evaluation if ivcanon did not yet run.
(canonicalize_loop_induction_variables):
When niter was computed constant by force evaluation add a
canonical IV if we didn't unroll.
* tree-ssa-loop-niter.cc (loop_niter_by_eval): When we
don't find a proper PHI try if the exit condition scans
over a STRING_CST and simulate that.
* g++.dg/warn/Warray-bounds-pr92539.C: New testcase.
* gcc.dg/tree-ssa/sccp-16.c: New testcase.
* g++.dg/vect/pr87621.cc: Use larger power to avoid
inner loop unrolling.
* gcc.dg/vect/pr89440.c: Use larger loop bound to avoid
inner loop unrolling.
* gcc.dg/pr77975.c: Scan cunrolli dump and adjust.
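The loop shape in question looks roughly like the sketch below. The function name is hypothetical and this is not one of the new testcases; whether a given GCC build actually computes the iteration count for it depends on the version and options used.

```c
#include <assert.h>

/* The exit condition scans over a string constant; nothing can write to
   the constant, so the number of iterations is fixed at compile time.  */
int scan_string_constant(void)
{
  int n = 0;
  for (const char *p = "hello"; *p; ++p)
    n++;
  return n;
}

void check_scan_string_constant(void)
{
  assert(scan_string_constant() == 5);
}
```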
|
|
A few more dfp tests that recently got backported to gcc-14 override
dfp.exp's selection of default action depending on dfprt. Let the
default stand.
for gcc/testsuite/ChangeLog
* gcc.dg/dfp/pr102674.c: Use the default dg-do.
* gcc.dg/dfp/pr43374.c: Likewise.
|
|
dfp.exp sets the default to compile when dfprt is not available, but
some dfp bitint tests override the default without that requirement,
and try to run even when dfprt is not available.
Instead of overriding the default, rewrite the requirements so that
they apply even when compiling, since the absence of bitint or of
int128 would presumably cause compile failures.
for gcc/testsuite/ChangeLog
* gcc.dg/dfp/bitint-1.c: Rewrite requirements to retain dfprt.
* gcc.dg/dfp/bitint-2.c: Likewise.
* gcc.dg/dfp/bitint-3.c: Likewise.
* gcc.dg/dfp/bitint-4.c: Likewise.
* gcc.dg/dfp/bitint-5.c: Likewise.
* gcc.dg/dfp/bitint-6.c: Likewise.
* gcc.dg/dfp/bitint-7.c: Likewise.
* gcc.dg/dfp/bitint-8.c: Likewise.
* gcc.dg/dfp/int128-1.c: Likewise.
* gcc.dg/dfp/int128-2.c: Likewise.
* gcc.dg/dfp/int128-3.c: Likewise.
* gcc.dg/dfp/int128-4.c: Likewise.
|
|
Additional shared C/C++ testcases are included in a subsequent patch in this
series.
gcc/c-family/ChangeLog
PR middle-end/112779
PR middle-end/113904
* c-common.h (enum c_omp_directive_kind): Add C_OMP_DIR_META.
(c_omp_expand_variant_construct): Declare.
* c-gimplify.cc: Include omp-general.h.
(genericize_omp_metadirective_stmt): New.
(c_genericize_control_stmt): Add case for OMP_METADIRECTIVE.
* c-omp.cc (c_omp_directives): Fix entries for metadirective.
(c_omp_expand_variant_construct_r): New.
(c_omp_expand_variant_construct): New.
* c-pragma.cc (omp_pragmas): Add metadirective.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_METADIRECTIVE.
gcc/c/ChangeLog
PR middle-end/112779
PR middle-end/113904
* c-parser.cc (struct c_parser): Add omp_metadirective_state field.
(c_parser_skip_to_end_of_block_or_statement): Add metadirective_p
parameter and handle skipping over the parentheses in a "for"
statement.
(struct omp_metadirective_parse_data): New.
(mangle_metadirective_region_label): New.
(c_parser_label): Mangle label names in a metadirective body.
(c_parser_statement_after_labels): Likewise.
(c_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.
(c_parser_omp_context_selector): Allow arbitrary expressions in
device_num and condition properties.
(c_parser_omp_assumption_clauses): Handle C_OMP_DIR_META.
(analyze_metadirective_body): New.
(c_parser_omp_metadirective): New.
gcc/testsuite/
PR middle-end/112779
* c-c++-common/gomp/declare-variant-2.c: Adjust expected output for C.
* gcc.dg/gomp/metadirective-1.c: New.
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Sandra Loosemore <sandra@codesourcery.com>
|
|
[PR118451]
When this test was added initially it didn't add the early break effective
target tests.
This means that the test was "passing" (as in, it was failing to vectorize)
because many targets don't support early break.
But the test should not have been run for these targets. When the vectorizer
learned PFA the test started passing for 32-bit targets. I had adjusted the
testcase but failed to notice the requirements were wrong.
Thus this adds the extra guards, and on targets that don't support early break
this test will move to UNSUPPORTED, which is what it should have been all
along...
gcc/testsuite/ChangeLog:
PR testsuite/118451
* gcc.dg/vect/vect-switch-search-line-fast.c: Add early_break guards.
|
|
As reported in PR118487, it is possible that the mask parameter
of a __builtin_shuffle() is not a VECTOR_CST.
If this is the case and checking is enabled then an ICE is triggered.
Let's add a check to fix this issue.
PR tree-optimization/118487
gcc/ChangeLog:
* tree-ssa-forwprop.cc (recognise_vec_perm_simplify_seq):
Ensure that shuffle masks are VECTOR_CSTs.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr118487.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
When we PHI translate dependent expressions we keep SSA defs in
place of the translated expression in case the expression itself
did not change even though its context did and thus the validity
of ranges associated with it. That eventually leads to simplification
errors given we violate the precondition that used SSA defs fed to
vn_valueize are valid to use (including their associated ranges).
The following makes sure to replace those with new representatives
always, not only when the dependent expression translation changed it.
The fix was originally discovered by Mikael Morin.
PR tree-optimization/115494
* tree-ssa-pre.cc (phi_translate_1): Always generate a
representative for translated dependent expressions.
* gcc.dg/torture/pr115494.c: New testcase.
Co-Authored-By: Mikael Morin <mikael@gcc.gnu.org>
|
|
In this PR we have a missed optimization: we fail to notice that
`1 >> x` and `(1 >> x) ^ 1` can't be equal. There are a few ways of
optimizing this; the easiest and simplest is to simplify `1 >> x` into
just `x == 0`, as those are equivalent (if we ignore out-of-range values for x).
We already have an optimization for `(1 >> X) !=/== 0`, so the only difference
here is that we don't need the `!=/== 0` part to do the transformation.
So this removes the `(1 >> X) !=/== 0` transformation and just adds a simplified
`1 >> x` -> `x == 0` one.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/102705
gcc/ChangeLog:
* match.pd (`(1 >> X) != 0`): Remove pattern.
(`1 >> x`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr105832-2.c: Update testcase.
* gcc.dg/tree-ssa/pr96669-1.c: Likewise.
* gcc.dg/tree-ssa/pr102705-1.c: New test.
* gcc.dg/tree-ssa/pr102705-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
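The equivalence can be checked exhaustively over the in-range shift counts. The sketch below uses hypothetical function names and only models the source-level semantics, not the match.pd pattern.

```c
#include <assert.h>
#include <limits.h>

/* 1 >> x is 1 only when nothing is shifted out, i.e. when x is 0.  */
int shr_one(unsigned x) { return 1 >> x; }
int is_zero(unsigned x) { return x == 0; }

void check_shr_one(void)
{
  /* Shift counts at or above the width of int are undefined, so skip them.  */
  const unsigned width = sizeof(int) * CHAR_BIT;
  for (unsigned x = 0; x < width; x++)
    {
      assert(shr_one(x) == is_zero(x));
      /* The comparison from the PR: the two sides can never be equal.  */
      assert(shr_one(x) != (shr_one(x) ^ 1));
    }
}
```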
|
|
In g:3c32575e5b6370270d38a80a7fa8eaa144e083d0 I made a mistake and incorrectly
replaced the type of the arguments of an expression with the type of the
expression. This is of course wrong.
This reverts that change and I have also double checked the other replacements
and they are fine.
gcc/ChangeLog:
PR middle-end/118472
* fold-const.cc (operand_compare::operand_equal_p): Fix incorrect
replacement.
gcc/testsuite/ChangeLog:
PR middle-end/118472
* gcc.dg/pr118472.c: New test.
|
|
Other spots in cgraphunit.cc already call bitmap_obstack_initialize (NULL);
before running a pass list and bitmap_obstack_release (NULL); after that,
while process_new_functions wasn't doing that and with the new r15-130
bitmap_alloc checking that results in ICE.
2025-01-15 Jakub Jelinek <jakub@redhat.com>
PR ipa/116068
* cgraphunit.cc (symbol_table::process_new_functions): Call
bitmap_obstack_initialize (NULL); and bitmap_obstack_release (NULL)
around processing the functions.
* gcc.dg/graphite/pr116068.c: New test.
|
|
Add logic to check and extend constants compared with bitfields, so
that fields are only compared with constants they could actually
equal. This involves making sure the signedness doesn't change
between loads and conversions before shifts: we'd need to carry a lot
more data to deal with all the possibilities.
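As a rough illustration of "constants they could actually equal", the sketch below tests whether a constant is representable in a bit-field of a given width and signedness. The helper field_can_equal is hypothetical and is not the code in gimple-fold.cc; it assumes a width between 1 and 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a bit-field of WIDTH bits (1..32 assumed)
   with the given signedness ever hold the value CST?  */
static bool field_can_equal(int64_t cst, unsigned width, bool is_signed)
{
    if (is_signed) {
        int64_t max = ((int64_t)1 << (width - 1)) - 1;
        int64_t min = -max - 1;
        return cst >= min && cst <= max;
    }
    /* Unsigned fields hold 0 .. 2^WIDTH - 1. */
    return cst >= 0 && cst <= ((int64_t)1 << width) - 1;
}
```

A comparison of a 3-bit unsigned field with 8, for example, can never be true, so clipping 8 to 3 bits and comparing with 0 would give a wrong answer.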
for gcc/ChangeLog
PR tree-optimization/118456
* gimple-fold.cc (decode_field_reference): Punt if shifting
after changing signedness.
(fold_truth_andor_for_ifcombine): Check extension bits in
constants before clipping.
for gcc/testsuite/ChangeLog
PR tree-optimization/118456
* gcc.dg/field-merge-21.c: New.
* gcc.dg/field-merge-22.c: New.
|
|
In PR118140 we simplify
_ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);
to 1:
Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1
when _46 == 1. This happens by removing the conditional and applying
a | 1 = 1. Normally we re-introduce the conditional and its else value
if needed but that does not happen here as we're not dealing with a
vector type. For correctness's sake, we must not remove the conditional
even for non-vector types.
This patch re-introduces a COND_EXPR in such cases. For PR118140 this
results in a non-vectorized loop.
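A scalar model makes the problem visible. The sketch assumes .COND_IOR (cond, a, b, else) means cond ? (a | b) : else on boolean values; the function names are hypothetical and do not correspond to gimple-match-exports.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed scalar meaning of .COND_IOR (cond, a, b, else_val). */
static bool cond_ior(bool cond, bool a, bool b, bool else_val)
{
    return cond ? (a | b) : else_val;
}

/* Unsafe fold for b == 1: drops the conditional and applies a | 1 = 1. */
static bool folded_without_cond(bool cond, bool a, bool else_val)
{
    (void)cond; (void)a; (void)else_val;
    return true;
}

/* Fold that keeps the conditional and its else value around the result. */
static bool folded_with_cond(bool cond, bool a, bool else_val)
{
    (void)a;
    return cond ? true : else_val;
}

/* The conditional fold must match the original for every input with b == 1. */
static void check_folds(void)
{
    for (int cond = 0; cond <= 1; cond++)
        for (int a = 0; a <= 1; a++)
            for (int e = 0; e <= 1; e++)
                assert(folded_with_cond(cond, a, e) == cond_ior(cond, a, true, e));
}
```

With cond false and an else value of false, the original yields false while the unconditional fold yields true, which is the kind of mismatch the commit describes.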
PR middle-end/118140
gcc/ChangeLog:
* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
COND_EXPR when we simplified to a scalar gimple value but still
have an else value.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr118140.c: New test.
* gcc.target/riscv/rvv/autovec/pr118140.c: New test.
|
|
PR c/116871 notes that our diagnostics about incompatible function types
could be improved.
In particular, for the case of migrating to C23 I'm seeing a lot of
build failures with signal handlers similar to this (simplified from
alsa-tools-1.2.11, envy24control/profiles.c; see rhbz#2336278):
typedef void (*__sighandler_t) (int);
extern __sighandler_t signal (int __sig, __sighandler_t __handler)
__attribute__ ((__nothrow__ , __leaf__));
void new_process(void)
{
void (*int_stat)();
int_stat = signal(2, ((__sighandler_t) 1));
signal(2, int_stat);
}
Before this patch, cc1 fails with this message:
t.c: In function 'new_process':
t.c:18:12: error: assignment to 'void (*)(void)' from incompatible pointer type '__sighandler_t' {aka 'void (*)(int)'} [-Wincompatible-pointer-types]
18 | int_stat = signal(2, ((__sighandler_t) 1));
| ^
t.c:20:13: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
20 | signal(2, int_stat);
| ^~~~~~~~
| |
| void (*)(void)
t.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'void (*)(void)'
11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
| ~~~~~~~~~~~~~~~^~~~~~~~~
With this patch, cc1 emits:
t.c: In function 'new_process':
t.c:18:12: error: assignment to 'void (*)(void)' from incompatible pointer type '__sighandler_t' {aka 'void (*)(int)'} [-Wincompatible-pointer-types]
18 | int_stat = signal(2, ((__sighandler_t) 1));
| ^
t.c:9:16: note: '__sighandler_t' declared here
9 | typedef void (*__sighandler_t) (int);
| ^~~~~~~~~~~~~~
t.c:20:13: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
20 | signal(2, int_stat);
| ^~~~~~~~
| |
| void (*)(void)
t.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'void (*)(void)'
11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
| ~~~~~~~~~~~~~~~^~~~~~~~~
t.c:9:16: note: '__sighandler_t' declared here
9 | typedef void (*__sighandler_t) (int);
| ^~~~~~~~~~~~~~
showing the location of the pertinent typedef ("__sighandler_t").
Another example, simplified from a52dec-0.7.4: src/a52dec.c
(rhbz#2336013):
typedef void (*__sighandler_t) (int);
extern __sighandler_t signal (int __sig, __sighandler_t __handler)
__attribute__ ((__nothrow__ , __leaf__));
/* Mismatching return type. */
static RETSIGTYPE signal_handler (int sig)
{
}
static void print_fps (int final)
{
signal (42, signal_handler);
}
Before this patch, cc1 emits:
t2.c: In function 'print_fps':
t2.c:22:15: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
22 | signal (42, signal_handler);
| ^~~~~~~~~~~~~~
| |
| int (*)(int)
t2.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'int (*)(int)'
11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
| ~~~~~~~~~~~~~~~^~~~~~~~~
With this patch cc1 emits:
t2.c: In function 'print_fps':
t2.c:22:15: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
22 | signal (42, signal_handler);
| ^~~~~~~~~~~~~~
| |
| int (*)(int)
t2.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'int (*)(int)'
11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
| ~~~~~~~~~~~~~~~^~~~~~~~~
t2.c:16:19: note: 'signal_handler' declared here
16 | static RETSIGTYPE signal_handler (int sig)
| ^~~~~~~~~~~~~~
t2.c:9:16: note: '__sighandler_t' declared here
9 | typedef void (*__sighandler_t) (int);
| ^~~~~~~~~~~~~~
showing the location of the pertinent fndecl ("signal_handler"), and,
as before, the pertinent typedef.
The patch also updates the colorization in the messages to visually
link and contrast the different types and typedefs.
My hope is that this makes it easier for users to decipher build failures
seen with the new C23 default.
Further improvements could be made to colorization in
convert_for_assignment, and similar improvements to C++, but I'm punting
those to GCC 16.
gcc/c/ChangeLog:
PR c/116871
* c-typeck.cc (pedwarn_permerror_init): Return bool for whether a
warning was emitted. Only call print_spelling if we warned.
(pedwarn_init): Return bool for whether a warning was emitted.
(permerror_init): Likewise.
(warning_init): Return bool for whether a
warning was emitted. Only call print_spelling if we warned.
(class pp_element_quoted_decl): New.
(maybe_inform_typedef_location): New.
(convert_for_assignment): For OPT_Wincompatible_pointer_types,
move auto_diagnostic_group to cover all cases. Use %e and
pp_element rather than %qT and tree to colorize the types.
Capture whether a warning was emitted, and, if it was,
show various notes: for a pointer to a function, show the
function decl; for typedef types, show the decls.
gcc/testsuite/ChangeLog:
PR c/116871
* gcc.dg/c23-mismatching-fn-ptr-a52dec.c: New test.
* gcc.dg/c23-mismatching-fn-ptr-alsatools.c: New test.
* gcc.dg/c23-mismatching-fn-ptr.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
If a single-bit bitfield takes up the sign bit of a storage unit,
comparing the corresponding bitfield between two objects loads the
storage units, XORs them, converts the result to signed char, and
compares it with zero: ((signed char)(a.<byte> ^ c.<byte>) >= 0).
fold_truth_andor_for_ifcombine recognizes the compare with zero as a
sign bit test, then it decomposes the XOR into an equality test.
The problem is that, after this decomposition, which figures out the
width of the accessed fields, we apply the sign bit mask to the
left-hand operand of the compare, but we failed to also apply it to
the right-hand operand when both were taken from the same XOR.
This patch fixes that.
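The following sketch models the situation with one storage byte per object and the single-bit field in bit 7. All names are hypothetical, and the cast to signed char relies on GCC's wrapping conversion, which is implementation-defined in ISO C. The third function is only an illustration of what masking one operand would look like, not the actual code in gimple-fold.cc.

```c
#include <assert.h>

/* Comparison as described above: XOR the bytes, test the sign bit. */
static int same_field_via_xor(unsigned char a, unsigned char c)
{
    return (signed char)(a ^ c) >= 0;
}

/* Equality test with the sign bit mask applied to both operands. */
static int same_field_masked_both(unsigned char a, unsigned char c)
{
    return (a & 0x80) == (c & 0x80);
}

/* Illustration of the failure mode: mask applied to the left operand only. */
static int same_field_masked_left_only(unsigned char a, unsigned char c)
{
    return (a & 0x80) == c;
}

/* The fully masked form must agree with the XOR form for all byte pairs. */
static void check_all_bytes(void)
{
    for (int a = 0; a < 256; a++)
        for (int c = 0; c < 256; c++)
            assert(same_field_via_xor((unsigned char)a, (unsigned char)c)
                   == same_field_masked_both((unsigned char)a, (unsigned char)c));
}
```

For a = 0x80 and c = 0x81 the field bits match, so the XOR form and the fully masked form are true, while the left-only form is false because the other bits of c leak into the comparison.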
for gcc/ChangeLog
PR tree-optimization/118409
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Apply the
signbit mask to the right-hand XOR operand too.
for gcc/testsuite/ChangeLog
PR tree-optimization/118409
* gcc.dg/field-merge-20.c: New.
|
|
Here's another fix for a missing check that an IV value fits in a
HWI. It's originally from Stefan.
PR tree-optimization/117119
* tree-data-ref.cc (initialize_matrix_A): Check whether
an INTEGER_CST fits in HWI, otherwise return chrec_dont_know.
* gcc.dg/torture/pr117119.c: New testcase.
Co-Authored-By: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
|
|
Hi!
As mentioned in the second PR, using table names like
crc_table_for_crc_8_polynomial_0x12
in the user namespace is wrong: a user could have defined such variables
in their code, and as can be seen in the last testcase, it then just
misbehaves.
At minimum such names should start with 2 underscores, moving them into
the implementation namespace, and if possible have some dot or dollar in the
name if the target supports it.
I think assemble_crc_table right now always emits tables as local variables,
I really don't see what would be setting TREE_PUBLIC flag on
IDENTIFIER_NODEs.
It might be nice to share the tables between TUs in the same binary or
shared library, but in that case they should have hidden visibility if
possible, so that they aren't exported from the libraries or binaries; we don't
want the optimization to affect the set of exported symbols from libraries.
And, as can be seen in the first PR, building gen_rtx_SYMBOL_REF by hand
is certainly unexpected on some targets, e.g. those which use
-fsection-anchors, so we should instead use DECL_RTL of the VAR_DECL.
For that we'd need to look it up if we haven't emitted it already, while
IDENTIFIER_NODEs can be looked up easily, I guess for the VAR_DECLs we'd
need custom hash table.
Now, all of the above (except sharing between multiple TUs) is already
implemented in output_constant_def, so I think it is much better to just
use that function.
And, if we want to share it between multiple TUs, we could extend the
SHF_MERGE usage in gcc, currently we only use it for constant pool
entries with same size as alignment, from 1 to 32 bytes, using .rodata.cstN
sections. We could just use say .rodata.cstM.N sections where M would be
alignment and N would be the entity size. We could use that for all
constant pool entries say up to 2048 bytes.
Though, as the current code doesn't share between multiple TUs, I think it
can be done incrementally (either still for GCC 15, or GCC 16+).
Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, on
aarch64 it also fixes
-FAIL: crypto/rsa
-FAIL: hash
ok for trunk?
gcc/
PR tree-optimization/117997
PR middle-end/118415
* expr.cc (assemble_crc_table): Make static, remove id argument,
use output_constant_def. Emit note if -fdump-rtl-expand-details
about which table has been emitted.
(generate_crc_table): Make static, adjust assemble_crc_table
caller, call it always.
(calculate_table_based_CRC): Make static.
* internal-fn.cc (expand_crc_optab_fn): Emit note if
-fdump-rtl-expand-details about using optab for crc. Formatting fix.
gcc/testsuite/
* gcc.dg/crc-builtin-target32.c: Add -fdump-rtl-expand-details
as dg-additional-options. Scan expand dump rather than assembly,
adjust the regexps.
* gcc.dg/crc-builtin-target64.c: Likewise.
* gcc.dg/crc-builtin-rev-target32.c: Likewise.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
* gcc.dg/pr117997.c: New test.
* gcc.dg/pr118415.c: New test.
|