|
Given ~[0,0] = op1 * op2, range-ops should determine that neither op1 nor
op2 is zero. Add this to the operator_mult for op1_range. op2_range
simply invokes op1_range, so both will be covered.
PR tree-optimization/110992
PR tree-optimization/119471
gcc/
* range-op.cc (operator_mult::op1_range): If the LHS does not
contain zero, return non-zero.
gcc/testsuite/
* gcc.dg/pr110992.c: New.
* gcc.dg/pr119471.c: New.
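
The property being taught to range-ops can be illustrated from the source side. The following is only a sketch, not one of the new testcases; the function name is hypothetical, and unsigned operands are an assumption made here to keep the multiplication well defined:

```c
/* Hypothetical illustration, not taken from pr110992.c or pr119471.c.
   When the product is known to be nonzero, neither factor can be zero,
   so the inner test always evaluates to 1.  */
int
factors_nonzero_when_product_nonzero (unsigned int op1, unsigned int op2)
{
  if (op1 * op2 != 0)
    return op1 != 0 && op2 != 0;
  /* Nothing can be concluded when the product is zero.  */
  return 1;
}
```

With the op1_range change, a compiler could in principle reduce the inner comparison to a constant, since ~[0,0] on the LHS implies ~[0,0] for both operands.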
|
|
Some targets (like arm) need some flags to enable _Float16 support.
gcc/testsuite/ChangeLog:
PR target/119133
* gcc.dg/torture/pr119133.c: Add options for float16.
|
|
I have a handful more of these left but those introduce FAILs, while
these all introduce new PASSes.
libstdc++-v3/ChangeLog:
* testsuite/std/format/string_neg.cc: Add missing brace for dg-error.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fd-datagram-socket.c: Fix 'dg-message' spelling.
* gcc.dg/analyzer/out-of-bounds-zero.c: Fix whitespace in 'dg-additional-options'.
* gcc.dg/analyzer/strchr-1.c: Fix 'dg-message' whitespace.
* gnat.dg/sso/q11.adb: Fix 'dg-output' whitespace.
|
|
This fixes some 'scan-tree-dump-times' (vs '-time') typos and one or
two others I noticed in passing.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winvalid-memory-model.C: Fix typo in comment.
* gcc.dg/builtin-dynamic-object-size-19.c: Ditto.
* gcc.dg/builtin-object-size-19.c: Ditto.
* gcc.dg/strlenopt-40.c: Ditto.
* gcc.dg/strlenopt-44.c: Ditto.
* gcc.dg/strlenopt-45.c: Ditto.
* gcc.dg/strlenopt-50.c: Ditto.
* gcc.dg/strlenopt-51.c: Ditto.
* gcc.dg/strlenopt-52.c: Ditto.
* gcc.dg/strlenopt-53.c: Ditto.
* gcc.dg/strlenopt-54.c: Ditto.
* gcc.dg/strlenopt-55.c: Ditto.
* gcc.dg/strlenopt-58.c: Ditto.
* gcc.dg/strlenopt-59.c: Ditto.
* gcc.dg/strlenopt-62.c: Ditto.
* gcc.dg/strlenopt-65.c: Ditto.
* gcc.dg/strlenopt-70.c: Ditto.
* gcc.dg/strlenopt-72.c: Ditto.
* gcc.dg/strlenopt-73.c: Ditto.
* gcc.dg/strlenopt-77.c: Ditto.
* gcc.dg/strlenopt-82.c: Ditto.
* gcc.dg/tree-ssa/builtin-snprintf-4.c: Ditto.
* gcc.dg/tree-ssa/builtin-snprintf-6.c: Ditto.
* gcc.dg/tree-ssa/builtin-snprintf-7.c: Ditto.
* gcc.dg/tree-ssa/builtin-sprintf-10.c: Ditto.
* gcc.dg/tree-ssa/builtin-sprintf-9.c: Ditto.
* gcc.dg/tree-ssa/phi-opt-value-5.c: Ditto.
* lib/multiline.exp: Ditto.
* lib/target-supports.exp: Ditto.
|
|
These just fix inconsistent/unusual style to avoid noise when grepping
and also people picking up bad habits when they see it (as similar
mistakes can be harmful).
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/pr69916.c: Fix unusual whitespace in dg-*.
* g++.old-deja/g++.abi/vtable2.C: Ditto.
* g++.old-deja/g++.bugs/900330_02.C: Ditto.
* g++.old-deja/g++.bugs/900406_02.C: Ditto.
* g++.old-deja/g++.bugs/900519_13.C: Ditto.
* g++.old-deja/g++.mike/p9068.C: Ditto.
* gcc.dg/20040203-1.c: Ditto.
* gcc.dg/980502-1.c: Ditto.
* gcc.dg/ipa/ipa-sra-14.c: Ditto.
* gcc.dg/pr35468.c: Ditto.
* gcc.dg/pr82597.c: Ditto.
* gcc.dg/tree-ssa/phi-opt-7.c: Ditto.
* gfortran.dg/assumed_charlen_in_main.f90: Ditto.
* gfortran.dg/cray_pointers_2.f90: Ditto.
|
|
When we redefine a typedef for a tagged type that has just been
redefined, merge_decls may produce invalid TYPE_DECLS that are not
consistent with what set_underlying_type produces. This is fixed
by updating DECL_ORIGINAL_TYPE.
PR c/118765
gcc/c/ChangeLog:
* c-decl.cc (merge_decls): For TYPE_DECLS copy
DECL_ORIGINAL_TYPE from the old declaration.
* c-typeck.cc (tagged_types_tu_compatible_p): Add
checking assertions.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118765-2.c: New test.
* gcc.dg/pr118765-3.c: New test.
* gcc.dg/typedef-redecl3.c: New test.
|
|
A handful of cosmetic ones in here but most meant the directive wasn't
doing anything.
gcc/testsuite/ChangeLog:
PR target/98743
PR tree-optimization/105820
* g++.dg/cpp0x/udlit-namespace-ambiguous.C: Fix whitespace.
* g++.dg/cpp2a/constexpr-init21.C: Ditto.
* g++.dg/diagnostic/wrong-tag-1.C: Ditto.
* g++.dg/init/self1.C: Ditto.
* g++.dg/opt/pr98743.C: Add missing '}' to terminate dg directive.
* g++.dg/parse/error8.C: Fix whitespace.
* g++.dg/template/explicit-args6.C: Add missing '{' to begin dg directive.
* g++.dg/template/unify9.C: Fix whitespace.
* g++.dg/tree-ssa/pr105820.C: Ditto.
* g++.dg/warn/Wmismatched-tags-8.C: Add missing braces.
* gcc.dg/cpp/cmdlne-dM-M.c: Ditto.
* gcc.dg/tree-ssa/reassoc-32.c: Ditto.
* gcc.dg/tree-ssa/reassoc-33.c: Ditto.
* gcc.dg/tree-ssa/reassoc-34.c: Ditto.
* gcc.dg/tree-ssa/reassoc-35.c: Ditto.
* gcc.dg/tree-ssa/reassoc-36.c: Ditto.
* gcc.dg/tree-ssa/reassoc-39.c: Ditto.
* gcc.dg/tree-ssa/reassoc-41.c: Ditto.
|
|
.C is for C++ testcases and gcc.dg's dg.exp ignores .C files, so the test
was not being run.
gcc/testsuite/ChangeLog:
PR ipa/98265
* gcc.dg/tree-ssa/pr98265.C: Move to...
* g++.dg/tree-ssa/pr98265.C: ...here.
|
|
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/metadirective-target-device-2.c: Fix missing
trailing " }" on dg-do directive.
* gcc.dg/gomp/attrs-21.c: Likewise for dg-options.
* gcc.dg/gomp/parallel-2.c: Drop ":" from dg-message.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Found by dg-lint.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/pr110377.c: Fix missing trailing " }" in dg-do
directive.
* gcc.dg/plugin/infoleak-1.c: Fix dg-bogus directive.
* gcc.dg/pr101364-1.c: Fix missing trailing " }" in dg-options
directive.
* gcc.dg/pr113207.c: Fix dg-do.
* gcc.dg/sarif-output/include-chain-2.c: Fix ordering of dg-do
and dg-require-effective-target.
* gcc.dg/strub-pr118007.c: Likewise.
* gcc.dg/tanhbysinh.c: Fix missing whitespace after opening
brace and before closing brace in 6 dg-final directives.
* gcc.dg/uninit-pred-3_c.c: Fix missing whitespace after opening
brace in 6 dg-final directives.
* gcc.dg/uninit-pred-3_d.c: Likewise.
* gcc.dg/variable-sized-type-flex-array.c: Fix missing space
between dg-bogus and message in 2 places.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The following testcase is miscompiled since r14-8680 PR113560 changes.
I've already tried to fix some of the issues caused by that change in
r14-8823 PR113759, but apparently didn't get it right.
The problem is that the r14-8680 changes sometimes set *type_out to
a narrower type than the *new_rhs_out actually has (because it will
handle stuff like _1 = rhs1 & 0xffff; and imply from that HImode type_out).
Now, if in convert_mult_to_widen or convert_plusminus_to_widen we actually
get optab for the modes we've asked for (i.e. with from_mode and to_mode),
everything works fine, if the operands don't have the expected types,
they are converted to those (for INTEGER_CSTs with fold_convert,
otherwise with build_and_insert_cast).
On the following testcase on aarch64 that is not the case, we ask
for from_mode HImode and to_mode DImode, but get actual_mode SImode.
The mult_rhs1 operand already has SImode and we change type1 to unsigned int
and so no cast is actually done, except that the & 0xffff is lost that way.
The following patch ensures that if we change typeN because of wider
actual_mode (or because of a sign change), we first cast to the old
typeN (if the r14-8680 code was encountered, otherwise it would have the
same precision) and only then change it, and then perhaps cast again.
On the testcase on aarch64-linux the patch results in the expected
- add x19, x19, w0, uxtw 1
+ add x19, x19, w0, uxth 1
difference.
2025-03-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119417
* tree-ssa-math-opts.cc (convert_mult_to_widen): Before changing
typeN because actual_precision/from_unsignedN differs cast rhsN
to typeN if it has a different type.
(convert_plusminus_to_widen): Before changing
typeN because actual_precision/from_unsignedN differs cast mult_rhsN
to typeN if it has a different type.
* gcc.dg/torture/pr119417.c: New test.
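
As a rough source-level picture of the problem, consider the sketch below. It is not the actual pr119417.c testcase; the function name and operand types are assumptions chosen so that a 16-bit mask feeds a widening multiply-add:

```c
/* Hypothetical shape of the affected code, not the real testcase.
   Only the low 16 bits of W may contribute to the result, so the
   "& 0xffff" must survive any widening multiply-add transformation.  */
unsigned long long
add_scaled_low_half (unsigned long long base, unsigned int w)
{
  unsigned int masked = w & 0xffff;
  return base + (unsigned long long) masked * 2;
}
```

If the mask were dropped, as in the uxtw form of the add shown above, the upper 16 bits of w would leak into the sum; the uxth form keeps only the low half.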
|
|
r15-7961-gdc47161c1f32c3 fixes a typo in ao_compare::compare_ao_refs
but there wasn't a testcase available at the time. Now there is.
Thanks to Andrew for the testcase.
gcc/testsuite/ChangeLog:
PR testsuite/119382
* gcc.dg/ipa/ipa-icf-40.c: New test.
Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
If expand_binop_directly fails to add a REG_EQUAL note it tries to
unwind and restart. But it can unwind too far if expand_binop changed
some of the operands before calling it. We don't need to unwind that
far anyway since we should end up taking exactly the same route next
time, just without a target rtx.
To fix this we remove LAST from the argument list and let the callers
(all in expand_binop) do their own unwinding if the call fails.
Instead we unwind just as far as the entry to expand_binop_directly
and recurse within this function instead of all the way back up.
gcc/ChangeLog:
PR middle-end/117811
* optabs.cc (expand_binop_directly): Remove LAST as an argument,
instead record the last insn on entry. Only delete insns if
we need to restart and restart by calling ourself, not expand_binop.
(expand_binop): Update callers to expand_binop_directly. If it
fails to expand the operation, delete back to LAST.
gcc/testsuite:
PR middle-end/117811
* gcc.dg/torture/pr117811.c: New test.
|
|
Here is an updated version of Surya's PR116028 fix from August, which got
reverted because it caused bootstrap failures on aarch64, later also
bootstrap comparison errors there, and problems on other targets.
Original description:
LRA emits insns to save caller-save registers in the
inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
Basic Block) and traverses the insns in the EBBs in reverse order from
the last insn to the first insn. When LRA sees a write to a pseudo (that
has been assigned a caller-save register), and there is a read following
the write, with an intervening call insn between the write and read,
then LRA generates a spill immediately after the write and a restore
immediately before the read. The spill is needed because the call insn
will clobber the caller-save register.
If there is a write insn and a call insn in two separate BBs but
belonging to the same EBB, the spill insn gets generated in the BB
containing the write insn. If the write insn is in the entry BB, then
the spill insn that is generated in the entry BB prevents shrink wrap
from happening. This is because the spill insn references the stack
pointer and hence the prolog gets generated in the entry BB itself.
This patch ensures that the spill insn is generated before the call insn
instead of after the write. This also ensures that the spill occurs
only in the path containing the call.
The changes compared to the first r15-2810 version are:
1) the reason for aarch64 miscompilations and later on bootstrap comparison
issues as can be seen on the pr118615.c testcase in the patch was that
when curr_insn is a JUMP_INSN or some cases of CALL_INSNs,
split_if_necessary is called with before_p true and if it is successful,
the code set use_insn = PREV_INSN (curr_insn); instead of use_insn =
curr_insn; and that use_insn is then what is passed to
add_next_usage_insn; now, if the patch decides to emit the save
instruction(s) before the first call after curr_insn in the ebb rather
than before the JUMP_INSN/CALL_INSN, PREV_INSN (curr_insn) is some random
insn before it, not anything related to the split_reg actions.
If it is e.g. a DEBUG_INSN in one case vs. some unrelated other insn
otherwise, that can affect further split_reg within the same function
2) as suggested by Surya in PR118615, it makes no sense to try to change
behavior if the first call after curr_insn is in the same bb as curr_insn
3) split_reg is actually called sometimes from within inherit_in_ebb but
sometimes from elsewhere; trying to use whatever last call to
inherit_in_ebb saw last is a sure way to run into wrong-code issues,
so instead of clearing the rtx var at the start of inherit_in_ebb it is
now cleared at the end of it
4) calling the var latest_call_insn was weird, inherit_in_ebb walks the ebb
backwards, so what the var contains is the first call insn within the
ebb (after curr_insn)
5) the patch was using
lra_process_new_insns (PREV_INSN (latest_call_insn), NULL, save,
"Add save<-reg");
to emit the save insn before latest_call_insn. That feels quite weird
given that lra_process_new_insns has explicit support for adding stuff
before some insn or after some insn, adding something before some
insn doesn't really need to be done as addition after PREV_INSN
6) some formatting nits + new testcase + removal of xfail even on arm32
Bootstrapped/regtested on x86_64-linux/i686-linux (my usual
--enable-checking=yes,rtl,extra builds), aarch64-linux (normal default
bootstrap) and our distro scratch build
({x86_64,i686,aarch64,powerpc64le,s390x}-linux --enable-checking=release
LTO profiledbootstrap/regtest), I think Sam James tested on 32-bit arm
too.
On aarch64-linux this results in
-FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
I admit I don't know the code well nor understood everything it is doing.
I have some concerns:
1) I wonder if there is a guarantee that first_call_insn if non-NULL will be
always in between curr_insn and usage_insn when call_save_p; I'd hope
yes because if usage_insn is before first_call_insn in the ebb,
presumably it wouldn't need to find call save regs because the range
wouldn't cross any calls
2) I wonder whether it wouldn't be better instead of inserting the saves
before first_call_insn insert it at the start of the bb containing that
call (after labels of course); emitting it right before a call could
mislead code looking for argument slot initialization of the call
3) even when avoiding the use_insn = PREV_INSN (curr_insn);, I wonder
if it is ok to use use_insn equal to curr_insn rather than the insns
far later where we actually inserted it, but primarily because I don't
understand the code much; I think for the !before_p case it is doing
similar thing on a shorter distance, the saves were emitted after
curr_insn and we record it on curr_insn
2025-03-21 Surya Kumari Jangala <jskumari@linux.ibm.com>
Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/116028
PR rtl-optimization/118615
* lra-constraints.cc (first_call_insn): New variable.
(split_reg): Spill register before first_call_insn if call_save_p
and the call is in a different bb in the ebb.
(split_if_necessary): Formatting fix.
(inherit_in_ebb): Set first_call_insn when handling a CALL_INSN.
For successful split_if_necessary with before_p, only change
use_insn if it emitted any new instructions before curr_insn.
Clear first_call_insn before returning.
* gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
* gcc.dg/pr10474.c: Remove xfail for powerpc and arm.
* gcc.dg/pr118615.c: New test.
|
|
diagnostic_context's dtor assumed that it owned the m_urlifier pointer
and would delete it.
As of r15-5988-g5a022062d22e0b this isn't always the case -
auto_urlify_attributes is used in various places in the C/C++ frontends
and in the middle-end to temporarily override the urlifier with an
on-stack instance, which would lead to delete-of-on-stack-buffer crashes
with -Wfatal-errors as the global_dc was cleaned up.
Fix by explicitly tracking the stack of urlifiers within
diagnostic_context, tracking for each node whether the pointer is
owned or borrowed.
gcc/ChangeLog:
PR c/119366
* diagnostic-format-sarif.cc (test_message_with_embedded_link):
Convert diagnostic_context from one urlifier to a stack of
urlifiers, where each node in the stack tracks whether the
urlifier is owned or borrowed.
* diagnostic.cc (diagnostic_context::initialize): Likewise.
(diagnostic_context::finish): Likewise.
(diagnostic_context::set_urlifier): Delete.
(diagnostic_context::push_owned_urlifier): New.
(diagnostic_context::push_borrowed_urlifier): New.
(diagnostic_context::pop_urlifier): New.
(diagnostic_context::get_urlifier): Reimplement in terms of stack.
(diagnostic_context::override_urlifier): Delete.
* diagnostic.h (diagnostic_context::set_urlifier): Delete decl.
(diagnostic_context::override_urlifier): Delete decl.
(diagnostic_context::push_owned_urlifier): New decl.
(diagnostic_context::push_borrowed_urlifier): New decl.
(diagnostic_context::pop_urlifier): New decl.
(diagnostic_context::get_urlifier): Make return value const; hide
implementation.
(diagnostic_context::m_urlifier): Replace with...
(diagnostic_context::urlifier_stack_node): ... this and...
(diagnostic_context::m_urlifier_stack): ...this.
* gcc-urlifier.cc
(auto_override_urlifier::auto_override_urlifier): Reimplement.
(auto_override_urlifier::~auto_override_urlifier): Reimplement.
* gcc-urlifier.h (class auto_override_urlifier): Reimplement.
(auto_urlify_attributes::auto_urlify_attributes): Update for
pass-by-reference.
* gcc.cc (driver::global_initializations): Update for
reimplementation of urlifiers in terms of a stack.
* toplev.cc (general_init): Likewise.
gcc/testsuite/ChangeLog:
PR c/119366
* gcc.dg/Wfatal-bad-attr-pr119366.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
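
The owned-versus-borrowed bookkeeping described above can be sketched in plain C. This is only an illustration of the idea; every name below is a hypothetical stand-in and none of it is GCC's actual C++ implementation:

```c
#include <stdbool.h>
#include <stdlib.h>

/* All names here are hypothetical stand-ins, not GCC declarations.  */
struct urlifier { int id; };

struct urlifier_node
{
  struct urlifier *urlifier;
  bool owned;                 /* true: freed on pop; false: borrowed */
  struct urlifier_node *next;
};

struct context { struct urlifier_node *stack; };

/* Push URLIFIER; return false if allocating the node fails.  */
bool
push_urlifier (struct context *ctx, struct urlifier *urlifier, bool owned)
{
  struct urlifier_node *node = malloc (sizeof *node);
  if (!node)
    return false;
  node->urlifier = urlifier;
  node->owned = owned;
  node->next = ctx->stack;
  ctx->stack = node;
  return true;
}

/* Pop the top node, freeing the urlifier only if the stack owns it.  */
void
pop_urlifier (struct context *ctx)
{
  struct urlifier_node *node = ctx->stack;
  if (!node)
    return;
  ctx->stack = node->next;
  if (node->owned)
    free (node->urlifier);
  free (node);
}

/* The active urlifier is whatever is on top, or NULL if none.  */
const struct urlifier *
get_urlifier (const struct context *ctx)
{
  return ctx->stack ? ctx->stack->urlifier : NULL;
}
```

Because a borrowed entry is never freed by the stack, a temporary override that points at an on-stack object can be pushed and popped without the cleanup path ever calling free on it.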
|
|
Even in C23/C2Y any initialization of flexible array member is still
invalid, so we should emit a pedwarn on it. But we no longer do for
initialization with {}. The reason is that for C17 and earlier,
we already emitted a pedwarn on the {} initializer and so emitting
another pedwarn on the flexible array member initialization would
be diagnosing the same thing multiple times.
In C23 we no longer pedwarn on {}, it is standard.
The following patch arranges a pedwarning for that for C23+, so that
at least one pedwarning is emitted.
So that we don't "regress" from C17 to C23 on nested flexible array
member initialization with no -pedantic/-pedantic-errors/-Wpedantic,
the patch emits even the
initialization of flexible array member in a nested context
diagnostic as pedwarn in the {} case, after all, it doesn't cause
much trouble, we just ignore it like before, it wouldn't initialize
anything.
2025-03-19 Jakub Jelinek <jakub@redhat.com>
PR c/119350
* c-typeck.cc (pop_init_level): Don't ignore empty brackets for
flag_isoc23, still set constructor_type to NULL in that case but
emit a pedwarn_init in that case.
* gcc.dg/pr119350-1.c: New test.
* gcc.dg/pr119350-2.c: New test.
* gcc.dg/pr119350-3.c: New test.
|
|
Then we can also remove the added -std=gnu17
PR testsuite/113634
* gcc.dg/Wfree-nonheap-object-7.c: Adjust calloc and realloc
declarations, remove -std=gnu17.
|
|
Broadly speaking, these tests were failing because the BB limitation for SLP'ing
loads in an || in an early break makes the loads end up in different BBs and so
today we can't SLP them. This results in load_lanes being required to vectorize
them because the alternative is loads with permutes which we don't allow.
The original checks were only checking partial vectors, which ended up working
because e.g. Adv. SIMD isn't a partial vector target, so it failed, and SVE was
a partial vector target but also has load lanes so it passes.
GCN however is a partial vector target without load lanes which makes the tests
fail. As we require load_lanes for now, also check for them.
Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.
Cross checked the failing cases on amdgcn-amdhsa
and all pass now.
gcc/testsuite/ChangeLog:
PR target/119286
* gcc.dg/vect/bb-slp-41.c: Add pragma novector.
* gcc.dg/vect/vect-early-break_133_pfa11.c: Should never vectorize today
as indexes can be out of range.
* gcc.dg/vect/vect-early-break_128.c: Require load_lanes as well.
* gcc.dg/vect/vect-early-break_133_pfa10.c: Likewise.
* gcc.dg/vect/vect-early-break_133_pfa8.c: Likewise.
* gcc.dg/vect/vect-early-break_133_pfa9.c: Likewise.
* gcc.dg/vect/vect-early-break_22.c: Likewise.
* gcc.dg/vect/vect-early-break_26.c: Likewise.
* gcc.dg/vect/vect-early-break_43.c: Likewise.
* gcc.dg/vect/vect-early-break_44.c: Likewise.
* gcc.dg/vect/vect-early-break_6.c: Likewise.
* gcc.dg/vect/vect-early-break_56.c: Expect failures on group misalign.
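
For orientation, the kind of loop being discussed looks roughly like the sketch below: an early exit whose condition combines two loads with ||. It is not copied from any of the listed tests, and the function name and signature are hypothetical:

```c
/* Hypothetical early-break loop; not one of the vect-early-break tests.
   The two loads a[i] and b[i] sit on either side of the "||", which is
   why they can end up in different basic blocks.  */
int
first_exceeding (const int *a, const int *b, int n, int limit)
{
  for (int i = 0; i < n; i++)
    if (a[i] > limit || b[i] > limit)
      return i;
  return -1;
}
```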
|
|
r15-7222 added an empty file gcc.dg/pr not mentioned in the ChangeLog
nor used anywhere in that patch.
Removed as obvious.
2025-03-19 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/pr: Remove.
|
|
When we redefine a tagged type we incorrectly update TYPE_STUB_DECL
of the previously defined type instead of the new one. Because
TYPE_STUB_DECL is used when determining whether two such types are
the same, this can cause valid typedef redefinitions to be rejected
later. This is only a partial fix for PR118765.
PR c/118765
gcc/c/ChangeLog:
* c-decl.cc (finish_struct,finish_enum): Swap direction when
copying TYPE_STUB_DECL in redefinitions.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118765.c: New test.
|
|
Return early when comparing two structures for compatibility
and the type of a member is erroneous.
PR c/118061
gcc/c/ChangeLog:
* c-typeck.cc (tagged_types_tu_compatible_p): Handle
errors in types of struct members.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118061.c: New test.
|
|
The COBOL testsuite has many tests which just emit lots of output
to stdout and want to compare it against expected output.
We have the dg-output directive, but if one needs more than dozens
of lines in the output, adding hundreds of dg-output directives to
each source uses too much memory and is harder to maintain.
The following patch offers an alternative, dg-output-file
directive where one can supply a text file with expected output
(no regexp matching in that case, just exact output, except that it
handles different line ending styles: for the expected file
using tcl gets, for the actual output it skips over \n, \r\n or \r).
And a newline at the end of the whole output is optional (in the actual
output, because I think some boards get it eaten).
Also tested with addition or subtraction of some characters from the
expected output files and saw FAILs with appropriate messages.
2025-03-18 Jakub Jelinek <jakub@redhat.com>
* doc/sourcebuild.texi (dg-output-file): Document.
* lib/gcc-dg.exp (${tool}-load): If output-file is set, compare
combined output against content of the [lindex ${output-file} 1]
file.
(dg-output-file): New directive.
* lib/dg-test-cleanup.exp (cleanup-after-saved-dg-test): Clear
output-file variable.
* gcc.dg/dg-output-file-1.c: New test.
* gcc.dg/dg-output-file-1-lp64.txt: New test.
* gcc.dg/dg-output-file-1-ilp32.txt: New test.
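
The comparison rules described above (exact match, any of \n, \r\n or \r accepted as a line ending, final newline optional in the actual output) can be sketched as follows. The real implementation is Tcl in lib/gcc-dg.exp; this C version and its function names are hypothetical and only mirror the description:

```c
#include <stdbool.h>

/* Skip one line terminator (\r\n, \n or \r) at *P.
   Return true if one was consumed.  */
static bool
skip_eol (const char **p)
{
  if ((*p)[0] == '\r' && (*p)[1] == '\n')
    {
      *p += 2;
      return true;
    }
  if ((*p)[0] == '\n' || (*p)[0] == '\r')
    {
      *p += 1;
      return true;
    }
  return false;
}

/* Hypothetical helper: compare EXPECTED against ACTUAL exactly, except
   that every line-ending style is treated as equivalent.  */
bool
output_matches (const char *expected, const char *actual)
{
  while (*expected && *actual)
    {
      const char *e = expected, *a = actual;
      bool e_eol = skip_eol (&e), a_eol = skip_eol (&a);
      if (e_eol != a_eol)
        return false;
      if (e_eol)
        {
          expected = e;
          actual = a;
          continue;
        }
      if (*expected != *actual)
        return false;
      expected++;
      actual++;
    }
  /* A final newline is optional in the actual output.  */
  if (*actual == '\0')
    skip_eol (&expected);
  return *expected == '\0' && *actual == '\0';
}
```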
|
|
Since gcc.dg/pr90838-2.c is only for 64-bit integer, replace long with
long long for ILP32 targets.
* gcc.dg/pr90838-2.c (ctz4): Replace long with long long.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Some targets default to strict dwarf.
2025-03-17 John David Anglin <danglin@gcc.gnu.org>
gcc/testsuite/ChangeLog:
PR testsuite/119220
* gcc.dg/debug/dwarf2/inline2.c: Add -gno-strict-dwarf option.
* gcc.dg/debug/dwarf2/inline6.c: Likewise.
|
|
There exists no .REDUC_PLUS on s390.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-77.c: Skip on s390.
|
|
The following testcase ICEs since r15-8025.
tree_nop_conversion_p doesn't imply TREE_TYPE (@0) is uselessly convertible
to type, e.g. they could be INTEGER_TYPEs with the same precision but
different TYPE_SIGN.
The following patch just adds a convert so that it creates a valid IL
even in those cases.
2025-03-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119287
* match.pd (((X >> C1) & C2) * (1 << C1) to X & (C2 << C1)): Use
(convert @0) instead of @0 in the substitution.
* gcc.dg/pr119287.c: New test.
|
|
When doing strided SLP vectorization we use the wrong alignment for
the possibly piecewise access of the vector elements for loads and
stores. While we are carefully using element aligned loads and
stores that isn't enough for the case the original scalar accesses
are packed. The following instead honors larger alignment when
present but correctly falls back to the original scalar alignment
used.
PR tree-optimization/119155
* tree-vect-stmts.cc (vectorizable_store): Do not always
use vector element alignment for VMAT_STRIDED_SLP but
a more correct alignment towards both ends.
(vectorizable_load): Likewise.
* gcc.dg/vect/pr119155.c: New testcase.
|
|
We have long had the fold:
/* Pattern match
tem = (sizetype) ptr;
tem = tem & algn;
tem = -tem;
... = ptr p+ tem;
and produce the simpler and easier to analyze with respect to alignment
... = ptr & ~algn; */
But the gimple in gcc.target/aarch64/sve/pr98119.c has a variant in
which a constant is added before the conversion, giving:
tem = (sizetype) (ptr p+ CST);
tem = tem & algn;
tem = -tem;
... = ptr p+ tem;
This case is also valid if algn fits within the trailing zero bits
of CST. Adding CST then has no effect.
Similarly the testcase has:
tem = (sizetype) (ptr p+ CST1);
tem = tem & algn;
tem = CST2 - tem;
... = ptr p+ tem;
This folds to:
... = (ptr & ~algn) p+ CST2;
if algn fits within the trailing zero bits of both CST1 and CST2.
An alternative would be:
... = (ptr p+ CST2) & ~algn;
but I would expect the alignment to be more easily shareable than
the CST2 addition, given that the CST2 addition wasn't being applied
by a POINTER_PLUS_EXPR.
gcc/
* match.pd: Extend pointer alignment folds so that they handle
the case where a constant is added before or after the alignment.
gcc/testsuite/
* gcc.dg/pointer-arith-11.c: New test.
* gcc.dg/pointer-arith-12.c: Likewise.
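
The folds above can be checked numerically with integer arithmetic standing in for pointers. This is a sketch with hypothetical function names; algn is assumed to be a low-bit mask such as 0xf, and cst is assumed to have at least as many trailing zero bits as algn has set bits:

```c
#include <stdint.h>

/* Long form: tem = ptr & algn; tem = -tem; result = ptr + tem.  */
uintptr_t
align_down_long_form (uintptr_t ptr, uintptr_t algn)
{
  uintptr_t tem = ptr & algn;
  tem = -tem;
  return ptr + tem;
}

/* Folded form: ptr & ~algn.  */
uintptr_t
align_down_folded (uintptr_t ptr, uintptr_t algn)
{
  return ptr & ~algn;
}

/* Variant with a constant added before the conversion.  Adding CST
   does not change the low bits selected by ALGN when ALGN fits within
   the trailing zero bits of CST.  */
uintptr_t
align_down_with_cst (uintptr_t ptr, uintptr_t cst, uintptr_t algn)
{
  uintptr_t tem = (ptr + cst) & algn;
  tem = -tem;
  return ptr + tem;
}
```

For example, with ptr = 0x1237, algn = 0xf and cst = 0x40, all three forms yield 0x1230.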
|
|
Using a combination of rules, we were able to fold
((X >> C1) & C2) * (1 << C1) --> X & (C2 << C1)
if everything was done at the same precision, but we couldn't fold
it if the AND was done at a different precision. The optimisation is
often (but not always) valid for that case too.
This patch adds a dedicated rule for the case where different precisions
are involved.
An alternative would be to extend the individual folds that together
handle the same-precision case so that those rules handle differing
precisions. But the risk is that that could replace narrow operations
with wide operations, which would be especially harmful on targets
like avr. It's also not obviously free of cycles.
I also wondered whether the converts should be non-optional.
gcc/
* match.pd: Fold ((X >> C1) & C2) * (1 << C1) to X & (C2 << C1).
gcc/testsuite/
* gcc.dg/fold-mul-and-lshift-1.c: New test.
* gcc.dg/fold-mul-and-lshift-2.c: Likewise.
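
For the same-precision case, the identity behind the fold can be checked directly. The sketch below uses hypothetical function names with C1 = 4 and C2 = 0xff; it does not model the mixed-precision case that the new rule targets, which the message notes is not always valid:

```c
#include <stdint.h>

/* ((X >> C1) & C2) * (1 << C1) with C1 = 4 and C2 = 0xff.  */
uint32_t
mul_and_lshift_unfolded (uint32_t x)
{
  return ((x >> 4) & 0xffu) * (1u << 4);
}

/* X & (C2 << C1) with the same constants.  */
uint32_t
mul_and_lshift_folded (uint32_t x)
{
  return x & (0xffu << 4);
}
```

Multiplying by 1 << C1 shifts the masked bits back to where they came from, which is why the shift pair cancels and only a mask remains.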
|
|
multi-dimensional nonstring array initializers [PR117178]
My/Kees' earlier patches adjusted -Wunterminated-string-initialization
warning so that it doesn't warn about initializers of nonstring decls
and that nonstring attribute is allowed on multi-dimensional arrays.
Unfortunately as this testcase shows, we still warn about initializers
of multi-dimensional array nonstring decls.
The problem is that in that case field passed to output_init_element
is actually INTEGER_CST, index into the array.
For RECORD_OR_UNION_TYPE_P (constructor_type) field is a FIELD_DECL
which we want to use, but otherwise (in arrays) IMHO we want to use
constructor_fields (which is the innermost FIELD_DECL whose part
is being initialized), or - if that is NULL - constructor_decl, the
whole decl being initialized with multi-dimensional array type.
2025-03-11 Jakub Jelinek <jakub@redhat.com>
PR c/117178
* c-typeck.cc (output_init_element): Pass field to digest_init
only for record/union types, otherwise pass constructor_fields
if non-NULL and constructor_decl if constructor_fields is NULL.
* gcc.dg/Wunterminated-string-initialization-2.c: New test.
|
|
After r15-6660-g45d306a835cb3f865, in some cases
DFP constants would cause an ICE. This is due to
a mismatch of a few things. The predicate of the move
uses aarch64_valid_fp_move to say if the constant is valid or not.
But after reload/LRA, when can_create_pseudo_p returns false, aarch64_valid_fp_move
would return false for constants that were valid for the constraints
of the instruction. A stricter predicate compared to the constraint is wrong.
In this case `Uvi` is the constraint while aarch64_valid_fp_move allows it
via aarch64_can_const_movi_rtx_p for !DECIMAL_FLOAT_MODE_P, there is no such check
for DECIMAL_FLOAT_MODE_P.
The fix is to remove the check !DECIMAL_FLOAT_MODE_P in aarch64_valid_fp_move
and in the define_expand, so that the predicate now allows a superset of what is allowed
by the constraints.
aarch64_float_const_representable_p should be rejecting DFP modes as they can't be used
with instructions like `mov s0, 1.0`.
Changes since v1:
* v2: Add check to aarch64_float_const_representable_p for DFP.
Built and tested on aarch64-linux-gnu with no regressions.
PR target/119131
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_valid_fp_move): Remove check
for !DECIMAL_FLOAT_MODE_P.
(aarch64_float_const_representable_p): Reject decimal floating modes.
* config/aarch64/aarch64.md (mov<mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr119131-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
gcc/testsuite/ChangeLog
* gcc.dg/builtin-bswap-5.c: Improve test vector to avoid nibble
swaps passing.
|
|
The following makes sure to convert the folded expression to the
original expression type.
PR middle-end/119204
* builtins.cc (fold_builtin_strcspn): Preserve the original
expression type.
* gcc.dg/pr119204.c: New testcase.
|
|
The following testcase takes a very long time to compile, because
skip_simple_arithmetic decides to first call tree_invariant_p on
the second argument (and indirectly recurse there). I think before
canonicalization of operands for commutative binary expressions
(and for non-commutative ones always) it is pretty common that the
first operand is a constant, something which tree_invariant_p handles
immediately, so the following patch special cases that; I've added
there a tree_invariant_p call too after the checks, while it is not
really needed currently, tree_invariant_p has the same checks, I wanted
to be prepared in case tree_invariant_p changes. But if you think
I should avoid it, I can drop it too.
This is just a partial fix, I think one can certainly construct a testcase
which will still have horrible compile time complexity (but I've tried and
haven't managed to do so), so perhaps we should just limit the recursion
depth through skip_simple_arithmetic/tree_invariant_p with some defaulted
argument.
2025-03-11 Jakub Jelinek <jakub@redhat.com>
PR c/119183
* tree.cc (skip_simple_arithmetic): If first operand of binary
expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call
tree_invariant_p on that operand first instead of on the second.
* gcc.dg/pr119183.c: New test.
|
|
The following testcase shows a bug in unwind-dw2-btree.h.
In short, the header provides a lock-free btree data structure (so no parent
link on nodes, both insertion and deletion are done in top-down walks
with some locking of just a few nodes at a time so that lookups can notice
concurrent modifications and retry, non-leaf (inner) nodes contain keys
which are initially the base address of the left-most leaf entry of the
following child (or all ones if there is none) minus one, insertion ensures
balancing of the tree to ensure [d/2, d] entries filled through aggressive
splitting if it sees a full tree while walking, deletion performs various
operations like merging neighbour trees, merging into parent or moving some
nodes from neighbour to the current one).
What differs from the textbook implementations is mostly that the leaf nodes
don't include just address as a key, but address range, address + size
(where we don't insert any ranges with zero size) and the lookups can be
performed for any address in the [address, address + size) range. The keys
on inner nodes are still just address-1, so the child covers all nodes
where addr <= key unless it is covered already in children to the left.
The user (static executables or JIT) should always ensure there is no
overlap in between any of the ranges.
In the testcase a bunch of insertions are done, always followed by one
removal, followed by one insertion of a range slightly different from the
removed one. E.g. in the first case [&code[0x50], &code[0x59]] range
is removed and then we insert [&code[0x4c], &code[0x53]] range instead.
This is valid, it doesn't overlap anything. But the problem is that some
non-leaf (inner) node used the &code[0x4f] key (after the 11 insertions
completely correctly). On removal, nothing adjusts the keys on the parent
nodes (it really can't in the top-down only walk, the keys could be many nodes
above it and unlike insertion, removal only knows the start address, doesn't
know the removed size and so will discover it only when reaching the leaf
node which contains it; plus even if it knew the address and size, it still
doesn't know what the second left-most leaf node will be (i.e. the one after
removal)). And on insertion, if nodes aren't split at a level, nothing
adjusts the inner keys either. If a range is inserted and is either fully
below key (keys are - 1, so having address + size - 1 being equal to key is
fine) or fully after key (i.e. address > key), it works just fine, but if
the key is in the middle of the range like in this case, &code[0x4f] is in the
middle of the [&code[0x4c], &code[0x53]] range, then insertion works fine
(we only use size on the leaf nodes), and lookup of the addresses below
the key works fine too (i.e. [&code[0x4c], &code[0x4f]] will succeed).
The problem is with lookups after the key (i.e. [&code[0x50], &code[0x53]]),
the lookup looks for them in different children of the btree and doesn't
find an entry and returns NULL.
As users need to ensure non-overlapping entries at any time, the following
patch fixes it by adjusting keys during insertion where we know not just
the address but also size; if we find during the top-down walk a key
which is in the middle of the range being inserted, we simply increase the
key to be equal to address + size - 1 of the range being inserted.
There can't be any existing leaf nodes overlapping the range in correct
programs and the btree rebalancing done on deletion ensures we don't have
any empty nodes which would also cause problems.
The patch adjusts the keys in two spots, once for the current node being
walked (the last hunk in the header, with large comment trying to explain
it) and once during inner node splitting in a parent node if we'd otherwise
try to add that key in the middle of the range being inserted into the
parent node (in that case it would be missed in the last hunk).
The testcase covers both of those spots, so succeeds with GCC 12 (which
didn't have btrees) and fails with vanilla GCC trunk and also fails if
either the
if (fence < base + size - 1)
fence = iter->content.children[slot].separator = base + size - 1;
or
if (left_fence >= target && left_fence < target + size - 1)
left_fence = target + size - 1;
hunk is removed (of course, only with the current node sizes, i.e. up to
15 children of inner nodes and up to 10 entries in leaf nodes).
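The effect of a separator landing inside an inserted range can be modelled with a much simplified two-level structure. The sketch below is not the unwind-dw2-btree.h code: toy_inner, toy_slot, toy_insert and toy_lookup are hypothetical names, there is no locking, splitting or rebalancing, and each child holds at most one range. It only illustrates why a lookup above the key goes to the wrong child and how raising the key to base + size - 1 avoids that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: one inner node with two leaf children and a
   single separator.  Child 0 covers addresses <= separator, child 1
   covers everything above it.  */
struct toy_range { uintptr_t base, size; int used; };
struct toy_inner { uintptr_t separator; struct toy_range child[2]; };

/* Pick the child the same way for insertion and lookup: by comparing
   an address against the separator.  */
static int toy_slot (const struct toy_inner *n, uintptr_t addr)
{
  return addr <= n->separator ? 0 : 1;
}

/* Insert [base, base + size).  With FIX set, a separator that falls in
   the middle of the range is first raised to base + size - 1.  */
static void toy_insert (struct toy_inner *n, uintptr_t base,
			uintptr_t size, int fix)
{
  int slot = toy_slot (n, base);
  if (fix && slot == 0 && n->separator < base + size - 1)
    n->separator = base + size - 1;
  n->child[slot].base = base;
  n->child[slot].size = size;
  n->child[slot].used = 1;
}

/* Return the range containing ADDR, or NULL if the chosen child does
   not hold one.  */
static const struct toy_range *toy_lookup (const struct toy_inner *n,
					   uintptr_t addr)
{
  const struct toy_range *r = &n->child[toy_slot (n, addr)];
  if (r->used && addr >= r->base && addr - r->base < r->size)
    return r;
  return NULL;
}
```

With a separator of 0x4f and an inserted range of base 0x4c and size 8, the unfixed variant finds 0x4c through 0x4f but misses 0x50 through 0x53, while the fixed variant moves the separator to 0x53 and finds the whole range.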
2025-03-10 Jakub Jelinek <jakub@redhat.com>
Michael Leuchtenburg <michael@slashhome.org>
PR libgcc/119151
* unwind-dw2-btree.h (btree_split_inner): Add size argument. If
left_fence is in the middle of [target,target + size - 1] range,
increase it to target + size - 1.
(btree_insert): Adjust btree_split_inner caller. If fence is smaller
than base + size - 1, increase it and separator of the slot to
base + size - 1.
* gcc.dg/pr119151.c: New test.
|
|
After d34cda720988674bcf8a24267c9e1ec61335d6de, what was originally
not vectorizable can now be vectorized. So adjust
gcc.dg/vect/slp-26.c.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/slp-26.c: Adjust.
|
|
The issue is the same as 12383255fe4e82c31f5e42c72a8fbcb1b5dea35d.
Neither is .REDUC_PLUS set for V2SImode on LoongArch, so add it
to the list of targets not expecting BB vectorization.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-77.c: Add loongarch*-*-* to the list
of expected failing targets.
|
|
By default, vectorization is not enabled on LoongArch,
resulting in the failure of these two test cases.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr112325.c: Add the vector compilation
option '-mlsx' for LoongArch.
* gcc.dg/vect/pr117888-1.c: Likewise.
|
|
After r12-5300-gf98f373dd822b3, value_replacement would be able to look at the
following cfg structure:
```
<bb 5> [local count: 1014686024]:
if (h_6 != 0)
goto <bb 7>; [94.50%]
else
goto <bb 6>; [5.50%]
<bb 6> [local count: 114863530]:
# h_6 = PHI <0(4), 1(5)>
<bb 7> [local count: 1073741824]:
# f_8 = PHI <0(5), h_6(6)>
_9 = f_8 ^ 1;
a.0_10 = a;
_11 = _9 + a.0_10;
if (_11 != -117)
goto <bb 5>; [94.50%]
else
goto <bb 8>; [5.50%]
```
value_replacement would incorrectly think the middle bb (6) was empty and so it decided
to remove the condition in bb5 and replace it with 0, as the function thought it was `h_6 ? 0 : h_6`.
But since there is an incoming phi node to bb6 defining h_6, that is incorrect.
The fix is to check if there are phi nodes in the middle bb and set empty_or_with_defined_p to false.
This was not needed before r12-5300-gf98f373dd822b3 because the phi would have been dead otherwise due to
other checks.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/118922
gcc/ChangeLog:
* tree-ssa-phiopt.cc (value_replacement): Set empty_or_with_defined_p
to false when there are phi nodes for the middle bb.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr118922-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The test spuriously failed on pru-unknown-elf due to missing support for
_Float16 type.
PR target/119133
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr119133.c: Require effective target float16.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
[PR117178]
When initializing a nonstring char array when compiled with
-Wunterminated-string-initialization the warning trips even when
truncating the trailing NUL character from the string constant. Only
warn about this when running under -Wc++-compat since under C++ we should
not initialize nonstrings from C strings.
This patch separates the -Wunterminated-string-initialization and
-Wc++-compat warnings, they are now independent options, the former implied
by -Wextra, the latter not implied by anything. If -Wc++-compat is in effect,
it takes precedence over -Wunterminated-string-initialization and warns regardless
of nonstring attribute, otherwise if -Wunterminated-string-initialization is
enabled, it warns only if there isn't nonstring attribute.
In all cases, the warnings and also pedwarn_init for even larger sizes now
provide details on the lengths.
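A minimal sketch of the two initializations the description distinguishes. The array names and the helper same_tag are made up for illustration; the comments restate the behaviour described above rather than output captured from a compiler run.

```c
#include <string.h>

/* Exactly three bytes: the trailing NUL of "abc" does not fit.  Marked
   nonstring, so per the description -Wunterminated-string-initialization
   stays quiet, while -Wc++-compat would still warn.  */
static const char tag[3] __attribute__ ((nonstring)) = "abc";

/* Same initializer without the attribute: per the description this is
   the case -Wunterminated-string-initialization (implied by -Wextra)
   warns about.  */
static const char plain[3] = "abc";

/* Compare by explicit length, since neither array is NUL-terminated.  */
static int same_tag (const char *a, const char *b)
{
  return memcmp (a, b, sizeof tag) == 0;
}
```

Both arrays hold the same three bytes; only the diagnostics differ, and any code using them has to pass the length explicitly rather than rely on a terminator.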
2025-03-07 Kees Cook <kees@kernel.org>
Jakub Jelinek <jakub@redhat.com>
PR c/117178
gcc/
* doc/invoke.texi (Wunterminated-string-initialization): Document
the new interaction between this warning and -Wc++-compat and that
initialization of decls with nonstring attribute aren't warned about.
gcc/c-family/
* c.opt (Wunterminated-string-initialization): Don't depend on
-Wc++-compat.
gcc/c/
* c-typeck.cc (digest_init): Add DECL argument. Adjust wording of
pedwarn_init for too long strings and provide details on the lengths,
for string literals where just the trailing NUL doesn't fit warn for
warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++"
and provides details on lengths, otherwise for
warn_unterminated_string_initialization adjust the warning, provide
details on lengths and don't warn if get_attr_nonstring_decl (decl).
(build_c_cast, store_init_value, output_init_element): Adjust
digest_init callers.
gcc/testsuite/
* gcc.dg/Wunterminated-string-initialization.c: Add additional test
coverage.
* gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of
the diagnostics.
* gcc.dg/Wcxx-compat-23.c: New test.
* gcc.dg/Wcxx-compat-24.c: New test.
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
I missed these two testcases in the diff when looking for testcases
that fail. The change is the same as what was done for
gcc.dg/Wreturn-mismatch-2.c.
Pushed as obvious after a quick test.
gcc/testsuite/ChangeLog:
* gcc.dg/Wreturn-mismatch-2a.c: Change dg-warning
for the last -Wreturn-type to dg-bogus.
* gcc.dg/Wreturn-mismatch-6.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Like r5-6912-g3dbb84276aca10 but this is for the C front-end.
Basically we have an error on a return statement, we just return
error_mark_node and then the warning happens as there is no return
statement. Instead, mark the current function for suppression
of the warning.
PR c/60440
gcc/c/ChangeLog:
* c-typeck.cc (c_finish_return): Mark the current function
for suppression of the -Wreturn-type if there was an error
on the return statement.
gcc/testsuite/ChangeLog:
* gcc.dg/Wreturn-mismatch-2.c: Change dg-warning
for the last -Wreturn-type to dg-bogus.
* gcc.dg/pr60440-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This fixes two PRs on Early break vectorization by delaying the safety checks to
vectorizable_load when the VF, VMAT and vectype are all known.
This patch does add two new restrictions:
1. On LOAD_LANES targets, where the buffer size is known, we reject non-power
of two group sizes, as they are unaligned every other iteration and so may
cross a page unwittingly. For those cases require partial masking support.
2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if
we cannot peel for alignment, as the alignment requirement is quite large at
GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we
don't support it for now.
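The page-crossing concern in restriction 1 can be illustrated with plain address arithmetic. This is a simplified model and not the vectorizer's check: it assumes a page-aligned start address, a 4096-byte page, and that each iteration reads group_size * vec_bytes contiguous bytes; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the SIZE-byte access starting at ADDR touches two pages.  */
static bool crosses_page (uint64_t addr, uint64_t size, uint64_t page_size)
{
  assert (size > 0 && page_size > 0);
  return addr / page_size != (addr + size - 1) / page_size;
}

/* Walk ITERS consecutive group accesses of GROUP_SIZE vectors of
   VEC_BYTES each, starting at address 0 (page aligned), and report
   whether any of them straddles a page boundary.  */
static bool any_iteration_crosses (unsigned group_size, unsigned vec_bytes,
				   uint64_t page_size, unsigned iters)
{
  uint64_t step = (uint64_t) group_size * vec_bytes;
  for (unsigned i = 0; i < iters; i++)
    if (crosses_page (i * step, step, page_size))
      return true;
  return false;
}
```

In this model a group of 3 vectors of 16 bytes advances by 48 bytes, which does not divide 4096, so the access starting at byte 4080 runs into the next page. A group of 4 advances by 64 bytes and never straddles a boundary.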
There are other steps documented inside the code itself so that the reasoning
is next to the code.
As a fall-back, when the alignment fails we require partial vector support.
For VLA targets like SVE return element alignment as the desired vector
alignment. This means that the loads are never misaligned and so, annoyingly, it
won't ever need to peel.
So what I think needs to happen in GCC 16 is:
1. during vect_compute_data_ref_alignment we need to take the max of
POLY_VALUE_MIN and vector_alignment.
2. vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a
check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a
proxy for pagesize.
3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in
vect_determine_partial_vectors_and_peeling since the first iteration has to
be partial. Require LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P otherwise we have
to fail to vectorize.
4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p
becomes true and we generate the peeled check through loop control for
partial loops. From what I can tell this won't work for
LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at
all in the compiler. That would need to be done independently from the
above.
In any case, not GCC 15 material so I've kept the WIP patches I have downstream.
Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.
gcc/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay
checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to
get_load_store_type, increase group access alignment.
(vect_enhance_data_refs_alignment): Add note to comment needing
investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type):
Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_set_safe_speculative_read_required,
dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
(need_peeling_for_alignment): Renamed to...
(safe_speculative_read_required): ...this.
(class dr_vec_info): Add scalar_access_known_in_bounds.
gcc/testsuite/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the
load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
* gcc.dg/vect/vect-early-break_128.c: Likewise.
* gcc.dg/vect/vect-early-break_26.c: Likewise.
* gcc.dg/vect/vect-early-break_43.c: Likewise.
* gcc.dg/vect/vect-early-break_44.c: Likewise.
* gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
* gcc.dg/vect/vect-early-break_7.c: Likewise.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
* gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
* gcc.dg/vect/vect-early-break_18.c: Likewise.
* gcc.dg/vect/vect-early-break_20.c: Likewise.
* gcc.dg/vect/vect-early-break_21.c: Likewise.
* gcc.dg/vect/vect-early-break_38.c: Likewise.
* gcc.dg/vect/vect-early-break_6.c: Likewise.
* gcc.dg/vect/vect-early-break_53.c: Likewise.
* gcc.dg/vect/vect-early-break_56.c: Likewise.
* gcc.dg/vect/vect-early-break_57.c: Likewise.
* gcc.dg/vect/vect-early-break_81.c: Likewise.
|
|
When we BB vectorize an if-converted loop body we make sure to not
leave around .MASK_LOAD or .MASK_STORE created by if-conversion but
we failed to check for .MASK_CALL.
PR tree-optimization/119145
* tree-vectorizer.cc (try_vectorize_loop_1): Avoid BB
vectorizing an if-converted loop body when there's a .MASK_CALL
in the loop body.
* gcc.dg/vect/pr119145.c: New testcase.
|
|
tree-data-refs.cc uses alignment information to try to optimise
the code generated for alias checks. The assumption for "normal"
non-grouped, full-width scalar accesses was that the access size
would be a multiple of the alignment. As Richi notes in the PR,
this is a documented precondition of dr_with_seg_len:
/* The minimum common alignment of DR's start address, SEG_LEN and
ACCESS_SIZE. */
unsigned int align;
PR115192 was a case in which this assumption didn't hold. The access
was part of an aligned 4-element group, but only the first 2 elements
of the group were accessed. The alignment was therefore double the
access size.
In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the
alignment in one of the output routines. But I think that was
misconceived. The precondition means that we should cap the
alignment at source instead.
Failure to do that caused a similar wrong code bug in this PR,
where the alignment comes from a short bitfield access rather
than from a group access.
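One way to read the "minimum common alignment" precondition is as the largest power of two that divides the pointer alignment, the segment length and the access size at once. The sketch below computes that with hypothetical helper names; it illustrates the arithmetic only and is not the code in tree-vect-data-refs.cc.

```c
#include <stdint.h>

/* Largest power of two dividing X, i.e. its lowest set bit.
   Returns 0 for X == 0.  */
static uint64_t known_pow2_factor (uint64_t x)
{
  return x & -x;
}

/* Minimum common alignment of a start address known to be aligned to
   PTR_ALIGN, a segment length and an access size.  OR-ing the values
   preserves the lowest bit set in any of them; a zero operand places
   no constraint.  */
static uint64_t common_align (uint64_t ptr_align, uint64_t seg_len,
			      uint64_t access_size)
{
  return known_pow2_factor (ptr_align | seg_len | access_size);
}
```

For a 16-byte-aligned group of four 4-byte elements where only the first two are accessed, common_align (16, 64, 8) gives 8, not 16: the alignment is capped by the access size, which is the situation the message describes.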
gcc/
PR tree-optimization/116125
* tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make
the dr_with_seg_len alignment fields describe the access sizes as
well as the pointer alignment.
* tree-data-ref.cc (create_intersect_range_checks): Don't compensate
for invalid alignment fields here.
gcc/testsuite/
PR tree-optimization/116125
* gcc.dg/vect/pr116125.c: New test.
|
|
lowpart_subreg ICEs are the gift that keeps giving. This is another
case where we need to use force_lowpart_subreg instead, to handle
cases where the input is already a subreg and where the combined
subreg is not allowed as a single operation.
We don't need to check can_create_pseudo_p since the input should
be a hard register rather than a subreg if !can_create_pseudo_p.
gcc/
PR target/119133
* config/aarch64/aarch64.md
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): Use
force_lowpart_subreg.
gcc/testsuite/
PR target/119133
* gcc.dg/torture/pr119133.c: New test.
|
|
This fixes the ping-ponging of live sets in ext-dce which, if left
unresolved, can lead to infinite loops in the ext-dce pass, as seen in the
P1 regression 119099.
At its core instead of replacing the livein set with the just recomputed
data, we IOR in the just recomputed data to the existing livein set.
That ensures the existing livein set never shrinks.
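Why OR-ing guarantees termination can be seen on a toy example. The transfer function below is hypothetical and deliberately non-monotone; it stands in for whatever recomputation makes the real live sets oscillate and is not taken from ext-dce.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, deliberately non-monotone transfer function: the
   recomputed set depends on bit 0 of the current one.  */
static uint32_t toy_recompute (uint32_t livein)
{
  return (livein & 1) ? 0x2 : 0x1;
}

/* Iterate until the set stops changing.  Returns the number of updates
   performed, or -1 if no fixed point was reached within MAX_STEPS.
   With IOR set the new data is OR-ed into the old set, otherwise it
   replaces it.  */
static int toy_solve (bool ior, int max_steps, uint32_t *result)
{
  uint32_t livein = 0;
  for (int step = 0; step < max_steps; step++)
    {
      uint32_t fresh = toy_recompute (livein);
      uint32_t next = ior ? (livein | fresh) : fresh;
      if (next == livein)
	{
	  *result = livein;
	  return step;
	}
      livein = next;
    }
  *result = livein;
  return -1;
}
```

With replacement the set alternates between 0x1 and 0x2 forever. With OR it can only grow, so over a finite number of bits it has to reach a fixed point (here 0x3 after two updates).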
Bootstrapped and regression tested on x86. I've also thrown this into
my tester to verify it across multiple targets and that we aren't
regressing the (limited) tests we have in place for ext-dce's
optimization behavior.
While it's a generic patch, I'll wait for the RISC-V tester to run its
course before committing.
PR rtl-optimization/119099
gcc/
* ext-dce.cc (ext_dce_rd_transfer_n): Do not allow the livein
set to shrink.
gcc/testsuite/
* gcc.dg/torture/pr119099.c: New test.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
The following testcase is miscompiled during evrp.
Before vrp, we have (from ccp):
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x2d
_3 = _2 + 18446744073708503085;
...
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x59
_6 = (long long unsigned int) _5;
# RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34
_7 = k_11 + -1048524;
switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> [33.33%], case 32: <L6> [33.33%]>
...
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d VALUE 0x0
# i_20 = PHI <_3(4), 0(3), _6(2)>
and evrp is now trying to figure out range for i_20 in range_of_phi.
All the ranges and MASK/VALUE pairs above are correct for the testcase,
k_11 (and _2, which is based on it) is the result of a multiplication by a
constant with the low 14 bits cleared, and then some numbers are added to it.
There is an obvious missed optimization for which I've filed PR119039,
simplify_switch_using_ranges could see that all the labels but default
are unreachable because the controlling expression has
MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that.
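The claim about the case labels can be checked by hand. In a MASK/VALUE pair the bits set in MASK are unknown and all other bits must equal VALUE; the helper below (a hypothetical name, using plain 64-bit integers rather than wide_int) tests a constant against such a pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits set in MASK are unknown; every other bit of X must equal the
   corresponding bit of VALUE.  */
static bool matches_bitmask (uint64_t x, uint64_t mask, uint64_t value)
{
  return (x & ~mask) == (value & ~mask);
}
```

With MASK 0xffffc000 the low 14 bits are known and must be 0x34, so 8, 24 and 32 all fail the test, while 0x34 or 0x4034 would pass.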
Anyway, during range_of_phi for i_20, we process the PHI arguments
in order. For the _3(4) case, we figure out that it is reachable
through the case 24: case 32: labels only of the switch and that
0x34 - 0x2d is 7, so derive
[irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
(the MASK/VALUE just got inherited from the _3 earlier range).
Now (not surprisingly because those labels aren't actually reachable),
that range is inconsistent, 0x2d is 45, so there is conflict between the
values and the irange_bitmask.
value-range.{h,cc} code differentiates between actually stored
irange_bitmask, which is that MASK 0xffffffffffffc000 VALUE 0x2d, and
semantic bitmask, which is what get_bitmask returns. That is
// The mask inherent in the range is calculated on-demand. For
// example, [0,255] does not have known bits set by default. This
// saves us considerable time, because setting it at creation incurs
// a large penalty for irange::set. At the time of writing there
// was a 5% slowdown in VRP if we kept the mask precisely up to date
// at all times. Instead, we default to -1 and set it when
// explicitly requested. However, this function will always return
// the correct mask.
//
// This also means that the mask may have a finer granularity than
// the range and thus contradict it. Think of the mask as an
// enhancement to the range. For example:
//
// [3, 1000] MASK 0xfffffffe VALUE 0x0
//
// 3 is in the range endpoints, but is excluded per the known 0 bits
// in the mask.
//
// See also the note in irange_bitmask::intersect.
irange_bitmask bm
= get_bitmask_from_range (type (), lower_bound (), upper_bound ());
if (!m_bitmask.unknown_p ())
bm.intersect (m_bitmask);
Now, get_bitmask_from_range here is MASK 0x1f VALUE 0x0 and it intersects
that with that MASK 0xffffffffffffc000 VALUE 0x2d.
Which triggers the ugly special case in irange_bitmask::intersect:
// If we have two known bits that are incompatible, the resulting
// bit is undefined. It is unclear whether we should set the entire
// range to UNDEFINED, or just a subset of it. For now, set the
// entire bitmask to unknown (VARYING).
if (wi::bit_and (~(m_mask | src.m_mask),
m_value ^ src.m_value) != 0)
{
unsigned prec = m_mask.get_precision ();
m_mask = wi::minus_one (prec);
m_value = wi::zero (prec);
}
so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0.
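The conflict can be reproduced with 64-bit integers. The sketch below mirrors the conflict test quoted above; the struct and function names are hypothetical, and the non-conflicting branch (a bit is known if either side knows it) is an assumption about the intended semantics, not a copy of value-range.h.

```c
#include <stdint.h>

/* A set bit in MASK means the bit is unknown; VALUE holds the known bits.  */
struct toy_bitmask { uint64_t mask, value; };

static struct toy_bitmask toy_intersect (struct toy_bitmask a,
					 struct toy_bitmask b)
{
  struct toy_bitmask r;
  /* A bit known on both sides but with different values is a conflict:
     give up and make everything unknown.  */
  if ((~(a.mask | b.mask) & (a.value ^ b.value)) != 0)
    {
      r.mask = ~(uint64_t) 0;
      r.value = 0;
      return r;
    }
  /* Assumed behaviour otherwise: a bit is known if either side knows it.  */
  r.mask = a.mask & b.mask;
  r.value = (a.value & ~a.mask) | (b.value & ~b.mask);
  return r;
}
```

For MASK 0x1f VALUE 0x0 against MASK 0xffffffffffffc000 VALUE 0x2d, bits 5 to 13 are known on both sides, and bit 5 is 0 in one and 1 in the other (0x2d has 0x20 set), so the result collapses to MASK 0xffffffffffffffff VALUE 0x0.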
Next, range_of_phi attempts to union it with the 0(3) PHI argument,
and during irange::union_ first adds the [0,0] to the subranges, so
[irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
and then goes on to irange::union_bitmask which does
if (m_bitmask == r.m_bitmask)
return false;
irange_bitmask bm = get_bitmask ();
irange_bitmask save = bm;
bm.union_ (r.get_bitmask ());
if (save == bm)
return false;
m_bitmask = bm;
if (save == get_bitmask ())
return false;
m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as
r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask
(but note, not from the original range before union, but the modified one,
dunno if that isn't a problem as well), which is still the VARYING/unknown_p
one, union_ that with MASK 0x0 VALUE 0x0 and get still
MASK 0xffffffffffffffff VALUE 0x0, so don't update anything, the semantic
bitmask didn't change, so we are fine (not!, see later).
Except then we try to union with the third PHI argument. And, because the
edge to that comes only from case 8: label and there is a known difference
between the two, the argument is actually already from earlier replaced by
45(2) constant. So, irange::union_ adds the [45, 45] range to the list
of subranges, but voila, 45 is 0x2d and satisfies the stored
MASK 0xffffffffffffc000 VALUE 0x2d and so the semantic bitmask changed
from MASK 0xffffffffffffffff VALUE 0x0 to MASK 0xffffffffffffc000 VALUE 0x2d
by that addition. Eventually, we just optimize this to
[irange] long long unsigned int [45, 45] because that is the only range
which satisfies the bitmask. And that is wrong, at runtime i_20 has
value 0.
The following patch attempts to detect this case where get_bitmask
turns some non-VARYING m_bitmask into VARYING one because of a conflict
and in that case makes sure m_bitmask is actually updated rather than
unmodified, so that later union_ doesn't cause problems.
I also wonder whether e.g. get_bitmask couldn't have special case for this
and if bm.intersect (m_bitmask); yields unknown_p from something not
originally unknown_p, perhaps chooses to just use get_bitmask_from_range
value and ignore the stored m_bitmask. Though, dunno how union_bitmask
in that case would figure out it needs to update m_bitmask.
2025-03-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118953
* value-range.cc (irange::union_bitmask): Update m_bitmask if
get_bitmask () is unknown_p and m_bitmask is not even when the
semantic bitmask didn't change and returning false.
* gcc.dg/torture/pr118953.c: New test.
|
|
For strict-alignment targets we can end up with BLKmode single-element
array types when the element type is unaligned. This confuses
type checking since the canonical type would have an aligned
element type and a non-BLKmode mode. The following simply ignores
the mode we assign to array types for this purpose, like we already
do for record and union types.
PR middle-end/97323
* tree.cc (gimple_canonical_types_compatible_p): Ignore
TYPE_MODE also for ARRAY_TYPE.
(verify_type): Likewise.
* gcc.dg/pr97323.c: New testcase.
|