Age | Commit message (Collapse) | Author | Files | Lines |
|
Now that VEC_COND_EXPR has normal unnested operands,
operation_could_trap_p can treat it like any other expression.
This fixes many testsuite ICEs for SVE, but it turns out that none
of the tests in gcc.target/aarch64/sve were affected. Anyone testing
on non-SVE aarch64 therefore wouldn't have seen it.
gcc/
PR middle-end/100284
* gimple.c (gimple_could_trap_p_1): Remove VEC_COND_EXPR test.
* tree-eh.c (operation_could_trap_p): Handle VEC_COND_EXPR rather
than asserting on it.
gcc/testsuite/
PR middle-end/100284
* gcc.target/aarch64/sve/pr81003.c: New test.
|
|
gcc/testsuite/ChangeLog:
PR testsuite/100272
* g++.dg/ext/flexary13.C: Remove malformed directives.
|
|
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_aix_precompute_tls_p): Protect
with TARGET_AIX_OS.
|
|
AIX uses a compiler-managed TOC for global data, including TLS symbols.
The GCC TOC implementation manages the TOC entries through the constant pool.
TLS symbols sometimes require a function call to obtain the TLS base
pointer. The arguments to the TLS call can conflict with arguments to
a normal function call if the TLS symbol is an argument in the normal call.
GCC specifically checks for this situation and precomputes the TLS
arguments, but the mechanism to check for this requirement utilizes
legitimate_constant_p(). The necessary result of legitimate_constant_p()
for correct TOC behavior and for correct TLS argument behavior is in
conflict.
This patch adds a new target hook precompute_tls_p() to decide if an
argument should be precomputed regardless of the result from
legitmate_constant_p().
gcc/ChangeLog:
PR target/94177
* calls.c (precompute_register_parameters): Additionally test
targetm.precompute_tls_p to pre-compute argument.
* config/rs6000/aix.h (TARGET_PRECOMPUTE_TLS_P): Define.
* config/rs6000/rs6000.c (rs6000_aix_precompute_tls_p): New.
* target.def (precompute_tls_p): New.
* doc/tm.texi.in (TARGET_PRECOMPUTE_TLS_P): Add hook documentation.
* doc/tm.texi: Regenerated.
|
|
Pedantically signed vs. unsigned mismatches in va_arg are only well defined
if the value can be represented in both signed and unsigned integer types.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR target/100200
* config/aarch64/aarch64.c (aarch64_print_operand): Cast -UINTVAL
back to HOST_WIDE_INT.
|
|
As the test case shows, the outer mode may have a higher alignment
requirement than the inner mode here.
2021-04-27 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR target/100106
* simplify-rtx.c (simplify_context::simplify_subreg): Check the
memory alignment for the outer mode.
* gcc.c-torture/compile/pr100106.c: New testcase.
|
|
Change a while loop in op_by_pieces_d::run to a do-while loop to prepare
for offset adjusted operation for the remaining bytes on the last piece
operation of a memory region.
PR middle-end/90773
* expr.c (op_by_pieces_d::get_usable_mode): New member function.
(op_by_pieces_d::run): Cange a while loop to a do-while loop.
|
|
The PR shows two ICEs with __sync_bool_compare_and_swap and
-mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one
later on, after the CAS insn is split.
The LRA ICE occurs because the
@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie
two output operands together (operands 0 and 1 in the third
alternative). LRA can't handle this, since it doesn't make sense for an
insn to assign to the same operand twice.
The later (post-splitting) ICE occurs because the expansion of the
cbranchsi4_scratch insn doesn't quite go according to plan. As it
stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
attempting to pass a register (neg_bval) to use as a scratch register.
However, since the RTL template has a match_scratch here,
gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
Since this is all happening after RA, this is doomed to fail (and we get
an ICE about the insn not matching its constraints).
It seems that the motivation for the choice of constraints in the
atomic_compare_and_swap pattern comes from an attempt to satisfy the
constraints of the cbranchsi4_scratch insn. This insn requires the
scratch register to be the same as the input register in the case that
we use a larger negative immediate (one that satisfies J, but not L).
Of course, as noted above, LRA refuses to assign two output operands to
the same register, so this was never going to work.
The solution I'm proposing here is to collapse the alternatives to the
CAS insn (allowing the two output register operands to be matched to
different registers) and to ensure that the constraints for
cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
inserting a move to ensure the source and destination registers match if
necessary (i.e. in the case of large negative immediates).
Another notable change here is that we only do:
emit_move_insn (neg_bval, const1_rtx);
for non-negative immediates. This is because the ADDS instruction used in
the negative case suffices to leave a suitable value in neg_bval: if the
operands compare equal, we don't take the branch (so neg_bval will be
set by the load exclusive). Otherwise, the ADDS will leave a nonzero
value in neg_bval, which will correctly signal that the CAS has failed
when it is later negated.
gcc/ChangeLog:
PR target/99977
* config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
with negative immediates: ensure we expand cbranchsi4_scratch
correctly and ensure we satisfy its constraints.
* config/arm/sync.md
(@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't
attempt to tie two output operands together with constraints;
collapse two alternatives.
(@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise.
* config/arm/thumb1.md (cbranchsi4_neg_late): New.
gcc/testsuite/ChangeLog:
PR target/99977
* gcc.target/arm/pr99977.c: New test.
|
|
The following patch fixes UBs in the compiler when negativing
a CONST_INT containing HOST_WIDE_INT_MIN. I've changed the spots where
there wasn't an obvious earlier condition check or predicate that
would fail for such CONST_INTs.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR target/100200
* config/aarch64/predicates.md (aarch64_sub_immediate,
aarch64_plus_immediate): Use -UINTVAL instead of -INTVAL.
* config/aarch64/aarch64.md (casesi, rotl<mode>3): Likewise.
* config/aarch64/aarch64.c (aarch64_print_operand,
aarch64_split_atomic_op, aarch64_expand_subvti): Likewise.
|
|
The following testcase ICEs at -O0, because lower_vec_perm sees the
_1 = { 0, 0, 0, 0, 0, 0, 0, 0 };
_2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
_3 = { 6, 0, 0, 0, 0, 0, 0, 0 };
_4 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _2, _3>;
and as the ISA is SSE2, there is no support for the particular permutation
nor for variable mask permutation. But, the code to match vec_shl matches
it, because the permutation has the first operand a zero vector and the
mask picks all elements randomly from that vector.
So, in the end that isn't a vec_shl, but the permutation could be in theory
optimized into the first argument. As we keep it as is, it will fail
during expansion though, because that for vec_shl correctly requires that
it actually is a shift:
unsigned firstidx = 0;
for (unsigned int i = 0; i < nelt; i++)
{
if (known_eq (sel[i], nelt))
{
if (i == 0 || firstidx)
return NULL_RTX;
firstidx = i;
}
else if (firstidx
? maybe_ne (sel[i], nelt + i - firstidx)
: maybe_ge (sel[i], nelt))
return NULL_RTX;
}
if (firstidx == 0)
return NULL_RTX;
first = firstidx;
The if (firstidx == 0) return NULL; is what is missing a counterpart
on the lower_vec_perm side.
As with optimize != 0 we fold it in other spots, I think it is not needed
to optimize this cornercase in lower_vec_perm (which would mean we'd need
to recurse on the newly created _4 = { 0, 0, 0, 0, 0, 0, 0, 0 };
whether it is supported or not).
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/100239
* tree-vect-generic.c (lower_vec_perm): Don't accept constant
permutations with all indices from the first zero element as vec_shl.
* gcc.dg/pr100239.c: New test.
|
|
The following testcase fails with -fcompare-debug. The problem is that
outgoing_edges_match behaves differently between -g0 and -g, if
some load/store with REG_EH_REGION is followed by DEBUG_INSNs, the
REG_EH_REGION check is not done, while when there are no DEBUG_INSNs, it is
done.
We already compute last1 and last2 as BB_END (bb{1,2}) with skipped debug
insns and notes, so this patch just uses those.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/100254
* cfgcleanup.c (outgoing_edges_match): Check REG_EH_REGION on
last1 and last2 insns rather than BB_END (bb1) and BB_END (bb2) insns.
* g++.dg/opt/pr100254.C: New test.
|
|
This makes sure to remove unused locals and prune CLOBBERs after
the first scalar cleanup phase after IPA optimizations. On the
testcase in the PR this results in 8000 CLOBBERs removed which
in turn unleashes more DSE which otherwise hits its walking limit
of 256 too early on this testcase.
2021-04-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/99912
* passes.def: Add comment about new TODO_remove_unused_locals.
* tree-stdarg.c (pass_data_stdarg): Run TODO_remove_unused_locals
at start.
|
|
For the testcase in the PR the main SRA pass is unable to do some
important scalarizations because dead stores of addresses make
the candiate variables disqualified. The following patch adds
another DSE pass before SRA forming a DCE/DSE pair and moves the
DSE pass that is currently closely after SRA up to after the
next DCE pass, forming another DCE/DSE pair now residing after PRE.
2021-04-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/99912
* passes.def (pass_all_optimizations): Add pass_dse before
the first pass_dce, move the first pass_dse before the
pass_dce following pass_pre.
* gcc.dg/tree-ssa/ldist-33.c: Disable PRE and LIM.
* gcc.dg/tree-ssa/pr96789.c: Adjust dump file scanned.
* gcc.dg/tree-ssa/ssa-dse-28.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
|
|
This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by
testing if the low 3 bits are zero, etc.).
In addition, I've noticed that in the CLZ case, the
#ifdef CLZ_DEFINED_VALUE_AT_ZERO don't really work as intended, they
are evaluated during genmatch and the macro is not defined then
(but, because of the missing tm.h includes it isn't defined in
gimple-match.c or generic-match.c either). And when tm.h is included,
defaults.h is included which defines a fallback version of that macro.
For GCC 12, I wonder if it wouldn't be better to say in addition to __builtin_c[lt]z*
is always UB at zero that it would be undefined for .C[LT]Z ifn too if it
has just one operand and use a second operand to be the constant we expect
at zero.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/95527
* generic-match-head.c: Include tm.h.
* gimple-match-head.c: Include tm.h.
* match.pd (CLZ == INTEGER_CST): Don't use
#ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO
if clz == CFN_CLZ. Add missing val declaration.
(CTZ cmp CST): New simplifications.
* gcc.dg/tree-ssa/pr95527-2.c: New test.
|
|
The following patch tests both x / y * y and x - x % y expansion for the
former GIMPLE code and chooses the cheaper of those sequences.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/96696
* expr.c (expand_expr_divmod): New function.
(expand_expr_real_2) <case TRUNC_DIV_EXPR>: Use it for truncations and
divisions. Formatting fixes.
<case MULT_EXPR>: Optimize x / y * y as x - x % y if the latter is
cheaper.
* gcc.target/i386/pr96696.c: New test.
|
|
When IPA-SRA removes an SSA_NAME from a LHS of a call statement
because it is not necessary, it does not release it. This patch fixes
that.
gcc/ChangeLog:
2021-04-08 Martin Jambor <mjambor@suse.cz>
PR ipa/99951
* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
If removing a call statement LHS SSA name, release it.
|
|
arm_compute_save_core_reg_mask contains UB in that the saved PIC
register number is used to create a bit mask. However, for some target
options this register is undefined and we end up with a shift of ~0.
On native compilations this is benign since the shift will still be
large enough to move the bit outside of the range of the mask, but if
cross compiling from a system that truncates out-of-range shifts to
zero (or worse, raises a trap for such values) we'll get potentially
wrong code (or a fault).
gcc:
PR target/100236
* config/arm/arm.c (THUMB2_WORK_REGS): Check PIC_OFFSET_TABLE_REGNUM
is valid before including it in the mask.
|
|
Even though "SVE type" and "SVE sizeless type" are marked as
affecting type identity, the middle end doesn't truly believe
it unless we also handle them in comp_type_attributes.
gcc/
PR target/100270
* config/aarch64/aarch64.c (aarch64_comp_type_attributes): Handle
SVE attributes.
gcc/testsuite/
PR target/100270
* gcc.target/aarch64/sve/acle/general-c/pr100270_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Change
expected error message when subtracting pointers to different
vector types. Expect warnings when mixing them elsewhere.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Remove
XFAILs. Tweak error messages for some cases.
|
|
Adding +nosve is more robust than checking for command-line arguments,
since SVE can be enabled by default or indirectly via other options.
gcc/testsuite/
* gcc.target/aarch64/simd/ssra.c: Use +nosve
* gcc.target/aarch64/simd/usra.c: Likewise.
|
|
This adds disambiguation of the access size vs. the decl size
in the pointer based vs. decl based disambiguator. We have
a TBAA based check like this already but that's fend off when
seeing alias-sets of zero or when -fno-strict-aliasing is in
effect. Also the perceived dynamic type could be smaller than
the actual access.
2021-04-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/100051
* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Add
disambiguator based on access size vs. decl size.
* gcc.dg/tree-ssa/ssa-fre-92.c: New testcase.
|
|
This re-enables PRE and fixes the malformed dg directive pointed
out in the PR. It all works as desired and I forgot why I
disabled this in the past.
2021-04-27 Richard Biener <rguenther@suse.de>
PR testsuite/100272
* gcc.dg/tree-ssa/predcom-1.c: Re-enable PRE and fix
malformed dg directive.
|
|
The bug points out several malformed dg directives, the following
fixes the obvious ones where the testcases keep working after the
change.
2021-04-27 Richard Biener <rguenther@suse.de>
PR testsuite/100272
* g++.dg/diagnostic/ptrtomem1.C: Fix dg directives.
* g++.dg/ipa/pr45572-2.C: Likewise.
* g++.dg/template/spec26.C: Likewise.
* gcc.dg/pr20126.c: Likewise.
* gcc.dg/tree-ssa/pr20739.c: Likewise.
|
|
PRE has code to adjust TBAA behavior for refs that expects the base
operation code to match. The testcase shows a case where we have
a VAR_DECL vs. a MEM_REF so add code to give up in such cases.
2021-04-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/100278
* tree-ssa-pre.c (compute_avail): Give up when we cannot
adjust TBAA beacuse of mismatching bases.
* gcc.dg/tree-ssa/pr100278.c: New testcase.
|
|
The following testcase shows that while we nicely optimize away the
useless and? of shift count before rotation for [SD]Imode rotates,
we don't do that for [QH]Imode.
The following patch optimizes that by using the right iterator on those
4 patterns.
2021-04-27 Jakub Jelinek <jakub@redhat.com>
PR target/99405
* config/i386/i386.md (*<insn><mode>3_mask, *<insn><mode>3_mask_1):
For any_rotate define_insn_split and following splitters, use
SWI iterator instead of SWI48.
* gcc.target/i386/pr99405.c: New test.
|
|
This relaxes the condition for the match.pd pattern doing vector ctor
element extracts to not require type identity but only size equality.
Since we vectorize pointer data as unsigned integer data such mismatches
have to be tolerated to optimize scalar code uses of vector results.
2021-03-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/99776
* match.pd (bit_field_ref (ctor)): Relax element extract
type compatibility checks.
* gcc.dg/tree-ssa/ssa-fre-91.c: New testcase.
|
|
processor_type
gcc/ChangeLog
* common/config/i386/i386-common.c (processor_names):
Sync processor_names with processor_type.
* config/i386/i386-options.c (processor_cost_table):
Sync processor_cost_table with processor_type.
|
|
|
|
During constexpr evaluation, a base-to-derived conversion may yield an
expression like (Derived*)(&D.2217.D.2106 p+ -4) where D.2217 is the
derived object and D.2106 is the base. But cxx_fold_indirect_ref
doesn't know how to resolve an INDIRECT_REF thereof to just D.2217,
because it doesn't handle POINTER_PLUS_EXPR of a COMPONENT_REF with
negative offset well: when the offset N is positive, it knows that
'&x p+ N' is equivalent to '&x.f p+ (N - bytepos(f))', but it doesn't
know about the reverse transformation, that '&x.f p+ N' is equivalent
to '&x p+ (N + bytepos(f))' when N is negative, which is important for
resolving such base-to-derived conversions and for accessing subobjects
backwards. This patch teaches cxx_fold_indirect_ref this reverse
transformation.
gcc/cp/ChangeLog:
PR c++/100209
* constexpr.c (cxx_fold_indirect_ref): Try to canonicalize the
object/offset pair for a POINTER_PLUS_EXPR of a COMPONENT_REF
with a negative offset into one whose offset is nonnegative
before calling cxx_fold_indirect_ref_1.
gcc/testsuite/ChangeLog:
PR c++/100209
* g++.dg/cpp1y/constexpr-base1.C: New test.
* g++.dg/cpp1y/constexpr-ptrsub1.C: New test.
|
|
It turned out that a compiler built without offloading support
and one with can produce slightly different diagnostic.
Offloading support implies ENABLE_OFFLOAD which implies that
g->have_offload is set when offloading is actually needed.
In cgraphunit.c, the latter causes flag_generate_offload = 1,
which in turn affects tree.c's free_lang_data.
The result is that the front-end specific diagnostic gets reset
('tree_diagnostics_defaults (global_dc)'), which affects in this
case 'Warning' vs. 'warning' via the Fortran frontend.
Result: 'Warning:' vs. 'warning:'.
Side note: Other FE also override the diagnostic, leading to
similar differences, e.g. the C++ FE outputs mangled function
names differently, cf. patch thread.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f:
Use [Ww]arning in dg-bogus as FE diagnostic and default
diagnostic differ and the result depends on ENABLE_OFFLOAD.
* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/classify-serial.f95:
Use [Ww]arning in dg-bogus as FE diagnostic and default
diagnostic differ and the result depends on ENABLE_OFFLOAD.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
|
|
The -fstrict-enums comment in the VR_ANTI_RANGE handling code is out
of date, as out-of-range ranges have already been handled by this
time. I've removed it.
Furthermore, optimizing ~[MIN,MAX] as VARYING instead of UNDEFINED is
an old idiom. I've been wanting to change it for a while, but have
only remembered late in the release cycle when it was too risky.
What I've chosen to do in this case is fall through to the code that
normalizes the range. This will correctly turn ~[MIN,MAX] into
UNDEFINED, yet leaving things alone in the case of -fstrict-enums
where [MIN,MAX] may not necessarily include the entire range of the
underlying precision. For example, if the domain of a strict enum is
[3,5] setting a VR_ANTI_RANGE of ~[3,5] should not cause neither
VR_UNDEFINED nor VR_VARYING, but just plain ~[3,5].
This is similar to what we do for -fstrict-enums when we set a range
of [3,5]. We leave it as a VR_RANGE, instead of upgrading it to
VR_VARYING.
gcc/ChangeLog:
* value-range.cc (irange::irange_set_1bit_anti_range): Add assert.
(irange::set): Call irange_set_1bit_anti_range for handling all
1-bit ranges. Fall through on ~[MIN,MAX].
|
|
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f:
Correct spelling in dg-bogus to match -Wopenacc-parallelism.
* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/classify-serial.f95:
Correct spelling in dg-bogus to match -Wopenacc-parallelism.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
|
|
This does for num_pairs() what my previous patch did for VR_UNDEFINED
and VR_VARYING.
Note that VR_ANTI_RANGE for legacy is always set to 2 ranges. There
is only one way of representing a range, so a range that can be
represented as a VR_RANGE will never have a kind of VR_ANTI_RANGE.
Also legacy symbolics can also use VR_ANTI_RANGE, but no one will ever
ask for the bounds of such range, so m_num_ranges is irrelevant.
gcc/ChangeLog:
* value-range.cc (irange::legacy_num_pairs): Remove.
(irange::invert): Change gcc_assert to gcc_checking_assert.
* value-range.h (irange::num_pairs): Adjust for a cached
num_pairs(). Also, rename all gcc_assert's to
gcc_checking_assert's.
|
|
Currently multi-ranges calculate the undefined and varying bits on the
fly, whereas legacy uses the m_kind field. Since we will always have
space in the irange class for a kind field, might as well keep it in
sync as ranges are created, thus speeding up lookups.
This patch, along with an upcoming ones for num_pairs(), speeds up EVRP
by 8.47%, VRP proper by 1.84% and overall compilation by 0.24%.
FWIW, since evrp is such a fast pass, and is hard to measure clock-wise,
we've been using callgrind to estimate improvements. This has coincided
more or less with -ftime-report numbers (albeit having to run -ftime-report
half a dozen times and use the average).
gcc/ChangeLog:
* value-range.cc (irange::operator=): Set m_kind.
(irange::copy_to_legacy): Handle varying and undefined sources
as a legacy copy since they can be easily copied.
(irange::irange_set): Set m_kind.
(irange::irange_set_anti_range): Same.
(irange::set): Rename normalize_min_max to normalize_kind.
(irange::verify_range): Adjust for multi-ranges having the
m_kind field set.
(irange::irange_union): Set m_kind.
(irange::irange_intersect): Same.
(irange::invert): Same.
* value-range.h (irange::kind): Always return m_kind.
(irange::varying_p): Rename to...
(irange::varying_comptaible_p): ...this.
(irange::undefined_p): Only look at m_kind.
(irange::irange): Always set VR_UNDEFINED if applicable.
(irange::set_undefined): Always set VR_UNDEFINED.
(irange::set_varying): Always set m_kind to VR_VARYING.
(irange::normalize_min_max): Rename to...
(irange::normalize_kind): ...this.
|
|
As of a few releases ago, varying_p() ranges are also constant_p.
Consequently, there is no need to check varying_p from either
symbolic_p or constant_p.
I have adjusted a few users of constant_p that were depending on
constant_p returning false for varying_p. In these cases, I have
placed the varying_p check before the constant_p check to avoid
the more expensive constant_p check when possible.
gcc/ChangeLog:
* gimple-ssa-evrp-analyze.c (evrp_range_analyzer::set_ssa_range_info):
Adjust for constant_p including varying_p.
* tree-vrp.c (vrp_prop::finalize): Same.
(determine_value_range): Same.
* vr-values.c (vr_values::range_of_expr): Same.
* value-range.cc (irange::symbolic_p): Do not check varying_p.
(irange::constant_p): Same.
|
|
A few of the undefined_p checks in the irange code are really checking if
there are sub-ranges. It just so happens that undefined_p is
implemented with num_ranges > 0, so it was a shorthand used throughout.
This shorthand was making the code unreadable.
gcc/ChangeLog:
* value-range.cc (irange::legacy_lower_bound): Replace
!undefined_p check with num_ranges > 0.
(irange::legacy_upper_bound): Same.
* value-range.h (irange::type): Same.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
|
|
When we apply store motion and DSE manually to the bwaves kernel
in gfortran.dg/pr81303.f loop interchange no longer happens because
the perfect nest considered covers outer loops we cannot analyze
strides for. The following compensates for this by shrinking the
nest in this analysis which was already possible but on a too coarse
granularity. It shares the shrinked nest with the rest of the DRs
so the complexity overhead should be negligible.
2021-04-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/99956
* gimple-loop-interchange.cc (compute_access_stride):
Try instantiating the access in a shallower loop nest
if instantiating failed.
(compute_access_strides): Pass adjustable loop_nest
to compute_access_stride.
* gfortran.dg/pr99956.f: New testcase.
|
|
Some of the CMSE tests have 'dg-do run', but qemu-arm does not support
the privileged instructions involved; one has to use qemu-system-arm
for this, which in turn requires modifications to the default
newlib/libgloss startup code to enable the FPU as the FP status
registers need to be saved when using CMSE code.
This patch introduces arm_cmse_hw, similar to arm_neon_hw, to detect
whether the execution engine supports the CMSE instructions. If not,
we set dg-do-what-default to assemble instead of run. We thus remove
all the 'dg-do run' directives from CMSE tests, to rely on
dg-do-what-default instead.
Note that cmse-16.c used to pass with dg-do run under qemu-arm,
because the property being tested is not available (qemu-arm does not
model secure vs non-secure memory). The patch removes dg-do from it
too, since it is relevant only with an adequate simulator.
Before the patch, bitfield-[123].c and struct-1.c fail at execution
under qemu-arm. With the patch, execution is skipped.
The same tests pass under qemu-system-arm both with and without the
patch.
This avoids failures when testing with
-mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp under qemu-arm for
cortex-m33.
I'm also running tests with qemu-system-arm for cortex-m33, but I run
only cmse.exp with a patched newlib in this case: I use qemu-arm for
all combinations except that one because it's faster and supports
semihosting.
I do not have a setup to check this with actual hardware or another
simulator.
2021-04-26 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* doc/sourcebuild.texi (arm_cmse_hw): Document.
gcc/testsuite/
* gcc.target/arm/cmse/bitfield-1.c: Remove dg-do.
* gcc.target/arm/cmse/bitfield-2.c: Likewise.
* gcc.target/arm/cmse/bitfield-3.c: Likewise.
* gcc.target/arm/cmse/cmse-16.c: Likewise.
* gcc.target/arm/cmse/struct-1.c: Likewise.
* gcc.target/arm/cmse/cmse.exp: Set dg-do-what-default depending
on arm_cmse_hw.
* lib/target-supports.exp (check_effective_target_arm_cmse_hw):
New.
|
|
While playing with other unrelated changes I hit an assemble-failure bug where
a pattern (one of the get_lane ones) that was using V4BF, V8BF as part of a
mode iterator and outputting registers with the vwcore attribute,
but there is no vwcore mapping for V4BF and V8BF.
This patch fixes that in the obvious way by adding the missing mappings
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/iterators.md (vwcore): Handle V4BF, V8BF.
|
|
The maximum supported alignment is 64-bit on 32-bit mode.
gcc/testsuite/
* gcc.dg/pr84877.c: XFAIL on SPARC as well.
|
|
... to diagnose potentially suboptimal choices regarding OpenACC parallelism.
Not enabled by default: too noisy ("*potentially* suboptimal choices"); see
XFAILed 'dg-bogus'es.
gcc/c-family/
* c.opt (Wopenacc-parallelism): New.
gcc/fortran/
* lang.opt (Wopenacc-parallelism): New.
gcc/
* omp-offload.c (oacc_validate_dims): Implement
'-Wopenacc-parallelism'.
* doc/invoke.texi (-Wopenacc-parallelism): Document.
gcc/testsuite/
* c-c++-common/goacc/diag-parallelism-1.c: New.
* c-c++-common/goacc/acc-icf.c: Specify '-Wopenacc-parallelism',
and match diagnostics, as appropriate.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/classify-serial.c: Likewise.
* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* c-c++-common/goacc/parallel-dims-1.c: Likewise.
* c-c++-common/goacc/parallel-reduction.c: Likewise.
* c-c++-common/goacc/pr70688.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.
* c-c++-common/goacc/routine-level-of-parallelism-2.c: Likewise.
* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/classify-parallel.f95: Likewise.
* gfortran.dg/goacc/classify-routine.f95: Likewise.
* gfortran.dg/goacc/classify-serial.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
* gfortran.dg/goacc/parallel-tree.f95: Likewise.
* gfortran.dg/goacc/routine-4.f90: Likewise.
* gfortran.dg/goacc/routine-level-of-parallelism-1.f90: Likewise.
* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.
* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Specify
'-Wopenacc-parallelism', and match diagnostics, as appropriate.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
* testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr84028.f90: Likewise.
* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com>
Co-Authored-By: Tom de Vries <vries@codesourcery.com>
Co-Authored-By: Julian Brown <julian@codesourcery.com>
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
|
|
This moves the legacy gimplify_buildN API to tree-vect-generic.c,
its only user and elides the gimplification step, making it a wrapper
around gimple_build, adjusting tree_vec_extract for this.
I've noticed that vector CTOR expansion doesn't deal with unfolded
{} and thus this makes it more resilent. I've also adjusted the
match.pd vector CTOR extraction code to make sure it doesn't
produce a CTOR when folding would make it a vector constant.
2021-04-15 Richard Biener <rguenther@suse.de>
* tree-cfg.h (gimplify_build1): Remove.
(gimplify_build2): Likewise.
(gimplify_build3): Likewise.
* tree-cfg.c (gimplify_build1): Move to tree-vect-generic.c.
(gimplify_build2): Likewise.
(gimplify_build3): Likewise.
* tree-vect-generic.c (gimplify_build1): Move from tree-cfg.c.
Modernize.
(gimplify_build2): Likewise.
(gimplify_build3): Likewise.
(tree_vec_extract): Use resimplify with following SSA edges.
(expand_vector_parallel): Avoid passing NULL size/bitpos
to tree_vec_extract.
* expr.c (store_constructor): Deal with zero-element CTORs.
* match.pd (bit_field_ref <vector CTOR>): Make sure to
produce vector constants when possible.
|
|
This removes the legacy gimplify_buildN API use from complex lowering.
2021-04-15 Richard Biener <rguenther@suse.de>
* tree-complex.c: Include gimple-fold.h.
(expand_complex_addition): Use gimple_build.
(expand_complex_multiplication_components): Likewise.
(expand_complex_multiplication): Likewise.
(expand_complex_div_straight): Likewise.
(expand_complex_div_wide): Likewise.
(expand_complex_division): Likewise.
(expand_complex_conjugate): Likewise.
(expand_complex_comparison): Likewise.
|
|
This removes use of the legacy gimplify_buildN API from phiopt.
2021-04-15 Richard Biener <rguenther@suse.de>
* tree-ssa-phiopt.c (two_value_replacement): Remove use
of legacy gimplify_buildN API.
|
|
This fixes the pre-condition on cselim to include all references
and decls when they end up as auto-var.
Bootstrapped/tested on x86_64-linux
2021-03-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/99473
* tree-ssa-phiopt.c (cond_store_replacement): Handle all
stores.
* gcc.dg/tree-ssa/pr99473-1.c: New testcase.
|
|
This removes update_call_from_tree in favor of
gimplify_and_update_call_from_tree, removing some code duplication
and simplifying the API use. Some users of update_call_from_tree
have been transitioned to replace_call_with_value and the API
and its dependences have been moved to gimple-fold.h.
This shaves off another user of valid_gimple_rhs_p which is now
only used from within gimple-fold.c and thus moved and made private.
2021-04-14 Richard Biener <rguenther@suse.de>
* tree-ssa-propagate.h (valid_gimple_rhs_p): Remove.
(update_gimple_call): Likewise.
(update_call_from_tree): Likewise.
* tree-ssa-propagate.c (valid_gimple_rhs_p): Remove.
(valid_gimple_call_p): Likewise.
(move_ssa_defining_stmt_for_defs): Likewise.
(finish_update_gimple_call): Likewise.
(update_gimple_call): Likewise.
(update_call_from_tree): Likewise.
(propagate_tree_value_into_stmt): Use replace_call_with_value.
* gimple-fold.h (update_gimple_call): Declare.
* gimple-fold.c (valid_gimple_rhs_p): Move here from
tree-ssa-propagate.c.
(update_gimple_call): Likewise.
(valid_gimple_call_p): Likewise.
(finish_update_gimple_call): Likewise, and simplify.
(gimplify_and_update_call_from_tree): Implement
update_call_from_tree functionality, avoid excessive
push/pop_gimplify_context.
(gimple_fold_builtin): Use only gimplify_and_update_call_from_tree.
(gimple_fold_call): Likewise.
* gimple-ssa-sprintf.c (try_substitute_return_value): Likewise.
* tree-ssa-ccp.c (ccp_folder::fold_stmt): Likewise.
(pass_fold_builtins::execute): Likewise.
(optimize_stack_restore): Use replace_call_with_value.
* tree-cfg.c (fold_loop_internal_call): Likewise.
* tree-ssa-dce.c (maybe_optimize_arith_overflow): Use
only gimplify_and_update_call_from_tree.
* tree-ssa-strlen.c (handle_builtin_strlen): Likewise.
(handle_builtin_strchr): Likewise.
* tsan.c: Include gimple-fold.h instead of tree-ssa-propagate.h.
* config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
Use replace_call_with_value.
|
|
register keyword was removed in C++17, and in vmsdbgout.c it served no
useful purpose.
2021-04-26 Jakub Jelinek <jakub@redhat.com>
PR debug/100255
* vmsdbgout.c (ASM_OUTPUT_DEBUG_STRING, vmsdbgout_begin_block,
vmsdbgout_end_block, lookup_filename, vmsdbgout_source_line): Remove
register keywords.
|
|
|
|
target/98911]
gcc/ChangeLog:
PR target/98911
* config/i386/i386-builtin.def (BDESC): Change the icode of
the following builtins to CODE_FOR_nothing.
* config/i386/i386.c (ix86_gimple_fold_builtin): Fold
IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128,
IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ,
IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256,
IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256,
IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128,
IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ,
IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256,
IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256.
* config/i386/sse.md (avx2_eq<mode>3): Deleted.
(sse2_eq<mode>3): Ditto.
(sse4_1_eqv2di3): Ditto.
(sse2_gt<mode>3): Rename to ..
(*sse2_gt<mode>3): .. this.
gcc/testsuite/ChangeLog:
PR target/98911
* gcc.target/i386/pr98911.c: New test.
* gcc.target/i386/funcspec-8.c: Replace __builtin_ia32_pcmpgtq
with __builtin_ia32_pcmpistrm128 since it has been folded.
|
|
|
|
PR analyzer/100244 reports an ICE on a -Wanalyzer-free-of-non-heap
due to a case where free_of_non_heap::describe_state_change can be
passed a NULL change.m_expr for a suitably complicated symbolic value.
Bulletproof it by checking for change.m_expr being NULL before
dereferencing it.
gcc/analyzer/ChangeLog:
PR analyzer/100244
* sm-malloc.cc (free_of_non_heap::describe_state_change):
Bulletproof against change.m_expr being NULL.
gcc/testsuite/ChangeLog:
PR analyzer/100244
* g++.dg/analyzer/pr100244.C: New test.
|