path: root/gcc
Age | Commit message | Author | Files | Lines
2021-04-27 | Fix handling of VEC_COND_EXPR trap tests [PR100284] | Richard Sandiford | 3 | -6/+13
Now that VEC_COND_EXPR has normal unnested operands, operation_could_trap_p can treat it like any other expression. This fixes many testsuite ICEs for SVE, but it turns out that none of the tests in gcc.target/aarch64/sve were affected. Anyone testing on non-SVE aarch64 therefore wouldn't have seen it. gcc/ PR middle-end/100284 * gimple.c (gimple_could_trap_p_1): Remove VEC_COND_EXPR test. * tree-eh.c (operation_could_trap_p): Handle VEC_COND_EXPR rather than asserting on it. gcc/testsuite/ PR middle-end/100284 * gcc.target/aarch64/sve/pr81003.c: New test.
2021-04-27 | Remove malformed dg-warning directives. | Martin Sebor | 1 | -14/+10
gcc/testsuite/ChangeLog: PR testsuite/100272 * g++.dg/ext/flexary13.C: Remove malformed directives.
2021-04-27 | powerpc: fix bootstrap. | David Edelsohn | 1 | -0/+2
gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_aix_precompute_tls_p): Protect with TARGET_AIX_OS.
2021-04-27 | aix: TLS precompute register parameters (PR 94177) | David Edelsohn | 6 | -2/+40
AIX uses a compiler-managed TOC for global data, including TLS symbols. The GCC TOC implementation manages the TOC entries through the constant pool. TLS symbols sometimes require a function call to obtain the TLS base pointer. The arguments to the TLS call can conflict with arguments to a normal function call if the TLS symbol is an argument in the normal call. GCC specifically checks for this situation and precomputes the TLS arguments, but the mechanism to check for this requirement utilizes legitimate_constant_p(). The necessary result of legitimate_constant_p() for correct TOC behavior and for correct TLS argument behavior is in conflict. This patch adds a new target hook precompute_tls_p() to decide if an argument should be precomputed regardless of the result from legitimate_constant_p(). gcc/ChangeLog: PR target/94177 * calls.c (precompute_register_parameters): Additionally test targetm.precompute_tls_p to pre-compute argument. * config/rs6000/aix.h (TARGET_PRECOMPUTE_TLS_P): Define. * config/rs6000/rs6000.c (rs6000_aix_precompute_tls_p): New. * target.def (precompute_tls_p): New. * doc/tm.texi.in (TARGET_PRECOMPUTE_TLS_P): Add hook documentation. * doc/tm.texi: Regenerated.
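To make the conflict concrete, here is a minimal sketch (the function names and the TLS variable are invented for illustration, not taken from the PR): passing a TLS variable directly as a call argument means that setting up that argument may itself require a call.

    // Hypothetical illustration of the conflict described above: a TLS
    // variable used directly as an argument of an ordinary call.  On AIX,
    // reading tls_var may require a helper call to obtain the TLS base
    // pointer, which must not clobber the argument registers already set
    // up for callee(); the new precompute_tls_p hook lets the target force
    // such arguments to be precomputed into a temporary first.
    extern void callee (int first, int second);

    __thread int tls_var;

    void
    caller (int x)
    {
      callee (x, tls_var);
    }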
2021-04-27 | aarch64: Fix up last commit [PR100200] | Jakub Jelinek | 1 | -1/+1
Pedantically signed vs. unsigned mismatches in va_arg are only well defined if the value can be represented in both signed and unsigned integer types. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR target/100200 * config/aarch64/aarch64.c (aarch64_print_operand): Cast -UINTVAL back to HOST_WIDE_INT.
2021-04-27 | Fix target/100106 ICE in gen_movdi | Bernd Edlinger | 2 | -0/+12
As the test case shows, the outer mode may have a higher alignment requirement than the inner mode here. 2021-04-27 Bernd Edlinger <bernd.edlinger@hotmail.de> PR target/100106 * simplify-rtx.c (simplify_context::simplify_subreg): Check the memory alignment for the outer mode. * gcc.c-torture/compile/pr100106.c: New testcase.
2021-04-27 | op_by_pieces_d::run: Change a while loop to a do-while loop | H.J. Lu | 1 | -23/+53
Change a while loop in op_by_pieces_d::run to a do-while loop to prepare for offset-adjusted operation for the remaining bytes on the last piece operation of a memory region. PR middle-end/90773 * expr.c (op_by_pieces_d::get_usable_mode): New member function. (op_by_pieces_d::run): Change a while loop to a do-while loop.
2021-04-27 | arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977] | Alex Coplan | 4 | -18/+57
The PR shows two ICEs with __sync_bool_compare_and_swap and -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one later on, after the CAS insn is split. The LRA ICE occurs because the @atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie two output operands together (operands 0 and 1 in the third alternative). LRA can't handle this, since it doesn't make sense for an insn to assign to the same operand twice. The later (post-splitting) ICE occurs because the expansion of the cbranchsi4_scratch insn doesn't quite go according to plan. As it stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch, attempting to pass a register (neg_bval) to use as a scratch register. However, since the RTL template has a match_scratch here, gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx. Since this is all happening after RA, this is doomed to fail (and we get an ICE about the insn not matching its constraints). It seems that the motivation for the choice of constraints in the atomic_compare_and_swap pattern comes from an attempt to satisfy the constraints of the cbranchsi4_scratch insn. This insn requires the scratch register to be the same as the input register in the case that we use a larger negative immediate (one that satisfies J, but not L). Of course, as noted above, LRA refuses to assign two output operands to the same register, so this was never going to work. The solution I'm proposing here is to collapse the alternatives to the CAS insn (allowing the two output register operands to be matched to different registers) and to ensure that the constraints for cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by inserting a move to ensure the source and destination registers match if necessary (i.e. in the case of large negative immediates). Another notable change here is that we only do: emit_move_insn (neg_bval, const1_rtx); for non-negative immediates. This is because the ADDS instruction used in the negative case suffices to leave a suitable value in neg_bval: if the operands compare equal, we don't take the branch (so neg_bval will be set by the load exclusive). Otherwise, the ADDS will leave a nonzero value in neg_bval, which will correctly signal that the CAS has failed when it is later negated. gcc/ChangeLog: PR target/99977 * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen with negative immediates: ensure we expand cbranchsi4_scratch correctly and ensure we satisfy its constraints. * config/arm/sync.md (@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't attempt to tie two output operands together with constraints; collapse two alternatives. (@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise. * config/arm/thumb1.md (cbranchsi4_neg_late): New. gcc/testsuite/ChangeLog: PR target/99977 * gcc.target/arm/pr99977.c: New test.
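For reference, the source shape that reaches this path is just a plain __sync_bool_compare_and_swap compiled for armv8-m.base; the sketch below is illustrative only (the constant is made up to be a negative immediate and is not the testcase added by the patch).

    // Minimal sketch, assuming compilation with -mcpu=cortex-m23 (or
    // -march=armv8-m.base).  Splitting this builtin is where
    // arm_split_compare_and_swap now has to satisfy the constraints of
    // cbranchsi4_scratch, inserting an extra move for large negative
    // expected values.
    bool
    try_update (int *p)
    {
      return __sync_bool_compare_and_swap (p, -64, 1);
    }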
2021-04-27 | aarch64: Fix UB in the compiler [PR100200] | Jakub Jelinek | 3 | -7/+8
The following patch fixes UBs in the compiler when negativing a CONST_INT containing HOST_WIDE_INT_MIN. I've changed the spots where there wasn't an obvious earlier condition check or predicate that would fail for such CONST_INTs. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR target/100200 * config/aarch64/predicates.md (aarch64_sub_immediate, aarch64_plus_immediate): Use -UINTVAL instead of -INTVAL. * config/aarch64/aarch64.md (casesi, rotl<mode>3): Likewise. * config/aarch64/aarch64.c (aarch64_print_operand, aarch64_split_atomic_op, aarch64_expand_subvti): Likewise.
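The underlying issue is ordinary signed-overflow UB: negating the most negative value of a signed type overflows, whereas negating in the corresponding unsigned type is well defined. A stand-alone illustration of the -UINTVAL idiom, using long long in place of GCC's HOST_WIDE_INT purely for the sketch:

    #include <climits>

    // -v is undefined behaviour when v == LLONG_MIN; negating in the
    // unsigned type is well defined and yields the same bit pattern.
    // (The conversion back to the signed type is implementation-defined
    // before C++20; GCC documents it as modulo 2^N.)
    long long
    negate_like_uintval (long long v)
    {
      return (long long) -(unsigned long long) v;
    }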
2021-04-27 | veclower: Fix up vec_shl matching of VEC_PERM_EXPR [PR100239] | Jakub Jelinek | 2 | -1/+13
The following testcase ICEs at -O0, because lower_vec_perm sees the _1 = { 0, 0, 0, 0, 0, 0, 0, 0 }; _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>; _3 = { 6, 0, 0, 0, 0, 0, 0, 0 }; _4 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _2, _3>; and as the ISA is SSE2, there is no support for the particular permutation nor for variable mask permutation. But, the code to match vec_shl matches it, because the permutation has the first operand a zero vector and the mask picks all elements randomly from that vector. So, in the end that isn't a vec_shl, but the permutation could be in theory optimized into the first argument. As we keep it as is, it will fail during expansion though, because that for vec_shl correctly requires that it actually is a shift: unsigned firstidx = 0; for (unsigned int i = 0; i < nelt; i++) { if (known_eq (sel[i], nelt)) { if (i == 0 || firstidx) return NULL_RTX; firstidx = i; } else if (firstidx ? maybe_ne (sel[i], nelt + i - firstidx) : maybe_ge (sel[i], nelt)) return NULL_RTX; } if (firstidx == 0) return NULL_RTX; first = firstidx; The if (firstidx == 0) return NULL; is what is missing a counterpart on the lower_vec_perm side. As with optimize != 0 we fold it in other spots, I think it is not needed to optimize this cornercase in lower_vec_perm (which would mean we'd need to recurse on the newly created _4 = { 0, 0, 0, 0, 0, 0, 0, 0 }; whether it is supported or not). 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/100239 * tree-vect-generic.c (lower_vec_perm): Don't accept constant permutations with all indices from the first zero element as vec_shl. * gcc.dg/pr100239.c: New test.
2021-04-27 | cfgcleanup: Fix -fcompare-debug issue in outgoing_edges_match [PR100254] | Jakub Jelinek | 2 | -2/+103
The following testcase fails with -fcompare-debug. The problem is that outgoing_edges_match behaves differently between -g0 and -g, if some load/store with REG_EH_REGION is followed by DEBUG_INSNs, the REG_EH_REGION check is not done, while when there are no DEBUG_INSNs, it is done. We already compute last1 and last2 as BB_END (bb{1,2}) with skipped debug insns and notes, so this patch just uses those. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/100254 * cfgcleanup.c (outgoing_edges_match): Check REG_EH_REGION on last1 and last2 insns rather than BB_END (bb1) and BB_END (bb2) insns. * g++.dg/opt/pr100254.C: New test.
2021-04-27 | tree-optimization/99912 - schedule another TODO_remove_unused_locals | Richard Biener | 2 | -1/+4
This makes sure to remove unused locals and prune CLOBBERs after the first scalar cleanup phase after IPA optimizations. On the testcase in the PR this results in 8000 CLOBBERs removed which in turn unleashes more DSE which otherwise hits its walking limit of 256 too early on this testcase. 2021-04-27 Richard Biener <rguenther@suse.de> PR tree-optimization/99912 * passes.def: Add comment about new TODO_remove_unused_locals. * tree-stdarg.c (pass_data_stdarg): Run TODO_remove_unused_locals at start.
2021-04-27 | tree-optimization/99912 - schedule DSE before SRA | Richard Biener | 5 | -5/+11
For the testcase in the PR the main SRA pass is unable to do some important scalarizations because dead stores of addresses disqualify the candidate variables. The following patch adds another DSE pass before SRA, forming a DCE/DSE pair, and moves the DSE pass that is currently closely after SRA up to after the next DCE pass, forming another DCE/DSE pair now residing after PRE. 2021-04-07 Richard Biener <rguenther@suse.de> PR tree-optimization/99912 * passes.def (pass_all_optimizations): Add pass_dse before the first pass_dce, move the first pass_dse before the pass_dce following pass_pre. * gcc.dg/tree-ssa/ldist-33.c: Disable PRE and LIM. * gcc.dg/tree-ssa/pr96789.c: Adjust dump file scanned. * gcc.dg/tree-ssa/ssa-dse-28.c: Likewise. * gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
2021-04-27 | match.pd: Add some __builtin_ctz (x) cmp cst simplifications [PR95527] | Jakub Jelinek | 4 | -12/+138
This patch adds some ctz simplifications (e.g. ctz (x) >= 3 can be done by testing if the low 3 bits are zero, etc.). In addition, I've noticed that in the CLZ case the #ifdef CLZ_DEFINED_VALUE_AT_ZERO checks don't really work as intended: they are evaluated during genmatch, when the macro is not defined (and, because of the missing tm.h includes, it isn't defined in gimple-match.c or generic-match.c either). And when tm.h is included, defaults.h is included, which defines a fallback version of that macro. For GCC 12, I wonder if it wouldn't be better to say that, in addition to __builtin_c[lt]z* always being UB at zero, the .C[LT]Z ifn is also undefined at zero when it has just one operand, and to use a second operand for the constant we expect at zero. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/95527 * generic-match-head.c: Include tm.h. * gimple-match-head.c: Include tm.h. * match.pd (CLZ == INTEGER_CST): Don't use #ifdef CLZ_DEFINED_VALUE_AT_ZERO, only test CLZ_DEFINED_VALUE_AT_ZERO if clz == CFN_CLZ. Add missing val declaration. (CTZ cmp CST): New simplifications. * gcc.dg/tree-ssa/pr95527-2.c: New test.
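As a concrete instance of the new CTZ rule: for nonzero x (recall that __builtin_ctz is undefined at zero), ctz (x) >= 3 holds exactly when the three lowest bits of x are clear, so the comparison can be replaced by a single mask test. A small sketch of the equivalence:

    // Equivalent for x != 0; the second form is what the new match.pd
    // simplification effectively produces for the first.
    bool
    low3_clear_via_ctz (unsigned x)
    {
      return __builtin_ctz (x) >= 3;
    }

    bool
    low3_clear_via_mask (unsigned x)
    {
      return (x & 7) == 0;
    }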
2021-04-27 | expand: Expand x / y * y as x - x % y if the latter is cheaper [PR96696] | Jakub Jelinek | 2 | -58/+162
The following patch tests both x / y * y and x - x % y expansion for the former GIMPLE code and chooses the cheaper of those sequences. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/96696 * expr.c (expand_expr_divmod): New function. (expand_expr_real_2) <case TRUNC_DIV_EXPR>: Use it for truncations and divisions. Formatting fixes. <case MULT_EXPR>: Optimize x / y * y as x - x % y if the latter is cheaper. * gcc.target/i386/pr96696.c: New test.
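The two forms are arithmetically identical whenever y is nonzero, because x % y is defined as x - (x / y) * y; the patch simply expands both and keeps whichever RTL sequence is cheaper on the target. A sketch of the source-level identity:

    // For y != 0 both functions round x down to a multiple of y.  Which
    // expansion wins depends on the relative cost of the division,
    // modulus and multiplication patterns, so expand_expr_divmod costs
    // both sequences.
    unsigned
    round_down_muldiv (unsigned x, unsigned y)
    {
      return x / y * y;
    }

    unsigned
    round_down_mod (unsigned x, unsigned y)
    {
      return x - x % y;
    }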
2021-04-27 | ipa-sra: Release dead LHS SSA_NAME when removing it (PR 99951) | Martin Jambor | 1 | -0/+4
When IPA-SRA removes an SSA_NAME from a LHS of a call statement because it is not necessary, it does not release it. This patch fixes that. gcc/ChangeLog: 2021-04-08 Martin Jambor <mjambor@suse.cz> PR ipa/99951 * ipa-param-manipulation.c (ipa_param_adjustments::modify_call): If removing a call statement LHS SSA name, release it.
2021-04-27 | arm: fix UB when compiling thumb2 with PIC [PR100236] | Richard Earnshaw | 1 | -3/+7
arm_compute_save_core_reg_mask contains UB in that the saved PIC register number is used to create a bit mask. However, for some target options this register is undefined and we end up with a shift of ~0. On native compilations this is benign since the shift will still be large enough to move the bit outside of the range of the mask, but if cross compiling from a system that truncates out-of-range shifts to zero (or worse, raises a trap for such values) we'll get potentially wrong code (or a fault). gcc: PR target/100236 * config/arm/arm.c (THUMB2_WORK_REGS): Check PIC_OFFSET_TABLE_REGNUM is valid before including it in the mask.
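The shape of the fix is the usual "validate the register number before using it as a shift count" guard; a minimal sketch under invented names (this is not the actual macro body):

    #include <cstdint>

    constexpr unsigned INVALID_REGNO = ~0u;   // hypothetical sentinel

    // If pic_regno can be the invalid sentinel, shifting by it is
    // undefined behaviour, so the validity check has to come first.
    uint32_t
    add_pic_reg_to_mask (uint32_t mask, unsigned pic_regno, bool pic_live)
    {
      if (pic_live && pic_regno != INVALID_REGNO && pic_regno < 32)
        mask |= uint32_t (1) << pic_regno;
      return mask;
    }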
2021-04-27 | aarch64: Handle SVE attributes in comp_type_attributes [PR100270] | Richard Sandiford | 4 | -49/+166
Even though "SVE type" and "SVE sizeless type" are marked as affecting type identity, the middle end doesn't truly believe it unless we also handle them in comp_type_attributes. gcc/ PR target/100270 * config/aarch64/aarch64.c (aarch64_comp_type_attributes): Handle SVE attributes. gcc/testsuite/ PR target/100270 * gcc.target/aarch64/sve/acle/general-c/pr100270_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Change expected error message when subtracting pointers to different vector types. Expect warnings when mixing them elsewhere. * gcc.target/aarch64/sve/acle/general/attributes_7.c: Remove XFAILs. Tweak error messages for some cases.
2021-04-27 | aarch64: Add +nosve to two tests | Richard Sandiford | 2 | -2/+4
Adding +nosve is more robust than checking for command-line arguments, since SVE can be enabled by default or indirectly via other options. gcc/testsuite/ * gcc.target/aarch64/simd/ssra.c: Use +nosve * gcc.target/aarch64/simd/usra.c: Likewise.
2021-04-27 | tree-optimization/100051 - disambiguate access size vs decl | Richard Biener | 2 | -0/+32
This adds disambiguation of the access size vs. the decl size in the pointer-based vs. decl-based disambiguator. We have a TBAA-based check like this already, but that check is fended off when seeing alias-sets of zero or when -fno-strict-aliasing is in effect. Also, the perceived dynamic type could be smaller than the actual access. 2021-04-13 Richard Biener <rguenther@suse.de> PR tree-optimization/100051 * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Add disambiguator based on access size vs. decl size. * gcc.dg/tree-ssa/ssa-fre-92.c: New testcase.
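The rule is the intuitive one: an indirect access that is larger than a declared object cannot lie entirely within that object, so the two references cannot overlap. A hedged sketch of the kind of code the new check disambiguates, even with -fno-strict-aliasing:

    // The 4-byte access *p cannot refer to the 1-byte object c, because
    // the access would not fit inside the declared variable; the oracle
    // can now disambiguate the two regardless of TBAA.
    char c;

    int
    load_around_stores (int *p)
    {
      c = 1;
      int v = *p;   // cannot alias 'c'
      c = 2;        // so the first store to 'c' is dead
      return v;
    }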
2021-04-27 | testsuite/100272 - undo PRE disabling for gcc.dg/tree-ssa/predcom-1.c | Richard Biener | 1 | -4/+3
This re-enables PRE and fixes the malformed dg directive pointed out in the PR. It all works as desired and I forgot why I disabled this in the past. 2021-04-27 Richard Biener <rguenther@suse.de> PR testsuite/100272 * gcc.dg/tree-ssa/predcom-1.c: Re-enable PRE and fix malformed dg directive.
2021-04-27 | testsuite/100272 - fix some malformed dg directives | Richard Biener | 5 | -7/+7
The bug points out several malformed dg directives, the following fixes the obvious ones where the testcases keep working after the change. 2021-04-27 Richard Biener <rguenther@suse.de> PR testsuite/100272 * g++.dg/diagnostic/ptrtomem1.C: Fix dg directives. * g++.dg/ipa/pr45572-2.C: Likewise. * g++.dg/template/spec26.C: Likewise. * gcc.dg/pr20126.c: Likewise. * gcc.dg/tree-ssa/pr20739.c: Likewise.
2021-04-27 | tree-optimization/100278 - handle mismatched code in TBAA adjust of PRE | Richard Biener | 2 | -0/+27
PRE has code to adjust TBAA behavior for refs which expects the base operation code to match. The testcase shows a case where we have a VAR_DECL vs. a MEM_REF, so add code to give up in such cases. 2021-04-27 Richard Biener <rguenther@suse.de> PR tree-optimization/100278 * tree-ssa-pre.c (compute_avail): Give up when we cannot adjust TBAA because of mismatching bases. * gcc.dg/tree-ssa/pr100278.c: New testcase.
2021-04-27 | i386: Improve [QH]Imode rotates with masked shift count [PR99405] | Jakub Jelinek | 2 | -19/+42
The following testcase shows that while we nicely optimize away the useless and? of shift count before rotation for [SD]Imode rotates, we don't do that for [QH]Imode. The following patch optimizes that by using the right iterator on those 4 patterns. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR target/99405 * config/i386/i386.md (*<insn><mode>3_mask, *<insn><mode>3_mask_1): For any_rotate define_insn_split and following splitters, use SWI iterator instead of SWI48. * gcc.target/i386/pr99405.c: New test.
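The source pattern in question is the standard rotate idiom with a pre-masked count; a sketch for the 8-bit case (not the PR's exact testcase): the n &= 7 keeps the C shifts well defined, but the hardware rotate produces the same result without it, so the separate and can be dropped when the rotate pattern is matched.

    // Rotate an 8-bit value left by n.  With the patch, the explicit
    // masking no longer survives into the generated code for QImode and
    // HImode rotates, matching what was already done for SImode/DImode.
    unsigned char
    rotl8 (unsigned char x, unsigned n)
    {
      n &= 7;
      return (unsigned char) ((x << n) | (x >> ((8 - n) & 7)));
    }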
2021-04-27 | tree-optimization/99776 - relax condition on vector ctor element extract | Richard Biener | 2 | -5/+30
This relaxes the condition for the match.pd pattern doing vector ctor element extracts to not require type identity but only size equality. Since we vectorize pointer data as unsigned integer data such mismatches have to be tolerated to optimize scalar code uses of vector results. 2021-03-26 Richard Biener <rguenther@suse.de> PR tree-optimization/99776 * match.pd (bit_field_ref (ctor)): Relax element extract type compatibility checks. * gcc.dg/tree-ssa/ssa-fre-91.c: New testcase.
2021-04-27 | Synchronize Rocket Lake's processor_names and processor_cost_table with processor_type | Cui,Lili | 2 | -2/+2
gcc/ChangeLog * common/config/i386/i386-common.c (processor_names): Sync processor_names with processor_type. * config/i386/i386-options.c (processor_cost_table): Sync processor_cost_table with processor_type.
2021-04-27 | Daily bump. | GCC Administrator | 6 | -1/+279
2021-04-26 | c++: constexpr pointer indirection with negative offset [PR100209] | Patrick Palka | 3 | -3/+68
During constexpr evaluation, a base-to-derived conversion may yield an expression like (Derived*)(&D.2217.D.2106 p+ -4) where D.2217 is the derived object and D.2106 is the base. But cxx_fold_indirect_ref doesn't know how to resolve an INDIRECT_REF thereof to just D.2217, because it doesn't handle POINTER_PLUS_EXPR of a COMPONENT_REF with negative offset well: when the offset N is positive, it knows that '&x p+ N' is equivalent to '&x.f p+ (N - bytepos(f))', but it doesn't know about the reverse transformation, that '&x.f p+ N' is equivalent to '&x p+ (N + bytepos(f))' when N is negative, which is important for resolving such base-to-derived conversions and for accessing subobjects backwards. This patch teaches cxx_fold_indirect_ref this reverse transformation. gcc/cp/ChangeLog: PR c++/100209 * constexpr.c (cxx_fold_indirect_ref): Try to canonicalize the object/offset pair for a POINTER_PLUS_EXPR of a COMPONENT_REF with a negative offset into one whose offset is nonnegative before calling cxx_fold_indirect_ref_1. gcc/testsuite/ChangeLog: PR c++/100209 * g++.dg/cpp1y/constexpr-base1.C: New test. * g++.dg/cpp1y/constexpr-ptrsub1.C: New test.
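A minimal sketch of the kind of code involved (invented for illustration, not the PR's testcase): with two base classes, converting a pointer to the second base back to the derived type subtracts the base's offset, producing exactly the COMPONENT_REF-plus-negative-offset form described above.

    struct A { int a; };
    struct B { int b; };
    struct D : A, B
    {
      int d;
      constexpr D () : A{1}, B{2}, d (42) {}
    };

    constexpr int
    roundtrip ()
    {
      D obj;
      B *bp = &obj;                   // derived-to-base: adds the offset of the B subobject
      D *dp = static_cast<D *> (bp);  // base-to-derived: subtracts it again (negative offset)
      return dp->d;                   // relies on cxx_fold_indirect_ref resolving this
    }

    static_assert (roundtrip () == 42, "evaluated at compile time");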
2021-04-26 | OpenACC: Fix pattern in dg-bogus in Fortran testcases again | Tobias Burnus | 3 | -5/+5
It turned out that a compiler built without offloading support and one with can produce slightly different diagnostic. Offloading support implies ENABLE_OFFLOAD which implies that g->have_offload is set when offloading is actually needed. In cgraphunit.c, the latter causes flag_generate_offload = 1, which in turn affects tree.c's free_lang_data. The result is that the front-end specific diagnostic gets reset ('tree_diagnostics_defaults (global_dc)'), which affects in this case 'Warning' vs. 'warning' via the Fortran frontend. Result: 'Warning:' vs. 'warning:'. Side note: Other FE also override the diagnostic, leading to similar differences, e.g. the C++ FE outputs mangled function names differently, cf. patch thread. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: Use [Ww]arning in dg-bogus as FE diagnostic and default diagnostic differ and the result depends on ENABLE_OFFLOAD. * testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/goacc/classify-serial.f95: Use [Ww]arning in dg-bogus as FE diagnostic and default diagnostic differ and the result depends on ENABLE_OFFLOAD. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. * gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
2021-04-26 | Handle anti-ranges of MIN,MAX uniformly. | Aldy Hernandez | 1 | -16/+10
The -fstrict-enums comment in the VR_ANTI_RANGE handling code is out of date, as out-of-range ranges have already been handled by this time. I've removed it. Furthermore, optimizing ~[MIN,MAX] as VARYING instead of UNDEFINED is an old idiom. I've been wanting to change it for a while, but have only remembered late in the release cycle when it was too risky. What I've chosen to do in this case is fall through to the code that normalizes the range. This will correctly turn ~[MIN,MAX] into UNDEFINED, while leaving things alone in the case of -fstrict-enums, where [MIN,MAX] may not necessarily include the entire range of the underlying precision. For example, if the domain of a strict enum is [3,5], setting a VR_ANTI_RANGE of ~[3,5] should yield neither VR_UNDEFINED nor VR_VARYING, but just plain ~[3,5]. This is similar to what we do for -fstrict-enums when we set a range of [3,5]: we leave it as a VR_RANGE instead of upgrading it to VR_VARYING. gcc/ChangeLog: * value-range.cc (irange::irange_set_1bit_anti_range): Add assert. (irange::set): Call irange_set_1bit_anti_range for handling all 1-bit ranges. Fall through on ~[MIN,MAX].
2021-04-26 | OpenACC: Fix pattern in dg-bogus in Fortran testcases | Tobias Burnus | 3 | -5/+5
libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: Correct spelling in dg-bogus to match -Wopenacc-parallelism. * testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/goacc/classify-serial.f95: Correct spelling in dg-bogus to match -Wopenacc-parallelism. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. * gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
2021-04-26 | Cache irange::num_pairs() for non-legacy code. | Aldy Hernandez | 2 | -39/+9
This does for num_pairs() what my previous patch did for VR_UNDEFINED and VR_VARYING. Note that VR_ANTI_RANGE for legacy is always set to 2 ranges. There is only one way of representing a range, so a range that can be represented as a VR_RANGE will never have a kind of VR_ANTI_RANGE. Also legacy symbolics can also use VR_ANTI_RANGE, but no one will ever ask for the bounds of such range, so m_num_ranges is irrelevant. gcc/ChangeLog: * value-range.cc (irange::legacy_num_pairs): Remove. (irange::invert): Change gcc_assert to gcc_checking_assert. * value-range.h (irange::num_pairs): Adjust for a cached num_pairs(). Also, rename all gcc_assert's to gcc_checking_assert's.
2021-04-26 | Keep VR_UNDEFINED and VR_VARYING in sync (speeds up evrp by 8.47%). | Aldy Hernandez | 2 | -72/+63
Currently multi-ranges calculate the undefined and varying bits on the fly, whereas legacy uses the m_kind field. Since we will always have space in the irange class for a kind field, we might as well keep it in sync as ranges are created, thus speeding up lookups. This patch, along with upcoming ones for num_pairs(), speeds up EVRP by 8.47%, VRP proper by 1.84% and overall compilation by 0.24%. FWIW, since evrp is such a fast pass, and is hard to measure clock-wise, we've been using callgrind to estimate improvements. This has coincided more or less with -ftime-report numbers (albeit having to run -ftime-report half a dozen times and use the average). gcc/ChangeLog: * value-range.cc (irange::operator=): Set m_kind. (irange::copy_to_legacy): Handle varying and undefined sources as a legacy copy since they can be easily copied. (irange::irange_set): Set m_kind. (irange::irange_set_anti_range): Same. (irange::set): Rename normalize_min_max to normalize_kind. (irange::verify_range): Adjust for multi-ranges having the m_kind field set. (irange::irange_union): Set m_kind. (irange::irange_intersect): Same. (irange::invert): Same. * value-range.h (irange::kind): Always return m_kind. (irange::varying_p): Rename to... (irange::varying_compatible_p): ...this. (irange::undefined_p): Only look at m_kind. (irange::irange): Always set VR_UNDEFINED if applicable. (irange::set_undefined): Always set VR_UNDEFINED. (irange::set_varying): Always set m_kind to VR_VARYING. (irange::normalize_min_max): Rename to... (irange::normalize_kind): ...this.
2021-04-26 | Remove irange::varying_p checks from symbolic_p and constant_p. | Aldy Hernandez | 4 | -12/+7
As of a few releases ago, varying_p() ranges are also constant_p. Consequently, there is no need to check varying_p from either symbolic_p or constant_p. I have adjusted a few users of constant_p that were depending on constant_p returning false for varying_p. In these cases, I have placed the varying_p check before the constant_p check to avoid the more expensive constant_p check when possible. gcc/ChangeLog: * gimple-ssa-evrp-analyze.c (evrp_range_analyzer::set_ssa_range_info): Adjust for constant_p including varying_p. * tree-vrp.c (vrp_prop::finalize): Same. (determine_value_range): Same. * vr-values.c (vr_values::range_of_expr): Same. * value-range.cc (irange::symbolic_p): Do not check varying_p. (irange::constant_p): Same.
2021-04-26 | Replace !irange::undefined_p checks with num_ranges > 0 for readability. | Aldy Hernandez | 2 | -5/+5
A few of the undefined_p checks in the irange code are really checking if there are sub-ranges. It just so happens that undefined_p is implemented with num_ranges > 0, so it was a shorthand used throughout. This shorthand was making the code unreadable. gcc/ChangeLog: * value-range.cc (irange::legacy_lower_bound): Replace !undefined_p check with num_ranges > 0. (irange::legacy_upper_bound): Same. * value-range.h (irange::type): Same. (irange::lower_bound): Same. (irange::upper_bound): Same.
2021-04-26 | tree-optimization/99956 - improve loop interchange | Richard Biener | 2 | -28/+85
When we apply store motion and DSE manually to the bwaves kernel in gfortran.dg/pr81303.f, loop interchange no longer happens because the perfect nest considered covers outer loops we cannot analyze strides for. The following compensates for this by shrinking the nest in this analysis, which was already possible but only at a coarser granularity. It shares the shrunk nest with the rest of the DRs, so the complexity overhead should be negligible. 2021-04-07 Richard Biener <rguenther@suse.de> PR tree-optimization/99956 * gimple-loop-interchange.cc (compute_access_stride): Try instantiating the access in a shallower loop nest if instantiating failed. (compute_access_strides): Pass adjustable loop_nest to compute_access_stride. * gfortran.dg/pr99956.f: New testcase.
2021-04-26 | testsuite/arm: Add arm_cmse_hw effective target | Christophe Lyon | 8 | -6/+33
Some of the CMSE tests have 'dg-do run', but qemu-arm does not support the privileged instructions involved; one has to use qemu-system-arm for this, which in turn requires modifications to the default newlib/libgloss startup code to enable the FPU as the FP status registers need to be saved when using CMSE code. This patch introduces arm_cmse_hw, similar to arm_neon_hw, to detect whether the execution engine supports the CMSE instructions. If not, we set dg-do-what-default to assemble instead of run. We thus remove all the 'dg-do run' directives from CMSE tests, to rely on dg-do-what-default instead. Note that cmse-16.c used to pass with dg-do run under qemu-arm, because the property being tested is not available (qemu-arm does not model secure vs non-secure memory). The patch removes dg-do from it too, since it is relevant only with an adequate simulator. Before the patch, bitfield-[123].c and struct-1.c fail at execution under qemu-arm. With the patch, execution is skipped. The same tests pass under qemu-system-arm both with and without the patch. This avoids failures when testing with -mthumb/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp under qemu-arm for cortex-m33. I'm also running tests with qemu-system-arm for cortex-m33, but I run only cmse.exp with a patched newlib in this case: I use qemu-arm for all combinations except that one because it's faster and supports semihosting. I do not have a setup to check this with actual hardware or another simulator. 2021-04-26 Christophe Lyon <christophe.lyon@linaro.org> gcc/ * doc/sourcebuild.texi (arm_cmse_hw): Document. gcc/testsuite/ * gcc.target/arm/cmse/bitfield-1.c: Remove dg-do. * gcc.target/arm/cmse/bitfield-2.c: Likewise. * gcc.target/arm/cmse/bitfield-3.c: Likewise. * gcc.target/arm/cmse/cmse-16.c: Likewise. * gcc.target/arm/cmse/struct-1.c: Likewise. * gcc.target/arm/cmse/cmse.exp: Set dg-do-what-default depending on arm_cmse_hw. * lib/target-supports.exp (check_effective_target_arm_cmse_hw): New.
2021-04-26 | aarch64: Handle V4BF V8BF modes in vwcore attribute | Kyrylo Tkachov | 1 | -0/+1
While playing with other unrelated changes I hit an assemble-failure bug where a pattern (one of the get_lane ones) was using V4BF, V8BF as part of a mode iterator and outputting registers with the vwcore attribute, but there is no vwcore mapping for V4BF and V8BF. This patch fixes that in the obvious way by adding the missing mappings. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/iterators.md (vwcore): Handle V4BF, V8BF.
2021-04-26 | Add XFAIL for gcc.dg/pr84877.c on the SPARC | Eric Botcazou | 1 | -1/+1
The maximum supported alignment is 64-bit on 32-bit mode. gcc/testsuite/ * gcc.dg/pr84877.c: XFAIL on SPARC as well.
2021-04-26 | Add '-Wopenacc-parallelism' | Thomas Schwinge | 32 | -1/+302
... to diagnose potentially suboptimal choices regarding OpenACC parallelism. Not enabled by default: too noisy ("*potentially* suboptimal choices"); see XFAILed 'dg-bogus'es. gcc/c-family/ * c.opt (Wopenacc-parallelism): New. gcc/fortran/ * lang.opt (Wopenacc-parallelism): New. gcc/ * omp-offload.c (oacc_validate_dims): Implement '-Wopenacc-parallelism'. * doc/invoke.texi (-Wopenacc-parallelism): Document. gcc/testsuite/ * c-c++-common/goacc/diag-parallelism-1.c: New. * c-c++-common/goacc/acc-icf.c: Specify '-Wopenacc-parallelism', and match diagnostics, as appropriate. * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-routine.c: Likewise. * c-c++-common/goacc/classify-serial.c: Likewise. * c-c++-common/goacc/kernels-decompose-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Likewise. * c-c++-common/goacc/parallel-dims-1.c: Likewise. * c-c++-common/goacc/parallel-reduction.c: Likewise. * c-c++-common/goacc/pr70688.c: Likewise. * c-c++-common/goacc/routine-1.c: Likewise. * c-c++-common/goacc/routine-level-of-parallelism-2.c: Likewise. * c-c++-common/goacc/uninit-dim-clause.c: Likewise. * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/classify-parallel.f95: Likewise. * gfortran.dg/goacc/classify-routine.f95: Likewise. * gfortran.dg/goacc/classify-serial.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-1.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. * gfortran.dg/goacc/parallel-tree.f95: Likewise. * gfortran.dg/goacc/routine-4.f90: Likewise. * gfortran.dg/goacc/routine-level-of-parallelism-1.f90: Likewise. * gfortran.dg/goacc/routine-module-mod-1.f90: Likewise. * gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise. * gfortran.dg/goacc/uninit-dim-clause.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Specify '-Wopenacc-parallelism', and match diagnostics, as appropriate. * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/private-variables.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: Likewise. * testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise. * testsuite/libgomp.oacc-fortran/par-reduction-2-1.f: Likewise. * testsuite/libgomp.oacc-fortran/par-reduction-2-2.f: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr84028.f90: Likewise. * testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise. Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com> Co-Authored-By: Tom de Vries <vries@codesourcery.com> Co-Authored-By: Julian Brown <julian@codesourcery.com> Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
2021-04-26 | Move gimplify_buildN API local to only remaining user | Richard Biener | 5 | -74/+84
This moves the legacy gimplify_buildN API to tree-vect-generic.c, its only user, and elides the gimplification step, making it a wrapper around gimple_build, adjusting tree_vec_extract for this. I've noticed that vector CTOR expansion doesn't deal with unfolded {} and thus this makes it more resilient. I've also adjusted the match.pd vector CTOR extraction code to make sure it doesn't produce a CTOR when folding would make it a vector constant. 2021-04-15 Richard Biener <rguenther@suse.de> * tree-cfg.h (gimplify_build1): Remove. (gimplify_build2): Likewise. (gimplify_build3): Likewise. * tree-cfg.c (gimplify_build1): Move to tree-vect-generic.c. (gimplify_build2): Likewise. (gimplify_build3): Likewise. * tree-vect-generic.c (gimplify_build1): Move from tree-cfg.c. Modernize. (gimplify_build2): Likewise. (gimplify_build3): Likewise. (tree_vec_extract): Use resimplify with following SSA edges. (expand_vector_parallel): Avoid passing NULL size/bitpos to tree_vec_extract. * expr.c (store_constructor): Deal with zero-element CTORs. * match.pd (bit_field_ref <vector CTOR>): Make sure to produce vector constants when possible.
2021-04-26 | Remove gimplify_buildN API use from complex lowering | Richard Biener | 1 | -100/+132
This removes the legacy gimplify_buildN API use from complex lowering. 2021-04-15 Richard Biener <rguenther@suse.de> * tree-complex.c: Include gimple-fold.h. (expand_complex_addition): Use gimple_build. (expand_complex_multiplication_components): Likewise. (expand_complex_multiplication): Likewise. (expand_complex_div_straight): Likewise. (expand_complex_div_wide): Likewise. (expand_complex_division): Likewise. (expand_complex_conjugate): Likewise. (expand_complex_comparison): Likewise.
2021-04-26 | Remove gimplify_buildN API use from phiopt | Richard Biener | 1 | -7/+7
This removes use of the legacy gimplify_buildN API from phiopt. 2021-04-15 Richard Biener <rguenther@suse.de> * tree-ssa-phiopt.c (two_value_replacement): Remove use of legacy gimplify_buildN API.
2021-04-26 | tree-optimization/99473 - more cselim | Richard Biener | 2 | -3/+16
This fixes the pre-condition on cselim to include all references and decls when they end up as auto-var. Bootstrapped/tested on x86_64-linux 2021-03-09 Richard Biener <rguenther@suse.de> PR tree-optimization/99473 * tree-ssa-phiopt.c (cond_store_replacement): Handle all stores. * gcc.dg/tree-ssa/pr99473-1.c: New testcase.
2021-04-26 | Simplify {gimplify_and_,}update_call_from_tree API | Richard Biener | 11 | -333/+226
This removes update_call_from_tree in favor of gimplify_and_update_call_from_tree, removing some code duplication and simplifying the API use. Some users of update_call_from_tree have been transitioned to replace_call_with_value and the API and its dependences have been moved to gimple-fold.h. This shaves off another user of valid_gimple_rhs_p which is now only used from within gimple-fold.c and thus moved and made private. 2021-04-14 Richard Biener <rguenther@suse.de> * tree-ssa-propagate.h (valid_gimple_rhs_p): Remove. (update_gimple_call): Likewise. (update_call_from_tree): Likewise. * tree-ssa-propagate.c (valid_gimple_rhs_p): Remove. (valid_gimple_call_p): Likewise. (move_ssa_defining_stmt_for_defs): Likewise. (finish_update_gimple_call): Likewise. (update_gimple_call): Likewise. (update_call_from_tree): Likewise. (propagate_tree_value_into_stmt): Use replace_call_with_value. * gimple-fold.h (update_gimple_call): Declare. * gimple-fold.c (valid_gimple_rhs_p): Move here from tree-ssa-propagate.c. (update_gimple_call): Likewise. (valid_gimple_call_p): Likewise. (finish_update_gimple_call): Likewise, and simplify. (gimplify_and_update_call_from_tree): Implement update_call_from_tree functionality, avoid excessive push/pop_gimplify_context. (gimple_fold_builtin): Use only gimplify_and_update_call_from_tree. (gimple_fold_call): Likewise. * gimple-ssa-sprintf.c (try_substitute_return_value): Likewise. * tree-ssa-ccp.c (ccp_folder::fold_stmt): Likewise. (pass_fold_builtins::execute): Likewise. (optimize_stack_restore): Use replace_call_with_value. * tree-cfg.c (fold_loop_internal_call): Likewise. * tree-ssa-dce.c (maybe_optimize_arith_overflow): Use only gimplify_and_update_call_from_tree. * tree-ssa-strlen.c (handle_builtin_strlen): Likewise. (handle_builtin_strchr): Likewise. * tsan.c: Include gimple-fold.h instead of tree-ssa-propagate.h. * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Use replace_call_with_value.
2021-04-26 | vmsdbgout: Remove useless register keywords | Jakub Jelinek | 1 | -10/+10
register keyword was removed in C++17, and in vmsdbgout.c it served no useful purpose. 2021-04-26 Jakub Jelinek <jakub@redhat.com> PR debug/100255 * vmsdbgout.c (ASM_OUTPUT_DEBUG_STRING, vmsdbgout_begin_block, vmsdbgout_end_block, lookup_filename, vmsdbgout_source_line): Remove register keywords.
2021-04-26 | Daily bump. | GCC Administrator | 3 | -1/+28
2021-04-25 | Add folding and remove expanders for x86 *pcmp{eq,gt}* builtins [PR target/98911] | liuhongt | 5 | -49/+186
gcc/ChangeLog: PR target/98911 * config/i386/i386-builtin.def (BDESC): Change the icode of the following builtins to CODE_FOR_nothing. * config/i386/i386.c (ix86_gimple_fold_builtin): Fold IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128, IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ, IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256, IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256, IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128, IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ, IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256, IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256. * config/i386/sse.md (avx2_eq<mode>3): Deleted. (sse2_eq<mode>3): Ditto. (sse4_1_eqv2di3): Ditto. (sse2_gt<mode>3): Rename to ... (*sse2_gt<mode>3): ... this. gcc/testsuite/ChangeLog: PR target/98911 * gcc.target/i386/pr98911.c: New test. * gcc.target/i386/funcspec-8.c: Replace __builtin_ia32_pcmpgtq with __builtin_ia32_pcmpistrm128 since it has been folded.
2021-04-25 | Daily bump. | GCC Administrator | 6 | -1/+131
2021-04-24 | analyzer: fix ICE on NULL change.m_expr [PR100244] | David Malcolm | 2 | -1/+23
PR analyzer/100244 reports an ICE on a -Wanalyzer-free-of-non-heap due to a case where free_of_non_heap::describe_state_change can be passed a NULL change.m_expr for a suitably complicated symbolic value. Bulletproof it by checking for change.m_expr being NULL before dereferencing it. gcc/analyzer/ChangeLog: PR analyzer/100244 * sm-malloc.cc (free_of_non_heap::describe_state_change): Bulletproof against change.m_expr being NULL. gcc/testsuite/ChangeLog: PR analyzer/100244 * g++.dg/analyzer/pr100244.C: New test.