2022-01-11  [Ada] Use atomic builtins for atomic counters on x86 (32bit)  (Piotr Trojanek; 2 files changed, -113/+1)

gcc/ada/
	* Makefile.rtl (X86_TARGET_PAIRS): Use __builtin variant of
	System.Atomic_Counters.
	* libgnat/s-atocou__x86.adb: Remove.
2022-01-11  ira: Fix s390 build  (Richard Sandiford; 1 file changed, -4/+7)

My g:01f3e6a40e7202310abbeb41c345d325bd69554f broke the s390 build because the rclass variable was still needed by the IRA_HARD_REGNO_ADD_COST_MULTIPLIER code.

gcc/
	* ira-costs.c (ira_tune_allocno_costs): Fix missing rclass
	definition in IRA_HARD_REGNO_ADD_COST_MULTIPLIER code.
2022-01-11  Recognize MULT_HIGHPART_EXPR in tree-ssa-math-opts pass.  (Roger Sayle; 2 files changed, -1/+264)

This is the third iteration of a patch to recognize MULT_HIGHPART_EXPR in the middle-end. As they say, "the third time's a charm."

The first version implemented this in match.pd, which was considered too early:
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551316.html

The second version attempted to do this during RTL expansion, which was considered to be too late in the middle-end:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576922.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576923.html

This latest version incorporates Richard Biener's feedback/suggestion to recognize MULT_HIGHPART_EXPR in one of the "instruction selection passes", specifically tree-ssa-math-opts, where the recognition of highpart multiplications takes place in the same pass as widening multiplications.

With each rewrite, the patch has also become more aggressive in the set of widening multiplications that it recognizes as highpart multiplies. Currently, any widening multiplication followed by a right shift (either signed or unsigned) by a bit count sufficient to eliminate the lowpart is recognized. The result of this shift doesn't need to be truncated.

As previously, this patch confirms that the target provides a suitable optab before introducing the MULT_HIGHPART_EXPR. This is the reason the testcase is restricted to x86_64: this pass doesn't do anything on some platforms, but x86_64 should be sufficient to confirm that the pass is working/continues to work.

2022-01-11  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	* tree-ssa-math-opts.c (struct widen_mul_stats): Add a
	highpart_mults_inserted field.
	(convert_mult_to_highpart): New function to convert right shift
	of a widening multiply into a MULT_HIGHPART_EXPR.
	(math_opts_dom_walker::after_dom_children) [RSHIFT_EXPR]: Call
	new convert_mult_to_highpart function.
	(pass_optimize_widening_mul::execute): Add a statistics counter
	for tracking "highpart multiplications inserted" events.

gcc/testsuite/ChangeLog
	* gcc.target/i386/mult-highpart.c: New test case.
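For illustration, a minimal sketch of the kind of highpart multiply the pass now recognizes (an illustrative example, not the committed mult-highpart.c testcase; it assumes a 64-bit target such as x86_64 with a suitable highpart-multiply optab):

	unsigned long long
	umulhi64 (unsigned long long x, unsigned long long y)
	{
	  /* Widening 64x64->128 multiply followed by a right shift that
	     discards the lowpart: now matched as MULT_HIGHPART_EXPR.  */
	  return ((unsigned __int128) x * y) >> 64;
	}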
2022-01-11  rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]  (Xionghu Luo; 4 files changed, -0/+59)

Add a specialized version to combine the two instructions

	9: {r123:CC=cmp(r124:DI&0x600000000,0);clobber scratch;}
	      REG_DEAD r124:DI
	10: pc={(r123:CC==0)?L15:pc}
	      REG_DEAD r123:CC

into:

	10: {pc={(r123:DI&0x600000000==0)?L15:pc};clobber scratch;clobber %0:CC;}

split2 will then split it into one rotate-dot instruction (saving one rotate-back instruction), as the shifted result doesn't matter when comparing to 0 in CCEQmode.

Bootstrapped and regression tested on Power 8/9/10.

gcc/ChangeLog:
	PR target/102239
	* config/rs6000/rs6000-protos.h (rs6000_is_valid_rotate_dot_mask):
	New declare.
	* config/rs6000/rs6000.c (rs6000_is_valid_rotate_dot_mask): New
	function.
	* config/rs6000/rs6000.md (*branch_anddi3_dot): New.

gcc/testsuite/ChangeLog:
	PR target/102239
	* gcc.target/powerpc/pr102239.c: New test.
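Source along these lines produces the compare-of-masked-value pattern above (an illustrative sketch, not the committed pr102239.c; 0x600000000 sets two contiguous bits, 33 and 34):

	void g (void);

	void
	f (long long x)
	{
	  /* Tests two contiguous bits; previously a rotate, a rotate back
	     and a compare, now a single rotate-dot instruction.  */
	  if (x & 0x600000000LL)
	    g ();
	}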
2022-01-11  State --sysroot option as validated once processed  (Olivier Hainque; 1 file changed, -0/+1)

Since we now save the option in the "switches" table to let specs use it more generally, we need to explicitly state that the option was validated, else the driver will consider it "unrecognized".

2022-01-05  Olivier Hainque  <hainque@adacore.com>

	* gcc.c (driver_handle_option): State --sysroot as validated.
2022-01-11  Improve sequence logic in cxx_init_decl_processing  (Olivier Hainque; 1 file changed, -3/+3)

Checking for one_only/weak support is better done before deciding to turn references to __cxa_pure_virtual weak. This helps at least on VxWorks, where one_only/weak support varies between kernel and rtp modes as well as across VxWorks versions.

2021-12-30  Olivier Hainque  <hainque@adacore.com>

gcc/cp/
	* decl.c (cxx_init_decl_processing): Move code possibly altering
	flag_weak before code testing it.
2022-01-11  testsuite: Fix regression on m32 by r12-6087 [PR103820]  (Xionghu Luo; 1 file changed, -3/+3)

r12-6087 avoids moving a cold bb out of a hot loop, while the original intent of this testcase was to hoist divides out of the loop and CSE them into only one divide. So increase the loop count to turn the cold bb into a hot bb again. Then the 3 divides can be rewritten with the same reciptmp.

Tested on Power-Linux {32,64}, x86 {64,32} and i686-linux.

gcc/testsuite/ChangeLog:
	PR testsuite/103820
	* gcc.dg/tree-ssa/recip-3.c: Adjust.
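A sketch of the transformation the testcase guards (illustrative, not the recip-3.c source; assumes -freciprocal-math, e.g. via -funsafe-math-optimizations): repeated divisions by the same denominator are CSEd into one reciprocal temporary:

	double
	f (double a, double b, double c, double x)
	{
	  /* All three divides share the divisor x, so the pass can rewrite
	     this as t = 1.0 / x; a * t + b * t + c * t.  */
	  return a / x + b / x + c / x;
	}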
2022-01-10  rs6000: Remove useless code related to -mno-power10  (Kewen Lin; 1 file changed, -1/+0)

Option -mpower10 was made "WarnRemoved" in commit r11-2318, so -mno-power10 doesn't take effect any more. This patch removes the one line of code that still respects it.

gcc/ChangeLog:
	* config/rs6000/rs6000.c (rs6000_disable_incompatible_switches):
	Remove useless code related to option -mno-power10.
2022-01-11  Extend predicate of operands[1] from register_operand to vector_operand for andnot insn.  (Haochen Jiang; 2 files changed, -1/+17)

This enables optimizations like:

	-	pcmpeqd	%xmm0, %xmm0
	-	pxor	g(%rip), %xmm0
	-	pand	%xmm1, %xmm0
	+	movdqa	g(%rip), %xmm0
	+	pandn	%xmm1, %xmm0

gcc/ChangeLog:
	PR target/53652
	* config/i386/sse.md (*andnot<mode>3): Extend predicate of
	operands[1] from register_operand to vector_operand.

gcc/testsuite/ChangeLog:
	PR target/53652
	* gcc.target/i386/pr53652-1.c: New test.
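Source like the following triggers the pattern (a sketch under the assumption that the testcase uses GCC's generic vector extension; the committed pr53652-1.c may differ):

	typedef int v4si __attribute__ ((vector_size (16)));

	v4si g;

	v4si
	f (v4si b)
	{
	  /* ~g materializes as an all-ones pcmpeqd plus a pxor; with a
	     memory operand now allowed, combine folds the pxor + pand
	     into a single pandn on the memory input.  */
	  return ~g & b;
	}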
2022-01-11  Daily bump.  (GCC Administrator; 7 files changed, -1/+306)
2022-01-10  i386: Introduce V2QImode vector compares [PR103861]  (Uros Bizjak; 3 files changed, -28/+56)

Add V2QImode vector compares with SSE registers.

2022-01-10  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:
	PR target/103861
	* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
	Handle V2QImode.
	* config/i386/mmx.md (<sat_plusminus:insn><mode>3):
	Use VI1_16_32 mode iterator.
	(*eq<mode>3): Ditto.
	(*gt<mode>3): Ditto.
	(*xop_maskcmp<mode>3): Ditto.
	(*xop_maskcmp_uns<mode>3): Ditto.
	(vec_cmp<mode><mode>): Ditto.
	(vec_cmpu<mode><mode>): Ditto.

gcc/testsuite/ChangeLog:
	PR target/103861
	* gcc.target/i386/pr103861-2.c: New test.
2022-01-10  c++: constexpr base-to-derived conversion with offset 0 [PR103879]  (Patrick Palka; 3 files changed, -12/+58)

r12-136 made us canonicalize an object/offset pair with negative offset into one with a nonnegative offset, by iteratively absorbing the innermost component into the offset and stopping as soon as the offset becomes nonnegative.

This patch strengthens that transformation by making it keep absorbing even if the offset is already 0, as long as the innermost component is at position 0 (and thus absorbing doesn't change the offset). This lets us accept the two constexpr testcases below, which we'd previously reject essentially because cxx_fold_indirect_ref would be unable to resolve *(B*)&b.D123 (where D123 is the base A subobject at position 0) to just b.

	PR c++/103879

gcc/cp/ChangeLog:
	* constexpr.c (cxx_fold_indirect_ref): Split out the object/offset
	canonicalization step into a local lambda.  Strengthen it to
	absorb more components at position 0.  Use it before both calls
	to cxx_fold_indirect_ref_1.

gcc/testsuite/ChangeLog:
	* g++.dg/cpp1y/constexpr-base2.C: New test.
	* g++.dg/cpp1y/constexpr-base2a.C: New test.
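A sketch in the spirit of those testcases (illustrative names, not the committed constexpr-base2.C; the aggregate-with-base initialization assumes -std=c++17 or later):

	struct A { int i; };
	struct B : A { int j; };

	constexpr int
	f ()
	{
	  B b = { { 1 }, 2 };
	  A *a = &b;                       // base A subobject at position 0
	  return static_cast<B *> (a)->j;  // needs *(B*)&b.<A> to fold back to b
	}

	static_assert (f () == 2, "");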
2022-01-10c++: "more constrained" vs staticness of memfn [PR103783]Patrick Palka2-3/+39
Here we're rejecting the calls to g1 and g2 as ambiguous even though one overload is more constrained than the other (and they're otherwise tied), because the implicit 'this' parameter of the non-static overload causes cand_parms_match to think the function parameter lists aren't equivalent. This patch fixes this by making cand_parms_match skip over 'this' appropriately. Note that this bug only affects partial ordering of non-template member functions because for member function templates more_specialized_fn seems to already skip over 'this' appropriately. PR c++/103783 gcc/cp/ChangeLog: * call.c (cand_parms_match): Skip over 'this' when given one static and one non-static member function. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-memfun2.C: New test.
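The shape of the ambiguity is roughly the following (an illustrative sketch, not the committed concepts-memfun2.C; assumes -std=c++20, where declarations differing only in constraints can overload):

	template<typename T>
	struct S {
	  // More constrained (trailing requires-clause), and static.
	  static constexpr int f (int) requires true { return 1; }
	  // Less constrained, non-static: its implicit 'this' used to make
	  // cand_parms_match consider the parameter lists non-equivalent,
	  // so the call below was rejected as ambiguous.
	  constexpr int f (int) { return 2; }
	};

	static_assert (S<int>{}.f (0) == 1, "");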
2022-01-10  c++: Ensure some more that immediate functions aren't gimplified [PR103912]  (Jakub Jelinek; 3 files changed, -0/+51)

Immediate functions should never be emitted into assembly; the FE doesn't genericize them and does various things to ensure they aren't gimplified. But the following testcase ICEs anyway, because the consteval function returns a lambda, and operator() of the lambda has decl_function_context of the consteval function. cgraphunit.c then does:

	  /* Preserve a functions function context node.  It will
	     later be needed to output debug info.  */
	  if (tree fn = decl_function_context (decl))
	    {
	      cgraph_node *origin_node = cgraph_node::get_create (fn);
	      enqueue_node (origin_node);
	    }

which enqueues the immediate function and then tries to gimplify it, which results in an ICE because it hasn't been genericized.

When I try a similar testcase with constexpr instead of consteval and static constinit auto instead of auto in main, what happens is that the functions are gimplified, later ipa.c discovers they aren't reachable and sets body_removed to true for them (and clears other flags), and we end up with debug info which has the foo and bar functions without DW_AT_low_pc and other code-specific attributes, just stuff from their BLOCK structure, and in there the lambda with DW_AT_low_pc etc. The following patch attempts to emulate that behavior early, so that cgraph doesn't try to gimplify those and pretends they were already gimplified and found unused and optimized away.

2022-01-10  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103912
gcc/cp/
	* semantics.c (expand_or_defer_fn): For immediate functions, set
	node->body_removed to true and clear analyzed, definition and
	force_output.
	* decl2.c (c_parse_final_cleanups): Ignore immediate functions
	for expand_or_defer_fn.
gcc/testsuite/
	* g++.dg/cpp2a/consteval26.C: New test.
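A sketch of the ICE trigger (illustrative, not necessarily the committed consteval26.C; assumes -std=c++20): a consteval function whose returned lambda has the consteval function as its decl_function_context:

	consteval auto foo ()
	{
	  // The closure's operator() has foo as its function context.
	  return [] (int x) { return x + 1; };
	}

	int
	main ()
	{
	  auto f = foo ();  // immediate invocation, evaluated at compile time
	  return f (41) == 42 ? 0 : 1;
	}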
2022-01-10  tree-optimization/103948 - detect vector vec_cmp in expand_vector_condition  (Uros Bizjak; 1 file changed, -1/+3)

Currently, expand_vector_condition detects only vcondMN and vconduMN named RTX patterns. Teach it to also consider vec_cmpMN and vec_cmpuMN RTX patterns when an all-ones vector is returned for true and an all-zeros vector is returned for false.

2022-01-10  Richard Biener  <rguenther@suse.de>

gcc/ChangeLog:
	PR tree-optimization/103948
	* tree-vect-generic.c (expand_vector_condition): Return true if
	all ones vector is returned for true, all zeros vector for false
	and the target defines corresponding vec_cmp{,u}MN named RTX
	pattern.
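For illustration (a sketch assuming GCC's generic vector extension), this is the shape of condition that qualifies: the comparison result itself is the value, so each lane is all-ones or all-zeros:

	typedef int v4si __attribute__ ((vector_size (16)));

	v4si
	f (v4si a, v4si b)
	{
	  /* Lanewise compare: -1 where a > b, 0 elsewhere; expandable via
	     a vec_cmpMN pattern even when no vcondMN pattern exists.  */
	  return a > b;
	}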
2022-01-10  rs6000: Add Power10 optimization for _mm_blendv*  (Paul A. Clarke; 2 files changed, -1/+78)

Power10 ISA added `xxblendv*` instructions, which are realized in the `vec_blendv` intrinsic. Use `vec_blendv` for the `_mm_blendv_epi8`, `_mm_blendv_ps`, and `_mm_blendv_pd` compatibility intrinsics when `_ARCH_PWR10`.

Update the original implementation of _mm_blendv_epi8 to use signed types, to better match the function parameters. The realization is unchanged.

Also, copy a test from i386 for testing `_mm_blendv_ps`. This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8, but was inadvertently omitted.

2022-01-10  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
	when _ARCH_PWR10.  Use signed types.
	(_mm_blendv_ps): Use vec_blendv when _ARCH_PWR10.
	(_mm_blendv_pd): Likewise.

gcc/testsuite
	* gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
	adjust dg directives to suit.
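A usage sketch of the compatibility intrinsic (illustrative; on powerpc this comes from the rs6000 smmintrin.h compatibility header, which may require defining NO_WARN_X86_INTRINSICS, and with -mcpu=power10 it now maps to xxblendv* via vec_blendv):

	#include <smmintrin.h>

	__m128i
	select_bytes (__m128i a, __m128i b, __m128i mask)
	{
	  /* Per byte: take b where the mask byte's sign bit is set, else a.  */
	  return _mm_blendv_epi8 (a, b, mask);
	}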
2022-01-10  [vect] Re-analyze all modes for epilogues  (Andre Vieira; 4 files changed, -39/+72)

gcc/ChangeLog:
	* tree-vectorizer.c (better_epilogue_loop_than_p): Round factors
	up for epilogue costing.
	* tree-vect-loop.c (vect_analyze_loop): Re-analyze all modes for
	epilogues, unless we are guaranteed that we can't have partial
	vectors.
	* genopinit.c (partial_vectors_supported): Generate new function.

gcc/testsuite/ChangeLog:
	* gcc.target/aarch64/masked_epilogue.c: New test.
2022-01-10  Fortran: Pass unlimited polymorphic argument to assumed type [PR103366].  (Paul Thomas; 2 files changed, -4/+31)

2022-01-10  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/103366
	* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Allow unlimited
	polymorphic actual argument passed to assumed type formal.

gcc/testsuite/
	PR fortran/103366
	* gfortran.dg/pr103366.f90: New test.
2022-01-10  x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]  (Jakub Jelinek; 3 files changed, -7/+82)

For zero-width bitfields current GCC classify_argument does:

	  if (DECL_BIT_FIELD (field))
	    {
	      for (i = (int_bit_position (field)
			+ (bit_offset % 64)) / 8 / 8;
		   i < ((int_bit_position (field) + (bit_offset % 64))
			+ tree_to_shwi (DECL_SIZE (field))
			+ 63) / 8 / 8; i++)
		classes[i]
		  = merge_classes (X86_64_INTEGER_CLASS, classes[i]);
	    }

which I think means that for zero-width bitfields at bit positions (in the toplevel aggregate) which are multiples of 64 bits it doesn't do anything, since (int_bit_position (field) + (bit_offset % 64)) / 64 and (int_bit_position (field) + (bit_offset % 64) + 63) / 64 should be equal. But for zero-width bitfields at other bit positions it will call merge_classes once.

Now, the typical case is that the zero-width bitfield is surrounded by some bitfields, and in that case it doesn't change anything, but it can be sandwiched in between floats too, as the testcases show. In C we had this behavior; in C++ the FE was previously removing the zero-width bitfields, so they were ignored. LLVM and ICC seem to ignore those bitfields in both C and C++ (i.e. passing struct S in an SSE register rather than in a GPR).

The x86-64 psABI has been recently clarified by https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/1aa4398d26c250b252a0c4a0f777216c9a6789ec that zero-width bitfields should always be ignored. This patch implements that and emits a warning for C for cases where the ABI changed from GCC 11.

2022-01-10  Jakub Jelinek  <jakub@redhat.com>

	PR target/102024
	* config/i386/i386.c (classify_argument): Add zero_width_bitfields
	argument; when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
	always ignore them; when seeing other zero sized bitfields, either
	set zero_width_bitfields to 1 and ignore it or if equal to 2 process
	it.  Pass it to recursive calls.  Add wrapper with old arguments and
	diagnose ABI differences for C structures with zero width bitfields.
	Formatting fixes.
	* gcc.target/i386/pr102024.c: New test.
	* g++.target/i386/pr102024.C: New test.
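An illustrative sketch of the "sandwiched between floats" case (a hypothetical struct, not the committed pr102024 testcases): the zero-width bitfield lands at bit 32, which is not a multiple of 64, so before this patch it forced the containing eightbyte into INTEGER class:

	struct S {
	  float a;
	  int : 0;   /* Now ignored for classification per the psABI.  */
	  float b;
	};

	/* With the fix, s is classified SSE and passed in an SSE register
	   rather than a GPR; for C, -Wpsabi warns about the change from
	   GCC 11.  */
	struct S
	pass (struct S s)
	{
	  return s;
	}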
2022-01-10  ira: Handle "soft" conflicts between cap and non-cap allocnos  (Richard Sandiford; 4 files changed, -30/+326)

This patch looks for allocno conflicts of the following form:

- One allocno (X) is a cap allocno for some non-cap allocno X2.
- X2 belongs to some loop L2.
- The other allocno (Y) is a non-cap allocno.
- Y is an ancestor of some allocno Y2 in L2.
- Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
- Y can use a different allocation from Y2.

In this case, Y's register is live across L2 but is not used within it, whereas X's register is used only within L2. The conflict is therefore only "soft", in that it can easily be avoided by spilling Y2 inside L2 without affecting any insn references.

In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the callers then taking Y2's ALLOCNO_MEMORY_COST into account. There would then be no "cliff edge" between a Y2 that has no references and a Y2 that has (say) a single cold reference. However, doing that isn't necessary for the PR and seems to give variable results in practice. (fotonik3d_r improves slightly but namd_r regresses slightly.) It therefore seemed better to start with the higher-value zero-reference case and see how things go.

On top of the previous patches in the series, this fixes the exchange2 regression seen in GCC 11.

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_soft_conflict): Declare.
	* ira-color.c (max_soft_conflict_loop_depth): New constant.
	(ira_soft_conflict): New function.
	(spill_soft_conflicts): Likewise.
	(assign_hard_reg): Use them to handle the case described by
	the comment above ira_soft_conflict.
	(improve_allocation): Likewise.
	* ira.c (check_allocation): Allow allocnos with "soft" conflicts
	to share the same register.

gcc/testsuite/
	* gcc.target/aarch64/reg-alloc-4.c: New test.
2022-01-10  ira: Consider modelling caller-save allocations as loop spills  (Richard Sandiford; 5 files changed, -18/+129)

If an allocno A in an inner loop L spans a call, a parent allocno AP can choose to handle a call-clobbered/caller-saved hard register R in one of two ways:

(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L

(2) can be cheaper than (1) in some cases, particularly if L does not reference A. Before the patch we always did (1). The patch adds support for picking (2) instead, when it seems cheaper. It builds on the earlier support for not propagating conflicts to parent allocnos.

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_caller_save_cost): New function.
	(ira_caller_save_loop_spill_p): Likewise.
	* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
	cheaper to spill a call-clobbered register throughout a loop rather
	than spill it around each individual call.  If so, treat all
	call-clobbered registers as conflicts and...
	(propagate_allocno_info): ...do not propagate call information
	from the child to the parent.
	* ira-color.c (move_spill_restore): Update accordingly.
	* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

gcc/testsuite/
	* gcc.target/aarch64/reg-alloc-3.c: New test.
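An illustrative C sketch of the trade-off (hypothetical code, not the committed reg-alloc-3.c): x is live across the loop but unused within it, so saving/restoring its register around every call (option 1) can cost more than spilling it for the whole loop (option 2):

	void g (void);

	int
	f (int x, int n)
	{
	  for (int i = 0; i < n; i++)
	    g ();      /* each call clobbers caller-saved registers */
	  return x;    /* x is only needed after the loop */
	}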
2022-01-10  ira: Try to avoid propagating conflicts  (Richard Sandiford; 4 files changed, -23/+169)

Suppose that:

- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP

Previously, propagate_allocno_info would propagate conflict sets up the loop tree, so that the conflict between A and R would become a conflict between AP and R (and so on for ancestors of AP).

However, when IRA treats loops as separate allocation regions, it can decide on a loop-by-loop basis whether to allocate a register or spill to memory. Conflicts in inner loops therefore don't need to become hard conflicts in parent loops. Instead we can record that using the "conflicting" registers for the parent allocnos has a higher cost. In the example above, this higher cost is the sum of:

- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L

This value is also a cap on the hard register cost that A can contribute to AP in general (not just for conflicts). Whatever allocation we pick for AP, there is always the option of spilling that register to memory throughout L, so the cost to A of allocating a register to AP can't be more than the cost of spilling A.

To take an extreme example: if allocating a register R2 to A is more expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2] could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100 times greater than ALLOCNO_MEMORY_COST (A). But this scale factor doesn't matter to AP. All that matters is that R2 is more expensive than memory for A, so that allocating R2 to AP should be costed as spilling A to memory (again assuming that A and AP are in different allocation regions). Propagating a factor of 100 would distort the register costs for AP.

move_spill_restore tries to undo the propagation done by propagate_allocno_info, so we need some extra processing there.

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
	(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
	(ira_single_region_allocno_p): New function.
	(ira_total_conflict_hard_regs): Likewise.
	* ira-build.c (ira_create_allocno): Initialize
	ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
	(ira_propagate_hard_reg_costs): New function.
	(propagate_allocno_info): Use it.  Try to avoid propagating
	hard register conflicts to parent allocnos if we can handle
	the conflicts by spilling instead.  Limit the propagated
	register costs to the cost of spilling throughout the child loop.
	* ira-color.c (color_pass): Use ira_single_region_allocno_p to
	test whether a child and parent allocno can share the same
	register.
	(move_spill_restore): Adjust for the new behavior of
	propagate_allocno_info.

gcc/testsuite/
	* gcc.target/aarch64/reg-alloc-2.c: New test.
2022-01-10  ira: Add ira_subloop_allocnos_can_differ_p  (Richard Sandiford; 2 files changed, -20/+29)

color_pass has two instances of the same code for propagating non-cap assignments from parent loops to subloops. This patch adds a helper function for testing when such propagations are required for correctness and uses it to remove the duplicated code. A later patch will use this in ira-build.c too, which is why the function is exported to ira-int.h.

No functional change intended.

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_subloop_allocnos_can_differ_p): New function,
	extracted from...
	* ira-color.c (color_pass): ...here.
2022-01-10  ira: Add comments and fix move_spill_restore calculation  (Richard Sandiford; 1 file changed, -1/+27)

This patch adds comments to describe each use of ira_loop_border_costs. I think this highlights that move_spill_restore was using the wrong cost in one case, which came from transposing [0] and [1] in the original (pre-ira_loop_border_costs) ira_memory_move_cost expressions. The difference would only be noticeable on targets that distinguish between load and store costs.

gcc/
	PR rtl-optimization/98782
	* ira-color.c (color_pass): Add comments to describe the spill costs.
	(move_spill_restore): Likewise.  Fix reversed calculation.
2022-01-10  ira: Add a ira_loop_border_costs class  (Richard Sandiford; 2 files changed, -46/+86)

The final index into (ira_)memory_move_cost is 1 for loads and 0 for stores. Thus the combination:

	entry_freq * memory_cost[1] + exit_freq * memory_cost[0]

is the cost of loading a register on entry to a loop and storing it back on exit from the loop. This is the cost to use if the register is successfully allocated within the loop but is spilled in the parent loop. Similarly:

	entry_freq * memory_cost[0] + exit_freq * memory_cost[1]

is the cost of storing a register on entry to the loop and restoring it on exit from the loop. This is the cost to use if the register is spilled within the loop but is successfully allocated in the parent loop.

The patch adds a helper class for calculating these values and mechanically replaces the existing instances. There is no attempt to editorialise the choice between using "spill inside" and "spill outside" costs. (I think one of them is the wrong way round, but a later patch deals with that.)

No functional change intended.

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_loop_border_costs): New class.
	* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
	New constructor.
	(calculate_allocno_spill_cost): Use ira_loop_border_costs.
	(color_pass): Likewise.
	(move_spill_restore): Likewise.
2022-01-10  Properly enable -freorder-blocks-and-partition on 64-bit Windows  (Eric Botcazou; 1 file changed, -3/+5)

The PR uncovered that -freorder-blocks-and-partition was working by accident on 64-bit Windows, i.e. the middle-end was supposed to disable it with SEH. After the change installed on mainline, the middle-end properly disables it, which is too bad since a significant amount of work went into it for SEH.

gcc/
	PR target/103465
	* coretypes.h (unwind_info_type): Swap UI_SEH and UI_TARGET.
2022-01-10  Fortran: Allow IEEE_CLASS to identify signaling NaNs  (Francois-Xavier Coudert; 2 files changed, -0/+103)

We use the issignaling macro, present in some libc's (notably glibc), when it is available. Compile all IEEE-related files in the library (both C and Fortran sources) with -fsignaling-nans to ensure maximum compatibility.

libgfortran/ChangeLog:
	PR fortran/82207
	* Makefile.am: Pass -fsignaling-nans for IEEE files.
	* Makefile.in: Regenerate.
	* ieee/ieee_helper.c: Use issignaling macro to recognize
	signaling NaNs.

gcc/testsuite/ChangeLog:
	PR fortran/82207
	* gfortran.dg/ieee/signaling_1.f90: New test.
	* gfortran.dg/ieee/signaling_1_c.c: New file.
2022-01-10  middle-end/101530 - fix shufflevector lowering  (Richard Biener; 2 files changed, -0/+22)

This makes __builtin_shufflevector lowering force the result of the BIT_FIELD_REF lowpart operation to a temporary, so as to fulfil the IL verifier constraint that BIT_FIELD_REFs should always be in outermost handled component position. Trying to enforce this during gimplification isn't as straightforward as doing it here, where we know we're dealing with an rvalue.

FAIL: c-c++-common/torture/builtin-shufflevector-1.c -O0 execution test

2022-01-05  Richard Biener  <rguenther@suse.de>

	PR middle-end/101530
gcc/c-family/
	* c-common.c (c_build_shufflevector): Wrap the BIT_FIELD_REF
	in a TARGET_EXPR to force a temporary.

gcc/testsuite/
	* c-c++-common/builtin-shufflevector-3.c: New testcase.
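A usage sketch of when the lowpart BIT_FIELD_REF arises (illustrative, not the committed builtin-shufflevector-3.c): the result vector is narrower than the inputs, so the lowering builds a wide permute and then takes a lowpart BIT_FIELD_REF, which is now forced into a temporary:

	typedef int v4si __attribute__ ((vector_size (16)));
	typedef int v2si __attribute__ ((vector_size (8)));

	v2si
	f (v4si a, v4si b)
	{
	  /* Indices 0..3 select from a, 4..7 from b.  */
	  return __builtin_shufflevector (a, b, 1, 5);
	}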
2022-01-10  tree-optimization/100359 - restore unroll at -O3  (Richard Biener; 2 files changed, -1/+36)

This fixes a mistake made with r8-5008 when introducing allow_peel to the unroll code. The intent was to allow peeling that doesn't grow code, but the result was that with -O3 and UL_ALL this wasn't done. The following restores the desired effect by adjusting ul to UL_NO_GROWTH if peeling is not allowed.

2022-01-05  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/100359
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely):
	Allow non-growing peeling with !allow_peel and UL_ALL.
	* gcc.dg/tree-ssa/pr100359.c: New testcase.
2022-01-10  [Ada] Fix bogus error on call to subprogram with incomplete profile  (Eric Botcazou; 1 file changed, -2/+29)

gcc/ada/
	* gcc-interface/trans.c (Identifier_to_gnu): Use correct subtype.
	(elaborate_profile): New function.
	(Call_to_gnu): Call it on the formals and the result type before
	retrieving the translated result type from the subprogram type.
2022-01-10  [Ada] Fix internal error on unchecked union with component clauses  (Eric Botcazou; 1 file changed, -12/+17)

gcc/ada/
	* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Record_Type>: Fix
	computation of boolean result in the unchecked union case.
	(components_to_record): Rename MAYBE_UNUSED parameter to IN_VARIANT
	and remove local variable of the same name.  Pass NULL recursively
	as P_GNU_REP_LIST for nested variants in the unchecked union case.
2022-01-10  [Ada] Make pragma Inspection_Point work for constants  (Eric Botcazou; 1 file changed, -0/+17)

gcc/ada/
	* gcc-interface/trans.c (lvalue_required_p) <N_Pragma>: New case.
	<N_Pragma_Argument_Association>: Likewise.
	(Pragma_to_gnu) <Pragma_Inspection_Point>: Fetch the corresponding
	variable of a constant before marking it as addressable.
2022-01-10  [Ada] Reduce runtime dependencies on stage1  (Arnaud Charlet; 1 file changed, -29/+26)

gcc/ada/
	* gcc-interface/Make-lang.in (ADA_GENERATED_FILES): Remove
	s-casuti.ad?, s-crtl.ad?, s-os_lib.ad?.  Update list of object
	files accordingly.
2022-01-10  [Ada] Switch from __sync to __atomic builtins for Lock_Free_Try_Write  (Piotr Trojanek; 2 files changed, -19/+15)

gcc/ada/
	* libgnat/s-atopri.ads (Atomic_Compare_Exchange): Replaces
	deprecated Sync_Compare_And_Swap.
	* libgnat/s-atopri.adb (Lock_Free_Try_Write): Switch from __sync
	to __atomic builtins.
2022-01-10  [Ada] Remove CodePeer annotations for pragma Loop_Variant  (Piotr Trojanek; 5 files changed, -15/+0)

gcc/ada/
	* libgnat/s-exponn.adb, libgnat/s-expont.adb,
	libgnat/s-exponu.adb, libgnat/s-widthi.adb, libgnat/s-widthu.adb:
	Remove CodePeer annotations for pragma Loop_Variant.
2022-01-10  [Ada] Disable expansion of pragma Loop_Variant in CodePeer mode  (Piotr Trojanek; 1 file changed, -1/+4)

gcc/ada/
	* exp_prag.adb (Expand_Pragma_Loop_Variant): Disable expansion
	in CodePeer mode.
2022-01-10  [Ada] Fix typo in comment about unit families  (Piotr Trojanek; 1 file changed, -1/+1)

gcc/ada/
	* sem_util.adb (Is_Child_Or_Sibling): Fix typo in comment.
2022-01-10  [Ada] Adjust the alignment to the size for bit-packed arrays  (Eric Botcazou; 1 file changed, -0/+13)

gcc/ada/
	* exp_pakd.adb (Install_PAT): If the PAT is a scalar type, apply
	the canonical adjustment to its alignment.
2022-01-10  [Ada] Switch from __sync to __atomic builtins for atomic counters  (Piotr Trojanek; 1 file changed, -16/+26)

gcc/ada/
	* libgnat/s-atocou__builtin.adb (Decrement, Increment): Switch
	from __sync to __atomic builtins; use 'Address to be consistent
	with System.Atomic_Primitives.
2022-01-10  [Ada] Fix error on too large size clause for bit-packed array  (Eric Botcazou; 2 files changed, -2/+6)

gcc/ada/
	* exp_pakd.adb (Install_PAT): Do not reset the alignment here.
	* layout.adb (Layout_Type): Call Adjust_Esize_Alignment after
	having copied the RM_Size onto the Esize when the latter is too
	small.
2022-01-10  [Ada] Task arrays trigger spurious unreferenced warnings  (Justin Squirek; 1 file changed, -1/+8)

gcc/ada/
	* sem_warn.adb (Check_References): Handle arrays of tasks similar
	to task objects.
2022-01-10  Daily bump.  (GCC Administrator; 3 files changed, -1/+51)
2022-01-09  Fortran: check arguments of MASKL/MASKR intrinsics before simplification  (Harald Anlauf; 2 files changed, -0/+20)

gcc/fortran/ChangeLog:
	PR fortran/103777
	* simplify.c (gfc_simplify_maskr): Check validity of argument 'I'
	before simplifying.
	(gfc_simplify_maskl): Likewise.

gcc/testsuite/ChangeLog:
	PR fortran/103777
	* gfortran.dg/masklr_3.f90: New test.
2022-01-09  Fortran: reject invalid non-constant pointer initialization targets  (Harald Anlauf; 2 files changed, -0/+57)

gcc/fortran/ChangeLog:
	PR fortran/101762
	* expr.c (gfc_check_pointer_assign): For pointer initialization
	targets, check that subscripts and substring indices in
	specifications are constant expressions.

gcc/testsuite/ChangeLog:
	PR fortran/101762
	* gfortran.dg/pr101762.f90: New test.
2022-01-09  Fortran: Ignore KIND argument of a few more intrinsics. [PR103789]  (Mikael Morin; 5 files changed, -0/+46)

After PR97896, for which some code was added to ignore the KIND argument of the INDEX intrinsics, and PR87711, for which that was extended to LEN_TRIM as well, this propagates it further to MASKL, MASKR, SCAN and VERIFY.

	PR fortran/103789

gcc/fortran/ChangeLog:
	* trans-array.c (arg_evaluated_for_scalarization): Add MASKL,
	MASKR, SCAN and VERIFY to the list of intrinsics whose KIND
	argument is to be ignored.

gcc/testsuite/ChangeLog:
	* gfortran.dg/maskl_1.f90: New test.
	* gfortran.dg/maskr_1.f90: New test.
	* gfortran.dg/scan_3.f90: New test.
	* gfortran.dg/verify_3.f90: New test.
2022-01-08  Testsuite: Make dependence on -fdelete-null-pointer-checks explicit  (Sandra Loosemore; 7 files changed, -0/+7)

The nios2-elf target defaults to -fno-delete-null-pointer-checks, breaking tests that implicitly depend on that optimization. Add the option explicitly to these tests.

2022-01-08  Sandra Loosemore  <sandra@codesourcery.com>

gcc/testsuite/
	* g++.dg/cpp0x/constexpr-compare1.C: Add explicit
	-fdelete-null-pointer-checks option.
	* g++.dg/cpp0x/constexpr-compare2.C: Likewise.
	* g++.dg/cpp0x/constexpr-typeid2.C: Likewise.
	* g++.dg/cpp1y/constexpr-94716.C: Likewise.
	* g++.dg/cpp1z/constexpr-compare1.C: Likewise.
	* g++.dg/cpp1z/constexpr-if36.C: Likewise.
	* gcc.dg/init-compare-1.c: Likewise.

libstdc++-v3/
	* testsuite/18_support/type_info/constexpr.cc: Add explicit
	-fdelete-null-pointer-checks option.
2022-01-09  Daily bump.  (GCC Administrator; 4 files changed, -1/+52)
2022-01-08  x86_64: Improve (interunit) moves from TImode to V1TImode.  (Roger Sayle; 3 files changed, -0/+44)

This patch improves the code generated when moving a 128-bit value in TImode, represented by two 64-bit registers, to V1TImode, which is a single SSE register.

Currently, the simple move:

	typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
	uv1ti foo(__int128 x) { return (uv1ti)x; }

is always transferred via memory, as:

	foo:	movq	%rdi, -24(%rsp)
		movq	%rsi, -16(%rsp)
		movdqa	-24(%rsp), %xmm0
		ret

with this patch, we now generate (with -msse2):

	foo:	movq	%rdi, %xmm1
		movq	%rsi, %xmm2
		punpcklqdq	%xmm2, %xmm1
		movdqa	%xmm1, %xmm0
		ret

and with -mavx2:

	foo:	vmovq	%rdi, %xmm1
		vpinsrq	$1, %rsi, %xmm1, %xmm0
		ret

Even more dramatic is the improvement of zero extended transfers.

	uv1ti bar(unsigned char c) { return (uv1ti)(__int128)c; }

Previously generated:

	bar:	movq	$0, -16(%rsp)
		movzbl	%dil, %eax
		movq	%rax, -24(%rsp)
		vmovdqa	-24(%rsp), %xmm0
		ret

Now generates:

	bar:	movzbl	%dil, %edi
		movq	%rdi, %xmm0
		ret

My first attempt at this functionality used a simple define_split, but unfortunately that triggers very late during the compilation, preventing some of the simplifications we'd like (in combine). For example, the foo case above becomes:

	foo:	movq	%rsi, -16(%rsp)
		movq	%rdi, %xmm0
		movhps	-16(%rsp), %xmm0

transferring half directly, and the other half via memory. And for the bar case above, GCC fails to appreciate that movq/vmovq clears the high bits, resulting in:

	bar:	movzbl	%dil, %eax
		xorl	%edx, %edx
		vmovq	%rax, %xmm1
		vpinsrq	$1, %rdx, %xmm1, %xmm0
		ret

Hence the solution (i.e. this patch) is to add a special case to ix86_expand_vector_move for TImode to V1TImode transfers.

2022-01-08  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.c (ix86_expand_vector_move): Add
	special case for TImode to V1TImode moves, going via V2DImode.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse2-v1ti-mov-1.c: New test case.
	* gcc.target/i386/sse2-v1ti-zext.c: New test case.
2022-01-08  c++, match.pd: Evaluate in constant evaluation comparisons like &var1 + 12 == &var2 + 24 [PR89074]  (Jakub Jelinek; 4 files changed, -0/+91)

The match.pd address_comparison simplification can only handle ADDR_EXPR comparisons, possibly converted to some other type. (I wonder if we shouldn't restrict it in address_compare to casts to pointer types or pointer-sized integer types; I think we shouldn't optimize (short) (&var) == (short) (&var2), because we really don't know whether it will be true or false.)

On GIMPLE, most pointer to pointer casts are useless and optimized away, and furthermore we have in gimple_fold_stmt_to_constant_1 an optimization that folds &something p+ const_int into &MEM_REF[..., off]. On GENERIC, we don't do that, and e.g. for constant evaluation it could be pretty harmful if such pointers are dereferenced, because it loses what exact field it was starting with etc.; all it knows is the base and offset, type and alias set.

Instead of teaching the match.pd address_compare about 3 extra variants where one or both compared operands are pointer_plus, this patch attempts to fold operands of comparisons similarly to gimple_fold_stmt_to_constant_1 before calling fold_binary on it.

There is another thing though: while we do have the (x p+ y) p+ z to x p+ (y + z) simplification, which works on GIMPLE well because of the useless pointer conversions, on GENERIC we can have pointer casts in between, and at that point we can end up with large expressions like ((type3) (((type2) ((type1) (&var + 2) + 2) + 2) + 2)) etc. Pointer-plus doesn't really care what exact pointer type it has as long as it is a pointer, so the following match.pd simplification (GENERIC only; it is useless for GIMPLE) also moves the cast so that nested p+ can be simplified.

Note, I've noticed we don't really diagnose going out of bounds with pointer_plus (unlike e.g. with ARRAY_REF) during constant evaluation; I think another patch for cxx_eval_binary_expression with POINTER_PLUS will be needed. But it isn't clear to me what exactly it should do in case of subobjects. If we start with the address of a whole var (&var), I guess we should diagnose if the pointer_plus gets before the start of the var (i.e. "negative") or 1 byte past the end of the var, but what if we start with &var.field or &var.field[3]? For &var.field, shall we diagnose out of bounds of field (except perhaps flexible members?) or the whole var? For ARRAY_REFs, I assume we must at least strip all the outer ARRAY_REFs and so start with &var.field too, right?

2022-01-08  Jakub Jelinek  <jakub@redhat.com>

	PR c++/89074
gcc/
	* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): New
	GENERIC simplification.
gcc/cp/
	* constexpr.c (cxx_maybe_fold_addr_pointer_plus): New function.
	(cxx_eval_binary_expression): Use it.
gcc/testsuite/
	* g++.dg/cpp1y/constexpr-89074-2.C: New test.
	* g++.dg/cpp1z/constexpr-89074-1.C: New test.
2022-01-08  c++: default mem-init of array [PR103946]  (Jason Merrill; 3 files changed, -8/+8)

In the patch for PR92385 I added asserts to see if we tried to make a vec_init of a vec_init, but didn't see any in regression testing. This testcase is one case, which seems reasonable: we create a VEC_INIT_EXPR for the aggregate initializer, and then again to express the actual initialization of the member. We already do similar collapsing of TARGET_EXPR. So let's just remove the asserts.

	PR c++/103946

gcc/cp/ChangeLog:
	* init.c (build_vec_init): Remove assert.
	* tree.c (build_vec_init_expr): Likewise.

gcc/testsuite/ChangeLog:
	* g++.dg/cpp0x/nsdmi-array1.C: New test.