path: root/gcc
Age  Commit message  Author  Files  Lines
2022-02-22arm: Fix mve_vmvnq_n_<supf><mode> argument modeChristophe Lyon1-1/+1
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the <V_elem> iterator instead of HI in mve_vmvnq_n_<supf><mode>. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/mve.md (mve_vmvnq_n_<supf><mode>): Use V_elem mode for operand 1.
2022-02-22arm: Add support for VPR_REG in arm_class_likely_spilled_pChristophe Lyon1-1/+1
VPR_REG is the only register in its class, so it should be handled by TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling default_class_likely_spilled_p. No test fails without this patch, but it seems it should be implemented. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG.
2022-02-22arm: Add GENERAL_AND_VPR_REGS regclassChristophe Lyon2-1/+9
At some point during the development of this patch series, it appeared that in some cases the register allocator wants “VPR or general” rather than “VPR or general or FP” (which is the same thing as ALL_REGS). The series does not seem to require this anymore, but it seems to be a good thing to do anyway, to give the register allocator more freedom. CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a regression in gcc.dg/stack-usage-1.c when compiled with -mthumb -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (CLASS_MAX_NREGS): Handle VPR. * config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR.
2022-02-22arm: Add new tests for comparison vectorization with Neon and MVEChristophe Lyon10-0/+267
This patch mainly adds Neon tests similar to existing MVE ones, to make sure we do not break Neon when fixing MVE. mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional with 2.0f and 3.0f constants to help scan-assembler-times. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon <christophe.lyon@arm.com> gcc/testsuite/ * gcc.target/arm/simd/mve-vcmp-f32-2.c: New. * gcc.target/arm/simd/neon-compare-1.c: New. * gcc.target/arm/simd/neon-compare-2.c: New. * gcc.target/arm/simd/neon-compare-3.c: New. * gcc.target/arm/simd/neon-compare-scalar-1.c: New. * gcc.target/arm/simd/neon-vcmp-f16.c: New. * gcc.target/arm/simd/neon-vcmp-f32-2.c: New. * gcc.target/arm/simd/neon-vcmp-f32-3.c: New. * gcc.target/arm/simd/neon-vcmp-f32.c: New. * gcc.target/arm/simd/neon-vcmp.c: New.
2022-02-22nvptx: Add -misa=sm_70Tobias Burnus5-2/+9
Add -misa=sm_70, and use it to specify the misa value in test-case gcc.target/nvptx/atomic-store-2.c. Tested on nvptx. gcc/ChangeLog: * config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Likewise. * config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70. gcc/testsuite/ChangeLog: 2022-02-22 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70. * gcc.target/nvptx/uniform-simt-3.c: Same. Co-Authored-By: Tom de Vries <tdevries@suse.de>
2022-02-22nvptx: Add -mptx=6.0Tobias Burnus2-3/+7
Currently supported internally are 3.1, 6.0, 6.3 and 7.0. However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0. Add -mptx=6.0 for consistency. Tested on nvptx. gcc/ChangeLog: * config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0. * doc/invoke.texi (-mptx): Update for new values and defaults. Co-Authored-By: Tom de Vries <tdevries@suse.de>
2022-02-22[nvptx] Add -mptx-commentTom de Vries2-0/+45
Add functionality that indicates which insns are added by -minit-regs, such that for instance we have for pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // End: Added by -minit-regs=3: // #NO_APP ... Can be switched off using -mno-ptx-comment. Tested on nvptx. gcc/ChangeLog: 2022-02-21 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (gen_comment): New function. (workaround_uninit_method_1, workaround_uninit_method_2) (workaround_uninit_method_3): Use gen_comment. * config/nvptx/nvptx.opt (mptx-comment): New option.
2022-02-22Dump def that we use for a splatRichard Biener1-1/+2
This makes the SLP vectorizer dump the def we use for a splat to aid debugging. 2022-02-22 Richard Biener <rguenther@suse.de> * tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used for a splat.
2022-02-22Implement constant-folding simplifications of reductions.Roger Sayle4-0/+58
This patch addresses a code quality regression in GCC 12 by implementing some constant folding/simplification transformations for REDUC_PLUS_EXPR in match.pd. The motivating example is gcc.dg/vect/pr89440.c which with -O2 -ffast-math (with vectorization now enabled) gets optimized to: float f (float x) { vector(4) float vect_x_14.11; vector(4) float _2; float _32; _2 = {x_9(D), 0.0, 0.0, 0.0}; vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 }; _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call] return _32; } With these proposed new transformations, we can simplify the above code even further. float f (float x) { float _32; _32 = x_9(D) + 1.36e+2; return _32; } [which happens to match what we'd produce with -fno-tree-vectorize, and with GCC 11]. 2022-02-22 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog * fold-const.cc (ctor_single_nonzero_element): New function to return the single non-zero element of a (vector) constructor. * fold-const.h (ctor_single_nonzero_element): Prototype here. * match.pd (reduc (constructor@0)): Simplify reductions of a constructor containing a single non-zero element. (reduc (@0 op VECTOR_CST) -> (reduc @0) op CONST): Simplify reductions of vector operations of the same operator with constant vector operands. gcc/testsuite/ChangeLog * gcc.dg/fold-reduc-1.c: New test case.
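The folding described above can be sketched in plain C. This is only an illustration under stated assumptions, not the match.pd implementation: reduc_plus4, f_vector and f_folded are hypothetical names, and the 4-lane array stands in for the vector(4) float type in the dump. Reassociating floating-point additions like this is only valid under -ffast-math in general; the constants here are small integers, so both forms agree exactly.

```c
#include <stddef.h>

/* Hypothetical stand-in for .REDUC_PLUS on a 4-lane float vector. */
static float reduc_plus4(const float v[4])
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Unsimplified form: reduce ({x, 0, 0, 0} + {10, 26, 42, 58}). */
float f_vector(float x)
{
    const float c[4] = {10.0f, 26.0f, 42.0f, 58.0f};
    float v[4] = {x, 0.0f, 0.0f, 0.0f};

    for (size_t i = 0; i < 4; i++)
        v[i] += c[i];
    return reduc_plus4(v);
}

/* Simplified form: the reduction of the constant vector is folded to
   10 + 26 + 42 + 58 = 136, and the constructor with a single non-zero
   element reduces to that element. */
float f_folded(float x)
{
    return x + 136.0f;
}
```

For small integer inputs such as 0.0f or 1.0f, f_vector and f_folded return the same value, which is the equivalence the new patterns rely on.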
2022-02-22ranger: Fix up REALPART_EXPR/IMAGPART_EXPR handling [PR104604]Jakub Jelinek2-2/+38
The following testcase is miscompiled since r12-3328. That change assumed that if rhs1 of a GIMPLE_ASSIGN is COMPLEX_CST, then that is the value of the lhs of the stmt, but that is not always the case, only if it is a GIMPLE_SINGLE_RHS stmt. If it is e.g. GIMPLE_UNARY_RHS or GIMPLE_BINARY_RHS (the latter happens in the testcase), then it can be e.g. __complex__ (3, 0) / var and the REALPART_EXPR of that isn't 3, but the realpart of the division. I assume once the ranger can do complex numbers adjust_*part_expr will just fetch one or the other range from an underlying complex range, but until then, we should limit this to what r12-3328 meant to do. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/104604 * gimple-range-fold.cc (adjust_imagpart_expr, adjust_realpart_expr): Only check if gimple_assign_rhs1 is COMPLEX_CST if gimple_assign_rhs_code is COMPLEX_CST. * gcc.c-torture/execute/pr104604.c: New test.
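A small C sketch of the situation described above, assuming a C99 <complex.h>. It is not the PR's testcase, and realpart_of_quotient is a hypothetical name. The first operand of the division is the constant 3 + 0i, yet the real part of the result depends on the divisor, so it cannot be taken from that constant.

```c
#include <complex.h>

/* Real part of (3 + 0i) / var.  The first operand is a complex
   constant, but the statement is a binary division, so its real part
   is the real part of the quotient, not 3. */
double realpart_of_quotient(double complex var)
{
    double complex q = (3.0 + 0.0 * I) / var;
    return creal(q);
}
```

With var equal to 2, the function returns 1.5 rather than 3; only when the statement is a plain copy of the constant would 3 be the right answer.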
2022-02-22i386: Fix up copysign/xorsign expansion [PR104612]Jakub Jelinek2-11/+54
We ICE on the following testcase for -m32 since r12-3435, because operands[2] is (subreg:SF (reg:DI ...) 0) and lowpart_subreg (V4SFmode, operands[2], SFmode) returns NULL, and that is what we use in AND etc. insns we emit. My earlier version of the patch fixes that by calling force_reg for the input operands, to make sure they are really REGs and so lowpart_subreg will succeed on them - even for theoretical MEMs using REGs there seems desirable, we don't want to read following memory slots for the paradoxical subreg. For the outputs, I thought we'd get better code by always computing result into a new pseudo and then move lowpart of that pseudo into dest. Unfortunately it regressed FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps on which the patch changes: vandps .LC0(%rip), %xmm1, %xmm1 - vxorps %xmm0, %xmm1, %xmm0 + vxorps %xmm0, %xmm1, %xmm1 + vmovaps %xmm1, %xmm0 ret The RA sees: (insn 8 4 9 2 (set (reg:V4SF 85) (and:V4SF (subreg:V4SF (reg:SF 90) 0) (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128]))) "pr89984-2.c":7:12 2838 {*andv4sf3} (expr_list:REG_DEAD (reg:SF 90) (nil))) (insn 9 8 10 2 (set (reg:V4SF 87) (xor:V4SF (reg:V4SF 85) (subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842 {*xorv4sf3} (expr_list:REG_DEAD (reg:SF 89) (expr_list:REG_DEAD (reg:V4SF 85) (nil)))) (insn 10 9 14 2 (set (reg:SF 82 [ <retval> ]) (subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142 {*movsf_internal} (expr_list:REG_DEAD (reg:V4SF 87) (nil))) (insn 14 10 15 2 (set (reg/i:SF 20 xmm0) (reg:SF 82 [ <retval> ])) "pr89984-2.c":8:1 142 {*movsf_internal} (expr_list:REG_DEAD (reg:SF 82 [ <retval> ]) (nil))) (insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1 (nil)) and doesn't know that if it would use xmm0 not just for pseudo 82 but also for pseudo 87, it could create a noop move in insn 10 and so could avoid an extra register copy and nothing later on is able to figure that out either. I don't know how the RA should know that though.
So that we don't regress, this version of the patch will do this stuff (i.e. use fresh vector pseudo as destination and then move lowpart of that to dest) over what it used before (i.e. use paradoxical subreg of the dest) only if lowpart_subreg returns NULL. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR target/104612 * config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg on input operands before calling lowpart_subreg on it. For output operand, use a vmode pseudo as destination and then move its lowpart subreg into operands[0] if lowpart_subreg fails on dest. (ix86_expand_xorsign): Likewise. * gcc.dg/pr104612.c: New test.
2022-02-22[nvptx] Xfail sibcall execution testsTom de Vries3-3/+3
On nvptx I see the following FAIL: ... FAIL: gcc.dg/sibcall-3.c execution test ... The test-case states that "this test is xfailed on targets without sibcall patterns". The nvptx port doesn't have a sibcall pattern, so add an xfail. Likewise in two similar test-cases. Tested on nvptx. gcc/testsuite/ChangeLog: 2022-02-20 Tom de Vries <tdevries@suse.de> * gcc.dg/sibcall-10.c: Xfail execution test for nvptx. * gcc.dg/sibcall-3.c: Same. * gcc.dg/sibcall-4.c: Same.
2022-02-22[nvptx, testsuite] Remove mptx settings in gcc.target/nvptx testsTom de Vries7-7/+7
Some test-cases in gcc/testsuite/gcc.target/nvptx contain mptx settings, which are paired with misa settings, in order to have the mptx version support the misa version. Since commit decde11183bd ("[nvptx] Choose -mptx default based on -misa"), this is no longer necessary. Remove the mptx settings. Tested on nvptx. gcc/testsuite/ChangeLog: 2022-02-20 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/float16-1.c: Drop -mptx setting. * gcc.target/nvptx/float16-2.c: Same. * gcc.target/nvptx/float16-3.c: Same. * gcc.target/nvptx/float16-4.c: Same. * gcc.target/nvptx/float16-5.c: Same. * gcc.target/nvptx/float16-6.c: Same. * gcc.target/nvptx/tanh-1.c: Same.
2022-02-22target/99881 - x86 vector cost of CTOR from integer regsRichard Biener7-3/+102
This uses the now passed SLP node to the vectorizer costing hook to adjust vector construction costs for the cost of moving an integer component from a GPR to a vector register when that's required for building a vector from components. A crucial difference here is whether the component is loaded from memory or extracted from a vector register as in those cases no intermediate GPR is involved. The pr99881.c testcase can be Un-XFAILed with this patch, the pr91446.c testcase now produces scalar code which looks superior to me so I've adjusted it as well. 2022-02-18 Richard Biener <rguenther@suse.de> PR tree-optimization/104582 PR target/99881 * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Cost GPR to vector register moves for integer vector construction. * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: New. * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-2.c: Likewise. * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Likewise. * gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Likewise. * gcc.target/i386/pr99881.c: Un-XFAIL. * gcc.target/i386/pr91446.c: Adjust to not expect vectorization.
2022-02-22tree-optimization/104582 - make SLP node available in vector cost hookRichard Biener8-38/+75
This adjusts the vectorizer costing API to allow passing down the SLP node the vector stmt is created from. 2022-02-18 Richard Biener <rguenther@suse.de> PR tree-optimization/104582 * tree-vectorizer.h (stmt_info_for_cost::node): New field. (vector_costs::add_stmt_cost): Add SLP node parameter. (dump_stmt_cost): Likewise. (add_stmt_cost): Likewise, new overload and adjust. (add_stmt_costs): Adjust. (record_stmt_cost): New overload. * tree-vectorizer.cc (dump_stmt_cost): Dump the SLP node. (vector_costs::add_stmt_cost): Adjust. * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust. * tree-vect-slp.cc (vect_prologue_cost_for_slp): Record the SLP node for costing. (vectorizable_slp_permutation): Likewise. * tree-vect-stmts.cc (record_stmt_cost): Adjust and add new overloads. * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Adjust. * config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost): Adjust. * config/rs6000/rs6000.cc (rs6000_vector_costs::add_stmt_cost): Adjust. (rs6000_cost_data::adjust_vect_cost_per_loop): Likewise.
2022-02-22tree-optimization/104582 - Simplify vectorizer cost API and fixesRichard Biener3-24/+38
This simplifies the vectorizer cost API by providing overloads to add_stmt_cost and record_stmt_cost suitable for scalar stmt and branch stmt costing which do not need information like a vector type or alignment. It also fixes two mistakes where costs for versioning tests were recorded as vector stmt rather than scalar stmt. This is a first patch to simplify the actual fix for PR104582. 2022-02-18 Richard Biener <rguenther@suse.de> PR tree-optimization/104582 * tree-vectorizer.h (add_stmt_cost): New overload. (record_stmt_cost): Likewise. * tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost): Use add_stmt_costs. (vect_get_known_peeling_cost): Use new overloads. (vect_estimate_min_profitable_iters): Likewise. Consistently use scalar_stmt for costing versioning checks. * tree-vect-stmts.cc (record_stmt_cost): New overload.
2022-02-22i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]Hongyu Wang6-65/+226
cmpxchg is commonly used in spin loops, and some user code, such as pthread, uses cmpxchg directly as the loop condition, which causes heavy cache bouncing. This patch extends the previous implementation to relax all cmpxchg instructions under -mrelax-cmpxchg-loop with an extra atomic load and compare, emulating the behavior of a failed cmpxchg. An original spin loop that looks like

loop:
  mov %eax,%r8d
  or $1,%r8d
  lock cmpxchg %r8d,(%rdi)
  jne loop

now turns into

loop:
  mov %eax,%r8d
  or $1,%r8d
  mov (%r8),%rsi <--- load lock first
  cmp %rsi,%rax <--- compare with expected input
  jne .L2 <--- lock ne expected
  lock cmpxchg %r8d,(%rdi)
  jne loop
L2:
  mov %rsi,%rax <--- perform the behavior of failed cmpxchg
  jne loop

under -mrelax-cmpxchg-loop.

gcc/ChangeLog: PR target/103069 * config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop): Split atomic fetch and loop part. (ix86_expand_cmpxchg_loop): New expander for cmpxchg loop. * config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New prototype. * config/i386/sync.md (atomic_compare_and_swap<mode>): Call new expander under TARGET_RELAX_CMPXCHG_LOOP. (atomic_compare_and_swap<mode>): Likewise for doubleword modes. gcc/testsuite/ChangeLog: PR target/103069 * gcc.target/i386/pr103069-2.c: Adjust result check. * gcc.target/i386/pr103069-3.c: New test. * gcc.target/i386/pr103069-4.c: Likewise.
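The same load-then-compare idea can be written at source level with C11 atomics. This is a sketch under stated assumptions, not what the new expander emits: relaxed_cas is a hypothetical helper, and the memory orders are chosen for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: compare-and-swap that first does a plain atomic
   load.  If the current value already differs from *expected, it skips
   the locked cmpxchg and emulates its failure path by writing the
   observed value back into *expected. */
bool relaxed_cas(atomic_int *ptr, int *expected, int desired)
{
    int cur = atomic_load_explicit(ptr, memory_order_relaxed);

    if (cur != *expected) {
        *expected = cur;   /* behave like a failed cmpxchg */
        return false;
    }
    return atomic_compare_exchange_strong(ptr, expected, desired);
}
```

A spin loop built on such a helper only issues the locked instruction when the load suggests it can succeed, which is the point of the relaxation: waiting threads mostly read the cache line instead of repeatedly taking it exclusively.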
2022-02-22Daily bump.GCC Administrator3-1/+79
2022-02-21runtime/internal/syscall: build dummy package if not LinuxIan Lance Taylor1-1/+1
Fixes libgo build on non-Linux systems. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/387134
2022-02-21aarch64: Add compiler support for Shadow Call StackDan Li19-34/+330
Shadow Call Stack can be used to protect the return address of a function at runtime, and clang already supports this feature[1]. To enable SCS in user mode, in addition to the compiler, other support is also required (as discussed in [2]). This patch only adds basic support for SCS from the compiler side, and provides convenience for users to enable SCS. For the Linux kernel, only the support of the compiler is required. [1] https://clang.llvm.org/docs/ShadowCallStack.html [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768 Signed-off-by: Dan Li <ashimida@linux.alibaba.com> gcc/ChangeLog: * config/aarch64/aarch64.cc (SLOT_REQUIRED): Change wb_candidate[12] to wb_push_candidate[12]. (aarch64_layout_frame): Likewise, and change callee_adjust when scs is enabled. (aarch64_save_callee_saves): Change wb_candidate[12] to wb_push_candidate[12]. (aarch64_restore_callee_saves): Change wb_candidate[12] to wb_pop_candidate[12]. (aarch64_get_separate_components): Change wb_candidate[12] to wb_push_candidate[12]. (aarch64_expand_prologue): Push x30 onto SCS before it's pushed onto stack. (aarch64_expand_epilogue): Pop x30 from SCS, while preventing it from being popped from the regular stack again. (aarch64_override_options_internal): Add SCS compile option check. (TARGET_HAVE_SHADOW_CALL_STACK): New hook. * config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled, wb_pop_candidate[12], and rename wb_candidate[12] to wb_push_candidate[12]. * config/aarch64/aarch64.md (scs_push): New template. (scs_pop): Likewise. * doc/invoke.texi: Document -fsanitize=shadow-call-stack. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Add hook have_shadow_call_stack. * flag-types.h (enum sanitize_code): Add SANITIZE_SHADOW_CALL_STACK. * opts.cc (parse_sanitizer_options): Add shadow-call-stack and exclude SANITIZE_SHADOW_CALL_STACK. * target.def: New hook. * toplev.cc (process_options): Add SCS compile option check. * ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.
gcc/testsuite/ChangeLog: * gcc.target/aarch64/shadow_call_stack_1.c: New test. * gcc.target/aarch64/shadow_call_stack_2.c: New test. * gcc.target/aarch64/shadow_call_stack_3.c: New test. * gcc.target/aarch64/shadow_call_stack_4.c: New test. * gcc.target/aarch64/shadow_call_stack_5.c: New test. * gcc.target/aarch64/shadow_call_stack_6.c: New test. * gcc.target/aarch64/shadow_call_stack_7.c: New test. * gcc.target/aarch64/shadow_call_stack_8.c: New test.
2022-02-21[nvptx] Initialize ptx regsTom de Vries2-0/+192
With nvptx target, driver version 510.47.03 and board GT 1030 I, we run into: ... FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test ... while the test-cases pass with nvptx-none-run -O0. The problem is that the generated ptx contains a read from an uninitialized ptx register, and the driver JIT doesn't handle this well. For -O2 and -O3, we can get rid of the FAIL using --param logical-op-non-short-circuit=0. But not for -O1. At -O1, the test-case minimizes to: ... void __attribute__((noinline, noclone)) foo (int y) { int c; for (int i = 0; i < y; i++) { int d = i + 1; if (i && d <= c) __builtin_abort (); c = d; } } int main () { foo (2); return 0; } ... Note that the test-case does not contain an uninitialized use. In the first iteration, i is 0 and consequently c is not read. In the second iteration, c is read, but by that time it's already initialized by 'c = d' from the first iteration. AFAICT the problem is introduced as follows: the conditional use of c in the loop body is translated into an unconditional use of c in the loop header: ... # c_1 = PHI <c_4(D)(2), c_9(6)> ... which forwprop1 propagates the 'c_9 = d_7' assignment into: ... # c_1 = PHI <c_4(D)(2), d_7(6)> ... which ends up being translated by expand into an unconditional: ... (insn 13 12 0 (set (reg/v:SI 22 [ c ]) (reg/v:SI 23 [ d ])) -1 (nil)) ... at the start of the loop body, creating an uninitialized read of d on the path from loop entry. By disabling coalesce_ssa_name, we get the more usual copies on the incoming edges. The copy on the loop entry path still does an uninitialized read, but that one's now initialized by init-regs. The test-case passes, also when disabling init-regs, so it's possible that the JIT driver doesn't object to this type of uninitialized read. 
Now that we have characterized the problem to some degree, we need to fix this, because either: - we're violating an undocumented ptx invariant, and this is a compiler bug, or - this is a driver JIT bug and we need to work around it. There are essentially two strategies to address this: - stop the compiler from creating uninitialized reads - patch up uninitialized reads using additional initialization The former will probably involve: - making some optimizations more conservative in the presence of uninitialized reads, and - disabling some other optimizations (where making them more conservative is not possible, or cannot easily be achieved). This will probably have a cost penalty for code that does not suffer from the original problem. The latter has the problem that it may paper over uninitialized reads in the source code, or indeed over ones that were incorrectly introduced by the compiler. But it has the advantage that it allows for the problem to be addressed at a single location. There's an existing pass, init-regs, which implements a form of the latter, but it doesn't work for this example because it only inserts additional initialization for uses that have no reaching definition at all. Fix this by adding initialization of uninitialized ptx regs in reorg. Control the new functionality using -minit-regs=<0|1|2|3>, meaning: - 0: disabled. - 1: add initialization of all regs at the entry bb - 2: add initialization of uninitialized regs at the entry bb - 3: add initialization of uninitialized regs close to the use and defaulting to 3. Tested on nvptx. gcc/ChangeLog: 2022-02-17 Tom de Vries <tdevries@suse.de> PR target/104440 * config/nvptx/nvptx.cc (workaround_uninit_method_1) (workaround_uninit_method_2, workaround_uninit_method_3) (workaround_uninit): New function. (nvptx_reorg): Use workaround_uninit. * config/nvptx/nvptx.opt (minit-regs): New option.
2022-02-21c++: Add testcase for already fixed PR [PR85493]Patrick Palka1-0/+16
The a1 and a2 case were fixed (by diagnosing the invalid expression) with r11-434, and the a3 case with r8-7625. PR c++/85493 gcc/testsuite/ChangeLog: * g++.dg/cpp0x/decltype80.C: New test.
2022-02-21rtl-optimization/104498: Fix comparing symbol referenceAndre Vieira1-2/+4
gcc/ChangeLog: PR rtl-optimization/104498 * alias.cc (compare_base_symbol_refs): Correct distance computation when swapping x and y.
2022-02-21c: [PR104506] Fix ICE after error due to change of type to error_mark_nodeAndrew Pinski4-7/+47
The problem here is we end up with an error_mark_node when calling useless_type_conversion_p and that ICEs. STRIP_NOPS/tree_nop_conversion has had a check for the inner type being an error_mark_node since g9a6bb3f78c96 (2000). This just adds the check also to tree_ssa_useless_type_conversion. STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier and the places where it is used outside of the gimplifier would not be adding too much overhead. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski PR c/104506 gcc/ChangeLog: * tree-ssa.cc (tree_ssa_useless_type_conversion): Check the inner type before calling useless_type_conversion_p. gcc/testsuite/ChangeLog: * gcc.dg/pr104506-1.c: New test. * gcc.dg/pr104506-2.c: New test. * gcc.dg/pr104506-3.c: New test.
2022-02-21Daily bump.GCC Administrator4-1/+40
2022-02-21d: Remove handling of deleting GC allocated classes.Iain Buclaw2-23/+7
Now that the `delete' keyword has been removed from the front-end, only compiler-generated uses of DeleteExp reach the code generator via the auto-destruction of `scope class' variables. The run-time library helpers that previously were used to delete GC class objects can now be removed from the compiler. gcc/d/ChangeLog: * expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of deleting GC allocated classes. * runtime.def (DELCLASS): Remove. (DELINTERFACE): Remove.
2022-02-20d: Merge upstream dmd cb49e99f8, druntime 55528bd1, phobos 1a3e80ec2.Iain Buclaw200-4378/+4924
D front-end changes: - Import dmd v2.099.0-beta.1. - It's now an error to use `alias this' for partial assignment. - The `delete' keyword has been removed from the language. - Using `this' and `super' as types has been removed from the language, the parser no longer specially handles this wrong code with an informative error. D Runtime changes: - Import druntime v2.099.0-beta.1. Phobos changes: - Import phobos v2.099.0-beta.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd cb49e99f8. * dmd/VERSION: Update version to v2.099.0-beta.1. * decl.cc (layout_class_initializer): Update call to NewExp::create. * expr.cc (ExprVisitor::visit (DeleteExp *)): Remove handling of deleting arrays and pointers. (ExprVisitor::visit (DotVarExp *)): Convert complex types to the front-end library type representing them. (ExprVisitor::visit (StringExp *)): Use getCodeUnit instead of charAt to get the value of each index in a string expression. * runtime.def (DELMEMORY): Remove. (DELARRAYT): Remove. * types.cc (TypeVisitor::visit (TypeEnum *)): Handle anonymous enums. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 55528bd1. * src/MERGE: Merge upstream phobos 1a3e80ec2. * testsuite/libphobos.hash/test_hash.d: Update. * testsuite/libphobos.betterc/test19933.d: New test.
2022-02-20Fortran: improve check of pointer initialization in DATA statementsHarald Anlauf3-0/+28
gcc/fortran/ChangeLog: PR fortran/77693 * data.cc (gfc_assign_data_value): If a variable in a data statement has the POINTER attribute, check for allowed initial data target that is compatible with pointer assignment. * gfortran.h (IS_POINTER): New macro. gcc/testsuite/ChangeLog: PR fortran/77693 * gfortran.dg/data_pointer_2.f90: New test.
2022-02-20Daily bump.GCC Administrator3-1/+53
2022-02-19[nvptx] Use _ as destination operand of atom.exchTom de Vries3-12/+35
We currently generate this code for an atomic store: ... .reg.u32 %r21; atom.exch.b32 %r21,[%r22],%r23; ... where %r21 is set but unused. Use the ptx bit bucket operand '_' instead, such that we have: ... atom.exch.b32 _,[%r22],%r23; ... [ Note that the same problem still occurs for this code: ... void atomic_store (int *ptr, int val) { __atomic_exchange_n (ptr, val, MEMMODEL_RELAXED); } ... ] Tested on nvptx. gcc/ChangeLog: 2022-02-19 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Handle SET insn. * config/nvptx/nvptx.md (define_insn "nvptx_atomic_store<mode>"): Rename to ... (define_insn "nvptx_atomic_store_sm70<mode>"): This. (define_insn "nvptx_atomic_store<mode>"): New define_insn. (define_expand "atomic_store<mode>"): Handle rename. Use nvptx_atomic_store instead of atomic_exchange. gcc/testsuite/ChangeLog: 2022-02-19 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/atomic-store-1.c: Update.
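For reference, the two source-level forms contrasted above can be sketched with the GCC __atomic builtins. The function names are hypothetical and __ATOMIC_RELAXED is used as the user-level memory order; which ptx each form produces depends on the compiler version and options, so nothing about the generated code is asserted here.

```c
/* Store through the dedicated builtin; no result is produced. */
void store_via_builtin(int *ptr, int val)
{
    __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
}

/* Store through an exchange whose result is discarded.  This is the
   form the note in the commit message says still ends up with an
   unused destination register. */
void store_via_exchange(int *ptr, int val)
{
    (void) __atomic_exchange_n(ptr, val, __ATOMIC_RELAXED);
}
```

Both functions leave val in *ptr; they differ only in whether an old value is computed and then thrown away.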
2022-02-19[nvptx] Don't skip atomic insns in nvptx_reorg_uniform_simtTom de Vries3-5/+21
In nvptx_reorg_uniform_simt we have a loop: ... for (insn = get_insns (); insn; insn = next) { next = NEXT_INSN (insn); if (!(CALL_P (insn) && nvptx_call_insn_is_syscall_p (insn)) && !(NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == PARALLEL && get_attr_atomic (insn))) continue; ... that intends to handle syscalls and atomic insns. However, this also silently skips the atomic insn nvptx_atomic_store, which has GET_CODE (PATTERN (insn)) == SET. This does not cause problems, because the nvptx_atomic_store actually maps onto a "st" insn, and therefore is not atomic and doesn't need to be handled by nvptx_reorg_uniform_simt. Fix this by: - explicitly setting nvptx_atomic_store's atomic attribute to false, - rewriting the skip condition to make sure all insn with atomic attribute are handled, and - asserting that all handled insns are PARALLEL. Tested on nvptx. gcc/ChangeLog: 2022-02-19 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Handle all insns with atomic attribute. Assert that all handled insns are PARALLELs. * config/nvptx/nvptx.md (define_insn "nvptx_atomic_store<mode>"): Set atomic attribute to false. gcc/testsuite/ChangeLog: 2022-02-19 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/uniform-simt-3.c: New test.
2022-02-19[nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simtTom de Vries3-3/+52
With the default ptx isa 6.0, we have for uniform-simt-1.c: ... @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29; shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0xffffffff; ... The atomic insn is predicated by -muniform-simt, and the subsequent insn does a warp sync, at which point the warp is uniform again. But with -mptx=3.1, we have instead: ... @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29; shfl.idx.b32 %r26, %r26, %r32, 31; ... The shfl does not sync the warp, and we want the warp to go back to executing uniformly asap. We cannot enforce this, but at least check this using nvptx_uniform_warp_check, similar to how that is done for openacc. Likewise, detect the case that no shfl insn is emitted, and add a nvptx_uniform_warp_check or nvptx_warpsync. gcc/ChangeLog: 2022-02-19 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Change return type to bool. (nvptx_reorg_uniform_simt): Insert nvptx_uniform_warp_check or nvptx_warpsync, if necessary. gcc/testsuite/ChangeLog: 2022-02-19 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/uniform-simt-1.c: Add scan-assembler test. * gcc.target/nvptx/uniform-simt-2.c: New test.
2022-02-19asan: Mark instrumented vars addressable [PR102656]Jakub Jelinek2-2/+34
We ICE on the following testcase, because the asan1 pass decides to instrument <retval>.x = 0; and does that by _13 = &<retval>.x; .ASAN_CHECK (7, _13, 4, 4); <retval>.x = 0; and later sanopt pass turns that into: _39 = (unsigned long) &<retval>.x; _40 = _39 >> 3; _41 = _40 + 2147450880; _42 = (signed char *) _41; _43 = *_42; _44 = _43 != 0; _45 = _39 & 7; _46 = (signed char) _45; _47 = _46 + 3; _48 = _47 >= _43; _49 = _44 & _48; if (_49 != 0) goto <bb 10>; [0.05%] else goto <bb 9>; [99.95%] <bb 10> [local count: 536864]: __builtin___asan_report_store4 (_39); <bb 9> [local count: 1073741824]: <retval>.x = 0; The problem is during expansion, <retval> isn't marked TREE_ADDRESSABLE, even when we take its address in (unsigned long) &<retval>.x. Now, instrument_derefs has code to avoid the instrumentation altogether if we can prove the access is within bounds of an automatic variable in the current function and the var isn't TREE_ADDRESSABLE (or we don't instrument use after scope), but we do it solely for VAR_DECLs. I think we should treat RESULT_DECLs exactly like that too, which is what the following patch does. I must say I'm unsure about PARM_DECLs, those can have different cases, either they are fully or partially passed in registers, then if we take parameter's address, they are in a local copy inside of a function and so work like those automatic vars. But if they are fully passed in memory, we typically just take address of the slot and in that case they live in the caller's frame. It is true we don't (can't) put any asan padding in between the arguments, so all asan could detect in that case is if caller passes fewer on stack arguments or smaller arguments than callee accepts. Anyway, as I'm unsure, I haven't added PARM_DECLs to that case. And another thing is, when we actually build_fold_addr_expr, we need to mark_addressable the inner if it isn't addressable already. 
2022-02-19 Jakub Jelinek <jakub@redhat.com> PR sanitizer/102656 * asan.cc (instrument_derefs): If inner is a RESULT_DECL and access is known to be within bounds, treat it like automatic variables. If instrumenting access and inner is {VAR,PARM,RESULT}_DECL from current function and !TREE_STATIC which is not TREE_ADDRESSABLE, mark it addressable. * g++.dg/asan/pr102656.C: New test.
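The arithmetic in the sanopt dump above can be restated as ordinary C++ to make the check easier to follow. This is only a sketch of what the dump computes for a 4-byte store; the names `kShadowOffset`, `shadow_address` and `store4_would_report` are hypothetical and do not appear in asan.cc, and the shadow byte is passed in as a plain argument instead of being loaded from memory.

```cpp
#include <cstdint>

// Shadow offset copied from the dump above (2147450880 == 0x7fff8000).
// Hypothetical name, used only in this sketch.
constexpr std::uint64_t kShadowOffset = 2147450880ULL;

// Corresponds to _40 and _41 in the dump: the address of the shadow byte
// describing the 8-byte granule that contains addr.
constexpr std::uint64_t shadow_address(std::uint64_t addr) {
  return (addr >> 3) + kShadowOffset;
}

// Corresponds to _44 .. _49 in the dump for a 4-byte store. `shadow` is the
// value that would be loaded from shadow_address(addr). The store is reported
// when the shadow byte is non-zero and the offset of the last accessed byte
// within the granule, (addr & 7) + 3, is not below the shadow value.
constexpr bool store4_would_report(std::uint64_t addr, signed char shadow) {
  return shadow != 0 &&
         static_cast<signed char>((addr & 7) + 3) >= shadow;
}
```

For example, with a shadow value of 4, a store at granule offset 0 is accepted, while a store at granule offset 4 is reported.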
2022-02-19Daily bump.GCC Administrator4-1/+70
2022-02-18libgo: update Hurd supportIan Lance Taylor1-1/+1
Patches from Svante Signell for PR go/104290. Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386797
2022-02-18Mark Power10 fusion option undocumented and remove sub-options.Pat Haugen5-238/+174
gcc/ * config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented. (mpower10-fusion-ld-cmpi, mpower10-fusion-2logical, mpower10-fusion-logical-add, mpower10-fusion-add-logical, mpower10-fusion-2add, mpower10-fusion-2store): Remove. * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER, OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options. * config/rs6000/rs6000.cc (rs6000_option_override_internal, power10_sched_reorder): Likewise. * config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf, gen_addadd): Likewise * config/rs6000/fusion.md: Regenerate.
2022-02-18libgo: update to Go1.18rc1 releaseIan Lance Taylor1-1/+1
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/386594
2022-02-18pieces-memset-21.c: Expect vzeroupper for ia32H.J. Lu1-1/+2
Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32 caused by

  commit fe79d652c96b53384ddfa43e312cb0010251391b
  Author: Richard Biener <rguenther@suse.de>
  Date:   Thu Feb 17 14:40:16 2022 +0100

      target/104581 - compile-time regression in mode-switching

PR target/104581
* gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32.
2022-02-18rs6000: Fix up posix_memalign call in _mm_malloc [PR104598]Jakub Jelinek1-1/+1
The uglification changes went one spot too far and also uglified the name of a function: posix_memalign must be called under that name, not as the non-existent __posix_memalign. 2022-02-18 Jakub Jelinek <jakub@redhat.com> PR target/104257 PR target/104598 * config/rs6000/mm_malloc.h (_mm_malloc): Call posix_memalign rather than __posix_memalign.
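As a rough illustration of why the function name must stay un-uglified, here is a minimal aligned-allocation wrapper around the POSIX function. It is a sketch, not the code in mm_malloc.h: `aligned_alloc_sketch` is a hypothetical name, and only `posix_memalign` and `free` are real library calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign (POSIX) and free

// Hypothetical helper for illustration; not the body of _mm_malloc.
inline void *aligned_alloc_sketch(std::size_t size, std::size_t alignment) {
  // posix_memalign requires the alignment to be a power of two and a
  // multiple of sizeof(void *), so small alignments are raised first.
  if (alignment < sizeof(void *))
    alignment = sizeof(void *);

  void *ptr = nullptr;
  // Local variables may be renamed with a "__" prefix, but the callee is a
  // library symbol: only the name posix_memalign exists.
  if (posix_memalign(&ptr, alignment, size) != 0)
    return nullptr;
  return ptr;  // release with free()
}
```

A caller would release the returned pointer with `free`, as with any `posix_memalign` allocation.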
2022-02-18target/104581 - compile-time regression in mode-switchingRichard Biener2-76/+5
The x86 backend piggy-backs on mode-switching for insertion of vzeroupper. A recent improvement there was implemented in a way to walk possibly the whole basic-block for all DF reg def definitions in its mode_needed hook which is called for each instruction in a basic-block during mode-switching local analysis. The following mostly reverts this improvement. It needs to be re-done in a way more consistent with a local dataflow which probably means making targets aware of the state of the local dataflow analysis. 2022-02-17 Richard Biener <rguenther@suse.de> PR target/104581 * config/i386/i386.cc (ix86_avx_u128_mode_source): Remove. (ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY instead of calling ix86_avx_u128_mode_source which would eventually have returned AVX_U128_ANY in some very special case. * gcc.target/i386/pr101456-1.c: XFAIL.
2022-02-18tree-optimization/96881 - CD-DCE and CLOBBERsRichard Biener3-4/+109
CD-DCE does not consider CLOBBERs as necessary in the attempt to not prevent DCE of SSA defs it uses. A side-effect of that is that it also removes all its control dependences if they are not made necessary by other means. When we later try to preserve as many CLOBBERs as possible we have to make sure we also preserved the controlling conditions, otherwise a CLOBBER can now appear on a path where it was not executed before, leading to wrong code as seen in the testcase. I've tried to continue to handle both direct and indirect CLOBBERs optimistically, allowing CD-DCE to remove control flow that just controls CLOBBERs but that regresses for example the stack coalescing test g++.dg/opt/pr80032.C. The pattern there is if (pred) D.2512 = CLOBBER; else D.2512 = CLOBBER; basically we have all paths leading to the same clobber but we could safely cut some branches which we do not realize early enough. This regression can be mitigated by no longer considering direct CLOBBERs optimistically - the original motivation for the CD-DCE handling wasn't removal of control flow but SSA defs of the address. Handling indirect vs. direct clobbers differently feels somewhat wrong, still the patch goes with this solution. 2022-02-15 Richard Biener <rguenther@suse.de> PR tree-optimization/96881 * tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Comment CLOBBER handling. (control_parents_preserved_p): New function. (eliminate_unnecessary_stmts): Check that we preserved control parents before retaining a CLOBBER. (perform_tree_ssa_dce): Pass down aggressive flag to eliminate_unnecessary_stmts. * g++.dg/torture/pr96881-1.C: New testcase. * g++.dg/torture/pr96881-2.C: Likewise.
2022-02-17c++: implicit 'this' in noexcept-spec within class tmpl [PR94944]Patrick Palka3-16/+28
Here when instantiating the noexcept-spec we fail to resolve the implicit object for the member call A<T>::f() ultimately because maybe_instantiate_noexcept sets current_class_ptr/ref to the dependent 'this' (of type B<T>) rather than the specialized 'this' (of type B<int>). This patch fixes this by making maybe_instantiate_noexcept set current_class_ptr/ref to the specialized 'this' instead, consistent with what tsubst_function_type does when substituting into the trailing return type of a non-static member function. PR c++/94944 gcc/cp/ChangeLog: * pt.cc (maybe_instantiate_noexcept): For non-static member functions, set current_class_ptr/ref to the specialized 'this' instead. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept34.C: Adjusted expected diagnostics. * g++.dg/cpp0x/noexcept75.C: New test.
2022-02-18Daily bump.GCC Administrator7-1/+170
2022-02-17c++: inlining explicit instantiations [PR104539]Jason Merrill2-3/+15
The PR10968 fix cleared DECL_COMDAT to force output of explicit instantiations. Then the PR59469 fix added a call to mark_needed, after which we no longer need to clear DECL_COMDAT, and leaving it set allows us to inline explicit instantiations without worrying about symbol interposition. I suppose there's an argument to be made that an explicit instantiation declaration (extern template) should clear DECL_COMDAT, since that suggests that there will be only a single instantiation somewhere that could be subject to interposition, but that doesn't change the 'inline' semantics, and it seems cleaner to treat template instantiations uniformly. PR c++/104539 gcc/cp/ChangeLog: * pt.cc (mark_decl_instantiated): Don't clear DECL_COMDAT. gcc/testsuite/ChangeLog: * g++.dg/ipa/inline-4.C: New test.
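The situation this commit describes can be sketched with a small template that is explicitly instantiated and then called in the same translation unit. The names `twice` and `call_twice` are hypothetical and are not taken from g++.dg/ipa/inline-4.C; whether the call is actually inlined depends on the compiler version and optimization level.

```cpp
// Hypothetical template, for illustration only.
template <typename T>
T twice(T value) {
  return value + value;
}

// Explicit instantiation definition: twice<int> is emitted in this
// translation unit even if nothing here uses it.
template int twice<int>(int);

// A call to the explicitly instantiated specialization. Per the commit
// message, leaving DECL_COMDAT set lets GCC consider inlining such calls
// without worrying about symbol interposition.
int call_twice(int x) {
  return twice<int>(x);
}
```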
2022-02-17tree: tweak warn_deprecated_useJason Merrill2-3/+7
While looking at PR90451 I noticed that this function was failing to find the attributes if called with a variant of the struct. gcc/ChangeLog: * tree.cc (warn_deprecated_use): Look for TYPE_STUB_DECL on TYPE_MAIN_VARIANT. gcc/testsuite/ChangeLog: * g++.dg/warn/deprecated-16.C: New test.
2022-02-17c++: avoid duplicate deprecated warning [PR90451]Jason Merrill9-18/+132
We were getting the deprecated warning twice for the same call because we called mark_used first in finish_qualified_id_expr and then again in build_over_call. Let's not call it the first time; C++17 clarified that a function is used only when it is selected from an overload set, which happens later. Then I had to add a few more uses in places that don't do anything further with the expression (convert_to_void, finish_decltype_type), and places that use the expression more unusually (cp_build_addr_expr_1, convert_nontype_argument). The new mark_single_function is mostly so that I only have to put the comment in one place. PR c++/90451 gcc/cp/ChangeLog: * decl2.cc (mark_single_function): New. * cp-tree.h: Declare it. * typeck.cc (cp_build_addr_expr_1): mark_used when making a PMF. * semantics.cc (finish_qualified_id_expr): Not here. (finish_id_expression_1): Or here. (finish_decltype_type): Call mark_single_function. * cvt.cc (convert_to_void): And here. * pt.cc (convert_nontype_argument): And here. * init.cc (build_offset_ref): Adjust assert. gcc/testsuite/ChangeLog: * g++.dg/warn/deprecated-14.C: New test. * g++.dg/warn/deprecated-15.C: New test.
2022-02-17rs6000: __Uglify non-uglified local variables in headersPaul A. Clarke8-1341/+1340
Properly prefix (with "__") all local variables in shipped headers for x86 compatibility intrinsics implementations. This avoids possible problems when user code defines macros whose names collide with those local variables before including the headers. 2022-02-16 Paul A. Clarke <pc@us.ibm.com> gcc/ PR target/104257 * config/rs6000/bmi2intrin.h: Uglify local variables. * config/rs6000/emmintrin.h: Likewise. * config/rs6000/mm_malloc.h: Likewise. * config/rs6000/mmintrin.h: Likewise. * config/rs6000/pmmintrin.h: Likewise. * config/rs6000/smmintrin.h: Likewise. * config/rs6000/tmmintrin.h: Likewise. * config/rs6000/xmmintrin.h: Likewise.
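To show the kind of collision the uglification guards against, the sketch below defines a user macro and then a stand-in for a header function. Everything here is assumed for illustration: the macro name `result` and the function `add_sketch` are hypothetical, and the "__" names are used only because the code mimics an implementation header (such names are reserved for the implementation in ordinary user code).

```cpp
// A user macro whose name matches a commonly used local variable name.
#define result 42

// Stand-in for a function from a shipped header (hypothetical, not the real
// rs6000 code). Had the local been spelled `result`, the macro above would
// expand inside this body and turn the declaration into `int 42 = ...`,
// which does not compile. The "__" prefix keeps the name out of the
// user's namespace.
static inline int add_sketch(int __a, int __b) {
  int __result = __a + __b;
  return __result;
}

// Remove the macro so that later code is unaffected by this sketch.
#undef result
```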
2022-02-17rs6000: Workaround for new ifcvt behavior [PR104335].Robin Dapp1-0/+6
Since r12-6747-gaa8cfe785953a0, ifcvt passes a "cc comparison", i.e. the representation of the result of a comparison, to the backend. rs6000_emit_int_cmove () is not prepared to handle this. Therefore, this patch makes it return false in such a case. PR target/104335 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_emit_int_cmove): Return false if the expected comparison's first operand is of mode MODE_CC.
2022-02-17c-family: Remove names of unused parametersJonathan Wakely1-13/+13
C++ allows unnamed parameters, which means we don't need to call them 'dummy' and mark them with the unused attribute. gcc/c-family/ChangeLog: * c-pragma.cc (handle_pragma_pack): Remove parameter name. (handle_pragma_weak): Likewise. (handle_pragma_scalar_storage_order): Likewise. (handle_pragma_redefine_extname): Likewise. (handle_pragma_visibility): Likewise. (handle_pragma_diagnostic): Likewise. (handle_pragma_target): Likewise. (handle_pragma_optimize): Likewise. (handle_pragma_push_options): Likewise. (handle_pragma_pop_options): Likewise. (handle_pragma_reset_options): Likewise. (handle_pragma_message): Likewise. (handle_pragma_float_const_decimal64): Likewise.
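The before and after shapes of such a handler might look roughly like this. The type `reader_sketch` and both function names are hypothetical stand-ins, not the real declarations in c-pragma.cc, and `__attribute__((unused))` is the GNU spelling, assumed here to stand for whatever form the original code used.

```cpp
// Hypothetical stand-in for the reader type the real handlers receive.
struct reader_sketch {};

// Before: the parameter has a name, so it needs an attribute to avoid an
// unused-parameter warning (GNU attribute syntax, accepted by GCC and Clang).
int handle_before(reader_sketch *dummy __attribute__((unused))) {
  return 1;
}

// After: the parameter is simply left unnamed, which C++ permits in a
// function definition; no attribute is needed.
int handle_after(reader_sketch *) {
  return 1;
}
```

Both forms have the same signature, so callers are unaffected by the change.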
2022-02-17Add missing target selectorEric Botcazou1-1/+1
gcc/testsuite/ PR target/79754 * gcc.target/i386/pr79754.c: Add target dfp.