path: root/gcc
Age | Commit message | Author | Files | Lines
2023-10-30 | MATCH: first of the value replacement moving from phiopt | Andrew Pinski | 5 | -0/+86
This moves a few simple patterns that are done in value replacement in phiopt over to match.pd. Just the simple ones which might show up in other code. This allows some optimizations to happen even without depending on sinking, and in some cases where phiopt is not invoked (cond-1.c is an example there). Changes since v1: * v2: Add an extra testcase to showcase improvements at -O1. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd: (`a == 0 ? b : b + a`, `a == 0 ? b : b - a`): New patterns. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/cond-1.c: New test. * gcc.dg/tree-ssa/phi-opt-value-1.c: New test. * gcc.dg/tree-ssa/phi-opt-value-1a.c: New test. * gcc.dg/tree-ssa/phi-opt-value-2.c: New test.
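As a rough illustration of why the two new patterns are safe to simplify, the sketch below compares each conditional form with its unconditional counterpart. The function names are hypothetical and unsigned arithmetic is assumed so that wraparound is well defined; this is not the match.pd source itself.

```c
#include <assert.h>

/* Conditional forms named in the ChangeLog entry. */
unsigned cond_add(unsigned a, unsigned b) { return a == 0 ? b : b + a; }
unsigned cond_sub(unsigned a, unsigned b) { return a == 0 ? b : b - a; }

/* Simplified forms: when a == 0, b + a and b - a are both just b,
   so the comparison and the select are redundant. */
unsigned plain_add(unsigned a, unsigned b) { return b + a; }
unsigned plain_sub(unsigned a, unsigned b) { return b - a; }

/* Exhaustively compare both forms over a small range of inputs. */
void check_value_replacement(void)
{
    for (unsigned a = 0; a < 64; a++)
        for (unsigned b = 0; b < 64; b++) {
            assert(cond_add(a, b) == plain_add(a, b));
            assert(cond_sub(a, b) == plain_sub(a, b));
        }
}
```

Calling `check_value_replacement()` runs the comparison; the same reasoning holds for any operation for which `a == 0` is an identity element on that side.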
2023-10-31 | Daily bump. | GCC Administrator | 4 | -1/+245
2023-10-30 | i386: Zhaoxin yongfeng enablement | Mayshao | 18 | -49/+1095
Enable -march/-mtune=yongfeng. Costs and tunings are set according to the characteristics of the processor. Add a new .md file to describe yongfeng processor. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize yongfeng. * common/config/i386/i386-common.cc: Add yongfeng. * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add ZHAOXIN_FAM7H_YONGFENG. * config.gcc: Add yongfeng. * config/i386/driver-i386.cc (host_detect_local_cpu): Let -march=native recognize yongfeng processors. * config/i386/i386-c.cc (ix86_target_macros_internal): Add yongfeng. * config/i386/i386-options.cc (m_YONGFENG): New definition. (m_ZHAOXIN): Ditto. * config/i386/i386.h (enum processor_type): Add PROCESSOR_YONGFENG. * config/i386/i386.md: Add yongfeng. * config/i386/lujiazui.md: Fix typo. * config/i386/x86-tune-costs.h (struct processor_costs): Add yongfeng costs. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng. (ix86_adjust_cost): Ditto. * config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace m_LUJIAZUI with m_ZHAOXIN. (X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto. (X86_TUNE_MOVX): Ditto. (X86_TUNE_MEMORY_MISMATCH_STALL): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto. (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto. (X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto. (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto. (X86_TUNE_USE_LEAVE): Ditto. (X86_TUNE_PUSH_MEMORY): Ditto. (X86_TUNE_LCP_STALL): Ditto. (X86_TUNE_INTEGER_DFMODE_MOVES): Ditto. (X86_TUNE_OPT_AGU): Ditto. (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto. (X86_TUNE_USE_SAHF): Ditto. (X86_TUNE_USE_BT): Ditto. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto. (X86_TUNE_ONE_IF_CONV_INSN): Ditto. (X86_TUNE_AVOID_MFENCE): Ditto. (X86_TUNE_EXPAND_ABS): Ditto. 
(X86_TUNE_USE_SIMODE_FIOP): Ditto. (X86_TUNE_USE_FFREEP): Ditto. (X86_TUNE_EXT_80387_CONSTANTS): Ditto. (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto. (X86_TUNE_SSE_TYPELESS_STORES): Ditto. (X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto. (X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG. (X86_TUNE_USE_GATHER_4PARTS): Ditto. (X86_TUNE_USE_GATHER_8PARTS): Ditto. (X86_TUNE_AVOID_128FMA_CHAINS): Ditto. * doc/extend.texi: Add details about yongfeng. * doc/invoke.texi: Ditto. * config/i386/yongfeng.md: New file to describe yongfeng processor. gcc/testsuite/ChangeLog: * g++.target/i386/mv32.C: Handle new -march. * gcc.target/i386/funcspec-56.inc: Ditto.
2023-10-30 | ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157) | Martin Jambor | 6 | -6/+118
PR 111157 shows that IPA-modref and IPA-CP (when plugged into value numbering) can optimize out a store both before a call (because the call will overwrite it) and in the call (because the store is of the same value) and, by eliminating both, create a miscompilation. This patch fixes that by pruning from the list of IPA-CP aggregate value constants any constant at a position whose memory contents IPA-modref knows can be "killed". Unfortunately, doing so is tricky. First, IPA-modref loads override kills and so only stores not loaded are truly not necessary. Looking stuff up there means doing most of what modref_may_alias does, but doing exactly what it does is tricky because it also takes aliasing into account and has bail-out counters. To err on the side of caution in order to avoid this miscompilation we have to prune a constant when in doubt. However, pruning can interfere with the mechanism of how clone materialization distinguishes between the cases when a parameter was entirely removed and when it was both IPA-CPed and IPA-SRAed (in order to make up for the removal in debug info, which can bump into an assert when compiling g++.dg/torture/pr103669.C when we are not careful). Therefore this patch: 1) marks constants that IPA-modref has in its kill list with a new "killed" flag, and 2) prunes the list from entries with this flag after materialization and IPA-CP transformation is done using the template introduced in the previous patch. It does not try to look up anything in the load lists; this will be done as a follow-up in order to ease review. gcc/ChangeLog: 2023-10-27 Martin Jambor <mjambor@suse.cz> PR ipa/111157 * ipa-prop.h (struct ipa_argagg_value): New flag killed. * ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function. (update_signature): Mark any IPA-CP aggregate constants at positions known to be killed as killed. Move check that there is clone_info after this pruning. * ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag. 
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag. (push_agg_values_from_plats): Likewise. (ipa_push_agg_values_from_jfunc): Likewise. (estimate_local_effects): Likewise. (push_agg_values_for_index_from_edge): Likewise. * ipa-prop.cc (write_ipcp_transformation_info): Stream the killed flag. (read_ipcp_transformation_info): Likewise. (ipcp_get_aggregate_const): Update comment, assert that encountered record does not have killed flag set. (ipcp_transform_function): Prune all aggregate constants with killed set. gcc/testsuite/ChangeLog: 2023-09-18 Martin Jambor <mjambor@suse.cz> PR ipa/111157 * gcc.dg/lto/pr111157_0.c: New test. * gcc.dg/lto/pr111157_1.c: Second file of the same new test.
2023-10-30 | ipa-cp: Templatize filtering of m_agg_values | Martin Jambor | 2 | -29/+37
PR 111157 points to another place where IPA-CP collected aggregate compile-time constants need to be filtered, in addition to the one place that already does this in ipa-sra. In order to re-use code, this patch turns the common bit into a template. The functionality is still covered by testcase gcc.dg/ipa/pr108959.c. gcc/ChangeLog: 2023-09-13 Martin Jambor <mjambor@suse.cz> PR ipa/111157 * ipa-prop.h (ipcp_transformation): New member function template remove_argaggs_if. * ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to filter aggregate constants.
2023-10-30 | RISC-V: Make rv32i_zcmp testcase more robust | Patrick O'Neill | 1 | -6/+6
GCC recently changed its register allocator which causes this testcase to fail. This patch updates the regex to be more robust to change by accepting any s register in the range of 1-9 for cm.push and cm.popret insns. gcc/testsuite/ChangeLog: * gcc.target/riscv/rv32i_zcmp.c: Accept any register in the range of 1-9 for cm.push and cm.popret insns. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-30 | ARC: Convert (signed<<31)>>31 to -(signed&1) without barrel shifter. | Roger Sayle | 2 | -0/+24
This patch optimizes PR middle-end/101955 for the ARC backend. On ARC CPUs with a barrel shifter, using two shifts is optimal as:

        asl_s   r0,r0,31
        asr_s   r0,r0,31

but without a barrel shifter, GCC -O2 -mcpu=em currently generates:

        and     r2,r0,1
        ror     r2,r2
        add.f   0,r2,r2
        sbc     r0,r0,r0

with this patch, we now generate the smaller, faster and non-flags clobbering:

        bmsk_s  r0,r0,0
        neg_s   r0,r0

2023-10-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR middle-end/101955
        * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split to convert sign extract of the least significant bit into an AND $1 then a NEG when !TARGET_BARREL_SHIFTER.

gcc/testsuite/ChangeLog
        PR middle-end/101955
        * gcc.target/arc/pr101955.c: New test case.
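The identity behind this transformation can be sketched in portable-ish C as follows. The function names are hypothetical; the left shift is done on an unsigned value to avoid undefined behavior, and the sketch assumes (as GCC guarantees) that right-shifting a negative signed value is an arithmetic shift and that unsigned-to-signed conversion wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Two-shift form: move bit 0 to the sign position, then shift it back
   arithmetically so it is replicated across the whole word. */
int32_t sext_bit0_shifts(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* AND-then-negate form: bit 0 is 0 or 1, so the result is 0 or -1. */
int32_t sext_bit0_negand(int32_t x)
{
    return -(x & 1);
}

/* Compare the two forms on a range of positive and negative inputs. */
void check_sext_bit0(void)
{
    for (int32_t x = -1000; x <= 1000; x++)
        assert(sext_bit0_shifts(x) == sext_bit0_negand(x));
}
```

Both functions return -1 for odd inputs and 0 for even inputs, which is why the two-instruction `bmsk_s`/`neg_s` sequence can stand in for the pair of 31-bit shifts.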
2023-10-30 | ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs. | Roger Sayle | 2 | -43/+56
This patch overhauls the ARC backend's insn_cost target hook, and makes some related improvements to rtx_costs, BRANCH_COST, etc. The primary goal is to allow the backend to indicate that shifts and rotates are slow (discouraged) when the CPU doesn't have a barrel shifter. I should also acknowledge Richard Sandiford for inspiring the use of set_cost in this rewrite of arc_insn_cost; this implementation borrows heavily from the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.

        struct S { int a : 5; };
        unsigned int foo (struct S *p) {
          return p->a;
        }

With a barrel shifter, GCC -O2 generates the reasonable:

        foo:    ldb_s   r0,[r0]
                asl_s   r0,r0,27
                j_s.d   [blink]
                asr_s   r0,r0,27

What's interesting is that during combine, the middle-end actually has two shifts by three bits, and a sign-extension from QI to SI.

        Trying 8, 9 -> 11:
            8: r158:SI=r157:QI#0<<0x3
              REG_DEAD r157:QI
            9: r159:SI=sign_extend(r158:SI#0)
              REG_DEAD r158:SI
           11: r155:SI=r159:SI>>0x3
              REG_DEAD r159:SI

Whilst it's reasonable to simplify this to two shifts by 27 bits when the CPU has a barrel shifter, it's actually a significant pessimization when these shifts are implemented by loops. This combination can be prevented if the backend provides accurate-ish estimates for insn_cost.

Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:

        foo:    ldb_s   r0,[r0]
                mov     lp_count,27
                lp      2f
                add     r0,r0,r0
                nop
        2:      # end single insn loop
                mov     lp_count,27
                lp      2f
                asr     r0,r0
                nop
        2:      # end single insn loop
                j_s     [blink]

which contains two loops and requires about 113 cycles to execute. With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

        foo:    ldb_s   r0,[r0]
                mov_s   r2,0    ;3
                add3    r0,r2,r0
                sexb_s  r0,r0
                asr_s   r0,r0
                asr_s   r0,r0
                j_s.d   [blink]
                asr_s   r0,r0

which requires only about 6 cycles, for the shorter shifts by 3 and sign extension.

2023-10-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
Provide reasonable values for SHIFTS and ROTATES by constant bit counts depending upon TARGET_BARREL_SHIFTER. (arc_insn_cost): Use insn attributes if the instruction is recognized. Avoid calling get_attr_length for type "multi", i.e. define_insn_and_split patterns without explicit type. Fall-back to set_rtx_cost for single_set and pattern_cost otherwise. * config/arc/arc.h (COSTS_N_BYTES): Define helper macro. (BRANCH_COST): Improve/correct definition. (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
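The two equivalent ways of extracting the 5-bit signed field discussed above can be sketched as follows. This is only an illustration: the function names are hypothetical, the field is assumed to occupy the low 5 bits of the loaded byte, and the code relies on GCC's documented behavior for arithmetic right shifts and wrapping narrowing conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Barrel-shifter form: shift left by 27, arithmetic shift right by 27. */
int32_t field_via_shift27(uint8_t byte)
{
    return (int32_t)((uint32_t)byte << 27) >> 27;
}

/* Form seen during combine: shift left by 3 within a byte, sign-extend
   the byte to 32 bits, then arithmetic shift right by 3. */
int32_t field_via_shift3(uint8_t byte)
{
    int8_t t = (int8_t)(uint8_t)(byte << 3);  /* QImode shift by 3 */
    return (int32_t)t >> 3;                   /* sign_extend, then >> 3 */
}

/* The two forms agree for every possible byte value. */
void check_field_extract(void)
{
    for (int b = 0; b < 256; b++)
        assert(field_via_shift27((uint8_t)b) == field_via_shift3((uint8_t)b));
}
```

On a CPU without a barrel shifter the second form needs only three single-bit shifts each way instead of 27, which is the cost difference the new insn_cost estimates are meant to expose.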
2023-10-30 | ARC: Improved SImode shifts and rotates with -mswap. | Roger Sayle | 6 | -11/+125
This patch improves the code generated by the ARC back-end for CPUs without a barrel shifter but with -mswap. The -mswap option provides a SWAP instruction that implements SImode rotations by 16, but also logical shift instructions (left and right) by 16 bits. Clearly these are also useful building blocks for implementing shifts by 17, 18, etc. which would otherwise require a loop.

As a representative example:

        int shl20 (int x) { return x << 20; }

GCC with -O2 -mcpu=em -mswap would previously generate:

        shl20:  mov     lp_count,10
                lp      2f
                add     r0,r0,r0
                add     r0,r0,r0
        2:      # end single insn loop
                j_s     [blink]

with this patch we now generate:

        shl20:  mov_s   r2,0    ;3
                lsl16   r0,r0
                add3    r0,r2,r0
                j_s.d   [blink]
                asl_s   r0,r0

Although both are four instructions (excluding the j_s), the original takes ~22 cycles, and replacement ~4 cycles.

2023-10-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/arc/arc.cc (arc_split_ashl): Use lsl16 on TARGET_SWAP.
        (arc_split_ashr): Use swap and sign-extend on TARGET_SWAP.
        (arc_split_lshr): Use lsr16 on TARGET_SWAP.
        (arc_split_rotl): Use swap on TARGET_SWAP.
        (arc_split_rotr): Likewise.
        * config/arc/arc.md (ANY_ROTATE): New code iterator.
        (<ANY_ROTATE>si2_cnt16): New define_insn for alternate form of swap instruction on TARGET_SWAP.
        (ashlsi2_cnt16): Rename from *ashlsi16_cnt16 and move earlier.
        (lshrsi2_cnt16): New define_insn for LSR16 instruction.
        (*ashlsi2_cnt16): See above.

gcc/testsuite/ChangeLog
        * gcc.target/arc/lsl16-1.c: New test case.
        * gcc.target/arc/lsr16-1.c: Likewise.
        * gcc.target/arc/swap-1.c: Likewise.
        * gcc.target/arc/swap-2.c: Likewise.
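The new shl20 sequence decomposes the shift as 16 + 3 + 1 bits. A minimal C sketch of that decomposition is shown below; the function names are hypothetical, unsigned arithmetic is used to keep the shifts well defined, and `add3 a,b,c` is assumed to compute `b + (c << 3)`.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: a single shift by 20 bits. */
uint32_t shl20_direct(uint32_t x)
{
    return x << 20;
}

/* Step-by-step model of the generated sequence. */
uint32_t shl20_steps(uint32_t x)
{
    uint32_t r2 = 0;      /* mov_s r2,0                   */
    x = x << 16;          /* lsl16 r0,r0                  */
    x = r2 + (x << 3);    /* add3  r0,r2,r0  (assumed)    */
    return x << 1;        /* asl_s r0,r0                  */
}

/* Compare both forms on a spread of inputs, including high bits. */
void check_shl20(void)
{
    for (uint32_t x = 0; x < 5000; x++) {
        assert(shl20_direct(x) == shl20_steps(x));
        assert(shl20_direct(x * 2654435761u) == shl20_steps(x * 2654435761u));
    }
}
```

The same building blocks extend to other counts above 16, for example 17 as 16 + 1 or 19 as 16 + 3.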
2023-10-30 | arm: move the switch tables for Arm to the RO data section. | Richard Ball | 6 | -57/+90
Follow-up patch to "arm: Use deltas for Arm switch tables". This patch moves the switch tables for Arm from the .text section into the .rodata section. gcc/ChangeLog: * config/arm/aout.h: Change to use the Lrtx label. * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets from (!target_pure_code) condition. (ADDR_VEC_ALIGN): Add align for tables in rodata section. * config/arm/arm.cc (arm_output_casesi): Alter the function to include .Lrtx label and remove adr instructions. * config/arm/arm.md (arm_casesi_internal): Use force_reg to generate ldr instructions that would otherwise be out of range, and change rtl to accommodate force reg. Additionally remove unnecessary register temp. (casesi): Remove pure code check for Arm. * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm targets from JUMP_TABLES_IN_TEXT_SECTION definition. gcc/testsuite/ChangeLog: * gcc.target/arm/arm-switchstatement.c: Alter the tests to change adr instruction to ldr.
2023-10-30 | Testsuite, i386: Mark test as requiring ifunc | Francois-Xavier Coudert | 1 | -0/+1
Test is currently failing on x86_64-apple-darwin. gcc/testsuite/ChangeLog: * gcc.target/i386/pr105554.c: Require ifunc.
2023-10-30 | Testsuite, Darwin: Fix trampoline warning | Francois-Xavier Coudert | 1 | -0/+3
Heap-based trampolines are enabled on darwin20 and later, meaning that no warning is emitted. gcc/testsuite/ChangeLog: * gcc.dg/Wtrampolines.c: Skip on darwin20 and later.
2023-10-30 | Testsuite, i386: Fix test by passing -march | Francois-Xavier Coudert | 1 | -1/+1
The test currently fails on Darwin, where the default arch is core2. gcc/testsuite/ChangeLog: PR target/112287 * gcc.target/i386/pr111698.c: Pass -march=sandybridge.
2023-10-30 | Testsuite, Darwin: skip PIE test | Francois-Xavier Coudert | 1 | -0/+1
gcc/testsuite/ChangeLog: * gcc.dg/pie-2.c: Skip test on darwin.
2023-10-30 | rs6000: Change bitwise xor to an equality operator [PR106907] | Jeevitha | 1 | -4/+4
PR106907 has a few warnings spotted from cppcheck. These warnings are related to the need of precedence clarification. Instead of using xor, it has been changed to equality check, which achieves the same result. Additionally, comment indentation has been fixed. 2023-10-11 Jeevitha Palanisamy <jeevitha@linux.ibm.com> gcc/ PR target/106907 * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Change bitwise xor to an equality and fix comment indentation.
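For readers wondering how an xor can be replaced by an equality test, the sketch below shows one such rewrite on two boolean flags. It is not the rs6000 source: the flag names are hypothetical, and the particular form `p ^ !q` is an assumption chosen only to illustrate the idea that, on 0/1 values, an xor with a negated operand is the same as an equality comparison, which avoids the operator-precedence ambiguity that static analyzers tend to flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Xor form: true when exactly one of p and !q is true. */
bool cond_xor(bool p, bool q)
{
    return p ^ !q;
}

/* Equality form: same truth table, with no precedence pitfalls. */
bool cond_eq(bool p, bool q)
{
    return p == q;
}

/* Walk the full truth table to confirm the two forms agree. */
void check_xor_vs_eq(void)
{
    for (int p = 0; p <= 1; p++)
        for (int q = 0; q <= 1; q++)
            assert(cond_xor(p, q) == cond_eq(p, q));
}
```

The equivalence only holds once both operands are normalized to 0 or 1; for arbitrary integers, `^` and `==` are not interchangeable.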
2023-10-30 | PR testsuite/111462 - add powerpc64le to list of ssa-sink-18.c XFAIL | Richard Biener | 1 | -3/+3
PR testsuite/111462 gcc/testsuite/ * gcc.dg/tree-ssa/ssa-sink-18.c: XFAIL also powerpc64le.
2023-10-30 | RISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32 | Juzhe-Zhong | 3 | -21/+43
sew64_scalar_helper handles the SEW64 vx instruction pattern on RV32 systems. According to the RVV ISA, we cannot directly use a SEW64 vx instruction on an RV32 system, since an RV32 general register is only 32 bits wide. Consider the following case: vsetvl e64m1 vadd.vx v,v,x will be transformed by sew64_scalar_helper into: vsetvl e64m1 sw sw vlse v vadd.vv This bug was reported by Robin.
(insn 143 179 230 9 (set (reg:SI 15 a5 [234]) (unspec:SI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX)) 751 {vlmax_avlsi} (expr_list:REG_EQUIV (unspec:SI [ (const_int 64 [0x40]) ] UNSPEC_VLMAX) (nil)))
(insn 230 143 78 9 (parallel [ (set (reg:SI 66 vl) (unspec:SI [ (reg:SI 15 a5 [234]) (const_int 64 [0x40]) (const_int 0 [0]) ] UNSPEC_VSETVL)) (set (reg:SI 67 vtype) (unspec:SI [ (const_int 64 [0x40]) (const_int 0 [0]) (const_int 1 [0x1]) repeated x2 ] UNSPEC_VSETVL)) ]) "bug.c":14:14 discrim 1 1469 {vsetvl_discard_resultsi} (nil))
(insn 78 230 84 9 (set (reg:RVVM1DI 102 v6 [203]) (if_then_else:RVVM1DI (unspec:RVVMF64BI [ (const_vector:RVVMF64BI repeat [ (const_int 1 [0x1]) ]) (const_int 0 [0]) (const_int 2 [0x2]) repeated x2 (const_int 0 [0]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (vec_duplicate:RVVM1DI (mem/u/c:DI (reg/f:SI 29 t4 [230]) [0 S8 A64])) (unspec:RVVM1DI [ (reg:SI 0 zero) ] UNSPEC_VUNDEF))) "bug.c":14:14 discrim 1 1872 {*pred_broadcastrvvm1di} (expr_list:REG_DEAD (reg/f:SI 29 t4 [230]) (nil)))
The root cause is that we missed VLMAX handling, since the code was written a long time ago (callers were always intrinsics code, with no VLMAX situation).
Now, all following bugs are fixed after this patch: FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test gcc/ChangeLog: * config/riscv/riscv-protos.h (sew64_scalar_helper): Fix bug. * config/riscv/riscv-v.cc (sew64_scalar_helper): Ditto. * config/riscv/vector.md: Ditto.
2023-10-30 | Fortran: Fix a problem with SELECT TYPE selectors [PR104555]. | Paul Thomas | 2 | -0/+38
2023-10-30 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/104555 * resolve.cc (resolve_select_type): If the selector expression has no class component references and the expression is a derived type, copy the typespec of the symbol to that of the expression. gcc/testsuite/ PR fortran/104555 * gfortran.dg/pr104555.f90: New test.
2023-10-30 | Improve memcmpeq for 512-bit vector with vpcmpeq + kortest. | liuhongt | 4 | -22/+99
When 2 vectors are equal, kmask is all ones and kortest will set CF, else CF will be cleared. So the CF bit can be used to check the result of the comparison.

Before:

        vmovdqu (%rsi), %ymm0
        vpxorq  (%rdi), %ymm0, %ymm0
        vptest  %ymm0, %ymm0
        jne     .L2
        vmovdqu 32(%rsi), %ymm0
        vpxorq  32(%rdi), %ymm0, %ymm0
        vptest  %ymm0, %ymm0
        je      .L5
.L2:
        movl    $1, %eax
        xorl    $1, %eax
        vzeroupper
        ret

After:

        vmovdqu64       (%rsi), %zmm0
        xorl    %eax, %eax
        vpcmpeqd        (%rdi), %zmm0, %k0
        kortestw        %k0, %k0
        setc    %al
        vzeroupper
        ret

gcc/ChangeLog:
        PR target/104610
        * config/i386/i386-expand.cc (ix86_expand_branch): Handle 512-bit vector with vpcmpeq + kortest.
        * config/i386/i386.md (cbranchxi4): New expander.
        * config/i386/sse.md: (cbranch<mode>4): Extend to V16SImode and V8DImode.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr104610-2.c: New test.
2023-10-30 | Expand: Checking available optabs for scalar modes in by pieces operations | Haochen Gui | 1 | -10/+13
The former patch (f08ca5903c7) examines the scalar modes by target hook scalar_mode_supported_p. It causes some i386 regression cases as XImode and OImode are not enabled in i386 target function. This patch examines the scalar mode by checking if the corresponding optabs are available for the mode. gcc/ PR target/111449 * expr.cc (qi_vector_mode_supported_p): Rename to... (by_pieces_mode_supported_p): ...this, and extends it to do the checking for both scalar and vector mode. (widest_fixed_size_mode_for_size): Call by_pieces_mode_supported_p to examine the mode. (op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
2023-10-30 | Daily bump. | GCC Administrator | 4 | -1/+65
2023-10-29 | d: Fix ICE: verify_gimple_failed (conversion of register to a different size in 'view_convert_expr') | Iain Buclaw | 5 | -33/+139
Static arrays in D are passed around by value, rather than decaying to a pointer. On x86_64 __builtin_va_list is an exception to this rule, but semantically it's still treated as a static array. This makes certain assignment operations fail due to a mismatch in types. As all examples in the test program are rejected by C/C++ front-ends, these are now errors in D too to be consistent. PR d/110712 gcc/d/ChangeLog: * d-codegen.cc (d_build_call): Update call to convert_for_argument. * d-convert.cc (is_valist_parameter_type): New function. (check_valist_conversion): New function. (convert_for_assignment): Update signature. Add check whether assigning va_list is permissible. (convert_for_argument): Likewise. * d-tree.h (convert_for_assignment): Update signature. (convert_for_argument): Likewise. * expr.cc (ExprVisitor::visit (AssignExp *)): Update call to convert_for_assignment. gcc/testsuite/ChangeLog: * gdc.dg/pr110712.d: New test.
2023-10-29 | d: Merge upstream dmd, druntime e48bc0987d, phobos 2458e8f82. | Iain Buclaw | 48 | -294/+209
D front-end changes: - Import dmd v2.106.0-beta.1. D runtime changes: - Import druntime v2.106.0-beta.1. Phobos changes: - Import phobos v2.106.0-beta.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd e48bc0987d. * expr.cc (ExprVisitor::visit (NewExp *)): Update for new front-end interface. * runtime.def (NEWARRAYT): Remove. (NEWARRAYIT): Remove. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime e48bc0987d. * src/MERGE: Merge upstream phobos 2458e8f82.
2023-10-29 | testsuite, X86, Darwin: Skip a test for mcmodel=large. | Iain Sandoe | 1 | -0/+1
The large model is not implemented so far for Darwin (and the codegen will be different when it is). gcc/testsuite/ChangeLog: * gcc.target/i386/large-data.c: Skip for Darwin. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-29 | testsuite, X86, Darwin: Skip tests with incompatible output. | Iain Sandoe | 3 | -0/+3
Darwin platforms do not currently emit .cfi_xxx instructions so that these tests do not work there. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-interrupt-1.c: Skip for Darwin. * gcc.target/i386/apx-push2pop2-1.c: Likewise. * gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-29 | tree-optimization/109334: Improve computation for access attribute | Martin Uecker | 4 | -7/+81
The fix for PR104970 restricted size computations to the case where the access attribute was specified explicitly (no VLA). It also restricted it to void pointers or elements with constant sizes. The second restriction is enough to fix the original bug. Revert the first change to again allow size computations for VLA parameters and for VLA parameters together with an explicit access attribute. gcc/ChangeLog: PR tree-optimization/109334 * tree-object-size.cc (parm_object_size): Allow size computation for implicit access attributes. gcc/testsuite/ChangeLog: PR tree-optimization/109334 * gcc.dg/builtin-dynamic-object-size-0.c (test_parmsz_simple3): Supported again. (test_parmsz_external4): New test. * gcc.dg/builtin-dynamic-object-size-20.c: New test. * gcc.dg/pr104970.c: New test.
2023-10-28 | gcc: xtensa: fix salt/saltu version check | Max Filippov | 1 | -1/+1
gcc/ * config/xtensa/xtensa.h (TARGET_SALT): Change HW version from 260000 (which corresponds to RF-2014.0) to 270000 (which corresponds to RG-2015.0, the release where salt/saltu opcodes were introduced).
2023-10-29 | RISC-V: Fix one range-loop-construct warning of avlprop | Pan Li | 1 | -1/+1
This patch would like to fix one warning of avlprop as below. ../../gcc/config/riscv/riscv-avlprop.cc: In member function 'virtual unsigned int pass_avlprop::execute(function*)': ../../gcc/config/riscv/riscv-avlprop.cc:346:23: error: loop variable 'candidate' creates a copy from type 'const std::pair<avlprop_type, rtl_ssa::insn_info*>' [-Werror=range-loop-construct] 346 | for (const auto candidate : m_candidates) | ^~~~~~~~~ ../../gcc/config/riscv/riscv-avlprop.cc:346:23: note: use reference type to prevent copying 346 | for (const auto candidate : m_candidates) | ^~~~~~~~~ | & gcc/ChangeLog: * config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Use reference type to prevent copying. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-29 | Daily bump. | GCC Administrator | 3 | -1/+38
2023-10-29 | d: Fix ICE: in verify_gimple_in_seq on powerpc-darwin9 [PR112270] | Iain Buclaw | 6 | -8/+24
This ICE was seen during stage2 on powerpc-darwin9 only. There were still some uses of GCC's boolean_type_node in the D front-end, which caused a type mismatch to trigger as D bool size is fixed to 1 byte on all targets. So two new nodes have been introduced - d_bool_false_node and d_bool_true_node - which have replaced all remaining uses of boolean_false_node and boolean_true_node respectively. PR d/112270 gcc/d/ChangeLog: * d-builtins.cc (d_build_d_type_nodes): Initialize d_bool_false_node, d_bool_true_node. * d-codegen.cc (build_array_struct_comparison): Use d_bool_false_node instead of boolean_false_node. * d-convert.cc (d_truthvalue_conversion): Use d_bool_false_node and d_bool_true_node instead of boolean_false_node and boolean_true_node. * d-tree.h (enum d_tree_index): Add DTI_BOOL_FALSE and DTI_BOOL_TRUE. (d_bool_false_node): New macro. (d_bool_true_node): New macro. * modules.cc (build_dso_cdtor_fn): Use d_bool_false_node and d_bool_true_node instead of boolean_false_node and boolean_true_node. (register_moduleinfo): Use d_bool_type instead of boolean_type_node. gcc/testsuite/ChangeLog: * gdc.dg/pr112270.d: New test.
2023-10-28 | d: Add warning for call expression without side effects | Iain Buclaw | 5 | -1/+134
In the last merge of the dmd front-end with upstream (r14-4830), this warning got removed from the semantic passes. Reimplement the warning for the code generation pass instead, where it cannot have an effect on conditional compilation. gcc/d/ChangeLog: * d-codegen.cc (call_side_effect_free_p): New function. * d-tree.h (CALL_EXPR_WARN_IF_UNUSED): New macro. (call_side_effect_free_p): New prototype. * expr.cc (ExprVisitor::visit (CallExp *)): Set CALL_EXPR_WARN_IF_UNUSED on matched call expressions. (ExprVisitor::visit (NewExp *)): Don't dereference the result of an allocation call here. * toir.cc (add_stmt): Emit warning when call expression added to statement list without being used. gcc/testsuite/ChangeLog: * gdc.dg/Wunused_value.d: New test.
2023-10-28 | Daily bump. | GCC Administrator | 7 | -1/+295
2023-10-27 | [RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch | Vladimir N. Makarov | 1 | -1/+2
GCC with my recent patch improving cost calculation for pseudos with equivalence may generate different code with and without debug info, and as a result the i686 bootstrap fails. The patch fixes this bug. gcc/ChangeLog: PR rtl-optimization/112107 * ira-costs.cc: (calculate_equiv_gains): Use NONDEBUG_INSN_P instead of INSN_P.
2023-10-27 | RISC-V: Make stack_save_restore_2 more robust | Patrick O'Neill | 1 | -2/+2
GCC recently changed to emit __riscv_restore_5 which causes this testcase to fail. This patch updates the regex to be more robust to change by accepting any number after __riscv_save_ and __riscv_restore_. gcc/testsuite/ChangeLog: * gcc.target/riscv/stack_save_restore_2.c: Accept any number after __riscv_save_ and __riscv_restore_. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-27 | Fortran: diagnostics of MODULE PROCEDURE declaration conflicts [PR104649] | Harald Anlauf | 2 | -4/+61
gcc/fortran/ChangeLog: PR fortran/104649 * decl.cc (gfc_match_formal_arglist): Handle conflicting declarations of a MODULE PROCEDURE when one of the declarations is an alternate return. gcc/testsuite/ChangeLog: PR fortran/104649 * gfortran.dg/pr104649.f90: New test. Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
2023-10-27 | amdgcn: Fix bug in gfx1030 support patch | Andrew Stubbs | 1 | -4/+2
The previous patch to add gfx1030 support introduced an issue with passing exit codes from kernels run under gcn-run (offload kernels were unaffected). gcc/ChangeLog: PR target/112088 * config/gcn/gcn.cc (gcn_expand_epilogue): Fix kernel epilogue register conflict.
2023-10-27 | amdgcn: silence warnings | Andrew Stubbs | 2 | -3/+5
The operands really should be VOIDmode, so the warnings are false. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): Mention "operands" in condition to silence the warnings. (vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): Likewise. * config/gcn/gcn.md (*movti_insn): Likewise.
2023-10-27 | recog: Fix propagation into ASM_OPERANDS | Richard Sandiford | 1 | -7/+20
An inline asm with multiple output operands is represented as a parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS. insn_propagation didn't account for this, and instead propagated into each ASM_OPERANDS individually. This meant that it could apply a substitution X->Y to Y itself, which (a) could create circularity and (b) would be semantically wrong in any case, since Y might use a different value of X. This patch checks explicitly for parallels involving ASM_OPERANDS, just like combine does. gcc/ * recog.cc (insn_propagation::apply_to_pattern_1): Handle shared ASM_OPERANDS.
2023-10-27 | c++: another build_new_1 folding fix [PR111929] | Patrick Palka | 2 | -4/+12
In build_new_1, we also need to avoid folding 'outer_nelts_check' when in a template context to prevent an ICE on the below testcase. This patch replaces the problematic fold_build2 call with build2 (we'll later fold it if appropriate during cp_fully_fold). In passing, this patch removes an unnecessary conversion of 'nelts' since it should always already be a size_t (and 'convert' isn't the best conversion entry point to use anyway since it lacks a complain parameter). PR c++/111929 gcc/cp/ChangeLog: * init.cc (build_new_1): Remove unnecessary call to convert on 'nelts'. Use build2 instead of fold_build2 for 'outer_nelts_checks'. gcc/testsuite/ChangeLog: * g++.dg/template/non-dependent28a.C: New test.
2023-10-27 | c++: add testcase verifying non-dep new-expr checking | Patrick Palka | 1 | -0/+20
gcc/testsuite/ChangeLog: * g++.dg/template/new14.C: New test.
2023-10-27 | c++: more ahead-of-time -Wparentheses warnings | Patrick Palka | 6 | -38/+44
Now that we don't have to worry about looking through NON_DEPENDENT_EXPR, we can easily extend the -Wparentheses warning in convert_for_assignment to consider (non-dependent) templated assignment operator expressions as well, like r14-4111-g6e92a6a2a72d3b did in maybe_convert_cond. gcc/cp/ChangeLog: * cp-tree.h (maybe_warn_unparenthesized_assignment): Declare. * semantics.cc (is_assignment_op_expr_p): Generalize to return true for any assignment operator expression, not just one that has been resolved to an operator overload. (maybe_warn_unparenthesized_assignment): Factored out from ... (maybe_convert_cond): ... here. (finish_parenthesized_expr): Mention maybe_warn_unparenthesized_assignment. * typeck.cc (convert_for_assignment): Replace -Wparentheses warning logic with maybe_warn_unparenthesized_assignment. gcc/testsuite/ChangeLog: * g++.dg/warn/Wparentheses-13.C: Strengthen by expecting that we issue the -Wparentheses warnings ahead of time. * g++.dg/warn/Wparentheses-23.C: Likewise. * g++.dg/warn/Wparentheses-32.C: Remove xfails.
2023-10-27 | PR modula2/111530: Build failure on BSD due to getopt_long_only GNU extension dependency | Gaius Mulley | 5 | -86/+98
This patch uses the libiberty getopt long functions (wrapped up inside libgm2/libm2pim/cgetopt.cc) and only enables this implementation if libgm2/configure.ac detects no getopt_long and friends on the target. gcc/m2/ChangeLog: PR modula2/111530 * gm2-libs-ch/cgetopt.c (cgetopt_cgetopt_long): Re-format. (cgetopt_cgetopt_long_only): Re-format. (cgetopt_SetOption): Re-format and assign flag to NULL if name is also NULL. * gm2-libs/GetOpt.def (AddLongOption): Add index parameter and change flag to be a VAR parameter rather than a pointer. (GetOptLong): Re-format. (GetOpt): Correct comment. * gm2-libs/GetOpt.mod: Re-write to rely on cgetopt rather than implement long option creation in GetOpt. * gm2-libs/cgetopt.def (SetOption): has_arg type is INTEGER. libgm2/ChangeLog: PR modula2/111530 * Makefile.in: Regenerate. * aclocal.m4: Regenerate. * config.h.in: Regenerate. * configure: Regenerate. * configure.ac (AC_CHECK_HEADERS): Include getopt.h. (GM2_CHECK_LIB): getopt_long check. (GM2_CHECK_LIB): getopt_long_only check. * libm2cor/Makefile.in: Regenerate. * libm2iso/Makefile.in: Regenerate. * libm2log/Makefile.in: Regenerate. * libm2min/Makefile.in: Regenerate. * libm2pim/Makefile.in: Regenerate. * libm2pim/cgetopt.cc: Re-write using conditional on configure and long function code from libiberty/getopt.c. gcc/testsuite/ChangeLog: PR modula2/111530 * gm2/pimlib/run/pass/testgetopt.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-10-27[PATCH] RISC-V: Fix wrong tune parameters on int_divYangyu Chen1-3/+3
This patch fixes an issue with the cost of "int_div" in various RISC-V tune parameters, including those for Rocket, SiFive U7 series, and T-Head C906. The incorrect cost value interferes with optimization. For example, it prevents division by a constant from being optimized into a more efficient method known as Barrett reduction, which hurts performance on these systems. The integer div cost of the Rocket and SiFive U7 is taken from the Rocket-Chip Divider source code[1] with the BigCore configuration[2]. That configuration leaves divUnroll unchanged at its default of 1. Thus, the maximum int_div cycle count should be dataWidth + 1, which is 33 for 32-bit and 65 for 64-bit. As for C906, the divider takes 2 cycles to start[3], and it produces a 2-bit result each cycle[4]. Thus, the maximum int_div cycle count should be dataWidth / 2 + 2, which is 18 for 32-bit and 34 for 64-bit. I also tested the performance on VisionFive2, which has a Quad-Core SiFive U74. I wrote a simple C program to do 1e8 divisions by the constant 6 in int32. The result shows it takes 1.998s using div, and 0.420s using Barrett reduction to replace div with mul, which is 4.75x faster. [1] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/rocket/Multiplier.scala#L40 [2] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/subsystem/Configs.scala#L97 [3] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div.v#L267 [4] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div_shift2_kernel.v#L93 gcc/ChangeLog: * config/riscv/riscv.cc (rocket_tune_info): Fix int_div cost. (sifive_7_tune_info, thead_c906_tune_info): Likewise.
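To illustrate the transformation the commit message refers to, here is a minimal sketch of dividing by the constant 6 with a multiply and a shift instead of a div instruction. It is not the author's benchmark program: the function names are hypothetical, and it uses unsigned 32-bit operands for simplicity, whereas the commit message mentions int32. The magic constant 0xAAAAAAAB is ceil(2^34 / 6).

```c
#include <assert.h>
#include <stdint.h>

/* Plain division by a constant. Whether this becomes a div instruction
   or a multiply sequence depends on the int_div cost in the tune info. */
static uint32_t div6_plain(uint32_t x)
{
    return x / 6u;
}

/* Reciprocal multiplication: multiply by ceil(2^34 / 6) = 0xAAAAAAAB
   in 64-bit arithmetic, then shift right by 34. This is exact for every
   32-bit unsigned x because 6 * 0xAAAAAAAB - 2^34 = 2 and 2 * x < 2^34. */
static uint32_t div6_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABull) >> 34);
}

/* Spot-check the two versions against each other on a strided range
   plus the boundary values. */
static void div6_selfcheck(void)
{
    for (uint32_t x = 0; x < 1000000u; x += 7u)
        assert(div6_plain(x) == div6_mul(x));
    assert(div6_plain(UINT32_MAX) == div6_mul(UINT32_MAX));
}
```

Calling `div6_selfcheck()` from a test harness confirms that both functions agree; timing the two variants in a loop would be one way to reproduce the kind of comparison described above, although the figures quoted in the commit message come from the author's own program.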
2023-10-27RISC-V: Add rawmemchr expander.Robin Dapp10-211/+429
This patch adds a vectorized rawmemchr expander. It also moves the vectorized expand_block_move to riscv-string.cc. gcc/ChangeLog: * config/riscv/autovec.md (rawmemchr<ANYI:mode>): New expander. * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): Define. (expand_rawmemchr): Define. * config/riscv/riscv-v.cc (force_vector_length_operand): Remove static. (expand_block_move): Move from here... * config/riscv/riscv-string.cc (expand_block_move): ...to here. (expand_rawmemchr): Add vectorized expander. * internal-fn.cc (expand_RAWMEMCHR): Fix typo. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/peel-2.c: Add -fno-tree-loop-distribute-patterns. * gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv. * gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto. * gcc.target/riscv/rvv/rvv.exp: Add builtin directory. * gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
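For context, the scalar behaviour that a rawmemchr expander has to reproduce can be sketched as an unbounded search loop. The following is an illustrative C version for 32-bit elements, not code from the patch: the name `rawmemchr_u32` is hypothetical, and the sketch assumes the caller guarantees that the value occurs in the buffer, since the search has no length limit.

```c
#include <assert.h>
#include <stdint.h>

/* Return a pointer to the first element equal to c. There is no bound:
   the caller must guarantee that c occurs somewhere at or after s. */
static const uint32_t *rawmemchr_u32(const uint32_t *s, uint32_t c)
{
    while (*s != c)
        ++s;
    return s;
}

/* Small demonstration on a fixed buffer. */
static void rawmemchr_demo(void)
{
    static const uint32_t data[] = { 10u, 20u, 30u, 40u };
    assert(rawmemchr_u32(data, 30u) == &data[2]);
}
```

A vectorized version performs the same search on several elements per iteration; the element width (8, 16, 32 or 64 bits) is what the `ANYI` mode iterator in the expander name varies over.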
2023-10-27RISC-V: Fix cond_sqrt tests.Robin Dapp7-7/+154
As long as we do not have universal Zvfh support in binutils, linking against libm does not work out of the box. This patch splits the cond_sqrt tests into non-zvfh and zvfh variants and makes the run-zvfh ones depend on a zvfh target. While at it, I also added Zvfh handling to the testsuite helpers. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Remove Float16. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto. * lib/target-supports.exp: Add zvfh handling. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-2.c: New test.
2023-10-27[RA]: Add cost calculation for reg equivalence invariantsVladimir N. Makarov1-0/+4
My recent patch improving cost calculation for pseudos with equivalence resulted in failure of gcc.target/arm/eliminate.c on aarch64. This patch fixes this failure. gcc/ChangeLog: * ira-costs.cc: (get_equiv_regno, calculate_equiv_gains): Process reg equivalence invariants.
2023-10-27i386: Fix typo in "partial_memory_read_stall" tune option.Uros Bizjak1-1/+1
gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL): i386: Fix typo in "partial_memory_read_stall" tune option.
2023-10-27Move OpenMP tests to gomp subdirPaul-Antoine Arras2-4/+0
gcc/testsuite/ChangeLog: * gfortran.dg/c_ptr_tests_20.f90: Moved to... * gfortran.dg/gomp/c_ptr_tests_20.f90: ...here. * gfortran.dg/c_ptr_tests_21.f90: Moved to... * gfortran.dg/gomp/c_ptr_tests_21.f90: ...here.
2023-10-27aarch64: Add basic target_print_operand support for CONST_STRINGVictor Do Nascimento1-0/+5
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string ("s3_3_c13_c2_2"))]) can now be output correctly using the following output pattern when composing `define_insn's: "mrs\t%x0, %1" gcc/ChangeLog * config/aarch64/aarch64.cc (aarch64_print_operand): Add support for CONST_STRING.
2023-10-27PR target/110551: Fix reg allocation for widening multiplications on x86.Roger Sayle2-19/+68
This patch contains clean-ups of the widening multiplication patterns in i386.md, and provides variants of the existing highpart multiplication peephole2 transformations (that tidy up register allocation after reload), and thereby fixes PR target/110551, which is a superfluous move instruction. For the new test case, compiled on x86_64 with -O2. Before: mulx64: movabsq $-7046029254386353131, %rcx movq %rcx, %rax mulq %rdi xorq %rdx, %rax ret After: mulx64: movabsq $-7046029254386353131, %rax mulq %rdi xorq %rdx, %rax ret The clean-ups are (i) that operand 1 is consistently made register_operand and operand 2 becomes nonimmediate_operand, so that predicates match the constraints, (ii) the representation of the BMI2 mulx instruction is updated to use the new umul_highpart RTX, and (iii) because operands 0 and 1 have different modes in widening multiplications, "a" is a more appropriate constraint than "0" (which avoids spills/reloads containing SUBREGs). The new peephole2 transformations are based upon those at around line 9951 of i386.md, which begin with the comment ;; Highpart multiplication peephole2s to tweak register allocation. ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx -> mov imm,%rax; imulq %rdi 2023-10-27 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110551 * config/i386/i386.md (<u>mul<mode><dwi>3): Make operands 1 and 2 take "register_operand" and "nonimmediate_operand" respectively. (<u>mulqihi3): Likewise. (*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand matching the %d constraint. Use umul_highpart RTX to represent the highpart multiplication. (*umul<mode><dwi>3_1): Operand 2 should use register_operand predicate, and "a" rather than "0" as operands 0 and 2 have different modes. (define_split): For mul to mulx conversion, use the new umul_highpart RTX representation. (*mul<mode><dwi>3_1): Operand 1 should be register_operand and the constraint %a as operands 0 and 1 have different modes. 
(*<u>mulqihi3_1): Operand 1 should be register_operand matching the constraint %0. (define_peephole2): Providing widening multiplication variants of the peephole2s that tweak highpart multiplication register allocation. gcc/testsuite/ChangeLog PR target/110551 * gcc.target/i386/pr110551.c: New test case.
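The assembly quoted for PR target/110551 corresponds to a 64x64 to 128-bit widening multiplication whose two halves are XORed together. The following C function is a reconstruction from that assembly, not the contents of pr110551.c: the body and the use of the GCC `unsigned __int128` extension are assumptions. The multiplier 0x9E3779B97F4A7C15 is the unsigned 64-bit form of the -7046029254386353131 immediate shown in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Widening multiply by a 64-bit constant, then fold the high half of
   the 128-bit product into the low half. mulq leaves the low half in
   %rax and the high half in %rdx, which is why the constant is best
   loaded directly into %rax. */
static uint64_t mulx64(uint64_t x)
{
    unsigned __int128 r = (unsigned __int128)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)(r >> 64);
}
```

Compiling a function of this shape with -O2 on x86_64 and inspecting the output is one way to check whether the extra `movq %rcx, %rax` is still emitted by a given compiler version.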