path: root/gcc/testsuite/gcc.target

2023-04-26  powerpc: Fix up *branch_anddi3_dot for -m32 -mpowerpc64 [PR109566]  (Jakub Jelinek; 1 file, +18/-0)

The following testcase, reduced from newlib, ICEs on powerpc-linux with
-O2 -m32 -mpowerpc64 since the r12-6433 PR102239 optimization was added,
and on the original testcase since some ranger improvements in GCC 13
made it no longer latent in newlib.

The problem is that the *branch_anddi3_dot define_insn_and_split relies
on the *rotldi3_mask_dot define_insn_and_split being recognized during
splitting.  The rs6000_is_valid_rotate_dot_mask function checks whether
the mask is a CONST_INT which is a valid mask, but *rotl<mode>3_mask_dot,
in addition to checking that it is a valid mask, also has the

    (<MODE>mode == Pmode || UINTVAL (operands[3]) <= 0x7fffffff)

test in its condition.  For TARGET_64BIT that doesn't add any further
requirements, but for !TARGET_64BIT && TARGET_POWERPC64, if the second
AND operand is larger than INT_MAX the insn will not be recognized.

The rs6000_is_valid_rotate_dot_mask function is used solely in one spot,
the condition of *branch_anddi3_dot, so the following patch adjusts it
to check for that as well.

2023-04-25  Jakub Jelinek  <jakub@redhat.com>

    PR target/109566
    * config/rs6000/rs6000.cc (rs6000_is_valid_rotate_dot_mask): For
    !TARGET_64BIT, don't return true if UINTVAL (mask) << (63 - nb)
    is larger than signed int maximum.
    * gcc.target/powerpc/pr109566.c: New test.

(cherry picked from commit 97f8f2d0a0384d377ca46da88495f9a3d18d4415)
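
For illustration only -- the actual pr109566.c is not reproduced in this
log -- a hypothetical reduction has the shape of a 64-bit AND-and-branch
whose mask exceeds INT_MAX, compiled with -O2 -m32 -mpowerpc64:

    void bar (void);

    /* The and. + branch combination is what *branch_anddi3_dot splits;
       a mask above 0x7fffffff used to fail to be re-recognized.  */
    void
    foo (unsigned long long x)
    {
      if (x & 0xffffffff00000000ULL)
        bar ();
    }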

2023-04-21  rs6000: correct vector sign extend builtins on Big Endian  (Haochen Gui; 2 files, +33/-0)

gcc/
    PR target/108812
    * config/rs6000/vsx.md (vsx_sign_extend_qi_<mode>): Rename to...
    (vsx_sign_extend_v16qi_<mode>): ... this.
    (vsx_sign_extend_hi_<mode>): Rename to...
    (vsx_sign_extend_v8hi_<mode>): ... this.
    (vsx_sign_extend_si_v2di): Rename to...
    (vsx_sign_extend_v4si_v2di): ... this.
    (vsignextend_qi_<mode>): Remove.
    (vsignextend_hi_<mode>): Remove.
    (vsignextend_si_v2di): Remove.
    (vsignextend_v2di_v1ti): Remove.
    (*xxspltib_<mode>_split): Replace gen_vsx_sign_extend_qi_v2di with
    gen_vsx_sign_extend_v16qi_v2di and gen_vsx_sign_extend_qi_v4si with
    gen_vsx_sign_extend_v16qi_v4si.
    * config/rs6000/rs6000.md (split for DI constant generation):
    Replace gen_vsx_sign_extend_qi_si with gen_vsx_sign_extend_v16qi_si.
    (split for HSDI constant generation): Replace
    gen_vsx_sign_extend_qi_di with gen_vsx_sign_extend_v16qi_di and
    gen_vsx_sign_extend_qi_si with gen_vsx_sign_extend_v16qi_si.
    * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsignextsb2d):
    Set bif-pattern to vsx_sign_extend_v16qi_v2di.
    (__builtin_altivec_vsignextsb2w): Set bif-pattern to
    vsx_sign_extend_v16qi_v4si.
    (__builtin_altivec_visgnextsh2d): Set bif-pattern to
    vsx_sign_extend_v8hi_v2di.
    (__builtin_altivec_vsignextsh2w): Set bif-pattern to
    vsx_sign_extend_v8hi_v4si.
    (__builtin_altivec_vsignextsw2d): Set bif-pattern to
    vsx_sign_extend_si_v2di.
    (__builtin_altivec_vsignext): Set bif-pattern to
    vsx_sign_extend_v2di_v1ti.
    * config/rs6000/rs6000-builtin.cc (lxvrse_expand_builtin): Replace
    gen_vsx_sign_extend_qi_v2di with gen_vsx_sign_extend_v16qi_v2di,
    gen_vsx_sign_extend_hi_v2di with gen_vsx_sign_extend_v8hi_v2di and
    gen_vsx_sign_extend_si_v2di with gen_vsx_sign_extend_v4si_v2di.

gcc/testsuite/
    PR target/108812
    * gcc.target/powerpc/p9-sign_extend-runnable.c: Set corresponding
    expected vectors for Big Endian.
    * gcc.target/powerpc/int_128bit-runnable.c: Likewise.

(cherry picked from commit a213e2c965382c24fe391ee5798effeba8da0fdf)

2023-04-18  i386: Require just 32-bit alignment for SLOT_FLOATxFDI_387 -m32 -mpreferred-stack-boundary=2 DImode temporaries [PR109276]  (Jakub Jelinek; 1 file, +13/-0)

The following testcase ICEs since r11-2259 because assign_386_stack_local
-> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment
for DImode temporaries rather than 32-bit as before.

Most of the spots in the backend which ask for DImode temporaries occur
during expansion, and those are apparently handled fine with -m32
-mpreferred-stack-boundary=2: we dynamically realign the stack in that
case (most of the spots don't actually need that alignment, but at least
one does).  Then 2 spots are in STV, which I assume also work correctly.
But during splitting we can create a DImode slot which doesn't need to
be 64-bit aligned (it is nicer for performance though), at a point where
we apparently aren't able to detect it for dynamic stack realignment
purposes.

The following patch just makes the slot 32-bit aligned in that rare case.

2023-03-28  Jakub Jelinek  <jakub@redhat.com>

    PR target/109276
    * config/i386/i386.cc (assign_386_stack_local): For DImode
    with SLOT_FLOATxFDI_387 and -m32 -mpreferred-stack-boundary=2 pass
    align 32 rather than 0 to assign_stack_local.
    * gcc.target/i386/pr109276.c: New test.

(cherry picked from commit 4b5ef857f5faf09f274c0a95c67faaa80d198124)

2023-04-18  tree-vect-generic: Fix up expand_vector_condition [PR109176]  (Jakub Jelinek; 1 file, +12/-0)

The following testcase ICEs on aarch64-linux, because
expand_vector_condition attempts to piecewise lower SVE

    d_3 = a_1(D) < b_2(D);
    _5 = VEC_COND_EXPR <d_3, c_4(D), d_3>;

which isn't possible - nunits_for_known_piecewise_op ICEs, but the rest
of the code assumes a constant number of elements too.

expand_vector_condition attempts to find if a (rhs1) is a SSA_NAME for a
comparison and calls expand_vec_cond_expr_p (type, TREE_TYPE (a1), code)
where a1 is one of the operands of the comparison and code is the
comparison code.  That one indeed isn't supported here, but what aarch64
SVE supports are the individual statements: the comparison
(expand_vec_cmp_expr_p) and expand_vec_cond_expr_p (type, TREE_TYPE (a),
SSA_NAME), the latter because that function starts with

    if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
        && get_vcond_mask_icode (TYPE_MODE (value_type),
                                 TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
      return true;

In an earlier version of the patch (in the PR), we did this

    if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (a))
        && expand_vec_cond_expr_p (type, TREE_TYPE (a), ERROR_MARK))
      return true;

before the code == SSA_NAME handling, plus some further tweaks later.
While that fixed the ICE, it broke quite a few tests on x86 and some on
aarch64 too.  The problem is that expand_vector_comparison doesn't lower
comparisons which aren't supported and only feed the VEC_COND_EXPR's
first operand, and expand_vector_condition succeeds for those, so with
the above mentioned change we'd verify the VEC_COND_EXPR is implementable
using an optab alone, but nothing would verify the tcc_comparison which
relied on expand_vector_condition to verify it.

So, the following patch instead queries whether optabs can handle the
comparison and VEC_COND_EXPR together (if a (rhs1) is a comparison;
otherwise, as before, it checks only the VEC_COND_EXPR), and if that
fails, also checks whether the two operations could be supported
individually; only if even that fails does it do the piecewise lowering.

2023-03-23  Jakub Jelinek  <jakub@redhat.com>

    PR tree-optimization/109176
    * tree-vect-generic.cc (expand_vector_condition): If a has vector
    boolean type and is a comparison, also check if both the comparison
    and VEC_COND_EXPR could be successfully expanded individually.
    * gcc.target/aarch64/sve/pr109176.c: New test.

(cherry picked from commit 484c41c747d95f9cee15a33b75b32ae2e7eb45f3)

2023-04-18  PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1  (Philipp Tomsich; 1 file, +15/-0)

This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1.  The reported ICE is only
observable with checking=rtl.

Bootstrapped/regtested aarch64-linux, committed.

    PR target/108589

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Check
    REG_P on SET_DEST.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/pr108589.c: New test.

(cherry picked from commit a39c6ec97906766ad65d15d4856fd41121ee7a45)

2023-04-17  aarch64: disable LDP via tuning structure for -mcpu=ampere1  (Philipp Tomsich; 1 file, +11/-0)

AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
instructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):

    503.bwaves_r     -0.88%
    507.cactuBSSN_r   0.35%
    508.namd_r        3.09%
    510.parest_r     -2.99%
    511.povray_r      5.54%
    519.lbm_r        15.83%
    521.wrf_r         0.56%
    526.blender_r     2.47%
    527.cam4_r        0.70%
    538.imagick_r     0.00%
    544.nab_r        -0.33%
    549.fotonik3d_r  -0.42%
    554.roms_r        0.00%
    -----------------------
    = total           1.79%

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>

gcc/ChangeLog:
    * config/aarch64/aarch64-tuning-flags.def
    (AARCH64_EXTRA_TUNING_OPTION): Add
    AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
    * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
    Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

(cherry picked from commit f200c56787f2c6f93ffb739d57d01a294ab72f68)

2023-04-16  rs6000: Fix vector parity support [PR108699]  (Kewen Lin; 2 files, +43/-0)

The failures on the original failed case builtin-bitops-1.c and the
associated test case pr108699.c here show that the current support of
parity vector mode is wrong on Power.  The hardware insns vprtyb[wdq],
which operate on the least significant bit of each byte per element,
don't match what the RTL opcode parity needs, but the current
implementation wrongly expands it with them.  This patch fixes the
handling with one more insn, vpopcntb.

    PR target/108699

gcc/ChangeLog:
    * config/rs6000/altivec.md (*p9v_parity<mode>2): Rename to ...
    (rs6000_vprtyb<mode>2): ... this.
    * config/rs6000/rs6000-builtins.def (VPRTYBD): Replace parityv2di2
    with rs6000_vprtybv2di2.
    (VPRTYBW): Replace parityv4si2 with rs6000_vprtybv4si2.
    (VPRTYBQ): Replace parityv1ti2 with rs6000_vprtybv1ti2.
    * config/rs6000/vector.md (parity<mode>2 with VEC_IP): Expand with
    popcountv16qi2 and the corresponding rs6000_vprtyb<mode>2.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/p9-vparity.c: Add scan-assembler-not for
    vpopcntb to distinguish parity byte from parity.
    * gcc.target/powerpc/pr108699.c: New test.

(cherry picked from commit cdd2d6643f7fef40e335a7027edfea7276cde608)

2023-04-10  Backport from master  (Michael Meissner; 4 files, +92/-0)

2023-04-10  Michael Meissner  <meissner@linux.ibm.com>

gcc/
    PR target/109067
    * config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
    (init_float128_ieee): Delete code to switch complex multiply and
    divide for long double.  Backport from master, 3/20/2023.
    (complex_multiply_builtin_code): New helper function.
    (complex_divide_builtin_code): Likewise.
    (rs6000_mangle_decl_assembler_name): Add support for mangling the
    name of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/
    PR target/109067
    * gcc.target/powerpc/divic3-1.c: New test.  Backport from master,
    3/20/2023.
    * gcc.target/powerpc/divic3-2.c: Likewise.
    * gcc.target/powerpc/mulic3-1.c: Likewise.
    * gcc.target/powerpc/mulic3-2.c: Likewise.

2023-04-03  vect: Make partial trapping ops use predication [PR96373]  (Richard Sandiford; 4 files, +14/-14)

PR96373 points out that a predicated SVE loop currently converts
trapping unconditional ops into unpredicated vector ops.  Doing the
operation on inactive lanes can then raise an exception.

As discussed in the PR trail, we aren't 100% consistent about whether we
preserve traps or not.  But the direction of travel is clearly to
improve that rather than live with it.  This patch tries to do that for
the SVE case.

Doing this regresses gcc.target/aarch64/sve/fabd_1.c.  I've added
-fno-trapping-math for now and filed PR108571 to track it.  A similar
problem applies to fsubr_1.c.

I think this is likely to regress Power 10, since conditional operations
are only available for masked loops.  I think we'll need to add
-fno-trapping-math to any affected testcases, but I don't have a
Power 10 system to test on.

gcc/
    PR tree-optimization/96373
    PR tree-optimization/108979
    * tree-vect-stmts.cc (vectorizable_operation): Predicate trapping
    operations on the loop mask.  Reject partial vectors if this isn't
    possible.  Don't mask operations on invariants.

gcc/testsuite/
    PR tree-optimization/96373
    PR tree-optimization/108571
    PR tree-optimization/108979
    * gcc.target/aarch64/sve/fabd_1.c: Add -fno-trapping-math.
    * gcc.target/aarch64/sve/fsubr_1.c: Likewise.
    * gcc.target/aarch64/sve/fmul_1.c: Expect predicate ops.
    * gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
    * gfortran.dg/vect/pr108979.f90: New test.

2023-04-03  aarch64: Restore vectorisation of vld1 inputs [PR109072]  (Richard Sandiford; 2 files, +341/-0)

Before GCC 12, we would vectorize:

    int32_t arr[] = { x, x, x, x };

at -O3.  Vectorizing the store on its own is often a loss, particularly
for integers, so g:4963079769c99c4073adfd799885410ad484cbbe suppressed
it.  This was necessary to fix regressions from enabling vectorisation
at -O2.  However, the vectorisation is important if the code
subsequently loads from the array using vld1:

    return vld1q_s32 (arr);

This approach of initialising an array and loading from it is the
recommended endian-agnostic way of constructing an ACLE vector.

As discussed in the PR notes, the general fix would be to fold the store
and load-back to a constructor (preferably before vectorisation).  But
that's clearly not stage 4 material.

This patch instead delays folding vld1 until after inlining and records
which decls a vld1 loads from.  It then treats vector stores to those
decls as free, on the optimistic assumption that they will be removed
later.  The patch also brute-forces vectorization of plain
constructor+store sequences, since some of the CPU costs make that
(dubiously) expensive even when the store is discounted.

Delaying folding showed that we were failing to update the vops.
The patch fixes that too.

Thanks to Tamar for discussion & help with testing.

gcc/
    PR target/109072
    * config/aarch64/aarch64-protos.h (aarch64_vector_load_decl):
    Declare.
    * config/aarch64/aarch64.h (machine_function::vector_load_decls):
    New variable.
    * config/aarch64/aarch64-builtins.cc
    (aarch64_record_vector_load_arg): New function.
    (aarch64_general_gimple_fold_builtin): Delay folding of vld1 until
    after inlining.  Record which decls are loaded from.  Fix handling
    of vops for loads and stores.
    * config/aarch64/aarch64.cc (aarch64_vector_load_decl): New
    function.
    (aarch64_accesses_vector_load_decl_p): Likewise.
    (aarch64_vector_costs::m_stores_to_vector_load_decl): New member
    variable.
    (aarch64_vector_costs::add_stmt_cost): If the function has a vld1
    that loads from a decl, treat vector stores to those decls as
    zero cost.
    (aarch64_vector_costs::finish_cost): ...and in that case, if the
    vector code does nothing more than a store, give the prologue a
    zero cost as well.

gcc/testsuite/
    PR target/109072
    * gcc.target/aarch64/pr109072_1.c: New test.
    * gcc.target/aarch64/pr109072_2.c: Likewise.

(cherry picked from commit fcb411564a655a01f759eea3bb16bfd1bc879bfd)
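
A minimal sketch of the construction idiom the patch keeps vectorisable,
assembled from the two fragments quoted above (the function name is
illustrative):

    #include <arm_neon.h>

    int32x4_t
    make_vec (int32_t x)
    {
      /* Initialise an array and load it back with vld1 -- the
         endian-agnostic ACLE construction idiom described above.  */
      int32_t arr[] = { x, x, x, x };
      return vld1q_s32 (arr);
    }

With the patch, the store to arr is costed as free, on the optimistic
assumption that the store/load-back pair is folded away later.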

2023-04-03  lra: Replace subregs in bare uses & clobbers [PR108681]  (Richard Sandiford; 1 file, +15/-0)

In this PR we had a write to one vector of a 4-vector tuple.
The vector had mode V1DI, and the target doesn't provide V1DI
moves, so this was converted into:

    (clobber (subreg:V1DI (reg/v:V4x1DI 92 [ b ]) 24))

followed by a DImode move.  (The clobber isn't really necessary
or helpful for a single word, but would be for wider moves.)

The subreg in the clobber survived until after RA:

    (clobber (subreg:V1DI (reg/v:V4x1DI 34 v2 [orig:92 b ] [92]) 24))

IMO this isn't well-formed.  If a subreg of a hard register simplifies
to a hard register, it should be replaced by the hard register.  If the
subreg doesn't simplify, then target-independent code can't be sure
which parts of the register are affected and which aren't.  A clobber
of such a subreg isn't useful and (again IMO) should just be removed.
Conversely, a use of such a subreg is effectively a use of the whole
inner register.

LRA has code to simplify subregs of hard registers, but it didn't handle
bare uses and clobbers.  The patch extends it to do that.

One question was whether the final_p argument to alter_subregs should
be true or false.  True is IMO dangerous, since it forces replacements
that might not be valid from a dataflow perspective, and uses and
clobbers only exist for dataflow.  As said above, I think the correct
way of handling a failed simplification would be to delete clobbers and
replace uses of subregs with uses of the inner register.  But I didn't
want to write untested code to do that.

In the PR, the clobber caused an infinite loop in DCE, because of a
disagreement about what effect the clobber had.  But for the reasons
above, I think that was GIGO rather than a bug in DF or DCE.

gcc/
    PR rtl-optimization/108681
    * lra-spills.cc (lra_final_code_change): Extend subreg replacement
    code to handle bare uses and clobbers.

gcc/testsuite/
    PR rtl-optimization/108681
    * gcc.target/aarch64/pr108681.c: New test.

(cherry picked from commit 3cac06d84f334705ed0bce12fbc3a4cec4a8fd3b)

2023-04-03  vect: Fix single def-use cycle for ifn reductions [PR108608]  (Richard Sandiford; 1 file, +9/-0)

The patch that added support for fmin/fmax reductions didn't handle
single def-use cycles.  In some ways, this seems like going out of
our way to make things slower, but that's a discussion for another day.

gcc/
    PR tree-optimization/108608
    * tree-vect-loop.cc (vect_transform_reduction): Handle single
    def-use cycles that involve function calls rather than tree codes.

gcc/testsuite/
    PR tree-optimization/108608
    * gcc.dg/vect/pr108608.c: New test.
    * gcc.target/aarch64/sve/pr108608-1.c: Likewise.

(cherry picked from commit 2bb444787ed17a9e786f544cdf51ee2ac6779ab2)

2023-04-03  Avoid creating (const (reg ...)) [PR108603]  (Richard Sandiford; 1 file, +8/-0)

convert_memory_address_addr_space_1 has two modes: one in which it
tries to create a self-contained RTL expression (which might fail)
and one in which it can emit new instructions where necessary.

When handling a CONST, the function recurses into the CONST's
operand and then constifies the result.  But that's only valid if
the result is still a self-contained expression.  If new instructions
have been emitted, the expression will refer to the (non-constant)
results of those instructions.

In the PR, this caused us to emit a nonsensical (const (reg ...))
REG_EQUAL note.

gcc/
    PR tree-optimization/108603
    * explow.cc (convert_memory_address_addr_space_1): Only wrap
    the result of a recursive call in a CONST if no instructions
    were emitted.

gcc/testsuite/
    PR tree-optimization/108603
    * gcc.target/aarch64/sve/pr108603.c: New test.

(cherry picked from commit b09dc74801cf4e19bdf5fcd18a5cd53fc9e9ca9a)

2023-04-03  rtl-ssa: Fix splitting of clobber groups [PR108508]  (Richard Sandiford; 1 file, +28/-0)

Since rtl-ssa isn't a real/native SSA representation, it has to honour
the constraints of the underlying rtl representation.  Part of this
involves maintaining an rpo list of definitions for each rtl register,
backed by a splay tree where necessary for quick lookup/insertion.

However, clobbers of a register don't act as barriers to other clobbers
of a register.  E.g. it's possible to move one flag-clobbering
instruction across an arbitrary number of other flag-clobbering
instructions.  In order to allow passes to do that without quadratic
complexity, the splay tree groups all consecutive clobbers into groups,
with only the group being entered into the splay tree.  These groups in
turn have an internal splay tree of clobbers where necessary.

This means that, if we insert a new definition and use into the middle
of a sea of clobbers, we need to split the clobber group into two
groups.  This was quite a difficult condition to trigger during
development, and the PR shows that the code to handle it had
(at least) two bugs.

First, the process involves searching the clobber tree for the split
point.  This search can give either the previous clobber (which will
belong to the first of the split groups) or the next clobber (which
will belong to the second of the split groups).  The code for the
former case handled the split correctly but the code for the latter
case didn't.

Second, I'd forgotten to add the second clobber group to the main
splay tree. :-(

gcc/
    PR rtl-optimization/108508
    * rtl-ssa/accesses.cc (function_info::split_clobber_group): When
    the splay tree search gives the first clobber in the second group,
    make sure that the root of the first clobber group is updated
    correctly.  Enter the new clobber group into the definition splay
    tree.

gcc/testsuite/
    PR rtl-optimization/108508
    * gcc.target/aarch64/pr108508.c: New test.

(cherry picked from commit f4e1b46618ef3bd7933992ab79f663ab9112bb80)

2023-04-03  vect: Fix voluntarily-masked negative conditionals [PR108430]  (Richard Sandiford; 1 file, +21/-0)

vectorizable_condition checks whether a COND_EXPR condition is used
elsewhere with a loop mask.  If so, it applies the loop mask to the
COND_EXPR too, to reduce the number of live masks and to increase the
chance of combining the AND with the comparison.

There is also code to do this for inverted conditions.  E.g. if
we have

    a < b ? c : d

and something else is conditional on !(a < b) (such as a load in d),
we use

    !(a < b) ? d : c

and apply the loop mask to !(a < b).

This inversion relied on the function's bitop1/bitop2 mechanism.
However, that mechanism is skipped if the condition is split out of
the COND_EXPR as a separate statement.  This meant that we could end
up using the inverse of the intended condition.

There is a separate way of negating the condition when a mask is being
applied (which is also used for EXTRACT_LAST reductions).  This patch
uses that instead.

As well as the testcase, this fixes aarch64/sve/vcond_{4,17}_run.c.

gcc/
    PR tree-optimization/108430
    * tree-vect-stmts.cc (vectorizable_condition): Fix handling
    of inverted condition.

gcc/testsuite/
    PR tree-optimization/108430
    * gcc.target/aarch64/sve/pr108430.c: New test.

(cherry picked from commit 2a8ce4b52f5892a10a02b94d7be689e59a444ff6)

2023-03-31  IRA: Use minimal cost for hard register movement  (Vladimir N. Makarov; 1 file, +9/-0)

This is the 2nd attempt to fix PR90706.  IRA calculates wrong costs for
moving general hard regs of SFmode on AVR.  This was the reason for
spilling a pseudo in the PR.  In this patch we use the smaller move cost
of a hard reg in its natural and operand modes.

    PR rtl-optimization/90706

gcc/ChangeLog:
    * ira-costs.cc: Include print-rtl.h.
    (record_reg_classes, scan_one_insn): Add code to print debug info.
    (record_operand_costs): Find and use smaller cost for hard reg
    move.

gcc/testsuite/ChangeLog:
    * gcc.target/avr/pr90706.c: New.

2023-03-28  Fix line ending  (Eric Botcazou; 4 files, +76/-76)

gcc/testsuite/
    * gcc.target/sparc/20230328-1.c: New test.
    * gcc.target/sparc/20230328-2.c: Likewise.
    * gcc.target/sparc/20230328-3.c: Likewise.
    * gcc.target/sparc/20230328-4.c: Likewise.

2023-03-28  Fix PR target/109140  (Eric Botcazou; 4 files, +76/-0)

This is a regression present on the mainline and 12 branch at -O2, but
the issue is related to vectorization, so it was present at -O3 in
earlier versions.

The vcondu expander that was added for VIS 3 more than a decade ago does
not fully work, because it does not filter out the unsigned condition
codes (the instruction is an UNSPEC that accepts only signed condition
codes).  While I was at it, I also added the missing vcond and vcondu
expanders for the new comparison instructions that were added in VIS 4.

gcc/
    PR target/109140
    * config/sparc/sparc.cc (sparc_expand_vcond): Call signed_condition
    on operand #3 to get the final condition code.  Use std::swap.
    * config/sparc/sparc.md (vcondv8qiv8qi): New VIS 4 expander.
    (fucmp<gcond:code>8<P:mode>_vis): Move around.
    (fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis): Likewise.
    (vcondu<GCM:mode><GCM:mode>): New VIS 4 expander.

gcc/testsuite/
    * gcc.target/sparc/20230328-1.c: New test.
    * gcc.target/sparc/20230328-2.c: Likewise.
    * gcc.target/sparc/20230328-3.c: Likewise.
    * gcc.target/sparc/20230328-4.c: Likewise.

2023-03-20  rs6000: Don't ICE when compiling the __builtin_vec_xst_trunc built-in [PR109178]  (Peter Bergner; 1 file, +13/-0)

When we expand the __builtin_vec_xst_trunc built-in, we use the wrong
mode for the MEM operand, which causes an unrecognizable insn ICE.
The solution is to use the correct TMODE mode.

2023-03-20  Peter Bergner  <bergner@linux.ibm.com>

gcc/
    PR target/109178
    * config/rs6000/rs6000-builtin.cc (stv_expand_builtin): Use tmode.

gcc/testsuite/
    PR target/109178
    * gcc.target/powerpc/pr109178.c: New test.

(cherry picked from commit fbd50e867e6a782c7b56c9727bf7e1e74dae4b94)
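
A hedged sketch of the kind of use that expands through
stv_expand_builtin (the prototype shape -- value, byte offset, pointer
-- is assumed from the Power vector intrinsics reference, and requires
-mcpu=power10):

    #include <altivec.h>

    /* Store the low element of the 128-bit vector, truncated to a byte,
       through a char pointer.  */
    void
    store_trunc (vector signed __int128 v, signed char *p)
    {
      vec_xst_trunc (v, 0, p);
    }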

2023-03-19  tree-inline: Fix up multiversioning with vector arguments [PR105554]  (Jakub Jelinek; 1 file, +10/-0)

The following testcase ICEs, because we call tree_function_versioning
from old_decl which has target attributes not supporting V4DImode and
so DECL_MODE of DECL_ARGUMENTS is BLKmode, while new_decl supports
those.  tree_function_versioning initially copies DECL_RESULT and
DECL_ARGUMENTS from old_decl to new_decl, then calls initialize_cfun to
create cfun, and only when the cfun is created can it later actually
remap_decl DECL_RESULT and DECL_ARGUMENTS etc.

The problem is that initialize_cfun -> push_struct_function ->
allocate_struct_function calls relayout_decl on DECL_RESULT and
DECL_ARGUMENTS, which clobbers DECL_MODE of old_decl and we then ICE
because of it.

In particular, allocate_struct_function does:

    if (!abstract_p)
      {
        /* Now that we have activated any function-specific attributes
           that might affect layout, particularly vector modes, relayout
           each of the parameters and the result.  */
        relayout_decl (result);
        for (tree parm = DECL_ARGUMENTS (fndecl); parm;
             parm = DECL_CHAIN (parm))
          relayout_decl (parm);

        /* Similarly relayout the function decl.  */
        targetm.target_option.relayout_function (fndecl);
      }

    if (!abstract_p && aggregate_value_p (result, fndecl))
      {
    #ifdef PCC_STATIC_STRUCT_RETURN
        cfun->returns_pcc_struct = 1;
    #endif
        cfun->returns_struct = 1;
      }

Now, in the case of tree_function_versioning, I believe all that we need
from these is possibly the

    targetm.target_option.relayout_function (fndecl);

call (arm only), we will remap DECL_RESULT and DECL_ARGUMENTS later on
and copy_decl_for_dup_finish in that case will handle all we need:

    /* For vector typed decls make sure to update DECL_MODE according
       to the new function context.  */
    if (VECTOR_TYPE_P (TREE_TYPE (copy)))
      SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));

We don't need the cfun->returns_*struct either, because we override it
in initialize_cfun a few lines later:

    /* Copy items we preserve during cloning.  */
    ...
    cfun->returns_struct = src_cfun->returns_struct;
    cfun->returns_pcc_struct = src_cfun->returns_pcc_struct;

So, to avoid the clobbering of DECL_RESULT/DECL_ARGUMENTS of old_decl,
the following patch arranges allocate_struct_function to be called with
abstract_p true and calls targetm.target_option.relayout_function
(fndecl); by hand.  The removal of DECL_RESULT/DECL_ARGUMENTS copying
at the start of initialize_cfun is done because the only caller -
tree_function_versioning - does that unconditionally before.

2023-03-17  Jakub Jelinek  <jakub@redhat.com>

    PR target/105554
    * function.h (push_struct_function): Add ABSTRACT_P argument
    defaulted to false.
    * function.cc (push_struct_function): Add ABSTRACT_P argument,
    pass it to allocate_struct_function instead of false.
    * tree-inline.cc (initialize_cfun): Don't copy DECL_ARGUMENTS
    nor DECL_RESULT here.  Pass true as ABSTRACT_P to
    push_struct_function.  Call
    targetm.target_option.relayout_function after it.
    (tree_function_versioning): Formatting fix.
    * gcc.target/i386/pr105554.c: New test.

(cherry picked from commit 24c06560a7fa39049911eeb8777325d112e0deb9)

2023-03-19  fold-const: Ignore padding bits in native_interpret_expr REAL_CST reverse verification [PR108934]  (Jakub Jelinek; 1 file, +2/-3)

In the following testcase we try to std::bit_cast a (pair of) integral
value(s) which has some non-zero bits in the place of the x86 long
double padding (for 64-bit, a 16-byte type with 10 bytes actually
loaded/stored by hw; for 32-bit, 12 bytes), and starting with my
PR104522 change we reject that, as native_interpret_expr fails on it.

The PR104522 change extends what had been done before for
MODE_COMPOSITE_P (but those don't have any padding bits) to all floating
point types, because e.g. the exact x86 long double has various bit
combinations we don't support, like pseudo-(denormals,infinities,NaNs)
or unnormals.  The HW handles some of those as exceptional cases and
others similarly to the non-pseudo ones.  But for the padding bits it
actually doesn't load/store those bits at all, it loads/stores 10 bytes.

So, I think we should exempt the padding bits from the reverse
comparison (the native_encode_expr bits for the padding will be all
zeros), which the following patch does.  For bit_cast it is similar to
e.g. ignoring padding bits if the destination is a structure which has
padding bits in there.

The change reverted auto-init-4.c to how it behaved before the PR105259
change, where some more VCEs can now be done.

2023-03-02  Jakub Jelinek  <jakub@redhat.com>

    PR c++/108934
    * fold-const.cc (native_interpret_expr) <case REAL_CST>: Before
    memcmp comparison copy the bytes from ptr to a temporary buffer
    and clear padding bits in there.
    * gcc.target/i386/auto-init-4.c: Revert PR105259 change.
    * g++.target/i386/pr108934.C: New test.

(cherry picked from commit cc88366a80e35b3e53141f49d3071010ff3c2ef8)

2023-03-19  i386: Fix up builtins used in avx512bf16vlintrin.h [PR108881]  (Jakub Jelinek; 1 file, +14/-0)

The builtins used in the avx512bf16vlintrin.h implementation need both
the avx512bf16 and avx512vl ISAs, which the header ensures for them, but
the builtins weren't actually requiring avx512vl, so when used by hand
with just -mavx512bf16 -mno-avx512vl they resulted in ICEs.

Fixed by adding OPTION_MASK_ISA_AVX512VL to their BDESC.

2023-02-24  Jakub Jelinek  <jakub@redhat.com>

    PR target/108881
    * config/i386/i386-builtin.def (__builtin_ia32_cvtne2ps2bf16_v16hi,
    __builtin_ia32_cvtne2ps2bf16_v16hi_mask,
    __builtin_ia32_cvtne2ps2bf16_v16hi_maskz,
    __builtin_ia32_cvtne2ps2bf16_v8hi,
    __builtin_ia32_cvtne2ps2bf16_v8hi_mask,
    __builtin_ia32_cvtne2ps2bf16_v8hi_maskz,
    __builtin_ia32_cvtneps2bf16_v8sf_mask,
    __builtin_ia32_cvtneps2bf16_v8sf_maskz,
    __builtin_ia32_cvtneps2bf16_v4sf_mask,
    __builtin_ia32_cvtneps2bf16_v4sf_maskz,
    __builtin_ia32_dpbf16ps_v8sf, __builtin_ia32_dpbf16ps_v8sf_mask,
    __builtin_ia32_dpbf16ps_v8sf_maskz, __builtin_ia32_dpbf16ps_v4sf,
    __builtin_ia32_dpbf16ps_v4sf_mask,
    __builtin_ia32_dpbf16ps_v4sf_maskz): Require also
    OPTION_MASK_ISA_AVX512VL.
    * gcc.target/i386/avx512bf16-pr108881.c: New test.

(cherry picked from commit 0ccfa3884f638816af0f5a3f0ee2695e0771ef6d)

2023-03-15  tree-optimization/108724 - vectorized code getting piecewise expanded  (Richard Biener; 1 file, +15/-0)

This fixes an oversight when removing the hard limits on using generic
vectors for the vectorizer, which enabled both SLP and BB vectorization
to use them.  The vectorizer relies on vector lowering to expand plus,
minus and negate to bit operations, but vector lowering has a hard limit
on the minimum number of elements per work item.  Vectorizer costs for
the testcase at hand work out to vectorize a loop with just two work
items per vector, and that causes elementwise expansion and spilling.

The fix for now is to re-instantiate the hard limit, matching what
vector lowering does.  For the future the way to go is to emit the
lowered sequence directly from the vectorizer instead.

    PR tree-optimization/108724
    * tree-vect-stmts.cc (vectorizable_operation): Avoid using
    word_mode vectors when vector lowering will decompose them to
    elementwise operations.
    * gcc.target/i386/pr108724.c: New testcase.

(cherry picked from commit dc87e1391c55c666c7ff39d4f0dea87666f25468)

2023-03-06  LoongArch: Stop -mfpu from silently breaking ABI [PR109000]  (Xi Ruoyao; 4 files, +43/-0)

In the toolchain convention, we describe -mfpu= as:

"Selects the allowed set of basic floating-point instructions and
registers.  This option should not change the FP calling convention
unless it's necessary."

Though not explicitly stated, the rationale of this rule is to allow
combinations like "-mabi=lp64s -mfpu=64".  This will be useful for
running applications with the LP64S/F ABI on double-float-capable
LoongArch hardware, using a math library with the LP64S/F ABI but
native double-float HW instructions, for better performance.

And now a case in the Linux kernel has again proven the usefulness of
this kind of combination.  The AMDGPU DCN kernel driver needs to perform
some floating-point operations, but the entire kernel uses the LP64S
ABI.  So the translation units of the AMDGPU DCN driver need to be
compiled with -mfpu=64 (the kernel lacks soft-FP routines in libgcc),
but -mabi=lp64s (or you can't link it with the other parts of the
kernel).

Unfortunately, GCC currently uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
determine the floating-point calling convention.  This causes "-mfpu=64"
to silently allow using $fa* to pass parameters and return values EVEN
IF -mabi=lp64s is used.  To make things worse, the generated object file
has SOFT-FLOAT set in the eflags field, so the linker will happily link
it with other LP64S ABI object files, but obviously this will lead to
bad results at runtime.  And for now all loongarch64 CPU models (-march
settings) imply -mfpu=64 by default, so the issue makes a single
"-mabi=lp64s" option basically broken (fortunately most projects, e.g.
the Linux kernel, have used -msoft-float, which implies both -mabi=lp64s
and -mfpu=none, as we've recommended in the toolchain convention doc).

The fix is simple: use TARGET_*_FLOAT_ABI instead.  I consider this a
bug fix: the behavior difference from the toolchain convention doc is a
bug, and generating object files with the SOFT-FLOAT flag but
parameters/return values passed through FPRs is definitely a bug.

Bootstrapped and regtested on loongarch64-linux-gnu.
Ok for trunk and release/gcc-12 branch?

gcc/ChangeLog:
    PR target/109000
    * config/loongarch/loongarch.h (FP_RETURN): Use
    TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
    (UNITS_PER_FP_ARG): Likewise.

gcc/testsuite/ChangeLog:
    PR target/109000
    * gcc.target/loongarch/flt-abi-isa-1.c: New test.
    * gcc.target/loongarch/flt-abi-isa-2.c: New test.
    * gcc.target/loongarch/flt-abi-isa-3.c: New test.
    * gcc.target/loongarch/flt-abi-isa-4.c: New test.

(cherry picked from commit 75eccddef5784bc5e262af31f535267a9c4e993e)

2023-02-26  rs6000/test: Adjust some test cases on partial vector [PR96373]  (Kewen Lin; 15 files, +45/-14)

As Richard pointed out in [1], and as the testing on Power10 shows, the
proposed fix for PR96373 requires some updates to a few rs6000 test
cases which adopt partial vectors.  This patch fixes all of them with
one extra option "-fno-trapping-math", as Richard suggested.  Besides,
the original test case also failed on Power10 without Richard's proposed
fix, so this patch adds it as well, for a bit better testing coverage.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610728.html

    PR target/96373

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/p9-vec-length-epil-1.c: Add
    -fno-trapping-math.
    * gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
    * gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
    * gcc.target/powerpc/pr96373.c: New test.

(cherry picked from commit 4f5a1198065dc078f8099db628da7b06a2666f34)

2023-02-20  aarch64: Fix up bfmlal lane pattern [PR104921]  (Alex Coplan; 3 files, +27/-0)

As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.  This patch fixes
the constraint accordingly.

The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually
assembles when such support is available.

gcc/ChangeLog:
    PR target/104921
    * config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
    Use correct constraint for operand 3.

gcc/testsuite/ChangeLog:
    PR target/104921
    * gcc.target/aarch64/pr104921-1.c: New test.
    * gcc.target/aarch64/pr104921-2.c: New test.
    * gcc.target/aarch64/pr104921.x: Include file for new tests.

(cherry picked from commit 277e1f30a5e4e634304a7b8a532825119f0ea47f)

2023-02-12  rs6000: Fix typo on vec_vsubcuq in rs6000-overload.def [PR108396]  (Kewen Lin; 1 file, +14/-0)

As Andrew pointed out in PR108396, there is one typo in
rs6000-overload.def for the built-in function vec_vsubcuq:

    [VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq]

"vec_vsubcuqP" should be "vec_vsubcuq".  This typo caused us to define
vec_vsubcuqP in rs6000-vecdefines.h instead of vec_vsubcuq, so the
compiler was no longer able to recognize the built-in function name
vec_vsubcuq.

Co-authored-By: Andrew Pinski <apinski@marvell.com>

    PR target/108396

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo
    vec_vsubcuqP with vec_vsubcuq.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/pr108396.c: New test.

(cherry picked from commit aaf29ae6cdbaad58b709a77784375d15138174b3)
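
A minimal sketch of a use that the typo broke (the types and required
-mcpu level -- Power8 for vsubcuq -- are assumptions here):

    #include <altivec.h>

    /* vec_vsubcuq computes the carry-out of the 128-bit subtraction.  */
    vector unsigned __int128
    sub_carry (vector unsigned __int128 a, vector unsigned __int128 b)
    {
      return vec_vsubcuq (a, b);
    }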

2023-02-12  rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]  (Kewen Lin; 2 files, +46/-0)

PR108348 shows one special case where MMA opaque types are used in
function arguments and treated as pass-by-reference: it results in a
copy from the argument to a temp variable, and since this copying
happens before the rs6000_function_arg check, it can cause an ICE
without MMA support.  This patch teaches the function
rs6000_opaque_type_invalid_use_p to check whether any function argument
in a gcall stmt has an invalid use of MMA opaque types.

Btw, I checked the handling of return values; it doesn't have this kind
of issue, as its checking and error emission happen quite early, so this
patch doesn't handle function return values.

    PR target/108348

gcc/ChangeLog:
    * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add
    the support for invalid uses of MMA opaque type in function
    arguments.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/pr108348-1.c: New test.
    * gcc.target/powerpc/pr108348-2.c: New test.

(cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)
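
An illustrative shape of the problematic call (hypothetical; built
without MMA support, e.g. -mcpu=power9):

    /* Passing a __vector_quad by value copies it to a temporary before
       rs6000_function_arg gets a chance to diagnose the missing MMA
       support; the gcall check now reports an error instead of ICEing.  */
    void bar (__vector_quad v);

    void
    foo (__vector_quad *p)
    {
      bar (*p);
    }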

2023-02-12  rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]  (Kewen Lin; 4 files, +69/-0)

As PR108272 shows, there are some invalid uses of MMA opaque types in
inline asm statements.  This patch teaches the function
rs6000_opaque_type_invalid_use_p about inline asm, checking and
reporting an error for any invalid use of MMA opaque types in input
and output operands.

    PR target/108272

gcc/ChangeLog:
    * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add
    the support for invalid uses in inline asm, factor out the checking
    and erroring to lambda function check_and_error_invalid_use.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/pr108272-1.c: New test.
    * gcc.target/powerpc/pr108272-2.c: New test.
    * gcc.target/powerpc/pr108272-3.c: New test.
    * gcc.target/powerpc/pr108272-4.c: New test.

(cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)

2023-02-10  i386: Fix up -Wuninitialized warnings in avx512erintrin.h [PR105593]  (Jakub Jelinek; 1 file, +1/-1)

As reported in the PR, there are some -Wuninitialized warnings in
avx512erintrin.h.  One can see that by compiling the sse-23.c testcase
with -Wuninitialized (or when actually using those intrinsics).
Those 6 spots use an uninitialized variable and pass it as one of the
arguments to a builtin with constant mask -1, because there is no
unmasked builtin.  It is true that expansion of those builtins into RTL
will see the mask is all ones and ignore the unneeded argument, but
-Wuninitialized is diagnosed on GIMPLE, and on GIMPLE these builtins
are just builtin calls.

avx512fintrin.h and other headers use the _mm*_undefined_* ()
intrinsics in these cases, like:

    return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
                                                   (__v8di) __Y,
                                                   (__v8di)
                                                   _mm512_undefined_epi32 (),
                                                   (__mmask8) -1);

etc., and the following patch does the same for avx512erintrin.h.
With the recent changes in the C++ FE and the _mm*_undefined_*
intrinsics, we don't emit -Wuninitialized warnings for those (previously
we avoided that only in C, due to self-initialization).  Of course we
could also just self-initialize these uninitialized vars and add the
#pragma GCC diagnostic dances around it, but using the intrinsics is
consistent with the rest and IMHO cleaner.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

    PR c++/105593
    * config/i386/avx512erintrin.h (_mm512_exp2a23_round_pd,
    _mm512_exp2a23_round_ps, _mm512_rcp28_round_pd,
    _mm512_rcp28_round_ps, _mm512_rsqrt28_round_pd,
    _mm512_rsqrt28_round_ps): Use _mm512_undefined_pd () or
    _mm512_undefined_ps () instead of using uninitialized automatic
    variable __W.
    * gcc.target/i386/sse-23.c: Add -Wuninitialized to dg-options.

(cherry picked from commit 41602390456901c14ecdfa2fa64c3cebd5b6ff09)
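
For reference, a hedged example of a use that triggered the bogus
warning before the fix (the rounding argument is chosen for
illustration):

    #include <immintrin.h>

    /* Compile with -mavx512er -Wuninitialized; the uninitialized __W
       inside the header used to be diagnosed here.  */
    __m512
    rcp (__m512 x)
    {
      return _mm512_rcp28_round_ps (x, _MM_FROUND_CUR_DIRECTION);
    }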

2023-02-10  i386: Fix up ix86_convert_const_wide_int_to_broadcast [PR108599]  (Jakub Jelinek; 1 file, +32/-0)

The following testcase is miscompiled.  The problem is that during RTL
DSE we see a V4DI register is being loaded the { 16, 16, 0, 0 } value
and DSE mostly works in terms of scalar modes, so it calls movoi to set
an OImode REG to (const_wide_int 0x100000000000000010) and
ix86_convert_const_wide_int_to_broadcast thinks it can compute that
value by broadcasting DImode 0x10.  While it is true that for a TImode
result the broadcast could be used, for OImode/XImode it can't be,
because all but the lowest 2 HOST_WIDE_INTs aren't present (so are 0 or
-1 depending on sign), not 0x10 in this case.

The function checks if the least significant HOST_WIDE_INT elt of the
CONST_WIDE_INT is broadcastable from QI/HI/SI/DImode and then:

    /* Check if OP can be broadcasted from VAL.  */
    for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++)
      if (val != CONST_WIDE_INT_ELT (op, i))
        return nullptr;

That is needed of course, but nothing checks that
CONST_WIDE_INT_NUNITS (op) isn't too small for the mode in question.
I think if op would be 0 or -1, it ought to be never CONST_WIDE_INT,
but CONST_INT, and so we can just punt whenever the number of
CONST_WIDE_INT elts is not the expected one.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

    PR target/108599
    * config/i386/i386-expand.cc
    (ix86_convert_const_wide_int_to_broadcast): Return nullptr if
    CONST_WIDE_INT_NUNITS (op) times HOST_BITS_PER_WIDE_INT isn't
    equal to bitsize of mode.
    * gcc.target/i386/avx2-pr108599.c: New test.

(cherry picked from commit 963315a922e228c4f6853826666151fc540f111a)

2023-01-29  Enable AMD znver4 support and add instruction reservations  (Tejas Joshi; 1 file, +2/-0)

2022-09-28  Tejas Joshi  <TejasSanjay.Joshi@amd.com>

gcc/ChangeLog:
    * common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver4.
    * common/config/i386/i386-common.cc (processor_names): Add znver4.
    (processor_alias_table): Add znver4 and modularize old znvers.
    * common/config/i386/i386-cpuinfo.h (processor_subtypes):
    AMDFAM19H_ZNVER4.
    * config.gcc (x86_64-*-* |...): Likewise.
    * config/i386/driver-i386.cc (host_detect_local_cpu): Let
    -march=native recognize znver4 cpus.
    * config/i386/i386-c.cc (ix86_target_macros_internal): Add znver4.
    * config/i386/i386-options.cc (m_ZNVER4): New definition.
    (m_ZNVER): Include m_ZNVER4.
    (processor_cost_table): Add znver4.
    * config/i386/i386.cc (ix86_reassociation_width): Likewise.
    * config/i386/i386.h (processor_type): Add PROCESSOR_ZNVER4.
    (PTA_ZNVER1): New definition.
    (PTA_ZNVER2): Likewise.
    (PTA_ZNVER3): Likewise.
    (PTA_ZNVER4): Likewise.
    * config/i386/i386.md (define_attr "cpu"): Add znver4 and rename
    md file.
    * config/i386/x86-tune-costs.h (znver4_cost): New definition.
    * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add znver4.
    (ix86_adjust_cost): Likewise.
    * config/i386/znver1.md: Rename to znver.md.
    * config/i386/znver.md: Add new reservations for znver4.
    * doc/extend.texi: Add details about znver4.
    * doc/invoke.texi: Likewise.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/funcspec-56.inc: Handle new march.
    * g++.target/i386/mv29.C: Likewise.

(cherry picked from commit bf3b532b524ecacb3202ab2c8af419ffaaab7cff)

2023-01-27  arm: Fix MVE's vcmp vector-scalar patterns [PR107987]  (Andre Vieira; 1 file, +11/-0)

This patch surrounds the scalar operand of the MVE vcmp patterns with a
vec_duplicate to ensure both operands of the comparison operator have
the same (vector) mode.

gcc/ChangeLog:
    PR target/107987
    * config/arm/mve.md (mve_vcmp<mve_cmp_op>q_n_<mode>,
    @mve_vcmp<mve_cmp_op>q_n_f<mode>): Apply vec_duplicate to scalar
    operand.

gcc/testsuite/ChangeLog:
    * gcc.target/arm/mve/pr107987.c: New test.

(cherry picked from commit ed34c3bc3428bce663d42e9eeda10bc0c5d56d5c)
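
A sketch of the vector-by-scalar comparison shape this affects (the
intrinsic name and types are assumed from the MVE ACLE):

    #include <arm_mve.h>

    /* The scalar operand x is duplicated to a vector internally so
       that both sides of the comparison share the vector mode.  */
    mve_pred16_t
    cmp_scalar (int32x4_t v, int32_t x)
    {
      return vcmpeqq_n_s32 (v, x);
    }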

2023-01-25  aarch64: fix warning emission for ABI break since GCC 9.1  (Christophe Lyon; 6 files, +552/-0)

While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we
noticed that the GCC 9.1 warning is not emitted in all the cases where
it should be.  This patch fixes that and the next patch in the series
fixes the GCC 9.1 break.

We split this into two patches since patch #2 introduces a new ABI
break starting with GCC 13.1.  This way, patch #1 can be back-ported to
release branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg can
emit warnings for callees as well as callers.  This removes the need
for aarch64_function_arg_boundary to warn (with its incomplete
information).  However, in the first patch there are still cases where
we emit warnings where we should not; this is fixed in patch #2 where
we can distinguish between GCC 9.1 and GCC 13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.

We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.

2022-11-28  Christophe Lyon  <christophe.lyon@arm.com>
            Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
    comment.
    (aarch64_layout_arg): Factorize warning conditions.
    (aarch64_function_arg_boundary): Fix typo.
    * function.cc (currently_expanding_function_start): New variable.
    (expand_function_start): Handle
    currently_expanding_function_start.
    * function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New
    test.
    * gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New
    test.
    * gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
    * gcc.target/aarch64/bitfield-abi-warning.h: New test.
    * g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New
    test.
    * g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New
    test.
    * g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
    * g++.target/aarch64/bitfield-abi-warning.h: New test.

(cherry picked from commit 3df1a115be22caeab3ffe7afb12e71adb54ff132)

2023-01-10  Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]  (Stam Markianos-Wright; 1 file, +300/-0)

In the M-Class Arm-ARM:

https://developer.arm.com/documentation/ddi0553/bu/?lang=en

these MVE instructions only have a '!' writeback variant, and at

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

we found that the Um constraint would also allow through a
register-offset writeback, resulting in an assembler error.

Here I have added a new constraint and predicate for these instructions,
which (uniquely, AFAICT) only support a '!' writeback increment by the
data size (inside the compiler this is a POST_INC).

No regressions in arm-none-eabi with MVE and MVE.FP.

gcc/ChangeLog:
    PR target/107714
    * config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
    * config/arm/arm.cc (mve_struct_mem_operand): New function.
    * config/arm/constraints.md (Ug): New constraint.
    * config/arm/mve.md (mve_vst4q<mode>): Change constraint.
    (mve_vst2q<mode>): Likewise.
    (mve_vld4q<mode>): Likewise.
    (mve_vld2q<mode>): Likewise.
    * config/arm/predicates.md (mve_struct_operand): New predicate.

gcc/testsuite/ChangeLog:
    PR target/107714
    * gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.

(cherry picked from commit 4269a6567eb991e6838f40bda5be9e3a7972530c)
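
A minimal sketch of the affected intrinsics (the MVE types are assumed
from arm_mve.h):

    #include <arm_mve.h>

    /* vst2q stores an interleaved pair of vectors; its memory operand
       now uses the new Ug constraint, which only allows '!' writeback.  */
    void
    store2 (int32_t *addr, int32x4x2_t v)
    {
      vst2q_s32 (addr, v);
    }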

2023-01-10  aarch64: PR target/108140 Handle NULL target in data intrinsic expansion  (Kyrylo Tkachov; 1 file, +15/-0)

In this PR we ICE when expanding the __rbit builtin with a NULL target
rtx.  I *think* that only happens when the result is unused and hence
maybe we shouldn't be expanding any RTL at all, but the ICE here is
easily fixed by deriving the mode from the type of the expression
rather than the target.  This patch does that.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    PR target/108140
    * config/aarch64/aarch64-builtins.cc
    (aarch64_expand_builtin_data_intrinsic): Handle NULL target.

gcc/testsuite/ChangeLog:
    PR target/108140
    * gcc.target/aarch64/acle/pr108140.c: New test.

(cherry picked from commit 98756bcbe27647f263f2b312d1d933d70cf56ba9)
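
An assumed shape of the trigger: an ACLE data intrinsic whose result is
discarded.

    #include <arm_acle.h>

    void
    f (unsigned int x)
    {
      /* Result unused, so the expander may be handed a NULL target.  */
      __rbit (x);
    }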

2023-01-04  rs6000: Raise error for __vector_{quad,pair} uses without MMA enabled [PR106736]  (Kewen Lin; 5 files, +92/-0)

As PR106736 shows, it's unexpected to use the __vector_quad and
__vector_pair types without MMA support; doing so would cause an ICE
when expanding the corresponding assignment.  We can't guard the
registration of these built-in types under MMA support, as Peter pointed
out in that PR, because the registering is global and that wouldn't work
for target pragma/attribute support with MMA enabled.  The existing
verify_type_context mentioned in [2] can help to make the diagnostics of
invalid built-in type uses better, but as Richard pointed out in [4], it
can't deal with all cases.

As per the discussions in [1][3], this patch checks for invalid uses of
the built-in types __vector_quad and __vector_pair in the mov patterns
of OOmode and XOmode, on the gimple assignment statement currently being
expanded.  It still puts an assertion in the else arm rather than just
letting it go through, to ensure we can catch any other possible
unexpected cases in time if there are any.

[1] https://gcc.gnu.org/pipermail/gcc/2022-December/240218.html
[2] https://gcc.gnu.org/pipermail/gcc/2022-December/240220.html
[3] https://gcc.gnu.org/pipermail/gcc/2022-December/240223.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608083.html

    PR target/106736

gcc/ChangeLog:
    * config/rs6000/mma.md (define_expand movoo): Call function
    rs6000_opaque_type_invalid_use_p to check and emit error message
    for the invalid use of opaque type.
    (define_expand movxo): Likewise.
    * config/rs6000/rs6000-protos.h
    (rs6000_opaque_type_invalid_use_p): New function declaration.
    (currently_expanding_gimple_stmt): New extern declaration.
    * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): New
    function.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/pr106736-1.c: New test.
    * gcc.target/powerpc/pr106736-2.c: Likewise.
    * gcc.target/powerpc/pr106736-3.c: Likewise.
    * gcc.target/powerpc/pr106736-4.c: Likewise.
    * gcc.target/powerpc/pr106736-5.c: Likewise.
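
A hedged example of the invalid use that is now diagnosed instead of
ICEing (compiled without MMA support, e.g. -mcpu=power9):

    /* The assignment expands through the movxo pattern, which now calls
       rs6000_opaque_type_invalid_use_p and emits an error.  */
    void
    copy (__vector_quad *dst, __vector_quad *src)
    {
      *dst = *src;
    }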

2022-12-15  AArch64: Add UNSPECV_PATCHABLE_AREA [PR98776]  (Sebastian Pop; 3 files, +13/-2)

Currently the patchable area is emitted at the wrong place on AArch64:
immediately after the function label, before .cfi_startproc.  This patch
adds UNSPECV_PATCHABLE_AREA for a pseudo patchable-area instruction and
modifies aarch64_print_patchable_function_entry to avoid placing the
patchable area before .cfi_startproc.

gcc/
    PR target/98776
    * config/aarch64/aarch64-protos.h (aarch64_output_patchable_area):
    Declared.
    * config/aarch64/aarch64.cc
    (aarch64_print_patchable_function_entry): Emit an
    UNSPECV_PATCHABLE_AREA pseudo instruction.
    (aarch64_output_patchable_area): New.
    * config/aarch64/aarch64.md (UNSPECV_PATCHABLE_AREA): New.
    (patchable_area): Define.

gcc/testsuite/
    PR target/98776
    * gcc.target/aarch64/pr98776.c: New.
    * gcc.target/aarch64/pr92424-2.c: Adjust pattern.
    * gcc.target/aarch64/pr92424-3.c: Adjust pattern.

2022-12-12  tree-optimization/107647 - avoid FMA from SLP with -ffp-contract=off  (Richard Biener; 1 file, +17/-0)

Only with -ffp-contract=fast can we synthesize FMA operations like
vfmaddsub231ps, so properly guard the transform in SLP pattern
detection.

    PR tree-optimization/107647
    * tree-vect-slp-patterns.cc (addsub_pattern::recognize): Only
    allow FMA generation with -ffp-contract=fast for FP types.
    (complex_mul_pattern::matches): Likewise.
    * gcc.target/i386/pr107647.c: New testcase.

(cherry picked from commit c5df8392c5848c0462558f41cdf6eab31db301cf)

2022-12-09  i386: fix assert (__builtin_cpu_supports ("x86-64") >= 0)  (Martin Liska; 1 file, +5/-0)

Similar story as PR103661: we again return a negative number for
__builtin_cpu_supports.  The documentation says:

    int __builtin_cpu_supports (const char *feature)

    This function returns a positive integer if the run-time CPU supports
    feature and returns 0 otherwise.

while we return -2147483648.

Moreover, I noticed "x86-64" is not a valid option for __builtin_cpu_is,
but it is for __builtin_cpu_supports.

    PR target/107551

gcc/ChangeLog:
    * config/i386/i386-builtins.cc (fold_builtin_cpu): Use same path
    as for PR103661.
    * doc/extend.texi: Fix "x86-64" use.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/builtin_target.c: Add more checks.

(cherry picked from commit d71b20fc30965ba8326ad9363d0aca9d61eb4ed3)
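
The documented contract, checked directly (this mirrors the commit
title):

    int
    main (void)
    {
      /* Must be 0 or positive per the documentation; the bug made it
         return INT_MIN.  */
      if (__builtin_cpu_supports ("x86-64") < 0)
        __builtin_abort ();
      return 0;
    }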

2022-12-06  aarch64: Specify that FEAT_MOPS sequences clobber CC  (Kyrylo Tkachov; 3 files, +50/-0)

According to the architecture pseudocode the FEAT_MOPS sequences
overwrite the NZCV flags as part of their operation, so GCC needs to
model that in the relevant RTL patterns.  For the testcase:

    void g();
    void foo (int a, size_t N, char *__restrict__ in,
              char *__restrict__ out)
    {
      if (a != 3)
        __builtin_memcpy (out, in, N);
      if (a > 3)
        g ();
    }

we will currently generate:

    foo:
            cmp     w0, 3
            bne     .L6
    .L1:
            ret
    .L6:
            cpyfp   [x3]!, [x2]!, x1!
            cpyfm   [x3]!, [x2]!, x1!
            cpyfe   [x3]!, [x2]!, x1!
            ble     .L1  // Flags reused after CPYF* sequence
            b       g

This is wrong, as the result of cmp needs to be recalculated after the
MOPS sequence.  With this patch we'll insert a "cmp w0, 3" before the
ble, similar to what clang does.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk and to the GCC 12 branch after some baking time.

gcc/ChangeLog:
    * config/aarch64/aarch64.md (aarch64_cpymemdi): Specify clobber
    of CC reg.
    (*aarch64_cpymemdi): Likewise.
    (aarch64_movmemdi): Likewise.
    (aarch64_setmemdi): Likewise.
    (*aarch64_setmemdi): Likewise.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/mops_5.c: New test.
    * gcc.target/aarch64/mops_6.c: Likewise.
    * gcc.target/aarch64/mops_7.c: Likewise.

(cherry picked from commit cbdffae5745327b0e5eb887afc512daf34b049b1)

2022-12-01  Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode  (liuhongt; 1 file, +8/-0)

For __builtin_ia32_vec_set_v16qi (a, -1, 2) with !flag_signed_char,
it's transformed to __builtin_ia32_vec_set_v16qi (_4, 255, 2) in the
gimple and expanded to (const_int 255) in the rtl.  But
immediate_operand expects (const_int 255) to be sign-extended to
(const_int -1) for QImode.  The mismatch caused an unrecognizable-insn
error.

The patch converts (const_int 255) to (const_int -1) in the backend
expander.

gcc/ChangeLog:
    PR target/107863
    * config/i386/i386-expand.cc (ix86_expand_vec_set_builtin):
    Convert op1 to target mode whenever modes mismatch.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/pr107863.c: New test.

2022-11-19  LoongArch: Fix atomic_exchange expanding [PR107713]  (Jinyang He; 2 files, +59/-0)

We used to expand atomic_exchange_n(ptr, new, mem_order) for subword
types into something like:

    {
      __typeof__(*ptr) t = atomic_load_n (ptr, mem_order);
      atomic_compare_exchange_n (ptr, &t, new, true, mem_order,
                                 mem_order);
      return t;
    }

It's incorrect because another thread may store a different value into
*ptr after atomic_load_n.  Then atomic_compare_exchange_n will not
store into *ptr, but atomic_exchange_n should always perform the store.

gcc/ChangeLog:
    PR target/107713
    * config/loongarch/sync.md
    (atomic_cas_value_exchange_7_<mode>): New define_insn.
    (atomic_exchange): Use atomic_cas_value_exchange_7_si instead of
    atomic_cas_value_cmp_and_7_si.

gcc/testsuite/ChangeLog:
    PR target/107713
    * gcc.target/loongarch/pr107713-1.c: New test.
    * gcc.target/loongarch/pr107713-2.c: New test.

(cherry picked from commit f0024bfb228f94e60e06dc32a4983e40a9b90be5)
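
A subword exchange that exercises the fixed expansion (a standard GCC
atomic builtin; the variable names are illustrative):

    short x;

    short
    swap (short n)
    {
      /* Must always store n into x, even if another thread wrote to x
         between a load and a compare-and-swap.  */
      return __atomic_exchange_n (&x, n, __ATOMIC_SEQ_CST);
    }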

2022-11-08  Always use TYPE_MODE instead of DECL_MODE for vector field  (H.J. Lu; 1 file, +39/-0)

e034c5c8957 re PR target/78643 (ICE in convert_move, at expr.c:230)
fixed the case where DECL_MODE of a vector field is BLKmode and its
TYPE_MODE is a vector mode because of a target attribute.  Remove the
BLKmode check for the case where DECL_MODE of a vector field is a
vector mode and its TYPE_MODE isn't a vector mode because of a target
attribute.

gcc/
    PR target/107304
    * expr.cc (get_inner_reference): Always use TYPE_MODE for vector
    field with vector raw mode.

gcc/testsuite/
    PR target/107304
    * gcc.target/i386/pr107304.c: New test.

(cherry picked from commit 1c64aba8cdf6509533f554ad86640f274cdbe37f)
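
A hypothetical shape in which a field's DECL_MODE and TYPE_MODE disagree
because of a per-function target attribute (the exact trigger in
pr107304.c may differ):

    typedef int v8si __attribute__ ((vector_size (32)));
    struct S { v8si v; };

    /* Without -mavx2 globally, the field's DECL_MODE can differ from
       the V8SImode TYPE_MODE seen inside this target("avx2") function;
       get_inner_reference now always uses TYPE_MODE.  */
    __attribute__ ((target ("avx2"))) v8si
    get (struct S *p)
    {
      return p->v;
    }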

2022-10-26  aarch64: fix off-by-one in reading cpuinfo  (Philipp Tomsich; 2 files, +23/-0)

Fixes: 341573406b39

Don't subtract one from the result of strnlen() when trying to point
to the first character after the current string.  This issue would
cause individual characters (where the 128-byte buffers are stitched
together) to be lost.

gcc/ChangeLog:
    * config/aarch64/driver-aarch64.cc (readline): Fix off-by-one.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/cpunative/info_18: New test.
    * gcc.target/aarch64/cpunative/native_cpu_18.c: New test.

(cherry picked from commit b1cfbccc41de6aec950c0f662e7e85ab34bfff8a)

2022-10-25  IBM zSystems: Fix function_ok_for_sibcall [PR106355]  (Stefan Schulze Frielinghaus; 4 files, +76/-0)

For a parameter with BLKmode we cannot use REG_NREGS in order to
determine the number of consecutive registers.  Streamlined this with
the implementation of s390_function_arg.

Fix some indentation whitespace, too.

gcc/ChangeLog:
    PR target/106355
    * config/s390/s390.cc (s390_call_saved_register_used): For a
    parameter with BLKmode fix determining number of consecutive
    registers.

gcc/testsuite/ChangeLog:
    * gcc.target/s390/pr106355.h: Common code for new tests.
    * gcc.target/s390/pr106355-1.c: New test.
    * gcc.target/s390/pr106355-2.c: New test.
    * gcc.target/s390/pr106355-3.c: New test.

(cherry picked from commit cb994acc08b67f26a54e7c5dc1f4995a2ce24d98)

2022-10-20  aarch64: Prevent generation of /M BRKAS and BRKBS  (Richard Sandiford; 2 files, +6/-4)

Bit of a brown-paper-bag bug, but: GCC was generating non-existent
merging forms of BRKAS and BRKBS.  Those instructions only support
zero predication (although BRKA and BRKB support both).

gcc/
    * config/aarch64/aarch64-sve.md (*aarch64_brk<brk_op>_cc): Remove
    merging alternative.
    (*aarch64_brk<brk_op>_ptest): Likewise.

gcc/testsuite/
    * gcc.target/aarch64/sve/acle/general/brka_1.c: Expect a separate
    PTEST instruction.
    * gcc.target/aarch64/sve/acle/general/brkb_1.c: Likewise.

(cherry picked from commit 57675c7f92a3bd3ca8dae1faac7f2f51d40e0f9e)

2022-10-20  aarch64: Fix matching of BRKNS  (Richard Sandiford; 2 files, +26/-2)

Unlike other flag-setting SVE instructions, BRKNS sets the flags based
on an all-true governing predicate, rather than the GP operand.

gcc/
    * config/aarch64/iterators.md (SVE_BRKP): New iterator.
    * config/aarch64/aarch64-sve.md (*aarch64_brkn_cc): New pattern.
    (*aarch64_brkn_ptest): Likewise.
    (*aarch64_brk<brk_op>_cc): Restrict to SVE_BRKP.
    (*aarch64_brk<brk_op>_ptest): Likewise.

gcc/testsuite/
    * gcc.target/aarch64/sve/acle/general/brkn_1.c: Expect separate
    PTEST instructions.
    * gcc.target/aarch64/sve/acle/general/brkn_2.c: New test.

(cherry picked from commit 6bec66640597e2604f51fc1642c7d279164cd442)

2022-10-20  aarch64: Define __ARM_FEATURE_RCPC  (Richard Sandiford; 1 file, +20/-0)

https://github.com/ARM-software/acle/pull/199 adds a new feature macro
for RCPC, for use in things like inline assembly.  This patch adds the
associated support to GCC.

Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a entry
didn't include it.  This was probably harmless in practice since GCC
simply ignored the extension until now.  (The GAS definition is OK.)

gcc/
    * config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH8_3): Add
    AARCH64_FL_RCPC.
    (AARCH64_ISA_RCPC): New macro.
    * config/aarch64/aarch64-cores.def (thunderx3t110, zeus,
    neoverse-v1) (neoverse-512tvb, saphira): Remove RCPC from these
    Armv8.3-A+ cores.
    * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
    __ARM_FEATURE_RCPC when appropriate.

gcc/testsuite/
    * gcc.target/aarch64/pragma_cpp_predefs_1.c: Add RCPC tests.

2022-10-19  rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]  (Kewen Lin; 1 file, +14/-0)

As PR96072 shows, the code adding a REG_CFA_DEF_CFA reg note assumes
that we have previously emitted an insn which restores the frame
pointer.  That part of the code used to be guarded with the flag
frame_pointer_needed, which was consistent, but it was replaced with
the flag frame_pointer_needed_indeed in commit r10-7981.  This caused
an ICE due to an unexpected NULL insn.

    PR target/96072

gcc/ChangeLog:
    * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
    condition for adding the REG_CFA_DEF_CFA reg note with
    frame_pointer_needed_indeed.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/pr96072.c: New test.

(cherry picked from commit 5be0950d22209f5ba69d244387228e12389a8470)