path: root/gcc/config

2024-07-28  [RISC-V][target/116085] Fix rv64 minmax extension avoidance splitter  (Jeff Law; 2 files, -18/+29)

A patch introduced a pattern to avoid unnecessary extensions when doing a min/max operation where one of the values is a 32 bit positive constant.

> (define_insn_and_split "*minmax"
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (sign_extend:DI
>           (subreg:SI
>             (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
>                                 (match_operand:DI 2 "immediate_operand" "i"))
>             0)))
>    (clobber (match_scratch:DI 3 "=&r"))
>    (clobber (match_scratch:DI 4 "=&r"))]
>   "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
>    (set (match_dup 4) (match_dup 2))
>    (set (match_dup 0) (<minmax_optab>:DI (match_dup 3) (match_dup 4)))]

Lots going on in here. The key is that the nonconstant value is zero extended from SI to DI in the original RTL, and we know the constant value is unchanged if we were to sign extend it from 32 to 64 bits.

We change the extension of the nonconstant operand from zero to sign extension. I'm pretty confident the goal there is to take advantage of the fact that SI values are kept sign extended, so the extension will often be optimized away.

The problem occurs when the nonconstant operand has the SI sign bit set. As an example:

  smax (0x80000000, 0x7) resulting in 0x80000000

The split RTL will generate

  smax (sign_extend (0x80000000), 0x7))
  smax (0xffffffff80000000, 0x7) resulting in 0x7

Oops. We really needed to change the opcode to umax for this transformation to work.

That's easy enough. But there are further improvements we can make. First, the pattern is a define_insn_and_split with a post-reload split condition. It would be better implemented as a 4->3 define_split so that the costing model just works. Second, if operands[1] is a suitably promoted subreg, then we can elide the sign extension when we generate the split code, so often it'll be a 4->2 split, again with the cost model working with no adjustments needed.

Tested on rv32 and rv64 in my tester. I'll wait for the pre-commit tester to spin it as well.

	PR target/116085

gcc/
	* config/riscv/bitmanip.md (minmax extension avoidance splitter):
	Rewrite as a simpler define_split.  Adjust the opcode appropriately.
	Avoid emitting sign extension if it's clearly not needed.
	* config/riscv/iterators.md (minmax_optab): Rename to uminmax_optab
	and map everything to unsigned variants.

gcc/testsuite/
	* gcc.target/riscv/pr116085.c: New test.
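
A hedged C sketch of the failure mode described above (not the committed pr116085.c test, which isn't shown here):

  /* With x = 0x80000000 the correct result is (int) 0x80000000;
     the broken split sign-extended x first and returned 7.  */
  int f (unsigned int x)
  {
    unsigned long long t = x;      /* zero_extend:DI of the SI value */
    long long m = t > 7 ? t : 7;   /* max against a positive 32-bit constant */
    return (int) m;                /* subreg:SI, sign-extended on return */
  }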

2024-07-28  aarch64: sve: Rename aarch64_bic to standard pattern, andn  (Andrew Pinski; 2 files, -3/+3)

There is now an optab for bic, andn, since r15-1890-gf379596e0ba99d. This moves aarch64_bic for sve over to use it instead. Note that unlike the simd bic patterns, the operands were already in the order expected by the optab, so no swapping was needed.

Built and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand):
	Update to use the andn optab instead of using code_for_aarch64_bic.
	* config/aarch64/aarch64-sve.md (@aarch64_bic<mode>): Rename to ...
	(andn<mode>3): This.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-07-28  aarch64: Use iorn and andn standard pattern names for scalar modes  (Andrew Pinski; 1 file, -6/+6)

Since r15-1890-gf379596e0ba99d, these are the new optabs, so let's use these names for them. They will be used for generation during expand from gimple in the next few patches.

Built and tested for aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	* config/aarch64/aarch64.md (*<NLOGICAL:optab>_one_cmpl<mode>3):
	Rename to ...
	(<NLOGICAL:optab>n<mode>3): This.
	(*<NLOGICAL:optab>_one_cmplsidi3_ze): Rename to ...
	(*<NLOGICAL:optab>nsidi3_ze): This.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-07-28  aarch64: Rename bic/orn patterns to iorn/andn for vector modes  (Andrew Pinski; 1 file, -10/+10)

This renames the patterns orn<mode>3 to iorn<mode>3 so they match the new optab that was added with r15-1890-gf379596e0ba99d; likewise for bic<mode>3 to andn<mode>3. Note that operand 1 and operand 2 are swapped from the original patterns to match the optab.

Built and tested for aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (bic<mode>3<vczle><vczbe>): Rename
	to ...
	(andn<mode>3<vczle><vczbe>): This.  Also swap operands.
	(orn<mode>3<vczle><vczbe>): Rename to ...
	(iorn<mode>3<vczle><vczbe>): This.  Also swap operands.
	(vec_cmp<mode><v_int_equiv>): Update the orn call to iorn and swap
	the last two arguments.

gcc/testsuite/ChangeLog:

	* g++.target/aarch64/vect_cmp-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-07-28  aarch64: Fix target/optimize option handling when transitioning from O1 to O2  (Andrew Pinski; 1 file, -1/+1)

The problem here is that the aarch64 backend enables -mearly-ra at -O2 and above, but it is not marked as an Optimization in the .opt file, so enabling it sometimes resets the target options when going from -O1 to -O2 for the first time.

Built and tested for aarch64-linux-gnu with no regressions.

	PR target/116065

gcc/ChangeLog:

	* config/aarch64/aarch64.opt (mearly-ra=): Mark as Optimization
	rather than Save.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/target_optimization-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-07-28  RISC-V: Work around bare apostrophe in error string.  (Robin Dapp; 1 file, -1/+1)

An unquoted apostrophe slipped through when testing the recent V/M extension patch. This, again, re-words the message to "Currently the 'V' implementation requires the 'M' extension".

Going to commit as obvious after testing.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_override_options_internal): Reword
	error string without apostrophe.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr116036.c: Adjust expected error
	string.

2024-07-28  i386: Use BLKmode for {ld,st}tilecfg  (Haochen Jiang; 2 files, -8/+6)

Hi all,

For AMX instructions that operate on memory, we will treat the memory size as unspecified, since there is no different size that could cause confusion for the memory operand. This changes the output under Intel mode, which is currently broken when used with the assembler, and aligns with current binutils behavior.

Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_expand_builtin): Change from
	XImode to BLKmode.
	* config/i386/i386.md (ldtilecfg): Change XI to BLK.
	(sttilecfg): Ditto.

2024-07-28  rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di  (Carl Love; 3 files, -81/+10)

The built-ins set a value in a vector. The same operation can be done in C code, and the assembly generated from the C code is as good as or better than the code generated by the built-ins. With default optimization, the number of assembly instructions generated by the two methods is similar. With -O3 optimization, the assembly generated for the two approaches is identical for the 2DF and 2DI types. The assembly for the C-code version of the 1TI case requires one fewer instruction, and it uses only one load versus two loads for the built-in.

With the removal of the built-ins, there are no other uses of the set built-in attribute, so the code associated with the set built-in attribute is removed. Finally, the test case for __builtin_vsx_set_2df is removed; the other built-ins do not have test cases.

gcc/ChangeLog:

	* config/rs6000/rs6000-builtin.cc (get_element_number,
	altivec_expand_vec_set_builtin): Remove functions.
	(rs6000_expand_builtin): Remove the if statement to call
	altivec_expand_vec_set_builtin.
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
	__builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the built-in
	definitions.
	* config/rs6000/rs6000-gen-builtins.cc (struct attrinfo): Remove
	the isset variable from the structure.
	(parse_bif_attrs): Remove the uses of the isset variable.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
	__builtin_vsx_set_2df built-in.
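
For illustration, the C-code replacement is just an element assignment; a minimal sketch (the function name and the built-in's argument order are assumptions, not taken from the patch):

  #include <altivec.h>

  vector double
  set_elem (vector double v, double d)
  {
    v[1] = d;   /* replaces something like __builtin_vsx_set_2df (v, d, 1) */
    return v;
  }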

2024-07-28  rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di  (Carl Love; 2 files, -53/+0)

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins. Users should just use normal C code to update the various vector elements.

This change was originally intended to be part of the earlier series of cleanup patches. It was initially thought that some additional work would be needed to do some gimple generation instead of these built-ins. However, the existing default code generation does produce the needed code. For the vec_set bif, the equivalent C code is as good as or better than the built-in. For the vec_insert bif, whose resolution previously made use of the vec_set bif, the assembly code generation is as good as before at -O3 optimization.

Remove the built-ins and use the default gimple generation instead.

gcc/ChangeLog:

	* config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
	__builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
	definitions.
	* config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
	handling for constant vec_insert position with VECTOR_UNIT_VSX_P
	V1TImode, V2DFmode and V2DImode modes.

2024-07-28  rs6000, remove __builtin_vsx_xvcmp* built-ins  (Carl Love; 1 file, -9/+0)

This patch removes the built-ins __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp and __builtin_vsx_xvcmpgtsp, which are similar to the recommended, PVIPR-documented overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins. The difference is that the overloaded built-ins return a vector of 32-bit booleans, while the removed built-ins returned a vector of floats.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp built-ins are not removed, as they are used by the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp are changed to use the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins. Use of the overloaded built-ins requires the result to be stored in a vector of booleans of the appropriate size, or the result must be cast to the return type used by the original __builtin_vsx_xvcmp* built-ins.

gcc/ChangeLog:

	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
	__builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
	definitions.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/vsx-builtin-3.c (do_cmp): Replace
	__builtin_vsx_xvcmp{eq,gt,ge}{sp,dp} by vec_cmp{eq,gt,ge}
	respectively and add explicit casts to vector {float,double}.  Add
	more testing code assigning the result to vector boolean types.
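
A hedged sketch of the replacement idiom described above (the function name is illustrative, not copied from the test):

  #include <altivec.h>

  vector float
  cmpeq_as_float (vector float a, vector float b)
  {
    /* vec_cmpeq returns vector bool int; cast the result to match the
       vector float return type of the removed __builtin_vsx_xvcmpeqsp.  */
    return (vector float) vec_cmpeq (a, b);
  }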

2024-07-28  RISC-V: xtheadmemidx: Fix mode test for pre/post-modify addressing  (Christoph Müllner; 1 file, -4/+2)

auto_inc_dec (-O3) performs optimizations like the following if RVV and XTheadMemIdx are enabled.

  (insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
          (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 {*movv4qi}
       (nil))
  (insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
          (plus:DI (reg:DI 136 [ ivtmp.13 ])
              (const_int 20 [0x14]))) 5 {adddi3}
       (nil))

  ====>

  (insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
                  (plus:DI (reg:DI 136 [ ivtmp.13 ])
                      (const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
          (reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 {*movv4qi}
       (expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
          (nil)))

The reason the pass believes this is legal is that the mode test in th_memidx_classify_address_modify() requires INTEGRAL_MODE_P (mode), which includes vector modes. Let's restrict the mode test such that only MODE_INT is allowed.

	PR target/116033

gcc/ChangeLog:

	* config/riscv/thead.cc (th_memidx_classify_address_modify):
	Fix mode test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr116033.c: New test.

Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2024-07-28  rtl-ssa: Define INCLUDE_ARRAY  (Richard Sandiford; 4 files, -0/+4)

g:72fbd3b2b2a497dbbe6599239bd61c5624203ed0 added a use of std::array without explicitly forcing <array> to be included. That didn't cause problems in my local builds but understandably did for some people.

gcc/
	* doc/rtl.texi: Document the need to define INCLUDE_ARRAY before
	including rtl-ssa.h.
	* rtl-ssa.h: Likewise (in comment).
	* config/aarch64/aarch64-cc-fusion.cc: Add INCLUDE_ARRAY.
	* config/aarch64/aarch64-early-ra.cc: Likewise.
	* config/riscv/riscv-avlprop.cc: Likewise.
	* config/riscv/riscv-vsetvl.cc: Likewise.
	* fwprop.cc: Likewise.
	* late-combine.cc: Likewise.
	* pair-fusion.cc: Likewise.
	* rtl-ssa/accesses.cc: Likewise.
	* rtl-ssa/blocks.cc: Likewise.
	* rtl-ssa/changes.cc: Likewise.
	* rtl-ssa/functions.cc: Likewise.
	* rtl-ssa/insns.cc: Likewise.
	* rtl-ssa/movement.cc: Likewise.

2024-07-28  RISC-V: Error early with V and no M extension.  (Robin Dapp; 1 file, -0/+5)

For calculating the value of a poly_int at runtime we use a multiplication instruction that requires the M extension. Instead of just asserting and ICEing, this patch emits an early error at option-parsing time.

gcc/ChangeLog:

	PR target/116036

	* config/riscv/riscv.cc (riscv_override_options_internal): Error
	with TARGET_VECTOR && !TARGET_MUL.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
	* gcc.target/riscv/arch-32.c: Ditto.
	* gcc.target/riscv/arch-37.c: Ditto.
	* gcc.target/riscv/arch-38.c: Ditto.
	* gcc.target/riscv/predef-14.c: Ditto.
	* gcc.target/riscv/predef-15.c: Ditto.
	* gcc.target/riscv/predef-16.c: Ditto.
	* gcc.target/riscv/predef-26.c: Ditto.
	* gcc.target/riscv/predef-27.c: Ditto.
	* gcc.target/riscv/predef-32.c: Ditto.
	* gcc.target/riscv/predef-33.c: Ditto.
	* gcc.target/riscv/predef-36.c: Ditto.
	* gcc.target/riscv/predef-37.c: Ditto.
	* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
	* gcc.target/riscv/compare-debug-1.c: Ditto.
	* gcc.target/riscv/compare-debug-2.c: Ditto.
	* gcc.target/riscv/rvv/base/pr116036.c: New test.

2024-07-28  RISC-V: Allow LICM to hoist POLY_INT configuration code sequences  (Juzhe-Zhong; 1 file, -4/+5)

Noticed in a recent benchmark evaluation (coremark-pro zip-test):

        vid.v   v2
        vmv.v.i v5,0
  .L9:
        vle16.v  v3,0(a4)
        vrsub.vx v4,v2,a6   ---> LICM failed to hoist it outside the loop.

The root cause is:

  (insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
          (reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit}
       (nil))
    -> Its result is used by the following vrsub.vx, which then
       suppresses the hoist of the vrsub.vx.

  (insn 57 56 59 4 (set (reg:RVVMF2HI 216)
          (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                      (const_vector:RVVMF32BI repeat [
                              (const_int 1 [0x1])
                          ])
                      (reg:DI 350)
                      (const_int 2 [0x2]) repeated x2
                      (const_int 1 [0x1])
                      (reg:SI 66 vl)
                      (reg:SI 67 vtype)
                  ] UNSPEC_VPREDICATE)
              (minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
                  (reg:RVVMF2HI 217))
              (unspec:RVVMF2HI [
                      (reg:DI 0 zero)
                  ] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
       (expr_list:REG_DEAD (reg:HI 220)
          (nil)))

This patch fixes it to generate (set (reg:HI) (subreg:HI (reg:DI))) instead of (set (subreg:DI (reg:HI)) (reg:DI)).

After this patch:

        vid.v    v2
        vrsub.vx v2,v2,a7
        vmv.v.i  v4,0
  .L3:
        vle16.v  v3,0(a4)

Tested on both RV32 and RV64 with no regressions.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int
	dest generation.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
	* gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.
	* gcc.target/riscv/rvv/autovec/poly_licm-3.c: New test.

2024-07-28  SVE Intrinsics: Change return type of redirect_call to gcall.  (Jennifer Schmitz; 3 files, -5/+5)

As suggested in the review of https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657474.html, this patch changes the return type of gimple_folder::redirect_call from gimple * to gcall *. The motivation for this is that so far, most callers of the function had been casting the result of the function to gcall. These call sites were updated.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::redirect_call): Update return type.
	* config/aarch64/aarch64-sve-builtins.h: Likewise.
	* config/aarch64/aarch64-sve-builtins-sve2.cc (svqshl_impl::fold):
	Remove cast to gcall.
	(svrshl_impl::fold): Likewise.

2024-07-28  i386: Adjust rtx cost for imulq and imulw [PR115749]  (Lingling Kong; 1 file, -8/+8)

gcc/ChangeLog:

	PR target/115749

	* config/i386/x86-tune-costs.h (struct processor_costs): Adjust
	the rtx_cost of imulq and imulw from COST_N_INSNS (4) to
	COST_N_INSNS (3).

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr115749.c: New test.

2024-07-28  aarch64: Extend aarch64_feature_flags to 128 bits  (Andrew Carlotti; 4 files, -13/+25)

Replace the existing uint64_t typedef with a bbitmap<2> typedef. Most of the preparatory work was carried out in previous commits, so this patch itself is fairly small.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc
	(aarch64_set_asm_isa_flags): Store a second uint64_t value.
	* config/aarch64/aarch64-opts.h (aarch64_feature_flags): Switch
	typedef to bbitmap<2>.
	* config/aarch64/aarch64.cc (aarch64_set_current_function):
	Extract isa mode from val[0].
	* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): Load a
	second uint64_t value.
	(aarch64_get_isa_flags): Ditto.
	(aarch64_asm_isa_flags): Ditto.
	(aarch64_isa_flags): Ditto.
	(HANDLE): Use bbitmap<2>::from_index to initialise flags.
	(AARCH64_FL_ISA_MODES): Do arithmetic on integer type.
	(AARCH64_ISA_MODE): Extract value from bbitmap<2> array.
	* config/aarch64/aarch64.opt (aarch64_asm_isa_flags_1): New
	variable.
	(aarch64_isa_flags_1): Ditto.

2024-07-28  aarch64: Use constructor explicitly in get_flags_off  (Andrew Carlotti; 1 file, -2/+3)

gcc/ChangeLog:

	* config/aarch64/aarch64-feature-deps.h (get_flags_off):
	Construct aarch64_feature_flags (0) explicitly.

2024-07-28  aarch64: Add bool conversion to TARGET_* macros  (Andrew Carlotti; 5 files, -131/+79)

Use a new AARCH64_HAVE_ISA macro in TARGET_* definitions, and eliminate all the AARCH64_ISA_* feature macros.

gcc/ChangeLog:

	* config/aarch64/aarch64-c.cc
	(aarch64_define_unconditional_macros): Use TARGET_V8R macro.
	(aarch64_update_cpp_builtins): Use TARGET_* macros.
	* config/aarch64/aarch64.h (AARCH64_HAVE_ISA): New macro.
	(AARCH64_ISA_SM_OFF, AARCH64_ISA_SM_ON, AARCH64_ISA_ZA_ON)
	(AARCH64_ISA_V8A, AARCH64_ISA_V8_1A, AARCH64_ISA_CRC)
	(AARCH64_ISA_FP, AARCH64_ISA_SIMD, AARCH64_ISA_LSE)
	(AARCH64_ISA_RDMA, AARCH64_ISA_V8_2A, AARCH64_ISA_F16)
	(AARCH64_ISA_SVE, AARCH64_ISA_SVE2, AARCH64_ISA_SVE2_AES)
	(AARCH64_ISA_SVE2_BITPERM, AARCH64_ISA_SVE2_SHA3)
	(AARCH64_ISA_SVE2_SM4, AARCH64_ISA_SME, AARCH64_ISA_SME_I16I64)
	(AARCH64_ISA_SME_F64F64, AARCH64_ISA_SME2, AARCH64_ISA_V8_3A)
	(AARCH64_ISA_DOTPROD, AARCH64_ISA_AES, AARCH64_ISA_SHA2)
	(AARCH64_ISA_V8_4A, AARCH64_ISA_SM4, AARCH64_ISA_SHA3)
	(AARCH64_ISA_F16FML, AARCH64_ISA_RCPC, AARCH64_ISA_RCPC8_4)
	(AARCH64_ISA_RNG, AARCH64_ISA_V8_5A, AARCH64_ISA_TME)
	(AARCH64_ISA_MEMTAG, AARCH64_ISA_V8_6A, AARCH64_ISA_I8MM)
	(AARCH64_ISA_F32MM, AARCH64_ISA_F64MM, AARCH64_ISA_BF16)
	(AARCH64_ISA_SB, AARCH64_ISA_RCPC3, AARCH64_ISA_V8R)
	(AARCH64_ISA_PAUTH, AARCH64_ISA_V8_7A, AARCH64_ISA_V8_8A)
	(AARCH64_ISA_V8_9A, AARCH64_ISA_V9A, AARCH64_ISA_V9_1A)
	(AARCH64_ISA_V9_2A, AARCH64_ISA_V9_3A, AARCH64_ISA_V9_4A)
	(AARCH64_ISA_MOPS, AARCH64_ISA_LS64, AARCH64_ISA_CSSC)
	(AARCH64_ISA_D128, AARCH64_ISA_THE, AARCH64_ISA_GCS): Remove.
	(TARGET_BASE_SIMD, TARGET_SIMD, TARGET_FLOAT)
	(TARGET_NON_STREAMING, TARGET_STREAMING, TARGET_ZA, TARGET_SHA2)
	(TARGET_SHA3, TARGET_AES, TARGET_SM4, TARGET_F16FML)
	(TARGET_CRC32, TARGET_LSE, TARGET_FP_F16INST)
	(TARGET_SIMD_F16INST, TARGET_DOTPROD, TARGET_SVE, TARGET_SVE2)
	(TARGET_SVE2_AES, TARGET_SVE2_BITPERM, TARGET_SVE2_SHA3)
	(TARGET_SVE2_SM4, TARGET_SME, TARGET_SME_I16I64)
	(TARGET_SME_F64F64, TARGET_SME2, TARGET_ARMV8_3, TARGET_JSCVT)
	(TARGET_FRINT, TARGET_TME, TARGET_RNG, TARGET_MEMTAG)
	(TARGET_I8MM, TARGET_SVE_I8MM, TARGET_SVE_F32MM)
	(TARGET_SVE_F64MM, TARGET_BF16_FP, TARGET_BF16_SIMD)
	(TARGET_SVE_BF16, TARGET_PAUTH, TARGET_BTI, TARGET_MOPS)
	(TARGET_LS64, TARGET_CSSC, TARGET_SB, TARGET_RCPC, TARGET_RCPC2)
	(TARGET_RCPC3, TARGET_SIMD_RDMA, TARGET_ARMV9_4, TARGET_D128)
	(TARGET_THE, TARGET_GCS): Redefine using AARCH64_HAVE_ISA.
	(TARGET_V8R, TARGET_V9A): New.
	* config/aarch64/aarch64.md (arch_enabled): Use TARGET_RCPC2.
	* config/aarch64/iterators.md (GPI_I16): Use TARGET_FP_F16INST.
	(GPF_F16): Ditto.
	* config/aarch64/predicates.md (aarch64_rcpc_memory_operand): Use
	TARGET_RCPC2.

2024-07-28  aarch64: Add explicit bool cast to return value  (Andrew Carlotti; 1 file, -1/+1)

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_valid_sysreg_name_p): Add
	bool cast.

2024-07-28  aarch64: Decouple feature flag option storage type  (Andrew Carlotti; 2 files, -5/+10)

The awk scripts that process the .opt files are relatively fragile and only handle a limited set of data types correctly. The unrecognised aarch64_feature_flags type is handled as a uint64_t, which happens to be correct for now. However, that assumption will change when we extend the mask to 128 bits.

This patch changes the option members to use uint64_t types, and adds a "_0" suffix to the names (both for future extensibility, and to allow the original name to be used for the full aarch64_feature_flags mask within generator files).

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc
	(aarch64_set_asm_isa_flags): Reorder, and add suffix to names.
	* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): Add "_0"
	suffix.
	(aarch64_get_isa_flags): Ditto.
	(aarch64_asm_isa_flags): Redefine using renamed uint64_t value.
	(aarch64_isa_flags): Ditto.
	* config/aarch64/aarch64.opt (aarch64_asm_isa_flags): Rename to...
	(aarch64_asm_isa_flags_0): ...this, and change to uint64_t.
	(aarch64_isa_flags): Rename to...
	(aarch64_isa_flags_0): ...this, and change to uint64_t.

2024-07-28  aarch64: Define aarch64_get_{asm_|}isa_flags  (Andrew Carlotti; 2 files, -22/+25)

Building an aarch64_feature_flags value from data within a gcc_options or cl_target_option struct will get more complicated in a later commit. Use a macro to avoid doing this manually in more than one location.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc (aarch64_handle_option):
	Use new macro.
	* config/aarch64/aarch64.cc (aarch64_override_options_internal):
	Ditto.
	(aarch64_option_print): Ditto.
	(aarch64_set_current_function): Ditto.
	(aarch64_can_inline_p): Ditto.
	(aarch64_declare_function_name): Ditto.
	(aarch64_start_file): Ditto.
	* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): New.
	(aarch64_get_isa_flags): New.
	(aarch64_asm_isa_flags): Use new macro.
	(aarch64_isa_flags): Ditto.

2024-07-28  aarch64: Introduce aarch64_isa_mode type  (Andrew Carlotti; 4 files, -77/+94)

Currently there are many places where an aarch64_feature_flags variable is used, but only the bottom three isa mode bits are set and read. Using a separate data type for these values makes it clearer that they're not expected or required to have any of their upper feature bits set. It will also make things simpler and more efficient when we extend aarch64_feature_flags to 128 bits.

This patch uses explicit casts whenever converting from an aarch64_feature_flags value to an aarch64_isa_mode value. This isn't strictly necessary, but serves to highlight the locations where an explicit conversion will become necessary later.

gcc/ChangeLog:

	* config/aarch64/aarch64-opts.h: Add aarch64_isa_mode typedef.
	* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Use
	aarch64_isa_mode parameter.
	(aarch64_sme_vq_immediate): Ditto.
	* config/aarch64/aarch64.cc (aarch64_fntype_pstate_sm): Use
	aarch64_isa_mode values.
	(aarch64_fntype_pstate_za): Ditto.
	(aarch64_fndecl_pstate_sm): Ditto.
	(aarch64_fndecl_pstate_za): Ditto.
	(aarch64_fndecl_isa_mode): Ditto.
	(aarch64_cfun_incoming_pstate_sm): Ditto.
	(aarch64_cfun_enables_pstate_sm): Ditto.
	(aarch64_call_switches_pstate_sm): Ditto.
	(aarch64_gen_callee_cookie): Ditto.
	(aarch64_callee_isa_mode): Ditto.
	(aarch64_insn_callee_abi): Ditto.
	(aarch64_sme_vq_immediate): Ditto.
	(aarch64_add_offset_temporaries): Ditto.
	(aarch64_add_offset): Ditto.
	(aarch64_add_sp): Ditto.
	(aarch64_sub_sp): Ditto.
	(aarch64_guard_switch_pstate_sm): Ditto.
	(aarch64_switch_pstate_sm): Ditto.
	(aarch64_init_cumulative_args): Ditto.
	(aarch64_allocate_and_probe_stack_space): Ditto.
	(aarch64_expand_prologue): Ditto.
	(aarch64_expand_epilogue): Ditto.
	(aarch64_start_call_args): Ditto.
	(aarch64_expand_call): Ditto.
	(aarch64_end_call_args): Ditto.
	(aarch64_set_current_function): Ditto, with added conversions.
	(aarch64_handle_attr_arch): Avoid macro with changed type.
	(aarch64_handle_attr_cpu): Ditto.
	(aarch64_handle_attr_isa_flags): Ditto.
	(aarch64_switch_pstate_sm_for_landing_pad): Use aarch64_isa_mode
	values.
	(aarch64_switch_pstate_sm_for_jump): Ditto.
	(pass_switch_pstate_sm::gate): Ditto.
	* config/aarch64/aarch64.h
	(AARCH64_ISA_MODE_{SM_ON|SM_OFF|ZA_ON}): New macros.
	(AARCH64_FL_SM_STATE): Mark as possibly unused.
	(AARCH64_ISA_MODE_SM_STATE): New aarch64_isa_mode mask.
	(AARCH64_DEFAULT_ISA_MODE): New aarch64_isa_mode value.
	(AARCH64_FL_DEFAULT_ISA_MODE): Define using above value.
	(AARCH64_ISA_MODE): Change type to aarch64_isa_mode.
	(arm_pcs): Use aarch64_isa_mode value.
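
A minimal sketch of the idea, with an assumed integer width and mask name (the real typedef and masks live in aarch64-opts.h and aarch64.h and may differ):

  /* aarch64_feature_flags carries all feature bits; aarch64_isa_mode
     carries only the bottom ISA-mode bits.  */
  typedef unsigned int aarch64_isa_mode;

  static aarch64_isa_mode
  isa_mode_from_flags (aarch64_feature_flags flags)
  {
    /* The cast is deliberately explicit: once the flags type grows to
       128 bits, every such conversion site must be revisited.  */
    return (aarch64_isa_mode) (flags & AARCH64_FL_ISA_MODES);
  }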

2024-07-28  aarch64: Eliminate a temporary variable.  (Andrew Carlotti; 1 file, -5/+4)

The name would become misleading in a later commit anyway, and I think this is marginally more readable.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_override_options): Remove
	temporary variable.

2024-07-28  aarch64: Move AARCH64_NUM_ISA_MODES definition  (Andrew Carlotti; 2 files, -5/+5)

AARCH64_NUM_ISA_MODES will be used within aarch64-opts.h in a later commit.

gcc/ChangeLog:

	* config/aarch64/aarch64.h (DEF_AARCH64_ISA_MODE): Move to...
	* config/aarch64/aarch64-opts.h (DEF_AARCH64_ISA_MODE): ...here.

2024-07-28  aarch64: Remove unused global aarch64_tune_flags  (Andrew Carlotti; 1 file, -4/+0)

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_tune_flags): Remove unused
	global variable.
	(aarch64_override_options_internal): Remove dead assignment.

2024-07-28  optabs/rs6000: Rename iorc and andc to iorn and andn  (Andrew Pinski; 3 files, -25/+25)

When I was trying to add a scalar version of iorc and andc, the optab that got matched was for and/ior with the mode of csi and cdi, instead of the iorc and andc optabs for si and di modes. Since csi/cdi are the complex integer modes, we need to rename the optabs to drop the c there. This changes c to n, which is neutral and known not to be the first letter of a mode.

Bootstrapped and tested on x86_64 and powerpc64le.

gcc/ChangeLog:

	* config/rs6000/rs6000-builtins.def: s/iorc/iorn/.  s/andc/andn/
	for the code.
	* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update
	to iorn.
	* config/rs6000/rs6000.md (andc<mode>3): Rename to ...
	(andn<mode>3): This.
	(iorc<mode>3): Rename to ...
	(iorn<mode>3): This.
	* doc/md.texi: Update documentation for the rename.
	* internal-fn.def (BIT_ANDC): Rename to ...
	(BIT_ANDN): This.
	(BIT_IORC): Rename to ...
	(BIT_IORN): This.
	* optabs.def (andc_optab): Rename to ...
	(andn_optab): This.
	(iorc_optab): Rename to ...
	(iorn_optab): This.
	* gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the
	renamed internal functions, ANDC/IORC to ANDN/IORN.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-07-28  Revert "aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2"  (Kyrylo Tkachov; 3 files, -25/+1)

This reverts commit 4c5eb66e701bc9f3bf1298269f52559b10d63a09.

2024-07-28  aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2  (Jennifer Schmitz; 3 files, -1/+25)

According to the Neoverse V2 Software Optimization Guide (section 4.14), the instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been implemented so far. This patch implements and tests the two fusion pairs.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. There was also no non-noise impact on the SPEC CPU2017 benchmark.

OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
	fusion logic.
	* config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
	(cmp+cset): Likewise.
	* config/aarch64/tuning_models/neoversev2.h: Enable logic in field
	fusible_ops.

gcc/testsuite/
	* gcc.target/aarch64/cmp_csel_fuse.c: New test.
	* gcc.target/aarch64/cmp_cset_fuse.c: Likewise.

2024-07-28  RISC-V: Disable Zba optimization pattern if XTheadMemIdx is enabled  (Christoph Müllner; 1 file, -1/+1)

It is possible that the Zba optimization pattern zero_extendsidi2_bitmanip matches for an XTheadMemIdx INSN, with the effect of emitting an invalid instruction, as reported in PR116035.

The pattern above is used to emit a zext.w instruction to zero-extend SI mode registers to DI mode. A similar functionality can be achieved by XTheadBb's th.extu instruction, and indeed we have the equivalent pattern in thead.md (zero_extendsidi2_th_extu). However, that pattern depends on !TARGET_XTHEADMEMIDX. To compensate for that, there are specific patterns that ensure that a zero-extension instruction can still be emitted (th_memidx_bb_zero_extendsidi2 and friends).

While we could implement something similar (th_memidx_zba_zero_extendsidi2), it would only make sense if there existed real HW that implements Zba and XTheadMemIdx, but not XTheadBb. Unless such a machine exists, let's simply disable zero_extendsidi2_bitmanip if XTheadMemIdx is available.

	PR target/116035

gcc/ChangeLog:

	* config/riscv/bitmanip.md: Disable zero_extendsidi2_bitmanip
	for XTheadMemIdx.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr116035-1.c: New test.
	* gcc.target/riscv/pr116035-2.c: New test.

Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

2024-07-28  x86: Don't enable APX_F in 32-bit mode  (Lingling Kong; 2 files, -2/+4)

gcc/ChangeLog:

	PR target/115978

	* config/i386/driver-i386.cc (host_detect_local_cpu): Enable
	APX_F only for 64-bit codegen.
	* config/i386/i386-options.cc (DEF_PTA): Skip PTA_APX_F if not
	in 64-bit mode.

gcc/testsuite/ChangeLog:

	PR target/115978

	* gcc.target/i386/pr115978-1.c: New test.
	* gcc.target/i386/pr115978-2.c: Ditto.

2024-07-28  RISC-V: Fix snafu in SI mode splitters patch  (Vineet Gupta; 1 file, -1/+1)

SPEC2017 perlbench for RISC-V was broken, failing with a runtime output mismatch:

> 3830: mbox2: dWshe3Aa1EULre4CT5O/ErYFrk+o/EOoebA1kTVjQVQQH2EjT5fHcYnwjj2MdBmZu5y3Ce4Ei4QQZo/SNrry9g
>       mbox2: uuWPimQiU0D4UrwFP+LS0lFNph4qL43WV1A6T3tHleatIOUaHixhrJU9NoA2lc9KjwYpdEL0lNTXkvo8ymNHzA
>       ^
> 3832: mbox3: 8f4jdv6GIf0lX3DcdwRdEm6/aZwnmGX6n86GzCvmkwTKFXQjwlwVHc8jy8XlcyiIPr3yXTkgVOiP3cRYvyYQPg
>       mbox3: 9xQySgP6qbhfxl8Usu1WfGA5UhStB5AN31wueGM6OF4Jp59DkqJPu6ksGblOU5u0nQapQC1e9oYIs16a2mq2NA
>       ^
> specdiff run completed

Edwin bisected this to 273f16a125c4 ("[v3][RISC-V] Handle bit manipulation of SImode values"), which had the operands swapped in one of the new splitters introduced.

No new test, as the reducer narrows the failure down to the exact test introduced by the original commit.

gcc/ChangeLog:

	* config/riscv/bitmanip.md: Fix splitter.

Reported-by: Edwin Lu <ewlu@rivosinc.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

2024-07-28  Report message for operand modifier %a on unaddressable operand  (Jiufu Guo; 1 file, -1/+6)

Hi,

For PR96866, when printing asm code for the modifier "%a", an addressable operand is required, while the constraint "X" allows any kind of operand, even ones whose address is hard to get directly, e.g. an extern symbol whose address is in the TOC. An error message is now reported to indicate the invalid asm operand.

Compared with the previous version, the test case is updated with -mno-pcrel.

Bootstrap & regtest pass on ppc64{,le}. Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

	PR target/96866

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (print_operand_address): Emit message
	for unsupported operand.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr96866-1.c: New test.
	* gcc.target/powerpc/pr96866-2.c: New test.
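
A hedged sketch of the problematic shape (the committed pr96866 tests may differ):

  /* "X" accepts any operand, but %a needs one whose address can be
     printed directly; the address of x may only be reachable via the
     TOC, so this now gets a proper error instead of bad assembly.  */
  extern int x;

  void
  f (void)
  {
    __asm__ ("#%a0" : : "X" (&x));
  }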

2024-07-28  Relax ix86_hardreg_mov_ok after split1.  (liuhongt; 1 file, -3/+2)

ix86_hardreg_mov_ok was added by r11-5066-gbe39636d9f68c4:

> The solution proposed here is to have the x86 backend/recog prevent
> early RTL passes composing instructions (that set likely_spilled hard
> registers) that they (combine) can't simplify, until after reload.
> We allow sets from pseudo registers, immediate constants and memory
> accesses, but anything more complicated is performed via a temporary
> pseudo.  Not only does this simplify things for the register allocator,
> but any remaining register-to-register moves are easily cleaned up
> by the late optimization passes after reload, such as peephole2 and
> cprop_hardreg.

The restriction is mainly for RTL optimization passes before pass_combine. But split1 splits

```
(insn 17 13 18 2 (set (reg/i:V4SI 20 xmm0)
        (vec_merge:V4SI (const_vector:V4SI [
                    (const_int -1 [0xffffffffffffffff]) repeated x4
                ])
            (const_vector:V4SI [
                    (const_int 0 [0]) repeated x4
                ])
            (unspec:QI [
                    (reg:V4SF 106)
                    (reg:V4SF 102)
                    (const_int 0 [0])
                ] UNSPEC_PCMP))) "/app/example.cpp":20:1 2929 {*avx_cmpv4sf3_1}
     (expr_list:REG_DEAD (reg:V4SF 102)
        (expr_list:REG_DEAD (reg:V4SF 106)
            (nil))))
```

into:

```
(insn 23 13 24 2 (set (reg:V4SF 107)
        (unspec:V4SF [
                (reg:V4SF 106)
                (reg:V4SF 102)
                (const_int 0 [0])
            ] UNSPEC_PCMP)) "/app/example.cpp":20:1 -1
     (nil))
(insn 24 23 18 2 (set (reg/i:V4SI 20 xmm0)
        (subreg:V4SI (reg:V4SF 107) 0)) "/app/example.cpp":20:1 -1
     (nil))
```

There are many splitters generating MOV insns with SUBREGs that would have the same problem. Instead of changing those splitters one by one, the patch relaxes ix86_hardreg_mov_ok to allow moving a subreg to a hard register after split1. ix86_pre_reload_split () is used to replace !reload_completed && !lra_in_progress.

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_hardreg_mov_ok): Relax mov subreg to
	hard register after split1.

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr115982.C: New test.

2024-07-28  rs6000: Update option set in rs6000_inner_target_options [PR115713]  (Kewen Lin; 1 file, -1/+2)

When the function rs6000_inner_target_options parses target options, it updates the explicit option set information for rs6000_opt_masks via rs6000_isa_flags_explicit, but it misses updating that information for rs6000_opt_vars, which can result in some unexpected consequences, as the associated test case shows. This patch fixes rs6000_inner_target_options to update the option set for rs6000_opt_vars as well.

	PR target/115713

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_inner_target_options): Update
	option set information for rs6000_opt_vars.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr115713-2.c: New test.

2024-07-28  rs6000: Consider explicitly set options in target option parsing [PR115713]  (Kewen Lin; 1 file, -2/+5)

In rs6000_inner_target_options, when enabling VSX we enable altivec and disable -mavoid-indexed-addresses implicitly, but this doesn't consider the case where the options altivec and avoid-indexed-addresses have been explicitly disabled. As the test case in PR115713#c1 shows, with the target attribute "no-altivec,vsx", VSX unexpectedly sets the altivec flag and the expected error is missing. This patch avoids the automatic enablement when the options are explicitly specified.

With this change, the existing test case ppc-target-4.c also requires an adjustment, specifying altivec explicitly in the target attribute (since it requires the altivec feature and the command line specifies no-altivec).

	PR target/115713

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_inner_target_options): Avoid
	enabling altivec or disabling avoid-indexed-addresses
	automatically when they get specified explicitly.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr115713-1.c: New test.
	* gcc.target/powerpc/ppc-target-4.c: Adjust by specifying altivec
	in target attribute.
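
The rough shape of the reproducer (hedged; the committed pr115713-1.c test may differ):

  /* Before the fix, "vsx" silently re-enabled altivec despite the
     explicit "no-altivec"; now the incompatibility is diagnosed.  */
  __attribute__ ((target ("no-altivec,vsx")))
  void
  foo (void)
  {
  }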

2024-07-28  rs6000: Escalate warning to error for VSX with explicit no-altivec etc.  (Kewen Lin; 1 file, -18/+23)

As discussed in PR115688, for now when users specify -mvsx and -mno-altivec explicitly, the compiler emits a warning rather than an error. Considering that both options are given explicitly, emitting a hard error is better, so this patch escalates the related warnings to errors when the options are incompatible.

	PR target/115713

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Emit
	error messages when explicit VSX encounters explicit soft-float,
	no-altivec or avoid-indexed-addresses.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/warn-1.c: Move to ...
	* gcc.target/powerpc/error-1.c: ... here.  Adjust dg-warning with
	dg-error and remove ineffective scan.

2024-07-28  i386: Change prefetchi output template  (Haochen Jiang; 1 file, -1/+1)

For prefetchi instructions, a RIP-relative address is explicitly required for the operand, and the assembler obeys that rule strictly. This makes an instruction like:

  prefetchit0 bar

illegal for the assembler, even though it ought to be a common way of using prefetchi. Change the output template to %a to explicitly add (%rip) after the function label, making it legal for the assembler and letting the linker resolve the real address.

gcc/ChangeLog:

	* config/i386/i386.md (prefetchi): Change to %a.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/prefetchi-1.c: Check (%rip).
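
A hedged usage sketch (the committed prefetchi-1.c test checks for the (%rip) suffix; the intrinsic spelling below assumes the x86gprintrin.h prefetchi intrinsics and -mprefetchi):

  #include <x86gprintrin.h>

  void bar (void);

  void
  foo (void)
  {
    /* Should now assemble as: prefetchit0 bar(%rip)  */
    _m_prefetchit0 ((void *) bar);
  }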

2024-07-28  RISC-V: Implement the .SAT_TRUNC for scalar  (Pan Li; 4 files, -0/+61)

This patch would like to implement the simple .SAT_TRUNC pattern in the riscv backend. Aka:

Form 1:

  #define DEF_SAT_U_TRUC_FMT_1(NT, WT)       \
  NT __attribute__((noinline))               \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x)   \
  {                                          \
    bool overflow = x > (WT)(NT)(-1);        \
    return ((NT)x) | (NT)-overflow;          \
  }

  DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t)

Before this patch:

  __attribute__((noinline))
  uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
  {
    _Bool overflow;
    unsigned char _1;
    unsigned char _2;
    unsigned char _3;
    uint8_t _6;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    overflow_5 = x_4(D) > 255;
    _1 = (unsigned char) x_4(D);
    _2 = (unsigned char) overflow_5;
    _3 = -_2;
    _6 = _1 | _3;
    return _6;
    ;;  succ:       EXIT
  }

After this patch:

  __attribute__((noinline))
  uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
  {
    uint8_t _6;

    ;; basic block 2, loop depth 0
    ;;  pred:       ENTRY
    _6 = .SAT_TRUNC (x_4(D)); [tail call]
    return _6;
    ;;  succ:       EXIT
  }

The below test suites are passed for this patch:
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc.

gcc/ChangeLog:

	* config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator
	for int double truncation.
	(ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation.
	(anyi_double_truncated): Ditto but for lowercase.
	* config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new func
	decl for expanding ustrunc.
	* config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func impl
	to expand ustrunc.
	* config/riscv/riscv.md (ustrunc<mode><anyi_double_truncated>2):
	Impl the new pattern ustrunc<m><n>2 for int.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macro.
	* gcc.target/riscv/sat_arith_data.h: New test.
	* gcc.target/riscv/sat_u_trunc-1.c: New test.
	* gcc.target/riscv/sat_u_trunc-2.c: New test.
	* gcc.target/riscv/sat_u_trunc-3.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-1.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-2.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-3.c: New test.
	* gcc.target/riscv/scalar_sat_unary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

2024-07-28  Add -mcpu=power11 support.  (Michael Meissner; 16 files, -87/+127)

This patch adds the power11 option to the -mcpu= and -mtune= switches.

This patch treats the power11 like a power10 in terms of costs and reassociation width.

This patch issues a ".machine power11" to the assembly file if you use -mcpu=power11.

This patch defines _ARCH_PWR11 if the user uses -mcpu=power11.

This patch allows GCC to be configured with the --with-cpu=power11 and --with-tune=power11 options.

This patch passes -mpwr11 to the assembler if the user uses -mcpu=power11.

This patch adds support for using "power11" in the __builtin_cpu_is built-in function.

2024-07-22  Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config.gcc (powerpc*-*-*): Add support for power11.
	* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for
	-mcpu=power11.
	* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
	* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
	* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	_ARCH_PWR11 if -mcpu=power11.
	* config/rs6000/rs6000-cpus.def (POWER11_MASKS_SERVER): New define.
	(POWERPC_MASKS): Add power11.
	(power11 cpu): Add power11 definition.
	* config/rs6000/rs6000-opts.h (PROCESSOR_POWER11): Add power11
	processor.
	* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
	* config/rs6000/rs6000-tables.opt: Regenerate.
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add
	power11 support.
	(rs6000_machine_from_flags): Likewise.
	(rs6000_reassociation_width): Likewise.
	(rs6000_adjust_cost): Likewise.
	(rs6000_issue_rate): Likewise.
	(rs6000_sched_reorder): Likewise.
	(rs6000_sched_reorder2): Likewise.
	(rs6000_register_move_cost): Likewise.
	(rs6000_opt_masks): Likewise.
	* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/rs6000.md (cpu attribute): Add power11.
	* config/rs6000/rs6000.opt (-mpower11): Add internal power11 flag.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Document
	-mcpu=power11.
	* config/rs6000/power10.md (all reservations): Add power11
	support.

gcc/testsuite/

	* gcc.target/powerpc/power11-1.c: New test.
	* gcc.target/powerpc/power11-2.c: Likewise.
	* gcc.target/powerpc/power11-3.c: Likewise.

2024-07-28  aarch64: Tighten aarch64_simd_mem_operand_p [PR115969]  (Richard Sandiford; 1 file, -2/+3)

aarch64_simd_mem_operand_p checked for a memory with a POST_INC or REG address, but it didn't check what kind of register was being used. This meant that it allowed DImode FPRs as well as GPRs.

I wondered about rewriting it to use aarch64_classify_address, but this one-line fix seemed simpler. The structure then mirrors the existing early exit in aarch64_classify_address itself:

  /* On LE, for AdvSIMD, don't support anything other than POST_INC
     or REG addressing.  */
  if (advsimd_struct_p
      && TARGET_SIMD
      && !BYTES_BIG_ENDIAN
      && (code != POST_INC && code != REG))
    return false;

gcc/
	PR target/115969
	* config/aarch64/aarch64.cc (aarch64_simd_mem_operand_p): Require
	the operand to be a legitimate memory_operand.

gcc/testsuite/
	PR target/115969
	* gcc.target/aarch64/pr115969.c: New test.

2024-07-28  AArch64: implement TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE [PR115531]  (Tamar Christina; 1 file, -0/+12)

This implements the new target hook indicating that for AArch64, when possible, we prefer masked operations for any type rather than doing LOAD + SELECT or SELECT + STORE.

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/115531

	* config/aarch64/aarch64.cc
	(aarch64_conditional_operation_is_expensive): New.
	(TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE): New.

gcc/testsuite/ChangeLog:

	PR tree-optimization/115531

	* gcc.dg/vect/vect-conditional_store_1.c: New test.
	* gcc.dg/vect/vect-conditional_store_2.c: New test.
	* gcc.dg/vect/vect-conditional_store_3.c: New test.
	* gcc.dg/vect/vect-conditional_store_4.c: New test.
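
A minimal sketch of what such a hook amounts to (assumed shape; the actual aarch64.cc implementation may differ):

  /* Tell the vectorizer that a masked (conditional) operation is not
     more expensive than an unconditional one plus a select, so it
     prefers masking.  */
  static bool
  aarch64_conditional_operation_is_expensive (unsigned)
  {
    return false;
  }

  #define TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE \
    aarch64_conditional_operation_is_expensive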

2024-07-21  SH: Fix outage caused by recently added 2nd combine pass after reg alloc  (Oleg Endo; 1 file, -1/+9)

I've also confirmed on the CSiBE set that the secondary combine pass is actually beneficial on SH. It does result in some code size reductions.

gcc/ChangeLog:

	* config/sh/sh.md (mov_neg_si_t): Allow insn and split after
	register allocation.
	(*treg_noop_move): New insn.

2024-07-20  LoongArch: Organize the code related to split move and merge the same functions.  (Lulu Cheng; 3 files, -169/+58)

gcc/ChangeLog:

	* config/loongarch/loongarch-protos.h
	(loongarch_split_128bit_move): Delete.
	(loongarch_split_128bit_move_p): Delete.
	(loongarch_split_256bit_move): Delete.
	(loongarch_split_256bit_move_p): Delete.
	(loongarch_split_vector_move): Add a function declaration.
	* config/loongarch/loongarch.cc
	(loongarch_vector_costs::finish_cost): Adjust the code formatting.
	(loongarch_split_vector_move_p): Merge
	loongarch_split_128bit_move_p and loongarch_split_256bit_move_p.
	(loongarch_split_move_p): Merge code.
	(loongarch_split_move): Likewise.
	(loongarch_split_128bit_move_p): Delete.
	(loongarch_split_256bit_move_p): Delete.
	(loongarch_split_128bit_move): Delete.
	(loongarch_split_vector_move): Merge loongarch_split_128bit_move
	and loongarch_split_256bit_move.
	(loongarch_split_256bit_move): Delete.
	(loongarch_global_init): Remove the extra semicolon at the end of
	the function.
	* config/loongarch/loongarch.md (*movdf_softfloat): Added a new
	condition TARGET_64BIT.

2024-07-19  AVR: Support new built-in function __builtin_avr_mask1.  (Georg-Johann Lay; 3 files, -0/+201)

gcc/
	* config/avr/builtins.def (MASK1): New DEF_BUILTIN.
	* config/avr/avr.cc (avr_rtx_costs_1): Handle rtx costs for
	expressions like __builtin_avr_mask1.
	(avr_init_builtins) <uintQI_ftype_uintQI_uintQI>: New tree type.
	(avr_expand_builtin) [AVR_BUILTIN_MASK1]: Diagnose unexpected
	forms.
	(avr_fold_builtin) [AVR_BUILTIN_MASK1]: Handle case.
	* config/avr/avr.md (gen_mask1): New expand helper.
	(mask1_0x01_split, mask1_0x80_split, mask1_0xfe_split): New
	insn-and-split.
	(*mask1_0x01, *mask1_0x80, *mask1_0xfe): New insns.
	* doc/extend.texi (AVR Built-in Functions) <__builtin_avr_mask1>:
	Document new built-in function.

gcc/testsuite/
	* gcc.target/avr/torture/builtin-mask1.c: New test.
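
A hedged usage sketch (the authoritative semantics are in the new extend.texi entry; the mask/offset interpretation below is an assumption based on the pattern names):

  #include <stdint.h>

  uint8_t
  one_hot (uint8_t n)
  {
    /* Assumed to expand to one of the cheap mask sequences, e.g. a
       single-bit mask like 1 << (n & 7) for the mask1_0x01 pattern.  */
    return __builtin_avr_mask1 (1, n);
  }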

2024-07-19  bpf: create modifier for mem operand for xchg and cmpxchg  (Cupertino Miranda; 2 files, -6/+18)

Both the xchg and cmpxchg instructions, in the pseudo-C dialect, do not expect their memory address operand to be surrounded by parentheses. For example, the output should be "w0 =cmpxchg32_32(r8+8,w0,w2)" instead of "w0 =cmpxchg32_32((r8+8),w0,w2)".

This patch implements an operand modifier 'M' which marks the instruction templates that do not expect the parentheses, and adds it to the xchg and cmpxchg templates.

gcc/ChangeLog:

	* config/bpf/atomic.md (atomic_compare_and_swap,
	atomic_exchange): Add operand modifier %M to the first operand.
	* config/bpf/bpf.cc (no_parentheses_mem_operand): Create variable.
	(bpf_print_operand): Set no_parentheses_mem_operand variable if
	%M operand is used.
	(bpf_print_operand_address): Conditionally output parentheses.

gcc/testsuite/ChangeLog:

	* gcc.target/bpf/pseudoc-atomic-memaddr-op.c: Add test.

2024-07-18  rs6000: Fix .machine cpu selection w/ altivec [PR97367]  (René Rebe; 1 file, -1/+4)

There are various non-IBM CPUs with altivec, so we cannot use that flag to determine which .machine cpu to use, so ignore it. Emit an additional ".machine altivec" if Altivec is enabled so that the assembler doesn't require an explicit -maltivec option to assemble any Altivec instructions for those targets where the ".machine cpu" is insufficient to enable Altivec. For example, -mcpu=G5 emits a ".machine power4".

2024-07-18  René Rebe  <rene@exactcode.de>
	    Peter Bergner  <bergner@linux.ibm.com>

gcc/
	PR target/97367
	* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Do not
	consider OPTION_MASK_ALTIVEC.
	(emit_asm_machine): For Altivec compiles, emit a
	".machine altivec".

gcc/testsuite/
	PR target/97367
	* gcc.target/powerpc/pr97367.c: New test.

Signed-off-by: René Rebe <rene@exactcode.de>

2024-07-18  Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV  (liuhongt; 2 files, -6/+32)

gcc/ChangeLog:

	PR target/115843
	* config/i386/predicates.md (const0_or_m1_operand): New
	predicate.
	* config/i386/sse.md (*<avx512>_store<mode>_mask_1): New
	pre_reload define_insn_and_split.
	(V): Add V32BF, V16BF, V8BF.
	(V4SF_V8BF): Rename to ...
	(V24F_128): ... this.
	(*vec_concat<mode>): Adjust with V24F_128.
	(*vec_concat<mode>_0): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr115843.c: New test.

2024-07-17  alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]  (Uros Bizjak; 1 file, -3/+7)

Add the missing "cannot_copy" attribute to instructions that have to stay in 1-1 correspondence with another insn.

	PR target/115526

gcc/ChangeLog:

	* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy
	attribute.
	(movdi_er_tlsgd): Ditto.
	(movdi_er_tlsldm): Ditto.
	(call_value_osf_<tls>): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/alpha/pr115526.c: New test.

2024-07-17  AVR: target/90616 - Improve adding constants that are 0 mod 256.  (Georg-Johann Lay; 4 files, -0/+47)

This patch introduces a new insn that works as an insn combine pattern for

  (plus:HI (zero_extend:HI (reg:QI))
           (const_0mod256_operand:HI))

which requires at most 2 instructions. When the input register operand is already in HImode, the addhi3 printer only adds the hi8 part when it sees a SYMBOL_REF or CONST aligned to at least 256 bytes. (The CONST_INT case was already handled.)

gcc/
	PR target/90616
	* config/avr/predicates.md (const_0mod256_operand): New predicate.
	* config/avr/constraints.md (Cp8): New constraint.
	* config/avr/avr.md (*aligned_add_symbol): New insn.
	* config/avr/avr.cc (avr_out_plus_symbol) [HImode]: When op2 is a
	multiple of 256, there is no need to add / subtract the lo8 part.
	(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for new
	insn *aligned_add_symbol as it applies.
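
A hedged sketch of source code that produces the combine pattern above (names are illustrative; the 256-byte alignment makes the lo8 part of the symbol's address zero):

  char buf[256] __attribute__ ((aligned (256)));

  char *
  at (unsigned char i)
  {
    /* (plus:HI (zero_extend:HI (reg:QI)) (symbol_ref "buf")):
       only the hi8 half of the address of buf needs an add.  */
    return buf + i;
  }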