|
A patch introduced a pattern to avoid unnecessary extensions when doing a
min/max operation where one of the values is a 32-bit positive constant.
> (define_insn_and_split "*minmax"
> [(set (match_operand:DI 0 "register_operand" "=r")
> (sign_extend:DI
> (subreg:SI
> (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 "register_operand" "r"))
> (match_operand:DI 2 "immediate_operand" "i"))
> 0)))
> (clobber (match_scratch:DI 3 "=&r"))
> (clobber (match_scratch:DI 4 "=&r"))]
> "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
> "#"
> "&& reload_completed"
> [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
> (set (match_dup 4) (match_dup 2))
> (set (match_dup 0) (<minmax_optab>:DI (match_dup 3) (match_dup 4)))]
There is a lot going on in here. The key is that the nonconstant value is zero
extended from SI to DI in the original RTL and we know the constant value is
unchanged if we were to sign extend it from 32 to 64 bits.
We change the extension of the nonconstant operand from zero to sign extension.
I'm pretty confident the goal there is to take advantage of the fact that SI
values are kept sign extended, so the extension will often be optimized away.
The problem occurs when the nonconstant operand has the SI sign bit set. As an
example:
smax (0x80000000, 0x7) resulting in 0x80000000
The split RTL will generate
smax (sign_extend (0x80000000), 0x7)
smax (0xffffffff80000000, 0x7) resulting in 0x7
Oops.
We really needed to change the opcode to umax for this transformation to work.
That's easy enough, but there are further improvements we can make.
First, the pattern is a define_insn_and_split with a post-reload split
condition. It would be better implemented as a 4->3 define_split so that the
costing model just works. Second, if operands[1] is a suitably promoted subreg,
then we can elide the sign extension when we generate the split code, so often
it'll be a 4->2 split, again with the cost model working with no adjustments
needed.
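A minimal C reproducer in the spirit of the new test (a sketch; the actual
contents of gcc.target/riscv/pr116085.c may differ):
```c
/* Illustrative PR116085 reproducer, not the committed test.  */
#include <stdint.h>

int64_t __attribute__ ((noipa))
smax_zext (uint32_t x)
{
  int64_t t = x;           /* zero_extend:DI of an SI value */
  return t > 7 ? t : 7;    /* smax:DI against a positive constant */
}

int
main (void)
{
  /* With the buggy split, x was sign extended and the result was 7.  */
  if (smax_zext (0x80000000u) != 0x80000000)
    __builtin_abort ();
  return 0;
}
```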
Tested on rv32 and rv64 in my tester. I'll wait for the pre-commit tester to
spin it as well.
PR target/116085
gcc/
* config/riscv/bitmanip.md (minmax extension avoidance splitter):
Rewrite as a simpler define_split. Adjust the opcode appropriately.
Avoid emitting sign extension if it's clearly not needed.
* config/riscv/iterators.md (minmax_optab): Rename to uminmax_optab
and map everything to unsigned variants.
gcc/testsuite/
* gcc.target/riscv/pr116085.c: New test.
|
|
Since r15-1890-gf379596e0ba99d there is an optab for bic, andn.
This moves aarch64_bic for SVE over to use it instead.
Note that unlike the simd bic patterns, the operands were already
in the order expected by the optab, so no swapping
was needed.
Built and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc (svbic_impl::expand): Update
to use andn optab instead of using code_for_aarch64_bic.
* config/aarch64/aarch64-sve.md (@aarch64_bic<mode>): Rename to ...
(andn<mode>3): This.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Since r15-1890-gf379596e0ba99d, andn and iorn are the new optab names,
so let's use those names for these patterns. They will be used when
expanding from gimple in the next few patches.
Built and tested for aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
* config/aarch64/aarch64.md (*<NLOGICAL:optab>_one_cmpl<mode>3): Rename to ...
(<NLOGICAL:optab>n<mode>3): This.
(*<NLOGICAL:optab>_one_cmplsidi3_ze): Rename to ...
(*<NLOGICAL:optab>nsidi3_ze): This.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This renames the patterns orn<mode>3 to iorn<mode>3 so it
matches the new optab that was added with r15-1890-gf379596e0ba99d.
Likewise for bic<mode>3 to andn<mode>3.
Note that operands 1 and 2 are swapped from the original
patterns to match the optab now.
Built and tested for aarch64-linux-gnu with no regression.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(bic<mode>3<vczle><vczbe>): Rename to ...
(andn<mode>3<vczle><vczbe>): This. Also swap operands.
(orn<mode>3<vczle><vczbe>): Rename to ...
(iorn<mode>3<vczle><vczbe>): This. Also swap operands.
(vec_cmp<mode><v_int_equiv>): Update orn call to iorn
and swap the last two arguments.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/vect_cmp-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The problem here is that the aarch64 backend enables -mearly-ra at -O2 and
above, but it is not marked as an Optimization in the .opt file, so enabling it
sometimes resets the target options when going from -O1 to -O2 for the first time.
Built and tested for aarch64-linux-gnu with no regressions.
PR target/116065
gcc/ChangeLog:
* config/aarch64/aarch64.opt (mearly-ra=): Mark as Optimization rather
than Save.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/target_optimization-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
An unquoted apostrophe slipped through when testing the recent
V/M extension patch. This, again, re-words the message to
"Currently the 'V' implementation requires the 'M' extension".
Going to commit as obvious after testing.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_override_options_internal):
Reword error string without apostrophe.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr116036.c: Adjust expected error
string.
|
|
Hi all,
For AMX instructions that operate on memory, we will treat the memory
size as not specified, since there are no different sizes that could
cause confusion for memory.
This changes the output under Intel mode, which is currently broken when
used with the assembler, and aligns GCC with current binutils behavior.
Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk?
Thx,
Haochen
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_builtin): Change
from XImode to BLKmode.
* config/i386/i386.md (ldtilecfg): Change XI to BLK.
(sttilecfg): Ditto.
|
|
This patch removes the __builtin_vsx_set_1ti, __builtin_vsx_set_2df and
__builtin_vsx_set_2di built-ins.
The built-ins set a value in a vector. The same operation can be done
in C code. The assembly code generated from the C code is as good as or
better than the code generated by the built-ins. With default
optimization, the number of assembly instructions generated by the two
methods is similar. With -O3 optimization, the assembly generated for
the two approaches is identical for the 2DF and 2DI types. The C-code
version of the 1TI case requires one fewer assembly instruction,
and it uses only one load versus two loads for the built-in.
With the removal of the built-ins, there are no other uses of the
set built-in attribute. The code associated with the set built-in
attribute is removed.
Finally, the testcase for __builtin_vsx_set_2df is removed. The
other built-ins do not have testcases.
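For reference, a sketch of the plain C replacement for the removed
built-ins (illustrative only; the argument order of the old built-in is
an assumption, not taken from the patch):
```c
#include <altivec.h>

/* Plain C element insertion; GCC's generic vector subscripting
   produces code as good as or better than the removed built-ins.  */
vector double
set_elem (vector double v, double d, unsigned int i)
{
  v[i & 1] = d;   /* instead of __builtin_vsx_set_2df (v, d, i) */
  return v;
}
```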
gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (get_element_number,
altivec_expand_vec_set_builtin): Remove functions.
(rs6000_expand_builtin): Remove the if statement to call
altivec_expand_vec_set_builtin.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
__builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the
built-in definitions.
* config/rs6000/rs6000-gen-builtins.cc (struct attrinfo):
Remove the isset variable from the structure.
(parse_bif_attrs): Remove the uses of the isset variable.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
__builtin_vsx_set_2df built-in.
|
|
This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df
and __builtin_vec_set_v2di built-ins. The users should just use
normal C-code to update the various vector elements. This change was
originally intended to be part of the earlier series of cleanup
patches. It was initially thought that some additional work would be
needed to do some gimple generation instead of these built-ins.
However, the existing default code generation does produce the needed
code. For the vec_set bif, the equivalent C code is as good as or
better than the built-in. For the vec_insert bif, whose resolution
previously made use of the vec_set bif, the assembly code generation
is as good as before with -O3 optimization.
Remove the built-ins and use the default gimple generation instead.
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
__builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
definitions.
* config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
handling for constant vec_insert position with
VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
|
|
This patch removes the built-ins
__builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp and
__builtin_vsx_xvcmpgtsp,
which are similar to the recommended overloaded vec_cmpeq, vec_cmpgt
and vec_cmpge built-ins documented in the PVIPR.
The difference is that the overloaded built-ins return a vector of
32-bit booleans, whereas the removed built-ins returned a vector of floats.
The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.
The test cases for __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins. Use of the
overloaded built-ins requires the result to be stored in a vector
boolean of the appropriate size, or the result must be cast to the
return type used by the original __builtin_vsx_xvcmp* built-ins.
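A sketch of the two replacement idioms described above (illustrative;
the committed test changes may differ):
```c
#include <altivec.h>

/* Store the result in a vector boolean of the appropriate size...  */
vector bool int
cmp_eq_bool (vector float a, vector float b)
{
  return vec_cmpeq (a, b);
}

/* ...or cast it back to the type the removed built-in returned.  */
vector float
cmp_eq_float (vector float a, vector float b)
{
  return (vector float) vec_cmpeq (a, b);
}
```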
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
__builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
definitions.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (do_cmp): Replace
__builtin_vsx_xvcmp{eq,gt,ge}{sp,dp} by vec_cmp{eq,gt,ge}
respectively and add explicit casts to vector {float,double}.
Add more testing code assigning result to vector boolean types.
|
|
auto_inc_dec (-O3) performs optimizations like the following
if RVV and XTheadMemIdx are enabled.
(insn 23 20 27 3 (set (mem:V4QI (reg:DI 136 [ ivtmp.13 ]) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 {*movv4qi}
(nil))
(insn 40 39 41 3 (set (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) 5 {adddi3}
(nil))
====>
(insn 23 20 27 3 (set (mem:V4QI (post_modify:DI (reg:DI 136 [ ivtmp.13 ])
(plus:DI (reg:DI 136 [ ivtmp.13 ])
(const_int 20 [0x14]))) [0 MEM <vector(4) char> [(char *)_39]+0 S4 A32])
(reg:V4QI 168)) "gcc/testsuite/gcc.target/riscv/pr116033.c":12:27 3183 {*movv4qi}
(expr_list:REG_INC (reg:DI 136 [ ivtmp.13 ])
(nil)))
The reason the pass believes that this is legal is
that the mode test in th_memidx_classify_address_modify()
requires INTEGRAL_MODE_P (mode), which includes vector modes.
Let's restrict the mode test so that only MODE_INT modes are allowed.
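A sketch of the tightened check, assuming the rest of
th_memidx_classify_address_modify() is unchanged (paraphrased, not the
exact committed code):
```c
/* Before: INTEGRAL_MODE_P (mode), which also accepts vector integer
   modes such as V4QI.  After: only scalar integer modes qualify.  */
if (GET_MODE_CLASS (mode) != MODE_INT)
  return false;
```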
PR target/116033
gcc/ChangeLog:
* config/riscv/thead.cc (th_memidx_classify_address_modify):
Fix mode test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr116033.c: New test.
Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
g:72fbd3b2b2a497dbbe6599239bd61c5624203ed0 added a use of std::array
without explicitly forcing <array> to be included. That didn't cause
problems in my local builds but understandably did for some people.
gcc/
* doc/rtl.texi: Document the need to define INCLUDE_ARRAY before
including rtl-ssa.h.
* rtl-ssa.h: Likewise (in comment).
* config/aarch64/aarch64-cc-fusion.cc: Add INCLUDE_ARRAY.
* config/aarch64/aarch64-early-ra.cc: Likewise.
* config/riscv/riscv-avlprop.cc: Likewise.
* config/riscv/riscv-vsetvl.cc: Likewise.
* fwprop.cc: Likewise.
* late-combine.cc: Likewise.
* pair-fusion.cc: Likewise.
* rtl-ssa/accesses.cc: Likewise.
* rtl-ssa/blocks.cc: Likewise.
* rtl-ssa/changes.cc: Likewise.
* rtl-ssa/functions.cc: Likewise.
* rtl-ssa/insns.cc: Likewise.
* rtl-ssa/movement.cc: Likewise.
|
|
For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing, this patch emits an early
error at option-parsing time.
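A sketch of the new check in riscv_override_options_internal
(paraphrased; the exact diagnostic text is the one later reworded by the
apostrophe fix earlier in this log):
```c
/* Reject V without M at option-parsing time instead of ICEing later
   when a poly_int value needs a runtime multiply.  Paraphrased.  */
if (TARGET_VECTOR && !TARGET_MUL)
  error ("currently the %<V%> implementation requires the %<M%> extension");
```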
gcc/ChangeLog:
PR target/116036
* config/riscv/riscv.cc (riscv_override_options_internal): Error
with TARGET_VECTOR && !TARGET_MUL.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-31.c: Add m to arch string and expect it.
* gcc.target/riscv/arch-32.c: Ditto.
* gcc.target/riscv/arch-37.c: Ditto.
* gcc.target/riscv/arch-38.c: Ditto.
* gcc.target/riscv/predef-14.c: Ditto.
* gcc.target/riscv/predef-15.c: Ditto.
* gcc.target/riscv/predef-16.c: Ditto.
* gcc.target/riscv/predef-26.c: Ditto.
* gcc.target/riscv/predef-27.c: Ditto.
* gcc.target/riscv/predef-32.c: Ditto.
* gcc.target/riscv/predef-33.c: Ditto.
* gcc.target/riscv/predef-36.c: Ditto.
* gcc.target/riscv/predef-37.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111486.c: Add m to arch string.
* gcc.target/riscv/compare-debug-1.c: Ditto.
* gcc.target/riscv/compare-debug-2.c: Ditto.
* gcc.target/riscv/rvv/base/pr116036.c: New test.
|
|
We noticed the following in a recent benchmark evaluation (coremark-pro zip-test):
vid.v v2
vmv.v.i v5,0
.L9:
vle16.v v3,0(a4)
vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.
The root cause is:
(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
(reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit} -> Its result is used by the following vrsub.vx, which suppresses hoisting of the vrsub.vx
(nil))
(insn 57 56 59 4 (set (reg:RVVMF2HI 216)
(if_then_else:RVVMF2HI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 350)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
(reg:RVVMF2HI 217))
(unspec:RVVMF2HI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
(expr_list:REG_DEAD (reg:HI 220)
(nil)))
This patch fixes it to generate (set (reg:HI) (subreg:HI (reg:DI) 0)) instead of (set (subreg:DI (reg:HI) 0) (reg:DI)).
After this patch:
vid.v v2
vrsub.vx v2,v2,a7
vmv.v.i v4,0
.L3:
vle16.v v3,0(a4)
Tested on both RV32 and RV64 with no regressions.
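A hedged sketch of the shape of the fix in riscv_legitimize_move (the
helper name and its signature are assumptions; only the subreg direction
comes from the message above):
```c
/* Compute the poly_int value in a full-width temporary, then move its
   lowpart into the narrow destination.  This yields
     (set (reg:HI) (subreg:HI (reg:DI) 0))
   instead of writing through (subreg:DI (reg:HI) 0), whose DImode
   result blocked hoisting of the consumer.  */
rtx tmp = gen_reg_rtx (word_mode);
riscv_legitimize_poly_move (word_mode, tmp, value);  /* hypothetical call */
emit_move_insn (dest, gen_lowpart (GET_MODE (dest), tmp));
```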
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int dest generation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
* gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.
* gcc.target/riscv/rvv/autovec/poly_licm-3.c: New test.
|
|
As suggested in the review of
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657474.html,
this patch changes the return type of gimple_folder::redirect_call from
gimple * to gcall *. The motivation for this is that so far, most callers of
the function had been casting the result of the function to gcall *. These
call sites were updated.
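The changed declaration, roughly (paraphrased from
aarch64-sve-builtins.h; the parameter list is assumed unchanged):
```cpp
/* Returning gcall * lets callers such as svqshl_impl::fold use the
   result directly, without an intermediate cast.  */
gcall *redirect_call (const function_instance &instance);
```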
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::redirect_call): Update return type.
* config/aarch64/aarch64-sve-builtins.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svqshl_impl::fold):
Remove cast to gcall.
(svrshl_impl::fold): Likewise.
|
|
gcc/ChangeLog:
PR target/115749
* config/i386/x86-tune-costs.h (struct processor_costs):
Adjust rtx_cost of imulq and imulw from COST_N_INSNS (4)
to COST_N_INSNS (3).
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115749.c: New test.
|
|
Replace the existing uint64_t typedef with a bbitmap<2> typedef. Most
of the preparatory work was carried out in previous commits, so this
patch itself is fairly small.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Store a second uint64_t value.
* config/aarch64/aarch64-opts.h
(aarch64_feature_flags): Switch typedef to bbitmap<2>.
* config/aarch64/aarch64.cc
(aarch64_set_current_function): Extract isa mode from val[0].
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Load a second uint64_t value.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Ditto.
(aarch64_isa_flags): Ditto.
(HANDLE): Use bbitmap<2>::from_index to initialise flags.
(AARCH64_FL_ISA_MODES): Do arithmetic on integer type.
(AARCH64_ISA_MODE): Extract value from bbitmap<2> array.
* config/aarch64/aarch64.opt
(aarch64_asm_isa_flags_1): New variable.
(aarch64_isa_flags_1): Ditto.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64-feature-deps.h
(get_flags_off): Construct aarch64_feature_flags (0) explicitly.
|
|
Use a new AARCH64_HAVE_ISA macro in TARGET_* definitions, and eliminate
all the AARCH64_ISA_* feature macros.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc
(aarch64_define_unconditional_macros): Use TARGET_V8R macro.
(aarch64_update_cpp_builtins): Use TARGET_* macros.
* config/aarch64/aarch64.h (AARCH64_HAVE_ISA): New macro.
(AARCH64_ISA_SM_OFF, AARCH64_ISA_SM_ON, AARCH64_ISA_ZA_ON)
(AARCH64_ISA_V8A, AARCH64_ISA_V8_1A, AARCH64_ISA_CRC)
(AARCH64_ISA_FP, AARCH64_ISA_SIMD, AARCH64_ISA_LSE)
(AARCH64_ISA_RDMA, AARCH64_ISA_V8_2A, AARCH64_ISA_F16)
(AARCH64_ISA_SVE, AARCH64_ISA_SVE2, AARCH64_ISA_SVE2_AES)
(AARCH64_ISA_SVE2_BITPERM, AARCH64_ISA_SVE2_SHA3)
(AARCH64_ISA_SVE2_SM4, AARCH64_ISA_SME, AARCH64_ISA_SME_I16I64)
(AARCH64_ISA_SME_F64F64, AARCH64_ISA_SME2, AARCH64_ISA_V8_3A)
(AARCH64_ISA_DOTPROD, AARCH64_ISA_AES, AARCH64_ISA_SHA2)
(AARCH64_ISA_V8_4A, AARCH64_ISA_SM4, AARCH64_ISA_SHA3)
(AARCH64_ISA_F16FML, AARCH64_ISA_RCPC, AARCH64_ISA_RCPC8_4)
(AARCH64_ISA_RNG, AARCH64_ISA_V8_5A, AARCH64_ISA_TME)
(AARCH64_ISA_MEMTAG, AARCH64_ISA_V8_6A, AARCH64_ISA_I8MM)
(AARCH64_ISA_F32MM, AARCH64_ISA_F64MM, AARCH64_ISA_BF16)
(AARCH64_ISA_SB, AARCH64_ISA_RCPC3, AARCH64_ISA_V8R)
(AARCH64_ISA_PAUTH, AARCH64_ISA_V8_7A, AARCH64_ISA_V8_8A)
(AARCH64_ISA_V8_9A, AARCH64_ISA_V9A, AARCH64_ISA_V9_1A)
(AARCH64_ISA_V9_2A, AARCH64_ISA_V9_3A, AARCH64_ISA_V9_4A)
(AARCH64_ISA_MOPS, AARCH64_ISA_LS64, AARCH64_ISA_CSSC)
(AARCH64_ISA_D128, AARCH64_ISA_THE, AARCH64_ISA_GCS): Remove.
(TARGET_BASE_SIMD, TARGET_SIMD, TARGET_FLOAT)
(TARGET_NON_STREAMING, TARGET_STREAMING, TARGET_ZA, TARGET_SHA2)
(TARGET_SHA3, TARGET_AES, TARGET_SM4, TARGET_F16FML)
(TARGET_CRC32, TARGET_LSE, TARGET_FP_F16INST)
(TARGET_SIMD_F16INST, TARGET_DOTPROD, TARGET_SVE, TARGET_SVE2)
(TARGET_SVE2_AES, TARGET_SVE2_BITPERM, TARGET_SVE2_SHA3)
(TARGET_SVE2_SM4, TARGET_SME, TARGET_SME_I16I64)
(TARGET_SME_F64F64, TARGET_SME2, TARGET_ARMV8_3, TARGET_JSCVT)
(TARGET_FRINT, TARGET_TME, TARGET_RNG, TARGET_MEMTAG)
(TARGET_I8MM, TARGET_SVE_I8MM, TARGET_SVE_F32MM)
(TARGET_SVE_F64MM, TARGET_BF16_FP, TARGET_BF16_SIMD)
(TARGET_SVE_BF16, TARGET_PAUTH, TARGET_BTI, TARGET_MOPS)
(TARGET_LS64, TARGET_CSSC, TARGET_SB, TARGET_RCPC, TARGET_RCPC2)
(TARGET_RCPC3, TARGET_SIMD_RDMA, TARGET_ARMV9_4, TARGET_D128)
(TARGET_THE, TARGET_GCS): Redefine using AARCH64_HAVE_ISA.
(TARGET_V8R, TARGET_V9A): New.
* config/aarch64/aarch64.md (arch_enabled): Use TARGET_RCPC2.
* config/aarch64/iterators.md (GPI_I16): Use TARGET_FP_F16INST.
(GPF_F16): Ditto.
* config/aarch64/predicates.md
(aarch64_rcpc_memory_operand): Use TARGET_RCPC2.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Add bool cast.
|
|
The awk scripts that process the .opt files are relatively fragile and
only handle a limited set of data types correctly. The unrecognised
aarch64_feature_flags type is handled as a uint64_t, which happens to be
correct for now. However, that assumption will change when we extend
the mask to 128 bits.
This patch changes the option members to use uint64_t types, and adds a
"_0" suffix to the names (both for future extensibility, and to allow
the original name to be used for the full aarch64_feature_flags mask
within generator files).
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_set_asm_isa_flags): Reorder, and add suffix to names.
* config/aarch64/aarch64.h
(aarch64_get_asm_isa_flags): Add "_0" suffix.
(aarch64_get_isa_flags): Ditto.
(aarch64_asm_isa_flags): Redefine using renamed uint64_t value.
(aarch64_isa_flags): Ditto.
* config/aarch64/aarch64.opt:
(aarch64_asm_isa_flags): Rename to...
(aarch64_asm_isa_flags_0): ...this, and change to uint64_t.
(aarch64_isa_flags): Rename to...
(aarch64_isa_flags_0): ...this, and change to uint64_t.
|
|
Building an aarch64_feature_flags value from data within a gcc_options
or cl_target_option struct will get more complicated in a later commit.
Use a macro to avoid doing this manually in more than one location.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_handle_option): Use new macro.
* config/aarch64/aarch64.cc
(aarch64_override_options_internal): Ditto.
(aarch64_option_print): Ditto.
(aarch64_set_current_function): Ditto.
(aarch64_can_inline_p): Ditto.
(aarch64_declare_function_name): Ditto.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h (aarch64_get_asm_isa_flags): New
(aarch64_get_isa_flags): New.
(aarch64_asm_isa_flags): Use new macro.
(aarch64_isa_flags): Ditto.
|
|
Currently there are many places where an aarch64_feature_flags variable
is used, but only the bottom three isa mode bits are set and read.
Using a separate data type for these values makes it clearer that
they're not expected or required to have any of their upper feature bits
set. It will also make things simpler and more efficient when we extend
aarch64_feature_flags to 128 bits.
This patch uses explicit casts whenever converting from an
aarch64_feature_flags value to an aarch64_isa_mode value. This isn't
strictly necessary, but serves to highlight the locations where an
explicit conversion will become necessary later.
gcc/ChangeLog:
* config/aarch64/aarch64-opts.h: Add aarch64_isa_mode typedef.
* config/aarch64/aarch64-protos.h
(aarch64_gen_callee_cookie): Use aarch64_isa_mode parameter.
(aarch64_sme_vq_immediate): Ditto.
* config/aarch64/aarch64.cc
(aarch64_fntype_pstate_sm): Use aarch64_isa_mode values.
(aarch64_fntype_pstate_za): Ditto.
(aarch64_fndecl_pstate_sm): Ditto.
(aarch64_fndecl_pstate_za): Ditto.
(aarch64_fndecl_isa_mode): Ditto.
(aarch64_cfun_incoming_pstate_sm): Ditto.
(aarch64_cfun_enables_pstate_sm): Ditto.
(aarch64_call_switches_pstate_sm): Ditto.
(aarch64_gen_callee_cookie): Ditto.
(aarch64_callee_isa_mode): Ditto.
(aarch64_insn_callee_abi): Ditto.
(aarch64_sme_vq_immediate): Ditto.
(aarch64_add_offset_temporaries): Ditto.
(aarch64_add_offset): Ditto.
(aarch64_add_sp): Ditto.
(aarch64_sub_sp): Ditto.
(aarch64_guard_switch_pstate_sm): Ditto.
(aarch64_switch_pstate_sm): Ditto.
(aarch64_init_cumulative_args): Ditto.
(aarch64_allocate_and_probe_stack_space): Ditto.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_start_call_args): Ditto.
(aarch64_expand_call): Ditto.
(aarch64_end_call_args): Ditto.
(aarch64_set_current_function): Ditto, with added conversions.
(aarch64_handle_attr_arch): Avoid macro with changed type.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_isa_flags): Ditto.
(aarch64_switch_pstate_sm_for_landing_pad):
Use arch64_isa_mode values.
(aarch64_switch_pstate_sm_for_jump): Ditto.
(pass_switch_pstate_sm::gate): Ditto.
* config/aarch64/aarch64.h
(AARCH64_ISA_MODE_{SM_ON|SM_OFF|ZA_ON}): New macros.
(AARCH64_FL_SM_STATE): Mark as possibly unused.
(AARCH64_ISA_MODE_SM_STATE): New aarch64_isa_mode mask.
(AARCH64_DEFAULT_ISA_MODE): New aarch64_isa_mode value.
(AARCH64_FL_DEFAULT_ISA_MODE): Define using above value.
(AARCH64_ISA_MODE): Change type to aarch64_isa_mode.
(arm_pcs): Use aarch64_isa_mode value.
|
|
The name would become misleading in a later commit anyway, and I think
this is marginally more readable.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_override_options): Remove temporary variable.
|
|
AARCH64_NUM_ISA_MODES will be used within aarch64-opts.h in a later
commit.
gcc/ChangeLog:
* config/aarch64/aarch64.h (DEF_AARCH64_ISA_MODE): Move to...
* config/aarch64/aarch64-opts.h (DEF_AARCH64_ISA_MODE): ...here.
|
|
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_tune_flags): Remove unused global variable.
(aarch64_override_options_internal): Remove dead assignment.
|
|
When I was trying to add a scalar version of iorc and andc, the optab that
got matched was the one for and/ior with the modes csi and cdi instead of the
iorc and andc optabs for the si and di modes; e.g. "andcsi3" parses as
and+csi rather than andc+si. Since csi/cdi are the complex integer modes,
we need to rename the optabs so they don't end in the c there. This changes
the c to n, which is neutral and known not to be the first letter of a mode.
Bootstrapped and tested on x86_64 and powerpc64le.
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: s/iorc/iorn/. s/andc/andn/
for the code.
* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update
to iorn.
* config/rs6000/rs6000.md (andc<mode>3): Rename to ...
(andn<mode>3): This.
(iorc<mode>3): Rename to ...
(iorn<mode>3): This.
* doc/md.texi: Update documentation for the rename.
* internal-fn.def (BIT_ANDC): Rename to ...
(BIT_ANDN): This.
(BIT_IORC): Rename to ...
(BIT_IORN): This.
* optabs.def (andc_optab): Rename to ...
(andn_optab): This.
(iorc_optab): Rename to ...
(iorn_optab): This.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the
renamed internal functions, ANDC/IORC to ANDN/IORN.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This reverts commit 4c5eb66e701bc9f3bf1298269f52559b10d63a09.
|
|
According to the Neoverse V2 Software Optimization Guide (section 4.14), the
instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
implemented so far. This patch implements and tests the two fusion pairs.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
There was also no non-noise impact on SPEC CPU2017 benchmark.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
fusion logic.
* config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
(cmp+cset): Likewise.
* config/aarch64/tuning_models/neoversev2.h: Enable logic in
field fusible_ops.
gcc/testsuite/
* gcc.target/aarch64/cmp_csel_fuse.c: New test.
* gcc.target/aarch64/cmp_cset_fuse.c: Likewise.
|
|
It is possible that the Zba optimization pattern zero_extendsidi2_bitmanip
matches for an XTheadMemIdx INSN, with the effect of emitting an invalid
instruction, as reported in PR116035.
The pattern above is used to emit a zext.w instruction to zero-extend
SI mode registers to DI mode. A similar functionality can be achieved
by XTheadBb's th.extu instruction. And indeed, we have the equivalent
pattern in thead.md (zero_extendsidi2_th_extu). However, that pattern
depends on !TARGET_XTHEADMEMIDX. To compensate for that, there are
specific patterns that ensure that zero-extension instruction can still
be emitted (th_memidx_bb_zero_extendsidi2 and friends).
While we could implement something similar (th_memidx_zba_zero_extendsidi2),
it would only make sense if there existed real HW that implements Zba
and XTheadMemIdx, but not XTheadBb. Unless such a machine exists, let's
simply disable zero_extendsidi2_bitmanip if XTheadMemIdx is available.
PR target/116035
gcc/ChangeLog:
* config/riscv/bitmanip.md: Disable zero_extendsidi2_bitmanip
for XTheadMemIdx.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr116035-1.c: New test.
* gcc.target/riscv/pr116035-2.c: New test.
Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
gcc/ChangeLog:
PR target/115978
* config/i386/driver-i386.cc (host_detect_local_cpu): Enable
APX_F only for 64-bit codegen.
* config/i386/i386-options.cc (DEF_PTA): Skip PTA_APX_F if
not in 64-bit mode.
gcc/testsuite/ChangeLog:
PR target/115978
* gcc.target/i386/pr115978-1.c: New test.
* gcc.target/i386/pr115978-2.c: Ditto.
|
|
SPEC2017 perlbench for RISC-V was broken, showing up as a runtime output
mismatch failure.
> 3830: mbox2: dWshe3Aa1EULre4CT5O/ErYFrk+o/EOoebA1kTVjQVQQH2EjT5fHcYnwjj2MdBmZu5y3Ce4Ei4QQZo/SNrry9g
> mbox2: uuWPimQiU0D4UrwFP+LS0lFNph4qL43WV1A6T3tHleatIOUaHixhrJU9NoA2lc9KjwYpdEL0lNTXkvo8ymNHzA
> ^
> 3832: mbox3: 8f4jdv6GIf0lX3DcdwRdEm6/aZwnmGX6n86GzCvmkwTKFXQjwlwVHc8jy8XlcyiIPr3yXTkgVOiP3cRYvyYQPg
> mbox3: 9xQySgP6qbhfxl8Usu1WfGA5UhStB5AN31wueGM6OF4Jp59DkqJPu6ksGblOU5u0nQapQC1e9oYIs16a2mq2NA
> ^
> specdiff run completed
Edwin bisected this to 273f16a125c4 ("[v3][RISC-V] Handle bit
manipulation of SImode values") which had the operands swapped in one
of the new splitters introduced.
No new test, as the reducer narrows it down to the exact test introduced
by the original commit.
gcc/ChangeLog:
* config/riscv/bitmanip.md: Fix splitter.
Reported-by: Edwin Lu <ewlu@rivosinc.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
Hi,
For PR96866, when printing asm code for the modifier "%a", an addressable
operand is required, while the constraint "X" allows any kind of
operand, even one whose address is hard to get directly, e.g. an extern
symbol whose address is in the TOC.
An error message is now reported to indicate the invalid asm operand.
Compared with the previous version, the test case is updated with -mno-pcrel.
Bootstrap & regtest pass on ppc64{,le}.
Is this ok for trunk?
BR,
Jeff (Jiufu Guo)
PR target/96866
gcc/ChangeLog:
* config/rs6000/rs6000.cc (print_operand_address): Emit message for
unsupported operand.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr96866-1.c: New test.
* gcc.target/powerpc/pr96866-2.c: New test.
|
|
ix86_hardreg_mov_ok is added by r11-5066-gbe39636d9f68c4
> The solution proposed here is to have the x86 backend/recog prevent
> early RTL passes composing instructions (that set likely_spilled hard
> registers) that they (combine) can't simplify, until after reload.
> We allow sets from pseudo registers, immediate constants and memory
> accesses, but anything more complicated is performed via a temporary
> pseudo. Not only does this simplify things for the register allocator,
> but any remaining register-to-register moves are easily cleaned up
> by the late optimization passes after reload, such as peephole2 and
> cprop_hardreg.
The restriction is mainly for rtl optimization passes before pass_combine.
But split1 splits
```
(insn 17 13 18 2 (set (reg/i:V4SI 20 xmm0)
(vec_merge:V4SI (const_vector:V4SI [
(const_int -1 [0xffffffffffffffff]) repeated x4
])
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(unspec:QI [
(reg:V4SF 106)
(reg:V4SF 102)
(const_int 0 [0])
] UNSPEC_PCMP))) "/app/example.cpp":20:1 2929 {*avx_cmpv4sf3_1}
(expr_list:REG_DEAD (reg:V4SF 102)
(expr_list:REG_DEAD (reg:V4SF 106)
(nil))))
```
into:
```
(insn 23 13 24 2 (set (reg:V4SF 107)
(unspec:V4SF [
(reg:V4SF 106)
(reg:V4SF 102)
(const_int 0 [0])
] UNSPEC_PCMP)) "/app/example.cpp":20:1 -1
(nil))
(insn 24 23 18 2 (set (reg/i:V4SI 20 xmm0)
(subreg:V4SI (reg:V4SF 107) 0)) "/app/example.cpp":20:1 -1
(nil))
```
There are many splitters generating a MOV insn with a SUBREG that would
have the same problem.
Instead of changing those splitters one by one, the patch relaxes
ix86_hardreg_mov_ok to allow a mov of a subreg to a hard register after
split1. ix86_pre_reload_split () is used to replace
!reload_completed && ira_in_progress.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_hardreg_mov_ok): Relax mov subreg
to hard register after split1.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr115982.C: New test.
|
|
When the function rs6000_inner_target_options parses target
options, it updates the explicit option set information for
rs6000_opt_masks via rs6000_isa_flags_explicit, but it misses
updating that information for rs6000_opt_vars, which can
result in some unexpected consequences, as the associated test
case shows. This patch fixes rs6000_inner_target_options
to update the option set for rs6000_opt_vars as well.
PR target/115713
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_inner_target_options): Update option
set information for rs6000_opt_vars.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr115713-2.c: New test.
|
|
In rs6000_inner_target_options, when enabling VSX we enable
altivec and disable -mavoid-indexed-addresses implicitly,
but this doesn't consider the case where the options altivec
and avoid-indexed-addresses have been explicitly disabled. As
the test case in PR115713#c1 shows, with target attribute
"no-altivec,vsx", VSX unexpectedly sets the
altivec flag and the expected error is missing.
This patch avoids the automatic enablement when the options
are explicitly specified. With this change, an existing
test case ppc-target-4.c also requires an adjustment,
specifying altivec explicitly in the target attribute (since it
requires the altivec feature and the command line specifies
no-altivec).
PR target/115713
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_inner_target_options): Avoid to
enable altivec or disable avoid-indexed-addresses automatically
when they get specified explicitly.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr115713-1.c: New test.
* gcc.target/powerpc/ppc-target-4.c: Adjust by specifying altivec
in target attribute.
|
|
As discussed in PR115688, for now when users specify
-mvsx and -mno-altivec explicitly, the compiler emits a warning
rather than an error. Considering that both options are given
explicitly, emitting a hard error is better,
so this patch escalates some related warnings to errors
when the given options are incompatible.
PR target/115713
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Emit error
messages when explicit VSX encounters explicit soft-float, no-altivec
or avoid-indexed-addresses.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/warn-1.c: Move to ...
* gcc.target/powerpc/error-1.c: ... here. Adjust dg-warning with
dg-error and remove ineffective scan.
|
|
For prefetchi instructions, a RIP-relative address is explicitly required
for the operand, and the assembler obeys that rule strictly. This makes
an instruction like:
prefetchit0 bar
illegal for the assembler, even though it should be a common usage of prefetchi.
Change the operand to %a to explicitly add (%rip) after the function label to
make it legal in the assembler, so that it can be passed to the linker to get
the real address.
gcc/ChangeLog:
* config/i386/i386.md (prefetchi): Change to %a.
gcc/testsuite/ChangeLog:
* gcc.target/i386/prefetchi-1.c: Check (%rip).
|
|
This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
{ \
bool overflow = x > (WT)(NT)(-1); \
return ((NT)x) | (NT)-overflow; \
}
DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t)
Before this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
_Bool overflow;
unsigned char _1;
unsigned char _2;
unsigned char _3;
uint8_t _6;
;; basic block 2, loop depth 0
;; pred: ENTRY
overflow_5 = x_4(D) > 255;
_1 = (unsigned char) x_4(D);
_2 = (unsigned char) overflow_5;
_3 = -_2;
_6 = _1 | _3;
return _6;
;; succ: EXIT
}
After this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
uint8_t _6;
;; basic block 2, loop depth 0
;; pred: ENTRY
_6 = .SAT_TRUNC (x_4(D)); [tail call]
return _6;
;; succ: EXIT
}
The below test suites are passed for this patch:
1. The rv64gcv full regression test.
2. The rv64gcv build with glibc.
gcc/ChangeLog:
* config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator
for int double truncation.
(ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation.
(anyi_double_truncated): Ditto but for lowercase.
* config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new
func decl for expanding ustrunc.
* config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func
impl to expand ustrunc.
* config/riscv/riscv.md (ustrunc<mode><anyi_double_truncated>2): Impl
the new pattern ustrunc<m><n>2 for int.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_arith_data.h: New test.
* gcc.target/riscv/sat_u_trunc-1.c: New test.
* gcc.target/riscv/sat_u_trunc-2.c: New test.
* gcc.target/riscv/sat_u_trunc-3.c: New test.
* gcc.target/riscv/sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/scalar_sat_unary.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch adds the power11 option to the -mcpu= and -mtune= switches.
This patch treats the power11 like a power10 in terms of costs and reassociation
width.
This patch issues a ".machine power11" to the assembly file if you use
-mcpu=power11.
This patch defines _ARCH_PWR11 if the user uses -mcpu=power11.
This patch allows GCC to be configured with the --with-cpu=power11 and
--with-tune=power11 options.
This patch passes -mpwr11 to the assembler if the user uses -mcpu=power11.
This patch adds support for using "power11" in the __builtin_cpu_is built-in
function.
2024-07-22 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config.gcc (powerpc*-*-*): Add support for power11.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=power11.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/driver-rs6000.cc (asm_names): Likewise.
* config/rs6000/ppc-auxv.h (PPC_PLATFORM_POWER11): New define.
* config/rs6000/rs6000-builtin.cc (cpu_is_info): Add power11.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR11 if -mcpu=power11.
* config/rs6000/rs6000-cpus.def (POWER11_MASKS_SERVER): New define.
(POWERPC_MASKS): Add power11.
(power11 cpu): Add power11 definition.
* config/rs6000/rs6000-opts.h (PROCESSOR_POWER11): Add power11 processor.
* config/rs6000/rs6000-string.cc (expand_compare_loop): Likewise.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add power11
support.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000.md (cpu attribute): Add power11.
* config/rs6000/rs6000.opt (-mpower11): Add internal power11 flag.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document -mcpu=power11.
* config/rs6000/power10.md (all reservations): Add power11 support.
gcc/testsuite/
* gcc.target/powerpc/power11-1.c: New test.
* gcc.target/powerpc/power11-2.c: Likewise.
* gcc.target/powerpc/power11-3.c: Likewise.
|
|
aarch64_simd_mem_operand_p checked for a memory with a POST_INC
or REG address, but it didn't check what kind of register was
being used. This meant that it allowed DImode FPRs as well as GPRs.
I wondered about rewriting it to use aarch64_classify_address,
but this one-line fix seemed simpler. The structure then mirrors
the existing early exit in aarch64_classify_address itself:
/* On LE, for AdvSIMD, don't support anything other than POST_INC or
REG addressing. */
if (advsimd_struct_p
&& TARGET_SIMD
&& !BYTES_BIG_ENDIAN
&& (code != POST_INC && code != REG))
return false;
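A paraphrased sketch of the one-line fix (not the committed code; the
predicate's exact shape may differ):
```c
/* Also require op to be a legitimate memory operand, so a POST_INC or
   REG address using a DImode FPR no longer qualifies.  */
bool
aarch64_simd_mem_operand_p (rtx op)
{
  return (MEM_P (op)
          && (GET_CODE (XEXP (op, 0)) == POST_INC
              || REG_P (XEXP (op, 0)))
          && memory_operand (op, GET_MODE (op)));
}
```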
gcc/
PR target/115969
* config/aarch64/aarch64.cc (aarch64_simd_mem_operand_p): Require
the operand to be a legitimate memory_operand.
gcc/testsuite/
PR target/115969
* gcc.target/aarch64/pr115969.c: New test.
|
|
This implements the new target hook [PR115531] indicating that for AArch64,
when possible, we prefer masked operations for any type over doing
LOAD + SELECT or SELECT + STORE.
Thanks,
Tamar
gcc/ChangeLog:
PR tree-optimization/115531
* config/aarch64/aarch64.cc
(aarch64_conditional_operation_is_expensive): New.
(TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE): New.
gcc/testsuite/ChangeLog:
PR tree-optimization/115531
* gcc.dg/vect/vect-conditional_store_1.c: New test.
* gcc.dg/vect/vect-conditional_store_2.c: New test.
* gcc.dg/vect/vect-conditional_store_3.c: New test.
* gcc.dg/vect/vect-conditional_store_4.c: New test.
|
|
I've also confirmed on the CSiBE set that the secondary combine pass is
actually beneficial on SH. It does result in some code size reductions.
gcc/ChangeLog:
* config/sh/sh.md (mov_neg_si_t): Allow insn and split after
register allocation.
(*treg_noop_move): New insn.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h
(loongarch_split_128bit_move): Delete.
(loongarch_split_128bit_move_p): Delete.
(loongarch_split_256bit_move): Delete.
(loongarch_split_256bit_move_p): Delete.
(loongarch_split_vector_move): Add a function declaration.
* config/loongarch/loongarch.cc
(loongarch_vector_costs::finish_cost): Adjust the code
formatting.
(loongarch_split_vector_move_p): Merge
loongarch_split_128bit_move_p and loongarch_split_256bit_move_p.
(loongarch_split_move_p): Merge code.
(loongarch_split_move): Likewise.
(loongarch_split_128bit_move_p): Delete.
(loongarch_split_256bit_move_p): Delete.
(loongarch_split_128bit_move): Delete.
(loongarch_split_vector_move): Merge loongarch_split_128bit_move
and loongarch_split_256bit_move.
(loongarch_split_256bit_move): Delete.
(loongarch_global_init): Remove the extra semicolon at the
end of the function.
* config/loongarch/loongarch.md (*movdf_softfloat): Add a new
condition TARGET_64BIT.
|
|
gcc/
* config/avr/builtins.def (MASK1): New DEF_BUILTIN.
* config/avr/avr.cc (avr_rtx_costs_1): Handle rtx costs for
expressions like __builtin_avr_mask1.
(avr_init_builtins) <uintQI_ftype_uintQI_uintQI>: New tree type.
(avr_expand_builtin) [AVR_BUILTIN_MASK1]: Diagnose unexpected forms.
(avr_fold_builtin) [AVR_BUILTIN_MASK1]: Handle case.
* config/avr/avr.md (gen_mask1): New expand helper.
(mask1_0x01_split, mask1_0x80_split, mask1_0xfe_split): New
insn-and-split.
(*mask1_0x01, *mask1_0x80, *mask1_0xfe): New insns.
* doc/extend.texi (AVR Built-in Functions) <__builtin_avr_mask1>:
Document new built-in function.
gcc/testsuite/
* gcc.target/avr/torture/builtin-mask1.c: New test.
|
|
Both xchg and cmpxchg instructions, in the pseudo-C dialect, do not
expect their memory address operand to be surrounded by parentheses.
For example, it should be output as "w0 =cmpxchg32_32(r8+8,w0,w2)"
instead of "w0 =cmpxchg32_32((r8+8),w0,w2)".
This patch implements an operand modifier 'M' which marks the
instruction templates that do not expect the parentheses, and adds it to
the xchg and cmpxchg templates.
gcc/ChangeLog:
* config/bpf/atomic.md (atomic_compare_and_swap,
atomic_exchange): Add operand modifier %M to the first
operand.
* config/bpf/bpf.cc (no_parentheses_mem_operand): Create
variable.
(bpf_print_operand): Set no_parentheses_mem_operand variable if
%M operand is used.
(bpf_print_operand_address): Conditionally output parentheses.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/pseudoc-atomic-memaddr-op.c: Add test.
|
|
There are various non-IBM CPUs with altivec, so we cannot use that
flag to determine which .machine cpu to use; ignore it instead.
Emit an additional ".machine altivec" if Altivec is enabled so
that the assembler doesn't require an explicit -maltivec option
to assemble any Altivec instructions for those targets where
the ".machine cpu" is insufficient to enable Altivec. For example,
-mcpu=G5 emits a ".machine power4".
2024-07-18 René Rebe <rene@exactcode.de>
Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/97367
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Do not consider
OPTION_MASK_ALTIVEC.
(emit_asm_machine): For Altivec compiles, emit a ".machine altivec".
gcc/testsuite/
PR target/97367
* gcc.target/powerpc/pr97367.c: New test.
Signed-off-by: René Rebe <rene@exactcode.de>
|
|
gcc/ChangeLog:
PR target/115843
* config/i386/predicates.md (const0_or_m1_operand): New
predicate.
* config/i386/sse.md (*<avx512>_store<mode>_mask_1): New
pre_reload define_insn_and_split.
(V): Add V32BF,V16BF,V8BF.
(V4SF_V8BF): Rename to ..
(V24F_128): .. this.
(*vec_concat<mode>): Adjust with V24F_128.
(*vec_concat<mode>_0): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115843.c: New test.
|
|
Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.
PR target/115526
gcc/ChangeLog:
* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_<tls>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/alpha/pr115526.c: New test.
|
|
This patch introduces a new insn that works as an insn combine
pattern for
(plus:HI (zero_extend:HI (reg:QI))
(const_0mod256_operand:HI))
which requires at most 2 instructions. When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled.)
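A hypothetical C example of source that can give rise to this pattern,
assuming a 256-byte aligned table (illustrative; not from the patch):
```c
/* &table[i] is symbol + zero_extend (i); with the symbol aligned to
   256 bytes its lo8 part is 0, so only the hi8 addition is needed.  */
extern const char table[256] __attribute__ ((aligned (256)));

char
get (unsigned char i)
{
  return table[i];
}
```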
gcc/
PR target/90616
* config/avr/predicates.md (const_0mod256_operand): New predicate.
* config/avr/constraints.md (Cp8): New constraint.
* config/avr/avr.md (*aligned_add_symbol): New insn.
* config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
When op2 is a multiple of 256, there is no need to add / subtract
the lo8 part.
(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
new insn *aligned_add_symbol as it applies.
|