|
Since commit 87cb9423add, vector alignment hints are emitted for target
z13, too. This patch changes this behaviour so that alignment hints are
emitted for target z13 only if the assembler accepts them.
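The shape of the resulting gate, as a sketch (the HAVE_AS_* macro comes
from the new configure check; using TARGET_Z14 as the fallback condition
is an assumption):

/* In s390.h; HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13 is defined
   by configure when the assembler accepts alignment hints for z13.
   The fallback condition below is an assumption.  */
#ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13
#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z13
#else
#define TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS TARGET_Z14
#endif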
gcc/ChangeLog:
* config.in: Regenerate.
* config/s390/s390.c (print_operand): Emit vector alignment hints
for target z13, if AS accepts them. For other targets the logic
stays the same.
* config/s390/s390.h (TARGET_VECTOR_LOADSTORE_ALIGNMENT_HINTS): Define
macro.
* configure: Regenerate.
* configure.ac: Check HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS_ON_Z13.
|
|
This patch fixes the MVE ACLE vaddq_m polymorphic variants by modifying the corresponding
intrinsic parameters and the vaddq_m polymorphic variant's _Generic case entries in the
"arm_mve.h" header file.
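A minimal usage sketch (assuming the corrected scalar parameter types;
compile with -march=armv8.1-m.main+mve):

#include <arm_mve.h>

/* The polymorphic vaddq_m resolves the vector/scalar mix below to
   __arm_vaddq_m_n_s8 via the fixed _Generic case entries.  */
int8x16_t add_masked (int8x16_t inactive, int8x16_t a, int8_t b,
                      mve_pred16_t p)
{
  return vaddq_m (inactive, a, b, p);
}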
2020-06-04 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
gcc/
* config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Correct the intrinsic
arguments.
(__arm_vaddq_m_n_s32): Likewise.
(__arm_vaddq_m_n_s16): Likewise.
(__arm_vaddq_m_n_u8): Likewise.
(__arm_vaddq_m_n_u32): Likewise.
(__arm_vaddq_m_n_u16): Likewise.
(__arm_vaddq_m): Modify polymorphic variant.
gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: New test.
|
|
This patch modifies the MVE scalar shift RTL patterns. The current patterns
have wrong constraints and predicates, due to which the values returned by
the MVE scalar shift instructions are overwritten in the generated code.
example:
$ cat x.c
#include <stdint.h>
#include <arm_mve.h>

int32_t foo(int64_t acc, int shift)
{
  return sqrshrl_sat48 (acc, shift);
}
Code-gen before applying this patch:
$ arm-none-eabi-gcc -march=armv8.1-m.main+mve -mfloat-abi=hard -O2 -S x.c
$ cat x.s
foo:
push {r4, r5}
sqrshrl r0, r1, #48, r2 ----> (a)
mov r0, r4 ----> (b)
pop {r4, r5}
bx lr
Code-gen after applying this patch:
foo:
sqrshrl r0, r1, #48, r2
bx lr
In the current compiler the return value (r0) from sqrshrl (a) is
overwritten by the mov instruction (b).
This patch fixes the above issue.
2020-06-12 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
gcc/
* config/arm/mve.md (mve_uqrshll_sat<supf>_di): Correct the predicate
and constraint of all the operands.
(mve_sqrshrl_sat<supf>_di): Likewise.
(mve_uqrshl_si): Likewise.
(mve_sqrshr_si): Likewise.
(mve_uqshll_di): Likewise.
(mve_urshrl_di): Likewise.
(mve_uqshl_si): Likewise.
(mve_urshr_si): Likewise.
(mve_sqshl_si): Likewise.
(mve_srshr_si): Likewise.
(mve_srshrl_di): Likewise.
(mve_sqshll_di): Likewise.
* config/arm/predicates.md (arm_low_register_operand): Define.
gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts1.c: New test.
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts2.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts3.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_scalar_shifts4.c: Likewise.
|
|
- riscv_gpr_save_operation_p might try to match a parallel on other
patterns, such as inline asm patterns, and then trigger the assertion
checking there, so we can turn it into an early-exit check.
gcc/ChangeLog:
PR target/95683
* config/riscv/riscv.c (riscv_gpr_save_operation_p): Remove
assertion and turn it into an early exit check.
gcc/testsuite/ChangeLog:
PR target/95683
* gcc.target/riscv/pr95683.c: New.
|
|
Remove TARGET_THREADPTR reference from TARGET_HAVE_TLS to avoid
static data initialization dependency on xtensa core configuration.
2020-06-15 Max Filippov <jcmvbkbc@gmail.com>
gcc/
* config/xtensa/xtensa.c (TARGET_HAVE_TLS): Remove
TARGET_THREADPTR reference.
(xtensa_tls_symbol_p, xtensa_tls_referenced_p): Use
targetm.have_tls instead of TARGET_HAVE_TLS.
(xtensa_option_override): Set targetm.have_tls to false in
configurations without THREADPTR.
|
|
2020-06-15 Max Filippov <jcmvbkbc@gmail.com>
gcc/
* config/xtensa/elf.h (ASM_SPEC, LINK_SPEC): Pass ABI switch to
assembler/linker.
* config/xtensa/linux.h (ASM_SPEC, LINK_SPEC): Ditto.
* config/xtensa/uclinux.h (ASM_SPEC, LINK_SPEC): Ditto.
* config/xtensa/xtensa.c (xtensa_option_override): Initialize
xtensa_windowed_abi if needed.
* config/xtensa/xtensa.h (TARGET_WINDOWED_ABI_DEFAULT): New
macro.
(TARGET_WINDOWED_ABI): Redefine to xtensa_windowed_abi.
* config/xtensa/xtensa.opt (xtensa_windowed_abi): New target
option variable.
(mabi=call0, mabi=windowed): New options.
* doc/invoke.texi: Document new -mabi= Xtensa-specific options.
gcc/testsuite/
* gcc.target/xtensa/mabi-call0.c: New test.
* gcc.target/xtensa/mabi-windowed.c: New test.
libgcc/
* configure: Regenerate.
* configure.ac: Use AC_COMPILE_IFELSE instead of manual
preprocessor invocation to check for __XTENSA_CALL0_ABI__.
|
|
Remove ABI reference from the ELIMINABLE_REGS to avoid static data
initialization dependency on xtensa core configuration.
2020-06-15 Max Filippov <jcmvbkbc@gmail.com>
gcc/
* config/xtensa/xtensa.c (xtensa_can_eliminate): New function.
(TARGET_CAN_ELIMINATE): New macro.
* config/xtensa/xtensa.h
(XTENSA_WINDOWED_HARD_FRAME_POINTER_REGNUM)
(XTENSA_CALL0_HARD_FRAME_POINTER_REGNUM): New macros.
(HARD_FRAME_POINTER_REGNUM): Define using
XTENSA_*_HARD_FRAME_POINTER_REGNUM.
(ELIMINABLE_REGS): Replace lines with HARD_FRAME_POINTER_REGNUM
by lines with XTENSA_WINDOWED_HARD_FRAME_POINTER_REGNUM and
XTENSA_CALL0_HARD_FRAME_POINTER_REGNUM.
|
|
gcc/ChangeLog:
* config/riscv/riscv.c (riscv_gen_gpr_save_insn): Change type to
unsigned for i.
(riscv_gpr_save_operation_p): Change type to unsigned for i and
len.
|
|
2020-06-13 Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog:
PR target/95488
* config/i386/i386-expand.c (ix86_expand_vecmul_qihi): New
function.
* config/i386/i386-protos.h (ix86_expand_vecmul_qihi): Declare.
* config/i386/sse.md (mul<mode>3): Drop mask_name since
there's no real simd int8 multiplication instruction with
mask. Also optimize it under TARGET_AVX512BW.
(mulv8qi3): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-pr95488-1.c: New test.
* gcc.target/i386/avx512bw-pr95488-2.c: Ditto.
* gcc.target/i386/avx512vl-pr95488-1.c: Ditto.
* gcc.target/i386/avx512vl-pr95488-2.c: Ditto.
|
|
Currently the patchable area is placed at the wrong spot: immediately
after the function label, before both .cfi_startproc and ENDBR. This patch
adds UNSPECV_PATCHABLE_AREA for a pseudo patchable area instruction and
changes the ENDBR insertion pass to also insert the patchable area
instruction. TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY is defined to avoid
placing the patchable area before .cfi_startproc and ENDBR.
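A small example of the interaction being fixed (patchable_function_entry
and -fcf-protection=branch are standard GCC features; this particular flag
combination is an assumption):

/* Build with -fcf-protection=branch: with this patch the two NOPs
   requested below are emitted after .cfi_startproc and the ENDBR,
   not immediately after the function label.  */
__attribute__((patchable_function_entry(2)))
void hook (void)
{
}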
gcc/
PR target/93492
* config/i386/i386-features.c (rest_of_insert_endbranch):
Renamed to ...
(rest_of_insert_endbr_and_patchable_area): Change return type
to void. Add need_endbr and patchable_area_size arguments.
Don't call timevar_push nor timevar_pop. Replace
endbr_queued_at_entrance with insn_queued_at_entrance. Insert
UNSPECV_PATCHABLE_AREA for patchable area.
(pass_data_insert_endbranch): Renamed to ...
(pass_data_insert_endbr_and_patchable_area): This. Change
pass name to endbr_and_patchable_area.
(pass_insert_endbranch): Renamed to ...
(pass_insert_endbr_and_patchable_area): This. Add need_endbr
and patchable_area_size;.
(pass_insert_endbr_and_patchable_area::gate): Set and check
need_endbr and patchable_area_size.
(pass_insert_endbr_and_patchable_area::execute): Call
timevar_push and timevar_pop. Pass need_endbr and
patchable_area_size to rest_of_insert_endbr_and_patchable_area.
(make_pass_insert_endbranch): Renamed to ...
(make_pass_insert_endbr_and_patchable_area): This.
* config/i386/i386-passes.def: Replace pass_insert_endbranch
with pass_insert_endbr_and_patchable_area.
* config/i386/i386-protos.h (ix86_output_patchable_area): New.
(make_pass_insert_endbranch): Renamed to ...
(make_pass_insert_endbr_and_patchable_area): This.
* config/i386/i386.c (ix86_asm_output_function_label): Set
function_label_emitted to true.
(ix86_print_patchable_function_entry): New function.
(ix86_output_patchable_area): Likewise.
(x86_function_profiler): Replace endbr_queued_at_entrance with
insn_queued_at_entrance. Generate ENDBR only for TYPE_ENDBR.
Call ix86_output_patchable_area to generate patchable area if
needed.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): New.
* config/i386/i386.h (queued_insn_type): New.
(machine_function): Add function_label_emitted. Replace
endbr_queued_at_entrance with insn_queued_at_entrance.
* config/i386/i386.md (UNSPECV_PATCHABLE_AREA): New.
(patchable_area): New.
gcc/testsuite/
PR target/93492
* gcc.target/i386/pr93492-1.c: New test.
* gcc.target/i386/pr93492-2.c: Likewise.
* gcc.target/i386/pr93492-3.c: Likewise.
* gcc.target/i386/pr93492-4.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.
|
|
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_density_test): Fix GNU coding
style.
|
|
gcc/ChangeLog:
PR target/95627
* config/rs6000/rs6000.c (rs6000_density_test): Skip debug
statements.
|
|
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_output_gpr_save): Remove.
* config/riscv/riscv-sr.c (riscv_sr_match_prologue): Update
value.
* config/riscv/riscv.c (riscv_output_gpr_save): Remove.
* config/riscv/riscv.md (gpr_save): Update output asm pattern.
|
|
- Verified on rv32emc/rv32gc/rv64gc bare-metal target and rv32gc/rv64gc
linux target with qemu.
gcc/ChangeLog:
* config/riscv/predicates.md (gpr_save_operation): New.
* config/riscv/riscv-protos.h (riscv_gen_gpr_save_insn): New.
(riscv_gpr_save_operation_p): Ditto.
* config/riscv/riscv-sr.c (riscv_remove_unneeded_save_restore_calls):
Ignore USEs for the gpr_save pattern.
* config/riscv/riscv.c (gpr_save_reg_order): New.
(riscv_expand_prologue): Use riscv_gen_gpr_save_insn to gen gpr_save.
(riscv_gen_gpr_save_insn): New.
(riscv_gpr_save_operation_p): Ditto.
* config/riscv/riscv.md (S3_REGNUM): New.
(S4_REGNUM): Ditto.
(S5_REGNUM): Ditto.
(S6_REGNUM): Ditto.
(S7_REGNUM): Ditto.
(S8_REGNUM): Ditto.
(S9_REGNUM): Ditto.
(S10_REGNUM): Ditto.
(S11_REGNUM): Ditto.
(gpr_save): Model USEs correctly.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr95252.c: New.
|
|
When registering the tuple type in register_tuple_type, the
TYPE_ALIGN (tuple_type) will be changed by -fpack-struct=n. We need to
maintain natural alignment in handle_arm_sve_h.
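A reproducer sketch (the exact -fpack-struct value is an assumption):

/* Compile with e.g. -fpack-struct=2 and SVE enabled: the tuple types
   declared by arm_sve.h must keep their natural alignment even though
   user structs are being packed.  */
#include <arm_sve.h>

svint32x2_t pass_through (svint32x2_t x)
{
  return x;
}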
2020-06-10 Haijian Zhang <z.zhanghaijian@huawei.com>
gcc/
PR target/95523
* config/aarch64/aarch64-sve-builtins.h
(sve_switcher::m_old_maximum_field_alignment): New member.
* config/aarch64/aarch64-sve-builtins.cc
(sve_switcher::sve_switcher): Save maximum_field_alignment in
m_old_maximum_field_alignment and clear maximum_field_alignment.
(sve_switcher::~sve_switcher): Restore maximum_field_alignment.
gcc/testsuite/
PR target/95523
* gcc.target/aarch64/sve/pr95523.c: New test.
|
|
The cost model currently treats multiplication by element as more
expensive than a three-same multiplication. This means that if the value
is on the SIMD side we add an unneeded DUP, and if the value is on the
genreg side we use the more expensive DUP instead of fmov.
This patch corrects the costs so that the two multiplies are costed the
same, which allows us to generate
fmul v3.4s, v3.4s, v0.s[0]
instead of
dup v0.4s, v0.s[0]
fmul v3.4s, v3.4s, v0.4s
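Source-level sketch of the case in question (function name is
illustrative):

#include <arm_neon.h>

/* Multiplying a vector by a scalar: with the corrected costs the
   by-element fmul is chosen and no extra DUP is emitted.  */
float32x4_t scale (float32x4_t v, float s)
{
  return vmulq_n_f32 (v, s);
}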
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Adjust costs for mul.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/asimd-mull-elem.c: New test.
|
|
This patch adds support for the two new HWCAP2 fields used by the
__builtin_cpu_supports function. It adds support in the target_clones
attribute for -mcpu=future.
The two new __builtin_cpu_supports tests are:
__builtin_cpu_supports ("isa_3_1")
__builtin_cpu_supports ("mma")
The bits used are the bits that the Linux kernel engineers will be using for
these new features.
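For example (a sketch; the feature strings are the ones added by this
patch):

#include <stdio.h>

int main (void)
{
  if (__builtin_cpu_supports ("isa_3_1"))
    puts ("ISA 3.1 available");
  if (__builtin_cpu_supports ("mma"))
    puts ("MMA available");
  return 0;
}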
gcc/
2020-06-05 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/ppc-auxv.h (PPC_PLATFORM_FUTURE): Allocate
'future' PowerPC platform.
(PPC_FEATURE2_ARCH_3_1): New HWCAP2 bit for ISA 3.1.
(PPC_FEATURE2_MMA): New HWCAP2 bit for MMA.
* config/rs6000/rs6000-call.c (cpu_supports_info): Add ISA 3.1 and
MMA HWCAP2 bits.
* config/rs6000/rs6000.c (CLONE_ISA_3_1): New clone support.
(rs6000_clone_map): Add 'future' system target_clones support.
|
|
MD patterns extended for unary ops ABS, CLS, CLZ, CNT, NEG and NOT
to support unpacked vectors. Also extended patterns for BIC to
support unpacked vectors where input elements are of the same width.
gcc/ChangeLog:
2020-06-09 Joe Ramsay <joe.ramsay@arm.com>
* config/aarch64/aarch64-sve.md (<optab><mode>2): Add support for
unpacked vectors.
(@aarch64_pred_<optab><mode>): Add support for unpacked vectors.
(@aarch64_bic<mode>): Enable unpacked BIC.
(*bic<mode>3): Enable unpacked BIC.
gcc/testsuite/ChangeLog:
2020-06-09 Joe Ramsay <joe.ramsay@arm.com>
* gcc.target/aarch64/sve/logical_unpacked_abs.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_bic_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_bic_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_bic_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_bic_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_neg.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_not.c: New test.
|
|
This expands the comment on an assert we have in aarch64_layout_frame
and points to an existing comment somewhere else that has a much longer
explanation of what's going on.
Committed under the GCC Obvious rule.
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_layout_frame): Expand comments.
|
|
While looking at PR target/94743 I noticed an ICE when I tried to save
all the FP registers: this was because all HI registers wouldn't match
vfp_register_operand.
gcc/ChangeLog:
* config/arm/predicates.md (vfp_register_operand): Use VFP_HI_REGS
instead of VFP_REGS.
|
|
gcc/ChangeLog:
* config/rs6000/vector.md: Replace FAIL with gcc_unreachable
in all vcond* patterns.
|
|
GCC currently hides the shift and xor reduction inside a backend
specific UNSPEC PARITY, making it invisible to the RTL optimizers until
very late during compilation. It is normally reasonable for the
middle-end to maintain wider mode representations for as long as possible
and split them later, but this only helps if the semantics are visible
at the RTL level (to combine and other passes); UNSPECs are black
boxes, so in this case splitting early (during RTL expansion) is a
better strategy.
It turns out that the popcount instruction on modern x86_64 processors
has (almost) made the integer parity flag in the x86 ALU completely
obsolete, especially as POPCOUNT's integer semantics are a much better
fit to RTL. The one remaining case where these transistors are useful
is where __builtin_parity is immediately tested by a conditional branch,
and therefore the result is wanted in a flags register rather than as
an integer. This case is captured by two peephole2 optimizations in
the attached patch.
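The two situations, as a sketch:

/* The result is wanted as an integer, so the early shift/xor
   expansion (visible to the RTL optimizers) applies.  */
int parity_value (unsigned int x)
{
  return __builtin_parity (x);
}

/* The result only feeds a branch, so the flags-based form that the
   new peephole2s recover is preferable.  */
extern void fixup (void);
void maybe_fixup (unsigned int x)
{
  if (__builtin_parity (x))
    fixup ();
}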
2020-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog:
* config/i386/i386.md (paritydi2, paritysi2): Expand reduction
via shift and xor to an UNSPEC PARITY matching a parityhi2_cmp.
(paritydi2_cmp, paritysi2_cmp): Delete these define_insn_and_split.
(parityhi2, parityqi2): New expanders.
(parityhi2_cmp): Implement set parity flag with xorb insn.
(parityqi2_cmp): Implement set parity flag with testb insn.
New peephole2s to use these insns (UNSPEC PARITY) when appropriate.
gcc/testsuite/ChangeLog:
* gcc.target/i386/parity-3.c: New test.
* gcc.target/i386/parity-4.c: Likewise.
* gcc.target/i386/parity-5.c: Likewise.
* gcc.target/i386/parity-6.c: Likewise.
* gcc.target/i386/parity-7.c: Likewise.
* gcc.target/i386/parity-8.c: Likewise.
* gcc.target/i386/parity-9.c: Likewise.
|
|
Previously, flag_unroll_loops was turned on at -O2 implicitly. This
also turned on cunroll with its size-increase allowance, and cunroll
would unroll/peel a loop even when the loop is complex, like the code
in PR95018. With this patch, size growth for cunroll is allowed only if
-funroll-loops or -fpeel-loops or -O3 is specified explicitly.
gcc/ChangeLog
2020-06-07 Jiufu Guo <guojiufu@linux.ibm.com>
PR target/95018
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Override flag_cunroll_grow_size.
|
|
In January I added patterns to optimize SImode -> DImode sign or zero
extension of __builtin_popcount, this patch does the same for
__builtin_c[lt]z. Like most other instructions, the [tl]zcntl instructions
clear the upper 32 bits of the destination register and as the instructions
only result in values 0 to 32 inclusive, both sign and zero extensions
behave the same.
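A sketch of the pattern now matched (x is assumed non-zero, since
__builtin_ctz(0) is undefined):

/* With -mbmi, tzcntl already clears the upper 32 bits of the
   destination, so the SImode -> DImode extension is free.  */
unsigned long long tz (unsigned int x)
{
  return (unsigned int) __builtin_ctz (x);
}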
2020-06-05 Jakub Jelinek <jakub@redhat.com>
PR target/95535
* config/i386/i386.md (*ctzsi2_zext, *clzsi2_lzcnt_zext): New
define_insn_and_split patterns.
(*ctzsi2_zext_falsedep, *clzsi2_lzcnt_zext_falsedep): New
define_insn patterns.
* gcc.target/i386/pr95535-1.c: New test.
* gcc.target/i386/pr95535-2.c: New test.
|
|
2020-06-05 Lili Cui <lili.cui@intel.com>
gcc/ChangeLog:
PR target/95525
* config/i386/i386.h (PTA_WAITPKG): Change bitmask value.
|
|
This patch fixes a latent bug exposed by
eb72dc663e9070b281be83a80f6f838a3a878822 in the aarch64 backend that was
causing wrong codegen and several testsuite failures. See the discussion
on the bug for details.
Bootstrapped and regtested on aarch64-linux-gnu. This cleaned up several
failing tests, and no new failures were introduced.
2020-06-04 Richard Biener <rguenther@suse.de>
gcc/:
* config/aarch64/aarch64.c (aarch64_gimplify_va_arg_expr):
Ensure that tmp_ha is marked TREE_ADDRESSABLE.
|
|
This patch fixes the MVE ACLE vector scatter store intrinsics (PR94735).
The operands in the RTL patterns of the MVE vector scatter store intrinsics
are wrongly grouped, because of which a few vector load and store
instructions are wrongly optimized out at -O2.
A new predicate "mve_scatter_memory" is defined in this patch; it returns
TRUE on matching (mem (reg)) for the MVE scatter store intrinsics.
This patch fixes the issue by turning the define_insn patterns into
define_expand patterns with the "mve_scatter_memory" predicate, which call
the corresponding define_insn, passing a register_operand as the first
argument. This register_operand is extracted from the operand matched by
the "mve_scatter_memory" predicate in the define_expand pattern.
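A reproducer sketch (intrinsic from the MVE ACLE; compile at -O2 with MVE
enabled):

#include <arm_mve.h>

/* The scatter store below must not be optimized out at -O2.  */
void scatter (uint8_t *base, uint8x16_t offsets, uint8x16_t value)
{
  vstrbq_scatter_offset_u8 (base, offsets, value);
}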
gcc/ChangeLog:
2020-06-01 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94735
* config/arm/predicates.md (mve_scatter_memory): Define to
match (mem (reg)) for scatter store memory.
* config/arm/mve.md (mve_vstrbq_scatter_offset_<supf><mode>): Modify
define_insn to define_expand.
(mve_vstrbq_scatter_offset_p_<supf><mode>): Likewise.
(mve_vstrhq_scatter_offset_<supf><mode>): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Likewise.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>): Likewise.
(mve_vstrdq_scatter_offset_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_offset_<supf>v2di): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Likewise.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di): Likewise.
(mve_vstrhq_scatter_offset_fv8hf): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf): Likewise.
(mve_vstrwq_scatter_offset_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_offset_<supf>v4si): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Likewise.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si): Likewise.
(mve_vstrbq_scatter_offset_<supf><mode>_insn): Define insn for scatter
stores.
(mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_offset_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Likewise.
(mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_offset_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Likewise.
(mve_vstrhq_scatter_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Likewise.
(mve_vstrwq_scatter_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Likewise.
gcc/testsuite/ChangeLog:
2020-06-01 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
PR target/94735
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base.c: New test.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset.c:
Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset_p.c:
Likewise.
|
|
The following MVE intrinsic testcases are failing in the GCC testsuite.
Directory: gcc.target/arm/mve/intrinsics/
Testcases: vbicq_f16.c, vbicq_f32.c, vbicq_s16.c, vbicq_s32.c, vbicq_s8.c,
vbicq_u16.c, vbicq_u32.c and vbicq_u8.c.
This patch fixes the vbicq intrinsics by modifying the intrinsic parameters
and polymorphic variants in the "arm_mve.h" header file.
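A usage sketch (the immediate form; the constant is illustrative):

#include <arm_mve.h>

/* With the corrected parameters the polymorphic vbicq maps the call
   below to __arm_vbicq_n_u16.  */
uint16x8_t clear_high_bytes (uint16x8_t a)
{
  return vbicq (a, 0xff00);
}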
gcc/ChangeLog:
2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* config/arm/arm_mve.h (__arm_vbicq_n_u16): Correct the intrinsic
arguments.
(__arm_vbicq_n_s16): Likewise.
(__arm_vbicq_n_u32): Likewise.
(__arm_vbicq_n_s32): Likewise.
(__arm_vbicq): Modify polymorphic variant.
gcc/testsuite/ChangeLog:
2020-05-20 Srinath Parvathaneni <srinath.parvathaneni@arm.com>
* gcc.target/arm/mve/intrinsics/vbicq_f16.c: Modify.
* gcc.target/arm/mve/intrinsics/vbicq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vbicq_u8.c: Likewise.
|
|
When the destination is memory, zero-masking is not valid; only
merging-masking is available.
2020-06-24 Hongtao Liu <hongtao.liu@inte.com>
gcc/ChangeLog:
PR target/95254
* config/i386/sse.md (*vcvtps2ph_store<merge_mask_name>):
Refine from *vcvtps2ph_store<mask_name>.
(vcvtps2ph256<mask_name>): Refine constraint from vm to v.
(<mask_codefor>avx512f_vcvtps2ph512<mask_name>): Ditto.
(*vcvtps2ph256<merge_mask_name>): New define_insn.
(*avx512f_vcvtps2ph512<merge_mask_name>): Ditto.
* config/i386/subst.md (merge_mask): New define_subst.
(merge_mask_name): New define_subst_attr.
(merge_mask_operand3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-vcvtps2ph-pr95254.c: New test.
* gcc.target/i386/avx512vl-vcvtps2ph-pr95254.c: Ditto.
|
|
When TARGET_VTABLE_USES_DESCRIPTORS is defined then function pointers in
the vtable are output by ASM_OUTPUT_FDESC. The only current user of
this is ia64, but its implementation of ASM_OUTPUT_FDESC lacks a call to
assemble_external. Thus if there is no other reference to the function
the weak declaration for it will be missing.
PR target/95154
* config/ia64/ia64.h (ASM_OUTPUT_FDESC): Call assemble_external.
|
|
2020-06-04 Hongtao.liu <hongtao.liu@intel.com>
gcc/ChangeLog:
* config/i386/sse.md (pmov_dst_3_lower): New mode attribute.
(trunc<mode><pmov_dst_3_lower>2): Refine from
trunc<mode><pmov_dst_3>2.
gcc/testsuite
* gcc.target/i386/pr92658-avx512bw-trunc.c: Adjust testcase.
|
|
The same problem also arises for plfs, where prefixed_load_p()
doesn't recognize it, so we get just lfs in the asm output
with an @pcrel address.
PR target/95347
* config/rs6000/rs6000.c (is_stfs_insn): Rename to
is_lfs_stfs_insn and make it recognize lfs as well.
(prefixed_store_p): Use is_lfs_stfs_insn().
(prefixed_load_p): Use is_lfs_stfs_insn() to recognize lfs.
|
|
In aarch64_short_vector_p, we are simply checking whether a type (and a mode)
is a 64/128-bit short vector or not. This should not be affected by the value
of TARGET_SVE. Simply leave later code to report an error if SVE is disabled.
2020-06-02 Felix Yang <felix.yang@huawei.com>
gcc/
PR target/95459
* config/aarch64/aarch64.c (aarch64_short_vector_p):
Leave later code to report an error if SVE is disabled.
gcc/testsuite/
PR target/95459
* gcc.target/aarch64/mgeneral-regs_6.c: New test.
|
|
This patch adds support for the Arm Zeus CPU.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
2020-06-02 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* config/aarch64/aarch64-cores.def (zeus): Define.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document zeus -mcpu option.
|
|
Because reg_to_non_prefixed() only looks at the register being used, it
doesn't get the right answer for stfs, which leads to us not seeing
that it has a PCREL symbol ref. This patch works around this by
introducing a helper function that inspects the insn to see if it is in
fact a stfs. Then if we use NON_PREFIXED_DEFAULT, address_to_insn_form()
can see that it has the PCREL symbol ref.
gcc/ChangeLog:
PR target/95347
* config/rs6000/rs6000.c (prefixed_store_p): Add special case
for stfs.
(is_stfs_insn): New helper function.
|
|
This patch removes the obsolete -mlocal-symbol-id option. This was used to
control mangling of local symbol names in order to work around a ROCm runtime
bug, but that has not been needed in some time, and the mangling was removed
already.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (CC1_SPEC): Delete.
* config/gcn/gcn.opt (-mlocal-symbol-id): Delete.
* config/gcn/mkoffload.c (main): Don't use -mlocal-symbol-id.
gcc/testsuite/ChangeLog:
* gcc.dg/intermod-1.c: Don't use -mlocal-symbol-id.
|
|
2020-06-02 Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
gcc/ChangeLog:
* config/s390/s390.c (print_operand): Emit vector alignment
hints for z13.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/align-1.c: Change target architecture
to z13.
* gcc.target/s390/vector/align-2.c: Change target architecture
to z13.
|
|
The brabs and brabc patterns are buggy and lead to incorrect code
generation.
* config/h8300/jumpcall.md (brabs, brabc): Disable patterns.
|
|
This is potentially a sequence of 3 shifts, which we optimize to a
sequence of 2 shifts. This can happen when unsigned int is used for
array indexing, as in the sketch below.
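A typical source of this sequence (a sketch; the naive shift counts are
an illustration):

/* On rv64, p[i] needs zext(i) << 3; naively that is three shifts
   (slli 32, srli 32, slli 3), which the new pattern emits as two
   (slli 32, srli 29).  */
long load (long *p, unsigned int i)
{
  return p[i];
}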
gcc/
* config/riscv/riscv.md (zero_extendsidi2_shifted): New.
gcc/testsuite/
* gcc.target/riscv/zero-extend-5.c: New.
|
|
This is necessary as libmsvcrt.a is not a pure import library, but
also contains some functions that invoke others in KERNEL32.DLL.
gcc/
* config/i386/mingw32.h (REAL_LIBGCC_SPEC): Insert -lkernel32
after -lmsvcrt. This is necessary as libmsvcrt.a is not a pure
import library, but also contains some functions that invoke
others in KERNEL32.DLL.
Signed-off-by: Liu Hao <lh_mouse@126.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
There are various VSX insns that do the same job as (older) AltiVec
insns, just with a wider range of possible registers. Many patterns
for such insns have the "v" alternative before the "wa" alternative,
which makes the output less readable than it could be (since vs32 is v0,
and most insns before or after this insn will be VSX as well).
This changes the define_insns for the mrg and perm machine instructions
to prefer the VSX form. No behaviour change. Only one testcase needed
a little adjustment as well.
2020-05-29 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/altivec.md (altivec_vmrghw_direct): Prefer VSX form.
(altivec_vmrglw_direct): Ditto.
(altivec_vperm_<mode>_direct): Ditto.
(altivec_vperm_v8hiv16qi): Ditto.
(*altivec_vperm_<mode>_uns_internal): Ditto.
(*altivec_vpermr_<mode>_internal): Ditto.
(vperm_v8hiv4si): Ditto.
(vperm_v16qiv8hi): Ditto.
gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6.p9.c: Allow xxperm as perm as well.
|
|
gcc/ChangeLog:
2020-05-28 Andrew Stubbs <ams@codesourcery.com>
* config/gcn/gcn-valu.md (add<mode>3_vcc_zext_dup): Add early clobber.
(add<mode>3_vcc_zext_dup_exec): Likewise.
(add<mode>3_vcc_zext_dup2): Likewise.
(add<mode>3_vcc_zext_dup2_exec): Likewise.
|
|
Extended the patterns for EOR, ORR and AND to support unpacked vectors.
BIC will have to wait, as there is currently no support for unpacked
NOT.
2020-05-29 Joe Ramsay <joe.ramsay@arm.com>
gcc/
* config/aarch64/aarch64-sve.md (<LOGICAL:optab><mode>3): Add support
for unpacked EOR, ORR, AND.
gcc/testsuite/
* gcc.target/aarch64/sve/load_const_offset_2.c: Force using packed
vectors.
* gcc.target/aarch64/sve/logical_unpacked_and_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_5.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_6.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_and_7.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_5.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_6.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_eor_7.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_1.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_2.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_3.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_4.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_5.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_6.c: New test.
* gcc.target/aarch64/sve/logical_unpacked_orr_7.c: New test.
* gcc.target/aarch64/sve/scatter_store_6.c: Force using packed vectors.
* gcc.target/aarch64/sve/scatter_store_7.c: Force using packed vectors.
* gcc.target/aarch64/sve/strided_load_3.c: Force using packed vectors.
* gcc.target/aarch64/sve/strided_store_3.c: Force using packed vectors.
* gcc.target/aarch64/sve/unpack_signed_1.c: Force using packed vectors.
|
|
* config/h8300/logical.md (bclrhi_msx): Remove pattern.
|
|
* config/h8300/logical.md (HImode H8/SX bit-and splitter): Don't
make a nonzero adjustment to the memory offset.
(b<ior,xor>hi_msx): Turn into a splitter.
|
|
wb_candidate1 and wb_candidate2 exist for two overlapping cases:
when we use an STR or STP with writeback to allocate the frame,
and when we set up a frame chain record (either using writeback
allocation or not).
However, aarch64_layout_frame was leaving these fields with
legitimate register numbers even if we decided to do neither
of those things. This prevented those registers from being
shrink-wrapped, even though we were otherwise treating them
as normal saves and restores.
The case this patch handles isn't the common case, so it might
not be worth going out of our way to optimise it. But I think
the patch actually makes the output of aarch64_layout_frame more
consistent.
2020-05-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* config/aarch64/aarch64.h (aarch64_frame): Add a comment above
wb_candidate1 and wb_candidate2.
* config/aarch64/aarch64.c (aarch64_layout_frame): Invalidate
wb_candidate1 and wb_candidate2 if we decided not to use them.
gcc/testsuite/
* gcc.target/aarch64/shrink_wrap_1.c: New test.
|
|
The stack frame for the function in the testcase consisted of two
SVE save slots. Both saves had been shrink-wrapped, but for different
blocks, meaning that the stack allocation and deallocation were
separate from the saves themselves. Before emitting the deallocation,
we tried to attach a REG_CFA_DEF_CFA note to the preceding instruction,
to redefine the CFA in terms of the stack pointer. But in this case
there was no preceding instruction.
This in practice only happens for SVE because:
(a) We don't try to shrink-wrap wb_candidate* registers even when
we've decided to treat them as normal saves and restores.
I have a fix for that.
(b) Even with (a) fixed, we're (almost?) guaranteed to emit
a stack tie for frames that are 64k or larger, so we end
up hanging the REG_CFA_DEF_CFA note on that instead.
We should only need to redefine the CFA if it was previously
defined in terms of the frame pointer. In other cases the CFA
should already be defined in terms of the stack pointer,
so redefining it is unnecessary but usually harmless.
2020-05-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR testsuite/95361
* config/aarch64/aarch64.c (aarch64_expand_epilogue): Assert that
we have at least some CFI operations when using a frame pointer.
Only redefine the CFA if we have CFI operations.
gcc/testsuite/
PR testsuite/95361
* gcc.target/aarch64/sve/pr95361.c: New test.
|
|
gcc/ChangeLog
2020-05-28 Andrea Corallo <andrea.corallo@arm.com>
* config/arm/arm.c (mve_vector_mem_operand): Fix unwanted
fall-throughs.
|
|
According to the Intel SDM, VPMOVQB xmm1/m16 {k1}{z}, xmm2 has a 16-bit
memory operand, not the 128-bit one used in the current implementation.
The same applies to the other vpmov instructions whose memory operands
are narrower than 128 bits.
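For instance, with the adjusted builtin types (compile with -mavx512vl;
the function name is illustrative):

#include <immintrin.h>

/* VPMOVQB to memory writes only 2 bytes (16 bits), so the masked
   store below must be modeled with a 16-bit memory operand.  */
void store_2qi (void *p, __mmask8 k, __m128i a)
{
  _mm_mask_cvtepi64_storeu_epi8 (p, k, a);
}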
2020-05-25 Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
* config/i386/sse.md (*avx512vl_<code>v2div2qi2_store_1): Rename
from *avx512vl_<code>v2div2qi_store and refine memory size of
the pattern.
(*avx512vl_<code>v2div2qi2_mask_store_1): Ditto.
(*avx512vl_<code><mode>v4qi2_store_1): Ditto.
(*avx512vl_<code><mode>v4qi2_mask_store_1): Ditto.
(*avx512vl_<code><mode>v8qi2_store_1): Ditto.
(*avx512vl_<code><mode>v8qi2_mask_store_1): Ditto.
(*avx512vl_<code><mode>v4hi2_store_1): Ditto.
(*avx512vl_<code><mode>v4hi2_mask_store_1): Ditto.
(*avx512vl_<code>v2div2hi2_store_1): Ditto.
(*avx512vl_<code>v2div2hi2_mask_store_1): Ditto.
(*avx512vl_<code>v2div2si2_store_1): Ditto.
(*avx512vl_<code>v2div2si2_mask_store_1): Ditto.
(*avx512f_<code>v8div16qi2_store_1): Ditto.
(*avx512f_<code>v8div16qi2_mask_store_1): Ditto.
(*avx512vl_<code>v2div2qi2_store_2): New define_insn_and_split.
(*avx512vl_<code>v2div2qi2_mask_store_2): Ditto.
(*avx512vl_<code><mode>v4qi2_store_2): Ditto.
(*avx512vl_<code><mode>v4qi2_mask_store_2): Ditto.
(*avx512vl_<code><mode>v8qi2_store_2): Ditto.
(*avx512vl_<code><mode>v8qi2_mask_store_2): Ditto.
(*avx512vl_<code><mode>v4hi2_store_2): Ditto.
(*avx512vl_<code><mode>v4hi2_mask_store_2): Ditto.
(*avx512vl_<code>v2div2hi2_store_2): Ditto.
(*avx512vl_<code>v2div2hi2_mask_store_2): Ditto.
(*avx512vl_<code>v2div2si2_store_2): Ditto.
(*avx512vl_<code>v2div2si2_mask_store_2): Ditto.
(*avx512f_<code>v8div16qi2_store_2): Ditto.
(*avx512f_<code>v8div16qi2_mask_store_2): Ditto.
* config/i386/i386-builtin-types.def: Adjust builtin type.
* config/i386/i386-expand.c: Ditto.
* config/i386/i386-builtin.def: Adjust builtin.
* config/i386/avx512fintrin.h: Ditto.
* config/i386/avx512vlbwintrin.h: Ditto.
* config/i386/avx512vlintrin.h: Ditto.
|
|
This fixes 'non-delegitimized UNSPEC 3 found in variable location' notes
issued when building libraries, which interfere with running tests.
2020-05-27 Max Filippov <jcmvbkbc@gmail.com>
gcc/
* config/xtensa/xtensa.c (xtensa_delegitimize_address): New
function.
(TARGET_DELEGITIMIZE_ADDRESS): New macro.
|