|
Fix a bug in vector.md which generated incorrect information for the
VSETVL pass when testing FMA auto-vectorization (ternop-3.c).
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/vector.md: Fix vimuladd instruction bug.
|
|
Currently, mode switching produces incorrect codegen for the following case:
void fn (void);
void f (void * in, void *out, int32_t x, int n, int m)
{
for (int i = 0; i < n; i++) {
vint32m1_t v = __riscv_vle32_v_i32m1 (in + i, 4);
vint32m1_t v2 = __riscv_vle32_v_i32m1_tu (v, in + 100 + i, 4);
vint32m1_t v3 = __riscv_vaadd_vx_i32m1 (v2, 0, VXRM_RDN, 4);
fn ();
v3 = __riscv_vaadd_vx_i32m1 (v3, 3, VXRM_RDN, 4);
__riscv_vse32_v_i32m1 (out + 100 + i, v3, 4);
}
}
Before this patch:
Preheader:
...
csrwi vxrm,2
Loop Body:
... (no csrwi vxrm,2)
vaadd.vx
...
vaadd.vx
...
This codegen is incorrect.
After this patch:
Preheader:
...
csrwi vxrm,2
Loop Body:
...
vaadd.vx
...
csrwi vxrm,2
...
vaadd.vx
...
cross-compile build PASS and regression PASS.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv.cc (global_state_unknown_p): New function.
(riscv_mode_after): Fix incorrect VXRM.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vxrm-11.c: New test.
* gcc.target/riscv/rvv/base/vxrm-12.c: New test.
|
|
This patch adds a new sub-extension (aka ZVFHMIN) to the
-march= option. To keep it simple, only the sub-extension itself is
handled in this patch; the underlying FP16-related RVV intrinsic
API will depend on TARGET_ZVFHMIN.
The Zvfhmin extension depends on the Zve32f extension. More information
about ZVFHMIN can be found in the spec document below.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#zvfhmin-vector-extension-for-minimal-half-precision-floating-point
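A minimal sketch of the kind of test this enables; the -march string and the
__riscv_zvfhmin feature-test macro are assumptions here, not necessarily what
arch-20.c/predef-26.c actually contain:
/* { dg-do compile } */
/* { dg-options "-march=rv64gc_zvfhmin -mabi=lp64d" } */
#ifndef __riscv_zvfhmin
#error "__riscv_zvfhmin is not defined"
#endif
int main (void) { return 0; }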
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfhmin item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZVFHMIN): New macro.
(TARGET_ZFHMIN): Align indent.
(TARGET_ZFH): Ditto.
(TARGET_ZVFHMIN): New macro.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-20.c: New test.
* gcc.target/riscv/predef-26.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
r12-5595-gc39d77f252e895306ef88c1efb3eff04e4232554 adds 2 splitters to
transform notl + pbroadcast + pand into pbroadcast + pandn for
VI124_AVX2, which leaves out all DI-element-size modes as
well as all 512-bit ones.
This patch extends the splitters to VI_AVX2, which handles DImode for
AVX2, and V64QImode, V32HImode, V16SImode and V8DImode for AVX512.
gcc/ChangeLog:
PR target/100711
* config/i386/sse.md (*andnot<mode>3): Extend below splitter
to VI_AVX2 to cover more modes.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr100711-2.c: Add v4di/v2di testcases.
* gcc.target/i386/pr100711-3.c: New test.
|
|
The lzcnt/tzcnt false dependency has been fixed since Skylake; popcnt has been
fixed since Icelake. At least for Icelake and later Intel Core processors, the
errata tune is not needed, and the tune isn't needed for ATOM either.
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI):
Remove ATOM and ICELAKE(and later) core processors.
|
|
This patch implements abs<mode>2, vneg<mode>2 and vnot<mode>2
expanders for integer vector registers and adds tests for them.
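For illustration, loops of roughly this shape are what the new expanders let
the vectorizer handle (a sketch, not the contents of the added tests):
/* Unary negation, bitwise not and absolute value.  */
void vneg (int *restrict dst, int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = -src[i];
}
void vnot (int *restrict dst, int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = ~src[i];
}
void vabs (int *restrict dst, int *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] < 0 ? -src[i] : src[i];
}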
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode>2): Add vneg/vnot.
(abs<mode>2): Add.
* config/riscv/riscv-protos.h (emit_vlmax_masked_mu_insn):
Declare.
* config/riscv/riscv-v.cc (emit_vlmax_masked_mu_insn): New
function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add unop tests.
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-template.h: New test.
|
|
This patch implements the autovec expanders for sign and zero extension
patterns as well as the accompanying truncations. In order to use them
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode
and extend VNx4SI to VNx4DI. They are still going to be expanded in the
future.
vf4 and vf8 truncations are emulated by truncating two and three times
respectively.
The patch also adds tests and changes some expectations for already
existing ones.
Combine does not yet handle binary operations of two widened operands
as we are missing the necessary split/rewrite patterns. These will be
added at a later time.
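A sketch of the kinds of conversions these expanders cover (not taken from
the added tests):
/* Zero extension, QI -> DI elements.  */
void zext (unsigned long long *restrict dst, unsigned char *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i];
}
/* vf8 truncation, DI -> QI elements, emulated by truncating stepwise.  */
void trunc (unsigned char *restrict dst, unsigned long long *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (unsigned char) src[i];
}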
Co-authored-by: Juzhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><v_double_trunc><mode>2): New
expander.
(<optab><v_quad_trunc><mode>2): Dito.
(<optab><v_oct_trunc><mode>2): Dito.
(trunc<mode><v_double_trunc>2): Dito.
(trunc<mode><v_quad_trunc>2): Dito.
(trunc<mode><v_oct_trunc>2): Dito.
* config/riscv/riscv-protos.h (vectorize_related_mode): Define.
(autovectorize_vector_modes): Define.
* config/riscv/riscv-v.cc (vectorize_related_mode): Implement
hook.
(autovectorize_vector_modes): Implement hook.
* config/riscv/riscv.cc (riscv_autovectorize_vector_modes):
Implement target hook.
(riscv_vectorize_related_mode): Implement target hook.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
(TARGET_VECTORIZE_RELATED_MODE): Define.
* config/riscv/vector-iterators.md: Add lowercase versions of
mode_attr iterators.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adjust
expectation.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Dito.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Dito.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Dito.
* gcc.target/riscv/rvv/rvv.exp: Add new conversion tests.
* gcc.target/riscv/rvv/vsetvl/avl_single-38.c: Do not vectorize.
* gcc.target/riscv/rvv/vsetvl/avl_single-47.c: Dito.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Dito.
* gcc.target/riscv/rvv/vsetvl/avl_single-49.c: Dito.
* gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-template.h: New test.
|
|
Since code object target ID V4, xnack has the values unspecified, '+' and '-',
which with this commit are represented in GCC as 'any', 'on', and 'off',
following the precedent of 'sram(-)ecc' and -msram-ecc=.
The previous default was 'no' and is now 'off'; however, once XNACK is
implemented, the default should probably be 'any'.
This commit updates the commandline options to permit the new tristate and
updates the documentation. As the feature itself is currently not really
supported in GCC, the change should not affect real-world users.
The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.
2023-05-26 Andrew Stubbs <ams@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
* config/gcn/gcn-hsa.h (XNACKOPT): New macro.
(ASM_SPEC): Use XNACKOPT.
* config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ...
(enum hsaco_attr_type): ... this, and generalize the names.
(TARGET_XNACK): New macro.
* config/gcn/gcn.cc (gcn_option_override): Update to sorry for all
but -mxnack=off.
(output_file_start): Update xnack handling.
(gcn_hsa_declare_function_name): Use TARGET_XNACK.
* config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax.
(sram_ecc_type): Rename to ...
(hsaco_attr_type): ... this.
* config/gcn/mkoffload.cc (SET_XNACK_ANY): New macro.
(TEST_XNACK): Delete.
(TEST_XNACK_ANY): New macro.
(TEST_XNACK_ON): New macro.
(main): Support the new -mxnack=on/off/any syntax.
* doc/invoke.texi (-mxnack): Update for new syntax.
|
|
In order to do away with voodoo estimation logic full of magic numbers,
this patch revises the code to measure the costs of the three memset
methods based on the actual emitted size of the insn sequence
corresponding to each method, and to choose the smallest one.
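For example, a plain block set such as the one below now gets whichever
method produces the smallest measured insn sequence; the constant size is
just an illustration:
#include <string.h>
void clear_buffer (char *p)
{
  memset (p, 0, 64);
}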
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h
(xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop): Remove.
(xtensa_expand_block_set): New prototype.
* config/xtensa/xtensa.cc
(xtensa_expand_block_set_libcall): New subfunction.
(xtensa_expand_block_set_unrolled_loop,
xtensa_expand_block_set_small_loop): Rewrite as subfunctions.
(xtensa_expand_block_set): New function that calls the above
subfunctions.
* config/xtensa/xtensa.md (memsetsi): Change to invoke only
xtensa_expand_block_set().
|
|
This patch tries to eliminate the use of a temporary pseudo for
'(minus:SI (const_int) (reg:SI))' if the addition of the negated constant
value can be emitted in a single machine instruction.
/* example */
int test0(int x) {
return 1 - x;
}
int test1(int x) {
return 100 - x;
}
int test2(int x) {
return 25600 - x;
}
;; before
test0:
movi.n a9, 1
sub a2, a9, a2
ret.n
test1:
movi a9, 0x64
sub a2, a9, a2
ret.n
test2:
movi.n a9, 0x19
slli a9, a9, 10
sub a2, a9, a2
ret.n
;; after
test0:
addi.n a2, a2, -1
neg a2, a2
ret.n
test1:
addi a2, a2, -100
neg a2, a2
ret.n
test2:
addmi a2, a2, -0x6400
neg a2, a2
ret.n
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h (xtensa_m1_or_1_thru_15):
New prototype.
* config/xtensa/xtensa.cc (xtensa_m1_or_1_thru_15):
New function.
* config/xtensa/constraints.md (O):
Change to use the above function.
* config/xtensa/xtensa.md (*subsi3_from_const):
New insn_and_split pattern.
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3):
Retract excessive line folding, and correct the value of
the "length" insn attribute related to TARGET_DENSITY.
(*extzvsi-1bit_addsubx): Ditto.
|
|
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi):
Do not disable call to ix86_expand_vecop_qihi2.
|
|
gcc/ChangeLog:
* config/riscv/riscv.cc (vector_zero_call_used_regs): Add
explicit VL and drop VL in ops.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
Rewrite ix86_expand_vecop_qihi2 to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Currently, the compiler generates the following
assembly for V16QImode multiplication (-mavx2):
vpunpcklbw %xmm0, %xmm0, %xmm3
vpunpcklbw %xmm1, %xmm1, %xmm2
vpunpckhbw %xmm0, %xmm0, %xmm0
movl $255, %eax
vpunpckhbw %xmm1, %xmm1, %xmm1
vpmullw %xmm3, %xmm2, %xmm2
vmovd %eax, %xmm3
vpmullw %xmm0, %xmm1, %xmm1
vpbroadcastw %xmm3, %xmm3
vpand %xmm2, %xmm3, %xmm0
vpand %xmm1, %xmm3, %xmm3
vpackuswb %xmm3, %xmm0, %xmm0
and only with -mavx512bw -mavx512vl generates:
vpmovzxbw %xmm1, %ymm1
vpmovzxbw %xmm0, %ymm0
vpmullw %ymm1, %ymm0, %ymm0
vpmovwb %ymm0, %xmm0
The patched compiler generates more optimized code involving multiplication
in the 2x-wider mode, even in cases where the missing truncate instruction has
to be emulated with a permutation (-mavx2):
vpmovzxbw %xmm0, %ymm0
vpmovzxbw %xmm1, %ymm1
movl $255, %eax
vpmullw %ymm1, %ymm0, %ymm1
vmovd %eax, %xmm0
vpbroadcastw %xmm0, %ymm0
vpand %ymm1, %ymm0, %ymm0
vpackuswb %ymm0, %ymm0, %ymm0
vpermq $216, %ymm0, %ymm0
The patch also adjusts cost calculation of V*QImode emulations to account
for generation of 2x-wider mode instructions.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Emulate truncation via
ix86_expand_vec_perm_const_1 when native truncate insn
is not available.
(ix86_expand_vecop_qihi_partial) <case MULT>: Use pmovzx
when available. Trivially rename some variables.
(ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.
* config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
calculation of V*QImode emulations to account for generation of
2x-wider mode instructions.
(ix86_shift_rotate_cost): Update cost calculation of V*QImode
emulations to account for generation of 2x-wider mode instructions.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
|
|
avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early. The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.
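A hypothetical reproducer sketch (not taken from the PR) of the kind of
rejection this addresses; the exact failure mode is an assumption:
static inline __attribute__((always_inline)) int add1 (int x) { return x + 1; }
int __attribute__((optimize("O2")))
caller (int x)
{
  /* Before the hook: inlining could fail with a target option mismatch,
     because the optimization level changed the avr-common.cc options.  */
  return add1 (x);
}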
gcc/
PR target/104327
* config/avr/avr.cc (avr_can_inline_p): New static function.
(TARGET_CAN_INLINE_P): Define to that function.
|
|
There is already a pattern in avr.md that matches single-bit transfers
from one register to another, but it only handled bit 0 of 8-bit
registers. This change makes that pattern more generic so that it matches
more such single-bit transfers.
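For illustration, a transfer like the following (bit 3 of one register into
bit 5 of another; the positions are arbitrary) is what the generalized
pattern can now match:
/* Copy bit 3 of X into bit 5 of Y, leaving the other bits of Y alone.  */
unsigned char move_bit (unsigned char x, unsigned char y)
{
  return (y & ~(1u << 5)) | (((x >> 3) & 1u) << 5);
}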
gcc/
PR target/82931
* config/avr/avr.md (*movbitqi.0): Rename to *movbit<mode>.0-6.
Handle any bit position and use mode QISI.
* config/avr/avr.cc (avr_rtx_costs_1) [IOR]: Return a cost
of 2 insns for bit-transfer of respective style.
gcc/testsuite/
PR target/82931
* gcc.target/avr/pr82931.c: New test.
|
|
MVE_5 and MVE_6 iterators are the same: this patch replaces MVE_6 with
MVE_5 everywhere in mve.md and removes MVE_6 from iterators.md.
2023-05-25 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
* config/arm/iterators.md (MVE_6): Remove.
* config/arm/mve.md: Replace MVE_6 with MVE_5.
|
|
This patch annotates the complex add and mla patterns for vec-concat-zero.
Testing showed an interesting bug in our MD patterns where they were defined to match:
(plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
(unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" "w")
(match_operand:VHSDF 3 "register_operand" "w")
(match_operand:SI 4 "const_int_operand" "n")]
FCMLA))
but the canonicalisation rules for PLUS require the more "complex" operand to
come first, so during combine, when the new substituted patterns were formed,
combine/recog would try to match:
(plus:V2SF (unspec:V2SF [
(reg:V2SF 100)
(reg:V2SF 101)
(const_int 0 [0])
] UNSPEC_FCMLA270)
(reg:V2SF 99))
instead. This patch fixes the operands of the PLUS RTX in these patterns.
Similar patterns for the dot-product instructions already used the right order.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_fcadd<rot><mode>): Rename to...
(aarch64_fcadd<rot><mode><vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla<rot><mode>): Rename to...
(aarch64_fcmla<rot><mode><vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla_lane<rot><mode>): Rename to...
(aarch64_fcmla_lane<rot><mode><vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmla_laneq<rot>v4hf): Rename to...
(aarch64_fcmla_laneq<rot>v4hf<vczle><vczbe>): ... This.
Fix canonicalization of PLUS operands.
(aarch64_fcmlaq_lane<rot><mode>): Fix canonicalization of PLUS operands.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_9.c: New test.
|
|
This patch implements a number of scalar data processing intrinsics from ACLE
that were requested by some users. Some of these have fast single-instruction
sequences for Armv6 and later, but even for earlier versions they can still emit
an inline sequence or a call to libgcc (and ACLE recommends them being unconditionally
available).
Chris Sidebottom wrote most of the patch, I just cleaned it up, wired up some builtins
and adjusted the tests.
Bootstrapped and tested on arm-none-linux-gnueabihf.
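A small usage example of some of the intrinsics wired up here (codegen
depends on the targeted architecture version):
#include <arm_acle.h>
/* Bit reversal, halfword byte swap and rotate right, all from ACLE.  */
unsigned int mix (unsigned int x)
{
  return __rbit (x) ^ __rev16 (x) ^ __ror (x, 8);
}
unsigned int leading_zeros (unsigned int x)
{
  return __clz (x);
}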
Co-authored-by: Chris Sidebottom <chris.sidebottom@arm.com>
gcc/ChangeLog:
* config/arm/arm.md (rbitsi2): Rename to...
(arm_rbit): ... This.
(ctzsi2): Adjust for the above.
(arm_rev16si2): Convert to define_expand.
(arm_rev16si2_alt1): New pattern.
(arm_rev16si2_alt): Rename to...
(*arm_rev16si2_alt2): ... This.
* config/arm/arm_acle.h (__ror, __rorl, __rorll, __clz, __clzl, __clzll,
__cls, __clsl, __clsll, __revsh, __rev, __revl, __revll, __rev16,
__rev16l, __rev16ll, __rbit, __rbitl, __rbitll): Define intrinsics.
* config/arm/arm_acle_builtins.def (rbit, rev16si2): Define builtins.
gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/data-intrinsics-armv6.c: New test.
* gcc.target/arm/acle/data-intrinsics-assembly.c: New test.
* gcc.target/arm/acle/data-intrinsics-rbit.c: New test.
* gcc.target/arm/acle/data-intrinsics.c: New test.
|
|
In r11-966-g9a182ef9ee011935d827ab5c6c9a7cd8e22257d8 we introduced a
simplification to emit_move_insn that attempts to simplify moves of the form:
(set (subreg:M1 (reg:M2 ...)) (constant C))
where M1 and M2 are of equal mode size. That is problematic for the splitter
vfp.md:no_literal_pool_df_immediate in the arm backend, which tries to pun an
lvalue DFmode pseudo into DImode and assign a constant to it with
emit_move_insn, as the new transformation simply undoes this, and we end up
splitting indefinitely.
This patch changes things around in the arm backend so that we use a
DImode temporary (instead of DFmode) and first load the DImode constant
into the pseudo, and then pun the pseudo into DFmode as an rvalue in a
reg -> reg move. I believe this should be semantically equivalent but
avoids the pathological behaviour seen in the PR.
gcc/ChangeLog:
PR target/109800
* config/arm/arm.md (movdf): Generate temporary pseudo in DImode
instead of DFmode.
* config/arm/vfp.md (no_literal_pool_df_immediate): Rather than punning an
lvalue DFmode pseudo into DImode, use a DImode pseudo and pun it into
DFmode as an rvalue.
gcc/testsuite/ChangeLog:
PR target/109800
* gcc.target/arm/pure-code/pr109800.c: New test.
|
|
ARC's current TLS Local Dynamic model uses two anchors to access
data, namely `.tdata` and `.tbss`. This implementation is unnecessarily
complicated. The TLS Local Dynamic model gets better results by
using the Global Dynamic model together with anchors.
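For reference, a simple thread-local access of the kind affected (for
example when built with -ftls-model=local-dynamic):
/* Previously accessed through the separate .tdata/.tbss anchors.  */
static __thread int counter;
int bump (void)
{
  return ++counter;
}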
gcc/ChangeLog:
* config/arc/arc.cc (arc_call_tls_get_addr): Simplify access using
TLS Local Dynamic.
Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
|
|
gcc/ChangeLog:
* config/aarch64/aarch64.cc (scalar_move_insn_p): New function.
(seq_cost_ignoring_scalar_moves): Likewise.
(aarch64_expand_vector_init): Call seq_cost_ignoring_scalar_moves.
|
|
While optimising some vector math library code with intrinsics we stumbled upon the issue in the testcase.
The compiler should be generating a FACGT instruction but instead we generate:
foo(__Float32x4_t, __Float32x4_t, __Float32x4_t):
fabs v0.4s, v0.4s
adrp x0, .LC0
ldr q31, [x0, #:lo12:.LC0]
fcmgt v0.4s, v0.4s, v31.4s
ret
This is because the vcagtq_f32 intrinsic is open-coded in arm_neon.h as
return vabsq_f32 (__a) > vabsq_f32 (__b)
thus relying on the optimisers to merge it back together. But since one of the arms of the comparison
is a vector constant the combine pass optimises the abs into it and tries matching:
(set (reg:V4SI 101)
(neg:V4SI (gt:V4SI (reg:V4SF 100)
(const_vector:V4SF [
(const_double:SF 1.0e+2 [0x0.c8p+7]) repeated x4
]))))
and
(set (reg:V4SI 101)
(neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 104))
(reg:V4SF 103))))
instead of what we want:
(insn 13 9 14 2 (set (reg/i:V4SI 32 v0)
(neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 98))
(abs:V4SF (reg:V4SF 96)))))
I don't really see a good way around that with our current implementation of these intrinsics.
Therefore this patch reimplements these intrinsics with aarch64 builtins that generate the RTL for these
instructions directly. Apparently we already had them defined in aarch64-simd-builtins.def and have been
using them for the fp16 case already.
I realise that this approach is against the general principle of expressing intrinsics in the higher-level constructs,
so I'm willing to listen to counter-arguments.
That said, the FACGT/FACGE instructions are as fast as the non-ABS comparison instructions on all microarchitectures that I know of
so it should always be a win to have them in the merged form rather than split the fabs step separately or try to hoist it.
And the testcase does come from real library code that we're trying to optimise.
With this patch for the testcase we generate:
foo:
adrp x0, .LC0
ldr q31, [x0, #:lo12:.LC0]
facgt v0.4s, v0.4s, v31.4s
ret
gcc/ChangeLog:
* config/aarch64/arm_neon.h (vcage_f64): Reimplement with builtins.
(vcage_f32): Likewise.
(vcages_f32): Likewise.
(vcageq_f32): Likewise.
(vcaged_f64): Likewise.
(vcageq_f64): Likewise.
(vcagts_f32): Likewise.
(vcagt_f32): Likewise.
(vcagt_f64): Likewise.
(vcagtq_f32): Likewise.
(vcagtd_f64): Likewise.
(vcagtq_f64): Likewise.
(vcale_f32): Likewise.
(vcale_f64): Likewise.
(vcaled_f64): Likewise.
(vcales_f32): Likewise.
(vcaleq_f32): Likewise.
(vcaleq_f64): Likewise.
(vcalt_f32): Likewise.
(vcalt_f64): Likewise.
(vcaltd_f64): Likewise.
(vcaltq_f32): Likewise.
(vcaltq_f64): Likewise.
(vcalts_f32): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/facgt_constpool_1.c: New test.
|
|
This patch fixes the incorrect intrinsic signatures for
_mm{512|256|}_s{lli|rai|rli}_epi*.
gcc/ChangeLog:
PR target/109173
PR target/109174
* config/i386/avx512bwintrin.h (_mm512_srli_epi16): Change type from
int to const int or const int to const unsigned int.
(_mm512_mask_srli_epi16): Ditto.
(_mm512_slli_epi16): Ditto.
(_mm512_mask_slli_epi16): Ditto.
(_mm512_maskz_slli_epi16): Ditto.
(_mm512_srai_epi16): Ditto.
(_mm512_mask_srai_epi16): Ditto.
(_mm512_maskz_srai_epi16): Ditto.
* config/i386/avx512fintrin.h (_mm512_slli_epi64): Ditto.
(_mm512_mask_slli_epi64): Ditto.
(_mm512_maskz_slli_epi64): Ditto.
(_mm512_srli_epi64): Ditto.
(_mm512_mask_srli_epi64): Ditto.
(_mm512_maskz_srli_epi64): Ditto.
(_mm512_srai_epi64): Ditto.
(_mm512_mask_srai_epi64): Ditto.
(_mm512_maskz_srai_epi64): Ditto.
(_mm512_slli_epi32): Ditto.
(_mm512_mask_slli_epi32): Ditto.
(_mm512_maskz_slli_epi32): Ditto.
(_mm512_srli_epi32): Ditto.
(_mm512_mask_srli_epi32): Ditto.
(_mm512_maskz_srli_epi32): Ditto.
(_mm512_srai_epi32): Ditto.
(_mm512_mask_srai_epi32): Ditto.
(_mm512_maskz_srai_epi32): Ditto.
* config/i386/avx512vlbwintrin.h (_mm256_mask_srai_epi16): Ditto.
(_mm256_maskz_srai_epi16): Ditto.
(_mm_mask_srai_epi16): Ditto.
(_mm_maskz_srai_epi16): Ditto.
(_mm256_mask_slli_epi16): Ditto.
(_mm256_maskz_slli_epi16): Ditto.
(_mm_mask_slli_epi16): Ditto.
(_mm_maskz_slli_epi16): Ditto.
(_mm_maskz_srli_epi16): Ditto.
* config/i386/avx512vlintrin.h (_mm256_mask_srli_epi32): Ditto.
(_mm256_maskz_srli_epi32): Ditto.
(_mm_mask_srli_epi32): Ditto.
(_mm_maskz_srli_epi32): Ditto.
(_mm256_mask_srli_epi64): Ditto.
(_mm256_maskz_srli_epi64): Ditto.
(_mm_mask_srli_epi64): Ditto.
(_mm_maskz_srli_epi64): Ditto.
(_mm256_mask_srai_epi32): Ditto.
(_mm256_maskz_srai_epi32): Ditto.
(_mm_mask_srai_epi32): Ditto.
(_mm_maskz_srai_epi32): Ditto.
(_mm256_srai_epi64): Ditto.
(_mm256_mask_srai_epi64): Ditto.
(_mm256_maskz_srai_epi64): Ditto.
(_mm_srai_epi64): Ditto.
(_mm_mask_srai_epi64): Ditto.
(_mm_maskz_srai_epi64): Ditto.
(_mm_mask_slli_epi32): Ditto.
(_mm_maskz_slli_epi32): Ditto.
(_mm_mask_slli_epi64): Ditto.
(_mm_maskz_slli_epi64): Ditto.
(_mm256_mask_slli_epi32): Ditto.
(_mm256_maskz_slli_epi32): Ditto.
(_mm256_mask_slli_epi64): Ditto.
(_mm256_maskz_slli_epi64): Ditto.
gcc/testsuite/ChangeLog:
PR target/109173
PR target/109174
* gcc.target/i386/pr109173-1.c: New test.
* gcc.target/i386/pr109174-1.c: Ditto.
|
|
According to RVV ISA:
The conversions use the dynamic rounding mode in frm, except for the rtz
variants, which round towards zero.
So rtz conversion patterns should not have FRM dependency.
We can't support mode switching for FRM yet since the RVV intrinsic doc is
not updated, but I think this patch is correct.
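For example, an rtz conversion intrinsic such as the one below always rounds
towards zero, so its pattern need not depend on FRM (the intrinsic name
follows the usual RVV naming and is shown here as an assumption):
#include <riscv_vector.h>
/* Rounds towards zero regardless of the dynamic rounding mode in frm.  */
vint32m1_t to_int_rtz (vfloat32m1_t v, size_t vl)
{
  return __riscv_vfcvt_rtz_x_f_v_i32m1 (v, vl);
}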
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM_REGNUM dependency in rtz
instructions.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
One of the supplied argument strings is unnecessarily long (C-SKY, using
basically the same code, fixed it with a shorter length). This fixes overflow
warnings, as GCC fails to deduce that the full 256 bytes of load_op[] are
never all used.
gcc/ChangeLog:
* config/mcore/mcore.cc (output_inline_const): Make buffer smaller to
silence overflow warnings later on.
|
|
Also, move v<any_shift:insn>v8qi3 expander to a better place and enable
it with TARGET_MMX_WITH_SSE. Remove handling of V8QImode from
ix86_expand_vecop_qihi2 since all partial QI->HI vector modes expand
via ix86_expand_vecop_qihi_partial.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Remove handling of V8QImode.
* config/i386/mmx.md (v<insn>v8qi3): Move from sse.md.
Call ix86_expand_vecop_qihi_partial. Enable for TARGET_MMX_WITH_SSE.
(v<insn>v4qi3): Ditto.
* config/i386/sse.md (v<insn>v8qi3): Remove.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-shiftv4qi.c (dg-options):
Remove -ftree-vectorize.
* gcc.target/i386/vect-shiftv8qi.c (dg-options): Ditto.
* gcc.target/i386/vect-vshiftv4qi.c: New test.
* gcc.target/i386/vect-vshiftv8qi.c: New test.
|
|
Continuing the series of straightforward annotations, this one handles the normal (not widening or narrowing) vector shifts.
Tests included.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_simd_lshr<mode>): Rename to...
(aarch64_simd_lshr<mode><vczle><vczbe>): ... This.
(aarch64_simd_ashr<mode>): Rename to...
(aarch64_simd_ashr<mode><vczle><vczbe>): ... This.
(aarch64_simd_imm_shl<mode>): Rename to...
(aarch64_simd_imm_shl<mode><vczle><vczbe>): ... This.
(aarch64_simd_reg_sshl<mode>): Rename to...
(aarch64_simd_reg_sshl<mode><vczle><vczbe>): ... This.
(aarch64_simd_reg_shl<mode>_unsigned): Rename to...
(aarch64_simd_reg_shl<mode>_unsigned<vczle><vczbe>): ... This.
(aarch64_simd_reg_shl<mode>_signed): Rename to...
(aarch64_simd_reg_shl<mode>_signed<vczle><vczbe>): ... This.
(vec_shr_<mode>): Rename to...
(vec_shr_<mode><vczle><vczbe>): ... This.
(aarch64_<sur>shl<mode>): Rename to...
(aarch64_<sur>shl<mode><vczle><vczbe>): ... This.
(aarch64_<sur>q<r>shl<mode>): Rename to...
(aarch64_<sur>q<r>shl<mode><vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for shifts.
* gcc.target/aarch64/simd/pr99195_6.c: Likewise.
* gcc.target/aarch64/simd/pr99195_8.c: New test.
|
|
The following dispatches to V2DImode CTOR expansion instead of
using sets of (subreg:DI (reg:V16QI 146) [08]) which causes
LRA to spill DImode and reload V16QImode. The same applies for
V8QImode or V4HImode construction from SImode parts which happens
during 32bit libgcc build.
PR target/109944
* config/i386/i386-expand.cc (ix86_expand_vector_init_general):
Perform final vector composition using
ix86_expand_vector_init_general instead of setting
the highpart and lowpart which causes spilling.
* gcc.target/i386/pr109944-1.c: New testcase.
* gcc.target/i386/pr109944-2.c: Likewise.
|
|
An obvious fix to make all enum naming consistent.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum frm_field_enum): Add FRM_
prefix.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
As the PR says we shouldn't be using qualifier_unsigned for the return type of the __ssat intrinsics.
UNSIGNED_SAT_BINOP_UNSIGNED_IMM_QUALIFIERS already exists for that.
This was just a thinko.
This patch fixes this and the warning with -Wconversion goes away.
Bootstrapped and tested on arm-none-linux-gnueabihf.
gcc/ChangeLog:
PR target/109939
* config/arm/arm-builtins.cc (SAT_BINOP_UNSIGNED_IMM_QUALIFIERS): Use
qualifier_none for the return operand.
gcc/testsuite/ChangeLog:
PR target/109939
* gcc.target/arm/pr109939.c: New test.
|
|
This patch adds mask-logic auto-vectorization, defining the patterns
as "define_insn_and_split" to allow the combine pass to easily combine
series of instructions.
For example:
combine vmxor.mm + vmnot.m into vmxnor.mm
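A hypothetical loop whose vectorized condition needs combined mask logic
(here a NOT applied to the AND of two comparison masks):
/* The two comparisons produce masks; combine can fuse the and/not pair
   into a single combined mask instruction.  */
void select (int *restrict r, int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = (a[i] > 0 && b[i] > 0) ? 0 : 1;
}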
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode>3): New pattern.
(one_cmpl<mode>2): Ditto.
(*<optab>not<mode>): Ditto.
(*n<optab><mode>): Ditto.
* config/riscv/riscv-v.cc (expand_vec_cmp_float): Change to
one_cmpl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cmp/vcond-4.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: New test.
|
|
This patch enables RVV auto-vectorization, including floating-point
unordered and ordered comparisons.
The testcases are leveraged from Richard. So include Richard as co-author.
And this patch is the prerequisite patch for my current middle-end work.
Without this patch, I can't support len_mask_xxx middle-end pattern
since the mask is generated by comparison.
For example,
for (int i...; i < n.)
if (cond[i])
a[i] = b[i]
We need len_mask_load/len_mask_store for such code, and I am going to
support them in the middle-end after this patch is merged.
Both integer && floating (order and unorder) are tested.
built && regression passed.
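A floating-point comparison loop of the kind now supported (the mask for the
conditional assignment comes from the vectorized comparison):
void fcmp (float *restrict a, float *restrict b, int *restrict r, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] < b[i] ? -1 : 0;
}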
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): New pattern.
(vec_cmp<mode><vm>): New pattern.
(vec_cmpu<mode><vm>): New pattern.
(vcond<V:mode><VI:mode>): New pattern.
(vcondu<V:mode><VI:mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.
|
|
This patch supports the RVV VREINTERPRET from vbool*_t to
vuint*m1_t. Aka:
vuint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
These APIs help the users convert the vbool*_t vector to the LMUL=1
unsigned integer vuint*m1_t. According to the RVV intrinsic SPEC below,
the reinterpret intrinsics only change the types of the underlying contents.
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
For example, given below code.
vuint8m1_t test_vreinterpret_v_b1_vuint8m1 (vbool1_t src) {
return __riscv_vreinterpret_v_b1_u8m1 (src);
}
It will generate assembly code similar to the below:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v v1,0(a1)
vs1r.v v1,0(a0)
ret
Please NOTE the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH because there are too many.
This is the last PATCH for the reinterpret between the signed/unsigned
and the bool vector types.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (main): Add
unsigned_eew*_lmul1_interpret for indexer.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vuint*m1_t interpret function.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vuint8m1_t.
(DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(vbool1_t): Add to unsigned_eew*_interpret_ops.
(vbool2_t): Likewise.
(vbool4_t): Likewise.
(vbool8_t): Likewise.
(vbool16_t): Likewise.
(vbool32_t): Likewise.
(vbool64_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vuint*m1_t.
(DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vuint*m1_t interpret case.
* config/riscv/riscv-vector-builtins.def (unsigned_eew8_lmul1_interpret):
Add vuint*m1_t interpret to base type.
(unsigned_eew16_lmul1_interpret): Likewise.
(unsigned_eew32_lmul1_interpret): Likewise.
(unsigned_eew64_lmul1_interpret): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
Enrich test cases.
|
|
This patch supports the RVV VREINTERPRET from vbool*_t to
vint*m1_t. Aka:
vint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
These APIs help the users convert the vbool*_t vector to the LMUL=1
signed integer vint*m1_t. According to the RVV intrinsic SPEC below,
the reinterpret intrinsics only change the types of the underlying contents.
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
For example, given below code.
vint8m1_t test_vreinterpret_v_b1_vint8m1 (vbool1_t src) {
return __riscv_vreinterpret_v_b1_i8m1 (src);
}
It will generate assembly code similar to the below:
vsetvli a5,zero,e8,m8,ta,ma
vlm.v v1,0(a1)
vs1r.v v1,0(a0)
ret
Please NOTE the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH because there are too many.
The reinterpret from vbool*_t to vuint*m1_t with lmul=1 will be covered
in another PATCH.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (EEW_SIZE_LIST): New macro
for the eew size list.
(LMUL1_LOG2): New macro for the log2 value of lmul=1.
(main): Add signed_eew*_lmul1_interpret for indexer.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vint*m1_t interpret function.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vint8m1_t.
(DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(vbool1_t): Add to signed_eew*_interpret_ops.
(vbool2_t): Likewise.
(vbool4_t): Likewise.
(vbool8_t): Likewise.
(vbool16_t): Likewise.
(vbool32_t): Likewise.
(vbool64_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
New macro for vint*m1_t.
(DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
(DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vint8m1_t interpret case.
* config/riscv/riscv-vector-builtins.def (signed_eew8_lmul1_interpret):
Add vint*m1_t interpret to base type.
(signed_eew16_lmul1_interpret): Likewise.
(signed_eew32_lmul1_interpret): Likewise.
(signed_eew64_lmul1_interpret): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
Enrich the test cases.
|
|
To fix this issue, we separate the VL operand from the normal operands.
gcc/ChangeLog:
* config/riscv/autovec.md: Adjust for new interface.
* config/riscv/riscv-protos.h (emit_vlmax_insn): Add VL operand.
(emit_nonvlmax_insn): Add AVL operand.
* config/riscv/riscv-v.cc (emit_vlmax_insn): Add VL operand.
(emit_nonvlmax_insn): Add AVL operand.
(sew64_scalar_helper): Adjust for new interface.
(expand_tuple_move): Ditto.
* config/riscv/vector.md: Ditto.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
This simple patch fixes the magic numbers; removing them makes the code
more reasonable.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
Also for 64-bit vector abs intrinsics _mm_abs_{pi8,pi16,pi32}.
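A minimal illustration of intrinsics now folded to a gimple ABS_EXPR
(these two require SSSE3):
#include <immintrin.h>
__m128i abs_epi32 (__m128i x) { return _mm_abs_epi32 (x); }
__m64   abs_pi16  (__m64 x)   { return _mm_abs_pi16 (x); }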
gcc/ChangeLog:
PR target/109900
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} and
_mm_abs_{pi8,pi16,pi32} into gimple ABS_EXPR.
(ix86_masked_all_ones): Handle 64-bit mask.
* config/i386/i386-builtin.def: Replace icode of related
non-mask simd abs builtins with CODE_FOR_nothing.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr109900.c: New test.
|
|
By making use of the 'addsub_operator' added in the last patch.
gcc/ChangeLog:
* config/xtensa/xtensa.md (*addsubx): Rename from '*addx',
and change to also accept '*subx' pattern.
(*subx): Remove.
|
|
This patch removes one machine instruction from the "single-bit extraction
with shifting" operation, and tries to eliminate the conditional
branch if CST2_POW2 doesn't fit into signed 12 bits, with the help
of the ifcvt optimization.
/* example #1 */
int test0(int x) {
return (x & 1048576) != 0 ? 1024 : 0;
}
extern int foo(void);
int test1(void) {
return (foo() & 1048576) != 0 ? 16777216 : 0;
}
;; before
test0:
movi a9, 0x400
srai a2, a2, 10
and a2, a2, a9
ret.n
test1:
addi sp, sp, -16
s32i.n a0, sp, 12
call0 foo
extui a2, a2, 20, 1
slli a2, a2, 20
beqz.n a2, .L2
movi.n a2, 1
slli a2, a2, 24
.L2:
l32i.n a0, sp, 12
addi sp, sp, 16
ret.n
;; after
test0:
extui a2, a2, 20, 1
slli a2, a2, 10
ret.n
test1:
addi sp, sp, -16
s32i.n a0, sp, 12
call0 foo
l32i.n a0, sp, 12
extui a2, a2, 20, 1
slli a2, a2, 24
addi sp, sp, 16
ret.n
In addition, if the left shift amount ('exact_log2(CST2_POW2)') is
between 1 and 3 and either an addition or a subtraction with another
register follows, emit an ADDX[248] or SUBX[248] machine instruction
instead of separate left-shift and add/subtract ones.
/* example #2 */
int test2(int x, int y) {
return ((x & 1048576) != 0 ? 4 : 0) + y;
}
int test3(int x, int y) {
return ((x & 2) != 0 ? 8 : 0) - y;
}
;; before
test2:
movi.n a9, 4
srai a2, a2, 18
and a2, a2, a9
add.n a2, a2, a3
ret.n
test3:
movi.n a9, 8
slli a2, a2, 2
and a2, a2, a9
sub a2, a2, a3
ret.n
;; after
test2:
extui a2, a2, 20, 1
addx4 a2, a2, a3
ret.n
test3:
extui a2, a2, 1, 1
subx8 a2, a2, a3
ret.n
gcc/ChangeLog:
* config/xtensa/predicates.md (addsub_operator): New.
* config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3,
*extzvsi-1bit_addsubx): New insn_and_split patterns.
* config/xtensa/xtensa.cc (xtensa_rtx_costs):
Add a special case about ifcvt 'noce_try_cmove()' to handle
constant loads that do not fit into signed 12 bits in the
patterns added above.
|
|
Some miscomputation of rtx_costs led to sub-optimal code for
single-bit insertions. This patch implements TARGET_INSN_COST,
which has a chance to see the whole insn during insn combination;
in particular the SET_DEST of (set (zero_extract (...) ...)).
gcc/
* config/avr/avr.cc (avr_insn_cost): New static function.
(TARGET_INSN_COST): Define to that function.
|
|
The following also accounts for a GPR->XMM move cost for splat
operations and properly guards eliding the cost when moving from
memory only for SSE4.1 or HImode or larger operands. This
doesn't fix the PR fully yet.
PR target/109944
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
For vector construction or splats apply GPR->XMM move
costing. QImode memory can be handled directly only
with SSE4.1 pinsrb.
|
|
Add V8QImode and V4QImode vector shift patterns that call into
ix86_expand_vecop_qihi_partial. Generate special sequences
for constant count operands.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
Call ix86_expand_vec_shift_qihi_constant for shifts
with constant count operand.
* config/i386/i386.cc (ix86_shift_rotate_cost):
Handle V4QImode and V8QImode.
* config/i386/mmx.md (<insn>v8qi3): New insn pattern.
(<insn>v4qi3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vect-shiftv4qi.c: New test.
* gcc.target/i386/vect-shiftv8qi.c: New test.
|
|
I just noticed the warning:
../../../riscv-gcc/gcc/config/riscv/vector.md:618:1: warning: source
missing a mode?
gcc/ChangeLog:
* config/riscv/vector.md: Add mode.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
At -O2, and so with SLP vectorisation enabled:
struct complx_t { float re, im; };
complx_t add(complx_t a, complx_t b) {
return {a.re + b.re, a.im + b.im};
}
generates:
fmov w3, s1
fmov x0, d0
fmov x1, d2
fmov w2, s3
bfi x0, x3, 32, 32
fmov d31, x0
bfi x1, x2, 32, 32
fmov d30, x1
fadd v31.2s, v31.2s, v30.2s
fmov x1, d31
lsr x0, x1, 32
fmov s1, w0
lsr w0, w1, 0
fmov s0, w0
ret
This is because complx_t is passed and returned in FPRs, but GCC gives
it DImode. We therefore “need” to assemble a DImode pseudo from the
two individual floats, bitcast it to a vector, do the arithmetic,
bitcast it back to a DImode pseudo, then extract the individual floats.
There are many problems here. The most basic is that we shouldn't
use SLP for such a trivial example. But SLP should in principle be
beneficial for more complicated examples, so preventing SLP for the
example above just changes the reproducer needed. A more fundamental
problem is that it doesn't make sense to use single DImode pseudos in a
testcase like this. I have a WIP patch to allow re and im to be stored
in individual SFmode pseudos instead, but it's quite an invasive change
and might end up going nowhere.
A simpler problem to tackle is that we allow DImode pseudos to be stored
in FPRs, but we don't provide any patterns for inserting values into
them, even though INS makes that easy for element-like insertions.
This patch adds some patterns for that.
Doing that showed that aarch64_modes_tieable_p was too strict:
it didn't allow SFmode and DImode values to be tied, even though
both of them occupy a single GPR and FPR, and even though we allow
both classes to change between the modes.
The *aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS> pattern is
especially ugly, but it's not clear what target-independent
code ought to simplify it to, if it was going to simplify it.
We should probably do the same thing for extractions, but that's left
as future work.
After the patch we generate:
ins v0.s[1], v1.s[0]
ins v2.s[1], v3.s[0]
fadd v0.2s, v0.2s, v2.2s
fmov x0, d0
ushr d1, d0, 32
lsr w0, w0, 0
fmov s0, w0
ret
which seems like a step in the right direction.
All in all, there's nothing elegant about this patch. It just
seems like the least worst option.
gcc/
PR target/109632
* config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow
subregs between any scalars that are 64 bits or smaller.
* config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
(bits_etype): New int attribute.
* config/aarch64/aarch64.md (*insv_reg<mode>_<SUBDI_BITS>)
(*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): New patterns.
(*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>): Likewise.
gcc/testsuite/
* gcc.target/aarch64/ins_bitfield_1.c: New test.
* gcc.target/aarch64/ins_bitfield_2.c: Likewise.
* gcc.target/aarch64/ins_bitfield_3.c: Likewise.
* gcc.target/aarch64/ins_bitfield_4.c: Likewise.
* gcc.target/aarch64/ins_bitfield_5.c: Likewise.
* gcc.target/aarch64/ins_bitfield_6.c: Likewise.
|
|
This patch refactors the framework of RVV auto-vectorization.
We found that we keep adding helpers && wrappers when implementing
auto-vectorization, which makes the RVV auto-vectorization very messy.
After double-checking my downstream RVV GCC and assembling all the
auto-vectorization patterns we are going to have, I refactored the
RVV framework based on that information to make it easier and more
flexible for future use.
For example, we will definitely implement len_mask_load/len_mask_store
patterns which have both length && mask operands and use an undefined merge
operand. len_cond_div or cond_div will have a length or mask operand and use
a real merge operand instead of an undefined merge operand.
Also, some patterns will use tail undisturbed and mask any, etc.
We will definitely have various features.
Based on these circumstances, we add the following private members:
int m_op_num;
/* It's true when the pattern has a dest operand. Most of the patterns have
a dest operand, whereas some patterns like STOREs do not have a dest
operand.
*/
bool m_has_dest_p;
bool m_fully_unmasked_p;
bool m_use_real_merge_p;
bool m_has_avl_p;
bool m_vlmax_p;
bool m_has_tail_policy_p;
bool m_has_mask_policy_p;
enum tail_policy m_tail_policy;
enum mask_policy m_mask_policy;
machine_mode m_dest_mode;
machine_mode m_mask_mode;
I believe these variables can cover all potential situations.
The instruction generator wrapper is "emit_insn", which adds operands and
emits the instruction according to the variables mentioned above.
After this is done, we can easily add helpers without changing the base
class "insn_expander".
Currently, we have "emit_vlmax_tany_many" and "emit_nonvlmax_tany_many".
For example, when we want to emit a binary operation, we just use
emit_vlmax_tany_many (...RVV_BINOP_NUM...)
So, if we support ternary operations in the future, it's quite simple:
emit_vlmax_tany_many (...RVV_BINOP_NUM...)
"*_tany_many" means we are using tail any and mask any.
We will definitely need tail undisturbed or mask undisturbed when we
support these patterns
in middle-end. It's very simple to extend such helper base on current
framework:
we can do that in the future like this:
void
emit_nonvlmax_tu_mu (unsigned icode, int op_num, rtx *ops)
{
machine_mode data_mode = GET_MODE (ops[0]);
machine_mode mask_mode = get_mask_mode (data_mode).require ();
/* The number = 11 is because we have maximum 11 operands for
RVV instruction patterns according to vector.md. */
insn_expander<11> e (/*OP_NUM*/ op_num,
/*HAS_DEST_P*/ true,
/*USE_ALL_TRUES_MASK_P*/ true,
/*USE_UNDEF_MERGE_P*/ true,
/*HAS_AVL_P*/ true,
/*VLMAX_P*/ false,
/*HAS_TAIL_POLICY_P*/ true,
/*HAS_MASK_POLICY_P*/ true,
/*TAIL_POLICY*/ TAIL_UNDISTURBED,
/*MASK_POLICY*/ MASK_UNDISTURBED,
/*DEST_MODE*/ data_mode,
/*MASK_MODE*/ mask_mode);
e.emit_insn ((enum insn_code) icode, ops);
}
That's enough (I have tested it fully in my downstream RVV GCC).
I didn't add it in this patch.
Thanks.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md: Refactor the framework of RVV auto-vectorization.
* config/riscv/riscv-protos.h (RVV_MISC_OP_NUM): Ditto.
(RVV_UNOP_NUM): New macro.
(RVV_BINOP_NUM): Ditto.
(legitimize_move): Refactor the framework of RVV auto-vectorization.
(emit_vlmax_op): Ditto.
(emit_vlmax_reg_op): Ditto.
(emit_len_op): Ditto.
(emit_len_binop): Ditto.
(emit_vlmax_tany_many): Ditto.
(emit_nonvlmax_tany_many): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-v.cc (emit_pred_op): Ditto.
(emit_pred_binop): Ditto.
(emit_vlmax_op): Ditto.
(emit_vlmax_tany_many): New function.
(emit_len_op): Remove.
(emit_nonvlmax_tany_many): New function.
(emit_vlmax_reg_op): Remove.
(emit_len_binop): Ditto.
(emit_index_op): Ditto.
(expand_vec_series): Refactor the framework of RVV auto-vectorization.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
* config/riscv/vector.md: Ditto.
|
|
In this PR we ICE because the substituted pattern for mla "lost" its predicate and constraint for operand 0
because the define_subst template:
[(set (match_operand:<VDBL> 0)
(vec_concat:<VDBL>
(match_dup 1)
(match_operand:VDZ 2 "aarch64_simd_or_scalar_imm_zero")))])
Uses match_operand instead of match_dup for operand 0. We can't use match_dup 0 for it because we need to specify the widened mode.
The problem is fixed by adding a "register_operand" predicate and "=w" constraint to the match_operand.
This makes sense conceptually too as the transformation we're targeting only applies to instructions that write a "w" register.
With this change the mddump pattern that ICEs goes from:
(define_insn ("aarch64_mlav4hi_vec_concatz_le")
[
(set (match_operand:V8HI 0 ("") ("")) <<------ Missing constraint!
(vec_concat:V8HI (plus:V4HI (mult:V4HI (match_operand:V4HI 2 ("register_operand") ("w"))
(match_operand:V4HI 3 ("register_operand") ("w")))
(match_operand:V4HI 1 ("register_operand") ("0")))
(match_operand:V4HI 4 ("aarch64_simd_or_scalar_imm_zero") (""))))
] ("(!BYTES_BIG_ENDIAN) && (TARGET_SIMD)") ("mla\t%0.4h, %2.4h, %3.4h")
to the proper:
(define_insn ("aarch64_mlav4hi_vec_concatz_le")
[
(set (match_operand:V8HI 0 ("register_operand") ("=w")) <<-------- Constraint in the right place
(vec_concat:V8HI (plus:V4HI (mult:V4HI (match_operand:V4HI 2 ("register_operand") ("w"))
(match_operand:V4HI 3 ("register_operand") ("w")))
(match_operand:V4HI 1 ("register_operand") ("0")))
(match_operand:V4HI 4 ("aarch64_simd_or_scalar_imm_zero") (""))))
] ("(!BYTES_BIG_ENDIAN) && (TARGET_SIMD)") ("mla\t%0.4h, %2.4h, %3.4h")
This seems to do the right thing for multi-alternative patterns as well; the annotated pattern for aarch64_cmltv8qi is:
(define_insn ("aarch64_cmltv8qi")
[
(set (match_operand:V8QI 0 ("register_operand") ("=w,w"))
(neg:V8QI (lt:V8QI (match_operand:V8QI 1 ("register_operand") ("w,w"))
(match_operand:V8QI 2 ("aarch64_simd_reg_or_zero") ("w,ZDz")))))
]
whereas the substituted version now looks like:
(define_insn ("aarch64_cmltv8qi_vec_concatz_le")
[
(set (match_operand:V16QI 0 ("register_operand") ("=w,w"))
(vec_concat:V16QI (neg:V8QI (lt:V8QI (match_operand:V8QI 1 ("register_operand") ("w,w"))
(match_operand:V8QI 2 ("aarch64_simd_reg_or_zero") ("w,ZDz"))))
(match_operand:V8QI 3 ("aarch64_simd_or_scalar_imm_zero") (""))))
]
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
PR target/109855
* config/aarch64/aarch64-simd.md (add_vec_concat_subst_le): Add predicate
and constraint for operand 0.
(add_vec_concat_subst_be): Likewise.
gcc/testsuite/ChangeLog:
PR target/109855
* gcc.target/aarch64/pr109855.c: New test.
|
|
The returned integer vector mode costs of emulated instructions in
ix86_shift_rotate_cost are wrong and do not reflect the generated
instruction sequences. Rewrite the handling of different integer vector
modes and different target ABIs to return real instruction
counts in order to calculate better costs for the various emulated modes.
Also add the cost of a memory read, when the instruction in the
sequence reads memory.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_shift_rotate_cost): Correct
calculation of integer vector mode costs to reflect generated
instruction sequences of different integer vector modes and
different target ABIs. Remove "speed" function argument.
(ix86_rtx_costs): Update call for removed function argument.
(ix86_vector_costs::add_stmt_cost): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/sse2-shiftqihi-constant-1.c: Remove XFAILs.
|
|
Add the cost of a memory read to the cost of V*QImode vector mult sequences.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_multiplication_cost): Add
the cost of a memory read to the cost of V?QImode sequences.
|
|
The current framework is hard to maintain and hard to use for the possible
future auto-vectorization patterns: we would need to keep adding more helpers
and arguments while supporting auto-vectorization. We should refactor the
framework now, for future use, since we don't support too many
auto-vectorization patterns yet.
Start with this simple patch: it adds an "m_" prefix to the private members.
gcc/ChangeLog:
* config/riscv/riscv-v.cc: Add "m_" prefix.
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|