path: root/gcc/config
Age | Commit message | Author | Files | Lines
2023-06-18 | xtensa: constantsynth: Add new 2-insns synthesis pattern | Takayuki 'January June' Suwa | 1 file | -2/+10
This patch adds a new two-instruction constant synthesis pattern: - A non-negative square value whose square root can fit into a signed 12-bit immediate: => "MOVI(.N) Ax, simm12" + "MULL Ax, Ax, Ax" Due to the execution cost of the integer multiply instruction (MULL), this synthesis works only when the 32-bit Integer Multiply Option is configured and optimization for size is specified. gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_constantsynth_2insn): Add new pattern for the abovementioned case.
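A minimal sketch of the condition described above (illustrative only; the helper name is invented and this is not the actual xtensa.cc code): a constant qualifies when it is a non-negative perfect square whose root fits in the signed 12-bit MOVI immediate.

#include <stdint.h>

/* Sketch: return nonzero if VALUE could be built as
   "MOVI Ax, simm12" followed by "MULL Ax, Ax, Ax".  */
static int
synthesizable_as_movi_mull (int32_t value)
{
  if (value < 0)
    return 0;
  for (int64_t root = 0; root <= 2047; root++)  /* simm12 is -2048..2047 */
    if (root * root == value)
      return 1;
  return 0;
}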
2023-06-18 | xtensa: Remove TARGET_MEMORY_MOVE_COST hook | Takayuki 'January June' Suwa | 1 file | -13/+0
It used to always return a constant 4, which is the same as the default behavior but doesn't take into account the effects of secondary reloads. Therefore, the implementation of this target hook is removed. gcc/ChangeLog: * config/xtensa/xtensa.cc (TARGET_MEMORY_MOVE_COST, xtensa_memory_move_cost): Remove.
2023-06-19 | rs6000: Enable const_anchor for 'addi' | Jiufu Guo | 1 file | -0/+4
There is a const_anchor functionality in cse.cc. This const_anchor supports generating new constants by adding small gaps/offsets to an existing constant. For example: void __attribute__ ((noinline)) foo (long long *a) { *a++ = 0x2351847027482577LL; *a++ = 0x2351847027482578LL; } The second constant (0x2351847027482578LL) can be computed by adding '1' to the first constant (0x2351847027482577LL). This is profitable if more than one instruction is needed to build the second constant. * For rs6000, we can enable this functionality, as the instruction 'addi' is suited exactly for this when the gap is smaller than 0x8000. * One potential side effect of this feature: compared with "r101=0x2351847027482577LL ... r201=0x2351847027482578LL", the new r201 will be "r201=r101+1", and then r101 will live longer, which would increase pressure when allocating registers. But I feel this would be acceptable for this const_anchor feature. With this feature, for GCC source code and SPEC object files, the significant change is the improvement of "addi" vs. "2 or more insns: lis+or.."; it also exposes some other optimization opportunities, like combine/jump2. The side effect also occurs in a few cases, but it does not impact overall performance. gcc/ChangeLog: * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define. gcc/testsuite/ChangeLog: * gcc.target/powerpc/const_anchors.c: New test. * gcc.target/powerpc/try_const_anchors_ice.c: New test.
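A small sketch of the anchoring idea described above (the helper name is invented; cse.cc's real implementation differs): a second constant can be derived from an already-loaded one with a single 'addi' when their difference fits in that instruction's signed 16-bit immediate.

/* Sketch: can C2 be formed from an already-loaded C1 with one "addi",
   i.e. does the difference fit in a signed 16-bit immediate?  */
static int
reachable_by_addi (long long c1, long long c2)
{
  long long delta = c2 - c1;
  return delta >= -0x8000 && delta <= 0x7fff;
}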
2023-06-19 | Refined 256/512-bit vpacksswb/vpackssdw patterns. | liuhongt | 1 file | -18/+147
The packing in vpacksswb/vpackssdw is not a simple concat; it's an interweave from src1 and src2 for every 128 bits (or 64 bits for the ss_truncate result), i.e. dst[192-255] = ss_truncate (src2[128-255]) dst[128-191] = ss_truncate (src1[128-255]) dst[64-127] = ss_truncate (src2[0-127]) dst[0-63] = ss_truncate (src1[0-127]). The patch refines those patterns with an extra vec_select for the interweave. gcc/ChangeLog: PR target/110235 * config/i386/sse.md (<sse2_avx2>_packsswb<mask_name>): Substitute with .. (sse2_packsswb<mask_name>): .. this, .. (avx2_packsswb<mask_name>): .. this and .. (avx512bw_packsswb<mask_name>): .. this. (<sse2_avx2>_packssdw<mask_name>): Substitute with .. (sse2_packssdw<mask_name>): .. this, .. (avx2_packssdw<mask_name>): .. this and .. (avx512bw_packssdw<mask_name>): .. this. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-vpackssdw-3.c: New test. * gcc.target/i386/avx512bw-vpacksswb-3.c: New test.
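A scalar model of the lane-wise behaviour described above, for the 256-bit vpacksswb case (a sketch of the instruction semantics, not of the .md patterns):

#include <stdint.h>

static int8_t
sat8 (int16_t x)
{
  return x > 127 ? 127 : x < -128 ? -128 : (int8_t) x;
}

/* Each 128-bit lane of the result packs the corresponding lane of src1
   (low 8 bytes) and src2 (high 8 bytes); it is not a concat of the two
   whole sources.  */
static void
packsswb256 (int8_t dst[32], const int16_t src1[16], const int16_t src2[16])
{
  for (int lane = 0; lane < 2; lane++)
    for (int i = 0; i < 8; i++)
      {
        dst[lane * 16 + i]     = sat8 (src1[lane * 8 + i]);
        dst[lane * 16 + 8 + i] = sat8 (src2[lane * 8 + i]);
      }
}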
2023-06-19 | Reimplement packuswb/packusdw with UNSPEC_US_TRUNCATE instead of original us_truncate. | liuhongt | 4 files | -30/+59
packuswb/packusdw does unsigned saturation for a signed source, but RTL us_truncate means unsigned saturation for an unsigned source. So for value -1, packuswb will produce 0, but us_truncate produces 255. The patch reimplements those related patterns and functions with UNSPEC_US_TRUNCATE instead of us_truncate. gcc/ChangeLog: PR target/110235 * config/i386/i386-expand.cc (ix86_split_mmx_pack): Use UNSPEC_US_TRUNCATE instead of original us_truncate for packusdw/packuswb. * config/i386/mmx.md (mmx_pack<s_trunsuffix>swb): Substitute with .. (mmx_packsswb): .. this and .. (mmx_packuswb): .. this. (mmx_packusdw): Use UNSPEC_US_TRUNCATE instead of original us_truncate. (s_trunsuffix): Removed code iterator. (any_s_truncate): Ditto. * config/i386/sse.md (<sse2_avx2>_packuswb<mask_name>): Use UNSPEC_US_TRUNCATE instead of original us_truncate. (<sse4_1_avx2>_packusdw<mask_name>): Ditto. * config/i386/i386.md (UNSPEC_US_TRUNCATE): New unspec_c_enum.
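A scalar sketch of the distinction described above: both operations clamp to [0, 255], but packuswb interprets its 16-bit source as signed while RTL us_truncate interprets it as unsigned, so the bit pattern 0xffff yields 0 in one case and 255 in the other.

#include <stdint.h>

static uint8_t
packus_elem (int16_t x)        /* packuswb: signed source, unsigned saturation */
{
  return x < 0 ? 0 : x > 255 ? 255 : (uint8_t) x;
}

static uint8_t
us_truncate_elem (uint16_t x)  /* RTL us_truncate: unsigned source */
{
  return x > 255 ? 255 : (uint8_t) x;
}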
2023-06-19 | RISC-V: Fix one typo for reduc expand GET_MODE_CLASS | Pan Li | 1 file | -1/+1
This patch fixes one typo when taking GET_MODE_CLASS of a mode. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Fix one typo.
2023-06-18 | Fix arc assumption that insns are not re-recognized | Jeff Law | 1 file | -1/+7
Testing the V2 version of Manolis's fold-mem-offsets patch exposed a minor bug in the arc backend. The movsf_insn pattern has constraints which allow storing certain constants to memory. reload/lra will target those alternatives under the right circumstances. However the insn's condition requires that one of the two operands must be a register. Thus if a pass were to force re-recognition of the pattern we can get an unrecognized insn failure. This patch adjusts the conditions to more closely match movsi_insn. More specifically it allows storing a constant into a limited set of memory operands (as defined by the Usc constraint). movqi_insn has the same core problem and gets the same solution. Committed after the tester validated there are no regressions. gcc/ * config/arc/arc.md (movqi_insn): Allow certain constants to be stored into memory in the pattern's condition. (movsf_insn): Similarly.
2023-06-18 | i386: Refactor new ix86_expand_carry to set the carry flag. | Roger Sayle | 3 files | -14/+19
This patch refactors the three places in the i386.md backend that we set the carry flag into a new ix86_expand_carry helper function, that allows Jakub's recently added uaddc<mode>5 and usubc<mode>5 expanders to take advantage of the recently added support for the stc instruction. 2023-06-18 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_carry): New helper function for setting the carry flag. (ix86_expand_builtin) <handlecarry>: Use it here. * config/i386/i386-protos.h (ix86_expand_carry): Prototype here. * config/i386/i386.md (uaddc<mode>5): Use ix86_expand_carry. (usubc<mode>5): Likewise.
2023-06-18 | i386: Standardize shift amount constants as QImode in i386.md. | Roger Sayle | 1 file | -6/+6
This clean-up improves consistency within i386.md by using QImode for the constant shift count in patterns that specify a mode. 2023-06-18 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (*concat<mode><dwi>3_1): Use QImode for the immediate constant shift count. (*concat<mode><dwi>3_2): Likewise. (*concat<mode><dwi>3_3): Likewise. (*concat<mode><dwi>3_4): Likewise. (*concat<mode><dwi>3_5): Likewise. (*concat<mode><dwi>3_6): Likewise.
2023-06-18 | RISC-V:Add float16 tuple type support | yulong | 7 files | -3/+144
This patch adds support for the float16 tuple type. gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple. * config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro. (ADJUST_ALIGNMENT): Ditto. (RVV_TUPLE_PARTIAL_MODES): Ditto. (ADJUST_NUNITS): Ditto. * config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t): New types. (vfloat16mf4x3_t): Ditto. (vfloat16mf4x4_t): Ditto. (vfloat16mf4x5_t): Ditto. (vfloat16mf4x6_t): Ditto. (vfloat16mf4x7_t): Ditto. (vfloat16mf4x8_t): Ditto. (vfloat16mf2x2_t): Ditto. (vfloat16mf2x3_t): Ditto. (vfloat16mf2x4_t): Ditto. (vfloat16mf2x5_t): Ditto. (vfloat16mf2x6_t): Ditto. (vfloat16mf2x7_t): Ditto. (vfloat16mf2x8_t): Ditto. (vfloat16m1x2_t): Ditto. (vfloat16m1x3_t): Ditto. (vfloat16m1x4_t): Ditto. (vfloat16m1x5_t): Ditto. (vfloat16m1x6_t): Ditto. (vfloat16m1x7_t): Ditto. (vfloat16m1x8_t): Ditto. (vfloat16m2x2_t): Ditto. (vfloat16m2x3_t): Ditto. (vfloat16m2x4_t): Ditto. (vfloat16m4x2_t): Ditto. * config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro. (vfloat16mf4x3_t): Ditto. (vfloat16mf4x4_t): Ditto. (vfloat16mf4x5_t): Ditto. (vfloat16mf4x6_t): Ditto. (vfloat16mf4x7_t): Ditto. (vfloat16mf4x8_t): Ditto. (vfloat16mf2x2_t): Ditto. (vfloat16mf2x3_t): Ditto. (vfloat16mf2x4_t): Ditto. (vfloat16mf2x5_t): Ditto. (vfloat16mf2x6_t): Ditto. (vfloat16mf2x7_t): Ditto. (vfloat16mf2x8_t): Ditto. (vfloat16m1x2_t): Ditto. (vfloat16m1x3_t): Ditto. (vfloat16m1x4_t): Ditto. (vfloat16m1x5_t): Ditto. (vfloat16m1x6_t): Ditto. (vfloat16m1x7_t): Ditto. (vfloat16m1x8_t): Ditto. (vfloat16m2x2_t): Ditto. (vfloat16m2x3_t): Ditto. (vfloat16m2x4_t): Ditto. (vfloat16m4x2_t): Ditto. * config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New. * config/riscv/riscv.md: New. * config/riscv/vector-iterators.md: New. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/tuple-28.c: New test. * gcc.target/riscv/rvv/base/tuple-29.c: New test. * gcc.target/riscv/rvv/base/tuple-30.c: New test. * gcc.target/riscv/rvv/base/tuple-31.c: New test. * gcc.target/riscv/rvv/base/tuple-32.c: New test.
2023-06-17 | i386: Two minor tweaks to ix86_expand_move. | Roger Sayle | 1 file | -4/+6
This patch splits out two (independent) minor changes to i386-expand.cc's ix86_expand_move from a larger patch, given that it's better to review and commit these independent pieces separately from a more complex patch. The first change is to test for CONST_WIDE_INT_P before calling ix86_convert_const_wide_int_to_broadcast. Whilst stepping through this function in gdb, I was surprised that the code was continually jumping into this function with operands that obviously weren't appropriate. The second change is to generalize the optimization for efficiently moving a TImode value to V1TImode (via V2DImode), to cover all 128-bit vector modes. Hence for the test case: typedef unsigned long uv2di __attribute__ ((__vector_size__ (16))); uv2di foo2(__int128 x) { return (uv2di)x; } we'd previously move via memory with: foo2: movq %rdi, -24(%rsp) movq %rsi, -16(%rsp) movdqa -24(%rsp), %xmm0 ret with this patch we now generate with -O2 (the same as V1TImode): foo2: movq %rdi, %xmm0 movq %rsi, %xmm1 punpcklqdq %xmm1, %xmm0 ret and with -O2 -msse4 the even better: foo2: movq %rdi, %xmm0 pinsrq $1, %rsi, %xmm0 ret The new test case is unimaginatively called sse2-v1ti-mov-2.c given the original test case just for V1TI mode was called sse2-v1ti-mov-1.c. 2023-06-17 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_move): Check that OP1 is CONST_WIDE_INT_P before calling ix86_convert_wide_int_to_broadcast. Generalize special case for converting TImode to V1TImode to handle all 128-bit vector conversions. gcc/testsuite/ChangeLog * gcc.target/i386/sse2-v1ti-mov-2.c: New test case.
2023-06-17 | RISC-V: Bugfix for RVV integer reduction in ZVE32/64. | Pan Li | 3 files | -60/+222
The rvv integer reduction has 3 different patterns for zve128+, zve64 and zve32. They take the same iterator with different attributes. However, we need the generated function code_for_reduc (code, mode1, mode2). The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx8qi; // ZVE64
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx4qi; // ZVE32
}

Thus there will be a problem here. For example, for zve32 we will have code_for_reduc (max, VNx1QI, VNx1QI), which logically returns the code for ZVE128+ instead of ZVE32. This patch merges the 3 patterns into one pattern, and passes both the input_vector and the ret_vector modes to code_for_reduc. For example, ZVE32 will be code_for_reduc (max, VNx1QI, VNx8QI), and then the correct code for ZVE32 will be returned as expected. Please note both GCC 13 and 14 are impacted by this issue. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai> PR target/110265 gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for integer reduction expand. * config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI, and the LMUL1 attr respectively. * config/riscv/vector.md (@pred_reduc_<reduc><mode><vlmul1>): Removed. (@pred_reduc_<reduc><mode><vlmul1_zve64>): Likewise. (@pred_reduc_<reduc><mode><vlmul1_zve32>): Likewise. (@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>): New pattern. (@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>): Likewise. (@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>): Likewise. (@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr110265-1.c: New test. * gcc.target/riscv/rvv/base/pr110265-1.h: New test. * gcc.target/riscv/rvv/base/pr110265-2.c: New test. * gcc.target/riscv/rvv/base/pr110265-2.h: New test. * gcc.target/riscv/rvv/base/pr110265-3.c: New test.
2023-06-17 | RISC-V: Fix VL operand bug in VSETVL PASS [PR110264] | Juzhe-Zhong | 1 file | -1/+4
This patch fixes an issue that happens on both GCC-13 and GCC-14. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264 The testcase is too big and I failed to reduce it, so I didn't append a test to this patch. This patch should not only land in GCC-14 but also be backported to GCC-13. PR target/110264 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.
2023-06-16 | PR target/31985: Improve memory operand use with doubleword add. | Roger Sayle | 1 file | -0/+30
This patch addresses the last remaining issue with PR target/31985, that GCC could make better use of memory addressing modes when implementing double word addition. This is achieved by adding a define_insn_and_split that combines an *add<dwi>3_doubleword with a *concat<mode><dwi>3, so that the components of the concat can be used directly, without first being loaded into a double word register. For test_c in the bugzilla PR: Before: pushl %ebx subl $16, %esp movl 28(%esp), %eax movl 36(%esp), %ecx movl 32(%esp), %ebx movl 24(%esp), %edx addl %ecx, %eax adcl %ebx, %edx movl %eax, 8(%esp) movl %edx, 12(%esp) addl $16, %esp popl %ebx ret After: test_c: subl $20, %esp movl 36(%esp), %eax movl 32(%esp), %edx addl 28(%esp), %eax adcl 24(%esp), %edx movl %eax, 8(%esp) movl %edx, 12(%esp) addl $20, %esp ret 2023-06-16 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog PR target/31985 * config/i386/i386.md (*add<dwi>3_doubleword_concat): New define_insn_and_split combine *add<dwi>3_doubleword with a *concat<mode><dwi>3 for more efficient lowering after reload. gcc/testsuite/ChangeLog PR target/31985 * gcc.target/i386/pr31985.c: New test case.
2023-06-16 | aarch64: Handle ASHIFTRT in patterns for shrn2 | Kyrylo Tkachov | 3 files | -29/+31
Similar to the low-half patterns, we want to match both ashiftrt and lshiftrt with the truncate for SHRN2. We reuse the SHIFTRT iterator and the AARCH64_VALID_SHRN_OP check to help, but because we expand the high-half patterns by their gen_* names we need to disambiguate all the different trunc+shift combinations in the pattern name, which leads to a slight renaming of the builtins. The AARCH64_VALID_SHRN_OP check on the expander and the define_insns ensures that no invalid combination ends up getting matched. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (shrn2_n): Rename builtins to... (ushrn2_n): ... This. (sqshrn2_n): Rename builtins to... (ssqshrn2_n): ... This. (uqshrn2_n): Rename builtins to... (uqushrn2_n): ... This. * config/aarch64/arm_neon.h (vqshrn_high_n_s16): Adjust for the above. (vqshrn_high_n_s32): Likewise. (vqshrn_high_n_s64): Likewise. (vqshrn_high_n_u16): Likewise. (vqshrn_high_n_u32): Likewise. (vqshrn_high_n_u64): Likewise. (vshrn_high_n_s16): Likewise. (vshrn_high_n_s32): Likewise. (vshrn_high_n_s64): Likewise. (vshrn_high_n_u16): Likewise. (vshrn_high_n_u32): Likewise. (vshrn_high_n_u64): Likewise. * config/aarch64/aarch64-simd.md (aarch64_<shrn_op>shrn2_n<mode>_insn_le): Rename to... (aarch64_<shrn_op><sra_op>shrn2_n<mode>_insn_le): ... This. Use SHIFTRT iterator and AARCH64_VALID_SHRN_OP check. (aarch64_<shrn_op>shrn2_n<mode>_insn_be): Rename to... (aarch64_<shrn_op><sra_op>shrn2_n<mode>_insn_be): ... This. Use SHIFTRT iterator and AARCH64_VALID_SHRN_OP check. (aarch64_<shrn_op>shrn2_n<mode>): Rename to... (aarch64_<shrn_op><sra_op>shrn2_n<mode>): ... This. Update expander for the above.
2023-06-16 | aarch64: [US]Q(R)SHR(U)N2 refactoring | Kyrylo Tkachov | 4 files | -198/+237
This patch is large in lines of code, but it is a fairly regular extension of the first patch as it converts the high-half patterns to standard RTL codes in the same fashion as the first patch did for the low-half ones. This now allows us to remove the unspec codes for these instructions as there are no more uses of them left. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (shrn2): Rename builtins to... (shrn2_n): ... This. (rshrn2): Rename builtins to... (rshrn2_n): ... This. * config/aarch64/arm_neon.h (vrshrn_high_n_s16): Adjust for the above. (vrshrn_high_n_s32): Likewise. (vrshrn_high_n_s64): Likewise. (vrshrn_high_n_u16): Likewise. (vrshrn_high_n_u32): Likewise. (vrshrn_high_n_u64): Likewise. (vshrn_high_n_s16): Likewise. (vshrn_high_n_s32): Likewise. (vshrn_high_n_s64): Likewise. (vshrn_high_n_u16): Likewise. (vshrn_high_n_u32): Likewise. (vshrn_high_n_u64): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_<srn_op>shrn<mode>2_vect_le): Delete. (*aarch64_<srn_op>shrn<mode>2_vect_be): Likewise. (aarch64_shrn2<mode>_insn_le): Likewise. (aarch64_shrn2<mode>_insn_be): Likewise. (aarch64_shrn2<mode>): Likewise. (aarch64_rshrn2<mode>_insn_le): Likewise. (aarch64_rshrn2<mode>_insn_be): Likewise. (aarch64_rshrn2<mode>): Likewise. (aarch64_<sur>q<r>shr<u>n2_n<mode>_insn_le): Likewise. (aarch64_<shrn_op>shrn2_n<mode>_insn_le): New define_insn. (aarch64_<sur>q<r>shr<u>n2_n<mode>_insn_be): Delete. (aarch64_<shrn_op>shrn2_n<mode>_insn_be): New define_insn. (aarch64_<sur>q<r>shr<u>n2_n<mode>): Delete. (aarch64_<shrn_op>shrn2_n<mode>): New define_expand. (aarch64_<shrn_op>rshrn2_n<mode>_insn_le): New define_insn. (aarch64_<shrn_op>rshrn2_n<mode>_insn_be): New define_insn. (aarch64_<shrn_op>rshrn2_n<mode>): New define_expand. (aarch64_sqshrun2_n<mode>_insn_le): New define_insn. (aarch64_sqshrun2_n<mode>_insn_be): New define_insn. (aarch64_sqshrun2_n<mode>): New define_expand. (aarch64_sqrshrun2_n<mode>_insn_le): New define_insn. (aarch64_sqrshrun2_n<mode>_insn_be): New define_insn. (aarch64_sqrshrun2_n<mode>): New define_expand. * config/aarch64/iterators.md (UNSPEC_SQSHRUN, UNSPEC_SQRSHRUN, UNSPEC_SQSHRN, UNSPEC_UQSHRN, UNSPEC_SQRSHRN, UNSPEC_UQRSHRN): Delete unspec values. (VQSHRN_N): Delete int iterator.
2023-06-16 | aarch64: Add ASHIFTRT handling for shrn pattern | Kyrylo Tkachov | 3 files | -3/+10
The first patch in the series has some fallout in the testsuite, particularly gcc.target/aarch64/shrn-combine-2.c. Our previous patterns for SHRN matched both (truncate (ashiftrt (x) (N))) and (truncate (lshiftrt (x) (N))) as these are equivalent for the shift amounts involved. In our refactoring, however, we mapped shrn to truncate+lshiftrt. The fix here is to iterate over ashiftrt,lshiftrt in the pattern for it. However, we don't want to allow ashiftrt for us_truncate or lshiftrt for ss_truncate from the ALL_TRUNC iterator. This patch adds an AARCH64_VALID_SHRN_OP helper to gate the valid combinations of truncations and shifts. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64.h (AARCH64_VALID_SHRN_OP): Define. * config/aarch64/aarch64-simd.md (*aarch64_<shrn_op>shrn_n<mode>_insn<vczle><vczbe>): Rename to... (*aarch64_<shrn_op><shrn_s>shrn_n<mode>_insn<vczle><vczbe>): ... This. Use SHIFTRT iterator and add AARCH64_VALID_SHRN_OP to condition. * config/aarch64/iterators.md (shrn_s): New code attribute.
2023-06-16 | aarch64: [US]Q(R)SHR(U)N scalar forms refactoring | Kyrylo Tkachov | 2 files | -8/+106
Some instructions from the previous patch have scalar forms: SQSHRN,SQRSHRN,UQSHRN,UQRSHRN,SQSHRUN,SQRSHRUN. This patch converts the patterns for these to use standard RTL codes. Their MD patterns deviate slightly from the vector forms mostly due to things like operands being scalar rather than vectors. One nuance is in the SQSHRUN,SQRSHRUN patterns. These end in a truncate to the scalar narrow mode e.g. SI -> QI. This gets simplified by the RTL passes to a subreg rather than keeping it as a truncate. So we end up representing these without the truncate and in the expander read the narrow subreg in order to comply with the expected width of the intrinsic. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>q<r>shr<u>n_n<mode>): Rename to... (aarch64_<shrn_op>shrn_n<mode>): ... This. Reimplement with RTL codes. (*aarch64_<shrn_op>rshrn_n<mode>_insn): New define_insn. (aarch64_sqrshrun_n<mode>_insn): Likewise. (aarch64_sqshrun_n<mode>_insn): Likewise. (aarch64_<shrn_op>rshrn_n<mode>): New define_expand. (aarch64_sqshrun_n<mode>): Likewise. (aarch64_sqrshrun_n<mode>): Likewise. * config/aarch64/iterators.md (V2XWIDE): Add HI and SI modes.
2023-06-16 | aarch64: Reimplement [US]Q(R)SHR(U)N patterns with RTL codes | Kyrylo Tkachov | 5 files | -98/+174
This patch reimplements the MD patterns for the instructions that perform narrowing right shifts with optional rounding and saturation using standard RTL codes rather than unspecs. There are four groups of patterns involved: * Simple narrowing shifts with optional signed or unsigned truncation: SHRN, SQSHRN, UQSHRN. These are expressed as a truncation operation of a right shift. The matrix of valid combinations looks like this: | ashiftrt | lshiftrt | ------------------------------------------ ss_truncate | SQSHRN | X | us_truncate | X | UQSHRN | truncate | X | SHRN | ------------------------------------------ * Narrowing shifts with rounding with optional signed or unsigned truncation: RSHRN, SQRSHRN, UQRSHRN. These follow the same combinations of truncation and shift codes as above, but also perform intermediate widening of the results in order to represent the addition of the rounding constant. This group also corrects an existing inaccuracy for RSHRN where we don't currently model the intermediate widening for rounding. * The somewhat special "Signed saturating Shift Right Unsigned Narrow": SQSHRUN. Similar to the SQXTUN instructions, these perform a saturating truncation that isn't represented by US_TRUNCATE or SS_TRUNCATE but needs to use a clamping operation followed by a TRUNCATE. * The rounding version of the above: SQRSHRUN. It needs the special clamping truncate representation but with an intermediate widening and rounding addition. Besides using standard RTL codes for all of the above instructions, this patch allows us to get rid of the explicit define_insns and define_expands for SHRN and RSHRN. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. We've got pretty thorough execute tests in advsimd-intrinsics.exp that exercise these and many instances of these instructions get constant-folded away during optimisation and the validation still passes (during development where I was figuring out the details of the semantics they were discovering failures), so I'm fairly confident in the representation. gcc/ChangeLog: * config/aarch64/aarch64-simd-builtins.def (shrn): Rename builtins to... (shrn_n): ... This. (rshrn): Rename builtins to... (rshrn_n): ... This. * config/aarch64/arm_neon.h (vshrn_n_s16): Adjust for the above. (vshrn_n_s32): Likewise. (vshrn_n_s64): Likewise. (vshrn_n_u16): Likewise. (vshrn_n_u32): Likewise. (vshrn_n_u64): Likewise. (vrshrn_n_s16): Likewise. (vrshrn_n_s32): Likewise. (vrshrn_n_s64): Likewise. (vrshrn_n_u16): Likewise. (vrshrn_n_u32): Likewise. (vrshrn_n_u64): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_<srn_op>shrn<mode><vczle><vczbe>): Delete. (aarch64_shrn<mode>): Likewise. (aarch64_rshrn<mode><vczle><vczbe>_insn): Likewise. (aarch64_rshrn<mode>): Likewise. (aarch64_<sur>q<r>shr<u>n_n<mode>_insn<vczle><vczbe>): Likewise. (aarch64_<sur>q<r>shr<u>n_n<mode>): Likewise. (*aarch64_<shrn_op>shrn_n<mode>_insn<vczle><vczbe>): New define_insn. (*aarch64_<shrn_op>rshrn_n<mode>_insn<vczle><vczbe>): Likewise. (*aarch64_sqshrun_n<mode>_insn<vczle><vczbe>): Likewise. (*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise. (aarch64_<shrn_op>shrn_n<mode>): New define_expand. (aarch64_<shrn_op>rshrn_n<mode>): Likewise. (aarch64_sqshrun_n<mode>): Likewise. (aarch64_sqrshrun_n<mode>): Likewise. * config/aarch64/iterators.md (ALL_TRUNC): New code iterator. (TRUNCEXTEND): New code attribute. (TRUNC_SHIFT): Likewise. (shrn_op): Likewise. * config/aarch64/predicates.md (aarch64_simd_umax_quarter_mode): New predicate.
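As a scalar illustration of the semantics discussed above (a sketch for the 16-to-8-bit case only, not the MD patterns themselves): the rounding narrow shift widens before adding the rounding constant, and SQSHRUN clamps a signed source into the unsigned narrow range before truncating.

#include <stdint.h>

static int8_t
rshrn_elem (int16_t x, unsigned n)      /* RSHRN-style, 1 <= n <= 8 */
{
  int32_t wide = (int32_t) x + (1 << (n - 1));  /* intermediate widening for rounding */
  return (int8_t) (wide >> n);                  /* plain truncate, no saturation */
}

static uint8_t
sqshrun_elem (int16_t x, unsigned n)    /* SQSHRUN-style, 1 <= n <= 8 */
{
  int16_t shifted = x >> n;                     /* arithmetic shift of signed source */
  return shifted < 0 ? 0 : shifted > 255 ? 255 : (uint8_t) shifted;  /* clamp, then truncate */
}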
2023-06-16 | RISC-V: Fix one warning of maybe-uninitialized in riscv-vsetvl.cc | Pan Li | 1 file | -1/+1
This patch fixes one maybe-uninitialized warning, namely: riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized] Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::global_eliminate_vsetvl_insn): Initialize var by NULL.
2023-06-16 | Add MinGW option -mcrtdll= for choosing C RunTime DLL library | Pali Rohár | 3 files | -5/+49
It adjusts the preprocess, compile and link flags, which allows changing the default -lmsvcrt library to another one provided by the MinGW runtime. gcc/ChangeLog: * config/i386/mingw-w64.h (CPP_SPEC): Adjust for -mcrtdll=. (REAL_LIBGCC_SPEC): New define. * config/i386/mingw.opt: Add mcrtdll= * config/i386/mingw32.h (CPP_SPEC): Adjust for -mcrtdll=. (REAL_LIBGCC_SPEC): Adjust for -mcrtdll=. (STARTFILE_SPEC): Adjust for -mcrtdll=. * doc/invoke.texi: Add mcrtdll= documentation. Signed-off-by: Jonathan Yong <10walls@gmail.com>
2023-06-16 | MIPS16: Implement `code_readable` function attribute. | Simon Dardis | 1 file | -1/+96
Support for __attribute__ ((code_readable)). Takes up to one argument of "yes", "no", "pcrel". This will change the code readability setting for just that function. If no argument is supplied, then the setting is 'yes'. gcc/ChangeLog: * config/mips/mips.cc (enum mips_code_readable_setting): New enum. (mips_handle_code_readable_attr): New static function. (mips_get_code_readable_attr): New static enum function. (mips_set_current_function): Set the code_readable mode. (mips_option_override): Same as above. * doc/extend.texi: Document code_readable. gcc/testsuite/ChangeLog: * gcc.target/mips/code-readable-attr-1.c: New test. * gcc.target/mips/code-readable-attr-2.c: New test. * gcc.target/mips/code-readable-attr-3.c: New test. * gcc.target/mips/code-readable-attr-4.c: New test. * gcc.target/mips/code-readable-attr-5.c: New test.
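A hypothetical usage sketch of the attribute as described above (the function names are made up; the argument semantics follow the existing -mcode-readable= documentation, and the argument is optional):

/* Only allow PC-relative loads to read the instruction stream here.  */
void __attribute__ ((code_readable ("pcrel"))) f (void);

/* No argument: equivalent to code_readable ("yes").  */
void __attribute__ ((code_readable)) g (void);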
2023-06-15 | x86/AVX512: use VMOVDDUP for broadcast to V2DF | Jan Beulich | 1 file | -2/+2
As is already the case for the AVX/AVX2 form, VMOVDDUP - acting on double-precision floating-point values - is more appropriate to use here, and it can also result in shorter insn encodings when the source is memory or %xmm0...%xmm7 and no masking is applied (allowing a 2-byte VEX prefix then instead of a 3-byte one). gcc/ * config/i386/sse.md (<avx512>_vec_dup<mode><mask_name>): Use vmovddup.
2023-06-15 | x86: add Bk and Br to comment list B's sub-chars | Jan Beulich | 1 file | -0/+2
gcc/ * config/i386/constraints.md: Mention k and r for B.
2023-06-15 | LoongArch: Avoid non-returning indirect jumps through $ra [PR110136] | Lulu Cheng | 1 file | -2/+6
The micro-architecture unconditionally treats a "jr $ra" as "return from subroutine", hence using "jr $ra" for an ordinary indirect jump would interfere with both subroutine return prediction and the more general indirect branch prediction. Therefore, a problem like PR110136 can cause a significant increase in the branch misprediction rate and affect performance. The same problem exists with "indirect_jump". gcc/ChangeLog: PR target/110136 * config/loongarch/loongarch.md: Modify the register constraints for template "jumptable" and "indirect_jump" from "r" to "e". Co-authored-by: Andrew Pinski <apinski@marvell.com>
2023-06-15 | LoongArch: Set default alignment for functions and labels with -mtune | Xi Ruoyao | 4 files | -0/+27
The LA464 micro-architecture is sensitive to the alignment of code. The Loongson team has benchmarked various combinations of function and label alignment; the results [1] show that 16-byte label alignment together with 32-byte function alignment gives the best results in terms of SPEC score. Add an mtune-based table-driven mechanism to set the default of -falign-{functions,labels}. As LA464 is the first (and the only one for now) uarch supported by GCC, the same setting is also used for the "generic" -mtune=loongarch64. In the future we may set different settings for LA{2,3,6}64 once we add support for them. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/loongarch/loongarch-tune.h (loongarch_align): New struct. * config/loongarch/loongarch-def.h (loongarch_cpu_align): New array. * config/loongarch/loongarch-def.c (loongarch_cpu_align): Define the array. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set the value of -falign-functions= if -falign-functions is enabled but no value is given. Likewise for -falign-labels=.
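A minimal sketch of the table-driven approach described above (field and table layout are illustrative assumptions, not the actual loongarch-tune.h declarations):

/* Per-uarch default alignments, indexed by the -mtune= target.  */
struct align_defaults
{
  const char *function_align;  /* default for -falign-functions= */
  const char *label_align;     /* default for -falign-labels=    */
};

/* LA464 (and the "generic" loongarch64 tuning that mirrors it):
   32-byte functions, 16-byte labels, per the benchmarking above.  */
static const struct align_defaults la464_align = { "32", "16" };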
2023-06-15 | middle-end, i386: Pattern recognize add/subtract with carry [PR79173] | Jakub Jelinek | 1 file | -4/+69
The following patch introduces {add,sub}c5_optab and pattern recognizes various forms of add with carry and subtract with carry/borrow, see pr79173-{1,2,3,4,5,6}.c tests on what is matched. Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow calls per limb (with just one for the least significant one), for add with carry even when it is hand written in C (for subtraction reassoc seems to change it too much so that the pattern recognition doesn't work). __builtin_{add,sub}_overflow are standardized in C23 under ckd_{add,sub} names, so it isn't any longer a GNU only extension. Note, clang has for these (IMHO badly designed) __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just a single bit of carry, but basically add 3 unsigned values or subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2 because of that. If we wanted to introduce those for clang compatibility, we could and lower them early to just two __builtin_{add,sub}_overflow calls and let the pattern matching in this patch recognize it later. I've added expanders for this on ix86 and in addition to that added various peephole2s (in preparation patches for this patch) to make sure we get nice (and small) code for the common cases. I think there are other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch also improves. Would be nice if support for these optabs was added to many other targets, arm/aarch64 and powerpc* certainly have such instructions, I'd expect in fact that most targets do. The _BitInt support I'm working on will also need this to emit reasonable code. 2023-06-15 Jakub Jelinek <jakub@redhat.com> PR middle-end/79173 * internal-fn.def (UADDC, USUBC): New internal functions. * internal-fn.cc (expand_UADDC, expand_USUBC): New functions. (commutative_ternary_fn_p): Return true also for IFN_UADDC. * optabs.def (uaddc5_optab, usubc5_optab): New optabs. * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart, match_uaddc_usubc): New functions. (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless other optimizations have been successful for those. * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC. * fold-const-call.cc (fold_const_call): Likewise. * gimple-range-fold.cc (adjust_imagpart_expr): Likewise. * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise. * doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named patterns. * config/i386/i386.md (uaddc<mode>5, usubc<mode>5): New define_expand patterns. (*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc): Split into NOTE_INSN_DELETED note rather than nop instruction. (*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>): Likewise. * gcc.target/i386/pr79173-1.c: New test. * gcc.target/i386/pr79173-2.c: New test. * gcc.target/i386/pr79173-3.c: New test. * gcc.target/i386/pr79173-4.c: New test. * gcc.target/i386/pr79173-5.c: New test. * gcc.target/i386/pr79173-6.c: New test. * gcc.target/i386/pr79173-7.c: New test. * gcc.target/i386/pr79173-8.c: New test. * gcc.target/i386/pr79173-9.c: New test. * gcc.target/i386/pr79173-10.c: New test.
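For illustration, a sketch of the kind of two-limb add-with-carry sequence this recognition targets (this particular function is made up; see the added pr79173-*.c tests for the exact forms being matched):

#include <stdbool.h>

void
add2 (unsigned long *p, const unsigned long *q)
{
  unsigned long lo, hi;
  /* Least significant limb: a single __builtin_add_overflow.  */
  bool c1 = __builtin_add_overflow (p[0], q[0], &lo);
  /* Next limb: two calls, one for the operands and one for the carry-in;
     the carry-out would be the OR of the two overflow flags.  */
  bool c2 = __builtin_add_overflow (p[1], q[1], &hi);
  bool c3 = __builtin_add_overflow (hi, (unsigned long) c1, &hi);
  p[0] = lo;
  p[1] = hi;
  (void) (c2 | c3);
}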
2023-06-15 | i386: Add peephole2 patterns to improve subtract with borrow with memory destination [PR79173] | Jakub Jelinek | 1 file | -3/+151
This patch adds a subborrow<mode> alternative so that it can have a memory destination, and adds various peephole2s which help to match it. 2023-06-15 Jakub Jelinek <jakub@redhat.com> PR middle-end/79173 * config/i386/i386.md (subborrow<mode>): Add alternative with memory destination and add for it define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory destination in these patterns.
2023-06-15 | i386: Add peephole2 patterns to improve add with carry or subtract with borrow with memory destination [PR79173] | Jakub Jelinek | 1 file | -0/+289
This patch adds various peephole2s which help to recognize add with carry or subtract with borrow with a memory destination. 2023-06-14 Jakub Jelinek <jakub@redhat.com> PR middle-end/79173 * config/i386/i386.md (*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry, *add<mode>3_cc_overflow_1): Add define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory destination in these patterns.
2023-06-15 | AArch64: New RTL for ABD | Oluwatamilore Adebayo | 2 files | -2/+14
This patch adds new RTL and tests for sabd and uabd PR tree-optimization/109156 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>): Rename to <su>abd<mode>3. * config/aarch64/aarch64-sve.md (<su>abd<mode>_3): Rename to <su>abd<mode>3. gcc/testsuite/ChangeLog: * gcc.target/aarch64/abd.h: New file. * gcc.target/aarch64/abd_2.c: New test. * gcc.target/aarch64/abd_3.c: New test. * gcc.target/aarch64/abd_4.c: New test. * gcc.target/aarch64/abd_none_2.c: New test. * gcc.target/aarch64/abd_none_3.c: New test. * gcc.target/aarch64/abd_none_4.c: New test. * gcc.target/aarch64/abd_run_1.c: New test. * gcc.target/aarch64/sve/abd_1.c: New test. * gcc.target/aarch64/sve/abd_none_1.c: New test. * gcc.target/aarch64/sve/abd_2.c: New test. * gcc.target/aarch64/sve/abd_none_2.c: New test.
2023-06-15 | LoongArch: Change the default value of LARCH_CALL_RATIO to 6. | chenxiaolong | 1 file | -1/+1
During regression testing of the LoongArch GCC port, it was found that the tests in the pr90883.C file failed. The problem was analyzed and found to be caused by setting the macro LARCH_CALL_RATIO to too large a value. Taking the actual LoongArch architecture into account, different thresholds were evaluated empirically (using SPEC CPU 2006), and the results showed that the optimal threshold is 6. gcc/ChangeLog: * config/loongarch/loongarch.h (LARCH_CALL_RATIO): Modify the value of macro LARCH_CALL_RATIO on LoongArch to make it perform optimally.
2023-06-15 | RISC-V: Use merge approach to optimize vector permutation | Juzhe-Zhong | 1 file | -0/+53
This patch optimizes the permutation cases that are suitable for the merge approach. Consider the following case: typedef int8_t vnx16qi __attribute__((vector_size (16))); void __attribute__ ((noipa)) merge0 (vnx16qi x, vnx16qi y, vnx16qi *out) { vnx16qi v = __builtin_shufflevector ((vnx16qi) x, (vnx16qi) y, MASK_16); *(vnx16qi*)out = v; } The gimple IR: v_3 = VEC_PERM_EXPR <x_1(D), y_2(D), { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }>; Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the common expression: { 0, nunits + 1, 2, nunits + 3, 4, nunits + 5, ... } For this selector, we can use vmsltu + vmerge to optimize the codegen. Before this patch: merge0: addi a5,sp,16 vl1re8.v v3,0(a5) li a5,31 vsetivli zero,16,e8,m1,ta,mu vmv.v.x v2,a5 lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) vl1re8.v v1,0(a5) vl1re8.v v4,0(sp) vand.vv v1,v1,v2 vmsgeu.vi v0,v1,16 vrgather.vv v2,v4,v1 vadd.vi v1,v1,-16 vrgather.vv v2,v3,v1,v0.t vs1r.v v2,0(a0) ret After this patch: merge0: addi a5,sp,16 vl1re8.v v1,0(a5) lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) vsetivli zero,16,e8,m1,ta,ma vl1re8.v v0,0(a5) vl1re8.v v2,0(sp) vmsltu.vi v0,v0,16 vmerge.vvm v1,v1,v2,v0 vs1r.v v1,0(a0) ret The key of this optimization is that: 1. mask = vmsltu (selector, nunits) 2. result = vmerge (op0, op1, mask) gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern. (expand_vec_perm_const_1): Add merge optimization. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test.
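A scalar sketch of why the compare-plus-merge works for this class of selectors (illustrative only; the precondition is that the selector has the { 0, n + 1, 2, n + 3, ... } shape, so every result element comes from the same lane of one of the two inputs):

#include <stdint.h>

static void
merge_perm (int8_t *dst, const int8_t *op0, const int8_t *op1,
            const int8_t *sel, int n)
{
  for (int i = 0; i < n; i++)
    /* "sel[i] < n" is the vmsltu mask; picking op0 or op1 per element
       is the vmerge.  */
    dst[i] = sel[i] < n ? op0[i] : op1[i];
}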
2023-06-15 | RISC-V: Ensure vector args and return use function stack to pass [PR110119] | Lehua Ding | 1 file | -5/+12
The V2 patch addresses comments from Juzhe, thanks. Hi, The reason for this bug is that in the case where the vector register is set to a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option), TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed through two scalar registers, but when GCC calls FUNCTION_VALUE (which calls riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not unified. The current treatment is to pass all vector arguments and returns through the function stack, and a new calling convention for vector registers will be added in the future. https://github.com/riscv-non-isa/riscv-elf-psabi-doc/ https://github.com/palmer-dabbelt/riscv-elf-psabi-doc/commit/126fa719972ff998a8a239c47d506c7809aea363 Best, Lehua gcc/ChangeLog: PR target/110119 * config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for vector mode (riscv_pass_by_reference): Return true for vector mode gcc/testsuite/ChangeLog: PR target/110119 * gcc.target/riscv/rvv/base/pr110119-1.c: New test. * gcc.target/riscv/rvv/base/pr110119-2.c: New test.
2023-06-15 | RISC-V: Align the predictor style for define_insn_and_split | Pan Li | 2 files | -22/+22
This patch is a follow-up to the patch below. https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621347.html We aligned the predictor style for define_insn_and_split as suggested by Kito, to avoid potential issues before we hit them. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/autovec-opt.md: Align the predictor style. * config/riscv/autovec.md: Ditto.
2023-06-15 | RISC-V: Bugfix for vec_init repeating auto vectorization in RV32 | Pan Li | 1 file | -4/+12
When constructing a vector mask from individual elements we wrongly assumed that we can broadcast BITS_PER_WORD (i.e. XLEN). The maximum is actually the vector element length (i.e. ELEN). This patch fixes this. After this patch, the failures below on RV32 will be fixed. FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask): Take elen instead of scalar BITS_PER_WORD. (expand_vector_init_merge_repeating_sequence): Use inner_bits_size instead of scalar BITS_PER_WORD.
2023-06-14 | Remove MFWRAP_SPEC remnant | Jivan Hakobyan | 1 file | -8/+0
This patch removes a remnant of mudflap. gcc/ChangeLog: * config/moxie/uclinux.h (MFWRAP_SPEC): Remove
2023-06-14 | aarch64: Fix -Werror=sign-compare bootstrap failure | Kyrylo Tkachov | 1 file | -3/+3
Pushing to fix bootstrap. gcc/ChangeLog: * config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fix signed comparison warning in loop from npats to enelts.
2023-06-14 | driver: Forward '-lgfortran', '-lm' to offloading compilation | Thomas Schwinge | 2 files | -0/+24
..., so that users don't manually need to specify '-foffload-options=-lgfortran', '-foffload-options=-lm' in addition to '-lgfortran', '-lm' (specified manually, or implicitly by the driver). gcc/ * gcc.cc (driver_handle_option): Forward host '-lgfortran', '-lm' to offloading compilation. * config/gcn/mkoffload.cc (main): Adjust. * config/nvptx/mkoffload.cc (main): Likewise. * doc/invoke.texi (foffload-options): Update example. libgomp/ * testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Don't set. * testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Likewise. * testsuite/libgomp.c/simd-math-1.c: Remove '-foffload-options=-lm'. * testsuite/libgomp.fortran/fortran-torture_execute_math.f90: Likewise. * testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90: Likewise.
2023-06-14 | Use x instead of v for alternative 2 (v, BH) in mov<mode>_internal. | liuhongt | 1 file | -1/+1
Since there's no evex version for vpcmpeq ymm, ymm, ymm. gcc/ChangeLog: PR target/110227 * config/i386/sse.md (mov<mode>_internal>): Use x instead of v for alternative 2 since there's no evex version for vpcmpeqd ymm, ymm, ymm. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110227.c: New test.
2023-06-13 | Remove sh5media divtab code | Jeff Law | 1 file | -203/+0
Spurred by Akari Takahashi's patch to config/sh/divtab.cc, this removes divtab.cc completely. divtab.cc was used to calculate a division table for the sh5 media processor. GCC dropped support for that (unmanufactured) chip back in 2016 and this file simply got missed AFAICT. gcc/ * config/sh/divtab.cc: Remove.
2023-06-13 | i386: Fix up whitespace in assembly | Jakub Jelinek | 1 file | -3/+3
I've noticed that standard_sse_constant_opcode emits some spurious whitespace around tab, that isn't something which is done for any other instruction and looks wrong. 2023-06-13 Jakub Jelinek <jakub@redhat.com> * config/i386/i386.cc (standard_sse_constant_opcode): Remove superfluous spaces around \t for vpcmpeqd.
2023-06-13 | RISC-V: Remove duplicate `#include "riscv-vector-switch.def"` | Lehua Ding | 1 file | -1/+2
Hi, This patch removes the duplicate `#include "riscv-vector-switch.def"` statement and adds #undef for the ENTRY and TUPLE_ENTRY macros later. Best, Lehua gcc/ChangeLog: * config/riscv/riscv-v.cc (struct mode_vtype_group): Remove duplicate #include. (ENTRY): Undef. (TUPLE_ENTRY): Undef.
2023-06-13 | RISC-V: Add comments of some functions | Juzhe-Zhong | 1 file | -0/+7
gcc/ChangeLog: * config/riscv/riscv-v.cc (rvv_builder::single_step_npatterns_p): Add comment. (shuffle_generic_patterns): Ditto. (expand_vec_perm_const_1): Ditto.
2023-06-13 | RISC-V: Fix bug of VLA SLP auto-vectorization | Juzhe-Zhong | 1 file | -4/+4
Sorry for producing bugs in the previous VLA SLP patch. Consider this following permutation: _85 = VEC_PERM_EXPR <{ 99, 17, ... }, { 11, 80, ... }, { 0, POLY_INT_CST [4, 4], 1, POLY_INT_CST [5, 4], 2, POLY_INT_CST [6, 4], ... }>; The correct result should be: _85 = { 99, 11, 17, 80, ... } However, I did wrong in the previous patch. Code sequence before this patch: set mask = { 0, 1, 0, 1, ... } set v0 = { 99, 17, 99, 17, ... } set v1 = { 11, 80, 11, 80, ... } set index = viota (mask) = { 0, 0, 1, 1, 2, 2, ... } set result = vrgather_mu (v0, v1, index, mask) = { 99, 11, 99, 80 } The result is incorrect. After this patch: set mask = { 0, 1, 0, 1, ... } set index = viota (mask) = { 0, 0, 1, 1, 2, 2, ... } set v0 = vrgather ({ 99, 17, 99, 17, ... }, index) = { 99, 99, 17, 17, ... } set v1 = { 11, 80, 11, 80, ... } set result = vrgather_mu (v0, v1, index, mask) = { 99, 11, 17, 80 } The result is what we expected. This issue was discovered in the test I appended in this patch with --param=riscv-autovec-lmul=2. gcc/ChangeLog: * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): Fix bug. (shuffle_decompress_patterns): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/slp-12.c: New test. * gcc.target/riscv/rvv/autovec/partial/slp_run-12.c: New test.
2023-06-13 | RISC-V: Add vector psabi checking. | Yanzhang Wang | 3 files | -2/+117
This patch adds support for checking whether a function's argument or return value is a vector type, and emits a warning if so. There are two exceptions: - The vector_size attribute. - The intrinsic functions. Some test cases need -Wno-psabi added to ignore the warning. gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_init_cumulative_args): Set warning flag if func is not builtin * config/riscv/riscv.cc (riscv_scalable_vector_type_p): Determine whether the type is scalable vector. (riscv_arg_has_vector): Determine whether the arg is vector type. (riscv_pass_in_vector_p): Check the vector type param is passed by value. (riscv_init_cumulative_args): The same as header. (riscv_get_arg_info): Add the checking. (riscv_function_value): Check the func return and set warning flag * config/riscv/riscv.h (INIT_CUMULATIVE_ARGS): Add a flag to determine whether warning psabi or not. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/pr109244.C: Add the -Wno-psabi. * g++.target/riscv/rvv/base/pr109535.C: Same * gcc.target/riscv/rvv/base/binop_vx_constraint-120.c: Same * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: Same * gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Same * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Same * gcc.target/riscv/rvv/base/pr110109-2.c: Same * gcc.target/riscv/rvv/base/scalar_move-9.c: Same * gcc.target/riscv/rvv/base/spill-10.c: Same * gcc.target/riscv/rvv/base/spill-11.c: Same * gcc.target/riscv/rvv/base/spill-9.c: Same * gcc.target/riscv/rvv/base/vlmul_ext-1.c: Same * gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: Same * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Same * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Same * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Same * gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Same * gcc.target/riscv/vector-abi-1.c: New test. * gcc.target/riscv/vector-abi-2.c: New test. * gcc.target/riscv/vector-abi-3.c: New test. * gcc.target/riscv/vector-abi-4.c: New test. * gcc.target/riscv/vector-abi-5.c: New test. * gcc.target/riscv/vector-abi-6.c: New test. Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com> Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
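For illustration (a hypothetical example, not one of the added tests): a function like the following, which passes and returns an RVV type by value, is the kind of signature the new check warns about, while intrinsic calls and vector_size types stay exempt.

#include <riscv_vector.h>

/* Passing or returning a scalable vector type by value now triggers the
   psabi warning, since the vector calling convention is not finalized.  */
vint32m1_t
callee (vint32m1_t v)
{
  return v;
}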
2023-06-13 | arm: Extend -mtp= arguments | Kyrylo Tkachov | 6 files | -6/+48
After discussing the -mtp= option with Arm's LLVM developers we'd like to extend the functionality of the option somewhat. There are actually 3 system registers that can be accessed for the thread pointer in aarch32: tpidrurw, tpidruro, tpidrprw. They are all read through the CP15 co-processor mechanism. The current -mtp=cp15 option reads the tpidruro register. This patch extends -mtp to allow for the above three explicit tpidr names and keeps -mtp=cp15 as an alias of -mtp=tpidruro for backwards compatibility. Bootstrapped and tested on arm-none-linux-gnueabihf. gcc/ChangeLog: * config/arm/arm-opts.h (enum arm_tp_type): Remove TP_CP15. Add TP_TPIDRURW, TP_TPIDRURO, TP_TPIDRPRW values. * config/arm/arm-protos.h (arm_output_load_tpidr): Declare prototype. * config/arm/arm.cc (arm_option_reconfigure_globals): Replace TP_CP15 with TP_TPIDRURO. (arm_output_load_tpidr): Define. * config/arm/arm.h (TARGET_HARD_TP): Define in terms of TARGET_SOFT_TP. * config/arm/arm.md (load_tp_hard): Call arm_output_load_tpidr to output assembly. (reload_tp_hard): Likewise. * config/arm/arm.opt (tpidrurw, tpidruro, tpidrprw): New values for arm_tp_type. * doc/invoke.texi (Arm Options, mtp): Document new values. gcc/testsuite/ChangeLog: * gcc.target/arm/mtp.c: New test. * gcc.target/arm/mtp_1.c: New test. * gcc.target/arm/mtp_2.c: New test. * gcc.target/arm/mtp_3.c: New test. * gcc.target/arm/mtp_4.c: New test.
2023-06-13 | aarch64: Extend -mtp= arguments | Kyrylo Tkachov | 3 files | -2/+19
After discussing the -mtp= option with Arm's LLVM developers we'd like to extend the functionality of the option somewhat. First of all, there is another TPIDR register that can be used to read the thread pointer: TPIDRRO_EL0 (which can also be accessed by AArch32 under another name) so it makes sense to add -mtp=tpidrr0_el0. This makes the existing arguments el0, el1, el2, el3 somewhat inconsistent in their naming so this patch introduces the more "full" names tpidr_el0, tpidr_el1, tpidr_el2, tpidr_el3 and makes the above short names alias of these new ones. Long story short, we preserve backwards compatibility and add a new TPIDR register to access through -mtp that wasn't available previously. There is more relevant discussion of the options at https://reviews.llvm.org/D152433 if you're interested. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/108779 * config/aarch64/aarch64-opts.h (enum aarch64_tp_reg): Add AARCH64_TPIDRRO_EL0 value. * config/aarch64/aarch64.cc (aarch64_output_load_tp): Define. * config/aarch64/aarch64.opt (tpidr_el0, tpidr_el1, tpidr_el2, tpidr_el3, tpidrro_el3): New accepted values to -mtp=. * doc/invoke.texi (AArch64 Options): Document new -mtp= options. gcc/testsuite/ChangeLog: PR target/108779 * gcc.target/aarch64/mtp_5.c: New test. * gcc.target/aarch64/mtp_6.c: New test. * gcc.target/aarch64/mtp_7.c: New test. * gcc.target/aarch64/mtp_8.c: New test. * gcc.target/aarch64/mtp_9.c: New test.
2023-06-13 | AArch64: [PR96339] Optimise svlast[ab] | Tejas Belagod | 1 file | -0/+133
This PR optimizes an SVE intrinsics sequence such as svlasta (svptrue_pat_b8 (SV_VL1), x), where a scalar is selected based on a constant predicate and a variable vector. This sequence is optimized to return the corresponding element of a NEON vector. For example, svlasta (svptrue_pat_b8 (SV_VL1), x) returns umov w0, v0.b[1] Likewise, svlastb (svptrue_pat_b8 (SV_VL1), x) returns umov w0, v0.b[0] This optimization only works provided the constant predicate maps to a range that is within the bounds of a 128-bit NEON register. gcc/ChangeLog: PR target/96339 * config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold sve calls that have a constant input predicate vector. (svlast_impl::is_lasta): Query to check if intrinsic is svlasta. (svlast_impl::is_lastb): Query to check if intrinsic is svlastb. (svlast_impl::vect_all_same): Check if all vector elements are equal. gcc/testsuite/ChangeLog: PR target/96339 * gcc.target/aarch64/sve/acle/general-c/svlast.c: New. * gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New. * gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New. * gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm to expect optimized code for function body. * gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise. * gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
2023-06-12 | Update perf auto profile script | Andi Kleen | 1 file | -1/+8
- Fix gen_autofdo_event: The download URL for the Intel Perfmon Event list has changed, as well as the JSON format. Also it now uses pattern matching to match CPUs. Update the script to support all of this. - Regenerate gcc-auto-profile with the latest published Intel model numbers, so it works with recent systems. - So far it's still broken on hybrid systems contrib/ChangeLog: * gen_autofdo_event.py: Update for download server changes gcc/ChangeLog * config/i386/gcc-auto-profile: Regenerate.
2023-06-13 | RISC-V: Fix V_WHOLE && V_FRACT iterator requirement | Juzhe-Zhong | 1 file | -7/+10
This patch fixes the requirement of V_WHOLE and V_FRACT. E.g. VNx8QI in V_WHOLE has no requirement, which is incorrect. Actually, VNx8QI should be a whole (full) mode when TARGET_MIN_VLEN < 128, since when TARGET_MIN_VLEN == 128, VNx8QI is e8mf2, which is a fractional vector. Co-Authored by: Robin Dapp <rdapp@ventanamicro.com> gcc/ChangeLog: * config/riscv/vector-iterators.md: Fix requirement. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c: New test.