riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2024-07-09	rs6000, Remove __builtin_vsx_cmple* builtins	Carl Love	2	-43/+0
	The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take unsigned arguments and return an unsigned result. The current definitions take signed arguments and return signed results, which is incorrect. The signed and unsigned versions of __builtin_vsx_cmple* are not documented in extend.texi. Also there are no test cases for the built-ins. Users can use the existing vec_cmple as PVIPR defines instead of __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi, __builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi, __builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti. Hence these built-ins are redundant and are removed by this patch. gcc/ChangeLog: * config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI, RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI, RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI, RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI): Remove case statements. * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si, __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si, __builtin_vsx_cmple_u8hi): Remove built-in definitions.
2024-07-09	i386: Implement .SAT_TRUNC for unsigned integers	Uros Bizjak	2	-2/+134
	The following testcase: unsigned short foo (unsigned int x) { _Bool overflow = x > (unsigned int)(unsigned short)(-1); return ((unsigned short)x | (unsigned short)-overflow); } currently compiles (-O2) to: foo: xorl %eax, %eax cmpl $65535, %edi seta %al negl %eax orl %edi, %eax ret We can expand through ustrunc{m}{n}2 optab to use carry flag from the comparison and generate code using SBB: foo: cmpl $65535, %edi sbbl %eax, %eax orl %edi, %eax ret or CMOV instruction: foo: movl $65535, %eax cmpl %eax, %edi cmovnc %edi, %eax ret gcc/ChangeLog: * config/i386/i386.md (@cmp<mode>_1): Use SWI mode iterator. (ustruncdi<mode>2): New expander. (ustruncsi<mode>2): Ditto. (ustrunchiqi2): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/sattrunc-1.c: New test.
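The saturating truncation idiom in the testcase above can be checked against a plain clamp. The following is a minimal sketch, not code from the patch: the function names `sat_trunc_branchless`, `sat_trunc_reference` and `check_sat_trunc` are hypothetical, and the fixed-width types are an assumption standing in for `unsigned int` and `unsigned short` on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form, mirroring the testcase in the commit message:
   -overflow is 0 or -1, so the OR either keeps the low 16 bits of x
   or forces the result to all ones.  */
uint16_t sat_trunc_branchless(uint32_t x)
{
    _Bool overflow = x > (uint32_t)(uint16_t)-1;
    return (uint16_t)((uint16_t)x | (uint16_t)-overflow);
}

/* Reference form (hypothetical helper): clamp to the largest uint16_t.  */
uint16_t sat_trunc_reference(uint32_t x)
{
    return x > UINT16_MAX ? UINT16_MAX : (uint16_t)x;
}

/* Hypothetical self-check: both forms agree around the saturation point.  */
void check_sat_trunc(void)
{
    const uint32_t samples[] = { 0u, 1u, 65534u, 65535u, 65536u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sat_trunc_branchless(samples[i]) == sat_trunc_reference(samples[i]));
}
```

Both forms describe the same unsigned saturating truncation; which instruction sequence a compiler picks for either one depends on the GCC version and target options.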
2024-07-09	diagnostics: use refs rather than pointers for diagnostic_{path,context}	David Malcolm	3	-53/+52
	Use const & rather than const * in various places where it can't be null and can't change. No functional change intended. gcc/ChangeLog: * diagnostic-path.cc: Replace "const diagnostic_path *" with "const diagnostic_path &" throughout, and "diagnostic_context *" with "diagnostic_context &". * diagnostic.cc (diagnostic_context::show_any_path): Pass reference in call to print_path. * diagnostic.h (diagnostic_context::print_path): Convert param to a reference. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-07-09	arm: clean up some legacy FPA related cruft.	Richard Earnshaw	1	-51/+10
	Support for the FPA on Arm was removed after gcc-4.7, but this little bit of crufty code was left behind. In particular the code to support the 'N' modifier in assembly code was left behind and this led to a trail of other code that depended on it, even though most of the constants that it supported had been removed in the original cleanup. This patch removes most of the remaining cruft and simplifies the one bit that remains: to determine whether an RTL construct contains 0.0 we don't need to convert it to a real value, we can simply compare it to CONST0_RTX of the appropriate mode. gcc/ * config/arm/arm.cc (fp_consts_initited): Delete variable. (value_fp0): Likewise. (init_fp_table): Delete function. (fp_const_from_val): Likewise. (arm_const_double_rtx): Rework to avoid converting to REAL_VALUE_TYPE. (arm_print_operand, case 'N'): Make use of this case an error.
2024-07-09	RISC-V: Fix comment/naming in attribute parsing code	Christoph Müllner	1	-4/+4
	Function target attributes have to be separated by semi-colons. Let's fix the comment and variable naming to better explain what the code does. gcc/ChangeLog: * config/riscv/riscv-target-attr.cc (riscv_process_target_attr): Fix comments and variable names. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-07-09	RISC-V: Deduplicate arch subset list processing	Christoph Müllner	1	-26/+6
	We have a code duplication in riscv_set_arch_by_subset_list() and riscv_parse_arch_string(), where the latter function parses an ISA string into a subset_list before doing the same as the former function. riscv_parse_arch_string() is used to process command line options and riscv_set_arch_by_subset_list() processes target attributes. So, it is obvious that both functions should do the same. Let's deduplicate the code to enforce this. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_set_arch_by_subset_list): Fix overlong line. (riscv_parse_arch_string): Replace duplicated code by a call to riscv_set_arch_by_subset_list. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-07-09	RISC-V: testsuite: Properly gate LTO tests	Christoph Müllner	2	-2/+2
	There are two test cases with the following skip directive: dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } This reads as: skip if both '-flto' and '-fno-fat-lto-objects' are present. This is not the case if only '-flto' is present. Since both tests depend on instruction sequences (one does check-function-bodies the other tests for an assembler error message), they won't work reliably with fat LTO objects. Let's change the skip line to gate the test on '-flto' to avoid failing tests like this: FAIL: gcc.target/riscv/interrupt-misaligned.c -O2 -flto check-function-bodies interrupt FAIL: gcc.target/riscv/interrupt-misaligned.c -O2 -flto -flto-partition=none check-function-bodies interrupt FAIL: gcc.target/riscv/pr93202.c -O2 -flto (test for errors, line 10) FAIL: gcc.target/riscv/pr93202.c -O2 -flto (test for errors, line 9) FAIL: gcc.target/riscv/pr93202.c -O2 -flto -flto-partition=none (test for errors, line 10) FAIL: gcc.target/riscv/pr93202.c -O2 -flto -flto-partition=none (test for errors, line 9) gcc/testsuite/ChangeLog: * gcc.target/riscv/interrupt-misaligned.c: Remove "-fno-fat-lto-objects" from skip condition. * gcc.target/riscv/pr93202.c: Likewise. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2024-07-09	i386: Correct AVX10 CPUID emulation	Haochen Jiang	1	-2/+2
	AVX10 Documentation has specified ecx value as 0 for AVX10 version and vector size under 0x24 subleaf. Although for ecx=1, the bits are all reserved for now, we still need to specify ecx as 0 to avoid dirty value in ecx. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Correct AVX10 CPUID emulation to specify ecx value.
2024-07-09	c: Rewrite c_parser_omp_tile_sizes to use c_parser_expr_list	Jakub Jelinek	3	-22/+15
	The following patch simplifies c_parser_omp_tile_sizes to use c_parser_expr_list, so that it will get CPP_EMBED parsing naturally, without having another spot that needs to be adjusted for it. 2024-07-09 Jakub Jelinek <jakub@redhat.com> * c-parser.cc (c_parser_omp_tile_sizes): Use c_parser_expr_list. * c-c++-common/gomp/tile-11.c: Adjust expected diagnostics for c. * c-c++-common/gomp/tile-12.c: Likewise.
2024-07-09	c++: Implement C++26 CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions	Jakub Jelinek	4	-25/+30
	The following patch implements CWG2819 (which wasn't a DR because it changes behavior of C++26 only). 2024-07-09 Jakub Jelinek <jakub@redhat.com> * constexpr.cc (cxx_eval_constant_expression): CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions. * g++.dg/cpp26/constexpr-voidptr3.C: New test. * g++.dg/cpp0x/constexpr-cast2.C: Adjust expected diagnostics for C++26. * g++.dg/cpp0x/constexpr-cast4.C: Likewise.
2024-07-09	Rename __{float,double}_u to __x86_{float,double}_u to avoid polluting the namespace.	liuhongt	3	-8/+32
	I have a build failure on NetBSD as the namespace pollution avoidance causes a direct hit with the system /usr/include/math.h ======================================================================= In file included from /usr/src/local/gcc/obj/gcc/include/emmintrin.h:31, from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/ext/random:45, from /usr/src/local/gcc/libstdc++-v3/include/precompiled/extc++.h:65: /usr/src/local/gcc/obj/gcc/include/xmmintrin.h:75:15: error: conflicting declaration 'typedef float __float_u' 75 | typedef float __float_u __attribute__ ((__may_alias__, __aligned__ (1))); | ^~~~~~~~~ In file included from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/cmath:47, from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/x86_64-unknown-netbsd10.99/bits/stdc++.h:114, from /usr/src/local/gcc/libstdc++-v3/include/precompiled/extc++.h:32: /usr/src/local/gcc/obj/gcc/include-fixed/math.h:49:7: note: previous declaration as 'union __float_u' 49 | union __float_u { gcc/ChangeLog: PR target/115796 * config/i386/emmintrin.h (__float_u): Rename to .. (__x86_float_u): .. this. (_mm_load_sd): Ditto. (_mm_store_sd): Ditto. (_mm_loadh_pd): Ditto. (_mm_loadl_pd): Ditto. * config/i386/xmmintrin.h (__double_u): Rename to .. (__x86_double_u): .. this. (_mm_load_ss): Ditto. (_mm_store_ss): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr115796.c: New test.
2024-07-09	RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 2	Pan Li	9	-0/+185
	After the middle-end supported the vector mode of .SAT_ADD, add more testcases to ensure the correctness of RISC-V backend for form 2. Aka: Form 2: #define DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ out[i] = (T)(in[i] + IMM) < in[i] ? -1 : (in[i] + IMM); \ } DEF_VEC_SAT_U_ADD_IMM_FMT_2 (uint64_t, 9) Passed the full rv64gcv regression tests. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add help test macro. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-5.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-6.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-7.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-09	RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 1	Pan Li	10	-0/+449
	After the middle-end supported the vector mode of .SAT_ADD, add more testcases to ensure the correctness of RISC-V backend for form 1. Aka: Form 1: #define DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ out[i] = (T)(in[i] + IMM) >= in[i] ? (in[i] + IMM) : -1; \ } DEF_VEC_SAT_U_ADD_IMM_FMT_1 (uint64_t, 9) Passed the full rv64gcv regression tests. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add help test macro. * gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-2.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-3.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-4.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
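The two immediate forms of the unsigned saturating add in the commits above differ only in the direction of the overflow test. As a rough illustration, here they are written out as plain functions for `uint64_t` and an immediate of 9. This is a sketch and not the testsuite macros themselves: the names `IMM`, `vec_sat_u_add_imm_fmt_1` and `vec_sat_u_add_imm_fmt_2` are chosen here for readability, and the functions return `void`.

```c
#include <assert.h>
#include <stdint.h>

#define IMM 9u /* illustrative immediate, matching the commit's example */

/* Form 1: keep the sum when it did not wrap, otherwise saturate.  */
void vec_sat_u_add_imm_fmt_1(uint64_t *out, const uint64_t *in, unsigned limit)
{
    for (unsigned i = 0; i < limit; i++)
        out[i] = (uint64_t)(in[i] + IMM) >= in[i] ? (in[i] + IMM) : (uint64_t)-1;
}

/* Form 2: saturate when the sum wrapped, otherwise keep the sum.  */
void vec_sat_u_add_imm_fmt_2(uint64_t *out, const uint64_t *in, unsigned limit)
{
    for (unsigned i = 0; i < limit; i++)
        out[i] = (uint64_t)(in[i] + IMM) < in[i] ? (uint64_t)-1 : (in[i] + IMM);
}
```

Unsigned addition wraps modulo 2^64, so a sum smaller than its operand signals overflow; both loops therefore compute the same saturated result.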
2024-07-09	Daily bump.	GCC Administrator	5	-1/+168

2024-07-08	[to-be-committed][RISC-V][V3] DCE analysis for extension elimination	Jeff Law	18	-5/+1047
	The pre-commit testing showed that making ext-dce only active at -O2 and above would require minor edits to the tests. In some cases we had specified -O1 in the test or specified no optimization level at all. Those need to be bumped to -O2. In one test we had one set of dg-options overriding another. The other approach that could have been taken would be to drop the -On argument, add an explicit -fext-dce and add dg-skip-if options. I originally thought that was going to be way to go, but the dg-skip-if aspect was going to get ugly as things like interaction between unrolling, peeling and -ftracer would have to be accounted for and would likely need semi-regular adjustment. Changes since V2: Testsuite changes to deal with pass only being enabled at -O2 or higher. -- Changes since V1: Check flag_ext_dce before running the new pass. I'd forgotten that I had removed that part of the gate to facilitate more testing. Turn flag_ext_dce on at -O2 and above. Adjust one of the riscv tests to explicitly avoid vectors Adjust a few aarch64 tests In tbz_2.c we remove an unnecessary extension which causes us to use "x" registers instead of "w" registers. In the pred_clobber tests we also remove an extension and that ultimately causes a reg->reg copy to change locations. -- This was actually ack'd late in the gcc-14 cycle, but I chose not to integrate it given how late we were in the cycle. The basic idea here is to track liveness of subobjects within a word and if we find an extension where the bits set aren't actually used, then we convert the extension into a subreg. The subreg typically simplifies away. I've seen this help a few routines in coremark, fix one bug in the testsuite (pr111384) and fix a couple internally reported bugs in Ventana. The original idea and code were from Joern; Jivan and I hacked it into usable shape. 
I've had this in my tester for ~8 months, so it's been through more build/test cycles than I care to contemplate and nearly every architecture we support. But just in case, I'm going to wait for it to spin through the pre-commit CI tester. I'll find my old ChangeLog before committing. gcc/ * Makefile.in (OBJS): Add ext-dce.o * common.opt (ext-dce): Document new option. * df-scan.cc (df_get_ext_block_use_set): Delete prototype and make extern. * df.h (df_get_exit_block_use_set): Prototype. * ext-dce.cc: New file/pass. * opts.cc (default_options_table): Handle ext-dce at -O2 or higher. * passes.def: Add ext-dce before combine. * tree-pass.h (make_pass_ext_dce): Prototype. gcc/testsuite * gcc.target/aarch64/sve/pred_clobber_1.c: Update expected output. * gcc.target/aarch64/sve/pred_clobber_2.c: Likewise. * gcc.target/aarch64/sve/pred_clobber_3.c: Likewise. * gcc.target/aarch64/tbz_2.c: Likewise. * gcc.target/riscv/core_bench_list.c: New test. * gcc.target/riscv/core_init_matrix.c: New test. * gcc.target/riscv/core_list_init.c: New test. * gcc.target/riscv/matrix_add_const.c: New test. * gcc.target/riscv/mem-extend.c: New test. * gcc.target/riscv/pr111384.c: New test. Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com> Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
2024-07-08	c-format.cc: add ctors to format_check_results and format_check_context	David Malcolm	1	-26/+37
	This is a minor cleanup I spotted whilst working on another patch. No functional change intended. gcc/c-family/ChangeLog: * c-format.cc (format_check_results::format_check_results): New ctor. (struct format_check_context): Add ctor; add "m_" prefix to all fields. (check_format_info): Use above ctors. (check_format_arg): Update for "m_" prefix to format_check_context. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-07-08	i386: Promote {QI,HI}mode x86_mov<mode>cc_0_m1_neg to SImode	Uros Bizjak	1	-6/+19
	Promote HImode x86_mov<mode>cc_0_m1_neg insn to SImode to avoid redundant prefixes. Also promote QImode insn when TARGET_PROMOTE_QImode is set. This is similar to promotable_binary_operator splitter, where we promote the result to SImode. Also correct insn condition for splitters to SImode of NEG and NOT instructions. The sizes of QImode and SImode instructions are always the same, so there is no need for optimize_insn_for_size bypass. gcc/ChangeLog: * config/i386/i386.md (x86_mov<mode>cc_0_m1_neg splitter to SImode): New splitter. (NEG and NOT splitter to SImode): Remove optimize_insn_for_size_p predicate from insn condition.
2024-07-08	Remove trailing whitespace from invoke.texi	Patrick O'Neill	1	-196/+196
	gcc/ChangeLog: * doc/invoke.texi: Remove trailing whitespace. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-07-08	x86: Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF	Levy Hsu	5	-36/+234
	This patch extends support for BF16 vector operations in GCC, including bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and V32BF modes. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator): Add VBF modes. (ix86_expand_copysign): Ditto. (ix86_expand_xorsign): Ditto. * config/i386/i386.cc (ix86_build_const_vector): Ditto. (ix86_build_signbit_mask): Ditto. * config/i386/sse.md: Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx2-bf16-vec-absneg.c: New test. * gcc.target/i386/avx512f-bf16-vec-absneg.c: New test.
2024-07-08	rs6000: load high and low part of 128bit vector independently [PR110040]	Jeevitha	3	-0/+48
	PR110040 exposes an issue concerning moves from vector registers to GPRs. There are two moves, one for upper 64 bits and the other for the lower 64 bits. In the problematic test case, we are only interested in storing the lower 64 bits. However, the instruction for copying the upper 64 bits is still emitted and is dead code. This patch adds a splitter that splits apart the two move instructions so that DCE can remove the dead code after splitting. 2024-07-08 Jeevitha Palanisamy <jeevitha@linux.ibm.com> gcc/ PR target/110040 * config/rs6000/vsx.md (split pattern for V1TI to DI move): New define. gcc/testsuite/ PR target/110040 * gcc.target/powerpc/pr110040-1.c: New testcase. * gcc.target/powerpc/pr110040-2.c: New testcase.
2024-07-08	RISC-V: Implement .SAT_TRUNC for vector unsigned int	Pan Li	18	-0/+742
	This patch implements .SAT_TRUNC for the RISC-V backend with the help of the RVV Vector Narrowing Fixed-Point Clip Instructions. The below SEW(S) are supported: * e64 => e32 * e64 => e16 * e64 => e8 * e32 => e16 * e32 => e8 * e16 => e8 Take below example to see the changes to asm. Form 1: #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT) \ void __attribute__((noinline)) \ vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ WT x = in[i]; \ bool overflow = x > (WT)(NT)(-1); \ out[i] = ((NT)x) | (NT)-overflow; \ } \ } DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t) Before this patch: .L3: vsetvli a5,a2,e64,m1,ta,ma vle64.v v1,0(a1) vmsgtu.vv v0,v1,v2 vsetvli zero,zero,e32,mf2,ta,ma vncvt.x.x.w v1,v1 vmerge.vim v1,v1,-1,v0 vse32.v v1,0(a0) slli a4,a5,3 add a1,a1,a4 slli a4,a5,2 add a0,a0,a4 sub a2,a2,a5 bne a2,zero,.L3 After this patch: .L3: vsetvli a5,a2,e32,mf2,ta,ma vle64.v v1,0(a1) vnclipu.wi v1,v1,0 vse32.v v1,0(a0) slli a4,a5,3 add a1,a1,a4 slli a4,a5,2 add a0,a0,a4 sub a2,a2,a5 bne a2,zero,.L3 Passed the full rv64gcv regression tests. gcc/ChangeLog: * config/riscv/autovec.md (ustrunc<mode><v_double_trunc>2): Add new pattern for double truncation. (ustrunc<mode><v_quad_trunc>2): Ditto but for quad truncation. (ustrunc<mode><v_oct_trunc>2): Ditto but for oct truncation. * config/riscv/riscv-protos.h (expand_vec_double_ustrunc): Add new func decl to expand double vec ustrunc. (expand_vec_quad_ustrunc): Ditto but for quad. (expand_vec_oct_ustrunc): Ditto but for oct. * config/riscv/riscv-v.cc (expand_vec_double_ustrunc): Add new func impl to expand vector double ustrunc. (expand_vec_quad_ustrunc): Ditto but for quad. (expand_vec_oct_ustrunc): Ditto but for oct. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper test macros. * gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-2.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-3.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-4.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-5.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-6.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_unary_vv_run.h: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-08	fortran: Move definition of variable closer to its uses	Mikael Morin	1	-14/+19
	No change of behaviour, this makes a variable easier to track. gcc/fortran/ChangeLog: * trans-array.cc (gfc_trans_preloop_setup): Use a separate variable for iteration. Use directly the value of variable I if it is known. Move the definition of the variable to the branch where the remaining uses are.
2024-07-08	[RISC-V] add implied extension repeatedly until stable	Fei Gao	2	-3/+14
	Call handle_implied_ext repeatedly until there's no new subset added into the subset list. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::riscv_subset_list): init m_subset_num to 0. (riscv_subset_list::add): increase m_subset_num once a subset added. (riscv_subset_list::finalize): call handle_implied_ext repeatedly until no change in m_subset_num. * config/riscv/riscv-subset.h: add m_subset_num member. Signed-off-by: Fei Gao <gaofei@eswincomputing.com>
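The "repeat until stable" approach described above is a fixed-point iteration. The sketch below illustrates the idea only and is not the GCC implementation: the extension bits `EXT_A` to `EXT_D`, the `implied_table` contents, and the function names are all hypothetical, and it detects stability by comparing the whole set rather than a subset count such as `m_subset_num`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extensions, one bit each.  */
enum { EXT_A = 1 << 0, EXT_B = 1 << 1, EXT_C = 1 << 2, EXT_D = 1 << 3 };

struct implication { unsigned ext; unsigned implied; };

/* Hypothetical rules A -> B -> C -> D, deliberately listed in an order
   where one pass over the table cannot resolve the whole chain.  */
static const struct implication implied_table[] = {
    { EXT_C, EXT_D },
    { EXT_B, EXT_C },
    { EXT_A, EXT_B },
};

/* One pass: add whatever is directly implied by extensions already present.  */
unsigned add_implied_once(unsigned set)
{
    for (size_t i = 0; i < sizeof implied_table / sizeof implied_table[0]; i++)
        if (set & implied_table[i].ext)
            set |= implied_table[i].implied;
    return set;
}

/* Repeat the pass until it no longer changes the set.  */
unsigned add_implied_until_stable(unsigned set)
{
    unsigned prev;
    do {
        prev = set;
        set = add_implied_once(set);
    } while (set != prev);
    return set;
}
```

With this table a single pass starting from `EXT_A` only adds `EXT_B`; the loop needs further passes to pull in `EXT_C` and `EXT_D`, which is the situation the repeated call is meant to handle.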
2024-07-08	rs6000: Replace orc with iorc [PR115659]	Kewen Lin	3	-14/+14
	Since iorc optab is introduced, this patch is to update the expander names and all the related uses like bif expanders, gen functions accordingly. PR tree-optimization/115659 gcc/ChangeLog: * config/rs6000/rs6000-builtins.def: Update some bif expanders by replacing orc<mode>3 with iorc<mode>3. * config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update gen function by replacing orc<mode>3 with iorc<mode>3. * config/rs6000/rs6000.md (orc<mode>3): Rename to ... (iorc<mode>3): ... this.
2024-07-08	isel: Fold more in gimple_expand_vec_cond_expr with andc and iorc [PR115659]	Kewen Lin	4	-0/+42
	As PR115659 shows, assuming c = x CMP y, there are some folding chances for patterns r = c ? 0/z : z/-1: - for r = c ? 0 : z, it can be folded into r = ~c & z. - for r = c ? z : -1, it can be folded into r = ~c | z. But BIT_AND/BIT_IOR applied on one BIT_NOT operand is a compound operation, so it is arguable whether it beats vector selection. So this patch introduces new optabs andc, iorc and their corresponding internal functions BIT_{ANDC,IORC}, and if a target defines such optabs for vector modes, it means the target supports these hardware insns and they should be not worse than vector selection. PR tree-optimization/115659 gcc/ChangeLog: * doc/md.texi: Document andcm3 and iorcm3. * gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for patterns x CMP y ? 0 : z and x CMP y ? z : -1. * internal-fn.def (BIT_ANDC): New internal function. (BIT_IORC): Likewise. * optabs.def (andc, iorc): New optab.
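The two foldings above rely on the comparison result being a lane mask that is either all zeros or all ones. A scalar sketch of the identities follows, modelling one vector lane as a `uint32_t`. The function names are hypothetical and this is not the gimple-isel code.

```c
#include <assert.h>
#include <stdint.h>

/* c models one lane of a vector comparison: 0 (false) or all ones (true).  */

uint32_t select_zero_else_z(uint32_t c, uint32_t z) { return c ? 0u : z; }
uint32_t fold_andc(uint32_t c, uint32_t z)          { return ~c & z; }

uint32_t select_z_else_ones(uint32_t c, uint32_t z) { return c ? z : UINT32_MAX; }
uint32_t fold_iorc(uint32_t c, uint32_t z)          { return ~c | z; }

/* Hypothetical self-check over both mask values and a few payloads.  */
void check_folds(void)
{
    const uint32_t masks[] = { 0u, UINT32_MAX };
    const uint32_t values[] = { 0u, 1u, 0x12345678u, UINT32_MAX };
    for (unsigned m = 0; m < 2; m++)
        for (unsigned v = 0; v < 4; v++) {
            assert(select_zero_else_z(masks[m], values[v]) == fold_andc(masks[m], values[v]));
            assert(select_z_else_ones(masks[m], values[v]) == fold_iorc(masks[m], values[v]));
        }
}
```

The identities hold only for the two mask values; for an arbitrary `c` the select and the bitwise form differ, which is why the fold applies to comparison results.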
2024-07-07	rs6000: Consider explicit VSX when masking off ALTIVEC [PR115688]	Kewen Lin	2	-2/+20
	PR115688 exposes an inconsistent state in which we have VSX enabled but ALTIVEC disabled. There is one hunk: if (main_target_opt && !main_target_opt->x_rs6000_altivec_abi) rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC) & ~rs6000_isa_flags_explicit); which disables both VSX and ALTIVEC together only considering them explicitly set or not. For the given case, VSX is explicitly specified, altivec is implicitly enabled as it's part of set ISA_2_6_MASKS_SERVER. When falling into the above hunk, vsx is kept as it's explicitly enabled but altivec gets masked off, it's unexpected. This patch is to consider explicit VSX when masking off ALTIVEC, not mask off it if TARGET_VSX and it's explicitly set. PR target/115688 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_option_override_internal): Consider explicit VSX when masking off ALTIVEC. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr115688.c: New test.
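The masking problem described above can be reproduced with two flag bits. The sketch below is an assumption-laden model of the described behavior, not the actual rs6000 patch: `MASK_ALTIVEC`, `MASK_VSX` and both function names are hypothetical stand-ins, and the "after" variant is just one way to express "do not mask off ALTIVEC when VSX is enabled and explicit".

```c
#include <assert.h>

/* Hypothetical stand-ins for OPTION_MASK_ALTIVEC and OPTION_MASK_VSX.  */
enum { MASK_ALTIVEC = 1 << 0, MASK_VSX = 1 << 1 };

/* The hunk quoted in the commit message: clear every flag of the pair
   that the user did not set explicitly.  */
unsigned mask_off_before(unsigned flags, unsigned explicit_flags)
{
    return flags & ~((unsigned)(MASK_VSX | MASK_ALTIVEC) & ~explicit_flags);
}

/* Sketch of the described fix: keep ALTIVEC when VSX is both enabled
   and explicitly requested.  */
unsigned mask_off_after(unsigned flags, unsigned explicit_flags)
{
    unsigned to_clear = (unsigned)(MASK_VSX | MASK_ALTIVEC) & ~explicit_flags;
    if ((flags & MASK_VSX) && (explicit_flags & MASK_VSX))
        to_clear &= ~(unsigned)MASK_ALTIVEC;
    return flags & ~to_clear;
}
```

With VSX explicit and ALTIVEC only implied, `mask_off_before` leaves VSX set without ALTIVEC, which is the inconsistent state from the PR, while `mask_off_after` keeps both.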
2024-07-08	x86: Update branch hint for Redwood Cove.	H.J. Lu	3	-24/+24
	According to Intel® 64 and IA-32 Architectures Optimization Reference Manual[1], Branch Hint is updated for Redwood Cove. --------cut from [1]------------------------- Starting with the Redwood Cove microarchitecture, if the predictor has no stored information about a branch, the branch has the Intel® SSE2 branch taken hint (i.e., instruction prefix 3EH), When the codec decodes the branch, it flips the branch’s prediction from not-taken to taken. It then flushes the pipeline in front of it and steers this pipeline to fetch the taken path of the branch. --------cut end ----------------------------- Split tune branch_prediction_hints into branch_prediction_hints_taken and branch_prediction_hints_not_taken, always generate branch hint for conditional branches, both tunes are disabled by default. [1] https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html gcc/ * config/i386/i386.cc (ix86_print_operand): Always generate branch hint for conditional branches. * config/i386/i386.h (TARGET_BRANCH_PREDICTION_HINTS): Split into .. (TARGET_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and .. (TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this. * config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS): Split into .. (X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and .. (X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
2024-07-08	Daily bump.	GCC Administrator	6	-1/+37

2024-07-07	PR modula2/115804 ICE during gimplification with new isfinite optab	Gaius Mulley	1	-11/+15
	The calls to five m2 builtins have the incorrect return type. This was detected when adding isfinitedf2 optab to the s390 backend which results in ICEs during gimplification in the gm2 testsuite. gcc/m2/ChangeLog: PR modula2/115804 * gm2-gcc/m2builtins.cc (builtin_function_entry): Add GTY. (DoBuiltinMemCopy): Add rettype and use rettype in the call. (DoBuiltinAlloca): Ditto. (DoBuiltinIsfinite): Ditto. (DoBuiltinIsnan): Ditto. (m2builtins_BuiltInHugeVal): Ditto. (m2builtins_BuiltInHugeValShort): Ditto. (m2builtins_BuiltInHugeValLong): Ditto. Co-Authored-By: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com> Co-Authored-By: Andrew Pinski <quic_apinski@quicinc.com> Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-07-07	doc: Remove dubious example around bug reporting	Gerald Pfeifer	1	-5/+0
	gcc: * doc/bugreport.texi (Bug Criteria): Remove dubious example.
2024-07-08	c++: Simplify uses of LAMBDA_EXPR_EXTRA_SCOPE	Nathaniel Shead	1	-5/+2
	I noticed there already exists a getter to get the scope of a lambda from its type directly rather than needing to go via CLASSTYPE_LAMBDA_EXPR, we may as well use it. gcc/cp/ChangeLog: * module.cc (trees_out::get_merge_kind): Use LAMBDA_TYPE_EXTRA_SCOPE instead of LAMBDA_EXPR_EXTRA_SCOPE. (trees_out::key_mergeable): Likewise. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2024-07-07	ada: Make the names of uninstalled cross-gnattools consistent across builds	Maciej W. Rozycki	2	-11/+30
	We suffer from an inconsistency in the names of uninstalled gnattools executables in cross-compiler configurations. The cause is a recipe we have: ada.all.cross: for tool in $(ADA_TOOLS) ; do \ if [ -f $$tool$(exeext) ] ; \ then \ $(MV) $$tool$(exeext) $$tool-cross$(exeext); \ fi; \ done the intent of which is to give the names of gnattools executables the '-cross' suffix, consistently with the compiler drivers: 'gcc-cross', 'g++-cross', etc. A problem with the recipe is that this 'make' target is called too early in the build process, before gnattools have been made. Consequently no renames happen and owing to that they are conditional on the presence of the individual executables the recipe succeeds doing nothing. However if a target is requested later on such as 'make pdf' that does not cause gnattools executables to be rebuilt, then 'ada.all.cross' does succeed in renaming the executables already present in the build tree. Then if the 'gnat' testsuite is run later on which expects non-suffixed 'gnatmake' executable, it does not find the 'gnatmake-cross' executable in the build tree and may either catastrophically fail or incorrectly use a system-installed copy of 'gnatmake'. Of course if a target is requested such as `make all' that does cause gnattools executables to be rebuilt, then both suffixed and non-suffixed uninstalled executables result. Fix the problem by moving the renaming of gnattools to a separate 'make' recipe, pasted into a new 'gnattools-cross-mv' target and the existing legacy 'cross-gnattools' target. Then invoke the new target explicitly from the 'gnattools-cross' recipe in gnattools/. Update the test harness accordingly, so that suffixed gnattools are used in cross-compilation testsuite runs. gcc/ada/ * gcc-interface/Make-lang.in (ada.all.cross): Move recipe to... (GNATTOOLS_CROSS_MV): ... this new variable. (cross-gnattools): Paste it here. (gnattools-cross-mv): New target. 
gnattools/ * Makefile.in (gnattools-cross): Also build 'gnattools-cross-mv' in GCC_DIR. gcc/testsuite/ * lib/gnat.exp (local_find_gnatmake, find_gnatclean): Use '-cross' suffix where testing a cross-compiler.
2024-07-07	Daily bump.	GCC Administrator	3	-1/+73

2024-07-06	[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values	Jeff Law	4	-16/+192
	Last patch in this round of bitmanip work... At least I think I'm going to pause here and switch gears to other projects that need attention 🙂 This patch introduces the ability to generate bitmanip instructions for rv64 when operating on SI objects when we know something about the range of the bit position (due to masking of the position). I've got note that the (7-pos % 8) bit position form was discovered by RAU in 500.perl. I took that and expanded it to the simple (pos & mask) form as well as covering bset, binv and bclr. As far as the implementation is concerned.... This turns the recently added define_splits into define_insn_and_split constructs. This allows combine to "see" enough RTL to realize a sign extension is unnecessary. Otherwise we get undesirable sign extensions for the new testcases. Second it adds new patterns for the logical operations. Two patterns for IOR/XOR and two patterns for AND. I think a key concept to keep in mind is that once we determine a Zbs operation is safe to perform on a SI value, we can rewrite the RTL in 64bit form. If we were ever to try and use range information at expand time for this stuff (and we probably should investigate that), that's the path I'd suggest. This is notably cleaner than my original implementation which actually kept the more complex RTL form through final and emitted 2/3 instructions (mask the bit position, then the bset/bclr/binv). Tested in my tester, but waiting for pre-commit CI to report back before taking further action. gcc/ * config/riscv/bitmanip.md (bset splitters): Turn into define_and_splits. Don't depend on combine splitting the "andn with constant" form. (bset, binv, bclr with masked bit position): New patterns. gcc/testsuite * gcc.target/riscv/binv-for-simode-1.c: New test. * gcc.target/riscv/bset-for-simode-1.c: New test. * gcc.target/riscv/bclr-for-simode-1.c: New test.
2024-07-06	testsuite/52641 - Fix more sloppy tests.	Georg-Johann Lay	14	-11/+25
	PR testsuite/52641 gcc/testsuite/ * gcc.dg/analyzer/torture/boxed-ptr-1.c: Requires size24plus. * gcc.dg/analyzer/torture/pr102692.c: Use intptr_t instead of long. * gcc.dg/ipa/pr102714.c: Use uintptr_t instead of unsigned long. * gcc.dg/torture/pr115387-1.c: Same. * gcc.dg/torture/pr113895-1.c: Same. * gcc.dg/ipa/pr108007.c: Require int32plus. * gcc.dg/ipa/pr109318.c: Same. * gcc.dg/ipa/pr96040.c: Use size_t instead of unsigned long. * gcc.dg/torture/pr113126.c: Use vectors of same dimension. * gcc.dg/tree-ssa/builtin-sprintf-9.c: Requires double64. * gcc.dg/spellcheck-inttypes.c [avr]: Avoid include of inttypes.h. * gcc.dg/analyzer/torture/pr104159.c [avr]: Skip. * gcc.dg/torture/pr84682-2.c [avr]: Skip. * gcc.dg/wtr-conversion-1.c [avr]: Remove avr selector since long double is a 64-bit type by now.
2024-07-06	[committed] Fix various sh define_insn_and_split predicates	Jeff Law	1	-50/+50
	The sh4-linux-gnu port has failed to bootstrap since the introduction of late combine due to failures to split certain insns. This is caused by incorrect predicates in various define_insn_and_split patterns. Essentially the insn's predicate is something like "TARGET_SH1". The split predicate is "&& can_create_pseudos_p ()". So these patterns will match post-reload, but be un-splittable. So at assembly output time, we get the failure as the output template is "#". This patch fixes the most obvious & egregious cases by bringing the split condition into the insn's predicate and leaving "&& 1" as the split condition. That's enough to get sh4-linux-gnu bootstrapping again and I'm hoping it does the same for sh4eb-linux-gnu. Pushing to the trunk. gcc/ * config/sh/sh.md (adddi3): Only allow matching when we can still create new pseudos. (subdi3, rotcl, rotcr, rotcr_neg_t, negdi2): Likewise. (abs<mode>2, negabs<mode>2, negdi_cond): Likewise. (swapbisi2_and_shl8, swapbhisi2, movsi_index_disp_load): Likewise. (movhi_index_disp_load, mov<mode>index_disp_store): Likewise. (mov_t_msb_neg, negt_msb, clipu_one): Likewise.
2024-07-06	AVR: Create more opportunities for -mfuse-add optimization.	Georg-Johann Lay	3	-21/+80
	avr_split_tiny_move() was only run for AVR_TINY because it has no PLUS addressing modes. Same applies to the X register on ordinary cores, and also to the Z register when used with [E]LPM. For example, without this patch long long addLL (long long a, long long b) { return a + b; } compiles with "-mmcu=atmega128 -Os -dp" to: ... movw r26,r24 ; 80 [c=4 l=1] movhi/0 movw r30,r22 ; 81 [c=4 l=1] movhi/0 ld r18,X ; 82 [c=4 l=1] movqi_insn/3 adiw r26,1 ; 83 [c=4 l=3] movqi_insn/3 ld r19,X sbiw r26,1 adiw r26,2 ; 84 [c=4 l=3] movqi_insn/3 ld r20,X sbiw r26,2 adiw r26,3 ; 85 [c=4 l=3] movqi_insn/3 ld r21,X sbiw r26,3 adiw r26,4 ; 86 [c=4 l=3] movqi_insn/3 ld r22,X sbiw r26,4 adiw r26,5 ; 87 [c=4 l=3] movqi_insn/3 ld r23,X sbiw r26,5 adiw r26,6 ; 88 [c=4 l=3] movqi_insn/3 ld r24,X sbiw r26,6 adiw r26,7 ; 89 [c=4 l=2] movqi_insn/3 ld r25,X ld r10,Z ; 90 [c=4 l=1] movqi_insn/3 ... whereas with this patch it becomes: ... movw r26,r24 ; 80 [c=4 l=1] movhi/0 movw r30,r22 ; 81 [c=4 l=1] movhi/0 ld r18,X+ ; 140 [c=4 l=1] movqi_insn/3 ld r19,X+ ; 142 [c=4 l=1] movqi_insn/3 ld r20,X+ ; 144 [c=4 l=1] movqi_insn/3 ld r21,X+ ; 146 [c=4 l=1] movqi_insn/3 ld r22,X+ ; 148 [c=4 l=1] movqi_insn/3 ld r23,X+ ; 150 [c=4 l=1] movqi_insn/3 ld r24,X+ ; 152 [c=4 l=1] movqi_insn/3 ld r25,X ; 109 [c=4 l=1] movqi_insn/3 ld r10,Z ; 111 [c=4 l=1] movqi_insn/3 ... gcc/ * config/avr/avr.md: Also split with avr_split_tiny_move() for non-AVR_TINY. * config/avr/avr.cc (avr_split_tiny_move): Don't change memory references with base regs that can do PLUS addressing. (avr_out_lpm_no_lpmx) [POST_INC]: Don't output final ADIW when the address register is unused after. gcc/testsuite/ * gcc.target/avr/torture/fuse-add.c: New test.
2024-07-06	RISC-V: fix internal error on global variable-length array	Eric Botcazou	3	-1/+45
	This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE of a global variable-length array. gcc/ PR target/115591 * config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on tree_fits_uhwi_p before calling tree_to_uhwi. gcc/testsuite/ * gnat.dg/array41.ads, gnat.dg/array41.adb: New test.
2024-07-06	PR target/115751: Avoid force_reg in ix86_expand_ternlog.	Roger Sayle	1	-2/+13
	This patch fixes a problem with splitting of complex AVX512 ternlog instructions on x86_64. A recent change allows the ternlog pattern to have multiple mem-like operands prior to reload, by emitting any "reloads" as necessary during split1, before register allocation. The issue is that this code calls force_reg to place the mem-like operand into a register, but unfortunately the vec_duplicate (broadcast) form of operands supported by ternlog isn't considered a "general_operand", i.e. supported by all instructions. This mismatch triggers an ICE in the middle-end's force_reg, even though the x86 supports loading these vec_duplicate operands into a vector register in a single (move) instruction. This patch resolves this problem by replacing force_reg with calls to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end, knows these will be recognized by recog). 2024-07-06 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/115751 * config/i386/i386-expand.cc (ix86_expand_ternlog): Avoid use of force_reg to "reload" non-register operands, as these may contain vec_duplicate (broadcast) operands that aren't supported by force_reg. Use (safer) gen_reg_rtx and emit_move instead.
2024-07-06	Daily bump.	GCC Administrator	4	-1/+160

2024-07-05	x86, Darwin: Fix bootstrap for 32b multilibs/hosts.	Iain Sandoe	1	-0/+23
	r15-1735-ge62ea4fb8ffcab06ddd contained changes that altered the codegen for 32b Darwin (whether hosted on 64b or as 32b host) such that the per function picbase load is called multiple times in some cases. Darwin's back end is not expecting this (and indeed some of the handling depends on a single instance). This fixes the issue by marking those instructions as not copyable (as suggested by Andrew Pinski). The change is Darwin-specific. gcc/ChangeLog: * config/i386/i386.cc (ix86_cannot_copy_insn_p): New. (TARGET_CANNOT_COPY_INSN_P): New. Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>
2024-07-06	Fortran: switch test to use issignaling() built-in	Francois-Xavier Coudert	2	-10/+3
	The macro may not be present in all libc's, but the built-in is always available. gcc/testsuite/ChangeLog: * gfortran.dg/ieee/signaling_2.f90: Adjust test. * gfortran.dg/ieee/signaling_2_c.c: Adjust test.
2024-07-05	Arm: Fix ldrd offset range [PR115153]	Wilco Dijkstra	3	-29/+48
	The valid offset range of LDRD in arm_legitimate_index_p is increased to -1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode. Fix this by moving the LDRD check earlier. gcc: PR target/115153 * config/arm/arm.cc (arm_legitimate_index_p): Move LDRD case before NEON. (thumb2_legitimate_index_p): Update comments. (output_move_neon): Use DFmode for vldr/vstr and non-checking adjust_address. gcc/testsuite: PR target/115153 * gcc.target/arm/pr115153.c: Add new test. * lib/target-supports.exp: Add arm_arch_v7ve_neon target support.
2024-07-05	libgccjit: Allow comparing array types	Antoni Boucher	3	-0/+23
	gcc/jit/ChangeLog: * jit-common.h: Add array_type class. * jit-recording.h (type::dyn_cast_array_type, memento_of_get_aligned::dyn_cast_array_type, array_type::dyn_cast_array_type, array_type::is_same_type_as): New methods. gcc/testsuite/ChangeLog: * jit.dg/test-types.c: Add array type comparison to the test.
2024-07-05	libgccjit: Add support for the type bfloat16	Antoni Boucher	8	-2/+67
	gcc/jit/ChangeLog: PR jit/112574 * docs/topics/types.rst: Document GCC_JIT_TYPE_BFLOAT16. * jit-common.h: Update NUM_GCC_JIT_TYPES. * jit-playback.cc (get_tree_node_for_type): Support bfloat16. * jit-recording.cc (recording::memento_of_get_type::get_size, recording::memento_of_get_type::dereference, recording::memento_of_get_type::is_int, recording::memento_of_get_type::is_signed, recording::memento_of_get_type::is_float, recording::memento_of_get_type::is_bool): Support bfloat16. * libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BFLOAT16. gcc/testsuite/ChangeLog: PR jit/112574 * jit.dg/all-non-failing-tests.h: New test test-bfloat16.c. * jit.dg/test-types.c: Test GCC_JIT_TYPE_BFLOAT16. * jit.dg/test-bfloat16.c: New test.
2024-07-05	RISC-V: Use tu policy for first-element vec_set [PR115725].	Robin Dapp	6	-33/+22
	This patch changes the tail policy for vmv.s.x from ta to tu. By default the bug does not show up with qemu because qemu's current vmv.s.x implementation always uses the tail-undisturbed policy. With a local qemu version that overwrites the tail with ones when the tail-agnostic policy is specified, the bug shows. gcc/ChangeLog: * config/riscv/autovec.md: Add TU policy. * config/riscv/riscv-protos.h (enum insn_type): Define SCALAR_MOVE_MERGED_OP_TU. gcc/testsuite/ChangeLog: PR target/115725 * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust test expectation. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
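The difference between the two tail policies can be shown with a toy model in C. This is a sketch under stated assumptions, not GCC or qemu code: the names `model_vmv_s_x`, `TAIL_UNDISTURBED` and `TAIL_AGNOSTIC_ALL_ONES` are hypothetical, and the agnostic case models only the "overwrite with ones" behaviour that the local qemu build mentioned above exhibits (an implementation is also allowed to leave the tail unchanged).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two behaviours discussed in the message. */
enum tail_policy {
    TAIL_UNDISTURBED,        /* tu: elements past index 0 keep their values */
    TAIL_AGNOSTIC_ALL_ONES   /* ta, modelled as overwriting the tail with ones */
};

/* Toy model of writing scalar x into element 0 of an n-element vector.
   Every element after the first counts as tail for this operation. */
void model_vmv_s_x(int32_t *vd, size_t n, int32_t x, enum tail_policy policy)
{
    if (n == 0)
        return;
    vd[0] = x;
    if (policy == TAIL_AGNOSTIC_ALL_ONES)
        for (size_t i = 1; i < n; i++)
            vd[i] = -1;   /* all bits set */
}
```

A vec_set of the first element must leave the remaining elements intact, so only the undisturbed model gives the result the source program expects.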
2024-07-05	AVR: target/87376 - Use nop_general_operand for DImode inputs.	Georg-Johann Lay	2	-13/+73
	The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1]) where acc_a is a hard register and operands[1] might be a non-generic address-space memory reference. Such loads may clobber hard regs since some of them are implemented as libgcc calls /and/ 64-bit moves are expanded as eight byte-moves, so that acc_a or acc_b might be clobbered by such a load. This patch simply denies non-generic address-space references by using nop_general_operand for all avr-dimode.md input predicates. With the patch, all memory loads that require library calls are issued before the expander codes from avr-dimode.md are run. PR target/87376 gcc/ * config/avr/avr-dimode.md: Use "nop_general_operand" instead of "general_operand" as predicate for all input operands. gcc/testsuite/ * gcc.target/avr/torture/pr87376.c: New test.
2024-07-05	AArch64: lower 2 reg TBL permutes with one zero register to 1 reg TBL.	Tamar Christina	3	-6/+94
	When a two reg TBL is performed with one operand being a zero vector we can instead use a single reg TBL and map the indices for accessing the zero vector to an out of range constant. On AArch64 out of range indices into a TBL have a defined semantics of setting the element to zero. Many uArches have a slower 2-reg TBL than 1-reg TBL. Before this change we had: typedef unsigned int v4si __attribute__ ((vector_size (16))); v4si f1 (v4si a) { v4si zeros = {0,0,0,0}; return __builtin_shufflevector (a, zeros, 0, 5, 1, 6); } which generates: f1: mov v30.16b, v0.16b movi v31.4s, 0 adrp x0, .LC0 ldr q0, [x0, #:lo12:.LC0] tbl v0.16b, {v30.16b - v31.16b}, v0.16b ret .LC0: .byte 0 .byte 1 .byte 2 .byte 3 .byte 20 .byte 21 .byte 22 .byte 23 .byte 4 .byte 5 .byte 6 .byte 7 .byte 24 .byte 25 .byte 26 .byte 27 and with the patch: f1: adrp x0, .LC0 ldr q31, [x0, #:lo12:.LC0] tbl v0.16b, {v0.16b}, v31.16b ret .LC0: .byte 0 .byte 1 .byte 2 .byte 3 .byte -1 .byte -1 .byte -1 .byte -1 .byte 4 .byte 5 .byte 6 .byte 7 .byte -1 .byte -1 .byte -1 .byte -1 This sequence is generated often by openmp and aside from the strict performance impact of this change, it also gives better register allocation as we no longer have the consecutive register limitation. gcc/ChangeLog: * config/aarch64/aarch64.cc (struct expand_vec_perm_d): Add zero_op0_p and zero_op_p1. (aarch64_evpc_tbl): Implement register value remapping. (aarch64_vectorize_vec_perm_const): Detect if operand is a zero dup before it's forced to a reg. gcc/testsuite/ChangeLog: * gcc.target/aarch64/tbl_with_zero_1.c: New test. * gcc.target/aarch64/tbl_with_zero_2.c: New test.
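The equivalence the patch relies on can be checked with a small byte-level model in C. This is a sketch, not the code in aarch64.cc: `tbl1`, `tbl2` and `remap_zero_op1` are hypothetical names, and the model covers only 16-byte vectors where the second table register is all zeros.

```c
#include <stdint.h>
#include <string.h>

/* Model of a one-register TBL: an index outside 0..15 yields 0. */
void tbl1(uint8_t out[16], const uint8_t tab[16], const uint8_t idx[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = idx[i] < 16 ? tab[idx[i]] : 0;
    memcpy(out, tmp, 16);   /* copy last so out may alias an input */
}

/* Model of a two-register TBL: indices 0..15 select from tab0,
   16..31 from tab1, anything else yields 0. */
void tbl2(uint8_t out[16], const uint8_t tab0[16], const uint8_t tab1[16],
          const uint8_t idx[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        uint8_t k = idx[i];
        tmp[i] = k < 16 ? tab0[k] : (k < 32 ? tab1[k - 16] : 0);
    }
    memcpy(out, tmp, 16);
}

/* Rewrite indices that select from an all-zero second table into the
   out-of-range value 0xFF (shown as ".byte -1" in the listing above). */
void remap_zero_op1(uint8_t out[16], const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (idx[i] >= 16 && idx[i] < 32) ? 0xFF : idx[i];
}
```

With the index bytes from the first `.LC0` listing, `tbl2` against a zero second table and `tbl1` on the remapped indices produce the same 16 bytes, and the remapped indices match the second `.LC0` listing.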
2024-07-05	AArch64: remove aarch64_simd_vec_unpack<su>_lo_	Tamar Christina	2	-18/+5
	The fix for PR18127 reworked the uxtl to zip optimization. In doing so it undid the changes in aarch64_simd_vec_unpack<su>_lo_ and this now no longer matches aarch64_simd_vec_unpack<su>_hi_. It still works because the RTL generated by aarch64_simd_vec_unpack<su>_lo_ overlaps with the general zero extend RTL and so because that one is listed before the lo pattern recog picks it instead. This removes aarch64_simd_vec_unpack<su>_lo_. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpack<su>_lo_<mode>): Remove. (vec_unpack<su>_lo_<mode>): Simplify. * config/aarch64/aarch64.cc (aarch64_gen_shareable_zero): Update comment.
2024-07-05	middle-end: Add debug functions to dump dominator tree in dot format	Alex Coplan	1	-0/+30
	This adds debug functions to dump the dominator tree in dot format. There are two overloads: one which takes a FILE * and another which takes a const char *fname and wraps the first with fopen/fclose for convenience. gcc/ChangeLog: * dominance.cc (dot_dominance_tree): New.
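The two-entry-point design described above (a FILE * writer plus a filename wrapper) can be sketched in plain C. This is not the code added to dominance.cc: the function names and the representation of the tree as an array of immediate dominators are assumptions made for illustration, and C has no overloading, so the two entry points get distinct names.

```c
#include <stdio.h>

/* Hypothetical sketch.  idom[i] is the immediate dominator of block i,
   or -1 for the entry block (an assumed representation). */

/* Write the dominator tree to an open stream as a dot digraph, with one
   edge from each immediate dominator to the block it dominates. */
void dump_dom_tree_dot(FILE *f, const int *idom, int n)
{
    fprintf(f, "digraph {\n");
    for (int i = 0; i < n; i++)
        if (idom[i] >= 0)
            fprintf(f, "  %d -> %d;\n", idom[i], i);
    fprintf(f, "}\n");
}

/* Convenience wrapper: open fname, dump the tree, close the file.
   Returns 0 on success and -1 if the file cannot be opened or closed. */
int dump_dom_tree_dot_file(const char *fname, const int *idom, int n)
{
    FILE *f = fopen(fname, "w");
    if (!f)
        return -1;
    dump_dom_tree_dot(f, idom, n);
    return fclose(f) == 0 ? 0 : -1;
}
```

Keeping the stream-based function separate means the same dump can go to stderr from a debugger session, while the wrapper covers the common case of writing a file to feed to Graphviz.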