riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2023-05-09	arm: [MVE intrinsics] rework vrndq vrndaq vrndmq vrndnq vrndpq vrndxq	Christophe Lyon	4	-655/+27
	Implement vrndq, vrndaq, vrndmq, vrndnq, vrndpq, vrndxq using the new MVE builtins framework. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_F): New. (vrndaq, vrndmq, vrndnq, vrndpq, vrndq, vrndxq): New. * config/arm/arm-mve-builtins-base.def (vrndaq, vrndmq, vrndnq) (vrndpq, vrndq, vrndxq): New. * config/arm/arm-mve-builtins-base.h (vrndaq, vrndmq, vrndnq) (vrndpq, vrndq, vrndxq): New. * config/arm/arm_mve.h (vrndxq): Remove. (vrndq): Remove. (vrndpq): Remove. (vrndnq): Remove. (vrndmq): Remove. (vrndaq): Remove. (vrndaq_m): Remove. (vrndmq_m): Remove. (vrndnq_m): Remove. (vrndpq_m): Remove. (vrndq_m): Remove. (vrndxq_m): Remove. (vrndq_x): Remove. (vrndnq_x): Remove. (vrndmq_x): Remove. (vrndpq_x): Remove. (vrndaq_x): Remove. (vrndxq_x): Remove. (vrndxq_f16): Remove. (vrndxq_f32): Remove. (vrndq_f16): Remove. (vrndq_f32): Remove. (vrndpq_f16): Remove. (vrndpq_f32): Remove. (vrndnq_f16): Remove. (vrndnq_f32): Remove. (vrndmq_f16): Remove. (vrndmq_f32): Remove. (vrndaq_f16): Remove. (vrndaq_f32): Remove. (vrndaq_m_f16): Remove. (vrndmq_m_f16): Remove. (vrndnq_m_f16): Remove. (vrndpq_m_f16): Remove. (vrndq_m_f16): Remove. (vrndxq_m_f16): Remove. (vrndaq_m_f32): Remove. (vrndmq_m_f32): Remove. (vrndnq_m_f32): Remove. (vrndpq_m_f32): Remove. (vrndq_m_f32): Remove. (vrndxq_m_f32): Remove. (vrndq_x_f16): Remove. (vrndq_x_f32): Remove. (vrndnq_x_f16): Remove. (vrndnq_x_f32): Remove. (vrndmq_x_f16): Remove. (vrndmq_x_f32): Remove. (vrndpq_x_f16): Remove. (vrndpq_x_f32): Remove. (vrndaq_x_f16): Remove. (vrndaq_x_f32): Remove. (vrndxq_x_f16): Remove. (vrndxq_x_f32): Remove. (__arm_vrndxq_f16): Remove. (__arm_vrndxq_f32): Remove. (__arm_vrndq_f16): Remove. (__arm_vrndq_f32): Remove. (__arm_vrndpq_f16): Remove. (__arm_vrndpq_f32): Remove. (__arm_vrndnq_f16): Remove. (__arm_vrndnq_f32): Remove. (__arm_vrndmq_f16): Remove. (__arm_vrndmq_f32): Remove. (__arm_vrndaq_f16): Remove. (__arm_vrndaq_f32): Remove. 
(__arm_vrndaq_m_f16): Remove. (__arm_vrndmq_m_f16): Remove. (__arm_vrndnq_m_f16): Remove. (__arm_vrndpq_m_f16): Remove. (__arm_vrndq_m_f16): Remove. (__arm_vrndxq_m_f16): Remove. (__arm_vrndaq_m_f32): Remove. (__arm_vrndmq_m_f32): Remove. (__arm_vrndnq_m_f32): Remove. (__arm_vrndpq_m_f32): Remove. (__arm_vrndq_m_f32): Remove. (__arm_vrndxq_m_f32): Remove. (__arm_vrndq_x_f16): Remove. (__arm_vrndq_x_f32): Remove. (__arm_vrndnq_x_f16): Remove. (__arm_vrndnq_x_f32): Remove. (__arm_vrndmq_x_f16): Remove. (__arm_vrndmq_x_f32): Remove. (__arm_vrndpq_x_f16): Remove. (__arm_vrndpq_x_f32): Remove. (__arm_vrndaq_x_f16): Remove. (__arm_vrndaq_x_f32): Remove. (__arm_vrndxq_x_f16): Remove. (__arm_vrndxq_x_f32): Remove. (__arm_vrndxq): Remove. (__arm_vrndq): Remove. (__arm_vrndpq): Remove. (__arm_vrndnq): Remove. (__arm_vrndmq): Remove. (__arm_vrndaq): Remove. (__arm_vrndaq_m): Remove. (__arm_vrndmq_m): Remove. (__arm_vrndnq_m): Remove. (__arm_vrndpq_m): Remove. (__arm_vrndq_m): Remove. (__arm_vrndxq_m): Remove. (__arm_vrndq_x): Remove. (__arm_vrndnq_x): Remove. (__arm_vrndmq_x): Remove. (__arm_vrndpq_x): Remove. (__arm_vrndaq_x): Remove. (__arm_vrndxq_x): Remove.
2023-05-09	arm: [MVE intrinsics] rework vabsq vnegq vclsq vclzq, vqabsq, vqnegq	Christophe Lyon	4	-1264/+30
	Implement vabsq, vnegq, vclsq, vclzq, vqabsq, vqnegq using the new MVE builtins framework. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_N_NO_U_F): New. (vabsq, vnegq, vclsq, vclzq, vqabsq, vqnegq): New. * config/arm/arm-mve-builtins-base.def (vabsq, vnegq, vclsq) (vclzq, vqabsq, vqnegq): New. * config/arm/arm-mve-builtins-base.h (vabsq, vnegq, vclsq, vclzq) (vqabsq, vqnegq): New. * config/arm/arm_mve.h (vabsq): Remove. (vabsq_m): Remove. (vabsq_x): Remove. (vabsq_f16): Remove. (vabsq_f32): Remove. (vabsq_s8): Remove. (vabsq_s16): Remove. (vabsq_s32): Remove. (vabsq_m_s8): Remove. (vabsq_m_s16): Remove. (vabsq_m_s32): Remove. (vabsq_m_f16): Remove. (vabsq_m_f32): Remove. (vabsq_x_s8): Remove. (vabsq_x_s16): Remove. (vabsq_x_s32): Remove. (vabsq_x_f16): Remove. (vabsq_x_f32): Remove. (__arm_vabsq_s8): Remove. (__arm_vabsq_s16): Remove. (__arm_vabsq_s32): Remove. (__arm_vabsq_m_s8): Remove. (__arm_vabsq_m_s16): Remove. (__arm_vabsq_m_s32): Remove. (__arm_vabsq_x_s8): Remove. (__arm_vabsq_x_s16): Remove. (__arm_vabsq_x_s32): Remove. (__arm_vabsq_f16): Remove. (__arm_vabsq_f32): Remove. (__arm_vabsq_m_f16): Remove. (__arm_vabsq_m_f32): Remove. (__arm_vabsq_x_f16): Remove. (__arm_vabsq_x_f32): Remove. (__arm_vabsq): Remove. (__arm_vabsq_m): Remove. (__arm_vabsq_x): Remove. (vnegq): Remove. (vnegq_m): Remove. (vnegq_x): Remove. (vnegq_f16): Remove. (vnegq_f32): Remove. (vnegq_s8): Remove. (vnegq_s16): Remove. (vnegq_s32): Remove. (vnegq_m_s8): Remove. (vnegq_m_s16): Remove. (vnegq_m_s32): Remove. (vnegq_m_f16): Remove. (vnegq_m_f32): Remove. (vnegq_x_s8): Remove. (vnegq_x_s16): Remove. (vnegq_x_s32): Remove. (vnegq_x_f16): Remove. (vnegq_x_f32): Remove. (__arm_vnegq_s8): Remove. (__arm_vnegq_s16): Remove. (__arm_vnegq_s32): Remove. (__arm_vnegq_m_s8): Remove. (__arm_vnegq_m_s16): Remove. (__arm_vnegq_m_s32): Remove. (__arm_vnegq_x_s8): Remove. (__arm_vnegq_x_s16): Remove. (__arm_vnegq_x_s32): Remove. 
(__arm_vnegq_f16): Remove. (__arm_vnegq_f32): Remove. (__arm_vnegq_m_f16): Remove. (__arm_vnegq_m_f32): Remove. (__arm_vnegq_x_f16): Remove. (__arm_vnegq_x_f32): Remove. (__arm_vnegq): Remove. (__arm_vnegq_m): Remove. (__arm_vnegq_x): Remove. (vclsq): Remove. (vclsq_m): Remove. (vclsq_x): Remove. (vclsq_s8): Remove. (vclsq_s16): Remove. (vclsq_s32): Remove. (vclsq_m_s8): Remove. (vclsq_m_s16): Remove. (vclsq_m_s32): Remove. (vclsq_x_s8): Remove. (vclsq_x_s16): Remove. (vclsq_x_s32): Remove. (__arm_vclsq_s8): Remove. (__arm_vclsq_s16): Remove. (__arm_vclsq_s32): Remove. (__arm_vclsq_m_s8): Remove. (__arm_vclsq_m_s16): Remove. (__arm_vclsq_m_s32): Remove. (__arm_vclsq_x_s8): Remove. (__arm_vclsq_x_s16): Remove. (__arm_vclsq_x_s32): Remove. (__arm_vclsq): Remove. (__arm_vclsq_m): Remove. (__arm_vclsq_x): Remove. (vclzq): Remove. (vclzq_m): Remove. (vclzq_x): Remove. (vclzq_s8): Remove. (vclzq_s16): Remove. (vclzq_s32): Remove. (vclzq_u8): Remove. (vclzq_u16): Remove. (vclzq_u32): Remove. (vclzq_m_u8): Remove. (vclzq_m_s8): Remove. (vclzq_m_u16): Remove. (vclzq_m_s16): Remove. (vclzq_m_u32): Remove. (vclzq_m_s32): Remove. (vclzq_x_s8): Remove. (vclzq_x_s16): Remove. (vclzq_x_s32): Remove. (vclzq_x_u8): Remove. (vclzq_x_u16): Remove. (vclzq_x_u32): Remove. (__arm_vclzq_s8): Remove. (__arm_vclzq_s16): Remove. (__arm_vclzq_s32): Remove. (__arm_vclzq_u8): Remove. (__arm_vclzq_u16): Remove. (__arm_vclzq_u32): Remove. (__arm_vclzq_m_u8): Remove. (__arm_vclzq_m_s8): Remove. (__arm_vclzq_m_u16): Remove. (__arm_vclzq_m_s16): Remove. (__arm_vclzq_m_u32): Remove. (__arm_vclzq_m_s32): Remove. (__arm_vclzq_x_s8): Remove. (__arm_vclzq_x_s16): Remove. (__arm_vclzq_x_s32): Remove. (__arm_vclzq_x_u8): Remove. (__arm_vclzq_x_u16): Remove. (__arm_vclzq_x_u32): Remove. (__arm_vclzq): Remove. (__arm_vclzq_m): Remove. (__arm_vclzq_x): Remove. (vqabsq): Remove. (vqnegq): Remove. (vqnegq_m): Remove. (vqabsq_m): Remove. (vqabsq_s8): Remove. (vqabsq_s16): Remove. (vqabsq_s32): Remove. 
(vqnegq_s8): Remove. (vqnegq_s16): Remove. (vqnegq_s32): Remove. (vqnegq_m_s8): Remove. (vqabsq_m_s8): Remove. (vqnegq_m_s16): Remove. (vqabsq_m_s16): Remove. (vqnegq_m_s32): Remove. (vqabsq_m_s32): Remove. (__arm_vqabsq_s8): Remove. (__arm_vqabsq_s16): Remove. (__arm_vqabsq_s32): Remove. (__arm_vqnegq_s8): Remove. (__arm_vqnegq_s16): Remove. (__arm_vqnegq_s32): Remove. (__arm_vqnegq_m_s8): Remove. (__arm_vqabsq_m_s8): Remove. (__arm_vqnegq_m_s16): Remove. (__arm_vqabsq_m_s16): Remove. (__arm_vqnegq_m_s32): Remove. (__arm_vqabsq_m_s32): Remove. (__arm_vqabsq): Remove. (__arm_vqnegq): Remove. (__arm_vqnegq_m): Remove. (__arm_vqabsq_m): Remove.
2023-05-09	arm: [MVE intrinsics] factorize several unary operations	Christophe Lyon	2	-337/+126
	Factorize vabs vcls vclz vneg vqabs vqneg vrnda vrndm vrndn vrndp vrnd vrndx so that they use the same pattern. This patch introduces the mve_mnemo iterator because some of the involved intrinsics have a different name from their mnemonic: for instance vrndq vs vrintz. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/iterators.md (MVE_INT_M_UNARY, MVE_INT_UNARY) (MVE_FP_UNARY, MVE_FP_M_UNARY): New. (mve_insn): Add vabs, vcls, vclz, vneg, vqabs, vqneg, vrnda, vrndm, vrndn, vrndp, vrnd, vrndx. (isu): Add VABSQ_M_S, VCLSQ_M_S, VCLZQ_M_S, VCLZQ_M_U, VNEGQ_M_S, VQABSQ_M_S, VQNEGQ_M_S. (mve_mnemo): New. * config/arm/mve.md (mve_vrndq_m_f<mode>, mve_vrndxq_f<mode>) (mve_vrndq_f<mode>, mve_vrndpq_f<mode>, mve_vrndnq_f<mode>) (mve_vrndmq_f<mode>, mve_vrndaq_f<mode>): Merge into ... (@mve_<mve_insn>q_f<mode>): ... this. (mve_vnegq_f<mode>, mve_vabsq_f<mode>): Merge into ... (mve_v<absneg_str>q_f<mode>): ... this. (mve_vnegq_s<mode>, mve_vabsq_s<mode>): Merge into ... (mve_v<absneg_str>q_s<mode>): ... this. (mve_vclsq_s<mode>, mve_vqnegq_s<mode>, mve_vqabsq_s<mode>): Merge into ... (@mve_<mve_insn>q_<supf><mode>): ... this. (mve_vabsq_m_s<mode>, mve_vclsq_m_s<mode>) (mve_vclzq_m_<supf><mode>, mve_vnegq_m_s<mode>) (mve_vqabsq_m_s<mode>, mve_vqnegq_m_s<mode>): Merge into ... (@mve_<mve_insn>q_m_<supf><mode>): ... this. (mve_vabsq_m_f<mode>, mve_vnegq_m_f<mode>, mve_vrndaq_m_f<mode>) (mve_vrndmq_m_f<mode>, mve_vrndnq_m_f<mode>, mve_vrndpq_m_f<mode>) (mve_vrndxq_m_f<mode>): Merge into ... (@mve_<mve_insn>q_m_f<mode>): ... this.
2023-05-09	arm: [MVE intrinsics] add unary shape	Christophe Lyon	2	-0/+28
	This patch adds the unary shape description. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm-mve-builtins-shapes.cc (unary): New. * config/arm/arm-mve-builtins-shapes.h (unary): New.
2023-05-09	mux-utils.h: Fix a comment typo	Jakub Jelinek	1	-1/+1
	Trivial comment typo... 2023-05-09 Jakub Jelinek <jakub@redhat.com> * mux-utils.h: Fix comment typo, avoides -> avoids.
2023-05-09	testsuite: Add further testcase for already fixed PR [PR109778]	Jakub Jelinek	2	-0/+29
	I came up with a testcase which reproduces all the way to r10-7469. LTO to avoid early inlining it, so that ccp handles rotates and not shifts before they are turned into rotates. 2023-05-09 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109778 * gcc.dg/lto/pr109778_0.c: New test. * gcc.dg/lto/pr109778_1.c: New file.
2023-05-09	tree-ssa-ccp, wide-int: Fix up handling of [LR]ROTATE_EXPR in bitwise ccp [PR109778]	Jakub Jelinek	3	-6/+37
	The following testcase is miscompiled, because bitwise ccp2 handles a rotate with a signed type incorrectly. It seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3 arguments; all other callers just rotate in the right precision and I think work correctly. ccp works with widest_ints and so rotations by the excessive precision certainly don't match what it wants when it sees a rotate in some specific bitsize. Still, if it is an unsigned rotate and the widest_int is zero extended from width, the functions perform left shift and logical right shift on the value and then at the end zero extend the result of left shift and uselessly also the result of logical right shift and return | of that. In the testcase, the argument of the signed char rrotate by 4 is CONSTANT -75, i.e. 0xffffffff....fffffb5 with mask 2. The mask is correctly rotated to 0x20, but because the 8-bit constant is sign extended to a 192-bit one, the logical right shift by 4 doesn't yield the expected 0xb, but gives 0xfffffffffff....ffffb, and then return wi::zext (left, width) | wi::zext (right, width); where left is 0xfffffff....fb50, so we return 0xfb instead of the expected 0x5b. The following patch fixes that by doing the zero extension in case of the right variable before doing wi::lrshift rather than after it. Also, wi::[lr]rotate with width < precision always zero extends the result. I'm afraid it can't do better because it doesn't know if it is done for an unsigned or signed type, but the caller in this case knows that very well, so I've done the extension based on sgn in the caller. E.g. 0x5b rotated right (or left) by 4 with width 8 previously gave 0xb5, but with sgn == SIGNED in widest_int it should be 0xffffffff....fffb5 instead. 2023-05-09 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109778 * wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on wi::zext (x, width) rather than x if width != precision, rather than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results of wi::lrotate or wi::rrotate. * gcc.c-torture/execute/pr109778.c: New test.
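The arithmetic in the PR109778 message can be replayed with a small C sketch. This is not the wide-int code: it assumes a 64-bit container in place of the 192-bit widest_int, and `zext`, `sext`, `rrotate_old` and `rrotate_new` are hypothetical names that only mirror the order of operations described above (shift then zero-extend versus zero-extend then shift).

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend: keep only the low `width` bits (0 < width < 64). */
static uint64_t zext(uint64_t x, unsigned width)
{
    return x & ((UINT64_C(1) << width) - 1);
}

/* Sign-extend the low `width` bits to the full 64-bit container. */
static uint64_t sext(uint64_t x, unsigned width)
{
    uint64_t sign = UINT64_C(1) << (width - 1);
    return (zext(x, width) ^ sign) - sign;
}

/* Old order: shift first, zero-extend both halves afterwards. */
static uint64_t rrotate_old(uint64_t x, unsigned n, unsigned width)
{
    assert(width > 1 && width < 64 && n > 0 && n < width);
    uint64_t left = x << (width - n);
    uint64_t right = x >> n; /* pulls container sign bits into the field */
    return zext(left, width) | zext(right, width);
}

/* Fixed order: zero-extend the input before the logical right shift. */
static uint64_t rrotate_new(uint64_t x, unsigned n, unsigned width)
{
    assert(width > 1 && width < 64 && n > 0 && n < width);
    uint64_t left = x << (width - n);
    uint64_t right = zext(x, width) >> n;
    return zext(left, width) | right;
}
```

With the sign-extended 8-bit constant -75 (0xb5) as input, the old order yields 0xfb and the fixed order yields 0x5b. Rotating 0x5b by 4 gives 0xb5, which the caller then sign-extends with `sext` when the type is signed.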
2023-05-09	genmatch: fixup get_out_file	Alexander Monakov	1	-24/+17
	get_out_file did not follow the coding conventions (mixing three-space and two-space indentation, missing linebreak before function name). Take that as an excuse to reimplement it in a more terse manner and rename as 'choose_output', which is hopefully more descriptive. gcc/ChangeLog: * genmatch.cc (get_out_file): Make static and rename to ... (choose_output): ... this. Reimplement. Update all uses ... (decision_tree::gen): ... here and ... (main): ... here.
2023-05-09	genmatch: clean up showUsage	Alexander Monakov	1	-11/+10
	Display usage more consistently and get rid of camelCase. gcc/ChangeLog: * genmatch.cc (showUsage): Reimplement as ... (usage): ...this. Adjust all uses. (main): Print usage when no arguments. Add missing 'return 1'.
2023-05-09	genmatch: clean up emit_func	Alexander Monakov	1	-45/+52
	Eliminate boolean parameters of emit_func. The first ('open') just prints 'extern' to generated header, which is unnecessary. Introduce a separate function to use when finishing a declaration in place of the second ('close'). Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak indentation in several places. Reshuffle emitted line breaks in a few places to make generated declarations less ugly. gcc/ChangeLog: * genmatch.cc (header_file): Make static. (emit_func): Rename to... (fp_decl): ... this. Adjust all uses. (fp_decl_done): New function. Use it... (decision_tree::gen): ... here and... (write_predicate): ... here. (main): Adjust.
2023-05-09	aarch64: Avoid hard-coding specific register allocations	Richard Sandiford	33	-272/+269
	Some tests hard-coded specific allocations for temporary registers, whereas the RA should be free to pick anything that doesn't force unnecessary moves or spills. gcc/testsuite/ * gcc.target/aarch64/asimd-mul-to-shl-sub.c: Allow any register allocation for temporary results, rather than requiring specific registers. * gcc.target/aarch64/auto-init-padding-1.c: Likewise. * gcc.target/aarch64/auto-init-padding-2.c: Likewise. * gcc.target/aarch64/auto-init-padding-3.c: Likewise. * gcc.target/aarch64/auto-init-padding-4.c: Likewise. * gcc.target/aarch64/auto-init-padding-9.c: Likewise. * gcc.target/aarch64/memset-corner-cases.c: Likewise. * gcc.target/aarch64/memset-q-reg.c: Likewise. * gcc.target/aarch64/simd/vaddlv_1.c: Likewise. * gcc.target/aarch64/sve-neon-modes_1.c: Likewise. * gcc.target/aarch64/sve-neon-modes_3.c: Likewise. * gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise. * gcc.target/aarch64/sve/pr89007-1.c: Likewise. * gcc.target/aarch64/sve/pr89007-2.c: Likewise. * gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise. * gcc.target/aarch64/vadd_reduc-1.c: Likewise. * gcc.target/aarch64/vadd_reduc-2.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Allow the temporary predicate register to be any of p4-p7, rather than requiring p4 specifically. * gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise. 
* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise. * gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
2023-05-09	aarch64: Relax FP/vector register matches	Richard Sandiford	22	-82/+82
	There were many tests that used [0-9] to match an FP or vector register, but that should allow any of 0-31 instead. asm-x-constraint-1.c required s0-s7, but that's the range for "y" rather than "x". "x" allows s0-s15. sve/pcs/return_9.c required z2-z7 (the initial set of available call-clobbered registers), but z24-z31 are OK too. gcc/testsuite/ * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: Allow any FP/vector register, not just register 0-9. * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. * gcc.target/aarch64/ldp_stp_8.c: Likewise. * gcc.target/aarch64/ldp_stp_17.c: Likewise. * gcc.target/aarch64/ldp_stp_21.c: Likewise. * gcc.target/aarch64/simd/vpaddd_f64.c: Likewise. * gcc.target/aarch64/simd/vpaddd_s64.c: Likewise. * gcc.target/aarch64/simd/vpaddd_u64.c: Likewise. * gcc.target/aarch64/sve/adr_1.c: Likewise. * gcc.target/aarch64/sve/adr_2.c: Likewise. * gcc.target/aarch64/sve/adr_3.c: Likewise. * gcc.target/aarch64/sve/adr_4.c: Likewise. * gcc.target/aarch64/sve/adr_5.c: Likewise. * gcc.target/aarch64/sve/extract_1.c: Likewise. * gcc.target/aarch64/sve/extract_2.c: Likewise. * gcc.target/aarch64/sve/extract_3.c: Likewise. * gcc.target/aarch64/sve/extract_4.c: Likewise. * gcc.target/aarch64/sve/slp_4.c: Likewise. * gcc.target/aarch64/sve/spill_3.c: Likewise. * gcc.target/aarch64/vfp-1.c: Likewise. * gcc.target/aarch64/asm-x-constraint-1.c: Allow s0-s15, not just s0-s7. * gcc.target/aarch64/sve/pcs/return_9.c: Allow z24-z31 as well as z2-z7.
2023-05-09	aarch64: Relax predicate register matches	Richard Sandiford	22	-578/+578
	Most governing predicate operands require p0-p7, but some instructions also allow p8-p15. Non-gp uses of predicates often also allow all of p0-p15. This patch fixes up cases where we required p0-p7 unnecessarily. In some cases we match the definition (typically a comparison, PFALSE or PTRUE), sometimes we match the use (like a logic instruction, MOV or SEL), and sometimes we match both. gcc/testsuite/ * g++.target/aarch64/sve/vcond_1.C: Allow any predicate register for the temporary results, not just p0-p7. * gcc.target/aarch64/sve/acle/asm/dupq_b8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dupq_b16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dupq_b32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dupq_b64.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilele_5.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilele_6.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilele_7.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilele_9.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilele_10.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilelt_1.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilelt_2.c: Likewise. * gcc.target/aarch64/sve/acle/general/whilelt_3.c: Likewise. * gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise. * gcc.target/aarch64/sve/peel_ind_2.c: Likewise. * gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise. * gcc.target/aarch64/sve/vcond_2.c: Likewise. * gcc.target/aarch64/sve/vcond_3.c: Likewise. * gcc.target/aarch64/sve/vcond_7.c: Likewise. * gcc.target/aarch64/sve/vcond_18.c: Likewise. * gcc.target/aarch64/sve/vcond_19.c: Likewise. * gcc.target/aarch64/sve/vcond_20.c: Likewise.
2023-05-09	aarch64: Relax ordering requirements in SVE dup tests	Richard Sandiford	6	-0/+384
	Some of the svdup tests expand to a SEL between two constant vectors. This patch allows the constants to be formed in either order. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/dup_s16.c: When using SEL to select between two constant vectors, allow the constant moves to appear in either order. * gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.
2023-05-09	aarch64: Allow moves after tied-register intrinsics	Richard Sandiford	38	-0/+360
	Some ACLE intrinsics map to instructions that tie the output operand to an input operand. If all the operands are allocated to different registers, and if MOVPRFX can't be used, we will need a move either before the instruction or after it. Many tests only matched the "before" case; this patch makes them accept the "after" case too. gcc/testsuite/ * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow moves to occur after the intrinsic instruction, rather than requiring them to happen before. * gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adda_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/brka_b.c: Likewise. * gcc.target/aarch64/sve/acle/asm/brkb_b.c: Likewise. * gcc.target/aarch64/sve/acle/asm/brkn_b.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clasta_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clasta_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clasta_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clastb_f16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clastb_f32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/clastb_f64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/pfirst_b.c: Likewise. * gcc.target/aarch64/sve/acle/asm/pnext_b16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/pnext_b32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/pnext_b64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/pnext_b8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_u16.c: Likewise. 
* gcc.target/aarch64/sve2/acle/asm/sli_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sli_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_s16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_s32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_s64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_s8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_u16.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_u32.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_u64.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sri_u8.c: Likewise.
2023-05-09	aarch64: Fix move-after-intrinsic function-body tests	Richard Sandiford	5	-7/+7
	Some of the SVE ACLE asm tests tried to be agnostic about the instruction order, but only one of the alternatives was exercised in practice. This patch fixes latent typos in the other versions. gcc/testsuite/ * gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Fix expected register allocation in the case where a move occurs after the intrinsic instruction. * gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise. * gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
2023-05-09	ira: Don't create copies for earlyclobbered pairs	Richard Sandiford	19	-18/+21
	This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a ("the previous patch"). To start by quoting that: If an insn requires two operands to be tied, and the input operand dies in the insn, IRA acts as though there were a copy from the input to the output with the same execution frequency as the insn. Allocating the same register to the input and the output then saves the cost of a move. If there is no such tie, but an input operand nevertheless dies in the insn, IRA creates a similar move, but with an eighth of the frequency. This helps to ensure that chains of instructions reuse registers in a natural way, rather than using arbitrarily different registers for no reason. This heuristic seems to work well in the vast majority of cases. However, the problem fixed in the previous patch was that we could create a copy for an operand pair even if, for all relevant alternatives, the output and input register classes did not have any registers in common. It is then impossible for the output operand to reuse the dying input register. This left unfixed a further case where copies don't make sense: there is no point trying to reuse the dying input register if, for all relevant alternatives, the output is earlyclobbered and the input doesn't match the output. (Matched earlyclobbers are fine.) Handling that case fixes several existing XFAILs and helps with a follow-on aarch64 patch. Tested on aarch64-linux-gnu and x86_64-linux-gnu. A SPEC2017 run on aarch64 showed no differences outside the noise. Also, I tried compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target per cpu directory, using the options -Os -fno-schedule-insns{,2}. 
The results below summarise the tests that showed a difference in LOC:

Target               Tests  Good  Bad  Delta   Best  Worst  Median
======               =====  ====  ===  =====   ====  =====  ======
amdgcn-amdhsa           14     7    7      3    -18     10      -1
arm-linux-gnueabihf     16    15    1    -22     -4      2      -1
csky-elf                 6     6    0    -21     -6     -2      -4
hppa64-hp-hpux11.23      5     5    0     -7     -2     -1      -1
ia64-linux-gnu          16    16    0    -70    -15     -1      -3
m32r-elf                53     1   52     64     -2      8       1
mcore-elf                2     2    0     -8     -6     -2      -6
microblaze-elf         285   283    2   -909    -68      4      -1
mmix                     7     7    0  -2101  -2091     -1      -1
msp430-elf               1     1    0     -4     -4     -4      -4
pru-elf                  8     6    2    -12     -6      2      -2
rx-elf                  22    18    4    -40     -5      6      -2
sparc-linux-gnu         15    14    1    -40     -8      1      -2
sparc-wrs-vxworks       15    14    1    -40     -8      1      -2
visium-elf               2     1    1      0     -2      2      -2
xstormy16-elf            1     1    0     -2     -2     -2      -2

with other targets showing no sensitivity to the patch. The only target that seems to be negatively affected is m32r-elf; otherwise the patch seems like an extremely minor but still clear improvement. gcc/ * ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching earlyclobbers. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs. * gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
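The copy heuristic described in the IRA message can be summarised as a toy decision function. This is a simplified model and not the code in ira-conflicts.cc: the struct, its field names and `pseudo_copy_freq` are hypothetical, the "for all relevant alternatives" checks are collapsed into single flags, and the use of integer division for "an eighth of the frequency" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical summary of one input/output operand pair of an insn. */
struct operand_pair {
    bool tied;                /* insn requires input and output to match */
    bool input_dies;          /* input operand dies in this insn */
    bool classes_overlap;     /* register classes share some register */
    bool output_earlyclobber; /* output is earlyclobbered */
};

/* Frequency of the pseudo-copy from input to output; 0 means none. */
static int pseudo_copy_freq(const struct operand_pair *p, int insn_freq)
{
    if (!p->input_dies)
        return 0;             /* nothing to reuse */
    if (p->tied)
        return insn_freq;     /* matched operands, earlyclobber or not */
    if (!p->classes_overlap)
        return 0;             /* case handled by the previous patch */
    if (p->output_earlyclobber)
        return 0;             /* case handled by this patch */
    return insn_freq / 8;     /* untied, but reuse is still possible */
}
```

Under this model an untied pair with an earlyclobbered output gets no copy at all, which is the behaviour the patch introduces, while a tied pair keeps the full insn frequency.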
2023-05-08	c++: non-template friend of template [PR106740]	Jason Merrill	1	-0/+18
	This was fixed by r13-1018, but the testcase seems needed. PR c++/106740 gcc/testsuite/ChangeLog: * g++.dg/template/friend78.C: New test.
2023-05-09	Daily bump.	GCC Administrator	8	-1/+286

2023-05-08	[x86_64] Introduce insvti_highpart define_insn_and_split.	Roger Sayle	2	-1/+38
	This is a repost/respin of a patch that was conditionally approved: https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html This patch adds a convenient post-reload splitter for setting/updating the highpart of a TImode variable, using i386's previously added split_double_concat infrastructure. For the new test case below: __int128 foo(__int128 x, unsigned long long y) { __int128 t = (__int128)y << 64; __int128 r = (x & ~0ull) | t; return r; } mainline GCC with -O2 currently generates: foo: movq %rdi, %rcx xorl %eax, %eax xorl %edi, %edi orq %rcx, %rax orq %rdi, %rdx ret with this patch, GCC instead now generates the much better: foo: movq %rdi, %rcx movq %rcx, %rax ret It turns out that the -m32 equivalent of this testcase already avoids using explicit orl/xor instructions, as it gets optimized (in combine) by a completely different path. Given that this idiom isn't seen in 32-bit code (so this pattern doesn't match with -m32), and also that the shorter 32-bit AND bitmask is represented as a CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split is implemented for just TARGET_64BIT rather than contort a "generic" implementation using DWI mode iterators. 2023-05-08 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog * config/i386/i386.md (any_or_plus): Move definition earlier. (insvti_highpart_1): New define_insn_and_split to overwrite (insv) the highpart of a TImode register/memory. gcc/testsuite/ChangeLog * gcc.target/i386/insvti_highpart-1.c: New test case.
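What the `foo` idiom computes can be checked with a short C sketch: the low 64 bits of `x` are kept and the high 64 bits are replaced by `y`. `foo` is the function from the message; `low_half` and `high_half` are hypothetical helpers added for illustration. The sketch assumes a compiler with the `__int128` extension (such as GCC on a 64-bit target) and a 64-bit `unsigned long long`.

```c
#include <stdint.h>

/* The test function from the commit message. */
__int128 foo(__int128 x, unsigned long long y)
{
    __int128 t = (__int128)y << 64;   /* y moved into the high 64 bits */
    __int128 r = (x & ~0ull) | t;     /* low half of x, high half from y */
    return r;
}

/* Hypothetical helpers: extract the two 64-bit halves of a TImode value. */
static uint64_t low_half(__int128 v)
{
    return (uint64_t)v;
}

static uint64_t high_half(__int128 v)
{
    return (uint64_t)((unsigned __int128)v >> 64);
}
```

Because the low half passes through unchanged and the high half is simply `y`, the result needs no OR or XOR instructions, which is why the two-move sequence quoted above is sufficient.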
2023-05-08	Fix cfg maintenance after inlining in AutoFDO	Eugene Rozenfeld	1	-9/+12
	Todo from early_inliner needs to be propagated so that cleanup_tree_cfg () is called if necessary. This bug was causing an assert in get_loop_body during ipa-sra in autoprofiledbootstrap build since loops weren't fixed up and one of the loops had num_nodes set to 0. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (auto_profile): Check todo from early_inline to see if cleanup_tree_cfg needs to be called. (early_inline): Return todo from early_inliner.
2023-05-08	Fix pr81192.c for int16 targets	Andrew Pinski	1	-4/+4
	I had missed when converting this testcase to Gimple that there was a define for int/unsigned type specifically to get an INT32 type. This means when using a literal integer constant you need to use the `_Literal (type)` to form the types correctly on the constants. This fixes the issue and has been both tested on xstormy16-elf and x86_64-linux-gnu. Committed as obvious. gcc/testsuite/ChangeLog: PR testsuite/109776 * gcc.dg/pr81192.c: Fix integer constants for int16 targets.
2023-05-08	RISC-V: Factor out vector manager code in vsetvli insertion pass. [NFC]	Kito Cheng	1	-20/+67
	gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info): New. (pass_vsetvl::get_block_info): New. (pass_vsetvl::update_vector_info): New. (pass_vsetvl::simple_vsetvl): Use get_vector_info. (pass_vsetvl::compute_local_backward_infos): Ditto. (pass_vsetvl::transfer_before): Ditto. (pass_vsetvl::transfer_after): Ditto. (pass_vsetvl::emit_local_forward_vsetvls): Ditto. (pass_vsetvl::local_eliminate_vsetvl_insn): Ditto. (pass_vsetvl::cleanup_insns): Ditto. (pass_vsetvl::compute_local_backward_infos): Use update_vector_info.
2023-05-08	RISC-V: Improve portability of testcases	Kito Cheng	3	-2/+13
	stdint.h will require having corresponding multi-lib existing, so using stdint-gcc.h instead, also added a riscv_vector.h wrapper to gcc.target/riscv/rvv/autovec/. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: Change stdint.h to stdint-gcc.h. * gcc.target/riscv/rvv/autovec/template-1.h: Ditto. * gcc.target/riscv/rvv/autovec/riscv_vector.h: New.
2023-05-08	Fix minor length computation on stormy16	Jeff Law	1	-1/+2
	Today's build of xstormy16-elf failed due to a branch to an out of range target. Manual inspection of the assembly code for the affected function (divdi3) showed that the zero-extension patterns were claiming a length of 2, but clearly assembled into 4 bytes. This patch adds an explicit length to the zero extension pattern and appears to resolve the issue in my test builds. gcc/ * config/stormy16/stormy16.md (zero_extendhisi2): Fix length.
2023-05-08	libgomp C++ testsuite: Use 'lang_include_flags' instead of 'libstdcxx_includes'	Thomas Schwinge	2	-8/+6
	With nvptx offloading configured, and supported, and CUDA available: $ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c" [...] Running [...]/libgomp.oacc-c/c.exp ... PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 Running [...]/libgomp.oacc-c++/c++.exp ... PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 [...] ..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED: $ make check-target-libgomp RUNTESTFLAGS_="--all c++.exp=context-1.c" [...] Running [...]/libgomp.oacc-c++/c++.exp ... 
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 [...] That is, if 'c.exp' executes first, it does successfully evaluate 'dg-require-effective-target openacc_cublas' -- and does cache this result (so it isn't reevaluated for 'c++.exp'). However, for 'c++.exp' alone (that is, without the 'c.exp' result cached), we run into: spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...] In file included from /usr/include/cuda_fp16.h:3673, from /usr/include/cublas_api.h:75, from /usr/include/cublas_v2.h:65, from openacc_cublas2311907.c:3: /usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory We're missing include paths to C++/libstdc++ build-tree headers. Fix this by using the mechanism introduced for Fortran in r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re "libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}". libgomp/ * testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead of 'libstdcxx_includes'. * testsuite/libgomp.oacc-c++/c++.exp: Likewise.
2023-05-08	Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'	Thomas Schwinge	7	-34/+44
	Otherwise, for example for 'RUNTESTFLAGS' of '--target_board=unix\{-m64,-m32\}' vs. '--target_board=unix\{-m32,-m64\}', both variants exercise testing with always the first flag variant's 'LTO_OPTIONS'/'LTO_TORTURE_OPTIONS', which results in unequal test results between the two 'RUNTESTFLAGS' variants if one of the flag variants has 'check_linker_plugin_available' but the other doesn't. Fix-up for r180245 (commit c1a7cdbbcca90ad5260bfc543f8c10f3514e76c1) "Update testsuite to run with slim LTO". gcc/testsuite/ * g++.dg/guality/guality.exp: Move 'torture-init' earlier. * gcc.dg/guality/guality.exp: Likewise. * gfortran.dg/guality/guality.exp: Likewise. * lib/c-torture.exp (LTO_TORTURE_OPTIONS): Don't set. * lib/gcc-dg.exp (LTO_TORTURE_OPTIONS): Don't set. * lib/lto.exp (lto_init, lto_finish): Let each 'lto_init' determine the default 'LTO_OPTIONS'. * lib/torture-options.exp (torture-init, torture-finish): Let each 'torture-init' determine the 'LTO_TORTURE_OPTIONS'.
2023-05-08	libgomp: Simplify OpenMP reverse offload host <-> device memory copy implementation	Thomas Schwinge	6	-103/+96
	... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it. Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8 "libgomp: Handle OpenMP's reverse offloads". libgomp/ * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy', 'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'. * libgomp.h (gomp_target_rev): Adjust. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust. * plugin/plugin-gcn.c (process_reverse_offload): Adjust. * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy) (rev_off_host_to_dev_cpy): Remove. (GOMP_OFFLOAD_run): Adjust.
2023-05-08	libgm2: Remove 'autogen.sh'	Thomas Schwinge	1	-30/+0
	... given that plain 'autoreconf' achieves the same. libgm2/ * autogen.sh: Remove.
2023-05-08	libgm2: Adjust 'autogen.sh' to 'ACLOCAL_AMFLAGS', and simplify	Thomas Schwinge	14	-50/+49
	Specifying explicit '-I ..' before '-I ../config' is what (most) other GCC components do. Specifying '-I .' is not necessary. With the order of '-I's aligned, 'autogen.sh' and plain 'autoreconf' then produce identical results. libgm2/ * autogen.sh: For 'aclocal', 'autoreconf', remove '-I .', add '-I ..'. * Makefile.am (ACLOCAL_AMFLAGS): Remove '-I .'. * libm2cor/Makefile.am (ACLOCAL_AMFLAGS): Likewise. * libm2iso/Makefile.am (ACLOCAL_AMFLAGS): Likewise. * libm2log/Makefile.am (ACLOCAL_AMFLAGS): Likewise. * libm2min/Makefile.am (ACLOCAL_AMFLAGS): Likewise. * libm2pim/Makefile.am (ACLOCAL_AMFLAGS): Likewise. * aclocal.m4: Regenerate. * Makefile.in: Likewise. * libm2cor/Makefile.in: Likewise. * libm2iso/Makefile.in: Likewise. * libm2log/Makefile.in: Likewise. * libm2min/Makefile.in: Likewise. * libm2pim/Makefile.in: Likewise.
2023-05-08	c++: list CTAD and resolve_nondeduced_context [PR106214]	Patrick Palka	2	-14/+41
	This extends the PR93107 fix, which made us do resolve_nondeduced_context on the elements of an initializer list during auto deduction, to happen for CTAD as well. PR c++/106214 PR c++/93107 gcc/cp/ChangeLog: * pt.cc (do_auto_deduction): Move up resolve_nondeduced_context calls to happen before do_class_deduction. Add some error_mark_node tests. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/class-deduction114.C: New test.
2023-05-08	Bump up precision size to 16 bits.	Michael Meissner	1	-12/+12
	The new __dmr type that is being added as a possible future PowerPC instruction set bumps into a structure field size issue. The size of the __dmr type is 1024 bits. The precision field in tree_type_common is currently 10 bits, so if you store 1,024 into the field, you get a 0 back. When you get 0 in the precision field, the ccp pass passes this 0 to sext_hwi in hwint.h. That function in turn generates a shift count equal to the host wide int bit size, which is undefined behavior in C/C++, so the result is machine dependent. int shift = HOST_BITS_PER_WIDE_INT - prec; return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift; It turns out the x86_64 system where I first did my tests returns the original input before the two shifts, while the PowerPC always returns 0. In the ccp pass, the original input is -1, and so it worked. When I did the runs on the PowerPC, the result was 0, which ultimately led to the failure. 2023-02-01 Richard Biener <rguenther@suse.de> Michael Meissner <meissner@linux.ibm.com> PR middle-end/108623 * tree-core.h (tree_type_common): Bump up precision field to 16 bits. Align bit fields > 1 bit to at least an 8-bit boundary.
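The truncation described above can be reproduced in isolation. The following is a minimal sketch, not the real tree_type_common layout: `narrow_type`, `wide_type`, `host_bits`, `stored_precision`, `sext_shift` and `shift_is_defined` are hypothetical names, and a 64-bit host wide int is assumed.

```cpp
// Hypothetical stand-ins for the precision field before and after the patch.
struct narrow_type { unsigned precision : 10; };
struct wide_type   { unsigned precision : 16; };

// Assumed stand-in for HOST_BITS_PER_WIDE_INT.
constexpr int host_bits = 64;

// Store a precision and read it back through the bit-field.
template <typename T>
unsigned stored_precision (unsigned prec)
{
  T t{};
  t.precision = prec;   // GCC keeps only the low bits if the value does not fit
  return t.precision;
}

// Shift count that a sext_hwi-style helper would compute for PREC.
inline int sext_shift (unsigned prec)
{
  return host_bits - static_cast<int> (prec);
}

// A shift is only well defined when the count is in [0, host_bits).
inline bool shift_is_defined (unsigned prec)
{
  int shift = sext_shift (prec);
  return shift >= 0 && shift < host_bits;
}
```

With these definitions, `stored_precision<narrow_type> (1024)` reads back as 0 while `stored_precision<wide_type> (1024)` reads back as 1024, and `shift_is_defined (0)` is false because the shift count would equal the full width of the type.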
2023-05-08	fortran: Fix coding style around free()	Bernhard Reutner-Fischer	6	-9/+9
	Fix coding-style errors introduced in ca2f64d5d08c1699ca4b7cb2bf6a76692e809e0f gcc/fortran/ChangeLog: * resolve.cc (resolve_select_type): Fix coding style. libgfortran/ChangeLog: * caf/single.c (_gfortran_caf_register): Fix coding style. * io/async.c (update_pdt, async_io): Likewise. * io/format.c (free_format_data): Likewise. * io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise. * io/unix.c (mem_close): Likewise.
2023-05-08	PHIOPT: factor out unary operations instead of just conversions	Andrew Pinski	6	-28/+71
	After using factor_out_conditional_conversion with diamond bb, we should be able to use it for all normal unary gimple operations as well, and not just conversions. This allows optimizing PR 59424, for example. This is also a start to optimize PR 64700 and a few others. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. An example of this is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } static int abs1(int a) { if (a < 0) a = -a; return a; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (d > e) t = g(abs1(d)); else t = g(abs1(e)); return t; } ``` Which should be optimized to: _9 = MAX_EXPR <d_5(D), e_6(D)>; _4 = ABS_EXPR <_9>; t_3 = (long long unsigned intD.16) _4; gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ... (factor_out_conditional_operation): This and add support for all unary operations. (pass_phiopt::execute): Update call to factor_out_conditional_conversion to call factor_out_conditional_operation instead. PR tree-optimization/109424 PR tree-optimization/59424 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-2.c: Update tree scan for details change in wording. * gcc.dg/tree-ssa/minmax-17.c: Likewise. * gcc.dg/tree-ssa/pr103771.c: Likewise. * gcc.dg/tree-ssa/minmax-18.c: New test. * gcc.dg/tree-ssa/minmax-19.c: New test.
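The source-level effect of factoring a unary operation out of both arms can be sketched as follows. This is only an illustration of the equivalence the optimization relies on, not the pass itself; the function names `abs_in_both_arms` and `abs_after_select` are made up for the example, and inputs are assumed to stay away from INT_MIN so that the absolute value is representable.

```cpp
#include <algorithm>
#include <cstdlib>

// Branchy form: the same unary operation (abs) appears in both arms.
inline int abs_in_both_arms (int d, int e)
{
  int t;
  if (d > e)
    t = std::abs (d);
  else
    t = std::abs (e);
  return t;
}

// Factored form: select the operand first, then apply the operation once.
// This mirrors the MAX_EXPR followed by ABS_EXPR shape quoted above.
inline int abs_after_select (int d, int e)
{
  return std::abs (std::max (d, e));
}
```

Both functions return the same value for every pair of inputs in the assumed range, which is why the conditional can be reduced to a single max followed by a single abs.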
2023-05-08	PHIOPT: Loop over calling factor_out_conditional_conversion	Andrew Pinski	2	-12/+36
	After adding diamond shaped bb support to factor_out_conditional_conversion, we can get a case where we have two conversions that need to be factored out, after which another phiopt can happen. An example is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (c > d) t = g(c); else t = g(d); return t; } ``` In this case we should get a MAX_EXPR in phiopt1 with two casts. Before this patch, we would just factor out the outer cast and then wait till phiopt2 to factor out the inner cast. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (pass_phiopt::execute): Loop over factor_out_conditional_conversion. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-17.c: New test.
2023-05-08	PHIOPT: Add diamond bb form to factor_out_conditional_conversion	Andrew Pinski	4	-4/+42
	The function factor_out_conditional_conversion already supports diamond shaped bb forms; it just needs to be called for them. harden-cond-comp.c needed to be changed as we would optimize out the conversion now, and the compare hardening then no longer needs to split the block, which is what the test was checking. So change it such that there would be no chance of optimization. Also add two testcases that showed the improvement. PR 103771 is solved in ifconvert also for the vectorizer but now it is solved in a general sense. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/49959 PR tree-optimization/103771 gcc/ChangeLog: * tree-ssa-phiopt.cc (pass_phiopt::execute): Support diamond shaped bb form for factor_out_conditional_conversion. gcc/testsuite/ChangeLog: * c-c++-common/torture/harden-cond-comp.c: Change testcase slightly to avoid the new phiopt optimization. * gcc.dg/tree-ssa/abs-2.c: New test. * gcc.dg/tree-ssa/pr103771.c: New test.
2023-05-08	RISC-V: Fix ugly && incorrect codes of RVV auto-vectorization	Juzhe-Zhong	7	-47/+19
	1. Add movmisalign pattern for TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT targethook; RISC-V currently supports this target hook, but we can't make it supported without a movmisalign pattern. 2. Remove global extern of get_mask_policy_no_pred && get_tail_policy_no_pred. These 2 functions are coming from the intrinsic builtin framework. We are sure we don't need them in the auto-vectorization implementation. 3. Refine mask mode implementation. 4. We should not have "riscv_vector_" in the riscv_vector namespace since it makes the code inconsistent and ugly. For example: Before this patch: static opt_machine_mode riscv_get_mask_mode (machine_mode mode) { machine_mode mask_mode = VOIDmode; if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode)) return mask_mode; .. After this patch: riscv_get_mask_mode (machine_mode mode) { machine_mode mask_mode = VOIDmode; if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists (&mask_mode)) return mask_mode; .. 5. Fix failing testcase fixed-vlmax-1.c. gcc/ChangeLog: * config/riscv/autovec.md (movmisalign<mode>): New pattern. * config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete. (riscv_vector_get_mask_mode): Ditto. (get_mask_policy_no_pred): Ditto. (get_tail_policy_no_pred): Ditto. (get_mask_mode): New function. * config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete. (get_tail_policy_no_pred): Ditto. (riscv_vector_mask_mode_p): Ditto. (riscv_vector_get_mask_mode): Ditto. (get_mask_mode): New function. * config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove global extern. (get_tail_policy_for_pred): Ditto. * config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): Ditto. (get_mask_policy_for_pred): Ditto. * config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.
2023-05-08	RISC-V: Handle multi-lib path correctly for linux	Kito Cheng	4	-41/+100
	RISC-V Linux encodes the ABI into the path, so in theory we can only use the ABI to select multi-lib paths; there is no way to use different multi-lib paths for `rv32i/ilp32` and `rv32ima/ilp32`, so both map to `/lib/ilp32`. It's hard to do that with GCC's builtin multi-lib selection mechanism: it compares option strings and enumerates all possible reuse rules at build time. That is impossible for RISC-V because of the huge number of `-march` combinations, so implementing a customized multi-lib selection is the only solution. After this patch, the multi-lib configuration is only used to determine which ISA should be used when compiling the corresponding ABI variant. During the multi-lib selection stage, -mabi is the only key used to select the multi-lib path. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi): New. (riscv_select_multilib): New. (riscv_compute_multilib): Extract logic to riscv_select_multilib and also handle select_by_abi. * config/riscv/elf.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Change it to select_by_abi_arch_cmodel from 1. * config/riscv/linux.h (RISCV_USE_CUSTOMISED_MULTI_LIB): Define. * config/riscv/riscv-opts.h (enum riscv_multilib_select_kind): New.
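The idea of using -mabi as the only selection key can be illustrated with a small lookup. This is a rough sketch under stated assumptions and not the code in riscv-common.cc: `multilib_entry`, `pick_path_by_abi` and `example_table` are hypothetical names, and the table contents are invented for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical multi-lib table entry: the ISA string only matters when the
// library is built, while the ABI alone identifies the directory.
struct multilib_entry
{
  std::string march;
  std::string mabi;
  std::string path;
};

// Return the path of the first entry whose ABI matches, ignoring -march.
// An empty string means no multi-lib was configured for that ABI.
inline std::string
pick_path_by_abi (const std::vector<multilib_entry> &table,
                  const std::string &mabi)
{
  for (const multilib_entry &e : table)
    if (e.mabi == mabi)
      return e.path;
  return "";
}

// Illustrative table; the entries are assumptions, not GCC's defaults.
inline std::vector<multilib_entry>
example_table ()
{
  return {
    {"rv32ima", "ilp32", "/lib/ilp32"},
    {"rv64imafdc", "lp64d", "/lib/lp64d"},
  };
}
```

With this table, a compile using `-march=rv32i -mabi=ilp32` and one using `-march=rv32ima -mabi=ilp32` both resolve to `/lib/ilp32`, because the lookup never inspects the ISA string.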
2023-05-08	Makefile.in: clean up match.pd-related dependencies	Alexander Monakov	1	-6/+3
	Clean up confusing changes from the recent refactoring for parallel match.pd build. gimple-match-head.o is not built. Remove related flags adjustment. Autogenerated gimple-match-N.o files do not depend on gimple-match-exports.cc. {gimple,generic}-match-auto.h only depend on the prerequisites of the corresponding s-{gimple,generic}-match stamp file, not any .cc file. gcc/ChangeLog: * Makefile.in: (gimple-match-head.o-warn): Remove. (GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on gimple-match-exports.cc. (gimple-match-auto.h): Only depend on s-gimple-match. (generic-match-auto.h): Likewise.
2023-05-07	Move substitute_and_fold over to use simple_dce_from_worklist	Andrew Pinski	10	-51/+82
	While looking into a different issue, I noticed that it would take until the second forwprop pass to do some forward propagation, and it was because the ssa name was used more than once but the second statement was "dead" and we don't remove that until much later. So this uses simple_dce_from_worklist instead of manually removing the known unused statements. The propagate engine does not do a cleanupcfg afterwards either but manually cleans up possible EH edges, so simple_dce_from_worklist needs to communicate that back to the propagate engine. Some testcases needed to be updated/changed because of better optimization. gcc.dg/pr81192.c even had to be changed to use the gimple FE so it would be less fragile in the future too. gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being matched but in those cases the result was not being used, so both __atomic_fetch_ and __atomic_x_and_fetch_ are valid choices and would not make a code generation difference. evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slight change as the removal message is slightly different. kernels-alias-8.c: ccp1 is able to remove an unused load, which causes ealias to have one less load to analyze, so update the expected scan #. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/109691 * tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup argument. If the removed statement can throw, have need_eh_cleanup include the bb of that statement. * tree-ssa-dce.h (simple_dce_from_worklist): Update declaration. * tree-ssa-propagate.cc (struct prop_stats_d): Remove num_dce. (substitute_and_fold_dom_walker::substitute_and_fold_dom_walker): Initialize dceworklist instead of stmts_to_remove. (substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker): Destroy dceworklist instead of stmts_to_remove. (substitute_and_fold_dom_walker::before_dom_children): Set dceworklist instead of adding to stmts_to_remove. 
(substitute_and_fold_engine::substitute_and_fold): Call simple_dce_from_worklist instead of popping from the list. Don't update the stat on removal statements. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/evrp7.c: Update for output change. * gcc.dg/tree-ssa/evrp8.c: Likewise. * gcc.dg/tree-ssa/vrp35.c: Likewise. * gcc.dg/tree-ssa/vrp36.c: Likewise. * gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not to check for assignment too instead of just a call. * c-c++-common/goacc/kernels-alias-8.c: Update test for removal of load. * gcc.dg/pr81192.c: Rewrite testcase in gimple based test.
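The general shape of a worklist-driven dead code removal can be shown on a toy IR. This is a simplified model for illustration only and does not reproduce GCC's simple_dce_from_worklist: the `stmt` type, `count_uses` and `toy_dce_from_worklist` are hypothetical, and statements are keyed by the name they define.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy statement: defines one name from zero or more operand names.
struct stmt
{
  std::string lhs;
  std::vector<std::string> operands;
  bool has_side_effects;
};

// Count how many operand slots of the remaining statements mention NAME.
inline int
count_uses (const std::map<std::string, stmt> &defs, const std::string &name)
{
  int n = 0;
  for (const auto &kv : defs)
    for (const std::string &op : kv.second.operands)
      if (op == name)
        ++n;
  return n;
}

// Remove statements whose results are unused, starting from WORKLIST.
// Removing a statement may make its operands dead, so they are queued too.
// Returns the number of statements removed.
inline int
toy_dce_from_worklist (std::map<std::string, stmt> &defs,
                       std::vector<std::string> worklist)
{
  int removed = 0;
  while (!worklist.empty ())
    {
      std::string name = worklist.back ();
      worklist.pop_back ();
      auto it = defs.find (name);
      if (it == defs.end ())
        continue;   // already removed, or not defined in this toy function
      if (it->second.has_side_effects || count_uses (defs, name) != 0)
        continue;   // still needed
      std::vector<std::string> ops = it->second.operands;
      defs.erase (it);
      ++removed;
      for (const std::string &op : ops)
        worklist.push_back (op);
    }
  return removed;
}
```

In this model, if `a` feeds both a dead statement `b` and a live statement `c`, seeding the worklist with `b` removes it immediately and leaves `a` with a single use, which is the situation the commit message describes as blocking forward propagation.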
2023-05-08	fortran: Remove conditionals around free()	Bernhard Reutner-Fischer	6	-18/+9
	gcc/fortran/ChangeLog: * resolve.cc (resolve_select_type): Call free() unconditionally. libgfortran/ChangeLog: * caf/single.c (_gfortran_caf_register): Call free() unconditionally. * io/async.c (update_pdt, async_io): Likewise. * io/format.c (free_format_data): Likewise. * io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise. * io/unix.c (mem_close): Likewise.
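The cleanup relies on the fact that free() accepts a null pointer and does nothing in that case, so a guard of the form "if (p) free (p)" is redundant. A minimal sketch of the pattern follows; the `buffer` type and `release` function are hypothetical and not taken from libgfortran.

```cpp
#include <cstdlib>

// Hypothetical owner of a malloc'ed block; data may legitimately be null.
struct buffer
{
  char *data = nullptr;
};

// Release the buffer's storage. free() is a no-op for a null pointer,
// so no "if (b.data)" check is needed before the call.
inline void
release (buffer &b)
{
  std::free (b.data);
  b.data = nullptr;
}
```

Calling `release` on an empty buffer, or twice on the same buffer, is safe because the pointer is reset to null after each call.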
2023-05-08	Fortran: Fix mpz and mpfr memory leaks [PR fortran/68800]	Bernhard Reutner-Fischer	2	-4/+9
	gcc/fortran/ChangeLog: PR fortran/68800 * expr.cc (find_array_section): Fix mpz memory leak. * simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in error paths.
2023-05-07	Fortran: Reject semicolon after namelist name.	Jerry DeLisle	2	-2/+17
	PR fortran/109662 libgfortran/ChangeLog: * io/list_read.c: Add check for a semicolon after a namelist name in read input. Issue a runtime error message. gcc/testsuite/ChangeLog: * gfortran.dg/pr109662-a.f90: New test.
2023-05-08	Daily bump.	GCC Administrator	4	-1/+146

2023-05-07	c++: fix pretty printing of 'alignof' vs '__alignof__' [PR85979]	Patrick Palka	3	-5/+30
	PR c++/85979 gcc/cp/ChangeLog: * cxx-pretty-print.cc (cxx_pretty_printer::unary_expression) <case ALIGNOF_EXPR>: Consider ALIGNOF_EXPR_STD_P. * error.cc (dump_expr) <case ALIGNOF_EXPR>: Likewise. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/alignof4.C: New test.
2023-05-07	c++: goto entering scope of obj w/ non-trivial dtor [PR103091]	Patrick Palka	3	-43/+42
	It seems ever since DR 2256 goto is permitted to cross the initialization of a trivially initialized object with a non-trivial destructor. We already supported this as an -fpermissive extension, so this patch just makes us unconditionally support this. DR 2256 PR c++/103091 gcc/cp/ChangeLog: * decl.cc (decl_jump_unsafe): Return bool instead of int. Don't consider TYPE_HAS_NONTRIVIAL_DESTRUCTOR. (check_previous_goto_1): Simplify now that decl_jump_unsafe returns bool instead of int. (check_goto): Likewise. gcc/testsuite/ChangeLog: * g++.old-deja/g++.other/init9.C: Don't expect diagnostics for goto made valid by DR 2256. * g++.dg/init/goto4.C: New test.
2023-05-07	c++: satisfaction of non-dep member alias template-id	Patrick Palka	2	-3/+18
	constraints_satisfied_p already carefully checks dependence of template arguments before proceeding with satisfaction, so the dependence check in instantiate_alias_template is unnecessary and overly conservative. Getting rid of it allows us to check satisfaction ahead of time in more cases as in the below testcase. gcc/cp/ChangeLog: * pt.cc (instantiate_alias_template): Exit early upon error from coerce_template_parms. Remove dependence test guarding constraints_satisfied_p. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-alias6.C: New test.
2023-05-07	c++: various code cleanups	Patrick Palka	4	-25/+20
	* Harden some tree accessor macros and fix a couple of bad PLACEHOLDER_TYPE_CONSTRAINTS accesses uncovered by this. * Use strip_innermost_template_args in outer_template_args. * Add !processing_template_decl early exit tests to some dependence predicates. gcc/cp/ChangeLog: * cp-tree.h (PLACEHOLDER_TYPE_CONSTRAINTS_INFO): Harden via TEMPLATE_TYPE_PARM_CHECK. (TPARMS_PRIMARY_TEMPLATE): Harden via TREE_VEC_CHECK. (TEMPLATE_TEMPLATE_PARM_TEMPLATE_DECL): Harden via TEMPLATE_TEMPLATE_PARM_CHECK. * cxx-pretty-print.cc (cxx_pretty_printer::simple_type_specifier): Guard PLACEHOLDER_TYPE_CONSTRAINTS access. * error.cc (dump_type) <case TEMPLATE_TYPE_PARM>: Use separate variable to store CLASS_PLACEHOLDER_TEMPLATE result. * pt.cc (outer_template_args): Use strip_innermost_template_args. (any_type_dependent_arguments_p): Exit early if !processing_template_decl. Use range-based for. (any_dependent_template_arguments_p): Likewise.
2023-05-07	c++: parenthesized -> resolving to static data member [PR98283]	Patrick Palka	3	-3/+17
	Here we're neglecting to propagate parenthesized-ness when the member access (this->m) resolves to a static data member (and thus finish_class_member_access_expr yields a VAR_DECL instead of a COMPONENT_REF). PR c++/98283 gcc/cp/ChangeLog: * pt.cc (tsubst_copy_and_build) <case COMPONENT_REF>: Propagate REF_PARENTHESIZED_P more generally via force_paren_expr. * semantics.cc (force_paren_expr): Document default argument. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/paren6.C: New test.
2023-05-07	c++: bound ttp in lambda function type [PR109651]	Patrick Palka	3	-8/+50
	After r14-11-g2245459c85a3f4 we now coerce the template arguments of a bound ttp again after level-lowering it. Notably a level-lowered ttp doesn't have DECL_CONTEXT set, so during this coercion we fall back to using current_template_parms to obtain the relevant set of in-scope parameters. But it turns out current_template_parms isn't properly set when substituting the function type of a generic lambda, and so if the type contains bound ttps that need to be lowered we'll crash during their attempted coercion. Specifically in the first testcase below, current_template_parms during the lambda type substitution (with T=int) is "1 U" instead of the expected "2 TT, 1 U", and we crash when level lowering TT<int>. Ultimately the problem is that tsubst_lambda_expr does things in the wrong order: we ought to substitute (and install) the in-scope template parameters _before_ substituting anything that may use those template parameters (such as the function type of a generic lambda). This patch corrects this substitution order. PR c++/109651 gcc/cp/ChangeLog: * pt.cc (coerce_template_args_for_ttp): Mention we can hit the current_template_parms fallback when level-lowering a bound ttp. (tsubst_template_decl): Add lambda_tparms parameter. Prefer to use lambda_tparms instead of substituting DECL_TEMPLATE_PARMS. (tsubst_decl) <case TEMPLATE_DECL>: Pass NULL_TREE as lambda_tparms to tsubst_template_decl. (tsubst_lambda_expr): For a generic lambda, substitute DECL_TEMPLATE_PARMS and set current_template_parms to it before substituting the function type. Pass the substituted DECL_TEMPLATE_PARMS as lambda_tparms to tsubst_template_decl. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-generic-ttp1.C: New test. * g++.dg/cpp2a/lambda-generic-ttp2.C: New test.