This patch would like to implement the sssub pattern for vector signed integers.
Form 1:
#define DEF_VEC_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_sub_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T minus = (UT)x - (UT)y; \
out[i] = (x ^ y) >= 0 \
? minus \
: (minus ^ x) >= 0 \
? minus \
: x < 0 ? MIN : MAX; \
} \
}
DEF_VEC_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)
Before this patch:
28 │ vle8.v v1,0(a1)
29 │ vle8.v v2,0(a2)
30 │ sub a3,a3,a5
31 │ add a1,a1,a5
32 │ add a2,a2,a5
33 │ vsra.vi v4,v1,7
34 │ vsub.vv v3,v1,v2
35 │ vxor.vv v2,v1,v2
36 │ vxor.vv v0,v1,v3
37 │ vmslt.vi v2,v2,0
38 │ vmslt.vi v0,v0,0
39 │ vmand.mm v0,v0,v2
40 │ vxor.vv v3,v4,v5,v0.t
41 │ vse8.v v3,0(a0)
42 │ add a0,a0,a5
After this patch:
25 │ vle8.v v1,0(a1)
26 │ vle8.v v2,0(a2)
27 │ sub a3,a3,a5
28 │ add a1,a1,a5
29 │ add a2,a2,a5
30 │ vssub.vv v1,v1,v2
31 │ vse8.v v1,0(a0)
32 │ add a0,a0,a5
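For reference, a minimal scalar sketch of the per-element saturation
semantics that vssub.vv provides (illustration only; the helper name is
hypothetical):

#include <stdint.h>

/* Saturating signed subtraction of one int8_t element: compute the
   difference in a wider type, then clamp it to the int8_t range.  */
static inline int8_t
sat_sub_ref (int8_t x, int8_t y)
{
  int16_t d = (int16_t) x - (int16_t) y;  /* wide difference, cannot overflow */
  if (d > INT8_MAX)
    return INT8_MAX;
  if (d < INT8_MIN)
    return INT8_MIN;
  return (int8_t) d;
}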
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (sssub<mode>3): Add new pattern for
signed SAT_SUB.
* config/riscv/riscv-protos.h (expand_vec_sssub): Add new func
decl to expand sssub to vssub.
* config/riscv/riscv-v.cc (expand_vec_sssub): Add new func
impl to expand sssub to vssub.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Almost the same as the vector unsigned integer SAT_SUB: try to match
the signed version during vector pattern matching.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* tree-vect-patterns.cc (gimple_signed_integer_sat_sub): Add new
func decl for signed SAT_SUB.
(vect_recog_sat_sub_pattern_transform): Update comments.
(vect_recog_sat_sub_pattern): Try the vector signed SAT_SUB
pattern.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the form 1 of the vector signed
integer SAT_SUB. Aka below example:
Form 1:
#define DEF_VEC_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_sub_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T minus = (UT)x - (UT)y; \
out[i] = (x ^ y) >= 0 \
? minus \
: (minus ^ x) >= 0 \
? minus \
: x < 0 ? MIN : MAX; \
} \
}
DEF_VEC_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)
Before this patch:
91 │ _108 = .SELECT_VL (ivtmp_106, POLY_INT_CST [16, 16]);
92 │ vect_x_16.11_80 = .MASK_LEN_LOAD (vectp_op_1.9_78, 8B, { -1, ... }, _108, 0);
93 │ _69 = vect_x_16.11_80 >> 7;
94 │ vect_x.12_81 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_x_16.11_80);
95 │ vect_y_18.15_85 = .MASK_LEN_LOAD (vectp_op_2.13_83, 8B, { -1, ... }, _108, 0);
96 │ vect__7.21_91 = vect_x_16.11_80 ^ vect_y_18.15_85;
97 │ mask__44.22_92 = vect__7.21_91 < { 0, ... };
98 │ vect_y.16_86 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned char>(vect_y_18.15_85);
99 │ vect__6.17_87 = vect_x.12_81 - vect_y.16_86;
100 │ vect_minus_19.18_88 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__6.17_87);
101 │ vect__8.19_89 = vect_x_16.11_80 ^ vect_minus_19.18_88;
102 │ mask__42.20_90 = vect__8.19_89 < { 0, ... };
103 │ mask__41.23_93 = mask__42.20_90 & mask__44.22_92;
104 │ _4 = .COND_XOR (mask__41.23_93, _69, { 127, ... }, vect_minus_19.18_88);
105 │ .MASK_LEN_STORE (vectp_out.31_102, 8B, { -1, ... }, _108, 0, _4);
106 │ vectp_op_1.9_79 = vectp_op_1.9_78 + _108;
107 │ vectp_op_2.13_84 = vectp_op_2.13_83 + _108;
108 │ vectp_out.31_103 = vectp_out.31_102 + _108;
109 │ ivtmp_107 = ivtmp_106 - _108;
After this patch:
81 │ _102 = .SELECT_VL (ivtmp_100, POLY_INT_CST [16, 16]);
82 │ vect_x_16.11_89 = .MASK_LEN_LOAD (vectp_op_1.9_87, 8B, { -1, ... }, _102, 0);
83 │ vect_y_18.14_93 = .MASK_LEN_LOAD (vectp_op_2.12_91, 8B, { -1, ... }, _102, 0);
84 │ vect_patt_38.15_94 = .SAT_SUB (vect_x_16.11_89, vect_y_18.14_93);
85 │ .MASK_LEN_STORE (vectp_out.16_96, 8B, { -1, ... }, _102, 0, vect_patt_38.15_94);
86 │ vectp_op_1.9_88 = vectp_op_1.9_87 + _102;
87 │ vectp_op_2.12_92 = vectp_op_2.12_91 + _102;
88 │ vectp_out.16_97 = vectp_out.16_96 + _102;
89 │ ivtmp_101 = ivtmp_100 - _102;
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add case 1 matching pattern for vector signed SAT_SUB.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
This patch creates an unsigned "standard" for the
gfc_option.allow_std field.
One of the main reasons why people want UNSIGNED for Fortran is
interfacing with C.
This is a preparation for further work on the ISO_C_BINDING constants.
We do that via iso-c-binding.def, whose last field is the standard
for which the constant in question is defined and which is then
checked. I could try and invent a different method for this,
but I'd rather not.
gcc/fortran/ChangeLog:
* intrinsic.cc (add_functions): Convert uint and
selected_unsigned_kind to GFC_STD_UNSIGNED.
(gfc_check_intrinsic_standard): Handle GFC_STD_UNSIGNED.
* libgfortran.h (GFC_STD_UNSIGNED): Add.
* options.cc (gfc_post_options): Set GFC_STD_UNSIGNED
if -funsigned is set.
|
|
Since long is only 32-bit for x32, replace long with long long for x32.
* gcc.target/i386/bmi2-pr112526.c: Replace long with long long.
* gcc.target/i386/pr105854.c: Likewise.
* gcc.target/i386/pr112943.c: Likewise.
* gcc.target/i386/pr67325.c: Likewise.
* gcc.target/i386/pr97971.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Since -mabi=ms isn't supported for x32, skip g++.target/i386/pr105953.C
for x32.
* g++.target/i386/pr105953.C: Skip for x32.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Since -mcmodel=large is valid only for lp64, run pr115407.c only for
lp64.
* gcc.target/i386/pr115407.c: Only run for lp64.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
gcc/ada/
PR ada/116498
PR ada/117087
* gcc-interface/decl.cc (validate_size): Fix thinko.
|
|
The __niter_base(move_iterator<I>) overload and __is_move_iterator trait
were originally immediately after the definition of move_iterator. The
addition of C++20 features after move_iterator meant that those helpers
were no longer anywhere near move_iterator.
This change puts them back where they used to be, before all the new
C++20 additions.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (__niter_base(move_iterator<I>))
(__is_move_iterator, __miter_base, _GLIBCXX_MAKE_MOVE_ITERATOR)
(_GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR): Move earlier in the
file.
|
|
representation for XAR instruction
The pattern for the Advanced SIMD XAR instruction isn't very
optimization-friendly at the moment.
In the testcase from the PR, once simplify-rtx has done its work, it
generates the RTL:
(set (reg:V2DI 119 [ _14 ])
(rotate:V2DI (xor:V2DI (reg:V2DI 114 [ vect__1.12_16 ])
(reg:V2DI 116 [ *m1_01_8(D) ]))
(const_vector:V2DI [
(const_int 32 [0x20]) repeated x2
])))
which fails to match our XAR pattern because the pattern expects:
1) A ROTATERT instead of the ROTATE. However, according to the RTL ops
documentation the preferred form of rotate-by-immediate is ROTATE, which
I take to mean it's the canonical form.
ROTATE (x, C) <-> ROTATERT (x, MODE_WIDTH - C) so it's better to match just
one canonical representation.
2) A CONST_INT shift amount whereas the midend asks for a repeated vector
constant.
These issues are fixed by introducing a dedicated expander for the
aarch64_xarqv2di name, needed by the arm_neon.h intrinsic, that translates
the intrinsic-level CONST_INT immediate (the right-rotate amount) into
a repeated vector constant subtracted from 64 to give the corresponding
left-rotate amount, which is fed to the new representation of the XAR
define_insn that uses the ROTATE RTL code. This is a similar approach
to how we handle the discrepancy between intrinsic-level and RTL-level
vector lane numbers for big-endian.
With this patch and [1/2] the arithmetic parts of the testcase now simplify
to just one XAR instruction.
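For reference, a scalar sketch of the rotate identity relied on here for
the 64-bit lanes (illustration only; helper names are hypothetical):

#include <stdint.h>

/* For 0 < c < 64 a right rotate by c equals a left rotate by 64 - c,
   which is why matching only the ROTATE form is sufficient.  */
static inline uint64_t
rotr64 (uint64_t x, unsigned c)
{
  return (x >> c) | (x << (64 - c));
}

static inline uint64_t
rotl64 (uint64_t x, unsigned c)
{
  return (x << c) | (x >> (64 - c));
}
/* rotr64 (x, c) == rotl64 (x, 64 - c) for 0 < c < 64.  */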
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
PR target/117048
* config/aarch64/aarch64-simd.md (aarch64_xarqv2di): Redefine into a
define_expand.
(*aarch64_xarqv2di_insn): Define.
gcc/testsuite/
PR target/117048
* g++.target/aarch64/pr117048.C: New test.
|
|
transformation to vector operands
In the testcase from patch [2/2] we want to match a vector rotate operation
formed from an IOR of left and right shifts by immediate. simplify-rtx has
code for just that, but it looks like it's prepared to handle only scalar
operands. In practice most of the code works for vector modes as well,
except that the shift amounts are checked to be CONST_INT rather than the
vector constants that we have here. This is easily extended by using
unwrap_const_vec_duplicate to extract the repeating constant shift amount.
With this change combine now tries matching the simpler and expected:
(set (reg:V2DI 119 [ _14 ])
(rotate:V2DI (xor:V2DI (reg:V2DI 114 [ vect__1.12_16 ])
(reg:V2DI 116 [ *m1_01_8(D) ]))
(const_vector:V2DI [
(const_int 32 [0x20]) repeated x2
])))
instead of the previous:
(set (reg:V2DI 119 [ _14 ])
(ior:V2DI (ashift:V2DI (xor:V2DI (reg:V2DI 114 [ vect__1.12_16 ])
(reg:V2DI 116 [ *m1_01_8(D) ]))
(const_vector:V2DI [
(const_int 32 [0x20]) repeated x2
]))
(lshiftrt:V2DI (xor:V2DI (reg:V2DI 114 [ vect__1.12_16 ])
(reg:V2DI 116 [ *m1_01_8(D) ]))
(const_vector:V2DI [
(const_int 32 [0x20]) repeated x2
]))))
To actually fix the PR the aarch64 backend needs some adjustment too,
which is done in patch [2/2] together with the testcase.
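For reference, the scalar form of the shift/or idiom that is now folded
into a single rotate, shown for one 64-bit element (illustration only; the
helper name is hypothetical):

#include <stdint.h>

/* (x << 32) | (x >> 32) is a rotation of x by 32 bits; simplify-rtx now
   recognizes the same idiom when x is a vector and the shift amounts are
   repeated vector constants.  */
static inline uint64_t
rot_from_shifts (uint64_t x)
{
  return (x << 32) | (x >> 32);
}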
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
PR target/117048
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Handle vector constants in (x << C1) | (x >> C2) -> ROTATE
simplification.
|
|
This patch removes a large number of unused static functions from error.cc,
which were previously used for diagnostics but have been replaced by the
common diagnostic code.
gcc/fortran/ChangeLog:
* error.cc (error_char, error_string, error_uinteger, error_integer,
error_hwuint, error_hwint, gfc_widechar_display_length,
gfc_wide_display_length, error_printf, show_locus, show_loci):
Remove unused static functions.
(IBUF_LEN, MAX_ARGS): Remove now unused #define.
|
|
This patch implements 4 rules for logarithmic identities in match.pd
under -funsafe-math-optimizations:
1) logN(1.0/a) -> -logN(a). This avoids the division instruction.
2) logN(C/a) -> logN(C) - logN(a), where C is a real constant. Same as 1).
3) logN(a) + logN(b) -> logN(a*b). This reduces the number of calls to
log function.
4) logN(a) - logN(b) -> logN(a/b). Same as 3).
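For reference, the identities written out in C; they are only valid under
-funsafe-math-optimizations (illustration only; the constant 42.0 and the
function names are arbitrary):

#include <math.h>

double id1 (double a)           { return log (1.0 / a); }      /* -> -log (a)             */
double id2 (double a)           { return log (42.0 / a); }     /* -> log (42.0) - log (a) */
double id3 (double a, double b) { return log (a) + log (b); }  /* -> log (a * b)          */
double id4 (double a, double b) { return log (a) - log (b); }  /* -> log (a / b)          */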
Tests were added for float, double, and long double.
The patch was bootstrapped and regtested on aarch64-linux-gnu and
x86_64-linux-gnu, no regression.
Additionally, SPEC 2017 fprate was run. While the transform does not seem
to be triggered, we also see no non-noise impact on performance.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR tree-optimization/116826
PR tree-optimization/86710
* match.pd: Fold logN(1.0/a) -> -logN(a),
logN(C/a) -> logN(C) - logN(a), logN(a) + logN(b) -> logN(a*b),
and logN(a) - logN(b) -> logN(a/b).
gcc/testsuite/
PR tree-optimization/116826
PR tree-optimization/86710
* gcc.dg/tree-ssa/log_ident.c: New test.
|
|
libstdc++-v3/ChangeLog:
* include/bits/cpp_type_traits.h (__is_byte<byte>): Guard with
__glibcxx_byte macro instead of checking __cplusplus.
|
|
When formatting a time point with %c we call std::vformat_to using the
formatting locale's D_T_FMT string, but we weren't adding the L option
to the format string. This meant we always interpreted D_T_FMT in the C
locale, instead of using the formatting locale as obviously intended
when %c is used.
libstdc++-v3/ChangeLog:
PR libstdc++/117085
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add L
option to format string.
* testsuite/std/time/format.cc: Move to...
* testsuite/std/time/format/format.cc: ...here.
* testsuite/std/time/format_localized.cc: Move to...
* testsuite/std/time/format/localized.cc: ...here.
* testsuite/std/time/format/pr117085.cc: New test.
|
|
libstdc++-v3/ChangeLog:
* testsuite/22_locale/time_get/get/char/5.cc: Fix dg-do
directive.
* testsuite/22_locale/time_get/get/wchar_t/5.cc: Likewise.
|
|
It turns out target costing code looks at STMT_VINFO_MEMORY_ACCESS_TYPE
to identify operations from (emulated) gathers for example. This
doesn't work for SLP loads since we do not set STMT_VINFO_MEMORY_ACCESS_TYPE
there as the vectorization strategy might differ between different
stmt uses. It seems we got away with setting it for stores though.
The following adds a memory_access_type field to slp_tree and sets it
from the load and store vectorization code. The costing generally doesn't
record the SLP node (that was only done selectively for some corner cases).
The costing is really in need of a big overhaul; the following just massages
the two relevant ops to fix gcc.dg/target/pr88531-2[bc].c FAILs when
switching on SLP for non-grouped stores. In particular, currently we have
either an SLP node or a stmt_info in the cost hook, but not both.
So the following mitigates this, postponing a rewrite of costing to
next stage1. Other targets look possibly affected as well but are
left to respective maintainers to update.
PR tree-optimization/117080
* tree-vectorizer.h (_slp_tree::memory_access_type): Add.
(SLP_TREE_MEMORY_ACCESS_TYPE): New.
(record_stmt_cost): Add another overload.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
memory_access_type.
* tree-vect-stmts.cc (vectorizable_store): Set
SLP_TREE_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise. Also record the SLP node
when costing emulated gather offset decompose and vector
composition.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Also
recognize SLP emulated gather/scatter.
|
|
The AArch64 FEAT_FAMINMAX extension introduces instructions for
computing the floating point absolute maximum and minimum of the
two vectors element-wise.
This patch adds code generation for famax and famin in terms of existing
unspecs. With this patch:
1. famax can be expressed as taking the absolute value of each of the two
operands and then taking UNSPEC_COND_SMAX of the results.
2. famin can be expressed as taking the absolute value of each of the two
operands and then taking UNSPEC_COND_SMIN of the results.
This fusion of operators is only possible when the
-march=armv9-a+faminmax+sve flags are passed. We also need to pass the
-ffast-math flag; this is what enables the compiler to use UNSPEC_COND_SMAX
and UNSPEC_COND_SMIN.
This code generation is only available on -O2 or -O3 as that is when
auto-vectorization is enabled.
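For reference, a scalar sketch of the famax/famin semantics assumed above,
i.e. the maximum/minimum of absolute values (illustration only; helper
names are hypothetical):

#include <math.h>

static inline double
famax_ref (double a, double b)
{
  return fmax (fabs (a), fabs (b));  /* element-wise max of absolute values */
}

static inline double
famin_ref (double a, double b)
{
  return fmin (fabs (a), fabs (b));  /* element-wise min of absolute values */
}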
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md
(*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
codegen.
* config/aarch64/iterators.md: Iterator and attribute for
faminmax codegen.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/faminmax_1.c: New test.
* gcc.target/aarch64/sve/faminmax_2.c: New test.
|
|
The AArch64 FEAT_FAMINMAX extension introduces instructions for
computing the floating point absolute maximum and minimum of the
two vectors element-wise.
This patch introduces SVE2 faminmax intrinsics. The intrinsics of this
extension are implemented as the following builtin functions:
* sva[max|min]_[m|x|z]
* sva[max|min]_[f16|f32|f64]_[m|x|z]
* sva[max|min]_n_[f16|f32|f64]_[m|x|z]
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc
(svamax): Absolute maximum declaration.
(svamin): Absolute minimum declaration.
* config/aarch64/aarch64-sve-builtins-base.def
(REQUIRED_EXTENSIONS): Add faminmax intrinsics behind a flag.
(svamax): Absolute maximum declaration.
(svamin): Absolute minimum declaration.
* config/aarch64/aarch64-sve-builtins-base.h: Declaring function
bases for the new intrinsics.
* config/aarch64/aarch64.h
(TARGET_SVE_FAMINMAX): New flag for SVE2 faminmax.
* config/aarch64/iterators.md: New unspecs, iterators, and attrs
for the new intrinsics.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve2/acle/asm/amax_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f32.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f64.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f32.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f64.c: New test.
|
|
The following adds missing checks for a vector type result type
to simplifications that end up creating a vec_cond.
PR middle-end/117086
* match.pd ((op (vec_cond ...) ..) -> (vec_cond ...)): Add
missing checks for VECTOR_TYPE_P (type).
* gcc.dg/torture/pr117086.c: New testcase.
|
|
Form 8:
#define DEF_SAT_S_TRUNC_FMT_8(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_8 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN > x || x >= (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
}
The below tests are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-8-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-8-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-8-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 7:
#define DEF_SAT_S_TRUNC_FMT_7(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_7 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN >= x || x >= (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
}
The below tests are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-7-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-7-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-7-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 6:
#define DEF_SAT_S_TRUNC_FMT_6(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_6 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN >= x || x > (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
}
The below tests are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-6-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-6-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-6-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 5:
#define DEF_SAT_S_TRUNC_FMT_5(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_5 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN > x || x > (WT)NT_MAX \
? x < 0 ? NT_MIN : NT_MAX \
: trunc; \
}
The below tests are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-5-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-5-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-5-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 4:
#define DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN <= x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
}
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-4-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-4-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-4-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the form 4 of the scalar signed
integer SAT_TRUNC. Aka below example:
Form 4:
#define DEF_SAT_S_TRUNC_FMT_4(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN <= x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
}
DEF_SAT_S_TRUNC_FMT_4(int8_t, int16_t, INT8_MIN, INT8_MAX)
Before this patch:
4 │ __attribute__((noinline))
5 │ int8_t sat_s_trunc_int16_t_to_int8_t_fmt_4 (int16_t x)
6 │ {
7 │ int8_t trunc;
8 │ unsigned short x.0_1;
9 │ unsigned short _2;
10 │ int8_t _3;
11 │ _Bool _7;
12 │ signed char _8;
13 │ signed char _9;
14 │ signed char _10;
15 │
16 │ ;; basic block 2, loop depth 0
17 │ ;; pred: ENTRY
18 │ x.0_1 = (unsigned short) x_4(D);
19 │ _2 = x.0_1 + 128;
20 │ if (_2 > 254)
21 │ goto <bb 4>; [50.00%]
22 │ else
23 │ goto <bb 3>; [50.00%]
24 │ ;; succ: 4
25 │ ;; 3
26 │
27 │ ;; basic block 3, loop depth 0
28 │ ;; pred: 2
29 │ trunc_5 = (int8_t) x_4(D);
30 │ goto <bb 5>; [100.00%]
31 │ ;; succ: 5
32 │
33 │ ;; basic block 4, loop depth 0
34 │ ;; pred: 2
35 │ _7 = x_4(D) < 0;
36 │ _8 = (signed char) _7;
37 │ _9 = -_8;
38 │ _10 = _9 ^ 127;
39 │ ;; succ: 5
40 │
41 │ ;; basic block 5, loop depth 0
42 │ ;; pred: 3
43 │ ;; 4
44 │ # _3 = PHI <trunc_5(3), _10(4)>
45 │ return _3;
46 │ ;; succ: EXIT
47 │
48 │ }
After this patch:
4 │ __attribute__((noinline))
5 │ int8_t sat_s_trunc_int16_t_to_int8_t_fmt_4 (int16_t x)
6 │ {
7 │ int8_t _3;
8 │
9 │ ;; basic block 2, loop depth 0
10 │ ;; pred: ENTRY
11 │ _3 = .SAT_TRUNC (x_4(D)); [tail call]
12 │ return _3;
13 │ ;; succ: EXIT
14 │
15 │ }
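For reference, a scalar sketch of the semantics of the matched .SAT_TRUNC
for the int16_t to int8_t case (illustration only; the helper name is
hypothetical):

#include <stdint.h>

/* Saturating truncation: clamp the wide value to the narrow range,
   then truncate.  */
static inline int8_t
sat_trunc_ref (int16_t x)
{
  if (x > INT8_MAX)
    return INT8_MAX;
  if (x < INT8_MIN)
    return INT8_MIN;
  return (int8_t) x;
}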
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add case 4 matching pattern for signed SAT_TRUNC.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 3:
#define DEF_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_3 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN < x && x <= (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
}
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-3-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-3-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-3-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-3-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-3-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-3-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-3-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-3-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-3-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-3-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the form 3 of the scalar signed
integer SAT_TRUNC. Aka below example:
Form 3:
#define DEF_SAT_S_TRUNC_FMT_3(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_3 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN < x && x <= (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
}
DEF_SAT_S_TRUNC_FMT_3(int8_t, int16_t, INT8_MIN, INT8_MAX)
Before this patch:
4 │ __attribute__((noinline))
5 │ int8_t sat_s_sub_int8_t_fmt_3 (int8_t x, int8_t y)
6 │ {
7 │ signed char _1;
8 │ signed char _2;
9 │ int8_t _3;
10 │ __complex__ signed char _6;
11 │ _Bool _8;
12 │ signed char _9;
13 │ signed char _10;
14 │ signed char _11;
15 │
16 │ ;; basic block 2, loop depth 0
17 │ ;; pred: ENTRY
18 │ _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
19 │ _2 = IMAGPART_EXPR <_6>;
20 │ if (_2 != 0)
21 │ goto <bb 4>; [50.00%]
22 │ else
23 │ goto <bb 3>; [50.00%]
24 │ ;; succ: 4
25 │ ;; 3
26 │
27 │ ;; basic block 3, loop depth 0
28 │ ;; pred: 2
29 │ _1 = REALPART_EXPR <_6>;
30 │ goto <bb 5>; [100.00%]
31 │ ;; succ: 5
32 │
33 │ ;; basic block 4, loop depth 0
34 │ ;; pred: 2
35 │ _8 = x_4(D) < 0;
36 │ _9 = (signed char) _8;
37 │ _10 = -_9;
38 │ _11 = _10 ^ 127;
39 │ ;; succ: 5
40 │
41 │ ;; basic block 5, loop depth 0
42 │ ;; pred: 3
43 │ ;; 4
44 │ # _3 = PHI <_1(3), _11(4)>
45 │ return _3;
46 │ ;; succ: EXIT
47 │
48 │ }
After this patch:
4 │ __attribute__((noinline))
5 │ int8_t sat_s_trunc_int16_t_to_int8_t_fmt_3 (int16_t x)
6 │ {
7 │ int8_t _3;
8 │
9 │ ;; basic block 2, loop depth 0
10 │ ;; pred: ENTRY
11 │ _3 = .SAT_TRUNC (x_4(D)); [tail call]
12 │ return _3;
13 │ ;; succ: EXIT
14 │
15 │ }
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add case 3 matching pattern for signed SAT_TRUNC.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Form 2:
#define DEF_SAT_S_TRUNC_FMT_2(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_2 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN < x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
}
The below tests are passed for this patch.
* The rv64gcv fully regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_trunc-2-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-2-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-2-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-2-i64-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-2-i16-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-2-i32-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-2-i32-to-i8.c: New test.
* gcc.target/riscv/sat_s_trunc-run-2-i64-to-i16.c: New test.
* gcc.target/riscv/sat_s_trunc-run-2-i64-to-i32.c: New test.
* gcc.target/riscv/sat_s_trunc-run-2-i64-to-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the form 2 of the scalar signed
integer SAT_TRUNC. Aka below example:
Form 2:
#define DEF_SAT_S_TRUNC_FMT_2(NT, WT, NT_MIN, NT_MAX) \
NT __attribute__((noinline)) \
sat_s_trunc_##WT##_to_##NT##_fmt_2 (WT x) \
{ \
NT trunc = (NT)x; \
return (WT)NT_MIN < x && x < (WT)NT_MAX \
? trunc \
: x < 0 ? NT_MIN : NT_MAX; \
}
DEF_SAT_S_TRUNC_FMT_2(int8_t, int16_t, INT8_MIN, INT8_MAX)
Before this patch:
4 │ __attribute__((noinline))
5 │ int8_t sat_s_trunc_int16_t_to_int8_t_fmt_2 (int16_t x)
6 │ {
7 │ int8_t trunc;
8 │ unsigned short x.0_1;
9 │ unsigned short _2;
10 │ int8_t _3;
11 │ _Bool _7;
12 │ signed char _8;
13 │ signed char _9;
14 │ signed char _10;
15 │
16 │ ;; basic block 2, loop depth 0
17 │ ;; pred: ENTRY
18 │ x.0_1 = (unsigned short) x_4(D);
19 │ _2 = x.0_1 + 127;
20 │ if (_2 > 253)
21 │ goto <bb 4>; [50.00%]
22 │ else
23 │ goto <bb 3>; [50.00%]
24 │ ;; succ: 4
25 │ ;; 3
26 │
27 │ ;; basic block 3, loop depth 0
28 │ ;; pred: 2
29 │ trunc_5 = (int8_t) x_4(D);
30 │ goto <bb 5>; [100.00%]
31 │ ;; succ: 5
32 │
33 │ ;; basic block 4, loop depth 0
34 │ ;; pred: 2
35 │ _7 = x_4(D) < 0;
36 │ _8 = (signed char) _7;
37 │ _9 = -_8;
38 │ _10 = _9 ^ 127;
39 │ ;; succ: 5
40 │
41 │ ;; basic block 5, loop depth 0
42 │ ;; pred: 3
43 │ ;; 4
44 │ # _3 = PHI <trunc_5(3), _10(4)>
45 │ return _3;
46 │ ;; succ: EXIT
47 │
48 │ }
After this patch:
4 │ __attribute__((noinline))
5 │ int8_t sat_s_trunc_int16_t_to_int8_t_fmt_2 (int16_t x)
6 │ {
7 │ int8_t _3;
8 │
9 │ ;; basic block 2, loop depth 0
10 │ ;; pred: ENTRY
11 │ _3 = .SAT_TRUNC (x_4(D)); [tail call]
12 │ return _3;
13 │ ;; succ: EXIT
14 │
15 │ }
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add case 2 matching pattern for signed SAT_TRUNC.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The adjusted and new spaceship expanders ICE with -mtune=i486 or
-mtune=i586.
The problem is that in that case TARGET_ZERO_EXTEND_WITH_AND is true,
so zero_extendqisi2 isn't allowed, and we can't use the replacement AND
because that clobbers the flags, which we want to use again.
The following patch fixes that by expanding in those cases roughly what
we want to end up with after the peephole2 optimizations, i.e. xor
before the comparison, *setcc_qi_slp and sbbl $0 (or, for the signed
int case, xoring of 2 regs, two *setcc_qi_slp, subl).
For *setcc_qi_slp, it uses the setcc_si_slp hacks with UNSPEC that
were in use for the floating point jp case (so such code is IMHO
undesirable for the !TARGET_ZERO_EXTEND_WITH_AND case as we want to
give combiner more liberty in that case).
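For reference, a scalar C sketch of the value the integer spaceship
expansion computes (negative/zero/positive), not of the emitted instruction
sequence (illustration only; the helper name is hypothetical):

/* Three-way comparison: returns -1, 0 or 1.  */
static inline int
spaceship_int (int x, int y)
{
  return (x > y) - (x < y);
}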
2024-10-11 Jakub Jelinek <jakub@redhat.com>
PR target/117053
* config/i386/i386-expand.cc (ix86_expand_fp_spaceship): Handle
TARGET_ZERO_EXTEND_WITH_AND differently.
(ix86_expand_int_spaceship): Likewise.
* g++.target/i386/pr116896-3.C: New test.
|
|
The following temporarily reverts the support of permuted .MASK_LOAD for the
case of non-grouped accesses.
PR tree-optimization/117050
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not support
permutes of non-grouped .MASK_LOAD.
* gcc.dg/vect/pr117050.c: New testcase.
|
|
libstdc++-v3/ChangeLog:
* testsuite/20_util/duration/io.cc [!__cpp_lib_char8_t]: Define
char8_t as a typedef for unsigned char.
* testsuite/std/format/parse_ctx_neg.cc: Skip for -fno-char8_t.
|
|
When we're doing masked store-lanes one mask element applies to all
loads of one struct element. This requires uniform masks for all
of the SLP lanes, something we already compute into STMT_VINFO_SLP_VECT_ONLY
but fail to check when doing SLP store-lanes. The following corrects
this. The following also adjusts the store-lane heuristic to properly
check for masked or non-masked optab support.
* tree-vect-slp.cc (vect_slp_prefer_store_lanes_p): Allow
passing in of vectype, pass in whether the stores are masked
and query the correct optab.
(vect_build_slp_instance): Guard store-lanes query with
! STMT_VINFO_SLP_VECT_ONLY, guaranteeing a uniform mask.
|
|
Hi, all
This is another patch to modify some patterns' type attr from ssemov to
ssemov2.
Some ssemov patterns' mem attr should be load when their second operand is
a memory operand.
operand.
Bootstrapped and regtested on x86-64-linux-pc, OK for trunk?
BRs,
Lin
gcc/ChangeLog:
* config/i386/sse.md
(sse_movhlps): Change type attr from ssemov to ssemov2.
(sse_loadhps): Ditto.
(*vec_concat<mode>): Ditto.
(vec_setv2df_0): Ditto.
(sse_loadlps): Change attr from ssemov to ssemov2 except for 2, 3.
(sse2_loadhps): Change attr from ssemov to ssemov2 except for 0, 1.
(sse2_loadlpd): Change attr from ssemov to ssemov2 except for 0, 1,
2.
(sse2_movsd_<mode>): Change attr from ssemov to ssemov2 except for 5.
(vec_concatv2df): Change attr from ssemov to ssemov2 except for 0, 1,
2.
(*vec_concat<mode>): Change attr from ssemov to ssemov2 for 3, 4.
(vec_concatv2di): Change attr from ssemov to ssemov2 except for 0, 1,
2, 3, 4, 5.
|
|
|
|
The test at pr116258.c fails on big-endian targets because it checks
that the index of a floating point multiply is 0, which is correct only
for little endian.
gcc/testsuite/ChangeLog:
PR tree-optimization/116258
* gcc.target/aarch64/pr116258.c:
Alter test to add big-endian support.
|
|
(this came up for m68k vs. LRA, but is a generic problem)
Regrename wants to use new registers for certain def-use chains.
For validity of replacements it needs to check that the selected
candidates are unused up to then. That's done in check_new_reg_p.
But if it so happens that the new register needs more hardregs
than the old register (which happens if the target allows inter-bank
moves and the mode is something like a DFmode that needs to be placed
into a SImode reg-pair), then check_new_reg_p only checks the
first of those registers for free-ness.
This is caused by that function looking up the number of necessary
hardregs only in terms of the old hardreg number. It of course needs
to do that in terms of the new candidate regnumber. The symptom is that
regrename sometimes clobbers the higher numbered registers of such a
regrename target pair. This patch fixes that problem.
(In the particular case of the bug report it was LRA that left behind an
inter-bank move instruction that triggers regrename, ultimately causing
the mis-compile. Reload didn't do that, but in general we of course
can't rely on such moves not happening if the target allows them.)
This also shows a general confusion in that function and the target hook
interface here:
for (i = nregs - 1; i >= 0; --i)
...
|| ! HARD_REGNO_RENAME_OK (reg + i, new_reg + i))
it uses nregs in a way that requires it to be the same between old and
new register. The problem is that the target hook only gets register
numbers, when it instead should get a mode and register numbers and
would be called only for the first but not for subsequent registers.
I've looked at a number of definitions of that target hook and I think
that this is currently harmless in the sense that it would merely rule
out some potential reg-renames that would in fact be okay to do. So I'm
not changing the target hook interface here and hence that problem
remains unfixed.
PR rtl-optimization/116650
* regrename.cc (check_new_reg_p): Calculate nregs in terms of
the new candidate register.
|
|
After r15-3560-gb081e6c860eb9688d24365d39, the setting of candorest
with the break can just be changed to a return since this code is now
inside a lambda.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (pass_phiopt::execute): Remove candorest
and return instead of setting candorest.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
From: xuli <xuli1@eswincomputing.com>
Example as follows:
int main()
{
unsigned long arraya[128], arrayb[128], arrayc[128];
for (int i = 0; i < 128; i++)
{
arraya[i] = arrayb[i] + arrayc[i];
}
return 0;
}
Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation issue:
riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t __riscv_vle64(vbool16_t, const long long int*, unsigned int)'
40 | #pragma riscv intrinsic "vector"
| ^~~~~~~~
riscv_vector.h:40:25: note: old declaration 'vint64m1_t __riscv_vle64(vbool64_t, const long long int*, unsigned int)'
With zvl=32b, vbool16_t is registered in init_builtins() with
type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].
Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
is not registered in init_builtins(), meaning vbool64_t=null.
In order to implement __attribute__((target("arch=+v"))), we must register
all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered
by default with zvl=128b in reinit_builtins(), resulting in
type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].
We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t),
calculated using type_common.precision, resulting in 2. Since vbool16_t and
vbool64_t have the same element type (boolean_type), the compiler treats them
as the same type, leading to a re-declaration conflict.
After all types and intrinsics have been registered, processing
__attribute__((target("arch=+v"))) will update the target option
parameters and call init_adjust_machine_modes. Therefore, to avoid
conflicts, we can choose zvl=4096b for the null types in reinit_builtins().
command option zvl=32b
type       nunits
vbool64_t => null
vbool32_t => [1,1]
vbool16_t => [2,2]
vbool8_t  => [4,4]
vbool4_t  => [8,8]
vbool2_t  => [16,16]
vbool1_t  => [32,32]
reinit zvl=128b
vbool64_t => [2,2]    conflict with zvl32b vbool16_t => [2,2]
reinit zvl=256b
vbool64_t => [4,4]    conflict with zvl32b vbool8_t  => [4,4]
reinit zvl=512b
vbool64_t => [8,8]    conflict with zvl32b vbool4_t  => [8,8]
reinit zvl=1024b
vbool64_t => [16,16]  conflict with zvl32b vbool2_t  => [16,16]
reinit zvl=2048b
vbool64_t => [32,32]  conflict with zvl32b vbool1_t  => [32,32]
reinit zvl=4096b
vbool64_t => [64,64]  zvl=4096b is ok
Signed-off-by: xuli <xuli1@eswincomputing.com>
PR target/116883
gcc/ChangeLog:
* config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute): Choose zvl4096b
to initialize null type.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/base/pr116883.C: New test.
|
|
My recent VLA SLP patches caused a regression with cross compilers
in gcc.dg/torture/neon-sve-bridge.c. There we have a VEC_PERM_EXPR
created from two BIT_FIELD_REFs, with the child node being an
external VLA vector:
note: node 0x3704a70 (max_nunits=1, refcnt=2) vector(2) long int
note: op: VEC_PERM_EXPR
note: stmt 0 val1Return_9 = BIT_FIELD_REF <sveReturn_8, 64, 0>;
note: stmt 1 val2Return_10 = BIT_FIELD_REF <sveReturn_8, 64, 64>;
note: lane permutation { 0[0] 0[1] }
note: children 0x3704b08
note: node (external) 0x3704b08 (max_nunits=1, refcnt=1) svint64_t
note: { }
For this kind of external node, the SLP_TREE_LANES is normally
the total number of lanes in the vector, but it is zero if the
vector has variable length:
auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
unsigned HOST_WIDE_INT const_nunits;
if (nunits.is_constant (&const_nunits))
SLP_TREE_LANES (vnode) = const_nunits;
This led to division by zero in:
/* Check whether the output has N times as many lanes per vector. */
else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
SLP_TREE_LANES (child) * nunits,
&this_unpack_factor)
&& (i == 0 || unpack_factor == this_unpack_factor))
unpack_factor = this_unpack_factor;
No repetition takes place for this kind of external node, so this
patch goes with Richard's suggestion to check for external nodes
that have no scalar statements.
This didn't show up for my native testing since division by zero
doesn't trap on AArch64.
gcc/
* tree-vect-slp.cc (vectorizable_slp_permutation_1): Set repeating_p
to false if we have an external node for a pre-existing vector.
|
|
cp-demangle.c does not build when CP_DEMANGLE_DEBUG is defined since
r13-2887-gb04208895fed34. This trivial patch fixes the issue.
libiberty/ChangeLog:
* cp-demangle.c (d_dump): Fix compilation when CP_DEMANGLE_DEBUG
is defined.
|
|
We are failing to match call vs. non-call when dealing with matching
loads or stores.
PR tree-optimization/117060
* tree-vect-slp.cc (vect_build_slp_tree_1): When comparing
calls also fail if the first isn't a call.
* gfortran.dg/pr117060.f90: New testcase.
|
|
This patch guards the simplification x / y * y == x -> x % y == 0 in
match.pd by a check for:
1) Non-vector mode of x OR
2) Lack of support for vector division OR
3) Support of vector modulo
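For reference, a runnable check of the scalar identity the simplification
relies on (illustration only):

#include <assert.h>

int
main (void)
{
  /* For y != 0, C guarantees (x / y) * y + x % y == x, so
     x / y * y == x holds exactly when x % y == 0.  */
  for (int x = -20; x <= 20; x++)
    for (int y = -5; y <= 5; y++)
      if (y != 0)
        assert (((x / y * y) == x) == ((x % y) == 0));
  return 0;
}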
The patch was bootstrapped and tested with no regression on
aarch64-linux-gnu and x86_64-linux-gnu.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR tree-optimization/116831
* match.pd: Guard simplification to trunc_mod with check for
mod optab support.
gcc/testsuite/
PR tree-optimization/116831
* gcc.dg/torture/pr116831.c: New test.
|
|
vect_build_slp_tree_1 rejected this during SLP discovery because it
ran into the rhs code comparison code for stores. The following
skips that completely for loads and stores as those are handled
later anyway.
This needs a heuristic adjustment in vect_get_and_check_slp_defs
to avoid fallout with regard to BB vectorization and splitting
of a store group vs. demoting one operand to external.
gcc.dg/Wstringop-overflow-47.c needs adjustment given we now have
vast improvements for code generation. gcc.dg/strlenopt-32.c
needs adjustment because the strlen pass doesn't handle
_11 = {0, b_6(D)};
__builtin_memcpy (&a, "foo.bar", 8);
MEM <vector(2) char> [(char *)&a + 3B] = _11;
_9 = strlen (&a);
I have opened PR117057 for this.
* tree-vect-slp.cc (vect_build_slp_tree_1): Do not compare
RHS codes for loads or stores.
(vect_get_and_check_slp_defs): Only demote operand to external
in case there is more than one operand.
* gcc.dg/vect/slp-57.c: New testcase.
* gcc.dg/Wstringop-overflow-47.c: Adjust.
* gcc.dg/strlenopt-32.c: XFAIL parts.
|
|
According to the Intel SOM [1], for Crestmont most 256-bit Intel AVX2
instructions can be decomposed into two independent 128-bit
micro-operations, except for a subset of Intel AVX2 instructions known
as cross-lane operations, which can only compute the result for an
element by utilizing one or more sources belonging to other elements.
The 256-bit instructions listed below use more operand sources than
can be natively supported by a single reservation station within these
microarchitectures. They are decomposed into two μops, where the first
μop resolves a subset of operand dependencies across two cycles. The
dependent second μop executes the 256-bit operation by using a single
128-bit execution port for two consecutive cycles with a five-cycle
latency for a total latency of seven cycles.
VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
VPERMPD ymm1, ymm2/m256, imm8
VPERMPS ymm1, ymm2, ymm3/m256
VPERMD ymm1, ymm2, ymm3/m256
VPERMQ ymm1, ymm2/m256, imm8
Instead of setting the avx128_optimal tune for SRF, the patch adds a new
tune avx256_avoid_vec_perm for it. So by default the vectorizer still
uses a 256-bit VF if the cost is profitable, but lowers to 128-bit whenever
a 256-bit vec_perm is needed for auto-vectorization. Without vec_perm, the
performance of 256-bit vectorization should be similar to 128-bit (some
benchmark results show it's even better than 128-bit vectorization since
it enables more parallelism for convert cases).
[1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
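For reference, loop shapes that typically do and do not require a
cross-lane permute when auto-vectorized (illustration only; the actual code
generation depends on target and options):

#include <stddef.h>

/* Reversal needs a cross-lane vec_perm per vector.  */
void
rev (int *restrict d, const int *restrict s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    d[i] = s[n - 1 - i];
}

/* Element-wise addition stays in-lane, no permute needed.  */
void
add (int *restrict d, const int *restrict a, const int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    d[i] = a[i] + b[i];
}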
gcc/ChangeLog:
* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
Add new member m_num_avx256_vec_perm.
(ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm.
(ix86_vector_costs::finish_cost): Prevent vectorization for
TARGET_AVX256_AVOID_VEC_PERM when there's a 256-bit vec_perm
instruction.
* config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New
Macro.
* config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add
m_CORE_ATOM.
(X86_TUNE_AVX256_AVOID_VEC_PERM): New tune.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx256_avoid_vec_perm.c: New test.
|
|
For Crestmont, the 4-operand VEX blendv instructions come from the MSROM
and are slower than the 3-instruction sequence (op1 & mask) | (op2 & ~mask).
The legacy blendv instructions can still be handled by the decoder.
The patch adds a new tune which is enabled for all processors except
SRF/CWF. It will use vpand + vpandn + vpor instead of vpblendvb
(similar for vblendvps/vblendvpd) for SRF/CWF.
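For reference, the element-wise equivalent of the 3-instruction sequence
in scalar C (illustration only; the helper name is hypothetical):

#include <stdint.h>

/* Bitwise select: keep op1 bits where mask is set, op2 bits elsewhere,
   mirroring (op1 & mask) | (op2 & ~mask).  */
static inline uint8_t
bitwise_select (uint8_t op1, uint8_t op2, uint8_t mask)
{
  return (uint8_t) ((op1 & mask) | (op2 & ~mask));
}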
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard
instruction blendv generation under new tune.
* config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV):
New tune.
|
|
gcc/ChangeLog:
* config/i386/i386.md: Rewrite insn truncsfbf2.
gcc/testsuite/ChangeLog:
* gcc.target/i386/truncsfbf-1.c: New test.
* gcc.target/i386/truncsfbf-2.c: New test.
|
|
No functional change intended.
gcc/ChangeLog:
* diagnostic-format-text.cc
(diagnostic_text_output_format::after_diagnostic): Replace call to
show_any_path with body, taken from diagnostic.cc.
(diagnostic_text_output_format::build_prefix): Move here from
diagnostic.cc, updating to use get_diagnostic_kind_text and
diagnostic_get_color_for_kind.
(diagnostic_text_output_format::file_name_as_prefix): Move here
from diagnostic.cc
(diagnostic_text_output_format::append_note): Likewise.
* diagnostic-format-text.h
(diagnostic_text_output_format::show_any_path): Drop decl.
* diagnostic.cc
(diagnostic_text_output_format::file_name_as_prefix): Move to
diagnostic-format-text.cc.
(diagnostic_text_output_format::build_prefix): Likewise.
(diagnostic_text_output_format::show_any_path): Move to body of
diagnostic_text_output_format::after_diagnostic.
(diagnostic_text_output_format::append_note): Move to
diagnostic-format-text.cc.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|