riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2024-11-20	Work in progress for refactoring simd intrinsic (devel/existing-fp8)	Saurabh Jha	5	-797/+330

2024-11-14	aarch64: Add support for fp8fma instructions	Saurabh Jha	9	-4/+319
	The AArch64 FEAT_FP8FMA extension introduces instructions for multiply-add of vectors. This patch introduces the following instructions: 1. {vmlalbq\|vmlaltq}_f16_mf8_fpm. 2. {vmlalbq\|vmlaltq}_lane{q}_f16_mf8_fpm. 3. {vmlallbbq\|vmlallbtq\|vmlalltbq\|vmlallttq}_f32_mf8_fpm. 4. {vmlallbbq\|vmlallbtq\|vmlalltbq\|vmlallttq}_lane{q}_f32_mf8_fpm. It introduces the fp8fma flag. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (check_simd_lane_bounds): Add support for new unspecs. (aarch64_expand_pragma_builtins): Add support for new unspecs. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flags. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): New flags. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_FMA_FPM): Macro to declare fma intrinsics. (REQUIRED_EXTENSIONS): Define to declare functions behind command line flags. * config/aarch64/aarch64-simd.md: (@aarch64_<fpm_uns_op><VQ_HSF:mode><VQ_HSF:mode><V16QI_ONLY:mode><V16QI_ONLY:mode): Instruction pattern for fma intrinsics. (@aarch64_<fpm_uns_op><VQ_HSF:mode><VQ_HSF:mode><V16QI_ONLY:mode><VB:mode><SI_ONLY:mode): Instruction pattern for fma intrinsics with lane. * config/aarch64/aarch64.h (TARGET_FP8FMA): New flag for fp8fma instructions. * config/aarch64/iterators.md: New attributes and iterators. * doc/invoke.texi: New flag for fp8fma instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/fma_fpm.c: New test.
2024-11-14	aarch64: Add support for fp8dot2 and fp8dot4	Saurabh Jha	10	-15/+380
	The AArch64 FEAT_FP8DOT2 and FEAT_FP8DOT4 extensions introduce instructions for dot product of vectors. This patch introduces the following intrinsics: 1. vdot{q}_{fp16\|fp32}_mf8_fpm. 2. vdot{q}_lane{q}_{fp16\|fp32}_mf8_fpm. It introduces two flags: fp8dot2 and fp8dot4. We had to add space for another type in aarch64_pragma_builtins_data struct. The macros were updated to reflect that. We added a new aarch64_builtin_signature variant, quaternary, and added support for it in the functions aarch64_fntype and aarch64_expand_pragma_builtin. We added a new namespace, function_checker, to implement range checks for functions defined using the new pragma approach. The old intrinsic range checks will continue to work. All the new AdvSIMD intrinsics we define that need lane checks should be using the function in this namespace to implement the checks. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Change to handle extra type. (enum class): Added new variant. (struct aarch64_pragma_builtins_data): Add support for another type. (aarch64_get_number_of_args): Handle new signature. (require_integer_constant): New function to check whether the operand is an integer constant. (require_immediate_range): New function to validate index ranges. (check_simd_lane_bounds): New function to validate index operands. (aarch64_general_check_builtin_call): Call function_checker::check_simd_lane_bounds. (aarch64_expand_pragma_builtin): Handle new signature. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flags. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): New flags. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Change to handle extra type. (ENTRY_BINARY_FPM): Change to handle extra type. (ENTRY_UNARY_FPM): Change to handle extra type. (ENTRY_TERNARY_FPM_LANE): Macro to declare fpm ternary with lane intrinsics. (ENTRY_VDOT_FPM): Macro to declare vdot intrinsics. 
(REQUIRED_EXTENSIONS): Define to declare functions behind command line flags. * config/aarch64/aarch64-simd.md: (@aarch64_<fpm_uns_op><VHF:mode><VHF:mode><VB:mode><VB:mode>): Instruction pattern for vdot2 intrinsics. (@aarch64_<fpm_uns_op><VHF:mode><VHF:mode><VB:mode><VB2:mode><SI_ONLY:mode>): Instruction pattern for vdot2 intrinsics with lane. (@aarch64_<fpm_uns_op><VDQSF:mode><VDQSF:mode><VB:mode><VB:mode>): Instruction pattern for vdot4 intrinsics. (@aarch64_<fpm_uns_op><VDQSF:mode><VDQSF:mode><VB:mode><VB2:mode><SI_ONLY:mode>): Instruction pattern for vdot4 intrinsics with lane. * config/aarch64/aarch64.h (TARGET_FP8DOT2): New flag for fp8dot2 instructions. (TARGET_FP8DOT4): New flag for fp8dot4 instructions. * config/aarch64/iterators.md: New attributes and iterators. * doc/invoke.texi: New flag for fp8dot2 and fp8dot4 instructions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vdot2_fpmdot.c: New test. * gcc.target/aarch64/simd/vdot4_fpmdot.c: New test.
2024-11-14	aarch64: Add support for fp8 convert and scale	Saurabh Jha	8	-49/+587
	The AArch64 FEAT_FP8 extension introduces instructions for conversion and scaling. This patch introduces the following intrinsics: 1. vcvt{1\|2}_{bf16\|high_bf16\|low_bf16}_mf8_fpm. 2. vcvt{q}_mf8_f16_fpm. 3. vcvt_{high}_mf8_f32_fpm. 4. vscale{q}_{f16\|f32\|f64}. We introduced two aarch64_builtin_signatures enum variants, unary and ternary, and added support for these variants in the functions aarch64_fntype and aarch64_expand_pragma_builtin. We added new simd_types for integers (s32, s32q, and s64q) and for floating points (f8 and f8q). Because we added support for fp8 intrinsics here, we modified the check in acle/fp8.c that was checking that __ARM_FEATURE_FP8 macro is not defined. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Modified to support uses_fpmr flag. (enum class): New variants to support new signatures. (struct aarch64_pragma_builtins_data): Add a new boolean field, uses_fpmr. (aarch64_get_number_of_args): Helper function used in aarch64_fntype and aarch64_expand_pragma_builtin. (aarch64_fntype): Handle new signatures. (aarch64_expand_pragma_builtin): Handle new signatures. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flag for FP8. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Macro to declare binary intrinsics. (ENTRY_TERNARY): Macro to declare ternary intrinsics. (ENTRY_UNARY): Macro to declare unary intrinsics. (ENTRY_VHSDF): Macro to declare binary intrinsics. (ENTRY_VHSDF_VHSDI): Macro to declare binary intrinsics. (REQUIRED_EXTENSIONS): Define to declare functions behind command line flags. * config/aarch64/aarch64-simd.md (@aarch64_<fpm_unary_bf_uns_op><V8BF_ONLY:mode><VB:mode>): Unary pattern. (@aarch64_<fpm_unary_hf_uns_op><V8HF_ONLY:mode><VB:mode>): Unary pattern. (@aarch64_lower_<fpm_unary_bf_uns_op><V8BF_ONLY:mode><V16QI_ONLY:mode>): Unary pattern. (@aarch64_lower_<fpm_unary_hf_uns_op><V8HF_ONLY:mode><V16QI_ONLY:mode>): Unary pattern. 
(@aarch64<fpm_uns_op><VB:mode><VCVTFPM:mode><VH_SF:mode>): Binary pattern. (@aarch64_<fpm_uns_op><V16QI_ONLY:mode><V8QI_ONLY:mode><V4SF_ONLY:mode><V4SF_ONLY:mode>): Unary pattern. (@aarch64_<fpm_uns_op><VHSDF:mode><VHSDI:mode>): Binary pattern. * config/aarch64/iterators.md: New attributes and iterators. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature macro doesn't exist. * gcc.target/aarch64/simd/scale_fpm.c: New test. * gcc.target/aarch64/simd/vcvt_fpm.c: New test.
2024-11-14	aarch64: Refactor infrastructure for advsimd intrinsics	Vladimir Miloserdov	2	-19/+77
	This patch refactors the infrastructure for defining advsimd pragma intrinsics, adding support for more flexible type and signature handling in future SIMD extensions. A new simd_type structure is introduced, which allows for consistent mode and qualifier management across various advsimd operations. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (ENTRY): Modify to include modes and qualifiers for simd_type structure. (ENTRY_VHSDF): Move to aarch64-builtins.cc to decouple. (struct simd_type): New structure for managing mode and qualifier combinations for SIMD types. (struct aarch64_pragma_builtins_data): Replace mode with simd_type to support multiple argument types for intrinsics. (aarch64_fntype): Modify to handle different shapes type. (aarch64_expand_pragma_builtin): Modify to handle different shapes type. * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_BINARY): Move from aarch64-builtins.cc. (ENTRY_VHSDF): Move from aarch64-builtins.cc. (REQUIRED_EXTENSIONS): New macro.
2024-11-14	i386: Fix cstorebf4 fp comparison operand [PR117495]	Hongyu Wang	2	-11/+33
	cstorebf4 uses comparison_operator for the BFmode compare, which is incorrect when it calls ix86_expand_setcc directly, because ix86_expand_setcc does not canonicalize the input comparison by swapping operands to correct the compare code. The original code without AVX10.2 calls emit_store_flag_force, which in turn calls emit_store_flags_1 and recursively calls this expander again with swapped operands and flag. We can therefore avoid the redundant recursive call by changing comparison_operator to ix86_fp_comparison_operator and calling ix86_expand_setcc directly. gcc/ChangeLog: PR target/117495 * config/i386/i386.md (cstorebf4): Use ix86_fp_comparison_operator and call ix86_expand_setcc directly. gcc/testsuite/ChangeLog: PR target/117495 * gcc.target/i386/pr117495.c: New test.
2024-11-13	[PATCH] RISC-V: Bugfix for unrecognizable insn for XTheadVector	Jin Ma	2	-2/+16
	error: unrecognizable insn: (insn 35 34 36 2 (set (subreg:RVVM1SF (reg/v:RVVM1x4SF 142 [ _r ]) 0) (unspec:RVVM1SF [ (const_vector:RVVM1SF repeat [ (const_double:SF 0.0 [0x0.0p+0]) ]) (reg:DI 0 zero) (const_int 1 [0x1]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_TH_VWLDST)) -1 (nil)) during RTL pass: mode_sw PR target/116591 gcc/ChangeLog: * config/riscv/vector.md: Add restriction to call pred_th_whole_mov. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr116591.c: New test.
2024-11-13	aarch64: Relax add_overloaded_function assert	Richard Sandiford	4	-18/+28
	There are some SVE intrinsics that support one set of suffixes for one extension (E1, say) and another set of suffixes for another extension (E2, say). It is usually the case that, mutatis mutandis, E2 extends E1. Listing E1 first would then ensure that the manual C overload would also require E1, making it suitable for resolving both the E1 forms and, where appropriate, the E2 forms. However, there was one exception: the I8MM, F32MM, and F64MM extensions to SVE each added variants of svmmla, but there was no svmmla for SVE itself. This was handled by adding an SVE entry for svmmla that only defined the C overload; it had no variants of its own. This situation occurs more often with upcoming patches. Rather than keep adding these dummy entries, it seemed better to make the code automatically compute the lowest common denominator for all definitions that share the same C overload. gcc/ * config/aarch64/aarch64-protos.h (aarch64_required_extensions::common_denominator): New member function. * config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant entry for mmla. * config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove support for it. * config/aarch64/aarch64-sve-builtins.cc (function_builder::add_overloaded): Relax the assert for duplicate definitions and instead calculate the common denominator of all requirements.
2024-11-13	i386: Add -mveclibabi=aocl [PR56504]	Filip Kastl	7	-16/+418
	We currently support generating vectorized math calls to the AMD core math library (ACML) (-mveclibabi=acml). That library is end-of-life and its successor is the math library from AMD Optimizing CPU Libraries (AOCL). This patch adds support for AOCL (-mveclibabi=aocl). That significantly broadens the range of vectorized math functions optimized for AMD CPUs that GCC can generate calls to. See the edit to invoke.texi for a complete list of added functions. Compared to the list of functions in AOCL LibM docs I left out these vectorized function families: - sincos and all functions working with arrays ... Because these functions have pointer arguments and that would require a bigger rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever generates calls to these functions. - linearfrac ... Because these functions are specific to the AMD library. There's no equivalent glibc function nor GCC internal function nor GCC built-in. - powx, sqrt, fabs ... Because GCC doesn't vectorize these functions into calls and uses instructions instead. I also left out amd_vrd2_expm1() (the AMD docs list the function but I wasn't able to link calls to it with the current version of the library). gcc/ChangeLog: PR target/56504 * config/i386/i386-options.cc (ix86_option_override_internal): Add ix86_veclibabi_type_aocl case. * config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern ix86_veclibabi_aocl(). * config/i386/i386-opts.h (enum ix86_veclibabi): Add ix86_veclibabi_type_aocl into the ix86_veclibabi enum. * config/i386/i386.cc (ix86_veclibabi_aocl): New function. * config/i386/i386.opt: Add the 'aocl' type. * doc/invoke.texi: Document -mveclibabi=aocl. gcc/testsuite/ChangeLog: PR target/56504 * gcc.target/i386/vectorize-aocl1.c: New test. Signed-off-by: Filip Kastl <fkastl@suse.cz>
2024-11-13	hppa: Remove inner `fix:SF/DF` from fixed-point patterns	John David Anglin	1	-8/+8
	2024-11-13 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: PR target/117525 * config/pa/pa.md (fix_truncsfsi2): Remove inner `fix:SF`. (fix_truncdfsi2, fix_truncsfdi2, fix_truncdfdi2, fixuns_truncsfsi2, fixuns_truncdfsi2, fixuns_truncsfdi2, fixuns_truncdfdi2): Likewise.
2024-11-13	diagnostics: avoid using global_dc in path-printing	David Malcolm	6	-35/+53
	gcc/analyzer/ChangeLog: * checker-path.cc (checker_path::debug): Explicitly use global_dc's reference printer. * diagnostic-manager.cc (diagnostic_manager::prune_interproc_events): Likewise. (diagnostic_manager::prune_system_headers): Likewise. gcc/ChangeLog: * diagnostic-path.cc (diagnostic_event::get_desc): Add param "ref_pp" and use instead of global_dc. (class path_label): Likewise, adding field m_ref_pp. (event_range::event_range): Add param "ref_pp" and pass to m_path_label. (path_summary::path_summary): Add param "ref_pp" and pass to event_range ctor. (diagnostic_text_output_format::print_path): Pass pp to path_summary ctor. (selftest::test_empty_path): Pass event_pp to pass_summary ctor. (selftest::test_intraprocedural_path): Likewise. (selftest::test_interprocedural_path_1): Likewise. (selftest::test_interprocedural_path_2): Likewise. (selftest::test_recursion): Likewise. (selftest::test_control_flow_1): Likewise. (selftest::test_control_flow_2): Likewise. (selftest::test_control_flow_3): Likewise. (selftest::assert_cfg_edge_path_streq): Likewise. (selftest::test_control_flow_5): Likewise. (selftest::test_control_flow_6): Likewise. * diagnostic-path.h (diagnostic_event::get_desc): Add param "ref_pp". * lazy-diagnostic-path.cc (selftest::test_intraprocedural_path): Pass event_pp to get_desc. simple-diagnostic-path.cc (selftest::test_intraprocedural_path): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-13	Match: Fold pow calls to ldexp when possible [PR57492]	Soumya AR	3	-0/+101
	This patch transforms the following POW calls to equivalent LDEXP calls, as discussed in PR57492: powi (powof2, i) -> ldexp (1.0, i * log2 (powof2)); powof2 * ldexp (x, i) -> ldexp (x, i + log2 (powof2)); a * ldexp(1., i) -> ldexp (a, i). This is especially helpful for SVE architectures as LDEXP calls can be implemented using the FSCALE instruction, as seen in the following patch: https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076 SPEC2017 was run with this patch; while there are no noticeable improvements, there are no non-noise regressions either. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: PR target/57492 * match.pd: Added patterns to fold calls to pow to ldexp and optimize specific ldexp calls. gcc/testsuite/ChangeLog: PR target/57492 * gcc.dg/tree-ssa/ldexp.c: New test. * gcc.dg/tree-ssa/pow-to-ldexp.c: New test.
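The three folds listed in this commit can be checked numerically with a small sketch. This is not the match.pd pattern itself; the helper names (`powi_ref`, `fold_powi`, `fold_scaled_ldexp`, `fold_mul_ldexp`) are hypothetical, and the power-of-two base is passed as its exponent `k`, so that powof2 = 2**k and log2 (powof2) = k.

```c
#include <assert.h>
#include <math.h>

/* Reference value of base**i by repeated multiplication (i may be
   negative).  Exact for power-of-two bases while the result is in range.  */
double powi_ref(double base, int i)
{
    double r = 1.0;
    int n = i < 0 ? -i : i;
    while (n-- > 0)
        r *= base;
    return i < 0 ? 1.0 / r : r;
}

/* Fold 1: powi (2**k, i) -> ldexp (1.0, i * k).  */
double fold_powi(int k, int i)
{
    assert(k >= 0); /* the base is assumed to be a power of two >= 1 */
    return ldexp(1.0, i * k);
}

/* Fold 2: 2**k * ldexp (x, i) -> ldexp (x, i + k).  */
double fold_scaled_ldexp(double x, int k, int i)
{
    return ldexp(x, i + k);
}

/* Fold 3: a * ldexp (1.0, i) -> ldexp (a, i).  */
double fold_mul_ldexp(double a, int i)
{
    return ldexp(a, i);
}
```

For in-range values both sides of each fold should agree exactly, because scaling by a power of two only changes the exponent of a binary floating-point number.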
2024-11-13	RISC-V: Add Multi-Versioning Test Cases	Yangyu Chen	9	-0/+458
	This patch adds test cases for the Function Multi-Versioning (FMV) feature for RISC-V, reusing the existing aarch64 test cases and porting them to RISC-V. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/testsuite/ChangeLog: * g++.target/riscv/mv-symbols1.C: New test. * g++.target/riscv/mv-symbols2.C: New test. * g++.target/riscv/mv-symbols3.C: New test. * g++.target/riscv/mv-symbols4.C: New test. * g++.target/riscv/mv-symbols5.C: New test. * g++.target/riscv/mvc-symbols1.C: New test. * g++.target/riscv/mvc-symbols2.C: New test. * g++.target/riscv/mvc-symbols3.C: New test. * g++.target/riscv/mvc-symbols4.C: New test.
2024-11-13	RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER	Yangyu Chen	1	-0/+587
	This patch implements TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER for RISC-V. These hooks are used to generate the dispatcher function body and to get the dispatcher function for function multiversioning. This patch copies much of its code from commit 0cfde688e213 ("[aarch64] Add function multiversioning support") and modifies it to fit the RISC-V port. A key difference is that the feature bits in the RISC-V C-API are stored in an array of unsigned long long, while in AArch64 they are not an array. So we need to generate an array reference for each feature bits element in the dispatcher function. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv.cc (add_condition_to_bb): New function. (dispatch_function_versions): New function. (get_suffixed_assembler_name): New function. (make_resolver_func): New function. (riscv_generate_version_dispatcher_body): New function. (riscv_get_function_versions_dispatcher): New function. (TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it. (TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.
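Because the feature bits are an array, each version's requirement has to be tested word by word. The sketch below illustrates that idea in plain C rather than in the GIMPLE the dispatcher actually emits. The array length `FEATURE_WORDS`, the `struct version` layout, and both function names are hypothetical, and the versions are assumed to be sorted by descending priority with a catch-all default last.

```c
#include <assert.h>
#include <stddef.h>

#define FEATURE_WORDS 2 /* hypothetical length of the feature-bits array */

struct version {
    const char *name;
    unsigned long long required[FEATURE_WORDS]; /* bits this version needs */
};

/* Return 1 if every required bit of V is set in HAVE, checking one array
   element at a time.  */
int version_supported(const unsigned long long have[FEATURE_WORDS],
                      const struct version *v)
{
    for (size_t i = 0; i < FEATURE_WORDS; i++)
        if ((have[i] & v->required[i]) != v->required[i])
            return 0;
    return 1;
}

/* Pick the first supported version.  VERSIONS is assumed to be sorted by
   descending priority, with the default version (no required bits) last.  */
const struct version *dispatch(const unsigned long long have[FEATURE_WORDS],
                               const struct version *versions, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (version_supported(have, &versions[i]))
            return &versions[i];
    return &versions[n - 1];
}
```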
2024-11-13	RISC-V: Implement TARGET_MANGLE_DECL_ASSEMBLER_NAME	Yangyu Chen	1	-0/+39
	This patch implements the TARGET_MANGLE_DECL_ASSEMBLER_NAME for RISC-V. This is used to add function multiversioning suffixes to the assembler name. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_mangle_decl_assembler_name): New function. (TARGET_MANGLE_DECL_ASSEMBLER_NAME): Define.
2024-11-13	RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS	Yangyu Chen	1	-0/+127
	This patch implements TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS for RISC-V. TARGET_COMPARE_VERSION_PRIORITY compares the priority of two function versions based on the rules defined in the RISC-V C-API Doc PR #85: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85/files#diff-79a93ca266139524b8b642e582ac20999357542001f1f4666fbb62b6fb7a5824R721 If multiple versions have equal priority, we select the function with the largest number of feature bits generated by riscv_minimal_hwprobe_feature_bits. When the number of feature bits is also equal, we diff the two versions and select the one with the least significant bit set, since a feature that appears earlier in feature_bits might be more important to performance. TARGET_OPTION_FUNCTION_VERSIONS checks whether two function versions are the same. This implementation reuses the code in TARGET_COMPARE_VERSION_PRIORITY and checks whether it returns 0, which means equal priority. Co-Developed-by: Hank Chang <hank.chang@sifive.com> Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv.cc (parse_features_for_version): New function. (compare_fmv_features): New function. (riscv_compare_version_priority): New function. (riscv_common_function_versions): New function. (TARGET_COMPARE_VERSION_PRIORITY): Implement it. (TARGET_OPTION_FUNCTION_VERSIONS): Implement it.
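The tie-breaking order described in this commit can be sketched as a three-stage comparison. This is an illustration only, not the code in riscv.cc: `struct fmv_version`, `FMV_WORDS`, and the function names are hypothetical, and it assumes that a larger priority value wins and that lower array indices are compared first.

```c
#include <assert.h>
#include <stddef.h>

#define FMV_WORDS 2 /* hypothetical length of the feature-bits array */

struct fmv_version {
    int priority;                      /* assumed: larger value wins */
    unsigned long long bits[FMV_WORDS];
};

/* Count the set bits in X.  */
static int popcount64(unsigned long long x)
{
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Return > 0 if A is preferred, < 0 if B is preferred, 0 if equal.  */
int compare_versions(const struct fmv_version *a, const struct fmv_version *b)
{
    assert(a != NULL && b != NULL);

    /* 1. Explicit priority.  */
    if (a->priority != b->priority)
        return a->priority > b->priority ? 1 : -1;

    /* 2. Larger number of feature bits.  */
    int ca = 0, cb = 0;
    for (size_t i = 0; i < FMV_WORDS; i++) {
        ca += popcount64(a->bits[i]);
        cb += popcount64(b->bits[i]);
    }
    if (ca != cb)
        return ca > cb ? 1 : -1;

    /* 3. Diff the versions; whoever owns the least significant differing
       bit wins.  */
    for (size_t i = 0; i < FMV_WORDS; i++) {
        unsigned long long diff = a->bits[i] ^ b->bits[i];
        if (diff) {
            unsigned long long lowest = diff & (~diff + 1);
            return (a->bits[i] & lowest) ? 1 : -1;
        }
    }
    return 0;
}
```

A result of 0 corresponds to the "same version" case that the commit says TARGET_OPTION_FUNCTION_VERSIONS relies on.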
2024-11-13	RISC-V: Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P	Yangyu Chen	4	-13/+115
	This patch implements the TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P for RISC-V. This hook is used to process attribute ((target_version ("..."))). As it is the first patch which introduces the target_version attribute, we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version" for function versioning. Co-Developed-by: Hank Chang <hank.chang@sifive.com> Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_process_target_attr): Remove as it is not used. (riscv_option_valid_version_attribute_p): Declare. (riscv_process_target_version_attr): Declare. * config/riscv/riscv-target-attr.cc (riscv_target_attrs): Renamed from riscv_attributes. (riscv_target_version_attrs): New attributes for target_version. (riscv_process_one_target_attr): New arguments to select attrs. (riscv_process_target_attr): Likewise. (riscv_option_valid_attribute_p): Likewise. (riscv_process_target_version_attr): New function. (riscv_option_valid_version_attribute_p): New function. * config/riscv/riscv.cc (TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it. * config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define it to 0 to use "target_version" for function versioning.
2024-11-13	RISC-V: Implement riscv_minimal_hwprobe_feature_bits	Yangyu Chen	4	-0/+226
	This patch implements the riscv_minimal_hwprobe_feature_bits feature for the RISC-V target. The feature bits are defined in libgcc/config/riscv/feature_bits.c to provide bitmasks of the ISA extensions that are defined in the RISC-V C-API. Thus, we need a function to generate the feature bits for the IFUNC resolver to dispatch between different functions based on the hardware features. The minimal feature bits use the earliest extension that appeared in Linux hwprobe to cover the given ISA string. This allows older kernels, which cannot probe some implied extensions, to run the FMV dispatcher correctly. For example, V implies Zve32x, but Zve32x has only been available in Linux hwprobe since v6.11. If we used the ISA string directly to generate the FMV dispatcher for functions with the "arch=+v" extension, then because V implies Zve32x, the FMV dispatcher would check whether the Zve32x extension is supported by the host. If the Linux kernel is older than v6.11, the FMV dispatcher would fail to detect the Zve32x extension even though it is already implied by the V extension, and would therefore fail to dispatch to the correct function. Thus, we need to generate the minimal feature bits that cover the given ISA string so that the FMV dispatcher works correctly on older kernels. Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * common/config/riscv/riscv-common.cc (RISCV_EXT_BITMASK): New macro. (struct riscv_ext_bitmask_table_t): New struct. (riscv_minimal_hwprobe_feature_bits): New function. * common/config/riscv/riscv-ext-bitmask.def: New file. * config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include riscv-feature-bits.h. (riscv_minimal_hwprobe_feature_bits): Declare the function. * config/riscv/riscv-feature-bits.h: New file.
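The V and Zve32x example can be reduced to a small sketch: starting from the fully expanded extension set, drop every extension that is already implied by another extension in the set. This shows only the idea, not the algorithm in riscv-common.cc. The bit assignments, the one-entry implication table, and the name `minimal_feature_bits` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit assignments, chosen only for this example.  */
enum {
    EXT_V      = 1u << 0,
    EXT_ZVE32X = 1u << 1,
    EXT_ZBB    = 1u << 2
};

struct implication {
    unsigned ext;     /* when this extension is requested ...          */
    unsigned implied; /* ... these bits are covered by it and dropped  */
};

/* Hypothetical table; the commit only states that V implies Zve32x.  */
static const struct implication implications[] = {
    { EXT_V, EXT_ZVE32X },
};

/* Reduce EXPANDED (an ISA string with implied extensions added) to the
   bits the dispatcher really has to probe.  */
unsigned minimal_feature_bits(unsigned expanded)
{
    unsigned result = expanded;
    size_t n = sizeof implications / sizeof implications[0];

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (expanded & implications[i].ext)
            result &= ~implications[i].implied;
    return result;
}
```

With this reduction a dispatcher built for "arch=+v" would probe only V, so a kernel that cannot yet report Zve32x would still select the vector version.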
2024-11-13	RISC-V: Implement Priority syntax parser for Function Multi-Versioning	Yangyu Chen	2	-0/+27
	This patch adds the priority syntax parser to support the Function Multi-Versioning (FMV) feature in RISC-V. This feature allows users to specify the priority of the function version in the attribute syntax. Changes are based on the RISC-V C-API PR: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::handle_priority): New function. (riscv_target_attr_parser::update_settings): Update priority attribute. * config/riscv/riscv.opt: Add TargetVariable riscv_fmv_priority.
2024-11-13	Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V	Yangyu Chen	7	-15/+48
	Some architectures may use ',' in the attribute string, but not as the separator between different targets. To avoid the conflict, we introduce a new macro TARGET_CLONES_ATTR_SEPARATOR to separate different clones. As an example, according to the RISC-V C-API Specification [1], RISC-V allows ',' in the attribute string in the "arch=" option to specify one or more ISA extensions in the same target function, which conflicts with the default separator used to separate different clones. This patch introduces TARGET_CLONES_ATTR_SEPARATOR for RISC-V and chooses '#' as the separator, since '#' is not allowed in the target_clones option string. [1] https://github.com/riscv-non-isa/riscv-c-api-doc/blob/c6c5d6d9cf96b342293315a5dff3d25e96ef8191/src/c-api.adoc#__attribute__targetattr-string Signed-off-by: Yangyu Chen <cyy@cyyself.name> gcc/ChangeLog: * defaults.h (TARGET_CLONES_ATTR_SEPARATOR): Define new macro. * multiple_target.cc (get_attr_str): Use TARGET_CLONES_ATTR_SEPARATOR to separate attributes. (separate_attrs): Likewise. (expand_target_clones): Likewise. * attribs.cc (attr_strcmp): Likewise. (sorted_attr_string): Likewise. * tree.cc (get_target_clone_attr_len): Likewise. * config/riscv/riscv.h (TARGET_CLONES_ATTR_SEPARATOR): Define TARGET_CLONES_ATTR_SEPARATOR for RISC-V. * doc/tm.texi: Document TARGET_CLONES_ATTR_SEPARATOR. * doc/tm.texi.in: Likewise.
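The conflict between the two separators is easy to see on a concrete string. The sketch below splits an attribute string on a configurable separator. It is an illustration only, not the code in multiple_target.cc. The helper names `count_clones` and `get_clone` are hypothetical, and the sample string used in the note below is an assumed example of the joined form handled internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the clone strings in ATTR when SEP separates them.  */
size_t count_clones(const char *attr, char sep)
{
    assert(attr != NULL && sep != '\0');
    size_t n = 1;
    for (const char *p = attr; *p; p++)
        if (*p == sep)
            n++;
    return n;
}

/* Copy the IDX-th clone string of ATTR into OUT.  Return 1 on success,
   0 if there is no such clone or OUT is too small.  */
int get_clone(const char *attr, char sep, size_t idx, char *out, size_t outsz)
{
    assert(attr != NULL && out != NULL && sep != '\0');
    const char *start = attr;
    for (;;) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (idx == 0) {
            if (len + 1 > outsz)
                return 0;
            memcpy(out, start, len);
            out[len] = '\0';
            return 1;
        }
        if (!end)
            return 0; /* ran out of clones before reaching IDX */
        start = end + 1;
        idx--;
    }
}
```

Splitting "arch=+zba,+zbb,+v#default" on '#' yields the two intended clones, while splitting on ',' would cut the "arch=" list into three fragments.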
2024-11-13	Fortran: Fix failing character pointer fcn assignment [PR105054]	Paul Thomas	2	-0/+100
	2024-11-14 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/105054 * resolve.cc (get_temp_from_expr): If the pointer function has a deferred character length, generate a new deferred charlen for the temporary. gcc/testsuite/ PR fortran/105054 * gfortran.dg/ptr_func_assign_6.f08: New test.
2024-11-13	c: add Wzero-as-null-pointer-constant [PR117059]	Martin Uecker	4	-4/+148
	Add warnings for the use of zero as a null pointer constant to the C FE. PR c/117059 gcc/c-family/ChangeLog: * c.opt (Wzero-as-null-pointer-constant): Enable for C and ObjC. gcc/c/ChangeLog: * c-typeck.cc (parse_build_binary_op): Add warning. (build_conditional_expr): Add warning. (convert_for_assignment): Add warning. gcc/ChangeLog: * doc/invoke.texi (Wzero-as-null-pointer-constant): Adapt description. gcc/testsuite/ChangeLog: * gcc.dg/Wzero-as-null-pointer-constant.c: New test. Suggested-by: Alejandro Colomar <alx@kernel.org> Acked-by: Alejandro Colomar <alx@kernel.org> Reviewed-by: Joseph Myers <josmyers@redhat.com>
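As a rough illustration of the three places named in the ChangeLog (binary operators, conditional expressions, and assignment-like conversions), the sketch below uses a literal 0 as a null pointer constant in each context. The function names are hypothetical, and the assumption that each marked line would be diagnosed under -Wzero-as-null-pointer-constant is inferred from the ChangeLog rather than from compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Comparison against a literal 0: assumed to be diagnosed.  */
int is_null_zero(const int *p)
{
    return p == 0;
}

/* Literal 0 as one arm of a conditional: assumed to be diagnosed.  */
const int *pick_zero(int cond, const int *p)
{
    return cond ? p : 0;
}

/* Initializing a pointer from a literal 0: assumed to be diagnosed.  */
const int *assign_zero(void)
{
    const int *p = 0;
    return p;
}

/* Spelling the null pointer as NULL is the form the warning steers
   towards (nullptr would also work in C23).  */
int is_null(const int *p)
{
    assert(sizeof p == sizeof(void *)); /* sanity check, always true here */
    return p == NULL;
}
```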
2024-11-13	c: Handle C23 floating constant {d,D}{32,64,128} suffixes like {df,dd,dl}	Jakub Jelinek	5	-1/+84
	C23 roughly says that {d,D}{32,64,128} floating point constant suffixes are alternate spellings of {df,dd,dl} suffixes in annex H. So, the following patch allows that alternate spelling. Or is it intentional it isn't enabled and we need to do everything in there first before trying to define __STDC_IEC_60559_DFP__? Like add support for _Decimal32x and _Decimal64x types (including the d32x and d64x suffixes) etc. 2024-11-13 Jakub Jelinek <jakub@redhat.com> libcpp/ * expr.cc (interpret_float_suffix): Handle d32 and D32 suffixes for C like df, d64 and D64 like dd and d128 and D128 like dl. gcc/c-family/ * c-lex.cc (interpret_float): Subtract 3 or 4 from copylen rather than 2 if last character of CPP_N_DFLOAT is a digit. gcc/testsuite/ * gcc.dg/dfp/c11-constants-3.c: New test. * gcc.dg/dfp/c11-constants-4.c: New test. * gcc.dg/dfp/c23-constants-3.c: New test. * gcc.dg/dfp/c23-constants-4.c: New test.
2024-11-13	c: Implement C2Y N3298 - Introduce complex literals [PR117029]	Jakub Jelinek	23	-1/+538
	The following patch implements the C2Y N3298 paper Introduce complex literals by providing different (or no) diagnostics on imaginary constants (except for integer ones). For _DecimalN constants we don't support _Complex _DecimalN and error on any i/j suffixes mixed with DD/DL/DF, so nothing changed there. 2024-11-13 Jakub Jelinek <jakub@redhat.com> PR c/117029 libcpp/ * include/cpplib.h (struct cpp_options): Add imaginary_constants member. * init.cc (struct lang_flags): Add imaginary_constants bitfield. (lang_defaults): Add column for imaginary_constants. (cpp_set_lang): Copy over imaginary_constants. * expr.cc (cpp_classify_number): Diagnose CPP_N_IMAGINARY non-CPP_N_FLOATING constants differently for C. gcc/testsuite/ * gcc.dg/cpp/pr7263-3.c: Adjust expected diagnostic wording. * gcc.dg/c23-imaginary-constants-1.c: New test. * gcc.dg/c23-imaginary-constants-2.c: New test. * gcc.dg/c23-imaginary-constants-3.c: New test. * gcc.dg/c23-imaginary-constants-4.c: New test. * gcc.dg/c23-imaginary-constants-5.c: New test. * gcc.dg/c23-imaginary-constants-6.c: New test. * gcc.dg/c23-imaginary-constants-7.c: New test. * gcc.dg/c23-imaginary-constants-8.c: New test. * gcc.dg/c23-imaginary-constants-9.c: New test. * gcc.dg/c23-imaginary-constants-10.c: New test. * gcc.dg/c2y-imaginary-constants-1.c: New test. * gcc.dg/c2y-imaginary-constants-2.c: New test. * gcc.dg/c2y-imaginary-constants-3.c: New test. * gcc.dg/c2y-imaginary-constants-4.c: New test. * gcc.dg/c2y-imaginary-constants-5.c: New test. * gcc.dg/c2y-imaginary-constants-6.c: New test. * gcc.dg/c2y-imaginary-constants-7.c: New test. * gcc.dg/c2y-imaginary-constants-8.c: New test. * gcc.dg/c2y-imaginary-constants-9.c: New test. * gcc.dg/c2y-imaginary-constants-10.c: New test. * gcc.dg/c2y-imaginary-constants-11.c: New test. * gcc.dg/c2y-imaginary-constants-12.c: New test.
2024-11-13	aarch64: Optimise calls to ldexp with SVE FSCALE instruction [PR111733]	Soumya AR	4	-7/+72
	This patch uses the FSCALE instruction provided by SVE to implement the standard ldexp family of functions. Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the following code: float test_ldexpf (float x, int i) { return __builtin_ldexpf (x, i); } double test_ldexp (double x, int i) { return __builtin_ldexp(x, i); } GCC Output: test_ldexpf: b ldexpf test_ldexp: b ldexp Since SVE has support for an FSCALE instruction, we can use this to process scalar floats by moving them to a vector register and performing an fscale call, similar to how LLVM tackles an ldexp builtin as well. New Output: test_ldexpf: fmov s31, w0 ptrue p7.b, vl4 fscale z0.s, p7/m, z0.s, z31.s ret test_ldexp: sxtw x0, w0 ptrue p7.b, vl8 fmov d31, x0 fscale z0.d, p7/m, z0.d, z31.d ret This is a revision of an earlier patch, and now uses the extended definition of aarch64_ptrue_reg to generate predicate registers with the appropriate set bits. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: PR target/111733 * config/aarch64/aarch64-sve.md (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar floating modes and expand to the existing pattern for FSCALE. * config/aarch64/iterators.md: (SVE_FULL_F_SCALAR): Added an iterator to match all FP SVE modes as well as their scalar equivalents. (VPRED): Extended the attribute to handle GPF_HF modes. * internal-fn.def (LDEXP): Changed macro to incorporate ldexpf16. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/fscale.c: New test.
2024-11-13	RISC-V: Bugfix for max_sew_overlap_and_next_ratio_valid_for_prev_sew_p[pr117483]	xuli	2	-2/+29
	This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117483 If prev and next satisfy the following rules, we should forbid the case (next.get_sew() < prev.get_sew() && (!next.get_ta() \|\| !next.get_ma())) in the compatible function max_sew_overlap_and_next_ratio_valid_for_prev_sew_p. Otherwise, the tail elements of next will be polluted. DEF_SEW_LMUL_RULE (ge_sew, ratio_and_ge_sew, ratio_and_ge_sew, max_sew_overlap_and_next_ratio_valid_for_prev_sew_p, always_false, use_max_sew_and_lmul_with_next_ratio) Passed the rv64gcv full regression test. Signed-off-by: Li Xu <xuli1@eswincomputing.com> PR target/117483 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Fix bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117483.c: New test.
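The forbidden case quoted in this commit is a plain predicate over the SEW and the tail/mask policy bits. The sketch below models only that one condition; the remaining checks in max_sew_overlap_and_next_ratio_valid_for_prev_sew_p are not reproduced. `struct vsetvl_state` and `next_pollutes_tail` are hypothetical names standing in for the real vsetvl_info accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields read through get_sew (),
   get_ta () and get_ma ().  */
struct vsetvl_state {
    unsigned sew; /* selected element width in bits */
    bool ta;      /* tail agnostic */
    bool ma;      /* mask agnostic */
};

/* Return true for the case the fix forbids: NEXT uses a narrower SEW than
   PREV while being tail-undisturbed or mask-undisturbed.  */
bool next_pollutes_tail(const struct vsetvl_state *prev,
                        const struct vsetvl_state *next)
{
    assert(prev != NULL && next != NULL);
    return next->sew < prev->sew && (!next->ta || !next->ma);
}
```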
2024-11-12	[RISC-V] Fix costing of LO_SUM expressions	Xianmiao Qu	1	-1/+2
	This is a rewrite of a patch originally from Xianmiao Qu. Xianmiao noticed that the costs we compute for LO_SUM expressions were incorrect. Essentially we costed based solely on the first input to the LO_SUM. In a LO_SUM, the first input is almost always going to be a REG and thus isn't interesting. The second argument is almost always going to be some kind of symbolic operand, which is much more interesting from a costing standpoint. The right way to fix this is to sum the cost of the two operands. I've verified this produces the same code as Xianmiao Qu's original patch. This has been tested on rv32 and rv64 in my tester. It missed today's bootstrap of riscv64 though :( Naturally I'll wait on the pre-commit CI tester to render a verdict, but I don't expect any problems. -- From Xianmiao Qu's original submission -- Currently, the cost of the LO_SUM expression is based on the cost of calculating the first subexpression. When the first subexpression is a register, the cost result will be zero. It seems a bit unreasonable for a SET expression to have a zero cost when its source is LO_SUM. Moreover, having a cost of zero for the expression will lead the loop invariant pass to calculate its benefits of being moved outside the loop as zero, thus preventing the out-of-loop placement of the loop invariant. 
As an example, consider the following test case: long a; long b[]; long c; foo () { for (;;) c = b[a]; } When compiling with -march=rv64gc -mabi=lp64d -Os, the following code is generated: .cfi_startproc lui a5,%hi(c) ld a4,%lo(c)(a5) lui a2,%hi(b) lui a1,%hi(a) .L2: ld a5,%lo(a)(a1) addi a3,a2,%lo(b) slli a5,a5,3 add a5,a5,a3 ld a5,0(a5) sd a5,0(a4) j .L2 After adjusting the cost of the LO_SUM expression, the instruction addi will be moved outside the loop: .cfi_startproc lui a5,%hi(c) ld a3,%lo(c)(a5) lui a4,%hi(b) lui a2,%hi(a) addi a4,a4,%lo(b) .L2: ld a5,%lo(a)(a2) slli a5,a5,3 add a5,a5,a4 ld a5,0(a5) sd a5,0(a3) j .L2 gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Correct costing of LO_SUM expressions. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
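The costing change described in this message can be illustrated with a toy model. Everything here is an assumption made for the sketch: the operand kinds, the cost units (0 for a register, 1 for a symbolic operand) and all `demo_` names are invented, and the real riscv_rtx_costs works on RTL with GCC's own cost scale. The sketch only shows why costing the first input alone yields zero for a (REG, symbol) LO_SUM, while summing both inputs does not.

```c
#include <assert.h>

/* Hypothetical operand kinds, for illustration only.  */
enum demo_operand { DEMO_REG, DEMO_SYMBOL };

/* Assumed toy costs: a register is free, a symbolic operand costs 1.  */
static int
demo_operand_cost (enum demo_operand op)
{
  return op == DEMO_REG ? 0 : 1;
}

/* Old behaviour as described: only the first input is costed.  */
static int
demo_lo_sum_cost_old (enum demo_operand op0, enum demo_operand op1)
{
  (void) op1; /* ignored, which is the problem being fixed */
  return demo_operand_cost (op0);
}

/* New behaviour as described: sum the cost of both inputs.  */
static int
demo_lo_sum_cost_new (enum demo_operand op0, enum demo_operand op1)
{
  return demo_operand_cost (op0) + demo_operand_cost (op1);
}

/* The typical LO_SUM shape from the message: REG first, symbol second.  */
static void
demo_lo_sum_check (void)
{
  assert (demo_lo_sum_cost_old (DEMO_REG, DEMO_SYMBOL) == 0);
  assert (demo_lo_sum_cost_new (DEMO_REG, DEMO_SYMBOL) == 1);
}
```

With a non-zero cost, a pass that weighs the benefit of hoisting an expression has something to gain from moving it, which matches the addi hoisting shown in the assembly above.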
2024-11-12	Reapply "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]"	Jeff Law	2	-0/+36
	This reverts commit de3b277247ce98d189f121155b75f490725a42f6.
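The subject line names a zero_extend(not) -> xor transformation. A minimal sketch of the arithmetic identity behind that idea, assuming an 8-bit unsigned source value (the patch itself operates on RTL and is not reproduced here; the `demo_` names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise NOT in the narrow type, then zero-extend to 64 bits.  */
static uint64_t
demo_zext_not (uint8_t x)
{
  return (uint64_t) (uint8_t) ~x;
}

/* Zero-extend first, then XOR with a mask covering the narrow type.  */
static uint64_t
demo_xor_mask (uint8_t x)
{
  return (uint64_t) x ^ 0xffu;
}

/* Exhaustively confirm both forms agree for every 8-bit input.  */
static void
demo_zext_not_check (void)
{
  for (unsigned v = 0; v <= 0xff; v++)
    assert (demo_zext_not ((uint8_t) v) == demo_xor_mask ((uint8_t) v));
}
```

The XOR form needs a single operation on the already-extended value, which is presumably why it is the preferred result.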
2024-11-13	i386: Zero extend 32-bit address to 64-bit with option -mx32 -maddress-mode=long. [PR 117418]	Hu, Lin1	2	-0/+36
	-maddress-mode=long makes Pmode = DI_mode, so zero-extend the 32-bit address to 64 bits and use a 64-bit register as the pointer to avoid an ICE. gcc/ChangeLog: PR target/117418 * config/i386/i386-expand.cc (ix86_expand_builtin): Convert pointer's mode according to Pmode. gcc/testsuite/ChangeLog: PR target/117418 * gcc.target/i386/pr117418-1.c: New test.
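As a reminder of what zero-extending a 32-bit address to 64 bits means at the value level, here is a small sketch contrasting it with sign extension. The `demo_` helpers are hypothetical and only model the arithmetic; they say nothing about how ix86_expand_builtin performs the mode conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extension: the upper 32 bits of the result are always zero.  */
static uint64_t
demo_zero_extend32 (uint32_t addr)
{
  return (uint64_t) addr;
}

/* Sign extension, written with unsigned arithmetic so that it stays
   well defined: bit 31 is copied into the upper 32 bits.  */
static uint64_t
demo_sign_extend32 (uint32_t addr)
{
  return ((uint64_t) addr ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* The two extensions differ exactly when bit 31 of the address is set.  */
static void
demo_extend_check (void)
{
  assert (demo_zero_extend32 (0x80000000u) == UINT64_C (0x0000000080000000));
  assert (demo_sign_extend32 (0x80000000u) == UINT64_C (0xffffffff80000000));
  assert (demo_zero_extend32 (0x1000u) == demo_sign_extend32 (0x1000u));
}
```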
2024-11-13	Daily bump.	GCC Administrator	5	-1/+1075

2024-11-12	Revert "[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]"	Jeff Law	2	-36/+0
	This reverts commit 69bd93c167fefbdff0cb88614275358b7a2b2941.
2024-11-12	RISC-V: Fix target-attr-norelax.c testcase	Yangyu Chen	1	-3/+4
	The target-attr-norelax.c testcase was failing due to the redundant "\t" check in the assembly output, and it also did not skip the check for LTO builds. gcc/testsuite/ChangeLog: * gcc.target/riscv/target-attr-norelax.c: Fix testcase.
2024-11-13	Revert "Match: Simplify branch form 3 of unsigned SAT_ADD into branchless"	Pan Li	5	-67/+4
	This reverts commit df4af89bc3eabbeaccb16539aa1082cb9863e187.
2024-11-12	selftests: clear GCC_COLORS [PR117503]	David Malcolm	1	-1/+1
	gcc/ChangeLog: PR bootstrap/117503 * Makefile.in (GCC_FOR_SELFTESTS): Set GCC_COLORS=. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-11-12	hppa: Fix decrement_and_branch_until_zero constraint	John David Anglin	1	-1/+1
	The third alternative for argument 4 needs to be an early clobber constraint. Noticed testing LRA. 2024-11-12 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.md (decrement_and_branch_until_zero): Fix constraint.
2024-11-12	RISC-V: testsuite: Remove deprecated compatibility headers	Edwin Lu	17	-17/+0
	Since r15-4981-g5c34f02ba7e these tests have been failing on vector targets with excess errors due to the new deprecation warning message. Remove the <cstdalign> header. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/base/bug-10.C: Remove cstdalign header. * g++.target/riscv/rvv/base/bug-11.C: Ditto. * g++.target/riscv/rvv/base/bug-12.C: Ditto. * g++.target/riscv/rvv/base/bug-13.C: Ditto. * g++.target/riscv/rvv/base/bug-14.C: Ditto. * g++.target/riscv/rvv/base/bug-15.C: Ditto. * g++.target/riscv/rvv/base/bug-16.C: Ditto. * g++.target/riscv/rvv/base/bug-17.C: Ditto. * g++.target/riscv/rvv/base/bug-2.C: Ditto. * g++.target/riscv/rvv/base/bug-23.C: Ditto. * g++.target/riscv/rvv/base/bug-3.C: Ditto. * g++.target/riscv/rvv/base/bug-4.C: Ditto. * g++.target/riscv/rvv/base/bug-5.C: Ditto. * g++.target/riscv/rvv/base/bug-6.C: Ditto. * g++.target/riscv/rvv/base/bug-7.C: Ditto. * g++.target/riscv/rvv/base/bug-8.C: Ditto. * g++.target/riscv/rvv/base/bug-9.C: Ditto. Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
2024-11-12	Verify that empty std::vector is optimized away	Jan Hubicka	1	-0/+60
	With __builtin_operator_new we now can optimize away unused std::vectors. This adds testcases mentioned in the PR. PR tree-optimization/96945 gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/pr96945.C: New test.
2024-11-12	testsuite: Adjust jump threading test expectation	Andrew Carlotti	1	-1/+1
	This test started failing on aarch64 after 0cfc9c95 in 2023 ("Phi analyzer - Initialize with range instead of a tree."). The only change visible in the pass dumps prior to thread2 is the upper bounds of some ranges are reduced from +INF to 7, consistent with the bitmask information. After thread2, there are changes in the control flow, but only affecting edges that are obviously never taken (from basic blocks 6 through 12). These are cleaned up in the following pass, but the final codegen remains different. There isn't anything obviously wrong with the change in dump output, so let's just update the test expectations (as has happened previously here). gcc/testsuite/ChangeLog: PR tree-optimization/112376 * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Update expectation.
2024-11-12	AArch64: Remove duplicated addr_cost tables	Wilco Dijkstra	7	-115/+7
	Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores. No changes in generated code. gcc/ChangeLog: * config/aarch64/tuning_models/cortexx925.h (cortexx925_addrcost_table): Remove. * config/aarch64/tuning_models/neoversen1.h: Use generic_armv8_a_addrcost_table. * config/aarch64/tuning_models/neoversen2.h (neoversen2_addrcost_table): Remove. * config/aarch64/tuning_models/neoversen3.h (neoversen3_addrcost_table): Remove. * config/aarch64/tuning_models/neoversev2.h (neoversev2_addrcost_table): Remove. * config/aarch64/tuning_models/neoversev3.h (neoversev3_addrcost_table): Remove. * config/aarch64/tuning_models/neoversev3ae.h (neoversev3ae_addrcost_table): Remove.
2024-11-12	AArch64: Cleanup fusion defines	Wilco Dijkstra	29	-57/+46
	Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a common base level of fusion supported by almost all cores. Add AARCH64_FUSE_MOVK as a shortcut for all MOVK fusion. In most cases there is no change. It enables AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measurable effect if a core doesn't support it. Also it may have been accidentally left out on some cores that support all other types of branch fusion. gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_BASE): New define. (AARCH64_FUSE_MOVK): Likewise. * config/aarch64/tuning_models/a64fx.h: Update. * config/aarch64/tuning_models/ampere1.h: Likewise. * config/aarch64/tuning_models/ampere1a.h: Likewise. * config/aarch64/tuning_models/ampere1b.h: Likewise. * config/aarch64/tuning_models/cortexa35.h: Likewise. * config/aarch64/tuning_models/cortexa53.h: Likewise. * config/aarch64/tuning_models/cortexa57.h: Likewise. * config/aarch64/tuning_models/cortexa72.h: Likewise. * config/aarch64/tuning_models/cortexa73.h: Likewise. * config/aarch64/tuning_models/cortexx925.h: Likewise. * config/aarch64/tuning_models/exynosm1.h: Likewise. * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise. * config/aarch64/tuning_models/generic.h: Likewise. * config/aarch64/tuning_models/generic_armv8_a.h: Likewise. * config/aarch64/tuning_models/generic_armv9_a.h: Likewise. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen1.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise. * config/aarch64/tuning_models/qdf24xx.h: Likewise. * config/aarch64/tuning_models/saphira.h: Likewise. * config/aarch64/tuning_models/thunderx2t99.h: Likewise. 
* config/aarch64/tuning_models/thunderx3t110.h: Likewise. * config/aarch64/tuning_models/tsv110.h: Likewise.
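The idea of a common base mask plus a shortcut for a group of related flags can be sketched with plain bit flags. The flag names, bit positions and the composition of the two combined masks below are assumptions made for illustration (hence the `DEMO_` prefix); the actual definitions live in aarch64-fusion-pairs.def and may differ.

```c
#include <assert.h>

/* Hypothetical individual fusion flags, one bit each.  */
#define DEMO_FUSE_MOV_MOVK   (1u << 0)
#define DEMO_FUSE_MOVK_MOVK  (1u << 1)
#define DEMO_FUSE_CMP_BRANCH (1u << 2)
#define DEMO_FUSE_AES_AESMC  (1u << 3)

/* Assumed composition: a base level shared by almost all cores, and a
   shortcut covering every MOVK-related fusion.  */
#define DEMO_FUSE_BASE (DEMO_FUSE_CMP_BRANCH | DEMO_FUSE_AES_AESMC)
#define DEMO_FUSE_MOVK (DEMO_FUSE_MOV_MOVK | DEMO_FUSE_MOVK_MOVK)

/* A hypothetical core's tuning entry now reads as base plus extras
   instead of spelling out every flag.  */
static const unsigned demo_core_fusible_ops = DEMO_FUSE_BASE | DEMO_FUSE_MOVK;

/* Non-zero when the given fusion kind is enabled for the core.  */
static int
demo_fusion_enabled_p (unsigned core_ops, unsigned flag)
{
  return (core_ops & flag) != 0;
}

static void
demo_fusion_check (void)
{
  assert (demo_fusion_enabled_p (demo_core_fusible_ops, DEMO_FUSE_CMP_BRANCH));
  assert (demo_fusion_enabled_p (demo_core_fusible_ops, DEMO_FUSE_MOVK_MOVK));
  assert (!demo_fusion_enabled_p (DEMO_FUSE_BASE, DEMO_FUSE_MOV_MOVK));
}
```

Defining the base once means that a flag placed in it, such as compare-and-branch fusion in this sketch, is picked up by every core that uses the base, which is consistent with the message's note that a few older cores gain AARCH64_FUSE_CMP_BRANCH.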
2024-11-12	RISC-V: Fix incorrect test macro for signed scalar SAT_ADD form 2 run test	Pan Li	5	-8/+10
	This patch fixes an incorrect test macro used in the run tests for form 2 of signed scalar SAT_ADD. They should use _FMT_2 instead of _FMT_1. The following test passed with this patch: * The rv64gcv full regression test. This is a test-only patch and fairly obvious; I will commit it directly if there are no comments in the next 48 hours. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macro. * gcc.target/riscv/sat_s_add-run-5.c: Take form 2 for run test. * gcc.target/riscv/sat_s_add-run-6.c: Ditto. * gcc.target/riscv/sat_s_add-run-7.c: Ditto. * gcc.target/riscv/sat_s_add-run-8.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
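To see why the choice between a _FMT_1 and a _FMT_2 helper matters, consider a sketch in which two differently written signed saturating adds are selected by token pasting. All names here are hypothetical, and the two bodies are just two plausible ways to write the operation; they are not claimed to be the forms defined in sat_arith.h.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1 (assumed): widen to int, then clamp to the int8_t range.  */
static int8_t
demo_sat_s_add_int8_fmt_1 (int8_t x, int8_t y)
{
  int sum = x + y;
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* Variant 2 (assumed): detect overflow from the sign bits.  Overflow
   happens only when both inputs share a sign and the wrapped sum does
   not.  */
static int8_t
demo_sat_s_add_int8_fmt_2 (int8_t x, int8_t y)
{
  uint8_t ux = (uint8_t) x, uy = (uint8_t) y;
  uint8_t usum = (uint8_t) (ux + uy);
  if (((ux ^ uy) & 0x80) == 0 && ((ux ^ usum) & 0x80) != 0)
    return x < 0 ? INT8_MIN : INT8_MAX;
  return (int8_t) (x + y); /* in range here, so the cast is exact */
}

/* Hypothetical run helpers: the suffix picks which variant is called.
   Using the _FMT_1 helper in a form 2 test would still pass, but it
   would never exercise the form 2 code.  */
#define DEMO_RUN_SAT_S_ADD_FMT_1(T, x, y) demo_sat_s_add_##T##_fmt_1 (x, y)
#define DEMO_RUN_SAT_S_ADD_FMT_2(T, x, y) demo_sat_s_add_##T##_fmt_2 (x, y)

/* Both variants must agree on every pair of 8-bit inputs.  */
static void
demo_sat_s_add_check (void)
{
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    for (int y = INT8_MIN; y <= INT8_MAX; y++)
      assert (DEMO_RUN_SAT_S_ADD_FMT_1 (int8, (int8_t) x, (int8_t) y)
              == DEMO_RUN_SAT_S_ADD_FMT_2 (int8, (int8_t) x, (int8_t) y));
}
```

Because both variants compute the same result, a run test wired to the wrong helper gives no failure signal, so this kind of mistake is easy to miss.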
2024-11-12	RISC-V: Add norelax function attribute	yulong	2	-16/+49
	This patch adds the norelax function attribute that was discussed in riscv-c-api-doc PR#94. URL:https://github.com/riscv-non-isa/riscv-c-api-doc/pull/94 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_declare_function_name): Add new attribute.
2024-11-12	[RISC-V] Drop undesirable two instruction macc alternatives	Jeff Law	1	-170/+140
	So I was looking at sub_dct a little while ago and was surprised to see us emit two instructions out of a single pattern. We generally try to avoid that -- it's not always possible, but as a general rule of thumb it should be avoided. Specifically I saw: > vmv1r.v v4,v2 # 138 [c=4 l=4] pred_mul_plusrvvm1hi_undef/5 > vmacc.vv v4,v8,v1 When we emit multiple instructions out of a single pattern we can't build a good schedule as we can't really describe the two instructions well and we can't split them up -- they move as an atomic unit. These cases can also raise correctness issues if the pattern doesn't properly account for both instructions in its length computation. Note the length, 4 bytes. So this is both a performance and latent correctness issue. It appears that these alternatives are meant to deal with the case when we have three source inputs and a non-matching output. The author did put in "?" to slightly disparage these alternatives, but a "!" would have been better. The best solution is to just remove those alternatives and let the allocator manage the matching operand issue. That's precisely what this patch does. For the various integer multiply-add/multiply-accumulate patterns we drop the alternatives which don't require a match between the output and one of the inputs. That fixes the correctness issue and should shave a cycle or two off our sub_dct code. Essentially the move bubbles up into an empty slot and we can schedule around the vmacc sensibly. Interestingly enough this fixes a scan-assembler test in my tester for both rv32 and rv64. 
> Tests that now work, but didn't before (10 tests): > > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 > unix/-march=rv32gcv: gcc: gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c scan-assembler-times \\tvmacc\\.vv 8 My BPI is already in a bootstrap test, so this patch won't hit the BPI for bootstrapping until Wednesday, meaning no data until Thursday. Will wait for the pre-commit tester though. gcc/ config/riscv/vector.md (pred_mul_plus<mode>_undef): Drop alternatives where output doesn't have to match input. (pred_madd<mode>, pred_macc<mode>): Likewise. (pred_madd<mode>_scalar, pred_macc<mode>_scalar): Likewise. (pred_madd<mode>_exended_scalar): Likewise. (pred_macc<mode>_exended_scalar): Likewise. (pred_minus_mul<mode>_undef): Likewise. (pred_nmsub<mode>, pred_nmsac<mode>): Likewise. (pred_nmsub<mode>_scalar, pred_nmsac<mode>_scalar): Likewise. (pred_nmsub<mode>_exended_scalar): Likewise. (pred_nmsac<mode>_exended_scalar): Likewise.
2024-11-12	tree-optimization/116973 - SLP permute lower heuristic and single-lane SLP	Richard Biener	1	-6/+11
	When forcing single-lane SLP to emulate non-SLP behavior we need to disable heuristics designed to optimize SLP loads and instead in all cases resort to an interleaving scheme as requested by forcefully doing single-lane SLP. This fixes the remaining fallout for --param vect-force-slp=1 on x86. PR tree-optimization/116973 * tree-vect-slp.cc (vect_lower_load_permutations): Add force_single_lane parameter. Disable heuristic that keeps some load-permutations. (vect_analyze_slp): Pass force_single_lane to vect_lower_load_permutations.
2024-11-12	libsanitizer: update test	Kito Cheng	1	-2/+2
	gcc/testsuite/ChangeLog: * c-c++-common/ubsan/builtin-1.c: Update test case because the sanitizer has changed the error message.
2024-11-12	[committed] Fix minor c6x backend bug exposed by CRC patches	Jeff Law	1	-2/+4
	This is a minor bug in the c6x port I saw when testing Mariam's CRC work. Specifically some of the CRC tests were failing with a segfault testing if an operand was an "a_register" from within the dest_regfile attribute. We were extracting what we thought should have been a register operand then looking at the REGNO. The underlying data was totally bogus, hence the fault in the accessor macros. The core issue is we were trying to extract operands from a nop insn which has no operands. As far as I can tell "unknown" is a reasonable answer for the dest_regfile attribute on a nop insn, so this patch adds an explicit setting of dest_regfile rather than letting the default processing kick in. I'm applying the attached patch to the trunk. There's still a backend bug affecting ~15 CRC tests. Essentially the assembler complains about a label (related to debugging info) not at the start of an execution packet. I'm not chasing this down. gcc/ * config/c6x/c6x.md (nop, nop_count): Add explicit "dest_regfile" attribute setting.
2024-11-12	ada: Typo fix in comment	Marc Poulhiès	1	-1/+1
	gcc/ada/ChangeLog: * gcc-interface/Makefile.in: Remove extra 'with'.
2024-11-12	ada: Compile time crash on limited object in extended return	squirek	1	-5/+17
	This patch fixes an error in the compiler whereby using an extended return on an object of limited tagged type which extends a tagged protected type may lead to a compile-time crash. gcc/ada/ChangeLog: * exp_ch3.adb (Build_Assignment): Add condition to fetch corresponding record types for concurrent tagged types.
2024-11-12	ada: Fix spurious error on iterated component association with large index type	Eric Botcazou	1	-12/+19
	This is only for the Ada 2022 form of the iterated component association. gcc/ada/ChangeLog: * exp_aggr.adb (Two_Pass_Aggregate_Expansion): Use a type sized from the index type to compute the length. Simplify and remove useless calls to New_Copy_Tree for this computation.
2024-11-12	ada: Include design documentation within runtime sources	Pat Bernardi	233	-334/+1465
	The existing design documentation, required when generating the Software Architecture Design Specification and Software Component Design Specification documents for the light and light-tasking runtimes, has been included directly within runtime sources. gcc/ada/ChangeLog: * libgnarl/a-dynpri.ads: Add design annotations. * libgnarl/a-reatim.ads: Likewise. * libgnarl/a-synbar.ads: Likewise. * libgnarl/a-taside.ads: Likewise. * libgnarl/s-tarest.ads: Likewise. * libgnarl/s-tasinf.ads: Likewise. * libgnarl/s-taspri__posix.ads: Likewise. * libgnarl/s-tpobmu.ads: Likewise. * libgnat/a-assert.ads: Likewise. * libgnat/a-comlin.ads: Likewise. * libgnat/a-nbnbig.ads: Likewise. * libgnat/a-nubinu.ads: Likewise. * libgnat/a-numeri.ads: Likewise. * libgnat/a-unccon.ads: Likewise. * libgnat/a-uncdea.ads: Likewise. * libgnat/ada.ads: Likewise. * libgnat/g-debuti.ads: Likewise. * libgnat/g-sestin.ads: Likewise. * libgnat/g-souinf.ads: Likewise. * libgnat/gnat.ads: Likewise. * libgnat/i-cexten.ads: Likewise. * libgnat/i-cexten__128.ads: Likewise. * libgnat/i-cstrin.adb: Likewise. * libgnat/i-cstrin.ads: Likewise. * libgnat/interfac__2020.ads: Likewise. * libgnat/machcode.ads: Likewise. * libgnat/s-addope.ads: Likewise. * libgnat/s-aridou.ads: Likewise. * libgnat/s-arit32.ads: Likewise. * libgnat/s-arit64.ads: Likewise. * libgnat/s-assert.ads: Likewise. * libgnat/s-atacco.ads: Likewise. * libgnat/s-atocou.ads: Likewise. * libgnat/s-atocou__builtin.adb: Likewise. * libgnat/s-atopri.ads: Likewise. * libgnat/s-bitops.ads: Likewise. * libgnat/s-boarop.ads: Likewise. * libgnat/s-bytswa.ads: Likewise. * libgnat/s-carsi8.ads: Likewise. * libgnat/s-carun8.ads: Likewise. * libgnat/s-casi16.ads: Likewise. * libgnat/s-casi32.ads: Likewise. * libgnat/s-casi64.ads: Likewise. * libgnat/s-caun16.ads: Likewise. * libgnat/s-caun32.ads: Likewise. * libgnat/s-caun64.ads: Likewise. * libgnat/s-exnint.ads: Likewise. * libgnat/s-exnllf.ads: Likewise. * libgnat/s-exnlli.ads: Likewise. 
* libgnat/s-expint.ads: Likewise. * libgnat/s-explli.ads: Likewise. * libgnat/s-expllu.ads: Likewise. * libgnat/s-expmod.ads: Likewise. * libgnat/s-exponn.ads: Likewise. * libgnat/s-expont.ads: Likewise. * libgnat/s-exponu.ads: Likewise. * libgnat/s-expuns.ads: Likewise. * libgnat/s-fatflt.ads: Likewise. * libgnat/s-fatgen.ads: Likewise. * libgnat/s-fatlfl.ads: Likewise. * libgnat/s-fatllf.ads: Likewise. * libgnat/s-flocon.ads: Likewise. * libgnat/s-geveop.ads: Likewise. * libgnat/s-imageb.ads: Likewise. * libgnat/s-imaged.ads: Likewise. * libgnat/s-imagef.ads: Likewise. * libgnat/s-imagei.ads: Likewise. * libgnat/s-imagen.ads: Likewise. * libgnat/s-imageu.ads: Likewise. * libgnat/s-imagew.ads: Likewise. * libgnat/s-imde128.ads: Likewise. * libgnat/s-imde32.ads: Likewise. * libgnat/s-imde64.ads: Likewise. * libgnat/s-imen16.ads: Likewise. * libgnat/s-imen32.ads: Likewise. * libgnat/s-imenu8.ads: Likewise. * libgnat/s-imfi32.ads: Likewise. * libgnat/s-imfi64.ads: Likewise. * libgnat/s-imgbiu.ads: Likewise. * libgnat/s-imgboo.ads: Likewise. * libgnat/s-imgcha.ads: Likewise. * libgnat/s-imgint.ads: Likewise. * libgnat/s-imgllb.ads: Likewise. * libgnat/s-imglli.ads: Likewise. * libgnat/s-imgllu.ads: Likewise. * libgnat/s-imgllw.ads: Likewise. * libgnat/s-imgrea.ads: Likewise. * libgnat/s-imguns.ads: Likewise. * libgnat/s-imguti.ads: Likewise. * libgnat/s-imgwiu.ads: Likewise. * libgnat/s-maccod.ads: Likewise. * libgnat/s-multip.ads: Likewise. * libgnat/s-pack03.ads: Likewise. * libgnat/s-pack05.ads: Likewise. * libgnat/s-pack06.ads: Likewise. * libgnat/s-pack07.ads: Likewise. * libgnat/s-pack09.ads: Likewise. * libgnat/s-pack10.ads: Likewise. * libgnat/s-pack100.ads: Likewise. * libgnat/s-pack101.ads: Likewise. * libgnat/s-pack102.ads: Likewise. * libgnat/s-pack103.ads: Likewise. * libgnat/s-pack104.ads: Likewise. * libgnat/s-pack105.ads: Likewise. * libgnat/s-pack106.ads: Likewise. * libgnat/s-pack107.ads: Likewise. * libgnat/s-pack108.ads: Likewise. 
* libgnat/s-pack109.ads: Likewise. * libgnat/s-pack11.ads: Likewise. * libgnat/s-pack110.ads: Likewise. * libgnat/s-pack111.ads: Likewise. * libgnat/s-pack112.ads: Likewise. * libgnat/s-pack113.ads: Likewise. * libgnat/s-pack114.ads: Likewise. * libgnat/s-pack115.ads: Likewise. * libgnat/s-pack116.ads: Likewise. * libgnat/s-pack117.ads: Likewise. * libgnat/s-pack118.ads: Likewise. * libgnat/s-pack119.ads: Likewise. * libgnat/s-pack12.ads: Likewise. * libgnat/s-pack120.ads: Likewise. * libgnat/s-pack121.ads: Likewise. * libgnat/s-pack122.ads: Likewise. * libgnat/s-pack123.ads: Likewise. * libgnat/s-pack124.ads: Likewise. * libgnat/s-pack125.ads: Likewise. * libgnat/s-pack126.ads: Likewise. * libgnat/s-pack127.ads: Likewise. * libgnat/s-pack13.ads: Likewise. * libgnat/s-pack14.ads: Likewise. * libgnat/s-pack15.ads: Likewise. * libgnat/s-pack17.ads: Likewise. * libgnat/s-pack18.ads: Likewise. * libgnat/s-pack19.ads: Likewise. * libgnat/s-pack20.ads: Likewise. * libgnat/s-pack21.ads: Likewise. * libgnat/s-pack22.ads: Likewise. * libgnat/s-pack23.ads: Likewise. * libgnat/s-pack24.ads: Likewise. * libgnat/s-pack25.ads: Likewise. * libgnat/s-pack26.ads: Likewise. * libgnat/s-pack27.ads: Likewise. * libgnat/s-pack28.ads: Likewise. * libgnat/s-pack29.ads: Likewise. * libgnat/s-pack30.ads: Likewise. * libgnat/s-pack31.ads: Likewise. * libgnat/s-pack33.ads: Likewise. * libgnat/s-pack34.ads: Likewise. * libgnat/s-pack35.ads: Likewise. * libgnat/s-pack36.ads: Likewise. * libgnat/s-pack37.ads: Likewise. * libgnat/s-pack38.ads: Likewise. * libgnat/s-pack39.ads: Likewise. * libgnat/s-pack40.ads: Likewise. * libgnat/s-pack41.ads: Likewise. * libgnat/s-pack42.ads: Likewise. * libgnat/s-pack43.ads: Likewise. * libgnat/s-pack44.ads: Likewise. * libgnat/s-pack45.ads: Likewise. * libgnat/s-pack46.ads: Likewise. * libgnat/s-pack47.ads: Likewise. * libgnat/s-pack48.ads: Likewise. * libgnat/s-pack49.ads: Likewise. * libgnat/s-pack50.ads: Likewise. * libgnat/s-pack51.ads: Likewise. 
* libgnat/s-pack52.ads: Likewise. * libgnat/s-pack53.ads: Likewise. * libgnat/s-pack54.ads: Likewise. * libgnat/s-pack55.ads: Likewise. * libgnat/s-pack56.ads: Likewise. * libgnat/s-pack57.ads: Likewise. * libgnat/s-pack58.ads: Likewise. * libgnat/s-pack59.ads: Likewise. * libgnat/s-pack60.ads: Likewise. * libgnat/s-pack61.ads: Likewise. * libgnat/s-pack62.ads: Likewise. * libgnat/s-pack63.ads: Likewise. * libgnat/s-pack65.ads: Likewise. * libgnat/s-pack66.ads: Likewise. * libgnat/s-pack67.ads: Likewise. * libgnat/s-pack68.ads: Likewise. * libgnat/s-pack69.ads: Likewise. * libgnat/s-pack70.ads: Likewise. * libgnat/s-pack71.ads: Likewise. * libgnat/s-pack72.ads: Likewise. * libgnat/s-pack73.ads: Likewise. * libgnat/s-pack74.ads: Likewise. * libgnat/s-pack75.ads: Likewise. * libgnat/s-pack76.ads: Likewise. * libgnat/s-pack77.ads: Likewise. * libgnat/s-pack78.ads: Likewise. * libgnat/s-pack79.ads: Likewise. * libgnat/s-pack80.ads: Likewise. * libgnat/s-pack81.ads: Likewise. * libgnat/s-pack82.ads: Likewise. * libgnat/s-pack83.ads: Likewise. * libgnat/s-pack84.ads: Likewise. * libgnat/s-pack85.ads: Likewise. * libgnat/s-pack86.ads: Likewise. * libgnat/s-pack87.ads: Likewise. * libgnat/s-pack88.ads: Likewise. * libgnat/s-pack89.ads: Likewise. * libgnat/s-pack90.ads: Likewise. * libgnat/s-pack91.ads: Likewise. * libgnat/s-pack92.ads: Likewise. * libgnat/s-pack93.ads: Likewise. * libgnat/s-pack94.ads: Likewise. * libgnat/s-pack95.ads: Likewise. * libgnat/s-pack96.ads: Likewise. * libgnat/s-pack97.ads: Likewise. * libgnat/s-pack98.ads: Likewise. * libgnat/s-pack99.ads: Likewise. * libgnat/s-parame.ads: Likewise. * libgnat/s-rident.ads: Likewise. * libgnat/s-spark.ads: Likewise. * libgnat/s-spcuop.ads: Likewise. * libgnat/s-stoele.ads: Likewise. * libgnat/s-traent.ads: Likewise. * libgnat/s-unstyp.ads: Likewise. * libgnat/s-vaispe.ads: Likewise. * libgnat/s-valspe.ads: Likewise. * libgnat/s-vauspe.ads: Likewise. * libgnat/s-veboop.ads: Likewise. 
* libgnat/s-vector.ads: Likewise. * libgnat/s-vs_int.ads: Likewise. * libgnat/s-vs_lli.ads: Likewise. * libgnat/s-vs_llu.ads: Likewise. * libgnat/s-vs_uns.ads: Likewise. * libgnat/s-vsllli.ads: Likewise. * libgnat/text_io.ads: Likewise. * libgnat/unchconv.ads: Likewise. * libgnat/unchdeal.ads: Likewise. * s-pack.ads.tmpl: Likewise.