riscv-gnu-toolchain/gcc.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2025-05-13	RISC-V: Adjust riscv_can_inline_p	Kito Cheng	2	-6/+3
	We don't hold any extenison flags in `target_flags`, so no need to gather the extenison flags in `target_flags`. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_can_inline_p): Drop extension flags check from `target_flags`. * config/riscv/riscv-subset.h (riscv_x_target_flags_isa_mask): Remove. * config/riscv/riscv.cc (riscv_x_target_flags_isa_mask): Remove.
2025-05-13	RISC-V: Generate extension table in documentation from riscv-ext.def	Kito Cheng	2	-2/+120
	Automatically build the ISA extension reference table in invoke.texi from the unified riscv-ext.def metadata, ensuring documentation stays in sync with extension definitions and reducing manual maintenance. gcc/ChangeLog: * doc/invoke.texi: Replace hand‑written extension table with `@include riscv-ext.texi` to pull in auto‑generated entries. * doc/riscv-ext.texi: New generated definition file containing formatted documentation entries for each extension. * Makefile.in: Add riscv-ext.texi to the list of files to be processed by the Texinfo generator. * config/riscv/gen-riscv-ext-texi.cc: New. * config/riscv/t-riscv: Add rule for generating riscv-ext.texi.
2025-05-13	RISC-V: Use riscv-ext.def to generate target options and variables	Kito Cheng	8	-355/+551
	Leverage the centralized riscv-ext.def definitions to auto-generate the target option parsing and associated internal flags, replacing manual listings in riscv.opt; `riscv_ext_flag_table` part will remove in later patch. gcc/ChangeLog: * config/riscv/gen-riscv-ext-opt.cc: New. * config/riscv/riscv.opt: Drop manual entries for target options, and include riscv-ext.opt. * config/riscv/riscv-ext.opt: New. * config/riscv/riscv-ext.opt.urls: New. * config.gcc: Add riscv-ext.opt to the list of target options files. * common/config/riscv/riscv-common.cc (riscv_ext_flag_table): Adjsut target option variable entry. (riscv_set_arch_by_subset_list): Adjust target option variable. * config/riscv/riscv-c.cc (riscv_ext_flag_table): Adjust target option variable entry. * config/riscv/riscv-vector-builtins.cc (pragma_intrinsic_flags): Adjust variable name. (riscv_pragma_intrinsic_flags_pollute): Adjust variable name. (riscv_pragma_intrinsic_flags_restore): Ditto. * config/riscv/t-riscv: Add the rule for generating riscv-ext.opt. * config/riscv/riscv-opts.h (TARGET_MIN_VLEN): Update. (TARGET_MIN_VLEN_OPTS): Update.
2025-05-13	RISC-V: Introduce riscv-ext*.def to define extensions	Kito Cheng	5	-0/+2224
	Adding a new ISA extension to RISC-V GCC requires modifying several places: 1. riscv_ext_version_table for the extension version. 2. riscv.opt for the target option and variable. 3. riscv_ext_flag_table to bind the extension to its target option. 4. riscv_combine_info if this extension is just a macro extension. 5. riscv_implied_info if this extension implies other extensions. 6. invoke.texi for documentation (this one is often forgotten - even by me...). 7. riscv-ext-bitmask.def if this extension has been allocated a bitmask in `__riscv_feature_bits`. And now, we've integrated all the information into riscv-ext.def and generate (almost) everything from that! Some of the fields, like URL, are not used yet. They are planned to be updated later and used for improving the documentation. Changes since v1: - Rebase for including new extensions - Fix MASK_VECTOR handling gcc/ChangeLog: * config/riscv/riscv-ext.def: New file; define extension metadata table. * config/riscv/riscv-ext-corev.def: New. * config/riscv/riscv-ext-sifive.def: New. * config/riscv/riscv-ext-thead.def: New. * config/riscv/riscv-ext-ventana.def: New.
2025-05-12	aarch64: Remove cmov<mode>6 patterns	Andrew Pinski	1	-32/+0
	Since the cmov optab is not used and is being removed, the `cmov<mode>6` patterns from the aarch64 backend can also be removed. gcc/ChangeLog: * config/aarch64/aarch64.md (cmov<mode>6): Remove. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-05-12	nvptx: Support '-march=sm_61'	Thomas Schwinge	6	-8/+70
	gcc/ * config/nvptx/nvptx-sm.def: Add '61'. * config/nvptx/nvptx-gen.h: Regenerate. * config/nvptx/nvptx-gen.opt: Likewise. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust. * config/nvptx/nvptx.opt (-march-map=sm_61, -march-map=sm_62): Likewise. * config.gcc: Likewise. * doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_61'. * config/nvptx/gen-multilib-matches-tests: Extend. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_61.c: Adjust. * gcc.target/nvptx/march-map=sm_62.c: Likewise. * gcc.target/nvptx/march=sm_61.c: New. libgomp/ * testsuite/libgomp.c/declare-variant-3-sm61.c: New. * testsuite/libgomp.c/declare-variant-3.h: Adjust.
2025-05-12	nvptx: Support '-mptx=5.0'	Thomas Schwinge	4	-0/+9
	gcc/ * config/nvptx/nvptx-opts.h (enum ptx_version): Add 'PTX_VERSION_5_0'. * config/nvptx/nvptx.cc (ptx_version_to_string) (ptx_version_to_number): Adjust. * config/nvptx/nvptx.h (TARGET_PTX_5_0): New. * config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue' '5.0' for 'PTX_VERSION_5_0'. * doc/invoke.texi (Nvidia PTX Options): Document '-mptx=5.0'. gcc/testsuite/ * gcc.target/nvptx/mptx=5.0.c: New.
2025-05-12	RISC-V: Minimal support for ssnpm, smnpm and smmpm extensions.	Dongyan Chen	1	-0/+19
	This patch support ssnpm, smnpm, smmpm, sspm and supm extensions[1]. To enable GCC to recognize and process ssnpm, smnpm, smmpm, sspm and supm extensions correctly at compile time. [1]https://github.com/riscv/riscv-j-extension/blob/master/zjpm/instructions.adoc Changes for v5: - Fix the testsuite error in arch-50.c. Changes for v4: - Fix the code based on the commit id 9b13bea07706a7cae0185f8a860d67209308c050. Changes for v3: - Fix the error messages in gcc/testsuite/gcc.target/riscv/arch-46.c Changes for v2: - Add the sspm and supm extensions. - Add the check_conflict_ext function to check the compatibility of ssnpm, smnpm, smmpm, sspm and supm extensions. - Add the test cases for ssnpm, smnpm, smmpm, sspm and supm extensions. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::check_conflict_ext): New extension. * config/riscv/riscv.opt: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-ss-1.c: New test. * gcc.target/riscv/arch-ss-2.c: New test.
2025-05-12	RISC-V: Support for zilsd and zclsd extensions.	Dongyan Chen	1	-0/+4
	This patch support zilsd and zclsd[1] extensions. To enable GCC to recognize and process zilsd and zclsd extension correctly at compile time. [1] https://github.com/riscv/riscv-zilsd Changes for v2: - Remove the addition of zilsd extension in gcc/common/config/riscv/riscv-ext-bitmask.def - Fix a bug with zilsd and zclsd extension dependency in gcc/common/config/riscv/riscv-common.cc gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::check_conflict_ext): New extension. * config/riscv/riscv.opt: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-zilsd-1.c: New. * gcc.target/riscv/arch-zilsd-2.c: New. * gcc.target/riscv/arch-zilsd-3.c: New.
2025-05-12	arm: remove iwmmxt registers from allocator tables	Richard Earnshaw	3	-70/+30
	These registers can no-longer be allocated, so remove them from the various tables. gcc/ChangeLog: * config/arm/aout.h (REGISTER_NAMES): Remove iwmmxt registers. * config/arm/arm.h (FIRST_IWMMXT_REGNUM): Delete. (LAST_IWMMXT_REGNUM): Delete. (FIRST_IWMMXT_GR_REGNUM): Delete. (LAST_IWMMXT_GR_REGNUM): Delete. (IS_IWMMXT_REGNUM): Delete. (IS_IWMMXT_GR_REGNUM): Delete. (FRAME_POINTER_REGNUM): Define relative to CC_REGNUM. (ARG_POINTER_REGNUM): Define relative to FRAME_POINTER_REGNUM. (FIRST_PSEUDO_REGISTER): Adjust. (WREG): Delete. (WGREG): Delete. (REG_ALLOC_ORDER): Remove iWMMX registers. (enum reg_class): Remove iWMMX register classes. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Remove iWMMX registers. * config/arm/arm.md (CC_REGNUM): Adjust value. (VFPCC_RENGUM): Likewise. (APSRQ_REGNUM): Likewise. (APSRGE_REGNUM): Likewise. (VPR_REGNUM): Likewise. (RA_AUTH_CODE): Likewise.
2025-05-12	arm: remove most remaining iwmmxt code.	Richard Earnshaw	4	-225/+32
	Remove most of the remaining code for iWMMXT support, except for the register allocation table entries. gcc/ChangeLog: * config/arm/arm-cpus.in (feature iwmmxt, feature iwmmxt2): Delete. * config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): Delete. (arm_output_iwmmxt_tinsr): Delete. (arm_arch_iwmmxt): Delete. (arm_arch_iwmmxt2): Delete. * config/arm/arm.h (TARGET_IWMMXT): Delete. (TARGET_IWMMXT2): Delete. (TARGET_REALLY_IWMMXT): Delete. (TARGET_REALLY_IWMMXT2): Delete. (VALID_IWMMXT_REG_MODE): Delete. (ARM_HAVE_V8QI_ARITH): Remove iWMMXT. (ARM_HAVE_V4HI_ARITH): Likewise. (ARM_HAVE_V2SI_ARITH): Likewise. (ARM_HAVE_V8QI_LDST): Likewise. (ARM_HAVE_V4HI_LDST): Likewise. (ARM_HAVE_V2SI_LDST): Likewise. (SECONDARY_OUTPUT_RELOAD_CLASS): Remove iWMMXT cases. (SECONDARY_INPUT_RELOAD_CLASS): Likewise. * config/arm/arm.cc (arm_arch_iwmmxt): Delete. (arm_arch_iwmmxt2): Delete. (arm_option_reconfigure_globals): Don't initialize them. (arm_register_move_cost): Remove costs for iwmmxt. (struct minipool_node): Update comment. (output_move_double): Likewise (output_return_instruction): Likewise. (arm_print_operand, cases 'U' and 'w'): Report an error if used. (arm_regno_class): Remove iWMMXT cases. (arm_debugger_regno): Remove iWMMXT cases. (arm_output_iwmmxt_shift_immediate): Delete. (arm_output_iwmmxt_tinsr): Delete.
2025-05-12	arm: remove dead predefines when using WMMX	Richard Earnshaw	1	-7/+0
	Since we no-longer enable iWMMXT, these predefines are no-longer enabled when preprocessing C. Remove them. gcc/ChangeLog: * config/arm/arm-c.cc (arm_cpu_builtins): Remove predefines for __IWWMXT__, __IWMMXT2__ and __ARM_WMMX.
2025-05-12	arm: cleanup iterators.md after removing iwmmxt	Richard Earnshaw	1	-14/+6
	Mostly this is just removing references to iWMMXT in comments, but also remove some now unused iterators and attributes. gcc/ChangeLog: * config/arm/iterators.md (VMMX, VMMX2): Remove mode iterators. (MMX_char): Remove mode iterator attribute.
2025-05-12	arm: remove iwmmxt-related attributes from machine description	Richard Earnshaw	4	-326/+1
	Since we no-longer have any iwmxxt instructions, the iwmmxt-related attributes can never be set. Consequently, the marvel-f-iwmmxt scheduler is redundant as none of the pipes are ever used now. gcc/ChangeLog: * config/arm/arm.md (core_cycles): Remove iwmmxt attributes. * config/arm/types.md (autodetect_type): Likewise. * config/arm/marvell-f-iwmmxt.md: Removed. * config/arm/t-arm: Remove marvell-f-iwmmxt.md
2025-05-12	arm: Remove iwmmxt support from arm.cc	Richard Earnshaw	1	-181/+2
	TARGET_IWMMXT, TARGET_IWMMXT2 and their _REALLY_ equivalents are never true now, so the code using them can be simplified. gcc/ChangeLog: * config/arm/arm.cc (arm_option_check_internal): Remove IWMMXT check. (arm_options_perform_arch_sanity_checks): Likewise. (use_return_insn): Likewise. (arm_init_cumulative_args): Likewise. (arm_legitimate_index_p): Likewise. (thumb2_legitimate_index_p): Likewise. (arm_compute_save_core_reg_mask): Likewise. (output_return_instruction): Likewise. (arm_compute_frame_layout): Likewise. (arm_save_coproc_regs): Likewise. (arm_hard_regno_mode_ok): Likewise. (arm_expand_epilogue_apcs_frame): Likewise. (arm_expand_epilogue): Likewise. (arm_vector_mode_supported_p): Likewise. (arm_preferred_simd_mode): Likewise. (arm_conditional_register_usage): Likewise.
2025-05-12	arm: remove support for the iwmmxt ABI variant.	Richard Earnshaw	4	-52/+4
	The iwmmxt ABI is a variant of the ABI that supported passing certain parameters and results in iwmmxt registers. But since we no-longer support the instructions that can read and write these registers, the ABI variant can no-longer be used. gcc/ChangeLog: * config.gcc (arm, --with-abi): Remove iwmmxt abi option. * config/arm/arm.opt (enum ARM_ABI_IWMMXT): Remove. * config/arm/arm.h (TARGET_IWMMXT_ABI): Delete. (enum arm_pcs): Remove ARM_PCS_AAPCS_IWMMXT. (FUNCTION_ARG_REGNO_P): Remove IWMMXT ABI support. (CUMULATIVE_ARGS): Remove iwmmxt_nregs. * config/arm/arm.cc (arm_options_perform_arch_sanity_checks): Remove IWMMXT ABI checks. (arm_libcall_value_1): Likewise. (arm_function_value_regno_p): Likewise. (arm_apply_result_size): Remove adjustment for IWMMXT ABI. (arm_function_arg): Remove IWMMXT ABI support. (arm_arg_partial_bytes): Likewise. (arm_function_arg_advance): Likewise. (arm_init_cumulative_args): Don't initialize iwmmxt_nregs. * doc/invoke.texi (arm -mabi): Remove mention of the iwmmxt ABI option. * config/arm/arm-opts.h (enum arm_abi_type): Remove ARM_ABI_IWMMXT.
2025-05-12	arm: remove IWMMXT checks from MD files.	Richard Earnshaw	4	-46/+20
	Remove the various checks for TARGET_IWMMXT{,2} and TARGET_REALLY_IWMMXT{,2} from the remaining machine description files. These flags can never be true now. gcc/ChangeLog: * config/arm/arm.md(attr arch): Remove iwmmxt and iwmmxt2. Remove checks based on TARGET_REALLY_IWMMXT2 from all split patterns. (arm_movdi): Likewise. (arm_movt): Likewise. (arch_enabled): Remove test for iwmmxt2. config/arm/constraints.md (y, z): Remove register constraints. (Uy): Remove memory constraint. * config/arm/thumb2.md (thumb2_pop_single): Remove check for IWMMXT. * config/arm/vec-common.md (mov<mode>): Remove check for IWMMXT. (mul<mode>3): Likewise. (xor<mode>3): Likewise. (<absneg_str><mode>2): Likewise. (@movmisalign<mode>): Likewise. (@mve_<mve_insn>q_<supf><mode>): Likewise. (vashl<mode>3): Likewise. (vashr<mode>3): Likewise. (vlshr<mode>3): Likewise. (uavg<mode>3_ceil): Likewise.
2025-05-12	arm: Remove iwmmxt patterns.	Richard Earnshaw	6	-2709/+1
	This patch deletes the patterns relating to iwmmxt and iwmmxt2 and updates the relevant dependencies. gcc/ChangeLog: * config/arm/arm.md: Don't include iwmmxt.md. * config/arm/t-arm (MD_INCLUDES): Remove iwmmxt.md. config/arm/iwmmxt.md: Removed. * config/arm/iwmmxt2.md: Removed. * config/arm/unspecs.md: Remove comment referring to iwmmxt2.md. (enum unspec): Remove iWMMXt unspec values. (enum unspecv): Likewise. * config/arm/predicates.md (imm_or_reg_operand): Delete.
2025-05-12	arm: remove iWMMX builtins support.	Richard Earnshaw	1	-1274/+2
	This is the first step of removing the various builtins for iwmmxt, removing the builtins expansion code. It leaves a lot of code elsewhere, but we'll clean that up in subsequent patches. I'm not sure why safe_vector_operand would unconditionally try to expand to an iwmmxt instruction if passed (const_int 0). Clearly that's meaningless on other architectures, but perhaps this can't happen elsewhere. Anyway, for now, just mark this as unreachable so that we'll know about it if it ever happens. gcc/ChangeLog: * config/arm/arm-builtins.cc (enum arm_builtins): Delete iWMMX builtin values. (bdesc_2arg): Likewise. (bdesc_1arg): Likewise. (arm_init_iwmmxt_builtins): Delete. (arm_init_builtins): Don't call arm_init_iwmmxt_builtins. (safe_vector_operand): Use __builtin_unreachable instead of emitting an iwmmxt builtin. (arm_general_expand_builtin): Remove iWMMX builtins support.
2025-05-12	arm: treat -mcpu/arch=iwmmxt{,2} like XScale	Richard Earnshaw	4	-51/+34
	Treat options that select iwmmxt variants as we would for xscale. We leave the feature bits in for now, since they are still needed elsewhere, but they are never enabled. Also remove the remaining testsuite framework support for iwmmxt, since this will never trigger now. gcc/ * config/arm/arm-cpus.in (arch iwmmxt): treat in the same way as we would treat XScale. (arch iwmmxt2): Likewise. (cpu xscale): Add aliases for iwmmxt and iwmmxt2. (cpu iwmmxt): Delete. (cpu iwmmxt2): Delete. * config/arm/arm-generic.md (load_ldsched_xscale): Remove references to iwmmxt. (load_ldsched): Likewise. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * doc/sourcebuild.texi (arm_iwmmxt_ok): Delete. gcc/testsuite/ChangeLog: * gcc.target/arm/ivopts.c: Remove test for iwmmxt * lib/target-supports.exp (check_effective_target_arm_iwmmxt_ok): Delete.
2025-05-12	arm: clarify the logic of SECONDARY_(INPUT/OUTPUT)_RELOAD_CLASS	Richard Earnshaw	1	-26/+29
	The flattened logic of these functions and the complexity of the numerous clauses makes it very difficult to understand what's written in these macros. Additionally, SECONDARY_INPUT_RELOAD_CLASS was not laid out with the correct formatting. Add some parenthesis and re-indent to make the logic clearer. No functional change. gcc: * config/arm/arm.h (SECONDARY_OUTPUT_RELOAD_CLASS): Add parentheis and re-indent. (SECONDARY_INPUT_RELOAD_CLASS): Likewise.
2025-05-12	x86: Remove df_insn_rescan after emit_insn_*	H.J. Lu	1	-8/+3
	Since df_insn_rescan has been called by emit_insn_, there is no need to call it after calling emit_insn_. Remove its unnecessary usages. PR target/120228 * config/i386/i386-features.cc (ix86_place_single_vector_set): Remove df_insn_rescan after emit_insn_*. (remove_partial_avx_dependency): Likewise. (replace_vector_const): Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-05-11	i386: Fix move costs in vectorizer cost model.	Jan Hubicka	1	-11/+15
	This patch complements the change to stv and uses COSTS_N_INSNS (...)/2 to convert move costs to COSTS_N_INSNS based costs used by vectorizer. The patch makes pr9981 to XPASS so I removed xfail but it also makes pr91446 fail. This is about SLP /* { dg-options "-O2 -march=icelake-server -ftree-slp-vectorize -mtune-ctrl=^sse_typeless_stores" } / typedef struct { unsigned long long width, height; long long x, y; } info; extern void bar (info ); void foo (unsigned long long width, unsigned long long height, long long x, long long y) { info t; t.width = width; t.height = height; t.x = x; t.y = y; bar (&t); } /* { dg-final { scan-assembler-times "vmovdqa\[^\n\r\]xmm\[0-9\]" 2 } } / With fixed cost the construction cost is now too large so vectorization does not happen. This is the hack increasing cost to account integer->sse move which I think we can handle incrementally. gcc/ChangeLog: * config/i386/i386.cc (ix86_widen_mult_cost): Use sse_op to cost SSE integer addition. (ix86_multiplication_cost): Use COSTS_N_INSNS (...)/2 to cost sse loads. (ix86_shift_rotate_cost): Likewise. (ix86_vector_costs::add_stmt_cost): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/pr91446.c: xfail. * gcc.target/i386/pr99881.c: remove xfail.
2025-05-11	xtensa: Fix up unwanted spills of SFmode hard registers holding function ↵	Takayuki 'January June' Suwa	1	-9/+19
	arguments/returns Until now (presumably after transition to LRA), hard registers storing function arguments or return values were spilling undesirably when TARGET_HARD_FLOAT is enabled. /* example / float test0(float a, float b) { return a + b; } extern float foo(void); float test1(void) { return foo() 3.14f; } ;; before test0: entry sp, 48 wfr f0, a2 wfr f1, a3 add.s f0, f0, f1 s32i.n a2, sp, 0 ;; unwanted spilling-out s32i.n a3, sp, 4 ;; rfr a2, f0 retw.n .literal .LC1, 1078523331 test1: entry sp, 48 call8 foo l32r a8, .LC1 wfr f0, a10 wfr f1, a8 mul.s f0, f0, f1 s32i.n a10, sp, 0 ;; unwanted spilling-out rfr a2, f0 retw.n Ultimately, that is because the costs of moving between integer and floating-point hard registers are undefined and the default (large value) is used. This patch fixes this. ;; after test0: entry sp, 32 wfr f1, a2 wfr f0, a3 add.s f0, f1, f0 rfr a2, f0 retw.n .literal .LC1, 1078523331 test1: entry sp, 32 call8 foo l32r a8, .LC1 wfr f1, a10 wfr f0, a8 mul.s f0, f1, f0 rfr a2, f0 retw.n gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_register_move_cost): Add appropriate move costs between AR_REGS and FP_REGS.
2025-05-11	RISC-V: Support RISC-V Profiles 20/22.	Jiawei	1	-0/+2
	This patch introduces support for RISC-V Profiles RV20 and RV22 [1], enabling developers to utilize these profiles through the -march option. [1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0 Version log: Using lowercase letters to present Profiles. Using '_' as divsor between Profiles and other RISC-V extension. Add descriptions in invoke.texi. Checking if there exist '_' between Profiles and additional extensions. Using std::string to avoid memory problems. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (struct riscv_profiles): New struct. (riscv_subset_list::parse_profiles): New parser. (riscv_subset_list::parse_base_ext): Ditto. * config/riscv/riscv-subset.h: New def. * doc/invoke.texi: New option descriptions. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-49.c: New test. * gcc.target/riscv/arch-50.c: New test. * gcc.target/riscv/arch-51.c: New test. * gcc.target/riscv/arch-52.c: New test.
2025-05-11	x86: Change dest to src in replace_vector_const	H.J. Lu	1	-3/+3
	Replace rtx dest = SET_SRC (set); with rtx src = SET_SRC (set); in replace_vector_const to avoid confusion. PR target/92080 PR target/117839 * config/i386/i386-features.cc (replace_vector_const): Change dest to src. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-05-10	i386: Fix some problems in stv cost model	Jan Hubicka	4	-48/+116
	this patch fixes some of problems with cosint in scalar to vector pass. In particular 1) the pass uses optimize_insn_for_size which is intended to be used by expanders and splitters and requires the optimization pass to use set_rtl_profile (bb) for currently processed bb. This is not done, so we get random stale info about hotness of insn. 2) register allocator move costs are all realtive to integer reg-reg move which has cost of 2, so it is (except for size tables and i386) a latency of instruction multiplied by 2. These costs has been duplicated and are now used in combination with rtx costs which are all based to COSTS_N_INSNS that multiplies latency by 4. Some of vectorizer costing contains COSTS_N_INSNS (move_cost) / 2 to compensate, but some new code does not. This patch adds compensatoin. Perhaps we should update the cost tables to use COSTS_N_INSNS everywher but I think we want to first fix inconsistencies. Also the tables will get optically much longer, since we have many move costs and COSTS_N_INSNS is a lot of characters. 3) variable m which decides how much to multiply integer variant (to account that with -m32 all 64bit computations needs 2 instructions) is declared unsigned which makes the signed computation of instruction gain to be done in unsigned type and breaks i.e. for division. 4) I added integer_to_sse costs which are currently all duplicationof sse_to_integer. AMD chips are asymetric and moving one direction is faster than another. I will chance costs incremnetally once vectorizer part is fixed up, too. There are two failures gcc.target/i386/minmax-6.c and gcc.target/i386/minmax-7.c. Both test stv on hasswell which no longer happens since SSE->INT and INT->SSE moves are now more expensive. There is only one instruction to convert: Computing gain for chain #1... Instruction gain 8 for 11: {r110:SI=smax(r116:SI,0);clobber flags:CC;} Instruction conversion gain: 8 Registers conversion cost: 8 <- this is integer_to_sse and sse_to_integer Total gain: 0 total gain used to be 4 since the patch doubles the conversion costs. According to agner fog's tables the costs should be 1 cycle which is correct here. Final code gnerated is: vmovd %esi, %xmm0 * latency 1 cmpl %edx, %esi je .L2 vpxor %xmm1, %xmm1, %xmm1 * latency 1 vpmaxsd %xmm1, %xmm0, %xmm0 * latency 1 vmovd %xmm0, %eax * latency 1 imull %edx, %eax cltq movzwl (%rdi,%rax,2), %eax ret cmpl %edx, %esi je .L2 xorl %eax, %eax * latency 1 testl %esi, %esi * latency 1 cmovs %eax, %esi * latency 2 imull %edx, %esi movslq %esi, %rsi movzwl (%rdi,%rsi,2), %eax ret Instructions with latency info are those really different. So the uncoverted code has sum of latencies 4 and real latency 3. Converted code has sum of latencies 4 and real latency 3 (vmod+vpmaxsd+vmov). So I do not quite see it should be a win. There is also a bug in costing MIN/MAX case ABS: case SMAX: case SMIN: case UMAX: case UMIN: /* We do not have any conditional move cost, estimate it as a reg-reg move. Comparisons are costed as adds. / igain += m (COSTS_N_INSNS (2) + ix86_cost->add); /* Integer SSE ops are all costed the same. / igain -= ix86_cost->sse_op; break; Now COSTS_N_INSNS (2) is not quite right since reg-reg move should be 1 or perhaps 0. For Haswell cmov really is 2 cycles, but I guess we want to have that in cost vectors like all other instructions. I am not sure if this is really a win in this case (other minmax testcases seems to make sense). I have xfailed it for now and will check if that affects specs on LNT testers. I will proceed with similar fixes on vectorizer cost side. Sadly those introduces quite some differences in the testuiste (partly triggered by other costing problems, such as one of scatter/gather) gcc/ChangeLog: config/i386/i386-features.cc (general_scalar_chain::vector_const_cost): Add BB parameter; handle size costs; use COSTS_N_INSNS to compute move costs. (general_scalar_chain::compute_convert_gain): Use optimize_bb_for_size instead of optimize_insn_for size; use COSTS_N_INSNS to compute move costs; update calls of general_scalar_chain::vector_const_cost; use ix86_cost->integer_to_sse. (timode_immed_const_gain): Add bb parameter; use optimize_bb_for_size_p. (timode_scalar_chain::compute_convert_gain): Use optimize_bb_for_size_p. * config/i386/i386-features.h (class general_scalar_chain): Update prototype of vector_const_cost. * config/i386/i386.h (struct processor_costs): Add integer_to_sse. * config/i386/x86-tune-costs.h (struct processor_costs): Copy sse_to_integer to integer_to_sse everywhere. gcc/testsuite/ChangeLog: * gcc.target/i386/minmax-6.c: xfail test that pmax is used. * gcc.target/i386/minmax-7.c: xfall test that pmin is used.
2025-05-10	[V2][RISC-V] Synthesize more efficient IOR/XOR sequences	Shreya Munnangi	5	-10/+216
	So mvconst_internal's primary benefit is in constant synthesis not impacting the combine budget in terms of the number of instructions it is willing to combine together at any given time. The downside is mvconst_internal breaks combine's toplevel costing model and as a result many other patterns have to be implemented as define_insn_and_splits rather than the often more natural define_splits. This primarily impacts logical operations where we want to see the constant operand and potentially simplify the logical with other nearby logicals or shifts. We can reduce our reliance on mvconst_internal and generate better code for various cases by generating better initial code for logical operations. So let's assume we have a inclusive-or of a register with a nontrivial constant. Right now we will load the nontrivial constant into a new pseudo (using multiple instructions), then emit a two register source ior operation. For some cases we can just generate the code we want at expansion time. Concretely let's take this testcase: > unsigned long foo(unsigned long src) { return src \| 0x8800000000000007; } Right now we generate this code: > li a5,-15 > slli a5,a5,59 > addi a5,a5,7 > or a0,a0,a5 The first three instructions are synthesizing the constant. The last instruction performs the desired operation. But we can do better: > ori a0,a0,7 > bseti a0,a0,59 > bseti a0,a0,63 Notice how we never even bother to synthesize the constant. IOR/XOR are pretty simple and this patch focuses exclusively on those. We use [x]ori to set whatever low 11 bits we need, then bset/binv for a small number of higher bits. We use the cost of constant synthesis as our budget. We also support a couple special cases. First, we might be able to rotate the source value such that all the bits we want to manipulate are in the low 11 bits. So we rotate the source, manipulate the bits, then rotate things back to where they belong. I didn't see this trigger in spec, but I did trivially find a testcase where it was likely faster. Second, we can have cases where we want to invert most of the bits, but a small number are supposed to be preserved. We can pre-flip the bits we want to preserve with binv, then invert the whole register with not (which puts the bits to be preserved back in their original state). I suspect there are likely a few more cases that could be improved, but the patch should stand on its own now and getting it out of the way allows us to focus on logical AND which is far tougher, but also more important in the task of removing mvconst_internal. As we're not removing mvconst_internal yet, this patch is mostly a nop. I did look at spec before/after and didn't see anything particular interesting. I also temporarily removed mvconst_internal and looked at spec before/after to hopefully ensure we weren't missing anything obvious in the XOR/IOR cases. Obviously that latter test showed all kinds of regressions with AND. We're still working through implementation details on the AND case and determining what bridge patterns we're going to need to ensure we don't regress. But this XOR/IOR patch is in good enough shape that it can go forward now. Naturally this has been run through my tester (bootstrap & regression test is in flight, but won't finish for many more hours). Obviously I'm quite interested in anything spit out by the pre-commit CI system. gcc/ * config/riscv/iterators.md (OPTAB): New iterator. * config/riscv/predicates.md (arith_or_zbs_operand): Remove. (reg_or_const_int_operand): New predicate. * config/riscv/riscv-protos.h (synthesize_ior_xor): Prototype. * config/riscv/riscv.cc (synthesize_ior_xor): New function. * config/riscv/riscv.md (ior/xor expander): Use synthesize_ior_xor. gcc/testsuite/ * gcc.target/riscv/ior-synthesis-1.c: New test. * gcc.target/riscv/ior-synthesis-2.c: New test. * gcc.target/riscv/xor-synthesis-1.c: New test. * gcc.target/riscv/xor-synthesis-2.c: New test. * gcc.target/riscv/xor-synthesis-3.c: New test. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2025-05-10	i386/cygming: Decrease default preferred stack boundary for 32-bit targets	LIU Hao	2	-5/+13
	This commit decreases the default preferred stack boundary to 4. In i386-options.cc, there's ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY; which sets the default incoming stack boundary to this value, if it's not overridden by other options or attributes. Previously, GCC preferred 16-byte alignment like other platforms, unless `-miamcu` was specified. However, the Microsoft x86 ABI only requires the stack be aligned to 4-byte boundaries. Callback functions from MSVC code may break this assumption by GCC (see reference below), causing local variables to be misaligned. For compatibility reasons, when the attribute `force_align_arg_pointer` is attached to a function, it continues to ensure the stack is at least aligned to a 16-byte boundary, as the documentation seems to suggest. After this change, `STACK_REALIGN_DEFAULT` no longer has an effect on this target, so it is removed. Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107#c9 Signed-off-by: LIU Hao <lh_mouse@126.com> Signed-off-by: Jonathan Yong <10walls@gmail.com> gcc/ChangeLog: PR target/111107 * config/i386/cygming.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Override definition from i386.h. (STACK_REALIGN_DEFAULT): Undefine, as it no longer has an effect. * config/i386/i386.cc (ix86_update_stack_boundary): Force minimum 128-bit alignment if `force_align_arg_pointer`.
2025-05-10	[PATCH v2] RISC-V: Use vclmul for CRC expansion if available	Anton Blanchard	2	-21/+92
	If the vector version of clmul (vclmul) is available and the scalar one is not, use it for CRC expansion. gcc/ * config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): Check TARGET_ZVBC. * config/riscv/riscv.cc (expand_crc_using_clmul): Emit code using vclmul if TARGET_ZVBC. gcc/testsuite * gcc.target/riscv/rvv/base/crc-builtin-zvbc.c: New test.
2025-05-09	AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated ↵	Jennifer Schmitz	1	-0/+17
	instructions. SVE loads and stores where the predicate is all-true can be optimized to unpredicated instructions. For example, svuint8_t foo (uint8_t x) { return svld1 (svptrue_b8 (), x); } was compiled to: foo: ptrue p3.b, all ld1b z0.b, p3/z, [x0] ret but can be compiled to: foo: ldr z0, [x0] ret Late_combine2 had already been trying to do this, but was missing the instruction: (set (reg/i:VNx16QI 32 v0) (unspec:VNx16QI [ (const_vector:VNx16BI repeat [ (const_int 1 [0x1]) ]) (mem:VNx16QI (reg/f:DI 0 x0 [orig:106 x ] [106]) [0 MEM <svuint8_t> [(unsigned char )x_2(D)]+0 S[16, 16] A8]) ] UNSPEC_PRED_X)) This patch adds a new define_insn_and_split that matches the missing instruction and splits it to an unpredicated load/store. Because LDR offers fewer addressing modes than LD1[BHWD], the pattern is guarded under reload_completed to only apply the transform once the address modes have been chosen during RA. The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * config/aarch64/aarch64-sve.md (aarch64_sve_ptrue<mode>_ldr_str): Add define_insn_and_split to fold predicated SVE loads/stores with ptrue predicates to unpredicated instructions. gcc/testsuite/ gcc.target/aarch64/sve/ptrue_ldr_str.c: New test. * gcc.target/aarch64/sve/acle/general/attributes_6.c: Adjust expected outcome. * gcc.target/aarch64/sve/cost_model_14.c: Adjust expected outcome. * gcc.target/aarch64/sve/cost_model_4.c: Adjust expected outcome. * gcc.target/aarch64/sve/cost_model_5.c: Adjust expected outcome. * gcc.target/aarch64/sve/cost_model_6.c: Adjust expected outcome. * gcc.target/aarch64/sve/cost_model_7.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_mf8.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Adjust expected outcome. * gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Adjust expected outcome. * gcc.target/aarch64/sve/peel_ind_2.c: Adjust expected outcome. * gcc.target/aarch64/sve/single_1.c: Adjust expected outcome. * gcc.target/aarch64/sve/single_2.c: Adjust expected outcome. * gcc.target/aarch64/sve/single_3.c: Adjust expected outcome. * gcc.target/aarch64/sve/single_4.c: Adjust expected outcome.
2025-05-08	aarch64: Fix up commutative and early-clobber markers on compact insns	Richard Earnshaw	3	-93/+93
	For constraints there are operand modifiers and constraint qualifiers. Operand modifiers apply to all alternatives and must appear, in traditional syntax before the first alternative. Constraint qualifiers, on the other hand must appear in each alternative to which they apply. There's no easy way to validate the distinction in the traditional md format, but when using the new compact format we can enforce some semantic checking of these characters to avoid some potentially surprising code generation. Fortunately, all of these errors are benign, but the two misplaced early-clobber markers were quite suspicious at first sight - it's only by luck that the second alternative does not need an early-clobber. The syntax checking will be added in the following patch, but first of all, fix up the errors in aarch64.md. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_<optab><mode>): Move commutative marker to the cons specification. (add<mode>3): Likewise. (@aarch64_pred_<su>abd<mode>): Likewise. (@aarch64_pred_<optab><mode>): Likewise. (cond_<optab><mode>_z): Likewise. (<optab><mode>3): Likewise. (@aarch64_pred_<optab><mode>): Likewise. (aarch64_pred_abd<mode>_relaxed): Likewise. (aarch64_pred_abd<mode>_strict): Likewise. (@aarch64_pred_<optab><mode>): Likewise. (@aarch64_pred_<optab><mode>): Likewise. (@aarch64_pred_fma<mode>): Likewise. (@aarch64_pred_fnma<mode>): Likewise. (@aarch64_pred_<optab><mode>): Likewise. config/aarch64/aarch64-sve2.md (@aarch64_sve_<su>clamp<mode>): Move commutative marker to the cons specification. (aarch64_sve_<su>clamp<mode>_x): Likewise. (@aarch64_sve_fclamp<mode>): Likewise. (aarch64_sve_fclamp<mode>_x): Likewise. (aarch64_sve2_nor<mode>): Likewise. (aarch64_sve2_nand<mode>): Likewise. (aarch64_pred_faminmax_fused): Likewise. config/aarch64/aarch64.md (loadwb_pre_pair_<ldst_sz>): Move the early-clobber marker to the relevant alternative. (storewb_pre_pair_<ldst_sz>): Likewise. (add<mode>3_aarch64): Move commutative marker to the cons specification. (addsi3_aarch64_uxtw): Likewise. (add<mode>3_poly_1): Likewise. (add<mode>3_compare0): Likewise. (addsi3_compare0_uxtw): Likewise. (add<mode>3nr_compare0): Likewise. (<optab><mode>3): Likewise. (<optab>si3_uxtw): Likewise. (and<mode>3_compare0): Likewise. (andsi3_compare0_uxtw): Likewise. (@aarch64_and<mode>3nr_compare0): Likewise.
2025-05-07	Canonicalize vec_merge in simplify_ternary_operation	Pengxuan Zheng	3	-4/+17
	Similar to the canonicalization done in combine, we canonicalize vec_merge with swap_communattive_operands_p in simplify_ternary_operation too. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_exact_log2_inverse): New. * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>): Update pattern accordingly. * config/aarch64/aarch64.cc (aarch64_exact_log2_inverse): New. * simplify-rtx.cc (simplify_context::simplify_ternary_operation): Canonicalize vec_merge. Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
2025-05-07	[RISC-V][PR target/120137][PR target/120154] Don't create out-of-range ↵	Jeff Law	1	-1/+19
	permutation constants To make hashing sensible we canonicalize constant vectors in the hash table so that their first entry always has the value zero. That normalization can result in a value that can't be represented in the element mode. So before entering anything into the hash table we need to verify the normalized entries will fit into the element's mode. This fixes both 120137 and its duplicate 120154. This has been tested in my tester. I'm just waiting for the pre-commit tester to render its verdict. PR target/120137 PR target/120154 gcc/ * config/riscv/riscv-vect-permconst.cc (process_bb): Verify each canonicalized element fits into the vector element mode. gcc/testsuite/ * gcc.target/riscv/pr120137.c: New test. * gcc.target/riscv/pr120154.c: New test.
2025-05-07	[PATCH] RISC-V: Minimal support for zama16b extension.	Dongyan Chen	1	-0/+2
	This patch support zama16b extension[1]. To enable GCC to recognize and process zama16b extension correctly at compile time. [1] https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc gcc/ChangeLog: * common/config/riscv/riscv-common.cc: New extension. * config/riscv/riscv.opt: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-48.c: New test.
2025-05-07	arm: select CCFPEmode for LTGT [PR91323]	Richard Earnshaw	1	-1/+1
	Besides Arm, there are three other ports that define both CCFPmode and CCFPEmode. AArch64 and Sparc return CCFPEmode for LTGT; the other, Visium, doesn't support LTGT at all. AArch64 was changed in r8-5286-g8332c5ee8c5f3b, and Sparc with r10-2926-g000a5f8d23c04c. I suspect this issue is latent on Arm because cbranch?f4 and cstore?f4 reject LTGT and UNEQ and we fall back to a generic expansion which happens to work. Nevertheless, this patch updates the relevant bits of the Arm port to match the specification introduced in r10-2926-g000a5f8d23c04c. gcc/ChangeLog: PR target/91323 * config/arm/arm.cc (arm_select_cc_mode): Use CCFPEmode for LTGT.
2025-05-07	arm: Only reverse FP inequalities when -ffinite-math-only [PR110796...]	Richard Earnshaw	2	-2/+8
	On Arm we have been failing to fully implement support for IEEE NaNs in inequality comparisons because we have allowed reversing of inequalities in a way that allows SELECT_CC_MODE to produce different answers. For example, the reverse of GT is UNLE, but if we pass these two RTL codes to SELECT_CC_MODE, the former will return CCFPEmode, while the latter CCFPmode. It would be possible to allow fully reversible FPmodes, but to do so would involve adding yet more RTL codes, something like NOT_GT and NOT_UNLE, for the cases we cannot currently reverse. NOT_GT would then have the same condition code generation as UNLT, but the same mode selection as GT. In the mean time, we need to restrict REVERSIBLE_CC_MODE to non-floating modes unless we are compiling with -ffinite-math-only. In that case we can continue to reverse the comparisons, but now we want to always select CCFPmode as there's no need to consider the exception raising cases. PR target/110796 PR target/118446 gcc/ChangeLog: * config/arm/arm.h (REVERSIBLE_CC_MODE): FP modes are only reversible if flag_finite_math_only. * config/arm/arm.cc (arm_select_cc_mode): Return CCFPmode for all FP comparisons if flag_finite_math_only. gcc/testsuite/ChangeLog: * gcc.target/arm/armv8_2-fp16-arith-1.c: Adjust due to no-longer emitting VCMPE when -ffast-math..
2025-05-07	i386: implement costs for float<->int conversions in ↵	Jan Hubicka	1	-11/+39
	ix86_vector_costs::add_stmt_cost This patch adds pattern matching for float<->int conversions both as normal statements and promote_demote. While updating promote_demote I noticed that in cleanups I turned "stmt_cost =" into "int stmt_cost = " which turned the existing FP costing to NOOP. I also added comment on how demotes are done when turning i.e. 32bit into 8bit value (which is the case of pr19919.c). The patch disables vectorization in pr119919.c on generic tuning, but keeps it at both zen and skylake+. The underlying problem is bad cost of open-coded scatter which is tracked by 119902 so I simply added -mtune=znver1 so the testcase keeps testing vectorization. gcc/ChangeLog: * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Add FLOAT_EXPR; FIX_TRUNC_EXPR and vec_promote_demote costs. gcc/testsuite/ChangeLog: * gcc.target/i386/pr119919.c: Add -mtune=znver1
2025-05-07	AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.	Jennifer Schmitz	3	-14/+124
	SVE loads/stores using predicates that select the bottom 8, 16, 32, 64, or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding the predicate. For example, svuint8_t foo (uint8_t x) { return svld1 (svwhilelt_b8 (0, 16), x); } was previously compiled to: foo: ptrue p3.b, vl16 ld1b z0.b, p3/z, [x0] ret and is now compiled to: foo: ldr q0, [x0] ret The optimization is applied during the expand pass and was implemented by making the following changes to maskload<mode><vpred> and maskstore<mode><vpred>: - the existing define_insns were renamed and new define_expands for maskloads and maskstores were added with nonmemory_operand as predicate such that the SVE predicate matches both register operands and constant-vector operands. - if the SVE predicate is a constant vector and contains a pattern as described above, an ASIMD load/store is emitted instead of the SVE load/store. The patch implements the optimization for LD1 and ST1, for 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit moves, for all full SVE data vector modes. Follow-up patches for LD2/3/4 and ST2/3/4 and potentially partial SVE vector modes are planned. The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ PR target/117978 config/aarch64/aarch64-protos.h: Declare aarch64_emit_load_store_through_mode and aarch64_sve_maskloadstore. * config/aarch64/aarch64-sve.md (maskload<mode><vpred>): New define_expand folding maskloads with certain predicate patterns to ASIMD loads. (aarch64_maskload<mode><vpred>): Renamed from maskload<mode><vpred>. (maskstore<mode><vpred>): New define_expand folding maskstores with certain predicate patterns to ASIMD stores. (aarch64_maskstore<mode><vpred>): Renamed from maskstore<mode><vpred>. * config/aarch64/aarch64.cc (aarch64_emit_load_store_through_mode): New function emitting a load/store through subregs of a given mode. (aarch64_emit_sve_pred_move): Refactor to use aarch64_emit_load_store_through_mode. (aarch64_expand_maskloadstore): New function to emit ASIMD loads/stores for maskloads/stores with SVE predicates with VL1, VL2, VL4, VL8, or VL16 patterns. (aarch64_partial_ptrue_length): New function returning number of leading set bits in a predicate. gcc/testsuite/ PR target/117978 * gcc.target/aarch64/sve/acle/general/whilelt_5.c: Adjust expected outcome. * gcc.target/aarch64/sve/ldst_ptrue_pat_128_to_neon.c: New test. * gcc.target/aarch64/sve/while_7.c: Adjust expected outcome. * gcc.target/aarch64/sve/while_9.c: Adjust expected outcome.
2025-05-07	s390: Add cstoreti4 expander	Stefan Schulze Frielinghaus	4	-3/+114
	For target VXE3 just emit a 128-bit comparison followed by a conditional load. For targets prior VXE3, emulate the 128-bit comparison and make use of a conditional load, too. gcc/ChangeLog: * config/s390/s390-protos.h (s390_expand_cstoreti4): New function. * config/s390/s390.cc (s390_expand_cstoreti4): New function. * config/s390/s390.md (CC_SUZ): New mode iterator. (l): New mode attribute. (cc_tolower): New mode attribute. * config/s390/vector.md (cstoreti4): New expander. (vec_cmpv2di_lane0_<cc_tolower>): New insn. (vec_cmpti_<cc_tolower>): New insn. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/cstoreti-1.c: New test. * gcc.target/s390/vector/cstoreti-2.c: New test.
2025-05-07	x86: Insert extra move for mode size smaller than natural size	H.J. Lu	1	-4/+35
	When generating a SUBREG from V16QI to V2HF, validate_subreg fails since V2HF is a floating point vector and its size (4 bytes) is smaller than its natural size (word size). Insert an extra move with a QI vector SUBREG of the same size to avoid validate_subreg failure. gcc/ PR target/120036 * config/i386/i386-features.cc (ix86_get_vector_load_mode): Handle 8/4/2 bytes. (remove_redundant_vector_load): If the mode size is smaller than its natural size, first insert an extra move with a QI vector SUBREG of the same size to avoid validate_subreg failure. gcc/testsuite/ PR target/120036 * g++.target/i386/pr120036.C: New test. * gcc.target/i386/pr117839-3a.c: Likewise. * gcc.target/i386/pr117839-3b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-05-06	[RISC-V] Avoid unnecessary andi with -1 argument	Jeff Law	1	-5/+15
	I was preparing to do some testing of Shreya's next patch on spec and stumbled across another "andi dst,src,-1" case. I fixed some stuff like this in the gcc-15 cycle, but this one slipped through. It's probably about 100M instructions on deepsjeng. So tiny, but there's no good reason to leave the clearly extraneous instructions in the output. As with the other cases, it's a post-reload splitter that's not being careful enough about the code it generates. This has gone through my tester successfully. Waiting on the pre-commit tester before going forward. gcc/ * config/riscv/riscv.md (branch<ANYI:mode>_shiftedarith_equals_zero): Avoid generating unnecessary andi. Fix formatting. gcc/testsuite g++.target/riscv/redundant-andi.C: New test.
2025-05-06	[PATCH] RISC-V: Recognized svadu and svade extension	Mingzhu Yan	1	-0/+4
	This patch support svadu and svade extension. To enable GCC to recognize and process svadu and svade extension correctly at compile time. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_ext_version_table): New extension. (riscv_ext_flag_table) Ditto. * config/riscv/riscv.opt: New mask. * doc/invoke.texi (RISC-V Options): New extension gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-45.c: New test. * gcc.target/riscv/arch-46.c: New test.
2025-05-07	i386: Add costs for integer<->float conversions	Jan Hubicka	3	-0/+158
	Extend ix86_rtx_costs to cost FLOAT, UNSIGNED_FLOAT, FIX, and UNSIGNED_FIX. There are many variants of integer<->float conversions and it seems meaningful to start with the typical scalar and vector ones. On modern CPUs the variants differs by at most 1 cycle. gcc/ChangeLog: * config/i386/i386.cc (ix86_rtx_costs): Cost FLOAT, UNSIGNED_FLOAT, FIX, UNSIGNED_FIX. * config/i386/i386.h (struct processor_costs): Add cvtsi2ss, cvtss2si, cvtpi2ps, cvtps2pi. * config/i386/x86-tune-costs.h (struct processor_costs): Update tables.
2025-05-06	[RISC-V][PR middle-end/114512] Recognize more bext idioms for RISC-V	Shreya Munnangi	2	-0/+78
	This is Shreya's next chunk of work. When I was looking for good bugs for her to chase down I cam across PR114512. While the bug isn't necessarily a RISC-V specific bug, its testcases did show how we were failing to recognize certain bit extraction idioms and how the lispy nature of RTL allows us to tackle these issues in the combiner. First, the bit position may be masked. The RISC-V port does not define SHIFT_COUNT_TRUNCATED for valid reasons. So if we want to optimize away a mask that matches what the hardware will do, we need suitable insns that include that explicit masking. In addition to needing to incorporate masking, the masking may happen in a subword mode. So we need to recognize the mask wrapped in a zero extension. Those two captured the most common cases. We can also have a single bit extraction implemented as a left shift of the bit into the sign bit, then a right shift by the size of a word - 1. These are less common, but we did cover the case derived from the upstream bug report as well as one class seen reviewing the instruction stream for spec2017. Finally, extracting a single bit at a variable position from a constant as seen with some regularity in spec2017. In that scenario, combine's chosen split point wasn't ideal (I forget what it selected, but it definitely wasn't helpful). So we've got a new splitter for this case as well. Earlier versions of this have gone through my tester as well as a bootstrap and regression cycle. This version has just gone through a cycle in my tester (but missed today's bootstrap cycle). Waiting on the upstream pre-commit tester to render its verdict, but the plan is to commit on Shreya's behalf once that's clean. Co-authored-by: Jeff Law <jlaw@ventanamicro.com> PR middle-end/114512 gcc/ * config/riscv/bitmanip.md (bext* patterns): New patterns for bext recognition plus splitter for extracting variable bit from a constant. * config/riscv/predicates.md (bitpos_mask_operand): New predicate. gcc/testsuite/ * gcc.target/riscv/pr114512.c: New test.
2025-05-06	RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost	Pan Li	3	-1/+61
	This patch would like to combine the vec_duplicate + vadd.vv to the vadd.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR, it will: * The pattern matching will be active by default. * The cost of GR2VR will be added to the total cost of pattern, aka: vec_dup cost = gr2vr_cost vadd.vv v, (vec_dup (x)) = gr2vr_cost + 1 Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_VX_BINARY(T, OP) \ void \ test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ out[i] = in[i] OP x; \ } DEF_VX_BINARY(int32_t, +) Before this patch: 10 │ test_binary_vx_add: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma // Deleted if GR2VR cost zero 13 │ vmv.v.x v2,a2 // Ditto. 14 │ slli a3,a3,32 15 │ srli a3,a3,32 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma 18 │ vle32.v v1,0(a1) 19 │ slli a4,a5,2 20 │ sub a3,a3,a5 21 │ add a1,a1,a4 22 │ vadd.vv v1,v2,v1 23 │ vse32.v v1,0(a0) 24 │ add a0,a0,a4 25 │ bne a3,zero,.L3 After this patch: 10 │ test_binary_vx_add: 11 │ beq a3,zero,.L8 12 │ slli a3,a3,32 13 │ srli a3,a3,32 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma 16 │ vle32.v v1,0(a1) 17 │ slli a4,a5,2 18 │ sub a3,a3,a5 19 │ add a1,a1,a4 20 │ vadd.vx v1,v1,a2 21 │ vse32.v v1,0(a0) 22 │ add a0,a0,a4 23 │ bne a3,zero,.L3 The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec-opt.md (<optab>_vx_<mode>): Add new combine to convert vec_duplicate + vadd.vv to vaddvx on GR2VR cost. config/riscv/riscv.cc (riscv_rtx_costs): Take care of the cost when vec_dup and vadd v, vec_dup(x). * config/riscv/vector-iterators.md: Add new iterator for vx. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-06	RISC-V: Add gr2vr cost helper function	Pan Li	3	-3/+19
	After we introduced the --param=gpr2vr-cost option to set the cost value of when operation act from gpr to vr, we would like to introduce a new helper function to get the cost of gp2vr. And then make sure all reference to gr2vr should go this helper function. The helper function will pick up the GR2VR value if the above option is not provided, or the default GR2VR will be returned. gcc/ChangeLog: * config/riscv/riscv-protos.h (get_gr2vr_cost): Add new decl to get the cost of gr2vr. * config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost): Leverage the helper function to get the cost of gr2vr. * config/riscv/riscv.cc (riscv_register_move_cost): Ditto. (riscv_builtin_vectorization_cost): Ditto. (get_gr2vr_cost): Add new impl of the helper function. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-06	RISC-V: Add new option --param=gpr2vr-cost= for rvv insn	Pan Li	2	-0/+6
	During investigate the combine from vec_dup and vop.vv into vop.vx, we need to depend on the cost of the insn operate from the gpr to vr, for example, vadd.vx. Thus, for better control and test, we introduce a new option, aka below: --param=gpr2vr-cost=<unsigned int> To specific the cost value of the insn that operate from the gpr to vr. gcc/ChangeLog: * config/riscv/riscv-opts.h (RVV_GR2VR_COST_UNPROVIDED): Add new macro to indicate the param is not provided. * config/riscv/riscv.opt: Add new option --pararm=gpr2vr-cost. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-06	Fix i386 bootstrap on non-Windows platforms	Jan Hubicka	1	-0/+2
	* config/i386/i386.cc (ix86_tls_index): Add ifdef.
2025-05-06	Allow a PCH to be mapped to a different address	LIU Hao	1	-17/+15
	First, try mapping the PCH to its original address. If that fails, try letting the system choose one; the PCH can be relocated thereafter. Reference: https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594556.html 2022-05-11 LIU Hao <lh_mouse@126.com> Signed-off-by: Jonathan Yong <10walls@gmail.com> PR pch/14940 gcc/ChangeLog: * config/i386/host-mingw32.cc (mingw32_gt_pch_use_address): Replace the loop that attempted to map the PCH only to its original address with more adaptive operations