path: root/gcc/config
Age | Commit message | Author | Files | Lines
3 days | AVR: avr.opt.urls: Add -mfuse-move2 | Georg-Johann Lay | 1 | -0/+3

PR rtl-optimization/121340

gcc/
    * config/avr/avr.opt.urls (-mfuse-move2): Add url.

3 days | AVR: Set .type of jump table label. | Georg-Johann Lay | 1 | -0/+7

gcc/
    * config/avr/avr.cc (avr_output_addr_vec) <labl>: Asm out its .type.

3 days | AVR: rtl-optimization/121340 - New mini-pass to undo superfluous moves from insn combine. | Georg-Johann Lay | 4 | -0/+152

Insn combine may come up with superfluous reg-reg moves, where the combine people say that these are no problem since reg-alloc is supposed to optimize them. The issue is that the lower-subreg pass sitting between combine and reg-alloc may split such moves, coming up with a zoo of subregs which are only handled poorly by the register allocator. This patch adds a new avr mini-pass that handles such cases.

As an example, take

    int f_ffssi (long x)
    {
        return __builtin_ffsl (x);
    }

where the two functions have the same interface, i.e. there are no extra moves required for the argument or for the return value. However,

    $ avr-gcc -S -Os -dp -mno-fuse-move ...

    f_ffssi:
        mov r20,r22      ;  29  [c=4 l=1]  movqi_insn/0
        mov r21,r23      ;  30  [c=4 l=1]  movqi_insn/0
        mov r22,r24      ;  31  [c=4 l=1]  movqi_insn/0
        mov r23,r25      ;  32  [c=4 l=1]  movqi_insn/0
        mov r25,r23      ;  33  [c=4 l=4]  *movsi/0
        mov r24,r22
        mov r23,r21
        mov r22,r20
        rcall __ffssi2   ;  34  [c=16 l=1]  *ffssihi2.libgcc
        ret              ;  37  [c=0 l=1]  return

where all the moves add up to a no-op. The -mno-fuse-move option stops any attempts by the avr backend to clean up that mess.

PR rtl-optimization/121340

gcc/
    * config/avr/avr.opt (-mfuse-move2): New option.
    * config/avr/avr-passes.def (avr_pass_2moves): Insert after combine.
    * config/avr/avr-passes.cc (make_avr_pass_2moves): New function.
    (pass_data avr_pass_data_2moves): New static variable.
    (avr_pass_2moves): New rtl_opt_pass.
    * config/avr/avr-protos.h (make_avr_pass_2moves): New proto.
    * common/config/avr/avr-common.cc (default_options avr_option_optimization_table) <-mfuse-move2>: Set for -O1 and higher.
    * doc/invoke.texi (AVR Options) <-mfuse-move2>: Document.

3 days | libgcc: Update FMV features to latest ACLE spec 2024Q4 | Wilco Dijkstra | 1 | -6/+6

Update FMV features to latest ACLE spec of 2024Q4 - several features have been removed or merged. Add FMV support for CSSC and MOPS. Preserve the ordering in enum CPUFeatures.

gcc:
    * common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC and FEAT_MOPS.
    * config/aarch64/aarch64-option-extensions.def: Remove FMV support for RPRES, use PULL rather than AES, add FMV support for CSSC and MOPS.

libgcc:
    * config/aarch64/cpuinfo.c (__init_cpu_features_constructor): Remove unused features, add support for CSSC and MOPS.

3 days | AArch64: Use correct cost for shifted halfword load/stores | Wilco Dijkstra | 1 | -1/+1

Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero for these.

gcc:
    * config/aarch64/tuning_models/generic_armv9_a.h (generic_armv9_a_addrcost_table): Use zero cost for himode.

3 days | i386: Fix typo in diagnostic about simultaneous regparm and thiscall use | Artemiy Granat | 1 | -1/+1

gcc/ChangeLog:
    * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Fix typo.

3 days | i386: Fix incorrect handling of simultaneous regparm and thiscall use | Artemiy Granat | 1 | -0/+4

gcc/ChangeLog:
    * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Handle simultaneous use of regparm and thiscall attributes in case when regparm is set before thiscall.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/attributes-error.c: Add more attributes combinations.

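For illustration (not taken from the patch; the declaration is invented), this is the kind of ia32 attribute combination the change now diagnoses even though regparm appears first:

    /* Hypothetical ia32 declaration: regparm written before thiscall,
       the ordering that previously slipped through without a diagnostic.  */
    void __attribute__ ((regparm (1), thiscall)) method (void *self);
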
3 days | i386: Fix incorrect comment about stdcall and fastcall compatibility | Artemiy Granat | 1 | -3/+2

gcc/ChangeLog:
    * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Fix comments which state that combination of stdcall and fastcall attributes is valid but redundant.

3 days | i386: Ignore regparm attribute and warn for it in 64-bit mode | Artemiy Granat | 1 | -12/+12

The regparm attribute does not affect code generation on the x86-64 target. Despite this, regparm was accepted silently, unlike the other calling convention attributes handled in the ix86_handle_cconv_attribute function. Due to the lack of diagnostics, the Linux kernel attempted to specify regparm(0) on the vmread_error_trampoline declaration, which is supposed to be invoked with all arguments on the stack: https://lore.kernel.org/all/20220928232015.745948-1-seanjc@google.com/

To produce a warning for regparm in 64-bit mode, simply move the block that produces diagnostics above the block that handles the regparm attribute.

gcc/ChangeLog:
    * config/i386/i386-options.cc (ix86_handle_cconv_attribute): Move 64-bit mode check before regparm handling.

gcc/testsuite/ChangeLog:
    * g++.dg/abi/regparm1.C: Require ia32 target.
    * gcc.target/i386/20020224-1.c: Likewise.
    * gcc.target/i386/pr103785.c: Use regparm attribute only if not in 64-bit mode.
    * gcc.target/i386/pr36533.c: Likewise.
    * gcc.target/i386/pr59099.c: Likewise.
    * gcc.target/i386/sibcall-8.c: Likewise.
    * gcc.target/i386/sw-1.c: Likewise.
    * gcc.target/i386/pr15184-2.c: Fix invalid comment.
    * gcc.target/i386/attributes-ignore.c: New test.

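As a hedged illustration (the function name is made up), this is the shape of code that is now warned about instead of being accepted silently when compiled for x86-64:

    /* With this change, compiling for x86-64 warns that regparm is
       ignored; on ia32 the attribute still takes effect as before.  */
    void __attribute__ ((regparm (0))) trampoline_entry (void);
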
3 days | aarch64: Stop using sys/ifunc.h header in libatomic and libgcc | Yury Khrustalev | 1 | -0/+12

This optional header is used to bring in the definition of the struct __ifunc_arg_t type. Since it has been added to glibc only recently, the previous implementation had to check whether this header is present and, if not, provide its own definition. This creates dead code because either one of these two parts would not be tested.

The ABI specification for ifunc resolvers allows creating one's own ABI-compatible definition for this type, which is the right way of doing it. In addition to improving consistency, the new approach also helps with the addition of new fields to the struct __ifunc_arg_t type without the need to work around situations when the definition imported from the header lacks these new fields. The ABI allows defining as many hwcap fields in this struct as needed, provided that at runtime we only access the fields that are permitted by the _size value.

gcc/
    * config/aarch64/aarch64.cc (build_ifunc_arg_type): Add new fields _hwcap3 and _hwcap4.

libatomic/
    * config/linux/aarch64/host-config.h (__ifunc_arg_t): Remove sys/ifunc.h and add new fields _hwcap3 and _hwcap4.

libgcc/
    * config/aarch64/cpuinfo.c (__ifunc_arg_t): Likewise.
    (__init_cpu_features): Obtain and assign values for the fields _hwcap3 and _hwcap4.
    (__init_cpu_features_constructor): Check _size in the arg argument.

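A minimal sketch of such a local, ABI-compatible definition, assuming the conventional glibc field names; the accessor is an invented helper showing the _size-guarded access described above:

    #include <stddef.h>

    /* Local definition compatible with the ifunc-resolver ABI; _hwcap3
       and _hwcap4 are the newly added fields.  */
    typedef struct __ifunc_arg_t
    {
      unsigned long _size;    /* Size of the struct passed by the caller.  */
      unsigned long _hwcap;
      unsigned long _hwcap2;
      unsigned long _hwcap3;
      unsigned long _hwcap4;
    } __ifunc_arg_t;

    /* Hypothetical helper: only read _hwcap3 if the caller's struct is
       large enough to contain it, as the _size protocol requires.  */
    static unsigned long
    safe_hwcap3 (const __ifunc_arg_t *arg)
    {
      if (arg->_size >= offsetof (__ifunc_arg_t, _hwcap3) + sizeof (unsigned long))
        return arg->_hwcap3;
      return 0;
    }
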
3 days | rs6000: Avoid undefined behavior caused by overflow and invalid shifts | Kishan Parmar | 2 | -11/+16

While building GCC with --with-build-config=bootstrap-ubsan on powerpc64le-unknown-linux-gnu, multiple UBSAN runtime errors were encountered in rs6000.cc and rs6000.md due to undefined behavior involving left shifts on negative values and shift exponents equal to or exceeding the type width.

The issue was in bit pattern recognition code (in can_be_rotated_to_negative_lis and can_be_built_by_li_and_rldic), where signed values were shifted without handling negative inputs or guarding against shift counts equal to the type width, causing UB.

The fix ensures shifts and rotations are done on unsigned HOST_WIDE_INT, casting back only where needed (like for arithmetic right shifts), with proper guards to prevent shift-by-64.

2025-07-31  Kishan Parmar  <kishan@linux.ibm.com>

gcc:
    PR target/118890
    * config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Avoid left shift of negative value and guard shift count.
    (can_be_built_by_li_and_rldic): Likewise.
    (rs6000_emit_set_long_const): Likewise.
    * config/rs6000/rs6000.md (splitter for plus into two 16-bit parts): Fix UB from overflow in addition.

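A minimal sketch of the pattern described (plain C stand-ins, not the actual rs6000.cc code): the value is converted to the unsigned type before shifting, and a rotate count of zero is handled separately so no shift ever equals the type width:

    /* Rotate a 64-bit value left by N (0..63) without invoking UB.  */
    static unsigned long long
    rotate_left_64 (long long value, unsigned int n)
    {
      unsigned long long uv = (unsigned long long) value;  /* No signed shifts.  */
      if (n == 0)
        return uv;                 /* Avoid the undefined 64-bit shift below.  */
      return (uv << n) | (uv >> (64 - n));
    }
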
3 days | Add checks for node in aarch64 vector cost modeling | Richard Biener | 1 | -1/+3

After removing STMT_VINFO_MEMORY_ACCESS_TYPE we now ICE when costing scalar stmts required in the epilog, since the cost model tries to pattern-match gathers (an earlier patch tried to improve this by introducing stmt groups, but that was put on hold due to negative feedback). The following short-cuts those attempts when node is NULL, as that then cannot be a vector stmt. Another possibility would be to gate on vect_body, or restructure everything. Note we now ensure that when m_costing_for_scalar, node is NULL.

    * config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype): Check for node before dereferencing.
    (aarch64_vector_costs::add_stmt_cost): Likewise.

3 days | aarch64: Prevent streaming-compatible code from assembler rejection [PR121028] | Spencer Abson | 1 | -2/+10

Streaming-compatible functions can be compiled without SME enabled, but need to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming state of a callee. These switches are conditional on the current mode being opposite to the target mode, so no SME instructions are executed if SME is not available.

However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call from a streaming-compatible function, compiled without SME enabled, to a non-streaming function will be rejected as:

    Error: selected processor does not support `smstop sm'

To work around this, we make use of the .inst directive to insert the literal encodings of "SMSTART SM" and "SMSTOP SM".

gcc/ChangeLog:
    PR target/121028
    * config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst directive if !TARGET_SME.
    (aarch64_smstop_sm): Likewise.

gcc/testsuite/ChangeLog:
    PR target/121028
    * gcc.target/aarch64/sme/call_sm_switch_1.c: Tell check-function-bodies not to ignore .inst directives, and replace the test for "smstart sm" with one for its encoding.
    * gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise.
    * gcc.target/aarch64/sme/pr121028.c: New test.

3 days | Remove STMT_VINFO_MEMORY_ACCESS_TYPE | Richard Biener | 2 | -15/+16

This should be present only on SLP nodes now. The RISC-V changes are mechanical along the line of the SLP_TREE_TYPE changes.

    * tree-vectorizer.h (_stmt_vec_info::memory_access_type): Remove.
    (STMT_VINFO_MEMORY_ACCESS_TYPE): Likewise.
    (vect_mem_access_type): Likewise.
    * tree-vect-stmts.cc (vectorizable_store): Do not set STMT_VINFO_MEMORY_ACCESS_TYPE. Fix SLP_TREE_MEMORY_ACCESS_TYPE usage.
    * tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove checking of memory access type.
    * config/riscv/riscv-vector-costs.cc (costs::compute_local_live_ranges): Use SLP_TREE_MEMORY_ACCESS_TYPE.
    (costs::need_additional_vector_vars_p): Likewise.
    (segment_loadstore_group_size): Get SLP node as argument, use SLP_TREE_MEMORY_ACCESS_TYPE.
    (costs::adjust_stmt_cost): Pass down SLP node.
    * config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use SLP_TREE_MEMORY_ACCESS_TYPE instead of vect_mem_access_type.
    (aarch64_detect_vector_stmt_subtype): Likewise.
    (aarch64_vector_costs::count_ops): Likewise.
    (aarch64_vector_costs::add_stmt_cost): Likewise.

3 days | Fix comment typos - hanlde -> handle | Jakub Jelinek | 2 | -3/+3

2025-07-31  Jakub Jelinek  <jakub@redhat.com>

    * gimple-ssa-store-merging.cc (find_bswap_or_nop): Fix comment typos, hanlde -> handle.
    * config/i386/i386.cc (ix86_gimple_fold_builtin, ix86_rtx_costs): Likewise.
    * config/i386/i386-features.cc (remove_partial_avx_dependency): Likewise.
    * gcc.target/i386/apx-1.c (apx_hanlder): Rename to ...
    (apx_handler): ... this.
    * gcc.target/i386/uintr-2.c (UINTR_hanlder): Rename to ...
    (UINTR_handler): ... this.
    * gcc.target/i386/uintr-5.c (UINTR_hanlder): Rename to ...
    (UINTR_handler): ... this.

3 days | RISC-V: Adding H to the canonical order [PR121312] | Kito Cheng | 1 | -1/+1

We added H into the canonical order before, but forgot to add it to arch-canonicalize as well...

gcc/ChangeLog:
    PR target/121312
    * config/riscv/arch-canonicalize: Add H extension to the canonical order.

4 days | [x86] factor out worker from ix86_builtin_vectorization_cost | Richard Biener | 1 | -18/+23

The following factors out a worker that gets a mode argument rather than a vectype argument. That makes a difference when we hit the fallback in add_stmt_cost for scalar stmts where vectype might be NULL and thus mode is derived from the scalar stmt there. But ix86_builtin_vectorization_cost does not have access to the stmt. So the patch instead dispatches to the new ix86_default_vector_cost there, passing down the mode we derived from the stmt. This is to avoid regressions with a patch that makes even more scalar stmt costings have a vectype passed.

    * config/i386/i386.cc (ix86_default_vector_cost): Split out from ...
    (ix86_builtin_vectorization_cost): ... this and use mode instead of vectype as argument.
    (ix86_vector_costs::add_stmt_cost): Call ix86_default_vector_cost instead of ix86_builtin_vectorization_cost.

4 days | s390: Implement spaceship optab [PR117015] | Stefan Schulze Frielinghaus | 3 | -0/+184

gcc/ChangeLog:
    PR target/117015
    * config/s390/s390-protos.h (s390_expand_int_spaceship): New function.
    (s390_expand_fp_spaceship): New function.
    * config/s390/s390.cc (s390_expand_int_spaceship): New function.
    (s390_expand_fp_spaceship): New function.
    * config/s390/s390.md (spaceship<mode>4): New expander.

gcc/testsuite/ChangeLog:
    * gcc.target/s390/spaceship-fp-1.c: New test.
    * gcc.target/s390/spaceship-fp-2.c: New test.
    * gcc.target/s390/spaceship-fp-3.c: New test.
    * gcc.target/s390/spaceship-fp-4.c: New test.
    * gcc.target/s390/spaceship-int-1.c: New test.
    * gcc.target/s390/spaceship-int-2.c: New test.
    * gcc.target/s390/spaceship-int-3.c: New test.

4 days | x86: Transform to "pushq $-1; popq reg" for -Oz | H.J. Lu | 1 | -1/+2

commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun May 25 07:40:29 2025 +0800

    x86: Enable *mov<mode>_(and|or) only for -Oz

disabled the transformation from "movq $-1,reg" to "pushq $-1; popq reg" for -Oz. But for legacy integer registers, the former is 4 bytes and the latter is 3 bytes. Enable such transformation for -Oz.

gcc/
    PR target/120427
    * config/i386/i386.md (peephole2): Transform "movq $-1,reg" to "pushq $-1; popq reg" for -Oz if reg is a legacy integer register.

gcc/testsuite/
    PR target/120427
    * gcc.target/i386/pr120427-5.c: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

4 days | vect: Add target hook to prefer gather/scatter instructions | Andrew Stubbs | 1 | -0/+12

For AMD GCN, the instructions available for loading/storing vectors are always scatter/gather operations (i.e. there are separate addresses for each vector lane), so the current heuristic to avoid gather/scatter operations with too many elements in get_group_load_store_type is counterproductive. Avoiding such operations in that function can subsequently lead to a missed vectorization opportunity whereby later analyses in the vectorizer try to use a very wide array type which is not available on this target, and thus it bails out.

This patch adds a target hook to override the "single_element_p" heuristic in that function, and activates it for GCN. This allows much better code to be generated for affected loops.

Co-authored-by: Julian Brown <julian@codesourcery.com>

gcc/
    * doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add documentation hook.
    * doc/tm.texi: Regenerate.
    * target.def (prefer_gather_scatter): Add target hook under vectorizer.
    * hooks.cc (hook_bool_mode_int_unsigned_false): New function.
    * hooks.h (hook_bool_mode_int_unsigned_false): New prototype.
    * tree-vect-stmts.cc (vect_use_strided_gather_scatters_p): Add parameters group_size and single_element_p, and rework to use targetm.vectorize.prefer_gather_scatter.
    (get_group_load_store_type): Move some of the condition into vect_use_strided_gather_scatters_p.
    * config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function.
    (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.

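A rough sketch of what the GCN side of this might look like; the parameter list is only inferred from the name of the default hook (hook_bool_mode_int_unsigned_false), so treat it as an assumption rather than the committed signature:

    /* Sketch: on GCN every vector memory access is a gather/scatter
       anyway, so always prefer them regardless of group size or count.  */
    static bool
    gcn_prefer_gather_scatter (machine_mode vec_mode ATTRIBUTE_UNUSED,
                               int group_size ATTRIBUTE_UNUSED,
                               unsigned count ATTRIBUTE_UNUSED)
    {
      return true;
    }

    #undef  TARGET_VECTORIZE_PREFER_GATHER_SCATTER
    #define TARGET_VECTORIZE_PREFER_GATHER_SCATTER gcn_prefer_gather_scatter
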
4 days | Don't pass vector params through to offload targets | Andrew Stubbs | 1 | -0/+6

The optimization options are deliberately passed through to the LTO compiler, but when the same mechanism is reused for offloading it ends up forcing the host compiler settings onto the device compiler. Maybe this should be removed completely, but this patch just fixes a few of them. In particular, param_vect_partial_vector_usage is disabled by x86 and this really hurts amdgcn.

I also fixed an ambiguous else warning in the generated file by adding braces.

gcc/ChangeLog:
    * config/gcn/gcn.cc (gcn_option_override): Add note to set default for param_vect_partial_vector_usage to "1".
    * optc-save-gen.awk: Don't pass through options marked "NoOffload".
    * params.opt (-param=vect-epilogues-nomask): Add NoOffload.
    (-param=vect-partial-vector-usage): Likewise.
    (-param=vect-inner-loop-cost-factor): Likewise.

4 days | aarch64: Fix sme2+faminmax intrinsic gating (PR 121300) | Alfie Richards | 1 | -1/+2

Fixes the feature gating for the SME2+FAMINMAX intrinsics.

PR target/121300

gcc/ChangeLog:
    * config/aarch64/aarch64-sve-builtins-sme.def (svamin/svamax): Fix arch gating.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/pr121300.c: New test.

4 days | aarch64: Add support for unpacked SVE FP conditional ternary arithmetic | Spencer Abson | 1 | -29/+31

This patch extends the expander for fma, fnma, fms, and fnms to support partial SVE FP modes. We add the missing BF16 tests, which we can now trigger for having implemented the conditional expander. We also add tests for the 'merging with multiplicand' case, which this expander canonicalizes (albeit under SVE_STRICT_GP).

gcc/ChangeLog:
    * config/aarch64/aarch64-sve.md (@cond_<optab><mode>): Extend to support partial FP modes.
    (*cond_<optab><mode>_2_strict): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand.
    (*cond_<optab><mode>_4_strict): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16, use aarch64_predicate_operand.
    (*cond_<optab><mode>_any_strict): Likewise.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: Add test cases for merging with multiplicand.
    * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fmla_2.c: New test.
    * gcc.target/aarch64/sve/unpacked_cond_fmls_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c: Likewise.
    * g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C: Likewise.
    * g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C: Likewise.

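A hedged illustration of the kind of loop these "unpacked" (partial-mode) patterns target; the data layout and use of __builtin_fmaf are assumptions, chosen so that 32-bit floats share a loop with 64-bit data and therefore occupy partial SVE vectors:

    #include <stdint.h>

    /* Conditional fused multiply-add with float elements in 64-bit
       containers; inactive lanes keep the addend value (merging case).  */
    void
    cond_fmla (float *restrict r, const float *a, const float *b,
               const int64_t *pred, int n)
    {
      for (int i = 0; i < n; i++)
        r[i] = pred[i] ? __builtin_fmaf (a[i], b[i], r[i]) : r[i];
    }
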
4 days | aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic | Spencer Abson | 1 | -19/+19

Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is SVE_RELAXED_GP.

We can only reliably test the 'merging with the third input' (addend) and 'independent value' patterns at this stage, as the canonicalisation that reorders the multiplicands based on the second SEL input would be performed by the conditional expander.

Another difficulty is that we can't test these fused multiply/SEL combines without using __builtin_fma and friends. The reason for this is as follows:

We support COND_ADD, COND_SUB, and COND_MUL optabs, so match.pd will canonicalize patterns like ADD/SUB/MUL combined with a VEC_COND_EXPR into these conditional forms. Later, when widening_mul tries to fold these into conditional fused multiply operations, the transformation fails - simply because we haven't implemented those conditional fused multiply optabs yet. Hence this patch lacks tests for BFloat16...

gcc/ChangeLog:
    * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed): Extend from SVE_FULL_F to SVE_F.
    (*cond_<optab><mode>_4_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16.
    (*cond_<optab><mode>_any_relaxed): Likewise.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: New test.
    * gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.

4 days | aarch64: Add support for unpacked SVE FP ternary arithmetic | Spencer Abson | 1 | -13/+13

This patch extends the expander for unconditional fma, fnma, fms, and fnms, so that it supports partial SVE FP modes.

gcc/ChangeLog:
    * config/aarch64/aarch64-sve.md (<optab><mode>4): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_sve_fp_pred instead of aarch64_ptrue_reg.
    (@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_predicate_operand.

gcc/testsuite/ChangeLog:
    * g++.target/aarch64/sve/unpacked_ternary_bf16_1.C: New test.
    * g++.target/aarch64/sve/unpacked_ternary_bf16_2.C: Likewise.
    * gcc.target/aarch64/sve/unpacked_fmla_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fmla_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fmls_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fmls_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fnmla_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fnmla_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fnmls_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_fnmls_2.c: Likewise.

4 days | Remove V64SFmode and V64SImode. | liuhongt | 2 | -4/+1

These were only needed by avx5124vnniw/avx5124fmaps, which have been removed by r15-656-ge1a7e2c54d52d0.

gcc/ChangeLog:
    * config/i386/i386-modes.def: Remove VECTOR_MODES(FLOAT, 256) and VECTOR_MODE (INT, SI, 64).
    * config/i386/i386.cc (ix86_hard_regno_nregs): Remove related code for V64SF/V64SImode.

4 days | Eliminate redundant vpextrq/vpinsrq when moving TI to V4SI. | liuhongt | 1 | -0/+13

r14-1902-g96c3539f2a3813 split a TImode move into 2 DImode moves; it's supposed to optimize TImode in parameter/return since according to the psABI it's stored into 2 general registers. But when TImode is not in parameter/return, it could create redundancy as in the PR. The patch adds a splitter to handle that, i.e.

    (insn 10 9 14 2 (set (subreg:V2DI (reg:V4SI 98 [ <retval> ]) 0)
            (vec_concat:V2DI (subreg:DI (reg:TI 101) 0)
                (subreg:DI (reg:TI 101) 8))) 8442 {vec_concatv2di}
         (expr_list:REG_DEAD (reg:TI 101)

gcc/ChangeLog:
    PR target/121274
    * config/i386/sse.md (*vec_concatv2di_0): Add a splitter before it.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/pr121274.c: New test.

5 days | aarch64: Add support for unpacked SVE FP conditional binary arithmetic | Spencer Abson | 3 | -73/+107

This patch extends the expander for conditional smax, smin, add, sub, mul, min, max, and div to support partial SVE FP modes.

If exceptions from undefined vector elements must be suppressed, this expansion converts the container-level predicate to an element-level one, and ensures that these elements are inactive for the operation. In practice, this is a predicate AND with the existing mask and a container-size PTRUE.

gcc/ChangeLog:
    * config/aarch64/aarch64-protos.h (aarch64_sve_emit_masked_fp_pred): Declare.
    * config/aarch64/aarch64-sve.md (and<mode>3): Change this to...
    (@and<mode>3): ...this, so that we can use gen_and3.
    (@cond_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16, use aarch64_predicate_operand.
    (*cond_<optab><mode>_2_strict): Likewise.
    (*cond_<optab><mode>_3_strict): Likewise.
    (*cond_<optab><mode>_any_strict): Likewise.
    (*cond_<optab><mode>_2_const_strict): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand.
    (*cond_<optab><mode>_any_const_strict): Likewise.
    (*cond_sub<mode>_3_const_strict): Likewise.
    (*cond_sub<mode>_const_strict): Likewise.
    (*vcond_mask_<mode><vpred>): Use aarch64_predicate_operand, and update the comment here.
    * config/aarch64/aarch64.cc (aarch64_sve_emit_masked_fp_pred): New function. Helper to mask the predicate in conditional expanders.

gcc/testsuite/ChangeLog:
    * g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C: New test.
    * gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fadd_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fmul_2.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c: Likewise.

5 days | RISC-V: Generate -mcpu and -mtune options from riscv-cores.def. | Dongyan Chen | 3 | -2/+119

Automatically generate -mcpu and -mtune options in invoke.texi from the unified riscv-cores.def metadata, ensuring documentation stays in sync with definitions and reducing manual maintenance.

gcc/ChangeLog:
    * Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list of files to be processed by the Texinfo generator.
    * config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi and riscv-mtune.texi.
    * doc/invoke.texi: Replace hand‑written extension table with `@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to pull in auto‑generated entries.
    * config/riscv/gen-riscv-mcpu-texi.cc: New file.
    * config/riscv/gen-riscv-mtune-texi.cc: New file.
    * doc/riscv-mcpu.texi: New file.
    * doc/riscv-mtune.texi: New file.

5 days | simplify-rtx: Simplify subregs of logic ops | Richard Sandiford | 1 | -0/+34

This patch adds a new rule for distributing lowpart subregs through ANDs, IORs, and XORs with a constant, in cases where one of the terms then disappears. For example:

    (lowpart-subreg:QI (and:HI x 0x100))

simplifies to zero and

    (lowpart-subreg:QI (and:HI x 0xff))

simplifies to (lowpart-subreg:QI x).

This would often be handled at some point using nonzero bits. However, the specific case I want the optimisation for is SVE predicates, where nonzero bit tracking isn't currently an option. Specifically: the predicate modes VNx8BI, VNx4BI and VNx2BI have the same size as VNx16BI, but treat only every second, fourth, or eighth bit as significant. Thus if we have:

    (subreg:VNx8BI (and:VNx16BI x C))

where C is the repeating constant { 1, 0, 1, 0, ... }, then the AND only clears bits that are made insignificant by the subreg, and so the result is equal to (subreg:VNx8BI x). Later patches rely on this.

gcc/
    * simplify-rtx.cc (simplify_context::simplify_subreg): Distribute lowpart subregs through AND/IOR/XOR, if doing so eliminates one of the terms.
    (test_scalar_int_ext_ops): Add some tests of the above for integers.
    * config/aarch64/aarch64.cc (aarch64_test_sve_folding): Likewise add tests for predicate modes.

5 days | aarch64: Fix function_expander::get_reg_target | Richard Sandiford | 1 | -1/+2

function_expander::get_reg_target didn't actually check for a register, meaning that it could return a memory target instead. That doesn't really matter for the current direct and indirect uses (svundef*, svcreate*, and svset*) but it will for later patches.

gcc/
    * config/aarch64/aarch64-sve-builtins.cc (function_expander::get_reg_target): Check whether the target is a valid register_operand.

6 days | AVR: target/121277 - Don't load 0x800000 with const __flashx *x = NULL. | Georg-Johann Lay | 1 | -6/+13

Converting from the generic AS to __flashx used the same rule as for __memx, which tags RAM (generic AS) locations by setting bit 23. The justification was that generic isn't a subset of __flashx, though that led to surprises with code like const __flashx *x = NULL. The natural thing to do is to just load 0x000000 in that case, so that the null pointer works in __flashx as expected. Apart from that, converting NULL to __flashx (or __flash) no longer raises a -Waddr-space-convert diagnostic.

gcc/
    PR target/121277
    * config/avr/avr.cc (avr_addr_space_convert): When converting from generic AS to __flashx, don't set bit 23.
    (avr_convert_to_type): Don't -Waddr-space-convert when NULL is converted to __flashx or to __flash.

6 days | x86: Disallow -mtls-dialect=gnu with no_caller_saved_registers | H.J. Lu | 1 | -0/+22

__tls_get_addr doesn't preserve vector registers. When a function with the no_caller_saved_registers attribute calls __tls_get_addr, YMM and ZMM registers will be clobbered. Issue an error and suggest -mtls-dialect=gnu2 in this case.

gcc/
    PR target/121208
    * config/i386/i386.cc (ix86_tls_get_addr): Issue an error for -mtls-dialect=gnu with no_caller_saved_registers attribute and suggest -mtls-dialect=gnu2.

gcc/testsuite/
    PR target/121208
    * gcc.target/i386/pr121208-1a.c: New test.
    * gcc.target/i386/pr121208-1b.c: Likewise.
    * gcc.target/i386/pr121208-2a.c: Likewise.
    * gcc.target/i386/pr121208-2b.c: Likewise.
    * gcc.target/i386/pr121208-3a.c: Likewise.
    * gcc.target/i386/pr121208-3b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

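A hedged sketch of the kind of code that now triggers the error (variable and function names invented); under the traditional general-dynamic TLS model (e.g. -fpic -mtls-dialect=gnu) the access goes through __tls_get_addr:

    /* The TLS access below may call __tls_get_addr, which does not
       preserve YMM/ZMM registers - exactly what a function marked
       no_caller_saved_registers must avoid.  */
    extern __thread int counter;

    __attribute__ ((no_caller_saved_registers))
    void
    bump_counter (void)
    {
      counter++;
    }
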
6 days | nvptx/nvptx.opt: Update -march-map= for newer sm_xxx | Tobias Burnus | 1 | -0/+45

Usage of -march-map=: "Select the closest available '-march=' value that is not more capable."

As PTX ISA 8.6/8.7 (= unreleased CUDA 12.7 + CUDA 12.8) added the Nvidia Blackwell GPUs SM_100, SM_101, and SM_120, it makes sense to add them as well. Note that all three come as sm_XXX and sm_XXXa. PTX ISA 8.8 (CUDA 12.9) added SM_103 and SM_121 and the new 'f' suffix for all SM_1xx.

Internally, GCC currently generates the same code for >= sm_80 (Ampere); however, as GCC's -march= also supports sm_89 (Ada), the sm_1xx values added here (Blackwell) will map to sm_89.

[Naming note: while ptx code generated for sm_X can also run with sm_Y if Y > X, code generated for sm_XXXa can (generally) only run on the specific hardware; and sm_XXXf implies compatibility with only subsequent targets in the same family.]

gcc/ChangeLog:
    * config/nvptx/nvptx.opt (march-map=): Add sm_100{,f,a}, sm_101{,f,a}, sm_103{,a,f}, sm_120{,a,f} and sm_121{,f,a}.

6 days | gcn: Fix CDNA3 atomics' buffer invalidation | Tobias Burnus | 1 | -10/+12

For device (agent) scope atomics - as needed when there is more than one team - a buffer_wbl2 followed by s_waitcnt is required. When doing the initial porting, the pre-atomic instruction got accidentally replaced by buffer_inv sc1, which is not quite the right instruction.

gcc/ChangeLog:
    * config/gcn/gcn.md (atomic_load, atomic_store, atomic_exchange): Fix CDNA3 L2 cache write-back before atomic instructions.

6 days | gcn: Add more s_nop for MI300 | Tobias Burnus | 3 | -38/+55

Implement another case where the CDNA3 ISA documentation requires s_nop, and add a comment why another case does not need to be handled. Also add one case where an s_nop is required by MI300A hardware but seems not to be mentioned in the CDNA3 ISA documentation.

gcc/ChangeLog:
    * config/gcn/gcn.md (define_attr "vcmp"): Add with values vcmp/vcmpx/no.
    (*movbi, cstoredi4.., cstore<mode>4): Set it.
    * config/gcn/gcn-valu.md (vec_cmp<mode>...): Likewise.
    * config/gcn/gcn.cc (gcn_cmpx_insn_p): Remove.
    (gcn_md_reorg): Add two new conditions for MI300.

6 days | gcn: Add 'nops' insn, extend comments | Tobias Burnus | 3 | -2/+18

Use 's_nops' with a number instead of multiple 's_nop' when manually adding 1 to 5 wait states. This helps with the instruction cache and helps a tiny bit with PR119367, where a two-byte variable overflows in the debugging location view handling.

Add a comment about 'sc0' to TARGET_GLC_NAME, as for atomics it is unrelated to the scope but to whether the result is stored; i.e. using e.g. 'sc1' instead of 'sc0' will have undesired consequences!

Update the comment above print_operand_address to document 'R' and 'V'; those are used below as "Temporary hack.", but it makes sense to see them in the list.

gcc/ChangeLog:
    * config/gcn/gcn-opts.h (enum hsaco_attr_type): Add comment about 'sc0'.
    * config/gcn/gcn.cc (gcn_md_reorg): Use gen_nops instead of gen_nop.
    (print_operand_address): Document 'R' and 'V' in the pre-function comment as well.
    * config/gcn/gcn.md (nops): Add.

6 days | Move STMT_VINFO_TYPE to SLP_TREE_TYPE | Richard Biener | 4 | -30/+24

I am at a point where I want to store additional information from analysis (from loads and stores) to re-use them at transform stage without repeating the analysis. I do not want to add to stmt_vec_info at this point, so this starts adding kind-specific sub-structures by moving the STMT_VINFO_TYPE field to the SLP tree and adding a (dummy for now) union tagged by it to receive such data.

The change is largely mechanical after RISC-V has been prepared to have a SLP node around. I have settled on a union (supposed to get pointers to data). As a followup this enables getting rid of SLP_TREE_CODE and making VEC_PERM therein a separate type, unifying its handling.

    * tree-vectorizer.h (_slp_tree::type): Add.
    (_slp_tree::u): Likewise.
    (_stmt_vec_info::type): Remove.
    (STMT_VINFO_TYPE): Likewise.
    (SLP_TREE_TYPE): New.
    * tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not initialize type.
    * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize type.
    (vect_slp_analyze_node_operations): Adjust.
    (vect_schedule_slp_node): Likewise.
    * tree-vect-patterns.cc (vect_init_pattern_stmt): Do not copy STMT_VINFO_TYPE.
    * tree-vect-loop.cc: Set SLP_TREE_TYPE instead of STMT_VINFO_TYPE everywhere.
    (vect_create_loop_vinfo): Do not set STMT_VINFO_TYPE on loop conditions.
    * tree-vect-stmts.cc: Set SLP_TREE_TYPE instead of STMT_VINFO_TYPE everywhere.
    (vect_analyze_stmt): Adjust.
    (vect_transform_stmt): Likewise.
    * config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Access SLP_TREE_TYPE instead of STMT_VINFO_TYPE.
    * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Remove non-SLP element-wise load/store matching.
    * config/rs6000/rs6000.cc (rs6000_cost_data::update_target_cost_per_stmt): Pass in the SLP node. Use that to get at the memory access kind and type.
    (rs6000_cost_data::add_stmt_cost): Pass down SLP node.
    * config/riscv/riscv-vector-costs.cc (variable_vectorized_p): Use SLP_TREE_TYPE.
    (costs::need_additional_vector_vars_p): Likewise.
    (costs::update_local_live_ranges): Likewise.

6 days | aarch64: Add tuning model for Olympus core. | Jennifer Schmitz | 3 | -1/+212

This patch adds a new tuning model for the NVIDIA Olympus core. The values used here are based on the Software Optimization Guide that will be published imminently.

Bootstrapped and tested on aarch64-linux-gnu, no regression. OK for trunk? OK to backport to GCC 15?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
Co-Authored-By: Dhruv Chawla <dhruvc@nvidia.com>

gcc/ChangeLog:
    * config/aarch64/aarch64-cores.def (olympus): Use olympus tuning model.
    * config/aarch64/aarch64.cc: Include olympus.h.
    * config/aarch64/tuning_models/olympus.h: New file.

6 days | LoongArch: Remove the definition of CASE_VECTOR_SHORTEN_MODE. | Lulu Cheng | 1 | -2/+0

On LoongArch, the switch jump-table always stores absolute addresses, so there is no need to define the macro CASE_VECTOR_SHORTEN_MODE.

gcc/ChangeLog:
    * config/loongarch/loongarch.h (CASE_VECTOR_SHORTEN_MODE): Delete.

6 days | xtensa: Fix remaining inaccuracies in xtensa_is_insn_L32R_p() | Takayuki 'January June' Suwa | 1 | -11/+35

The previous fix also had some flaws:

- The TARGET_CONST16 check was a bit premature
- It didn't take into account the possibility of the RTL expression "(set (reg:SF gpr) (const_int))", especially when TARGET_AUTOLITPOOLS is configured

This patch fixes the above.

gcc/ChangeLog:
    * config/xtensa/xtensa.cc (xtensa_is_insn_L32R_p): Re-rewrite to more accurately capture insns that could be L32R machine instructions wherever possible, and add comments that help understand the intent of the process.

7 days | RISC-V: Combine vec_duplicate + vaadd.vv to vaadd.vx on GR2VR cost | Pan Li | 3 | -4/+7

This patch would like to combine the vec_duplicate + vaadd.vv to the vaadd.vx. Take the example code below: the related pattern will depend on the cost of vec_duplicate from GR2VR. The late-combine pass will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero.

Assume we have the example code below and the GR2VR cost is 0.

    #define DEF_AVG_FLOOR(NT, WT)              \
      NT                                       \
      test_##NT##_avg_floor(NT x, NT y)        \
      {                                        \
        return (NT)(((WT)x + (WT)y) >> 1);     \
      }

    #define AVG_FLOOR_FUNC(T) test_##T##_avg_floor

    DEF_AVG_FLOOR(int32_t, int64_t)
    DEF_VX_BINARY_CASE_2_WRAP(T, AVG_FLOOR_FUNC(T), avg_floor)

Before this patch:

    11   │ beq     a3,zero,.L8
    12   │ vsetvli a5,zero,e32,m1,ta,ma
    13   │ vmv.v.x v2,a2
    14   │ slli    a3,a3,32
    15   │ srli    a3,a3,32
    16   │ .L3:
    17   │ vsetvli a5,a3,e32,m1,ta,ma
    18   │ vle32.v v1,0(a1)
    19   │ slli    a4,a5,2
    20   │ sub     a3,a3,a5
    21   │ add     a1,a1,a4
    22   │ vaadd.vv        v1,v1,v2
    23   │ vse32.v v1,0(a0)
    24   │ add     a0,a0,a4
    25   │ bne     a3,zero,.L3

After this patch:

    11   │ beq     a3,zero,.L8
    12   │ slli    a3,a3,32
    13   │ srli    a3,a3,32
    14   │ .L3:
    15   │ vsetvli a5,a3,e32,m1,ta,ma
    16   │ vle32.v v1,0(a1)
    17   │ slli    a4,a5,2
    18   │ sub     a3,a3,a5
    19   │ add     a1,a1,a4
    20   │ vaadd.vx        v1,v1,a2
    21   │ vse32.v v1,0(a0)
    22   │ add     a0,a0,a4
    23   │ bne     a3,zero,.L3

gcc/ChangeLog:
    * config/riscv/riscv-v.cc (expand_vx_binary_vxrm_vec_vec_dup): Add new case UNSPEC_VAADD.
    (expand_vx_binary_vxrm_vec_dup_vec): Ditto.
    * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
    * config/riscv/vector-iterators.md: Add new case UNSPEC_VAADD to iterator.

Signed-off-by: Pan Li <pan2.li@intel.com>

8 days | RISC-V: riscv-ext.def: Add allocated group IDs and group bit positions | Christoph Müllner | 1 | -15/+15

The riscv-c-api-doc defines a group ID and a bit position for some extensions. Most of them are set in riscv-ext.def, but some are missing and one bit position (for Zilsd) is wrong.

This patch replaces the `BITMASK_NOT_YET_ALLOCATED` value with the actual allocated value wherever possible and fixes the bit position for Zilsd.

Currently, we don't have any infrastructure to utilize the information that is placed into riscv_ext_info_t::m_bitmask_group_id and riscv_ext_info_t::m_bitmask_group_bit_pos. This also means we can't test it.

gcc/ChangeLog:
    * config/riscv/riscv-ext.def: Add allocated group IDs and group bit positions.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

9 days | diagnostics: convert diagnostic_t to enum class diagnostics::kind | David Malcolm | 3 | -7/+12

No functional change intended.

gcc/ChangeLog:
    * Makefile.in: Replace diagnostic.def with diagnostics/kinds.def.
    * config/aarch64/aarch64.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * config/i386/i386-options.cc: Likewise.
    * config/s390/s390.cc: Likewise.
    * diagnostic-core.h: Replace typedef diagnostic_t with enum class diagnostics::kind in diagnostics/kinds.h and include it.
    * diagnostic-global-context.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * diagnostic.cc: Likewise.
    * diagnostic.h: Likewise.
    * diagnostics/buffering.cc: Likewise.
    * diagnostics/buffering.h: Likewise.
    * diagnostics/context.h: Likewise.
    * diagnostics/diagnostic-info.h: Likewise.
    * diagnostics/html-sink.cc: Likewise.
    * diagnostic.def: Move to...
    * diagnostics/kinds.def: ...here and update for diagnostic_t becoming enum class diagnostics::kind.
    * diagnostics/kinds.h: New file, based on material in diagnostic-core.h.
    * diagnostics/lazy-paths.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * diagnostics/option-classifier.cc: Likewise.
    * diagnostics/option-classifier.h: Likewise.
    * diagnostics/output-spec.h: Likewise.
    * diagnostics/paths-output.cc: Likewise.
    * diagnostics/sarif-sink.cc: Likewise.
    * diagnostics/selftest-context.cc: Likewise.
    * diagnostics/selftest-context.h: Likewise.
    * diagnostics/sink.h: Likewise.
    * diagnostics/source-printing.cc: Likewise.
    * diagnostics/text-sink.cc: Likewise.
    * diagnostics/text-sink.h: Likewise.
    * gcc.cc: Likewise.
    * libgdiagnostics.cc: Likewise.
    * lto-wrapper.cc: Likewise.
    * opts-common.cc: Likewise.
    * opts-diagnostic.h: Likewise.
    * opts.cc: Likewise.
    * rtl-error.cc: Likewise.
    * substring-locations.cc: Likewise.
    * toplev.cc: Likewise.

gcc/ada/ChangeLog:
    * gcc-interface/trans.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/analyzer/ChangeLog:
    * pending-diagnostic.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * program-point.cc: Likewise.

gcc/c-family/ChangeLog:
    * c-common.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * c-format.cc: Likewise.
    * c-lex.cc: Likewise.
    * c-opts.cc: Likewise.
    * c-pragma.cc: Likewise.
    * c-warn.cc: Likewise.

gcc/c/ChangeLog:
    * c-errors.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * c-parser.cc: Likewise.
    * c-typeck.cc: Likewise.

gcc/cobol/ChangeLog:
    * util.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/cp/ChangeLog:
    * call.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * constexpr.cc: Likewise.
    * cp-tree.h: Likewise.
    * decl.cc: Likewise.
    * error.cc: Likewise.
    * init.cc: Likewise.
    * method.cc: Likewise.
    * module.cc: Likewise.
    * parser.cc: Likewise.
    * pt.cc: Likewise.
    * semantics.cc: Likewise.
    * typeck.cc: Likewise.
    * typeck2.cc: Likewise.

gcc/d/ChangeLog:
    * d-diagnostic.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/fortran/ChangeLog:
    * cpp.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * error.cc: Likewise.
    * options.cc: Likewise.

gcc/jit/ChangeLog:
    * dummy-frontend.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/m2/ChangeLog:
    * gm2-gcc/m2linemap.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * gm2-gcc/rtegraph.cc: Likewise.

gcc/rust/ChangeLog:
    * backend/rust-tree.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * backend/rust-tree.h: Likewise.
    * resolve/rust-ast-resolve-expr.cc: Likewise.
    * resolve/rust-ice-finalizer.cc: Likewise.
    * resolve/rust-ice-finalizer.h: Likewise.
    * resolve/rust-late-name-resolver-2.0.cc: Likewise.

gcc/testsuite/ChangeLog:
    * gcc.dg/plugin/diagnostic_plugin_test_show_locus.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
    * gcc.dg/plugin/expensive_selftests_plugin.cc: Likewise.
    * gcc.dg/plugin/location_overflow_plugin.cc: Likewise.
    * lib/gcc-dg.exp: Likewise.

libcpp/ChangeLog:
    * internal.h: Update comment for diagnostic_t becoming enum class diagnostics::kind.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

9 days | RISC-V: Prepare dynamic LMUL heuristic for SLP. | Robin Dapp | 2 | -25/+62

This patch prepares the dynamic LMUL vector costing to use the coming SLP_TREE_TYPE instead of the (to-be-removed) STMT_VINFO_TYPE.

Even though the whole approach should be reviewed and adjusted at some point, the patch chooses the path of least resistance and uses a hash map for the stmt_info -> slp node relationship. A node is mapped to the accompanying stmt_info during add_stmt_cost. In finish_cost we go through all statements as before, and obtain the corresponding slp nodes as well as their types. This allows us to operate largely as before.

We don't yet do the switch over from STMT_VINFO_TYPE to SLP_TREE_TYPE, though, but only take care of the necessary refactoring upfront.

Regtested on rv64gcv_zvl512b with -mrvv-max-lmul=dynamic. There are a few regressions but nothing worse than what we already have. I'd rather accept these now and take it as an incentive to work on the heuristic later than block the SLP work until it is fixed.

gcc/ChangeLog:
    * config/riscv/riscv-vector-costs.cc (get_live_range): Move compute_local_program_points to cost class.
    (variable_vectorized_p): Add slp node parameter.
    (need_additional_vector_vars_p): Move from here...
    (costs::need_additional_vector_vars_p): ... to here and add slp parameter.
    (compute_estimated_lmul): Move update_local_live_ranges to cost class.
    (has_unexpected_spills_p): Move from here...
    (costs::has_unexpected_spills_p): ... to here.
    (costs::record_lmul_spills): New function.
    (costs::add_stmt_cost): Add stmt_info, slp mapping.
    (costs::finish_cost): Analyze loop.
    * config/riscv/riscv-vector-costs.h: Move declarations to class.

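A rough sketch of the bookkeeping described above; the member and helper names are invented for illustration and this is not the committed code, only the general hash_map pattern:

    /* Invented illustration of a stmt_info -> SLP node map kept in the
       cost class.  */
    hash_map<stmt_vec_info, slp_tree> m_stmt_to_node;

    /* In add_stmt_cost: remember which SLP node the statement was
       costed for.  */
    void
    record_stmt_node (stmt_vec_info stmt_info, slp_tree node)
    {
      if (node)
        m_stmt_to_node.put (stmt_info, node);
    }

    /* In finish_cost: recover the node for a statement, and with it the
       node's type, instead of relying on STMT_VINFO_TYPE.  */
    slp_tree
    lookup_stmt_node (stmt_vec_info stmt_info)
    {
      slp_tree *node = m_stmt_to_node.get (stmt_info);
      return node ? *node : NULL;
    }
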
9 days | RISC-V: Remove user-level interrupts | Christoph Müllner | 2 | -19/+8

There was once a RISC-V extension draft ("N"), which introduced user-level interrupts. However, it was never ratified and the specification draft has been removed from the RISC-V ISA manual in commit `b6cade07034` with the comment "it'll likely need to be redesigned".

Support for an N extension never made it to GCC, but we support function attributes for user-level interrupt handlers that use the URET instruction. The "user" interrupt attribute was documented in the RISC-V C API, but has been removed in PR #106 in May 2025 (driven by LLVM devs/maintainers and ack'ed by at least one GCC maintainer).

Let's drop URET support from GCC as well.

gcc/ChangeLog:
    * config/riscv/riscv.cc (enum riscv_privilege_levels): Remove USER_MODE.
    (riscv_handle_type_attribute): Remove "user" interrupts.
    (riscv_expand_epilogue): Likewise.
    (riscv_get_interrupt_type): Likewise.
    * config/riscv/riscv.md (riscv_uret): Remove URET pattern.
    * doc/extend.texi: Remove documentation of user interrupts.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/interrupt-conflict-mode.c: Remove "user" interrupts.
    * gcc.target/riscv/xtheadint-push-pop.c: Likewise.
    * gcc.target/riscv/interrupt-umode.c: Removed.

Reported-by: Sam Elliott <quic_aelliott@quicinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

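For illustration (a hypothetical declaration, not taken from the patch), this is the spelling that is no longer accepted after the change:

    /* Previously declared a user-level handler returning via URET; with
       this change GCC rejects the "user" interrupt kind.  */
    void __attribute__ ((interrupt ("user"))) old_user_handler (void);
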
9 days | gcn: Add "s_nop"s for MI300 | Tobias Burnus | 4 | -137/+312

MI300 requires some additional s_nop to be added between some instructions.

* As 'v_readlane' and 'v_writelane' have to be distinguished, the 'laneselect' attribute was changed from no/yes to no/read/write.
* Add some missing 'laneselect' attributes for v_(read,write)lane.
* Replace 'delayeduse' by 'flatmemaccess', which is more explicit, especially as some uses have to distinguish more details. (Alongside, one off-by-two delayeduse has been fixed.)

On the other hand, RDNA 2, 3, and 3.5 do not require any added s_nop; thus, there is no need to walk the instructions for them to insert pointless S_NOP. (RDNA4 (not yet in GCC) requires it in a few cases.)

gcc/ChangeLog:
    * config/gcn/gcn-opts.h (TARGET_NO_MANUAL_NOPS, TARGET_CDNA3_NOPS): Define.
    * config/gcn/gcn.md (define_attr "laneselect"): Change 'yes' to 'read' and 'write'.
    (define_attr "flatmemaccess"): Add with values store, storex34, load, atomic, atomicwait, cmpswapx2, and no. Replacing ...
    (define_attr "delayeduse"): Remove.
    (define_attr "transop"): Add with values yes and no.
    (various insns): Update 'laneselect', add flatmemaccess and transop, remove delayeduse; fixing an issue for s_load_dwordx4 vs. flat_store_dwordx4 related to delayeduse (now: flatmemaccess).
    * config/gcn/gcn-valu.md: Update laneselect attribute and add flatmemaccess.
    * config/gcn/gcn.cc (gcn_cmpx_insn_p): New.
    (gcn_md_reorg): Update for MI300 to add additional s_nop. Skip s_nop-insertion part for RDNA{2,3}; add "VALU writes EXEC followed by VALU DPP" unconditionally for CDNA2/CDNA3/GCN5.

9 days | RISC-V: Add support for resumable non-maskable interrupt (RNMI) handlers | Christoph Müllner | 2 | -4/+21

The Smrnmi extension introduces the nmret instruction to return from RNMI handlers. We already have basic Smrnmi support. This patch introduces support for the nmret instruction and the ability to set the function attribute `__attribute__ ((interrupt ("rnmi")))` to let the compiler generate RNMI handlers.

The attribute name is proposed in a PR for the RISC-V C API and approved by LLVM maintainers: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/116

gcc/ChangeLog:
    * config/riscv/riscv.cc (enum riscv_privilege_levels): Add RNMI_MODE.
    (riscv_handle_type_attribute): Handle 'rnmi' interrupt attribute.
    (riscv_expand_epilogue): Generate nmret for RNMI handlers.
    (riscv_get_interrupt_type): Handle 'rnmi' interrupt attribute.
    * config/riscv/riscv.md (riscv_rnmi): Add nmret INSN.
    * doc/extend.texi: Add documentation for 'rnmi' interrupt attribute.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/interrupt-rnmi.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

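A minimal illustration of the new attribute in use (the handler body is made up); the epilogue of such a function now returns via nmret:

    /* Resumable NMI handler; GCC emits nmret instead of a normal return.  */
    void __attribute__ ((interrupt ("rnmi")))
    rnmi_handler (void)
    {
      /* Service the non-maskable interrupt here.  */
    }
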
10 days | vect: Add is_gather_scatter argument to misalignment hook. | Robin Dapp | 8 | -20/+72

This patch adds an is_gather_scatter argument to the support_vector_misalignment hook. All targets except riscv do not care about alignment for gather/scatter, so they return true when is_gather_scatter is set.

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_builtin_support_vector_misalignment): Return true for gather/scatter.
    * config/arm/arm.cc (arm_builtin_support_vector_misalignment): Ditto.
    * config/epiphany/epiphany.cc (epiphany_support_vector_misalignment): Ditto.
    * config/gcn/gcn.cc (gcn_vectorize_support_vector_misalignment): Ditto.
    * config/loongarch/loongarch.cc (loongarch_builtin_support_vector_misalignment): Ditto.
    * config/riscv/riscv.cc (riscv_support_vector_misalignment): Add gather/scatter argument.
    * config/rs6000/rs6000.cc (rs6000_builtin_support_vector_misalignment): Return true for gather/scatter.
    * config/s390/s390.cc (s390_support_vector_misalignment): Ditto.
    * doc/tm.texi: Add argument.
    * target.def: Ditto.
    * targhooks.cc (default_builtin_support_vector_misalignment): Ditto.
    * targhooks.h (default_builtin_support_vector_misalignment): Ditto.
    * tree-vect-data-refs.cc (vect_supportable_dr_alignment): Ditto.

10 days | aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary arithmetic | Spencer Abson | 1 | -49/+49

Extend the binary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/SVE_FULL_F_B16B16 to SVE_F/SVE_F_B16B16, where the strictness value is SVE_RELAXED_GP.

gcc/ChangeLog:
    * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16.
    (*cond_<optab><mode>_3_relaxed): Likewise.
    (*cond_<optab><mode>_any_relaxed): Likewise.
    (*cond_<optab><mode>_any_const_relaxed): Extend from SVE_FULL_F to SVE_F.
    (*cond_add<mode>_2_const_relaxed): Likewise.
    (*cond_add<mode>_any_const_relaxed): Likewise.
    (*cond_sub<mode>_3_const_relaxed): Likewise.
    (*cond_sub<mode>_const_relaxed): Likewise.

gcc/testsuite/ChangeLog:
    * g++.target/aarch64/sve/unpacked_cond_binary_bf16_1.C: New test.
    * gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fadd_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fdiv_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fmaxnm_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fminnm_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fmul_1.c: Likewise.
    * gcc.target/aarch64/sve/unpacked_cond_fsubr_1.c: Likewise.