Age | Commit message (Collapse) | Author | Files | Lines |
|
PR rtl-optimization/121340
gcc/
* config/avr/avr.opt.urls (-mfuse-move2): Add url.
|
|
gcc/
* config/avr/avr.cc (avr_output_addr_vec) <labl>: Asm out its .type.
|
|
insn combine.
Insn combine may come up with superfluous reg-reg moves, where the combine
people say that these are no problem since reg-alloc is supposed to optimize
them. The issue is that the lower-subreg pass sitting between combine and
reg-alloc may split such moves, coming up with a zoo of subregs which are
only handled poorly by the register allocator.
This patch adds a new avr mini-pass that handles such cases.
As an example, take
int f_ffssi (long x)
{
return __builtin_ffsl (x);
}
where the two functions have the same interface, i.e. there are no extra
moves required for the argument or for the return value. However,
$ avr-gcc -S -Os -dp -mno-fuse-move ...
f_ffssi:
mov r20,r22 ; 29 [c=4 l=1] movqi_insn/0
mov r21,r23 ; 30 [c=4 l=1] movqi_insn/0
mov r22,r24 ; 31 [c=4 l=1] movqi_insn/0
mov r23,r25 ; 32 [c=4 l=1] movqi_insn/0
mov r25,r23 ; 33 [c=4 l=4] *movsi/0
mov r24,r22
mov r23,r21
mov r22,r20
rcall __ffssi2 ; 34 [c=16 l=1] *ffssihi2.libgcc
ret ; 37 [c=0 l=1] return
where all the moves add up to a no-op. The -mno-fuse-move option
stops any attempts by the avr backend to clean up that mess.
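The claim that the moves add up to a no-op can be checked with a small host-side model. This is only a sketch: the array r[] is a hypothetical stand-in for the AVR register file, and the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: r[] stands for the AVR registers r0..r31. */
static void run_moves(uint8_t r[32])
{
    /* insns 29..32: copy r22..r25 down to r20..r23 */
    r[20] = r[22];
    r[21] = r[23];
    r[22] = r[24];
    r[23] = r[25];
    /* insn 33 (*movsi): copy r20..r23 back up to r22..r25 */
    r[25] = r[23];
    r[24] = r[22];
    r[23] = r[21];
    r[22] = r[20];
}

/* Returns 1 if the argument registers r22..r25 hold the same values
   after the eight moves as before them. */
static int moves_are_noop(const uint8_t before[32])
{
    uint8_t r[32];
    memcpy(r, before, sizeof r);
    run_moves(r);
    return memcmp(&r[22], &before[22], 4) == 0;
}
```

Only r22..r25 are compared, since those carry the argument; r20 and r21 are clobbered by the sequence.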
PR rtl-optimization/121340
gcc/
* config/avr/avr.opt (-mfuse-move2): New option.
* config/avr/avr-passes.def (avr_pass_2moves): Insert after combine.
* config/avr/avr-passes.cc (make_avr_pass_2moves): New function.
(pass_data avr_pass_data_2moves): New static variable.
(avr_pass_2moves): New rtl_opt_pass.
* config/avr/avr-protos.h (make_avr_pass_2moves): New proto.
* common/config/avr/avr-common.cc
(default_options avr_option_optimization_table) <-mfuse-move2>:
Set for -O1 and higher.
* doc/invoke.texi (AVR Options) <-mfuse-move2>: Document.
|
|
Update FMV features to latest ACLE spec of 2024Q4 - several features have been
removed or merged. Add FMV support for CSSC and MOPS. Preserve the ordering
in enum CPUFeatures.
gcc:
* common/config/aarch64/cpuinfo.h: Remove unused features, add FEAT_CSSC
and FEAT_MOPS.
* config/aarch64/aarch64-option-extensions.def: Remove FMV support
for RPRES, use PMULL rather than AES, add FMV support for CSSC and MOPS.
libgcc:
* config/aarch64/cpuinfo.c (__init_cpu_features_constructor):
Remove unused features, add support for CSSC and MOPS.
|
|
Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero
for these.
gcc:
* config/aarch64/tuning_models/generic_armv9_a.h
(generic_armv9_a_addrcost_table): Use zero cost for himode.
|
|
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Fix typo.
|
|
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Handle simultaneous use of regparm and thiscall attributes in
case when regparm is set before thiscall.
gcc/testsuite/ChangeLog:
* gcc.target/i386/attributes-error.c: Add more attributes
combinations.
|
|
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Fix comments which state that combination of stdcall and fastcall
attributes is valid but redundant.
|
|
The regparm attribute does not affect code generation on the x86-64 target.
Despite this, regparm was accepted silently, unlike other calling
convention attributes handled in the ix86_handle_cconv_attribute
function.
Due to the lack of diagnostics, the Linux kernel attempted to specify regparm(0)
on vmread_error_trampoline declaration, which is supposed to be invoked
with all arguments on stack:
https://lore.kernel.org/all/20220928232015.745948-1-seanjc@google.com/
To produce a warning for regparm in 64-bit mode, simply move the block
that produces diagnostics above the block that handles the regparm
attribute.
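For code that has to build for both 32-bit and 64-bit x86, one way to avoid the new warning is to apply regparm only on ia32, in the spirit of the testsuite changes listed below. The following is a minimal sketch; the macro name FASTARGS and the function add3 are hypothetical.

```c
/* Hypothetical helper macro: expand to regparm only where it has an effect. */
#if defined(__i386__)
# define FASTARGS __attribute__((regparm(3)))
#else
# define FASTARGS /* regparm does not affect x86-64 code generation */
#endif

/* On ia32 the three arguments are passed in registers; elsewhere the
   default calling convention is used. */
FASTARGS int add3(int a, int b, int c)
{
    return a + b + c;
}
```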
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_handle_cconv_attribute):
Move 64-bit mode check before regparm handling.
gcc/testsuite/ChangeLog:
* g++.dg/abi/regparm1.C: Require ia32 target.
* gcc.target/i386/20020224-1.c: Likewise.
* gcc.target/i386/pr103785.c: Use regparm attribute only if
not in 64-bit mode.
* gcc.target/i386/pr36533.c: Likewise.
* gcc.target/i386/pr59099.c: Likewise.
* gcc.target/i386/sibcall-8.c: Likewise.
* gcc.target/i386/sw-1.c: Likewise.
* gcc.target/i386/pr15184-2.c: Fix invalid comment.
* gcc.target/i386/attributes-ignore.c: New test.
|
|
This optional header is used to bring in the definition of the
struct __ifunc_arg_t type. Since it has been added to glibc only
recently, the previous implementation had to check whether this
header is present and, if not, provide its own definition.
This creates dead code because one of these two parts would
not be tested. The ABI specification for ifunc resolvers allows
creating one's own ABI-compatible definition for this type, which is the
right way of doing it.
In addition to improving consistency, the new approach also helps
with addition of new fields to struct __ifunc_arg_t type without
the need to work-around situations when the definition imported
from the header lacks these new fields.
The ABI allows defining as many hwcap fields in this struct as needed,
provided that at runtime we only access the fields that are permitted
by the _size value.
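The _size rule can be sketched as follows. The type name my_ifunc_arg and both helper functions are hypothetical; the field names follow the ChangeLog below, and the layout is assumed to be five consecutive unsigned long members.

```c
#include <stddef.h>

/* Hypothetical local, ABI-compatible definition (assumed layout). */
typedef struct my_ifunc_arg {
    unsigned long _size;    /* size in bytes of the struct the caller passed */
    unsigned long _hwcap;
    unsigned long _hwcap2;
    unsigned long _hwcap3;
    unsigned long _hwcap4;
} my_ifunc_arg;

/* True if a field that ends at byte offset END is covered by _size. */
static int field_present(const my_ifunc_arg *arg, size_t end)
{
    return arg->_size >= end;
}

/* Read _hwcap3 only when the caller's struct is large enough to hold it;
   otherwise report no capabilities. */
static unsigned long get_hwcap3(const my_ifunc_arg *arg)
{
    size_t end = offsetof(my_ifunc_arg, _hwcap3) + sizeof arg->_hwcap3;
    return field_present(arg, end) ? arg->_hwcap3 : 0;
}
```

With this shape, a resolver built against the five-field definition still behaves sensibly when an older runtime passes a shorter struct.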
gcc/
* config/aarch64/aarch64.cc (build_ifunc_arg_type):
Add new fields _hwcap3 and _hwcap4.
libatomic/
* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Remove sys/ifunc.h and add new fields _hwcap3 and _hwcap4.
libgcc/
* config/aarch64/cpuinfo.c (__ifunc_arg_t): Likewise.
(__init_cpu_features): Obtain and assign values for the
fields _hwcap3 and _hwcap4.
(__init_cpu_features_constructor): Check _size in the
arg argument.
|
|
While building GCC with --with-build-config=bootstrap-ubsan on
powerpc64le-unknown-linux-gnu, multiple UBSAN runtime errors were
encountered in rs6000.cc and rs6000.md due to undefined behavior
involving left shifts on negative values and shift exponents equal to
or exceeding the type width.
The issue was in bit pattern recognition code
(in can_be_rotated_to_negative_lis and can_be_built_by_li_and_rldic),
where signed values were shifted without handling negative inputs or
guarding against shift counts equal to the type width, causing UB.
The fix ensures shifts and rotations are done on unsigned HOST_WIDE_INT
values, casting back only where needed (like for arithmetic right shifts),
with proper guards to prevent shift-by-64.
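The general technique can be sketched outside of GCC as below. This is not the rs6000 code: uint64_t and int64_t stand in for unsigned HOST_WIDE_INT and HOST_WIDE_INT, and the function names are hypothetical.

```c
#include <stdint.h>

/* Left shift of a possibly negative value: shift the unsigned bit
   pattern instead, and treat a count of 64 or more as shifting every
   bit out.  The conversion back to int64_t is implementation-defined
   rather than undefined; GCC reduces it modulo 2^64. */
static int64_t shl_s64(int64_t v, unsigned count)
{
    if (count >= 64)
        return 0;
    return (int64_t)((uint64_t)v << count);
}

/* Rotate left.  Masking the count and returning early for zero keeps
   both shift counts strictly below 64. */
static uint64_t rotl_u64(uint64_t v, unsigned count)
{
    count &= 63;
    if (count == 0)
        return v;
    return (v << count) | (v >> (64 - count));
}
```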
2025-07-31 Kishan Parmar <kishan@linux.ibm.com>
gcc:
PR target/118890
* config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): Avoid left
shift of negative value and guard shift count.
(can_be_built_by_li_and_rldic): Likewise.
(rs6000_emit_set_long_const): Likewise.
* config/rs6000/rs6000.md (splitter for plus into two 16-bit parts): Fix
UB from overflow in addition.
|
|
After removing STMT_VINFO_MEMORY_ACCESS_TYPE we now ICE when costing
for scalar stmts required in the epilog since the cost model tries
to pattern-match gathers (an earlier patch tried to improve this
by introducing stmt groups, but that was on hold due to negative
feedback). The following short-cuts those attempts when node is NULL
as that then cannot be a vector stmt. Another possibility would be
to gate on vect_body, or restructure everything.
Note we now ensure that node is NULL when m_costing_for_scalar.
* config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype):
Check for node before dereferencing.
(aarch64_vector_costs::add_stmt_cost): Likewise.
|
|
Streaming-compatible functions can be compiled without SME enabled, but need
to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming
state of a callee. These switches are conditional on the current mode being
opposite to the target mode, so no SME instructions are executed if SME is not
available.
However, in GAS, "SMSTART SM" and "SMSTOP SM" always require +sme. A call
from a streaming-compatible function, compiled without SME enabled, to a
non-streaming function will be rejected as:
Error: selected processor does not support `smstop sm'..
To work around this, we make use of the .inst directive to insert the literal
encodings of "SMSTART SM" and "SMSTOP SM".
gcc/ChangeLog:
PR target/121028
* config/aarch64/aarch64-sme.md (aarch64_smstart_sm): Use the .inst
directive if !TARGET_SME.
(aarch64_smstop_sm): Likewise.
gcc/testsuite/ChangeLog:
PR target/121028
* gcc.target/aarch64/sme/call_sm_switch_1.c: Tell
check-function-bodies not to ignore .inst directives, and replace the
test for "smstart sm" with one for its encoding.
* gcc.target/aarch64/sme/call_sm_switch_11.c: Likewise.
* gcc.target/aarch64/sme/pr121028.c: New test.
|
|
This should be present only on SLP nodes now. The RISC-V changes
are mechanical along the line of the SLP_TREE_TYPE changes.
* tree-vectorizer.h (_stmt_vec_info::memory_access_type): Remove.
(STMT_VINFO_MEMORY_ACCESS_TYPE): Likewise.
(vect_mem_access_type): Likewise.
* tree-vect-stmts.cc (vectorizable_store): Do not set
STMT_VINFO_MEMORY_ACCESS_TYPE. Fix SLP_TREE_MEMORY_ACCESS_TYPE
usage.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove
checking of memory access type.
* config/riscv/riscv-vector-costs.cc (costs::compute_local_live_ranges):
Use SLP_TREE_MEMORY_ACCESS_TYPE.
(costs::need_additional_vector_vars_p): Likewise.
(segment_loadstore_group_size): Get SLP node as argument,
use SLP_TREE_MEMORY_ACCESS_TYPE.
(costs::adjust_stmt_cost): Pass down SLP node.
* config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use
SLP_TREE_MEMORY_ACCESS_TYPE instead of vect_mem_access_type.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Likewise.
|
|
2025-07-31 Jakub Jelinek <jakub@redhat.com>
* gimple-ssa-store-merging.cc (find_bswap_or_nop): Fix comment typos,
hanlde -> handle.
* config/i386/i386.cc (ix86_gimple_fold_builtin, ix86_rtx_costs):
Likewise.
* config/i386/i386-features.cc (remove_partial_avx_dependency):
Likewise.
* gcc.target/i386/apx-1.c (apx_hanlder): Rename to ...
(apx_handler): ... this.
* gcc.target/i386/uintr-2.c (UINTR_hanlder): Rename to ...
(UINTR_handler): ... this.
* gcc.target/i386/uintr-5.c (UINTR_hanlder): Rename to ...
(UINTR_handler): ... this.
|
|
We added H into canonical order before, but forgot to add it to
arch-canonicalize as well...
gcc/ChangeLog:
PR target/121312
* config/riscv/arch-canonicalize: Add H extension to the
canonical order.
|
|
The following factors out a worker that gets a mode argument
rather than a vectype argument. That makes a difference when
we hit the fallback in add_stmt_cost for scalar stmts where
vectype might be NULL and thus mode is derived from the scalar
stmt there. But ix86_builtin_vectorization_cost does not
have access to the stmt. So the patch instead dispatches
to the new ix86_default_vector_cost there, passing down the mode
we derived from the stmt.
This is to avoid regressions with a patch that makes even more
scalar stmt costings have a vectype passed.
* config/i386/i386.cc (ix86_default_vector_cost): Split
out from ...
(ix86_builtin_vectorization_cost): ... this and use
mode instead of vectype as argument.
(ix86_vector_costs::add_stmt_cost): Call
ix86_default_vector_cost instead of ix86_builtin_vectorization_cost.
|
|
gcc/ChangeLog:
PR target/117015
* config/s390/s390-protos.h (s390_expand_int_spaceship): New
function.
(s390_expand_fp_spaceship): New function.
* config/s390/s390.cc (s390_expand_int_spaceship): New function.
(s390_expand_fp_spaceship): New function.
* config/s390/s390.md (spaceship<mode>4): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/s390/spaceship-fp-1.c: New test.
* gcc.target/s390/spaceship-fp-2.c: New test.
* gcc.target/s390/spaceship-fp-3.c: New test.
* gcc.target/s390/spaceship-fp-4.c: New test.
* gcc.target/s390/spaceship-int-1.c: New test.
* gcc.target/s390/spaceship-int-2.c: New test.
* gcc.target/s390/spaceship-int-3.c: New test.
|
|
commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sun May 25 07:40:29 2025 +0800
x86: Enable *mov<mode>_(and|or) only for -Oz
disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for
-Oz. But for legacy integer registers, the former is 4 bytes and the
latter is 3 bytes. Enable such transformation for -Oz.
gcc/
PR target/120427
* config/i386/i386.md (peephole2): Transform "movq $-1,reg" to
"pushq $-1; popq reg" for -Oz if reg is a legacy integer register.
gcc/testsuite/
PR target/120427
* gcc.target/i386/pr120427-5.c: New test.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.
This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates it for GCN. This
allows much better code to be generated for affected loops.
Co-authored-by: Julian Brown <julian@codesourcery.com>
gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* hooks.cc (hook_bool_mode_int_unsigned_false): New function.
* hooks.h (hook_bool_mode_int_unsigned_false): New prototype.
* tree-vect-stmts.cc (vect_use_strided_gather_scatters_p): Add
parameters group_size and single_element_p, and rework to use
targetm.vectorize.prefer_gather_scatter.
(get_group_load_store_type): Move some of the condition into
vect_use_strided_gather_scatters_p.
* config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function.
(TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.
|
|
The optimization options are deliberately passed through to the LTO compiler,
but when the same mechanism is reused for offloading it ends up forcing the
host compiler settings onto the device compiler. Maybe this should be removed
completely, but this patch just fixes a few of them. In particular,
param_vect_partial_vector_usage is disabled by x86 and this really hurts amdgcn.
I also fixed an ambiguous else warning in the generated file by adding braces.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_option_override): Add note to set default for
param_vect_partial_vector_usage to "1".
* optc-save-gen.awk: Don't pass through options marked "NoOffload".
* params.opt (-param=vect-epilogues-nomask): Add NoOffload.
(-param=vect-partial-vector-usage): Likewise.
(-param=vect-inner-loop-cost-factor): Likewise.
|
|
Fixes the feature gating for the SME2+FAMINMAX intrinsics.
PR target/121300
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-sme.def (svamin/svamax): Fix
arch gating.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr121300.c: New test.
|
|
This patch extends the expander for fma, fnma, fms, and fnms to support
partial SVE FP modes.
We add the missing BF16 tests, which we can now trigger, having
implemented the conditional expander.
We also add tests for the 'merging with multiplicand' case, which this
expander canonicalizes (albeit under SVE_STRICT_GP).
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (@cond_<optab><mode>): Extend
to support partial FP modes.
(*cond_<optab><mode>_2_strict): Extend from SVE_FULL_F to SVE_F,
use aarch64_predicate_operand.
(*cond_<optab><mode>_4_strict): Extend from SVE_FULL_F_B16B16 to
SVE_F_B16B16, use aarch64_predicate_operand.
(*cond_<optab><mode>_any_strict): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: Add test cases
for merging with multiplicand.
* gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmla_2.c: New test.
* gcc.target/aarch64/sve/unpacked_cond_fmls_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmls_2.c: Likewise.
* g++.target/aarch64/sve/unpacked_cond_ternary_bf16_1.C: Likewise.
* g++.target/aarch64/sve/unpacked_cond_ternary_bf16_2.C: Likewise.
|
|
Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/
SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is
SVE_RELAXED_GP.
We can only reliably test the 'merging with the third input' (addend)
and 'independent value' patterns at this stage as the canonicalisation that
reorders the multiplicands based on the second SEL input would be performed
by the conditional expander.
Another difficulty is that we can't test these fused multiply/SEL combines
without using __builtin_fma and friends. The reason for this is as follows:
We support COND_ADD, COND_SUB, and COND_MUL optabs, so match.pd will
canonicalize patterns like ADD/SUB/MUL combined with a VEC_COND_EXPR into
these conditional forms. Later, when widening_mul tries to fold these into
conditional fused multiply operations, the transformation fails - simply
because we haven’t implemented those conditional fused multiply optabs yet.
Hence why this patch lacks tests for BFloat16...
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed):
Extend from SVE_FULL_F to SVE_F.
(*cond_<optab><mode>_4_relaxed): Extend from SVE_FULL_F_B16B16
to SVE_F_B16B16.
(*cond_<optab><mode>_any_relaxed): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/unpacked_cond_fmla_1.c: New test.
* gcc.target/aarch64/sve/unpacked_cond_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fnmls_1.c: Likewise.
|
|
This patch extends the expander for unconditional fma, fnma, fms, and
fnms, so that it supports partial SVE FP modes.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (<optab><mode>4): Extend from
SVE_FULL_F_B16B16 to SVE_F_B16B16. Use aarch64_sve_fp_pred instead
of aarch64_ptrue_reg.
(@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F_B16B16 to
SVE_F_B16B16. Use aarch64_predicate_operand.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/unpacked_ternary_bf16_1.C: New test.
* g++.target/aarch64/sve/unpacked_ternary_bf16_2.C: Likewise.
* gcc.target/aarch64/sve/unpacked_fmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fmls_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmla_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmla_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmls_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_fnmls_2.c: Likewise.
|
|
It's needed by avx5124vnniw/avx5124fmaps which have been removed by
r15-656-ge1a7e2c54d52d0.
gcc/ChangeLog:
* config/i386/i386-modes.def: Remove VECTOR_MODES(FLOAT, 256)
and VECTOR_MODE (INT, SI, 64).
* config/i386/i386.cc (ix86_hard_regno_nregs): Remove related
code for V64SF/V64SImode.
|
|
r14-1902-g96c3539f2a3813 split a TImode move into 2 DImode moves; it's
supposed to optimize TImode in parameter/return, since according to the
psABI it's stored in 2 general registers.
But when TImode is not in parameter/return, it could create redundancy
as in the PR.
The patch adds a splitter to handle that,
i.e.
(insn 10 9 14 2 (set (subreg:V2DI (reg:V4SI 98 [ <retval> ]) 0)
(vec_concat:V2DI (subreg:DI (reg:TI 101) 0)
(subreg:DI (reg:TI 101) 8)))
8442 {vec_concatv2di}
(expr_list:REG_DEAD (reg:TI 101)
gcc/ChangeLog:
PR target/121274
* config/i386/sse.md (*vec_concatv2di_0): Add a splitter
before it.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr121274.c: New test.
|
|
This patch extends the expander for conditional smax, smin, add, sub, mul,
min, max, and div to support partial SVE FP modes.
If exceptions from undefined vector elements must be suppressed, this
expansion converts the container-level predicate to an element-level one, and
ensures that these elements are inactive for the operation. In practice, this
is a predicate AND with the existing mask and a container-size PTRUE.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_sve_emit_masked_fp_pred):
Declare.
* config/aarch64/aarch64-sve.md (and<mode>3): Change this to...
(@and<mode>3): ...this, so that we can use gen_and3.
(@cond_<optab><mode>): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16,
use aarch64_predicate_operand.
(*cond_<optab><mode>_2_strict): Likewise.
(*cond_<optab><mode>_3_strict): Likewise.
(*cond_<optab><mode>_any_strict): Likewise.
(*cond_<optab><mode>_2_const_strict): Extend from SVE_FULL_F to SVE_F,
use aarch64_predicate_operand.
(*cond_<optab><mode>_any_const_strict): Likewise.
(*cond_sub<mode>_3_const_strict): Likewise.
(*cond_sub<mode>_const_strict): Likewise.
(*vcond_mask_<mode><vpred>): Use aarch64_predicate_operand, and update
the comment here.
* config/aarch64/aarch64.cc (aarch64_sve_emit_masked_fp_pred): New
function. Helper to mask the predicate in conditional expanders.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/unpacked_cond_binary_bf16_2.C: New test.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fadd_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fdiv_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmaxnm_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fminnm_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmul_2.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fsubr_2.c: Likewise.
|
|
Automatically generate -mcpu and -mtune options in invoke.texi from
the unified riscv-cores.def metadata, ensuring documentation stays in sync
with definitions and reducing manual maintenance.
gcc/ChangeLog:
* Makefile.in: Add riscv-mcpu.texi and riscv-mtune.texi to the list
of files to be processed by the Texinfo generator.
* config/riscv/t-riscv: Add rule for generating riscv-mcpu.texi
and riscv-mtune.texi.
* doc/invoke.texi: Replace hand‑written extension table with
`@include riscv-mcpu.texi` and `@include riscv-mtune.texi` to
pull in auto‑generated entries.
* config/riscv/gen-riscv-mcpu-texi.cc: New file.
* config/riscv/gen-riscv-mtune-texi.cc: New file.
* doc/riscv-mcpu.texi: New file.
* doc/riscv-mtune.texi: New file.
|
|
This patch adds a new rule for distributing lowpart subregs through
ANDs, IORs, and XORs with a constant, in cases where one of the terms
then disappears. For example:
(lowpart-subreg:QI (and:HI x 0x100))
simplifies to zero and
(lowpart-subreg:QI (and:HI x 0xff))
simplifies to (lowpart-subreg:QI x).
This would often be handled at some point using nonzero bits. However,
the specific case I want the optimisation for is SVE predicates,
where nonzero bit tracking isn't currently an option. Specifically:
the predicate modes VNx8BI, VNx4BI and VNx2BI have the same size as
VNx16BI, but treat only every second, fourth, or eighth bit as
significant. Thus if we have:
(subreg:VNx8BI (and:VNx16BI x C))
where C is the repeating constant { 1, 0, 1, 0, ... }, then the
AND only clears bits that are made insignificant by the subreg,
and so the result is equal to (subreg:VNx8BI x). Later patches
rely on this.
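The two integer examples above can be checked exhaustively on the host. In this sketch the lowpart subreg is modelled as a truncating cast from a 16-bit to an 8-bit value; the function names are hypothetical and the code is not part of the simplify-rtx selftests.

```c
#include <stdint.h>

/* (lowpart-subreg:QI (and:HI x c)) modelled as a truncating cast. */
static uint8_t lowpart_and(uint16_t x, uint16_t c)
{
    return (uint8_t)(x & c);
}

/* Returns 1 if both simplifications hold for every 16-bit value of x. */
static int check_lowpart_rules(void)
{
    for (uint32_t i = 0; i <= 0xffff; i++) {
        uint16_t x = (uint16_t)i;
        /* The mask 0x100 only keeps a bit the subreg discards. */
        if (lowpart_and(x, 0x100) != 0)
            return 0;
        /* The mask 0xff only clears bits the subreg discards. */
        if (lowpart_and(x, 0xff) != (uint8_t)x)
            return 0;
    }
    return 1;
}
```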
gcc/
* simplify-rtx.cc (simplify_context::simplify_subreg): Distribute
lowpart subregs through AND/IOR/XOR, if doing so eliminates one
of the terms.
(test_scalar_int_ext_ops): Add some tests of the above for integers.
* config/aarch64/aarch64.cc (aarch64_test_sve_folding): Likewise
add tests for predicate modes.
|
|
function_expander::get_reg_target didn't actually check for a register,
meaning that it could return a memory target instead. That doesn't
really matter for the current direct and indirect uses (svundef*,
svcreate*, and svset*) but it will for later patches.
gcc/
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::get_reg_target): Check whether the target
is a valid register_operand.
|
|
Converting from generic AS to __flashx used the same rule as
for __memx, which tags RAM (generic AS) locations by setting bit 23.
The justification was that generic isn't a subset of __flashx, though
that led to surprises with code like const __flashx *x = NULL.
The natural thing to do is to just load 0x000000 in that case,
so that the null pointer works in __flashx as expected.
Apart from that, converting NULL to __flashx (or __flash) no longer
raises a -Waddr-space-convert diagnostic.
gcc/
PR target/121277
* config/avr/avr.cc (avr_addr_space_convert): When converting
from generic AS to __flashx, don't set bit 23.
(avr_convert_to_type): Don't -Waddr-space-convert when NULL
is converted to __flashx or to __flash.
|
|
__tls_get_addr doesn't preserve vector registers. When a function
with no_caller_saved_registers attribute calls __tls_get_addr, YMM
and ZMM registers will be clobbered. Issue an error and suggest
-mtls-dialect=gnu2 in this case.
gcc/
PR target/121208
* config/i386/i386.cc (ix86_tls_get_addr): Issue an error for
-mtls-dialect=gnu with no_caller_saved_registers attribute and
suggest -mtls-dialect=gnu2.
gcc/testsuite/
PR target/121208
* gcc.target/i386/pr121208-1a.c: New test.
* gcc.target/i386/pr121208-1b.c: Likewise.
* gcc.target/i386/pr121208-2a.c: Likewise.
* gcc.target/i386/pr121208-2b.c: Likewise.
* gcc.target/i386/pr121208-3a.c: Likewise.
* gcc.target/i386/pr121208-3b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Usage of the -march-map=: "Select the closest available '-march=' value
that is not more capable."
As PTX ISA 8.6/8.7 (= unreleased CUDA 12.7 + CUDA 12.8) added the
Nvidia Blackwell GPUs SM_100, SM_101, and SM_120, it makes sense to
add them as well. Note that all three come as sm_XXX and sm_XXXa.
PTX ISA 8.8 (CUDA 12.9) added SM_103 and SM_121 and the new 'f' suffix
for all SM_1xx.
Internally, GCC currently generates the same code for >= sm_80 (Ampere);
however, as GCC's -march= also supports sm_89 (Ada), the here added
sm_1xxs (Blackwell) will map to sm_89.
[Naming note: while ptx code generated for sm_X can also run with sm_Y
if Y > X, code generated for sm_XXXa can (generally) only run on
the specific hardware; and sm_XXXf implies compatibility with only
subsequent targets in the same family.]
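The "closest available value that is not more capable" rule can be sketched as a lookup over an ascending list. The list below is illustrative only: sm_80 and sm_89 are named in the text, the other entries are assumptions, and the a/f suffixes are ignored.

```c
#include <stddef.h>

/* Illustrative -march= values in ascending order.  Only 80 and 89 come
   from the text above; the remaining entries are assumptions. */
static const int available_sm[] = { 30, 35, 53, 70, 75, 80, 89 };

/* Returns the largest available value that does not exceed REQUESTED,
   or -1 if every available value is more capable. */
static int march_map(int requested)
{
    int best = -1;
    size_t n = sizeof available_sm / sizeof available_sm[0];
    for (size_t i = 0; i < n; i++)
        if (available_sm[i] <= requested)
            best = available_sm[i];
    return best;
}
```

Under these assumptions every Blackwell value (sm_100 to sm_121) selects sm_89, which matches the mapping described above.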
gcc/ChangeLog:
* config/nvptx/nvptx.opt (march-map=): Add sm_100{,f,a},
sm_101{,f,a}, sm_103{,a,f}, sm_120{,a,f} and sm_121{,f,a}.
|
|
For device (agent) scope atomics - as needed when there is more than one team,
a buffer_wbl2 followed by s_waitcnt is required. When doing the initial porting,
the pre-atomic instruction got accidentally replaced by buffer_inv sc1, which is
not quite the right instruction.
gcc/ChangeLog:
* config/gcn/gcn.md (atomic_load, atomic_store, atomic_exchange):
Fix CDNA3 L2 cache write-back before atomic instructions.
|
|
Implement another case where the CDNA3 ISA documentation requires s_nop,
and add a comment explaining why another case does not need to be handled.
Also add one case where an s_nop is required by MI300A hardware but does
not seem to be mentioned in the CDNA3 ISA documentation.
gcc/ChangeLog:
* config/gcn/gcn.md (define_attr "vcmp"): Add with values
vcmp/vcmpx/no.
(*movbi, cstoredi4.., cstore<mode>4): Set it.
* config/gcn/gcn-valu.md (vec_cmp<mode>...): Likewise.
* config/gcn/gcn.cc (gcn_cmpx_insn_p): Remove.
(gcn_md_reorg): Add two new conditions for MI300.
|
|
Use 's_nops' with a number instead of multiple 's_nop' instructions when
manually adding 1 to 5 wait states. This helps with
the instruction cache and helps a tiny bit with PR119367 where
a two-byte variable overflows in the debugging location view handling.
Add a comment about 'sc0' to TARGET_GLC_NAME as for atomics it is
unrelated to the scope but to whether the result is stored; i.e.
using e.g. 'sc1' instead of 'sc0' will have undesired consequences!
Update the comment above print_operand_address to document 'R' and 'V';
those are used below as "Temporary hack.", but it makes sense to see
them in the list.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add comment
about 'sc0'.
* config/gcn/gcn.cc (gcn_md_reorg): Use gen_nops instead of gen_nop.
(print_operand_address): Document 'R' and 'V' in the
pre-function comment as well.
* config/gcn/gcn.md (nops): Add.
|
|
I am at a point where I want to store additional information from
analysis (from loads and stores) to re-use them at transform stage
without repeating the analysis. I do not want to add to
stmt_vec_info at this point, so this starts adding kind specific
sub-structures by moving the STMT_VINFO_TYPE field to the SLP
tree and adding a (dummy for now) union tagged by it to receive
such data.
The change is largely mechanical after RISC-V has been prepared
to have a SLP node around.
I have settled for a union (supposed to get pointers to data).
As followup this enables getting rid of SLP_TREE_CODE and making
VEC_PERM therein a separate type, unifying its handling.
* tree-vectorizer.h (_slp_tree::type): Add.
(_slp_tree::u): Likewise.
(_stmt_vec_info::type): Remove.
(STMT_VINFO_TYPE): Likewise.
(SLP_TREE_TYPE): New.
* tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not
initialize type.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize type.
(vect_slp_analyze_node_operations): Adjust.
(vect_schedule_slp_node): Likewise.
* tree-vect-patterns.cc (vect_init_pattern_stmt): Do not
copy STMT_VINFO_TYPE.
* tree-vect-loop.cc: Set SLP_TREE_TYPE instead of
STMT_VINFO_TYPE everywhere.
(vect_create_loop_vinfo): Do not set STMT_VINFO_TYPE on
loop conditions.
* tree-vect-stmts.cc: Set SLP_TREE_TYPE instead of
STMT_VINFO_TYPE everywhere.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
* config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops):
Access SLP_TREE_TYPE instead of STMT_VINFO_TYPE.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Remove non-SLP element-wise load/store matching.
* config/rs6000/rs6000.cc
(rs6000_cost_data::update_target_cost_per_stmt): Pass in
the SLP node. Use that to get at the memory access
kind and type.
(rs6000_cost_data::add_stmt_cost): Pass down SLP node.
* config/riscv/riscv-vector-costs.cc (variable_vectorized_p):
Use SLP_TREE_TYPE.
(costs::need_additional_vector_vars_p): Likewise.
(costs::update_local_live_ranges): Likewise.
|
|
This patch adds a new tuning model for the NVIDIA Olympus core.
The values used here are based on the Software Optimization Guide
that will be published imminently.
Bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for trunk?
OK to backport to GCC 15?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
Co-Authored-By: Dhruv Chawla <dhruvc@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (olympus): Use olympus tuning
model.
* config/aarch64/aarch64.cc: Include olympus.h.
* config/aarch64/tuning_models/olympus.h: New file.
|
|
On LoongArch, the switch jump-table always stores absolute
addresses, so there is no need to define the macro
CASE_VECTOR_SHORTEN_MODE.
gcc/ChangeLog:
* config/loongarch/loongarch.h
(CASE_VECTOR_SHORTEN_MODE): Delete.
|
|
The previous fix also had some flaws:
- The TARGET_CONST16 check was a bit premature
- It didn't take into account the possibility of the RTL expression
"(set (reg:SF gpr) (const_int))", especially when TARGET_AUTOLITPOOLS is
configured
This patch fixes the above.
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_is_insn_L32R_p):
Re-rewrite to more accurately capture insns that could be L32R machine
instructions wherever possible, and add comments that help understand
the intent of the process.
|
|
This patch combines vec_duplicate + vaadd.vv into vaadd.vx, as in the
example code below. The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine will take action if the cost
of GR2VR is zero, and reject the combination if the GR2VR cost is
greater than zero.
Assume we have example code like below, where the GR2VR cost is 0.
#define DEF_AVG_FLOOR(NT, WT) \
NT \
test_##NT##_avg_floor(NT x, NT y) \
{ \
return (NT)(((WT)x + (WT)y) >> 1); \
}
#define AVG_FLOOR_FUNC(T) test_##T##_avg_floor
DEF_AVG_FLOOR(int32_t, int64_t)
DEF_VX_BINARY_CASE_2_WRAP(T, AVG_FLOOR_FUNC(T), avg_floor)
Before this patch:
beq a3,zero,.L8
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.x v2,a2
slli a3,a3,32
srli a3,a3,32
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
slli a4,a5,2
sub a3,a3,a5
add a1,a1,a4
vaadd.vv v1,v1,v2
vse32.v v1,0(a0)
add a0,a0,a4
bne a3,zero,.L3
After this patch:
beq a3,zero,.L8
slli a3,a3,32
srli a3,a3,32
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
slli a4,a5,2
sub a3,a3,a5
add a1,a1,a4
vaadd.vx v1,v1,a2
vse32.v v1,0(a0)
add a0,a0,a4
bne a3,zero,.L3
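For reference, the scalar idiom that the macro expands to can be written out by hand. The sketch below assumes that the wrapper macro produces a loop applying the function to a vector operand and a loop-invariant scalar; the names avg_floor_i32 and avg_floor_vx are illustrative and not part of the testsuite.

```cpp
#include <cassert>
#include <cstdint>

// Averaging floor as in DEF_AVG_FLOOR(int32_t, int64_t): widen, add,
// shift right by one, narrow. The shift of a negative sum relies on
// arithmetic right shift, which GCC provides.
static inline int32_t
avg_floor_i32 (int32_t x, int32_t y)
{
  return (int32_t) (((int64_t) x + (int64_t) y) >> 1);
}

// Assumed loop shape: in[] is the vector operand, s is the scalar that
// would otherwise be broadcast with vec_duplicate.
static void
avg_floor_vx (int32_t *out, const int32_t *in, int32_t s, unsigned n)
{
  assert (out != nullptr && in != nullptr);
  for (unsigned i = 0; i < n; i++)
    out[i] = avg_floor_i32 (in[i], s);
}
```

Widening to int64_t keeps the intermediate sum from overflowing, which is what lets the operation map onto an averaging add.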
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vxrm_vec_vec_dup):
Add new case UNSPEC_VAADD.
(expand_vx_binary_vxrm_vec_dup_vec): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new case UNSPEC_VAADD to
iterator.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The riscv-c-api-doc defines a group ID and a bit position for some
extensions. Most of them are set in riscv-ext.def, but some are missing
and one bit position (for Zilsd) is wrong.
This patch replaces the `BITMASK_NOT_YET_ALLOCATED` value for the actual
allocated value wherever possible and fixes the bit position for Zilsd.
Currently, we don't have any infrastructure to utilize the information
that is placed into riscv_ext_info_t::m_bitmask_group_id and
riscv_ext_info_t::m_bitmask_group_bit_pos. This also means we can't
test this change.
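As a rough illustration of what a (group ID, bit position) pair encodes, the sketch below treats the group ID as an index into an array of 64-bit words and the bit position as a bit within that word. The struct feature_bits and the helpers set_ext and has_ext are hypothetical stand-ins and not GCC or riscv-c-api-doc definitions; the IDs and positions used with them are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical table of feature words, one 64-bit word per group ID.
struct feature_bits
{
  unsigned length;       // number of valid groups
  uint64_t features[4];  // bit N of word G stands for (group G, bit N)
};

// Mark the extension at (group_id, bit_pos) as present.
static void
set_ext (feature_bits &fb, unsigned group_id, unsigned bit_pos)
{
  assert (group_id < fb.length && bit_pos < 64);
  fb.features[group_id] |= (uint64_t) 1 << bit_pos;
}

// Query the extension at (group_id, bit_pos); out-of-range means absent.
static bool
has_ext (const feature_bits &fb, unsigned group_id, unsigned bit_pos)
{
  if (group_id >= fb.length || bit_pos >= 64)
    return false;
  return ((fb.features[group_id] >> bit_pos) & 1) != 0;
}
```

With this layout a wrong bit position silently tests a different extension, which is why a misallocated value needs fixing even before anything consumes it.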
gcc/ChangeLog:
* config/riscv/riscv-ext.def: Add allocated group IDs and
group bit positions.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
No functional change intended.
gcc/ChangeLog:
* Makefile.in: Replace diagnostic.def with diagnostics/kinds.def.
* config/aarch64/aarch64.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* config/i386/i386-options.cc: Likewise.
* config/s390/s390.cc: Likewise.
* diagnostic-core.h: Replace typedef diagnostic_t with
enum class diagnostics::kind in diagnostics/kinds.h and include
it.
* diagnostic-global-context.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* diagnostic.cc: Likewise.
* diagnostic.h: Likewise.
* diagnostics/buffering.cc: Likewise.
* diagnostics/buffering.h: Likewise.
* diagnostics/context.h: Likewise.
* diagnostics/diagnostic-info.h: Likewise.
* diagnostics/html-sink.cc: Likewise.
* diagnostic.def: Move to...
* diagnostics/kinds.def: ...here and update for diagnostic_t
becoming enum class diagnostics::kind.
* diagnostics/kinds.h: New file, based on material in
diagnostic-core.h.
* diagnostics/lazy-paths.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* diagnostics/option-classifier.cc: Likewise.
* diagnostics/option-classifier.h: Likewise.
* diagnostics/output-spec.h: Likewise.
* diagnostics/paths-output.cc: Likewise.
* diagnostics/sarif-sink.cc: Likewise.
* diagnostics/selftest-context.cc: Likewise.
* diagnostics/selftest-context.h: Likewise.
* diagnostics/sink.h: Likewise.
* diagnostics/source-printing.cc: Likewise.
* diagnostics/text-sink.cc: Likewise.
* diagnostics/text-sink.h: Likewise.
* gcc.cc: Likewise.
* libgdiagnostics.cc: Likewise.
* lto-wrapper.cc: Likewise.
* opts-common.cc: Likewise.
* opts-diagnostic.h: Likewise.
* opts.cc: Likewise.
* rtl-error.cc: Likewise.
* substring-locations.cc: Likewise.
* toplev.cc: Likewise.
gcc/ada/ChangeLog:
* gcc-interface/trans.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
gcc/analyzer/ChangeLog:
* pending-diagnostic.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* program-point.cc: Likewise.
gcc/c-family/ChangeLog:
* c-common.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* c-format.cc: Likewise.
* c-lex.cc: Likewise.
* c-opts.cc: Likewise.
* c-pragma.cc: Likewise.
* c-warn.cc: Likewise.
gcc/c/ChangeLog:
* c-errors.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* c-parser.cc: Likewise.
* c-typeck.cc: Likewise.
gcc/cobol/ChangeLog:
* util.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
gcc/cp/ChangeLog:
* call.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* constexpr.cc: Likewise.
* cp-tree.h: Likewise.
* decl.cc: Likewise.
* error.cc: Likewise.
* init.cc: Likewise.
* method.cc: Likewise.
* module.cc: Likewise.
* parser.cc: Likewise.
* pt.cc: Likewise.
* semantics.cc: Likewise.
* typeck.cc: Likewise.
* typeck2.cc: Likewise.
gcc/d/ChangeLog:
* d-diagnostic.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
gcc/fortran/ChangeLog:
* cpp.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* error.cc: Likewise.
* options.cc: Likewise.
gcc/jit/ChangeLog:
* dummy-frontend.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
gcc/m2/ChangeLog:
* gm2-gcc/m2linemap.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* gm2-gcc/rtegraph.cc: Likewise.
gcc/rust/ChangeLog:
* backend/rust-tree.cc: Update for diagnostic_t becoming
enum class diagnostics::kind.
* backend/rust-tree.h: Likewise.
* resolve/rust-ast-resolve-expr.cc: Likewise.
* resolve/rust-ice-finalizer.cc: Likewise.
* resolve/rust-ice-finalizer.h: Likewise.
* resolve/rust-late-name-resolver-2.0.cc: Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.cc: Update for
diagnostic_t becoming enum class diagnostics::kind.
* gcc.dg/plugin/expensive_selftests_plugin.cc: Likewise.
* gcc.dg/plugin/location_overflow_plugin.cc: Likewise.
* lib/gcc-dg.exp: Likewise.
libcpp/ChangeLog:
* internal.h: Update comment for diagnostic_t becoming
enum class diagnostics::kind.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
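A minimal sketch of what moving from a plain typedef'd enum to a scoped enum means for callers. The enumerator list is abbreviated and assumed for illustration (the real one is generated from diagnostics/kinds.def), and is_error and kind_index are hypothetical helpers.

```cpp
#include <cassert>

namespace diagnostics {
// Abbreviated, assumed enumerator list; see diagnostics/kinds.def for
// the real one.
enum class kind { ignored, note, warning, error };
}

// Scoped enumerators must be spelled with their enum's name.
static bool
is_error (diagnostics::kind k)
{
  return k == diagnostics::kind::error;
}

// There is no implicit conversion to int any more; an explicit cast is
// required wherever a kind is used as an index.
static int
kind_index (diagnostics::kind k)
{
  int idx = static_cast<int> (k);
  assert (idx >= 0);
  return idx;
}
```

This is why a change with no functional effect still touches every file that names a diagnostic kind.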
|
|
This patch prepares the dynamic LMUL vector costing to use the coming
SLP_TREE_TYPE instead of the (to-be-removed) STMT_VINFO_TYPE.
Even though the whole approach should be reviewed and adjusted at some
point, the patch chooses the path of least resistance and uses a hash
map for the stmt_info -> slp node relationship. A node is mapped to the
accompanying stmt_info during add_stmt_cost. In finish_cost we go
through all statements as before, and obtain the corresponding slp nodes
as well as their types.
This allows us to operate largely as before. We don't yet switch over
from STMT_VINFO_TYPE to SLP_TREE_TYPE, though; we only take care of the
necessary refactoring upfront.
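The mapping described above can be sketched as follows. This is a simplified model, not the code from riscv-vector-costs.cc: stmt_info and slp_node are hypothetical stand-ins for the vectorizer's types, std::unordered_map is swapped in for GCC's own hash map, and lookup is an invented helper.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the vectorizer's types.
struct slp_node { int type; };
struct stmt_info { int id; };

class costs
{
public:
  // Called once per statement: remember which SLP node accompanies it.
  void add_stmt_cost (const stmt_info *stmt, const slp_node *node)
  {
    assert (stmt != nullptr);
    m_stmts.push_back (stmt);
    if (node != nullptr)
      m_slp_for_stmt[stmt] = node;
  }

  // Return the node recorded for STMT, or nullptr if there is none.
  const slp_node *lookup (const stmt_info *stmt) const
  {
    auto it = m_slp_for_stmt.find (stmt);
    return it == m_slp_for_stmt.end () ? nullptr : it->second;
  }

  // Called at the end: walk all statements as before and fetch their
  // nodes. Here it only counts the statements that have one.
  unsigned finish_cost () const
  {
    unsigned with_node = 0;
    for (const stmt_info *stmt : m_stmts)
      if (lookup (stmt) != nullptr)
        with_node++;
    return with_node;
  }

private:
  std::vector<const stmt_info *> m_stmts;
  std::unordered_map<const stmt_info *, const slp_node *> m_slp_for_stmt;
};
```

Keeping the statement walk unchanged and resolving nodes through a side table is what keeps the refactoring small.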
Regtested on rv64gcv_zvl512b with -mrvv-max-lmul=dynamic. There are a
few regressions but nothing worse than what we already have. I'd rather
accept these now and take it as an incentive to work on the heuristic
later than block the SLP work until it is fixed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (get_live_range):
Move compute_local_program_points to cost class.
(variable_vectorized_p): Add slp node parameter.
(need_additional_vector_vars_p): Move from here...
(costs::need_additional_vector_vars_p): ... to here and add slp
parameter.
(compute_estimated_lmul): Move update_local_live_ranges to cost
class.
(has_unexpected_spills_p): Move from here...
(costs::has_unexpected_spills_p): ... to here.
(costs::record_lmul_spills): New function.
(costs::add_stmt_cost): Add stmt_info, slp mapping.
(costs::finish_cost): Analyze loop.
* config/riscv/riscv-vector-costs.h: Move declarations to class.
|
|
There was once a RISC-V extension draft ("N"), which introduced
user-level interrupts. However, it was never ratified and the
specification draft has been removed from the RISC-V ISA manual
in commit `b6cade07034` with the comment "it'll likely need to
be redesigned".
Support for an N extension never made it to GCC, but we support
function attributes for user-level interrupt handlers that use
the URET instruction.
The "user" interrupt attribute was documented in the RISC-V C API,
but was removed in PR #106 in May 2025 (driven by LLVM devs/
maintainers and ack'ed by at least one GCC maintainer).
Let's drop URET support from GCC as well.
gcc/ChangeLog:
* config/riscv/riscv.cc (enum riscv_privilege_levels): Remove USER_MODE.
(riscv_handle_type_attribute): Remove "user" interrupts.
(riscv_expand_epilogue): Likewise.
(riscv_get_interrupt_type): Likewise.
* config/riscv/riscv.md (riscv_uret): Remove URET pattern.
* doc/extend.texi: Remove documentation of user interrupts.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/interrupt-conflict-mode.c: Remove "user"
interrupts.
* gcc.target/riscv/xtheadint-push-pop.c: Likewise.
* gcc.target/riscv/interrupt-umode.c: Removed.
Reported-by: Sam Elliott <quic_aelliott@quicinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
MI300 requires some additional s_nop to be added between some instructions.
* As 'v_readlane' and 'v_writelane' have to be distinguished, the
'laneselect' attribute was changed from no/yes to no/read/write.
* Add some missing 'laneselect' attributes for v_(read,write)lane.
* Replace 'delayeduse' by 'flatmemaccess', which is more explicit,
especially as some uses have to be distinguished in more detail.
(Alongside, one off-by-two delayeduse has been fixed.)
On the other hand, RDNA 2, 3, and 3.5 do not require any added s_nop;
thus, there is no need to walk the instructions for them to insert
pointless S_NOP. (RDNA4 (not yet in GCC) requires it in a few cases.)
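The general shape of such a pass can be sketched as below. This is a toy model and not gcn_md_reorg: the hazard rule (a minimum distance between a v_writelane and a following v_readlane), the gap parameter and the needs_manual_nops flag are all assumptions made for illustration, and a real s_nop carries a count operand rather than being emitted once per slot.

```cpp
#include <cassert>
#include <vector>

enum class op { s_nop, v_writelane, v_readlane, other };

// Toy hazard rule, assumed for illustration only: a v_readlane must be
// at least `gap` instructions after the most recent v_writelane.
static std::vector<op>
insert_nops (const std::vector<op> &in, unsigned gap, bool needs_manual_nops)
{
  // Targets that need no manual nops skip the walk entirely.
  if (!needs_manual_nops)
    return in;

  std::vector<op> out;
  unsigned since_write = gap;  // distance to last v_writelane, saturated
  for (op o : in)
    {
      if (o == op::v_readlane)
        while (since_write < gap)
          {
            out.push_back (op::s_nop);
            since_write++;
          }
      out.push_back (o);
      if (o == op::v_writelane)
        since_write = 0;
      else if (since_write < gap)
        since_write++;
    }
  assert (out.size () >= in.size ());
  return out;
}
```

The early return mirrors the point made above: when no hazard can require a nop, there is no reason to walk the instructions at all.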
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_NO_MANUAL_NOPS,
TARGET_CDNA3_NOPS): Define.
* config/gcn/gcn.md (define_attr "laneselect"): Change 'yes' to
'read' and 'write'.
(define_attr "flatmemaccess"): Add with values store, storex34,
load, atomic, atomicwait, cmpswapx2, and no. Replacing ...
(define_attr "delayeduse"): Remove.
(define_attr "transop"): Add with values yes and no.
(various insns): Update 'laneselect', add flatmemaccess and transop,
remove delayeduse; fixing an issue for s_load_dwordx4 vs.
flat_store_dwordx4 related to delayeduse (now: flatmemaccess).
* config/gcn/gcn-valu.md: Update laneselect attribute and add
flatmemaccess.
* config/gcn/gcn.cc (gcn_cmpx_insn_p): New.
(gcn_md_reorg): Update for MI300 to add additional s_nop.
Skip s_nop-insertion part for RDNA{2,3}; add "VALU writes EXEC
followed by VALU DPP" unconditionally for CDNA2/CDNA3/GCN5.
|
|
The Smrnmi extension introduces the nmret instruction to return from RNMI
handlers. We already have basic Smrnmi support. This patch introduces
support for the nmret instruction and the ability to set the function
attribute `__attribute__ ((interrupt ("rnmi")))` to let the compiler
generate RNMI handlers.
The attribute name is proposed in a PR for the RISC-V C API and approved
by LLVM maintainers:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/116
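A minimal sketch of how a handler might be declared with the new attribute. The RNMI_HANDLER macro, the handler name and the counter are illustrative; the attribute is only applied when targeting RISC-V and assumes a compiler that already includes this patch, so that other builds can still compile the file.

```cpp
#include <cassert>

// Only RISC-V compilers with this patch are assumed to understand the
// "rnmi" argument; elsewhere the macro expands to nothing.
#if defined (__riscv)
# define RNMI_HANDLER __attribute__ ((interrupt ("rnmi")))
#else
# define RNMI_HANDLER
#endif

// Illustrative state touched by the handler.
static volatile unsigned rnmi_seen;

// With the attribute in effect, the compiler is expected to emit the
// handler prologue/epilogue and return with nmret instead of ret.
extern "C" RNMI_HANDLER void
rnmi_handler (void)
{
  rnmi_seen = rnmi_seen + 1;
}
```

On a real target the handler is entered by hardware rather than called from C code, so the handler takes no arguments and returns nothing.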
gcc/ChangeLog:
* config/riscv/riscv.cc (enum riscv_privilege_levels): Add
RNMI_MODE.
(riscv_handle_type_attribute): Handle 'rnmi' interrupt attribute.
(riscv_expand_epilogue): Generate nmret for RNMI handlers.
(riscv_get_interrupt_type): Handle 'rnmi' interrupt attribute.
* config/riscv/riscv.md (riscv_rnmi): Add nmret INSN.
* doc/extend.texi: Add documentation for 'rnmi' interrupt attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/interrupt-rnmi.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
This patch adds an is_gather_scatter argument to the
support_vector_misalignment hook. No target except riscv cares about
alignment for gather/scatter, so all others return true for
is_gather_scatter.
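In a target that does not care, the change amounts to an early return. The sketch below uses a simplified, hypothetical signature (the real hook also takes GCC-internal mode and type arguments), and the policy for the non-gather/scatter case is a placeholder rather than any target's actual rule.

```cpp
#include <cassert>

// Simplified stand-in for a target's support_vector_misalignment hook.
static bool
example_support_vector_misalignment (int misalignment, bool is_packed,
                                     bool is_gather_scatter)
{
  // -1 conventionally means "unknown"; anything lower is not expected.
  assert (misalignment >= -1);

  // Gather/scatter alignment does not matter on this target.
  if (is_gather_scatter)
    return true;

  // Placeholder policy for ordinary accesses: accept only known-aligned,
  // non-packed data.
  if (is_packed)
    return false;
  return misalignment == 0;
}
```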
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_builtin_support_vector_misalignment):
Return true for gather/scatter.
* config/arm/arm.cc (arm_builtin_support_vector_misalignment):
Ditto.
* config/epiphany/epiphany.cc (epiphany_support_vector_misalignment):
Ditto.
* config/gcn/gcn.cc (gcn_vectorize_support_vector_misalignment):
Ditto.
* config/loongarch/loongarch.cc (loongarch_builtin_support_vector_misalignment):
Ditto.
* config/riscv/riscv.cc (riscv_support_vector_misalignment):
Add gather/scatter argument.
* config/rs6000/rs6000.cc (rs6000_builtin_support_vector_misalignment):
Return true for gather/scatter.
* config/s390/s390.cc (s390_support_vector_misalignment):
Ditto.
* doc/tm.texi: Add argument.
* target.def: Ditto.
* targhooks.cc (default_builtin_support_vector_misalignment):
Ditto.
* targhooks.h (default_builtin_support_vector_misalignment):
Ditto.
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
Ditto.
|
|
Extend the binary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/
SVE_FULL_F_B16B16 to SVE_F/SVE_F_B16B16, where the strictness value
is SVE_RELAXED_GP.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed):
Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16.
(*cond_<optab><mode>_3_relaxed): Likewise.
(*cond_<optab><mode>_any_relaxed): Likewise.
(*cond_<optab><mode>_any_const_relaxed): Extend from SVE_FULL_F
to SVE_F.
(*cond_add<mode>_2_const_relaxed): Likewise.
(*cond_add<mode>_any_const_relaxed): Likewise.
(*cond_sub<mode>_3_const_relaxed): Likewise.
(*cond_sub<mode>_const_relaxed): Likewise.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/unpacked_cond_binary_bf16_1.C: New test.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fadd_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fdiv_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmaxnm_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fminnm_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fmul_1.c: Likewise.
* gcc.target/aarch64/sve/unpacked_cond_fsubr_1.c: Likewise.
|