rl78 still uses reload rather than LRA.
epiphany still uses reload and causes ICEs during reload.
Neither target has a maintainer; epiphany has been without one since
2024 (2023 email) while rl78 has been without one since 2018.
gcc/ChangeLog:
* config.gcc: Mark epiphany*-*-* and rl78*-*-* as
obsolete targets.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
For TLS calls:
1. UNSPEC_TLS_GD:
(parallel [
(set (reg:DI 0 ax)
(call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
(const_int 0 [0])))
(unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
(reg/f:DI 7 sp)] UNSPEC_TLS_GD)
(clobber (reg:DI 5 di))])
2. UNSPEC_TLS_LD_BASE:
(parallel [
(set (reg:DI 0 ax)
(call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
(const_int 0 [0])))
(unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
3. UNSPEC_TLSDESC:
(parallel [
(set (reg/f:DI 104)
(plus:DI (unspec:DI [
(symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
(reg:DI 114)
(reg/f:DI 7 sp)] UNSPEC_TLSDESC)
(const:DI (unspec:DI [
(symbol_ref:DI ("e") [flags 0x1a])
] UNSPEC_DTPOFF))))
(clobber (reg:CC 17 flags))])
(parallel [
(set (reg:DI 101)
(unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
(reg:DI 112)
(reg/f:DI 7 sp)] UNSPEC_TLSDESC))
(clobber (reg:CC 17 flags))])
they return the same value for the same input value. But multiple calls
with the same input value may be generated for simple programs like:
void a(long *);
int b(void);
void c(void);
static __thread long e;
long
d(void)
{
a(&e);
if (b())
c();
return e;
}
When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
generated:
.type d, @function
d:
.LFB0:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
leaq e@TLSDESC(%rip), %rbx
movq %rbx, %rax
call *e@TLSCALL(%rax)
addq %fs:0, %rax
movq %rax, %rdi
call a@PLT
call b@PLT
testl %eax, %eax
jne .L8
movq %rbx, %rax
call *e@TLSCALL(%rax)
popq %rbx
.cfi_remember_state
.cfi_def_cfa_offset 8
movq %fs:(%rax), %rax
ret
.p2align 4,,10
.p2align 3
.L8:
.cfi_restore_state
call c@PLT
movq %rbx, %rax
call *e@TLSCALL(%rax)
popq %rbx
.cfi_def_cfa_offset 8
movq %fs:(%rax), %rax
ret
.cfi_endproc
There are 3 "call *e@TLSCALL(%rax)". They all return the same value.
Rename the remove_redundant_vector_load pass to the x86_cse pass and, for
64-bit, extend it to also remove redundant TLS calls, generating:
d:
.LFB0:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
leaq e@TLSDESC(%rip), %rax
movq %fs:0, %rdi
call *e@TLSCALL(%rax)
addq %rax, %rdi
movq %rax, %rbx
call a@PLT
call b@PLT
testl %eax, %eax
jne .L8
movq %fs:(%rbx), %rax
popq %rbx
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L8:
.cfi_restore_state
call c@PLT
movq %fs:(%rbx), %rax
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
with only one "call *e@TLSCALL(%rax)". This reduces the number of
__tls_get_addr calls in libgcc.a by 72%:
__tls_get_addr calls before after
libgcc.a 868 243
gcc/
PR target/81501
* config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
(redundant_load): Renamed to ...
(redundant_pattern): This.
(ix86_place_single_vector_set): Replace redundant_load with
redundant_pattern.
(replace_tls_call): New.
(ix86_place_single_tls_call): Likewise.
(pass_remove_redundant_vector_load): Renamed to ...
(pass_x86_cse): This. Add val, def_insn, mode, scalar_mode, kind,
x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and
candidate_vector_p.
(pass_x86_cse::candidate_gnu_tls_p): New.
(pass_x86_cse::candidate_gnu2_tls_p): Likewise.
(pass_x86_cse::candidate_vector_p): Likewise.
(remove_redundant_vector_load): Renamed to ...
(pass_x86_cse::x86_cse): This. Extend to remove redundant TLS
calls.
(make_pass_remove_redundant_vector_load): Renamed to ...
(make_pass_x86_cse): This.
* config/i386/i386-passes.def: Replace
pass_remove_redundant_vector_load with pass_x86_cse.
* config/i386/i386-protos.h (ix86_tls_get_addr): New.
(make_pass_remove_redundant_vector_load): Renamed to ...
(make_pass_x86_cse): This.
* config/i386/i386.cc (ix86_tls_get_addr): Remove static.
* config/i386/i386.h (machine_function): Add
tls_descriptor_call_multiple_p.
* config/i386/i386.md (tls64): New attribute.
(@tls_global_dynamic_64_<mode>): Set tls_descriptor_call_multiple_p.
(@tls_local_dynamic_base_64_<mode>): Likewise.
(@tls_dynamic_gnu2_64_<mode>): Likewise.
(*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd.
(*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to ld_base.
(*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea.
(*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to call.
(*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to
combine.
gcc/testsuite/
PR target/81501
* g++.target/i386/pr81501-1.C: New test.
* gcc.target/i386/pr81501-1a.c: Likewise.
* gcc.target/i386/pr81501-1b.c: Likewise.
* gcc.target/i386/pr81501-2a.c: Likewise.
* gcc.target/i386/pr81501-2b.c: Likewise.
* gcc.target/i386/pr81501-3.c: Likewise.
* gcc.target/i386/pr81501-4a.c: Likewise.
* gcc.target/i386/pr81501-4b.c: Likewise.
* gcc.target/i386/pr81501-5.c: Likewise.
* gcc.target/i386/pr81501-6a.c: Likewise.
* gcc.target/i386/pr81501-6b.c: Likewise.
* gcc.target/i386/pr81501-7.c: Likewise.
* gcc.target/i386/pr81501-8a.c: Likewise.
* gcc.target/i386/pr81501-8b.c: Likewise.
* gcc.target/i386/pr81501-9a.c: Likewise.
* gcc.target/i386/pr81501-9b.c: Likewise.
* gcc.target/i386/pr81501-10a.c: Likewise.
* gcc.target/i386/pr81501-10b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Newer linkers support an option to disable deduplication of entities.
This speeds up linking and can improve the debug experience. We adopt the
same criteria as clang in adding the option.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config.in: Regenerate.
* config/darwin.h (DARWIN_LD_NO_DEDUPLICATE): New.
(LINK_SPEC): Handle -no_deduplicate.
* configure: Regenerate.
* configure.ac: Detect linker support for -no_deduplicate.
|
|
The Darwin ABI uses a different section for string constants when
address sanitizing is enabled. This adds definitions of the asan-
specific sections and switches string constants to the correct
section.
It also makes the string constant symbols linker-visible when
asan is enabled, but not otherwise.
gcc/ChangeLog:
* config/darwin-sections.def (asan_string_section,
asan_globals_section, asan_liveness_section): New.
* config/darwin.cc (objc_method_decl): Use asan sections
when asan is enabled.
(darwin_encode_section_info): Alter string constant
linker visibility depending on asan.
(machopic_select_section): Use the asan sections when
asan is enabled.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/darwin-cfstring-3.c: Adjust for amended
string labels.
* g++.dg/torture/darwin-cfstring-3.C: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
When we canonicalize the comparison for a czero sequence we need to handle both
integer and fp comparisons. Furthermore, within the integer space we want to
make sure we promote any sub-word objects to a full word.
All that is working fine. After promotion we then force the value into a
register if it is not a register or constant already. The idea is not to have
to special case subregs in subsequent code. This works fine except when we're
presented with a floating point object that would be a subword. (subreg:SF
(reg:SI)) on rv64 for example.
So this tightens up that force_reg step. Bootstrapped and regression tested on
riscv64-linux-gnu and tested on riscv32-elf and riscv64-elf.
Pushing to the trunk after pre-commit verifies no regressions.
Jeff
PR target/121160
gcc/
* config/riscv/riscv.cc (canonicalize_comparands): Tighten check for
forcing value into a GPR.
gcc/testsuite/
* gcc.target/riscv/pr121160.c: New test.
|
|
This is the first step in handling the review part of:
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692091.html
'''
Oh, as we now do alias walks in forwprop maybe we should make this
conditional and do
this not for all pass instances, since it makes forwprop possibly a lot slower?
'''
The check of the limit was after the alias check, which could slow things down.
This moves the limit check to the beginning of the if.
Bootstrapped and tested on x86_64-linux-gnu.
Pushed as obvious.
PR tree-optimization/121474
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_aggr_zeroprop): Move the check
for limit before the alias check.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
The new routine uses table lookups more effectively, and avoids __int128
arithmetic until necessary.
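A minimal sketch of the table-lookup idea (hypothetical helper and table names,
not the committed __gg__packed_to_binary): one packed-decimal byte holds two BCD
digits, so a 256-entry table maps each byte straight to 0..99 and each byte then
contributes two digits per multiply-by-100 step (sign nibble handling omitted).
#include <stdint.h>

/* Illustrative 256-entry table: packed_byte_to_int[0x42] == 42.  */
static uint8_t packed_byte_to_int[256];

static void
init_packed_table (void)
{
  for (int hi = 0; hi < 10; hi++)
    for (int lo = 0; lo < 10; lo++)
      packed_byte_to_int[(hi << 4) | lo] = hi * 10 + lo;
}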
gcc/cobol/ChangeLog:
* genutil.cc (get_binary_value): Use the new routine.
libgcobol/ChangeLog:
* libgcobol.cc (get_binary_value_local): Use the new routine.
* stringbin.cc (int_from_string): Removed.
(__gg__packed_to_binary): Implement new routine.
* stringbin.h (__gg__packed_to_binary): Likewise.
|
|
gcc/cp/ChangeLog:
* lex.cc (init_operators): Fix typo.
|
|
The following wraps SLP_TREE_CODE checks against VEC_PERM_EXPR
(the only relevant code) in a new SLP_TREE_PERMUTE_P predicate.
Most places guard against SLP_TREE_REPRESENTATIVE being NULL.
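Based on the description above, the new predicate is presumably just a thin
wrapper over the existing check (sketch, not copied from the patch):
#define SLP_TREE_PERMUTE_P(NODE) (SLP_TREE_CODE (NODE) == VEC_PERM_EXPR)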
* tree-vectorizer.h (SLP_TREE_PERMUTE_P): New.
* tree-vect-slp-patterns.cc (linear_loads_p): Adjust.
(vect_detect_pair_op): Likewise.
(addsub_pattern::recognize): Likewise.
* tree-vect-slp.cc (vect_print_slp_tree): Likewise.
(vect_gather_slp_loads): Likewise.
(vect_is_slp_load_node): Likewise.
(optimize_load_redistribution_1): Likewise.
(vect_optimize_slp_pass::is_cfg_latch_edge): Likewise.
(vect_optimize_slp_pass::internal_node_cost): Likewise.
(vect_optimize_slp_pass::start_choosing_layouts): Likewise.
(vect_optimize_slp_pass::backward_cost): Likewise.
(vect_optimize_slp_pass::forward_pass): Likewise.
(vect_optimize_slp_pass::get_result_with_layout): Likewise.
(vect_optimize_slp_pass::materialize): Likewise.
(vect_optimize_slp_pass::dump): Likewise.
(vect_optimize_slp_pass::decide_masked_load_lanes): Likewise.
(vect_update_slp_vf_for_node): Likewise.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_schedule_slp_node): Likewise.
(vect_schedule_scc): Likewise.
* tree-vect-stmts.cc (vect_analyze_stmt): Likewise.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Likewise.
|
|
This removes a use of STMT_VINFO_DEF_TYPE.
* tree-vect-stmts.cc (vect_analyze_stmt): Use
SLP_TREE_DEF_TYPE instead of STMT_VINFO_DEF_TYPE.
|
|
The following splits up VMAT_GATHER_SCATTER into
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED. The main motivation is to reduce
the uses of (full) gs_info, but it also makes the kind representable
by a single entry rather than the ifn and decl tristate.
The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN,
since that's what we end up checking.
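A sketch of what the replacement predicate presumably looks like, given the
three new access types listed below (shape assumed from the ChangeLog, not
verbatim from the patch):
static inline bool
mat_gather_scatter_p (vect_memory_access_type memory_access_type)
{
  return (memory_access_type == VMAT_GATHER_SCATTER_LEGACY
          || memory_access_type == VMAT_GATHER_SCATTER_IFN
          || memory_access_type == VMAT_GATHER_SCATTER_EMULATED);
}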
* tree-vectorizer.h (vect_memory_access_type): Replace
VMAT_GATHER_SCATTER with three separate access types,
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED.
(mat_gather_scatter_p): New predicate.
(GATHER_SCATTER_LEGACY_P): Remove.
(GATHER_SCATTER_IFN_P): Likewise.
(GATHER_SCATTER_EMULATED_P): Likewise.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Adjust.
(get_load_store_type): Likewise.
(vect_get_loop_variant_data_ptr_increment): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Likewise.
* config/riscv/riscv-vector-costs.cc
(costs::need_additional_vector_vars_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype):
Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Likewise.
|
|
The gather_scatter_info pointer is only used as a flag, so pass down
a flag.
* tree-vectorizer.h (vect_supportable_dr_alignment): Pass
a bool instead of a pointer to gather_scatter_info.
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
Likewise.
* tree-vect-stmts.cc (get_load_store_type): Adjust.
|
|
2025-08-13 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/89092
* resolve.cc (was_declared): Add subroutine attribute.
gcc/testsuite/
PR fortran/89092
* gfortran.dg/pr89092.f90: New test.
|
|
The rtx cost values defined by the target backend affect the
calculation of register pressure classes in the IRA and thus affect
scheduling. This can degrade program performance, for example in
OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r.
This problem can be avoided by defining a set of register pressure
classes in the target backend instead of letting the IRA calculate
them automatically.
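For illustration, such a target hook roughly has the following shape (the
register classes shown are assumptions for the sketch; the actual classes
chosen for LoongArch are in the patch):
/* Return the number of pressure classes and store them in CLASSES,
   instead of letting IRA compute them from the rtx costs.  */
static int
loongarch_compute_pressure_classes (enum reg_class *classes)
{
  int n = 0;
  classes[n++] = GR_REGS;   /* assumed for the sketch */
  classes[n++] = FP_REGS;   /* assumed for the sketch */
  return n;
}

#define TARGET_COMPUTE_PRESSURE_CLASSES loongarch_compute_pressure_classes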
gcc/ChangeLog:
PR target/120476
* config/loongarch/loongarch.cc
(loongarch_compute_pressure_classes): New function.
(TARGET_COMPUTE_PRESSURE_CLASSES): Define.
|
|
This patch adds support for C23's _BitInt for LoongArch.
From the LoongArch psABI[1]:
> _BitInt(N) objects are stored in little-endian order in memory
> and are signed by default.
>
> For N ≤ 64, a _BitInt(N) object have the same size and alignment
> of the smallest fundamental integral type that can contain it.
> The unused high-order bits within this containing type are filled
> with sign or zero extension of the N-bit value, depending on whether
> the _BitInt(N) object is signed or unsigned. The _BitInt(N) object
> propagates its signedness to the containing type and is laid out
> in a register or memory as an object of this type.
>
> For N > 64, _BitInt(N) objects are implemented as structs of 64-bit
> integer chunks. The number of chunks is the smallest even integer M
> so that M * 64 ≥ N. These objects are of the same size of the struct
> containing the chunks, but always have 16-byte alignment. If there
> are unused bits in the highest-ordered chunk that contains used
> bits, they are defined as the sign- or zero- extension of the used
> bits depending on whether the _BitInt(N) object is signed or
> unsigned. If an entire chunk is unused, its bits are undefined.
[1] https://github.com/loongson/la-abi-specs
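To make the quoted rules concrete, here are the sizes and alignments they imply
(illustrative declarations only, derived from the psABI text above, not from the
patch):
_BitInt(7)   a;  /* fits a 1-byte type:  sizeof == 1,  _Alignof == 1  */
_BitInt(17)  b;  /* fits a 4-byte type:  sizeof == 4,  _Alignof == 4  */
_BitInt(63)  c;  /* fits an 8-byte type: sizeof == 8,  _Alignof == 8  */
_BitInt(65)  d;  /* N > 64: two 64-bit chunks, sizeof == 16, _Alignof == 16  */
_BitInt(129) e;  /* smallest even M with M*64 >= 129 is 4: sizeof == 32, _Alignof == 16  */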
PR target/117599
gcc/ChangeLog:
* config/loongarch/loongarch.h: Define a PROMOTE_MODE case for
small _BitInts.
* config/loongarch/loongarch.cc (loongarch_promote_function_mode):
Same.
(loongarch_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Declare.
libgcc/ChangeLog:
* config/loongarch/t-softfp-tf: Enable _BitInt helper functions.
* config/loongarch/t-loongarch: Same.
* config/loongarch/libgcc-loongarch.ver: New file.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/bitint-alignments.c: New test.
* gcc.target/loongarch/bitint-args.c: New test.
* gcc.target/loongarch/bitint-sizes.c: New test.
|
|
So this is a minor bug in a few DFA descriptions such as the Xiangshan and a
couple of the SiFive descriptions.
While Xiangshan covers every insn type, some of the reservations check the mode
of the operation. Concretely the fdiv/fsqrt unit reservations vary based on
the mode. They handled DF/SF, but not HF (the relevant iterators don't include
BF).
This patch just adds HF support with the same characteristics as SF. Those who
know these designs better could perhaps improve the reservation, but this at
least keeps us from aborting.
I did check the other published DFAs for mode dependent reservations. That's
how I found the p400/p600 issue.
Tested in my tester, waiting for CI to render its verdict before pushing.
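For reference, the kind of input that needs an HFmode fdiv reservation is simply
a half-precision divide (hypothetical reproducer sketch, not the committed
gcc.target/riscv/pr121113.c; assumes a -march with scalar FP16 support and one
of the affected -mtune values):
_Float16
div_hf (_Float16 a, _Float16 b)
{
  /* The scheduler must find a reservation for this HFmode divide.  */
  return a / b;
}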
PR target/121113
gcc/
* config/riscv/sifive-p400.md: Handle HFmode for fdiv/fsqrt.
* config/riscv/sifive-p600.md: Likewise.
* config/riscv/xiangshan.md: Likewise.
gcc/testsuite/
* gcc.target/riscv/pr121113.c: New test.
|
|
Replace " value *= 10; value += digit" routines with a new one that does two
digits at a time and avoids __int128 calculations until they are necessary.
These changes also clean up the conversion behavior when a digit is not valid.
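A minimal sketch of the two-digits-at-a-time idea (hypothetical helper assuming
ASCII digit input, not the committed __gg__numeric_display_to_binary): accumulate
in a 64-bit value and only switch to __int128 arithmetic once the 64-bit
accumulator could overflow.
#include <stdint.h>

static __int128
digits_to_binary (const char *p, int ndigits)
{
  uint64_t lo = 0;
  int i = 0;
  /* Two digits per iteration while the result still fits in 64 bits.  */
  for (; i + 1 < ndigits && lo <= (UINT64_MAX - 99) / 100; i += 2)
    lo = lo * 100 + (p[i] - '0') * 10 + (p[i + 1] - '0');
  /* Finish one digit at a time, widening to __int128 only now.  */
  __int128 value = lo;
  for (; i < ndigits; i++)
    value = value * 10 + (p[i] - '0');
  return value;
}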
gcc/cobol/ChangeLog:
* genutil.cc (get_binary_value): Use the new routine.
libgcobol/ChangeLog:
* libgcobol.cc (int128_to_field): Use the new routine.
(get_binary_value_local): Use the new routine.
(format_for_display_internal): Formatting.
(__gg__get_file_descriptor): Likewise.
* stringbin.cc (string_from_combined): Formatting.
(packed_from_combined): Likewise.
(int_from_string): New routine.
(__gg__numeric_display_to_binary): Likewise.
* stringbin.h (__gg__numeric_display_to_binary): Likewise.
|
|
I added this test back in r7-934-g15c671a79ca66d, but it looks like
r15-2125-g81824596361cf4 changed the error message.
gcc/testsuite/ChangeLog:
PR testsuite/119783
* jit.dg/test-error-impossible-must-tail-call.c (verify_code):
Check that we get a suitable-looking error message, but don't
try to specify exactly what the message is.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
gcc/jit/ChangeLog:
PR jit/121516
* libgccjit++.h (context::new_struct_type): Replace use of
&fields[0] with fields.data ().
(context::new_function): Likewise for params.
(context::new_rvalue): Likewise for elements.
(context::new_call): Likewise for args.
(block::end_with_switch): Likewise for cases.
(block::end_with_extended_asm_goto): Likewise for goto_blocks.
(context::new_struct_ctor): Likewise for fields and values.
(context::new_array_ctor): Likewise for values.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
For
(set (reg/v:DI 106 [ k ])
(const_int 3000000000 [0xb2d05e00]))
...
(set (reg:V4SI 115 [ _13 ])
(vec_duplicate:V4SI (subreg:SI (reg/v:DI 106 [ k ]) 0)))
...
(set (reg:V2SI 118 [ _9 ])
(vec_duplicate:V2SI (subreg:SI (reg/v:DI 106 [ k ]) 0)))
we should generate
(set (reg:SI 125)
(const_int -1294967296 [0xffffffffb2d05e00]))
(set (reg:V4SI 124)
(vec_duplicate:V4SI (reg:SI 125))
...
(set (reg:V4SI 115 [ _13 ])
(reg:V4SI 124)
...
(set (reg:V2SI 118 [ _9 ])
(subreg:V2SI (reg:V4SI 124))
by converting the integer constant to the mode of the move.
gcc/
PR target/121497
* config/i386/i386-features.cc (ix86_broadcast_inner): Convert
integer constant to mode of move.
gcc/testsuite/
PR target/121497
* gcc.target/i386/pr121497.c: New test.
Co-authored-by: Liu, Hongtao <hongtao.liu@intel.com>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
Add asm dump check and run test for vec_duplicate + vmerge.vvm
combine to vmerge.vxm, with GR2VR costs of 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i8.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to combine the vec_duplicate + vmerge.vvm into the
vmerge.vxm, as in the example code below. The related pattern depends on
the cost of vec_duplicate from GR2VR: the late-combine pass will take
action if the GR2VR cost is zero, and reject the combination if the
GR2VR cost is greater than zero.
Assume we have the example code below, with a GR2VR cost of 0.
#define DEF_VX_MERGE_0(T) \
void \
test_vx_merge_##T##_case_0 (T * restrict out, T * restrict in, \
T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
{ \
if (i % 2 == 0) \
out[i] = x; \
else \
out[i] = in[i]; \
} \
}
DEF_VX_MERGE_0(int32_t)
Before this patch:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
...
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
...
22 │ vmerge.vvm v1,v1,v2,v0
...
25 │ bne a3,zero,.L3
After this patch:
11 │ beq a3,zero,.L8
...
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
...
20 │ vmerge.vxm v1,v1,a2,v0
...
23 │ bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*merge_vx_<mode>): Add new
pattern to combine the vmerge.vxm.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Hi,
In PR121334 we are asked to expand a const_vector of size 4 with
poly_int elements. It has 2 elts per pattern so is neither a
const_vector_duplicate nor a const_vector_stepped.
We don't allow this kind of constant in legitimate_constant_p but expr
apparently still wants us to expand it under certain conditions.
This patch implements a basic expander for such kinds of patterns.
As slide1up is used to build the individual vectors it also adds
a helper function expand_slide1up.
I regtested on rv64gcv_zvl512b but unfortunately the newly created pattern is
not even executed. I tried some variations of the original code but didn't
manage to trigger it.
Regards
Robin
PR target/121334
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_slide1up): New function.
(expand_vector_init_trailing_same_elem): Use new function.
(expand_const_vector_onestep): New function.
(expand_const_vector): Use expand_slide1up.
(expand_vector_init_merge_repeating_sequence): Ditto.
(shuffle_off_by_one_patterns): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121334.c: New test.
|
|
An enum value can't be used in a #if expression: in #if expressions,
identifiers that are not macros are all treated as the number zero.
This patch may fix https://sourceware.org/bugzilla/show_bug.cgi?id=32776.
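As a reminder of the general C preprocessor behavior involved (illustration
only, with hypothetical names; not taken from the patch):
enum { ABI_BASE_EXAMPLE = 2 };   /* hypothetical enumerator */
#define ABI_BASE_MACRO 2         /* hypothetical macro      */

#if ABI_BASE_EXAMPLE == 2   /* not a macro, so it evaluates as 0: condition is false */
#error "never reached"
#endif

#if ABI_BASE_MACRO == 2     /* macros expand as expected in #if */
/* ... */
#endif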
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (ABI_BASE_LP64D): New macro.
(ABI_BASE_LP64F): New macro.
(ABI_BASE_LP64S): New macro.
(N_ABI_BASE_TYPES): New macro.
|
|
The following refactors the now misleading slp_done_for_suggested_uf
and slp states kept during vectorizer loop analysis.
* tree-vect-loop.cc (vect_analyze_loop_2): Change
slp_done_for_suggested_uf to a boolean
single_lane_slp_done_for_suggested_uf. Change slp
to force_single_lane boolean.
(vect_analyze_loop_1): Adjust similarly.
|
|
For the reasons explained in the comment, fwprop shouldn't even
try to propagate an asm definition.
gcc/
PR rtl-optimization/121253
* fwprop.cc (forward_propagate_into): Don't propagate asm defs.
gcc/testsuite/
PR rtl-optimization/121253
* gcc.target/aarch64/pr121253.c: New test.
|
|
With the hybrid stmt detection no longer working as a gate-keeper
to detect unhandled stmts we have to, and can, detect those earlier.
The appropriate place is vect_mark_stmts_to_be_vectorized where
for trivially relevant PHIs we can stop analyzing when the PHI
wasn't classified as a known def during vect_analyze_scalar_cycles.
PR tree-optimization/121509
* tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized):
Fail early when we detect a relevant but not handled PHI.
* gcc.dg/vect/pr121509.c: New testcase.
|
|
When inserting a compensation stmt during VN we are making sure to
register the result for the original stmt into the hashtable so
VN iteration has the chance to converge and we avoid inserting
another copy each time. But the implementation doesn't work for
non-SSA name values, and is also not necessary for constants since
we did not insert anything for them. The following appropriately
guards the calls to vn_nary_op_insert_stmt as was already done
in one place.
PR tree-optimization/121514
* tree-ssa-sccvn.cc (visit_nary_op): Only call
vn_nary_op_insert_stmt for SSA name result.
* gcc.dg/torture/pr121514.c: New testcase.
|
|
[PR121494]
Note this conflicts with my not yet approved patch for copy prop for aggregates into
function arguments (I will get back to that soon).
So the problem here is that I assumed that if:
*a = decl1;
would not cause an exception, then:
decl2 = *a;
would not cause one either.
I was wrong: in some cases the Ada front-end marks `*a` in the store
as TREE_THIS_NOTRAP (due to knowing it can never be null, or some other cases).
So when we propagate decl1 into the statement storing decl2, we need to
mark that statement as possibly needing EH cleanup.
Bootstrapped and tested on x86_64-linux-gnu.
Also tested on x86_64-linux-gnu with a hack to force generate LC constant decls in the gimplifier.
PR tree-optimization/121494
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop): Mark the bb of the use
stmt if needed for eh cleanup.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
|
|
Now that all STMT_VINFO_VECTYPE uses from vectorizable_* have been
purged, there's no longer a need to have STMT_VINFO_VECTYPE set.
We still rely on it being present on data-ref stmts and there it
can differ between different SLP instances when doing BB vectorization.
The following removes the setting from vect_analyze_stmt and
vect_transform_stmt.
Note the following clears STMT_VINFO_VECTYPE from pattern stmts (the
vector type should have moved to the SLP tree by this time).
* tree-vect-stmts.cc (vect_analyze_stmt): Only set
STMT_VINFO_VECTYPE for dataref SLP representatives.
Clear it for others and do not restore the original value.
(vect_transform_stmt): Likewise.
|
|
The following passes down the vector type to functions instead of
querying it from the reduc-info stmt-info.
* tree-vect-loop.cc (get_initial_defs_for_reduction):
Get vector type as argument.
(vect_find_reusable_accumulator): Likewise.
(vect_transform_cycle_phi): Adjust.
|
|
There's one use of STMT_VINFO_VECTYPE in vectorizable_reduction
where I'm only 99% sure which SLP_TREE_VECTYPE to replace it with
(vectorizable_reduction needs a lot of post-only-SLP TLC). The
following replaces it with the hopefully appropriate one.
* tree-vect-loop.cc (vectorizable_reduction): Replace
STMT_VINFO_VECTYPE use with SLP_TREE_VECTYPE.
|
|
This is another case where opportunistically handling a first
aggregate copy where we failed to match up the refs exactly
(as we don't insert missing handling components) leads to a
failure in the second aggregate copy that we visit. Add another
fixup to deal with such situations, in line with the present
opportunistic handling.
PR tree-optimization/121493
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Opportunistically
strip components with known offset.
* gcc.dg/tree-ssa/ssa-fre-109.c: New testcase.
|
|
The following avoids ending up with a MEM_REF as component to apply.
* tree-ssa-sccvn.cc (vn_reference_lookup_3): When we fail to
match up the two base MEM_REFs, fail.
|
|
This patch adds support for the optional lower argument of the intrinsic
c_f_pointer specified in Fortran 2023. Test cases and documentation have also
been updated.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_c_f_pointer): Check lower arg legitimacy.
* intrinsic.cc (add_subroutines): Teach c_f_pointer about lower arg.
* intrinsic.h (gfc_check_c_f_pointer): Add lower arg.
* intrinsic.texi: Update lower arg for c_f_pointer.
* trans-intrinsic.cc (conv_isocbinding_subroutine): Add logic to handle lower.
gcc/testsuite/ChangeLog:
* gfortran.dg/c_f_pointer_shape_tests_7.f90: New test.
* gfortran.dg/c_f_pointer_shape_tests_8.f90: New test.
* gfortran.dg/c_f_pointer_shape_tests_9.f90: New test.
Signed-off-by: Yuao Ma <c8ef@outlook.com>
|
|
This is a patch primarily from Shreya, though I think she cribbed some
code from Philipp that we had internally within Ventana and I made some
minor adjustments as well.
So the basic idea here is similar to her work on logical ops --
specifically when we can generate more efficient code at expansion time,
then do so. In some cases the net is better code; in other cases we
lessen reliance on mvconst_internal and finally it provides
infrastructure that I think will help address an issue Paul Antoine
reported a little while back.
The most obvious case is using paired addis from initial code generation
for some constants. It will also use a shNadd insn when the cost to
synthesize the original value is higher than the right-shifted value.
Finally it will negate the constant and use "sub" if the negated
constant is cheaper than the original constant.
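As an illustration of the paired-addi case (hypothetical example with an addend
just outside the signed 12-bit addi range; not one of the committed tests):
/* 3000 does not fit the addi immediate range (-2048..2047), but can be
   synthesized at expand time as two addis, e.g. 1500 + 1500, instead of
   first materializing the constant in a register.  */
long
add3000 (long x)
{
  return x + 3000;
}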
There's more work to do in here, particularly WRT 32 bit objects for
rv64. Shreya is looking at that right now. There may also be cases
where another shNadd or addi would be profitable. We haven't really
explored those cases in any detail, while there may be cases to handle,
it's unclear how often they occur in practice.
I don't want to remove the define_insn_and_split for the paired addi
cases yet. I think that likely happens as a side effect of fixing Paul
Antoine's issue.
Bootstrapped and regression tested on a BPI & Pioneer box. Will
obviously wait for the pre-commit tester before moving forward.
Jeff
PR target/120603
gcc/
* config/riscv/riscv-protos.h (synthesize_add): Add prototype.
* config/riscv/riscv.cc (synthesize_add): New function.
* config/riscv/riscv.md (addsi3): Allow any constant as operands[2]
in the expander. Force the constant into a register as needed for
TARGET_64BIT. Use synthesize_add for !TARGET_64BIT.
(*adddi3): Renamed from adddi3.
(adddi3): New expander. Use synthesize_add.
gcc/testsuite
* gcc.target/riscv/add-synthesis-1.c: New test.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
|
|
The internal representation of Numeric Display (ND) zoned decimal variables
when operating in EBCDIC mode has been brought into compliance with IBM
conventions. This requires changes to data input, data output, internal
conversion of zoned decimal to binary, and variable assignment.
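For background, the IBM convention being adopted carries the sign of a zoned
decimal field in the zone nibble of the trailing digit (this sketch states the
standard convention in general terms; the exact constants used are the ones
added to common-defs.h below):
/* In EBCDIC zoned decimal, digits are 0xF0..0xF9; a signed field replaces
   the zone nibble of the last digit with 0xC (positive) or 0xD (negative),
   so -123 is encoded as F1 F2 D3.  */
static int
zoned_ebcdic_is_negative (const unsigned char *p, int len)
{
  return (p[len - 1] & 0xF0) == 0xD0;
}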
gcc/cobol/ChangeLog:
* genapi.cc (compare_binary_binary): Formatting.
(cobol_compare): Formatting.
(mh_numeric_display): Rewrite "move ND to ND" algorithm.
(initial_from_initial): Proper initialization of EBCDIC ND variables.
* genmath.cc (fast_add): Delete comment.
* genutil.cc (get_binary_value): Modify for updated EBCDIC.
libgcobol/ChangeLog:
* common-defs.h (NUMERIC_DISPLAY_SIGN_BIT): New comment; new constant.
(EBCDIC_MINUS): New constant.
(EBCDIC_PLUS): Likewise.
(EBCDIC_ZERO): Likewise.
(EBCDIC_NINE): Likewise.
(PACKED_NYBBLE_PLUS): Likewise.
(PACKED_NYBBLE_MINUS): Likewise.
(PACKED_NYBBLE_UNSIGNED): Likewise.
(NUMERIC_DISPLAY_SIGN_BIT_ASCII): Likewise.
(NUMERIC_DISPLAY_SIGN_BIT_EBCDIC): Likewise.
(SEPARATE_PLUS): Likewise.
(SEPARATE_MINUS): Likewise.
(ZONED_ZERO): Likewise.
(ZONE_SIGNED_EBCDIC): Likewise.
* configure: Regenerate.
* libgcobol.cc (turn_sign_bit_on): Handle new EBCDIC sign convention.
(turn_sign_bit_off): Likewise.
(is_sign_bit_on): Likewise.
(int128_to_field): EBCDIC NumericDisplay conversion.
(get_binary_value_local): Likewise.
(format_for_display_internal): Likewise.
(normalize_id): Likewise.
(__gg__inspect_format_1): Convert EBCDIC negative numbers to positive.
* stringbin.cc (packed_from_combined): Quell cppcheck warning.
gcc/testsuite/ChangeLog:
* cobol.dg/group2/ALLOCATE_Rule_8_OPTION_INITIALIZE_with_figconst.out:
Change test for updated handling of Numeric Display variables.
|
|
|
|
Reject QI/HImode conditions, which would require extension in
order to compare. Fixes
z.c:10:1: error: unrecognizable insn:
10 | }
| ^
(insn 23 22 24 2 (set (reg:CC 66 cc)
(compare:CC (reg:HI 128)
(reg:HI 127))) "z.c":6:6 -1
(nil))
during RTL pass: vregs
gcc:
* config/aarch64/aarch64.md (mov<ALLI>cc): Accept MODE_CC
conditions directly; reject QI/HImode conditions.
gcc/testsuite:
* gcc.target/aarch64/cmpbr-3.c: New.
* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: Simplify
test for csel by ignoring the actual registers used.
|
|
Restrict the immediate range to the intersection of LT/GE and GT/LE
so that cfglayout can invert the condition to redirect any branch.
gcc:
PR target/121388
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the
range of LT/GE and GT/LE to their intersections.
* config/aarch64/aarch64.md (*aarch64_cb<INT_CMP><GPI>): Unexport.
Use cmpbr_imm_predicate instead of aarch64_cb_rhs.
* config/aarch64/constraints.md (Uc1): Accept 0..62.
(Uc2): Remove.
* config/aarch64/iterators.md (cmpbr_imm_predicate): New.
(cmpbr_imm_constraint): Update to match aarch64_cb_rhs.
* config/aarch64/predicates.md (aarch64_cb_reg_i63_operand): New.
(aarch64_cb_reg_i62_operand): New.
gcc/testsuite:
PR target/121388
* gcc.target/aarch64/cmpbr.c (u32_x0_ult_64): XFAIL.
(i32_x0_slt_64, u64_x0_ult_64, i64_x0_slt_64): XFAIL.
* gcc.target/aarch64/cmpbr-2.c: New.
|
|
gcc:
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs):
Use aarch64_cb_rhs to match CB insns.
|
|
gcc/testsuite:
* gcc.target/aarch64/cmpbr.c: Only compile, not assemble,
since we want to scan the assembly.
|
|
There is a conflict between aarch64_tbzltdi1 and aarch64_cbltdi
with respect to pnum_clobbers, resulting in a recog failure:
0xa1fffe fancy_abort(char const*, int, char const*)
../../gcc/diagnostics/context.cc:1640
0x81340e patch_jump_insn
../../gcc/cfgrtl.cc:1303
0xc0eafe redirect_branch_edge
../../gcc/cfgrtl.cc:1330
0xc0f372 cfg_layout_redirect_edge_and_branch
../../gcc/cfgrtl.cc:4736
0xbfb6b9 redirect_edge_and_branch(edge_def*, basic_block_def*)
../../gcc/cfghooks.cc:391
0x1fa9310 try_forward_edges
../../gcc/cfgcleanup.cc:561
0x1fa9310 try_optimize_cfg
../../gcc/cfgcleanup.cc:2931
0x1fa9310 cleanup_cfg(int)
../../gcc/cfgcleanup.cc:3143
0x1fe11e8 rest_of_handle_cse
../../gcc/cse.cc:7591
0x1fe11e8 execute
../../gcc/cse.cc:7622
The simplest solution is to remove the clobber from aarch64_tbz.
This removes the possibility of expansion via TST+B.cond, which
will merely fall back to TBNZ+B on shorter branches.
gcc:
PR target/121385
* config/aarch64/aarch64.md (*aarch64_tbz<LTGE><ALLI>1): Remove
cc clobber and expansion via TST+Bcond.
gcc/testsuite:
PR target/121385
* gcc.target/aarch64/cmpbr-1.c: New.
|
|
With -mtrack-speculation, CC_REGNUM must be used at every
conditional branch.
gcc:
* config/aarch64/aarch64.h (TARGET_CMPBR): False when
aarch64_track_speculation is true.
|
|
Both patterns used !reload_completed as a condition, which is
questionable at best. The branch pattern failed to include a
clobber of CC_REGNUM. Both problems were unlikely to trigger
in practice, due to how the optimization pipeline is organized,
but let's fix them anyway.
gcc:
* config/aarch64/aarch64.cc (aarch64_gen_compare_split_imm24): New.
* config/aarch64/aarch64-protos.h: Update.
* config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Use it.
Add match_scratch and cc clobbers. Use match_operator instead of
iterator expansion.
(*compare_cstore<GPI>_insn): Likewise.
|
|
Two of the three uses of aarch64_imm24 included the important follow-up
tests vs aarch64_move_imm and aarch64_plus_operand. Lack of the exclusion
within aarch64_if_then_else_costs produced incorrect costing.
Since aarch64_split_imm24 has already matched a non-negative CONST_INT,
drill down from aarch64_plus_operand to aarch64_uimm12_shift.
gcc:
* config/aarch64/predicates.md (aarch64_split_imm24): Rename from
aarch64_imm24; exclude aarch64_move_imm and aarch64_uimm12_shift.
* config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>):
Update for aarch64_split_imm24.
(*compare_cstore<GPI>_insn): Likewise.
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Likewise.
|
|
The save/restore_stack_nonlocal patterns passed a DImode rtx to
gen_tbranch_neqi3 for a QImode compare. But since we're seeding
r16 with 1, GCSEnabled will clear the only set bit in r16, so we
can use CBNZ instead of TBNZ.
gcc:
* config/aarch64/aarch64.md (tbranch_<EQL><SHORT>3): Remove.
(save_stack_nonlocal): Use aarch64_gen_compare_zero_and_branch.
(restore_stack_nonlocal): Likewise.
gcc/testsuite:
* gcc.target/aarch64/gcs-nonlocal-3.c: Match cbnz.
|
|
With -mtrack-speculation, the pattern that was directly expanded by
aarch64_restore_za is disabled. Use the helper function instead.
gcc:
* config/aarch64/aarch64.cc
(aarch64_gen_compare_zero_and_branch): Export.
* config/aarch64/aarch64-protos.h
(aarch64_gen_compare_zero_and_branch): Declare it.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Use it.
* config/aarch64/aarch64.md (*aarch64_cbz<EQL><GPI>): Unexport.
|
|
gcc:
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Reorg to
include the cost of inner within TBZ sign-bit test, only match
CBZ/CBNZ with valid modes, and both for the aarch64_imm24 test.
|