|
A recent commit introduced a compiler warning in thead.cc:
error: invalid suffix on literal; C++11 requires a space between literal and string macro [-Werror=literal-suffix]
1144 | fprintf (file, "(%s),"HOST_WIDE_INT_PRINT_DEC",%u", reg_names[REGNO (addr.reg)],
| ^
This commit addresses this issue and breaks the line such that it won't
exceed 80 characters.
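For context, a minimal illustration of the C++11 rule behind the warning,
using a hypothetical format macro in place of HOST_WIDE_INT_PRINT_DEC (this
is not the thead.cc code): without the space, the literal plus the macro
name is lexed as a user-defined string literal.
#include <cstdio>
#define FMT "%ld"   /* hypothetical stand-in for HOST_WIDE_INT_PRINT_DEC */
int main ()
{
  long x = 42;
  /* std::printf ("x="FMT"\n", x);   C++11: FMT is taken as a literal suffix,
     triggering -Wliteral-suffix (an error with -Werror=literal-suffix).  */
  std::printf ("x=" FMT "\n", x);    /* OK: space separates literal and macro */
  return 0;
}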
gcc/ChangeLog:
* config/riscv/thead.cc (th_print_operand_address): Fix compiler
warning.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
Two scratch registers, %r10 and %r11, are available at function entry for
large model profiling, but %r10 may be used by stack realignment and can't
be used in that case. Add x86_64_select_profile_regnum to find a
caller-saved register which isn't live, or a callee-saved register which
has been saved on the stack in the prologue, at entry for large model
profiling, and issue a sorry () diagnostic if no such register can be found.
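A hypothetical reproducer shape (not the actual pr113689 testcases, whose
contents are not shown here), to be compiled with -mcmodel=large -pg: the
over-aligned local forces stack realignment, so %r10 is not available as
the profiling scratch register.
extern void consume (void *);
void
needs_realignment (void)
{
  alignas (64) char buf[64];   /* over-aligned local forces stack realignment */
  consume (buf);
}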
gcc/
PR target/113689
* config/i386/i386.cc (x86_64_select_profile_regnum): New.
(x86_function_profiler): Call x86_64_select_profile_regnum to
get a scratch register for large model profiling.
gcc/testsuite/
PR target/113689
* gcc.target/i386/pr113689-1.c: New file.
* gcc.target/i386/pr113689-2.c: Likewise.
* gcc.target/i386/pr113689-3.c: Likewise.
|
|
Adds a missing bti instruction at the beginning of a virtual
thunk when BTI is enabled.
gcc/ChangeLog:
* config/arm/arm.cc (arm_output_mi_thunk): Emit
insn for bti_c when bti is enabled.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add v8_1_m_main_pacbti.
* g++.target/arm/bti_thunk.C: New test.
|
|
I was too sleepy writing this :(.
gcc/ChangeLog:
* config/mips/mips-msa.md (neg<mode:MSA>2): Add missing mode for
neg.
|
|
We expanded (neg x) to (minus const0 x) for MSA FP vectors. This is wrong
because -0.0 is not 0 - 0.0, and it causes some Python tests to fail when
Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
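A target-independent illustration of why (0 - x) is not a correct FP
negation: negating +0.0 must yield -0.0, but 0.0 - 0.0 yields +0.0 in the
default rounding mode.
#include <cmath>
#include <cstdio>
int main ()
{
  double x = 0.0;
  double by_sub = 0.0 - x;   /* +0.0: subtraction does not flip the sign bit */
  double by_neg = -x;        /* -0.0: negation must flip the sign bit */
  std::printf ("%d %d\n", std::signbit (by_sub), std::signbit (by_neg));  /* 0 1 */
  return 0;
}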
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
|
|
vzeroupper pass [PR113059]
The move of the vzeroupper pass from after the reload pass to after
postreload_cse helped only partially: CSE-like passes can still invalidate
those notes (especially REG_UNUSED) if they reuse, later in the IL, an
earlier register holding some value.
So, either we could try to move it one pass further after gcse2 and hope
no later pass invalidates the notes, or the following patch attempts to
restore the REG_DEAD/REG_UNUSED state from GCC 13 and earlier, where
the LRA or reload passes remove all REG_DEAD/REG_UNUSED notes and the notes
reappear only at the start of dse2 pass when it calls
df_note_add_problem ();
df_analyze ();
So, effectively
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
NEXT_PASS (pass_ree);
NEXT_PASS (pass_compare_elim_after_reload);
NEXT_PASS (pass_thread_prologue_and_epilogue);
passes operate without those notes in the IL.
While in GCC 14 mode switching computes the notes problem at the start of
vzeroupper, the patch below removes them at the end of the pass again, so
that the above passes continue to operate without them.
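A minimal sketch of that cleanup (assumed shape, not the verbatim patch),
placed at the end of rest_of_handle_insert_vzeroupper before its df_analyze
call: strip the REG_DEAD/REG_UNUSED notes again so the passes listed above
keep operating without them.
basic_block bb;
FOR_EACH_BB_FN (bb, cfun)
  {
    rtx_insn *insn;
    FOR_BB_INSNS (bb, insn)
      if (NONDEBUG_INSN_P (insn))
        {
          rtx *pnote = &REG_NOTES (insn);
          while (*pnote)
            if (REG_NOTE_KIND (*pnote) == REG_DEAD
                || REG_NOTE_KIND (*pnote) == REG_UNUSED)
              *pnote = XEXP (*pnote, 1);   /* unlink the note */
            else
              pnote = &XEXP (*pnote, 1);
        }
  }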
2024-02-05 Jakub Jelinek <jakub@redhat.com>
PR target/113059
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Remove REG_DEAD/REG_UNUSED notes at the end of the pass before
df_analyze call.
|
|
The following avoids re-using a register holding a pointer (and
thus possibly marked REG_POINTER) for the result of a pointer difference
computation. That might confuse heuristics in (broken) RTL alias
analysis which rely on REG_POINTER indicating that we're
dealing with a pointer.
This alone doesn't fix anything.
PR target/113255
* config/i386/i386-expand.cc
(expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves):
Use a new pseudo for the skipped number of bytes.
|
|
gcc/ChangeLog:
* config/riscv/riscv-cores.def: Add sifive-p450, sifive-p670.
* doc/invoke.texi (RISC-V Options): Add sifive-p450,
sifive-p670.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-sifive-p450.c: New test.
* gcc.target/riscv/mcpu-sifive-p670.c: New test.
|
|
Add sifive p400 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p450-470.
gcc/ChangeLog:
* config/riscv/riscv.md: Include sifive-p400.md.
* config/riscv/sifive-p400.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p400.
* config/riscv/riscv.cc (sifive_p400_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p400-series.
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md (*eqne_zero_masked_bits):
Add missing ":SI" to the match_operator.
|
|
After LRA transition, HImode constants that don't fit into signed 12 bits
are no longer subject to constant synthesis:
/* example */
void test(void) {
short foo = 32767;
__asm__ ("" :: "r"(foo));
}
;; before
.literal_position
.literal .LC0, 32767
test:
l32r a9, .LC0
ret.n
This patch fixes that:
;; after
test:
movi.n a9, -1
extui a9, a9, 17, 15
ret.n
gcc/ChangeLog:
* config/xtensa/xtensa.md (SHI): New mode iterator.
(2 split patterns related to constsynth):
Change to also accept HImode operands.
|
|
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.
This was motivated by bt_skip_func and bt_find_func in xz and results in
nearly a 5% improvement in the dynamic instruction count for input #2, and
smaller but still visible improvements pretty much across the board. The
exceptions are perlbench input #1 and exchange2, which showed very small
regressions.
In the bt_find_func and bt_skip_func cases we have something like this:
> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
> (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
> (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
> (plus:DI (reg/v:DI 136 [ x ])
> (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
> (nil))
[ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
> (plus:DI (reg/v:DI 136 [ x ])
> (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
> (nil))
Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split. But there's a much better approach if we look at fwprop...
(set (reg:DI 142 [ _1 ])
(plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
(reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)
So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled. But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG. That results in the RTL above
having too high a cost and fwprop gives up.
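For readers less fluent in RTL, a hypothetical C reduction of the shape
above (names and types assumed; this is not the zz.c source): a
zero-extended 32-bit value feeding two 64-bit additions, which is exactly
what fwprop tries to propagate.
extern unsigned long sink (unsigned long, unsigned long);
unsigned long
example (unsigned int a, unsigned long b, unsigned long c)
{
  unsigned long x = a;       /* (zero_extend:DI (subreg:SI (reg a) 0)) */
  unsigned long t1 = x + b;  /* (plus:DI (reg x) (reg b)) */
  unsigned long t2 = x + c;  /* (plus:DI (reg x) (reg c)) */
  return sink (t1, t2);
}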
One approach would be to replace the REG_P with REG_P || SUBREG_P in the
costing code. I ultimately decided against that and instead check if the
operand in question passes register_operand.
By far the most important case to handle is the DImode PLUS. But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well. For
those other cases we're talking about improvements in the .000001% range.
While we are in stage4, this just hits cost modeling, which we've generally
agreed is still appropriate (though we were mostly talking about vector). So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
similarly.
gcc/testsuite/
* gcc.target/riscv/reg_subreg_costs.c: New test.
Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
|
|
We expanded (neg x) to (minus const0 x) for LSX FP vectors. This is wrong
because -0.0 is not 0 - 0.0, and it causes some Python tests to fail when
Python is built with LSX enabled.
Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead. We are already doing this for LASX and now we can unify them
into simd.md.
gcc/ChangeLog:
* config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg<mode:FVEC>2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
|
|
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:
if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
return 0;
And LSX_SUPPORTED_MODE_P is defined as:
#define LSX_SUPPORTED_MODE_P(MODE) \
(ISA_HAS_LSX \
&& GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...
GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:
ALWAYS_INLINE poly_uint16
mode_to_bytes (machine_mode mode)
{
#if GCC_VERSION >= 4001
return (__builtin_constant_p (mode)
? mode_size_inline (mode) : mode_size[mode]);
#else
return mode_size[mode];
#endif
}
There is an assertion in mode_size_inline:
gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);
Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bounds array access (the length of the
mode_size array is NUM_MACHINE_MODES).
So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
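A minimal sketch of the guard this implies (assumed shape, not the verbatim
diff), based on the check quoted above from loongarch_symbol_insns:
if (mode != MAX_MACHINE_MODE
    && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)))
  return 0;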
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
|
|
This FAIL was introduced by r14-6908. When the constant vector permutation
implementations were merged, the 128-bit matching cases were not fully
considered: after the merge, the 128-bit expansion only supported
value-based 4-element set shuffles. This patch therefore implements the
full set of 128-bit vector constant permutations and makes some structural
adjustments to the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
(loongarch_expand_vselect_vconcat): Ditto.
(loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement
all 128-bit constant permutation situations.
(loongarch_expand_lsx_shuffle): Adjust and rename function name.
(loongarch_is_imm_set_shuffle): Renamed function name.
(loongarch_expand_vec_perm_even_odd): Function forward declaration.
(loongarch_expand_vec_perm_even_odd_1): Add implement for 128-bit
extract-even and extract-odd permutations.
(loongarch_is_odd_extraction): Delete.
(loongarch_is_even_extraction): Ditto.
(loongarch_expand_vec_perm_const): Adjust.
|
|
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR
violation is detected:
../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used
Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating local declaration.
|
|
This change implements __builtin_get_fpsr() and __builtin_set_fpsr(x)
to get and set the floating-point status register. They are used to
implement pa_atomic_assign_expand_fenv().
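A hedged usage sketch of the new builtins (the value type is assumed here;
the commit only names the builtins): save and restore the FP status
register around a call.
extern void work (void);
void
with_saved_fpsr (void)
{
  unsigned int fpsr = __builtin_get_fpsr ();   /* read the FP status register */
  work ();
  __builtin_set_fpsr (fpsr);                   /* restore the saved value */
}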
2024-02-02 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/59778
* config/pa/pa.cc (enum pa_builtins): Add PA_BUILTIN_GET_FPSR
and PA_BUILTIN_SET_FPSR builtins.
* (pa_builtins_icode): Declare.
* (def_builtin, pa_fpu_init_builtins): New.
* (pa_init_builtins): Initialize FPU builtins.
* (pa_builtin_decl, pa_expand_builtin_1): New.
* (pa_expand_builtin): Handle PA_BUILTIN_GET_FPSR and
PA_BUILTIN_SET_FPSR builtins.
* (pa_atomic_assign_expand_fenv): New.
* config/pa/pa.md (UNSPECV_GET_FPSR, UNSPECV_SET_FPSR): New
UNSPECV constants.
(get_fpsr, put_fpsr): New expanders.
(get_fpsr_32, get_fpsr_64, set_fpsr_32, set_fpsr_64): New
insn patterns.
|
|
This patch fixes the following:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,zero
vsetvli a5,zero,e32,m1,ta,ma ---> Redundant vsetvl.
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
The VSETVL pass is able to fuse the avl = 1 of the scalar move with the VLMAX avl of the reduction.
However, the following RTL blocks the fusion in the dependence analysis of the VSETVL pass:
(insn 49 24 50 5 (set (reg:RVVM1SI 98 v2 [148])
(if_then_else:RVVM1SI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI [
(const_int 1 [0x1])
repeat [
(const_int 0 [0])
]
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM1SI repeat [
(const_int 0 [0])
])
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) 3813 {*pred_broadcastrvvm1si_zero}
(nil))
(insn 50 49 51 5 (set (reg:DI 15 a5 [151]) ----> It sets a5 and blocks fusing the following VLMAX into the scalar move above.
(unspec:DI [
(const_int 32 [0x20])
] UNSPEC_VLMAX)) 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 32 [0x20])
] UNSPEC_VLMAX)
(nil)))
(insn 51 50 52 5 (set (reg:RVVM1SI 97 v1 [150])
(unspec:RVVM1SI [
(unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 15 a5 [151])
(const_int 2 [0x2])
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(unspec:RVVM1SI [
(reg:RVVM1SI 97 v1 [orig:134 vect_result_14.6 ] [134])
(reg:RVVM1SI 98 v2 [148])
] UNSPEC_REDUC_SUM)
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF)
] UNSPEC_REDUC)) 17541 {pred_redsumrvvm1si}
(expr_list:REG_DEAD (reg:RVVM1SI 98 v2 [148])
(expr_list:REG_DEAD (reg:SI 66 vl)
(expr_list:REG_DEAD (reg:DI 15 a5 [151])
(expr_list:REG_DEAD (reg:DI 0 zero)
(nil))))))
This situation can only happen with auto-vectorization; it never happens with intrinsic code.
Since the reduction is passed a VLMAX AVL, it is more natural to also pass VLMAX to the scalar move that initializes the value of the reduction.
After this patch:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetvli a5,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
Tested on both RV32 and RV64 with no regressions.
PR target/113697
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_reduction): Pass VLMAX avl to scalar move.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113697.c: New test.
|
|
This reverts commit 74489c19070703361acc20bc172f304cae845a96.
|
|
I realized in a recent benchmark evaluation (coremark-pro zip-test):
vid.v v2
vmv.v.i v5,0
.L9:
vle16.v v3,0(a4)
vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.
The root cause is:
(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
(reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit} -> Its result is used by the following vrsub.vx, which suppresses hoisting of the vrsub.vx
(nil))
(insn 57 56 59 4 (set (reg:RVVMF2HI 216)
(if_then_else:RVVMF2HI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 350)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
(reg:RVVMF2HI 217))
(unspec:RVVMF2HI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
(expr_list:REG_DEAD (reg:HI 220)
(nil)))
This patch fixes it to generate (set (reg:HI) (subreg:HI (reg:DI))) instead of (set (subreg:DI (reg:HI) 0) (reg:DI)).
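A heavily hedged sketch of that kind of sequence (assumed, not the verbatim
riscv_legitimize_move change; dest and poly_val are placeholders):
rtx tmp = gen_reg_rtx (DImode);
emit_move_insn (tmp, poly_val);                     /* (set (reg:DI tmp) ...)                */
emit_move_insn (dest, gen_lowpart (HImode, tmp));   /* (set (reg:HI dest) (subreg:HI tmp 0)) */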
After this patch:
vid.v v2
vrsub.vx v2,v2,a7
vmv.v.i v4,0
.L3:
vle16.v v3,0(a4)
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int dest generation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
* gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.
|
|
This patch cleans up some comments that are out of date or incorrect.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_get_arg_info): Cleanup comments.
(riscv_pass_by_reference): Ditto.
(riscv_fntype_abi): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
I realized there is an RTL regression between GCC-14 and GCC-13.
https://godbolt.org/z/Ga7K6MqaT
GCC-14:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
(insn 31 9 10 2 (parallel [
(set (reg:DI 15 a5 [138])
(unspec:DI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
(set (reg:SI 66 vl)
(unspec:SI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "/app/example.c":5:15 3281 {vsetvldi}
(nil))
GCC-13:
(insn 10 7 26 2 (set (reg/f:DI 11 a1 [139])
(plus:DI (reg:DI 11 a1 [142])
(const_int 800 [0x320]))) "/app/example.c":6:32 5 {adddi3}
(nil))
(insn 26 10 9 2 (parallel [
(set (reg:DI 15 a5)
(unspec:DI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
(set (reg:SI 66 vl)
(unspec:SI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "/app/example.c":5:15 792 {vsetvldi}
(nil))
GCC-13 doesn't have:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
vsetvl_pre doesn't emit any assembly; it is only used to occupy a scalar register.
It should be removed in the VSETVL pass.
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (vsetvl_pre_insn_p): New function.
(pre_vsetvl::cleaup): Remove vsetvl_pre.
(pre_vsetvl::remove_vsetvl_pre_insns): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl_pre-1.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
|
|
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt entries that match the multiply-add pattern to
facilitate 128-bit vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.
gcc/testsuite/ChangeLog:
* gfortran.dg/vect/vect-10.f90: New test.
|
|
model.
The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent. So model them as "one large
instruction", i.e. define_insn, with two output registers. The real
address is the sum of these two registers.
The advantage of this approach is that the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.
gcc/ChangeLog:
* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr. Emit a REG_EQUAL
note to allow CSE addressing __tls_get_addr.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code. If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add support for call36.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.
|
|
-mexplicit-relocs=auto.
Binutils does not support relaxation of sequences that use four
instructions to obtain symbol addresses.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the code model of the symbol is extreme and -mexplicit-relocs=auto,
the macro instruction loading symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, a non-zero addend in "la.local $rd,$rt,sym+addend"
is not allowed.
(loongarch_load_tls): Added macro support in extreme mode.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New predicate.
(symbolic_off64_or_reg_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_load_tls):
Load all types of tls symbols through one function.
(loongarch_got_load_tls_gd): Delete.
(loongarch_got_load_tls_ld): Delete.
(loongarch_got_load_tls_ie): Delete.
(loongarch_got_load_tls_le): Delete.
(loongarch_call_tls_get_addr): Modify the called function name.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete.
(@load_tls<mode>): New template.
(@got_load_tls_ld<mode>): Delete.
(@got_load_tls_le<mode>): Delete.
(@got_load_tls_ie<mode>): Delete.
|
|
values through fp.
Modify the address calculation logic from (((a x C) + fp) + offset) to
((fp + offset) + a x C), thereby changing the register dependencies and
optimizing the code. The value of C is 2, 4, or 8.
The following is the assembly code before and after a loop modification in spec2006 401.bzip2:
old | new
735 .L71: | 735 .L71:
736 slli.d $r12,$r15,2 | 736 slli.d $r12,$r15,2
737 ldx.w $r13,$r22,$r12 | 737 ldx.w $r13,$r22,$r12
738 addi.d $r15,$r15,-1 | 738 addi.d $r15,$r15,-1
739 slli.w $r16,$r15,0 | 739 slli.w $r16,$r15,0
740 addi.w $r13,$r13,-1 | 740 addi.w $r13,$r13,-1
741 slti $r14,$r13,0 | 741 slti $r14,$r13,0
742 add.w $r12,$r26,$r13 | 742 add.w $r12,$r26,$r13
743 maskeqz $r12,$r12,$r14 | 743 maskeqz $r12,$r12,$r14
744 masknez $r14,$r13,$r14 | 744 masknez $r14,$r13,$r14
745 or $r12,$r12,$r14 | 745 or $r12,$r12,$r14
746 ldx.bu $r14,$r30,$r12 | 746 ldx.bu $r14,$r30,$r12
747 lu12i.w $r13,4096>>12 | 747 alsl.d $r14,$r14,$r18,2
748 ori $r13,$r13,432 | 748 ldptr.w $r13,$r14,0
749 add.d $r13,$r13,$r3 | 749 addi.w $r17,$r13,-1
750 alsl.d $r14,$r14,$r13,2 | 750 stptr.w $r17,$r14,0
751 ldptr.w $r13,$r14,-1968 | 751 slli.d $r13,$r13,2
752 addi.w $r17,$r13,-1 | 752 stx.w $r12,$r22,$r13
753 st.w $r17,$r14,-1968 | 753 ldptr.w $r12,$r19,0
754 slli.d $r13,$r13,2 | 754 blt $r12,$r16,.L71
755 stx.w $r12,$r22,$r13 | 755 .align 4
756 ldptr.w $r12,$r18,-2048 | 756
757 blt $r12,$r16,.L71 | 757
758 .align 4 | 758
This patch is ported from riscv's commit r14-3111.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function.
(loongarch_legitimize_address): Add logical transformation code.
|
|
The fix for PR70321 introduced a splitter that split a doubleword
comparison into a pair of XORs followed by an IOR to set the (zero)
flags register. To help reload, the splitter forced SUBREG pieces of
double-word input values into a pseudo, but this regressed
gcc.target/i386/pr82580.c:
int f0 (U x, U y) { return x == y; }
from:
xorq %rdx, %rdi
xorq %rcx, %rsi
xorl %eax, %eax
orq %rsi, %rdi
sete %al
ret
to:
xchgq %rdi, %rsi
movq %rdx, %r8
movq %rcx, %rax
movq %rsi, %rdx
movq %rdi, %rcx
xorq %rax, %rcx
xorq %r8, %rdx
xorl %eax, %eax
orq %rcx, %rdx
sete %al
ret
To mitigate the regression, remove this legacy heuristic (workaround?).
There have been many incremental changes and improvements to x86 TImode
and register allocation, so this legacy workaround is not only no longer
useful, but it actually hurts register allocation. The patched compiler
now produces:
xchgq %rdi, %rsi
xorl %eax, %eax
xorq %rsi, %rdx
xorq %rdi, %rcx
orq %rcx, %rdx
sete %al
ret
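For reference, a self-contained variant of the quoted test (the typedef is
assumed here; the original pr82580.c source is not reproduced):
typedef unsigned __int128 U;   /* assumed 128-bit unsigned type */
int
f0 (U x, U y)
{
  return x == y;   /* doubleword compare: xor/xor/or/sete on x86-64 */
}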
PR target/113701
gcc/ChangeLog:
* config/i386/i386.md (*cmp<dwi>_doubleword):
Do not force SUBREG pieces to pseudos.
|
|
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
|
|
gcc/
* config/avr/avr.cc: Tabify.
|
|
Also add 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers (in
'#ifndef USED_FOR_TARGET', as otherwise 'STATIC_ASSERT' isn't available).
gcc/
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Don't
hard-code number of SGPR/VGPR/AVGPR registers.
* config/gcn/gcn.h: Add 'STATIC_ASSERT's for number of
SGPR/VGPR/AVGPR registers.
|
|
Add sifive p600 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p650-670.
Adding sifive-p650 and sifive-p670 for the -mcpu option will come in separate patches.
gcc/ChangeLog:
* config/riscv/riscv.md: Add "fcvt_i2f", "fcvt_f2i" type
attribute, and include sifive-p600.md.
* config/riscv/generic-ooo.md: Update type attribute.
* config/riscv/generic.md: Update type attribute.
* config/riscv/sifive-7.md: Update type attribute.
* config/riscv/sifive-p600.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p600.
* config/riscv/riscv.cc (sifive_p600_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p600-series.
|
|
The RISC-V Profiles specification is here:
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions
These extensions don't add any new features but describe existing
features, so this patch only adds parsing.
Za64rs: Reservation set size of 64 bytes
Za128rs: Reservation set size of 128 bytes
Ziccif: Main memory supports instruction fetch with atomicity requirement
Ziccrse: Main memory supports forward progress on LR/SC sequences
Ziccamoa: Main memory supports all atomics in A
Zicclsm: Main memory supports misaligned loads/stores
Zic64b: Cache block size is 64 bytes
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b items.
* config/riscv/riscv.opt: New macro for 7 new unprivileged
extensions.
* doc/invoke.texi (RISC-V Options): Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/za-ext.c: New test.
* gcc.target/riscv/zi-ext.c: New test.
|
|
g++.dg/asan/default-options-1.C FAILs on Solaris/SPARC and x86:
FAIL: g++.dg/asan/default-options-1.C -O0 execution test
FAIL: g++.dg/asan/default-options-1.C -O1 execution test
FAIL: g++.dg/asan/default-options-1.C -O2 execution test
FAIL: g++.dg/asan/default-options-1.C -O2 -flto execution test
FAIL: g++.dg/asan/default-options-1.C -O2 -flto -flto-partition=none execution test
FAIL: g++.dg/asan/default-options-1.C -O3 -g execution test
FAIL: g++.dg/asan/default-options-1.C -Os execution test
The failure is always the same:
AddressSanitizer: CHECK failed: asan_rtl.cpp:397 "((!AsanInitIsRunning() && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=1)
This happens because libasan makes unportable assumptions about
initialization order that don't hold on Solaris. The problem has
already been fixed in clang by
[Driver] Link shared asan runtime lib with -z now on Solaris/x86
https://reviews.llvm.org/D156325
where it was way more prevalent.
This patch applies the same fix to gcc.
Tested on i386-pc-solaris2.11 (ld and gld) and sparc-sun-solaris2.11.
2024-01-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* config/sol2.h (LIBASAN_EARLY_SPEC): Add -z now unless
-static-libasan. Add missing whitespace.
|
|
machine description
They're not used there, and we avoid potentially out-of-sync definitions.
gcc/
* config/gcn/gcn.md (FIRST_SGPR_REG, LAST_SGPR_REG)
(FIRST_VGPR_REG, LAST_VGPR_REG, FIRST_AVGPR_REG, LAST_AVGPR_REG):
Don't 'define_constants'.
|
|
..., which was always (a) unused, and (b) bogus: always-false.
gcc/
* config/gcn/gcn.h (SGPR_OR_VGPR_REGNO_P): Remove.
|
|
For OpenACC/GCN '-march=gfx1100', a lot of libgomp OpenACC test cases FAIL:
/tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU
ds_cmpst_rtn_b32 v0, v0, v4, v3
^
In RDNA 3, 'ds_cmpst_[...]' has been replaced by 'ds_cmpstore_[...]', and the
notes for 'ds_cmpst_[...]' in pre-RDNA 3 ISA manuals:
Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode.
..., have been resolved for 'ds_cmpstore_[...]' in the RDNA 3 ISA manual:
In this architecture the order of src and cmp agree with the BUFFER_ATOMIC_CMPSWAP opcode.
..., and therefore '%2', '%3' now swapped with regards to GCC operand order.
Most of the affected libgomp OpenACC test cases then PASS their execution test.
gcc/
* config/gcn/gcn.md (sync_compare_and_swap<mode>_lds_insn)
[TARGET_RDNA3]: Adjust.
|
|
This reverts commit 26c34b809cd1a6249027730a8b52bbf6a1c0f4a8.
|
|
This reverts commit e56fb037d9d265682f5e7217d8a4c12a8d3fddf8.
|
|
This reverts commit 23cd2961bd2ff63583f46e3499a07bd54491d45c.
|
|
Enables an assert that every typed instruction is associated with a
DFA reservation.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_variable_issue): Enable assert.
|
|
Creates a new generic vector pipeline file common to all CPU tunes.
Moves all vector-related pipelines from generic-ooo to generic-vector-ooo.
Creates new vector-crypto-related insn reservations.
gcc/ChangeLog:
* config/riscv/generic-ooo.md (generic_ooo): Move reservation
(generic_ooo_vec_load): ditto
(generic_ooo_vec_store): ditto
(generic_ooo_vec_loadstore_seg): ditto
(generic_ooo_vec_alu): ditto
(generic_ooo_vec_fcmp): ditto
(generic_ooo_vec_imul): ditto
(generic_ooo_vec_fadd): ditto
(generic_ooo_vec_fmul): ditto
(generic_ooo_crypto): ditto
(generic_ooo_perm): ditto
(generic_ooo_vec_reduction): ditto
(generic_ooo_vec_ordered_reduction): ditto
(generic_ooo_vec_idiv): ditto
(generic_ooo_vec_float_divsqrt): ditto
(generic_ooo_vec_mask): ditto
(generic_ooo_vec_vesetvl): ditto
(generic_ooo_vec_setrm): ditto
(generic_ooo_vec_readlen): ditto
* config/riscv/riscv.md: Include generic-vector-ooo.md.
* config/riscv/generic-vector-ooo.md: New file; move the above reservations to here.
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
|
|
This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.
gcc/ChangeLog:
* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
(generic_fmul_half): ditto
* config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types
* config/riscv/sifive-7.md (sifive_7_hfma): Add reservation
(sifive_7_popcount): ditto
* config/riscv/vector.md: change rdfrm to fmove
* config/riscv/zc.md: change pushpop to load/store
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
|
|
After r14-1187-gd6b756447cd58b, simplify_gen_subreg can return
NULL for an "unaligned" memory subreg. Since V8DI has an alignment of 8 bytes,
using TImode causes simplify_gen_subreg to return NULL.
Fix the issue by using DImode for the loop instead; the LDP/STP pass can
later combine the accesses back into LDP/STP if needed.
Since strict alignment is less important (usually used only for firmware and
early boot), not emitting LDP/STP here is OK.
Built and tested for aarch64-linux-gnu with no regressions.
PR target/113657
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (split for movv8di):
For strict aligned mode, use DImode instead of TImode.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/ls64_strict_align.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.
As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
For T{F,D}mode in GCC 15 I think we could consider relaxing the
restriction imposed in aarch64_classify_address, as typically T{F,D}mode
should be allocated to FPRs. But such a change seems too invasive to
consider for GCC 14 at this stage (let alone backports).
Fortunately the new flexible load/store pair patterns in GCC 14 allow
this mode change to work without further changes. The backports are
more involved as we need to adjust the load/store pair handling to cater
for V16QImode in a few places.
Note that for the testcase we are relying on the torture options to add
-funroll-loops at -O3 which is necessary to trigger the ICE on trunk
(but not on the 13 branch).
gcc/ChangeLog:
PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
gcc/testsuite/ChangeLog:
PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
|
|
gcc/
* config/avr/avr-mcus.def: Add AVR64DU28, AVR64DU32, ATA5787,
ATA5835, ATtiny64AUTO, ATA5700M322.
* doc/avr-mmcu.texi: Rebuild.
|
|
strub: introduce STACK_ADDRESS_OFFSET
Since STACK_POINTER_OFFSET is not necessarily at the boundary between
caller- and callee-owned stack, as desired by
__builtin_stack_address(), and using it as if it were or not causes
problems, introduce a new macro so that ports can define it suitably,
without modifying STACK_POINTER_OFFSET.
for gcc/ChangeLog
PR middle-end/112917
PR middle-end/113100
* builtins.cc (expand_builtin_stack_address): Use
STACK_ADDRESS_OFFSET.
* doc/extend.texi (__builtin_stack_address): Adjust.
* config/sparc/sparc.h (STACK_ADDRESS_OFFSET): Define.
* doc/tm.texi.in (STACK_ADDRESS_OFFSET): Document.
* doc/tm.texi: Rebuilt.
|