|
Support the Sstvala extension, which provides all needed values in
the Supervisor Trap Value register (stval).
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension definition.
* config/riscv/riscv-ext.opt: New extension mask.
* doc/riscv-ext.texi: Document the new extension.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-sstvala.c: New test.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
Support the Sscounterenw extension, which allows writeable enables for any
supported counter.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension definition.
* config/riscv/riscv-ext.opt: New extension mask.
* doc/riscv-ext.texi: Document the new extension.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-sscounterenw.c: New test.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
Support the Ssccptr extension, which indicates that main memory
supports page table reads.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension definition.
* config/riscv/riscv-ext.opt: New extension mask.
* doc/riscv-ext.texi: Document the new extension.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-ssccptr.c: New test.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
Support the Smrnmi extension, which provides new CSRs
for Machine mode Non-Maskable Interrupts.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension definition.
* config/riscv/riscv-ext.opt: New extension mask.
* doc/riscv-ext.texi: Document the new extension.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-smrnmi.c: New test.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
Support the Smcsrind/Sscsrind extensions, which provide indirect access to
machine- and supervisor-level CSRs.
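A minimal sketch of the indirect-access scheme (the CSR numbers come from
the Smcsrind spec; the helper below is hypothetical and uses raw CSR
addresses so no named-CSR assembler support is assumed):
static inline unsigned long
read_indirect_csr (unsigned long index)
{
  unsigned long value;
  __asm__ volatile ("csrw 0x350, %1\n\t"  /* miselect <- index */
                    "csrr %0, 0x351"      /* value <- mireg */
                    : "=r" (value) : "r" (index));
  return value;
}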
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension definition.
* config/riscv/riscv-ext.opt: New extension mask.
* doc/riscv-ext.texi: Document the new extension.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-smcsrind.c: New test.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
Some vmovddup patterns' type attribute is sselog1, which in turn makes their
memory attribute "both". Change the type attribute to ssemov to match the
other vmovddup patterns.
gcc/ChangeLog:
* config/i386/sse.md
(avx512f_movddup512<mask_name>): Change sselog1 to ssemov.
(avx_movddup256<mask_name>): Ditto.
(*vec_dupv2di): Change alternative 4's type attribute from sselog1
to ssemov.
|
|
Update the definition of RISC-V extensions in riscv-ext.def.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: Update declaration.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
In commit 50be486dff4ea2676ed022e9524ef190b92ae2b1
"nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup", some
additional tracking of the PTX code was added, and this assumes that
potentially every single character of PTX code needs to be tracked as a new
chunk of PTX code. That's problematic if we're dealing with voluminous PTX
code (for example, non-trivial C++ code), and the 'file_idx' 'alloca'tion then
causes stack overflow. For example:
FAIL: libgomp.c++/target-std__valarray-1.C (test for excess errors)
UNRESOLVED: libgomp.c++/target-std__valarray-1.C compilation failed to produce executable
lto-wrapper: fatal error: [...]/build-gcc/gcc//accel/nvptx-none/mkoffload terminated with signal 11 [Segmentation fault], core dumped
gcc/
* config/nvptx/mkoffload.cc (process): Use an 'auto_vec' for
'file_idx'.
|
|
This patch makes the svade and svadu extensions imply zicsr.
According to the riscv-privileged spec, svade and svadu are
privileged extensions, so they should imply zicsr.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: Imply zicsr.
|
|
This patch supports the svbare extension, which is part of the RVA23 profile,
enabling GCC to recognize and process the svbare extension correctly at compile time.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension defs.
* config/riscv/riscv-ext.opt: Ditto.
* doc/riscv-ext.texi: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-60.c: New test.
|
|
Some similar code can be wrapped into the function get_vector_binary_rtx_cost;
leverage it to avoid code duplication.
The below test suites passed for this patch series.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Rename
the args to scalar2vr.
(riscv_rtx_costs): Leverage above func to avoid code dup.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch adds the RISC-V Shlcofideleg extension, which supports delegating
LCOFI interrupts (count-overflow interrupts) to VS-mode. [1]
[1] https://riscv.github.io/riscv-isa-manual/snapshot/privileged
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension defs.
* config/riscv/riscv-ext.opt: Ditto.
* doc/riscv-ext.texi: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-shlocofideleg.c: New test.
Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
|
|
The patch aims to optimize
movb (%rdi), %al
movq %rdi, %rbx
xorl %esi, %eax, %edx
movb %dl, (%rdi)
cmpb %sil, %al
jne
to
xorb %sil, (%rdi)
movq %rdi, %rbx
jne
This removes two mov instructions and one cmp instruction.
Because APX NDD allows the destination and source registers to differ,
some of the original peephole2 patterns no longer apply. Add new
peephole2 patterns for APX NDD.
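A reduced C example (hypothetical, in the spirit of the new pr49095-2.c
test) that exposes this read-modify-write-then-compare shape:
extern void bar (void);
void
foo (char *p, char v)
{
  if ((*p ^= v) != 0)   /* xor to memory; the result feeds the branch */
    bar ();
}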
gcc/ChangeLog:
* config/i386/i386.md (define_peephole2): Define some new peephole2
patterns for APX NDD.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr49095-2.c: New test.
|
|
Enabling -mapxf changes some adc/sbb patterns, so GCC emits an extra
mov like
movq 8(%rdi), %rax
adcq %rax, 8(%rsi), %rax
movq %rax, 8(%rdi)
rather than
movq 8(%rsi), %rax
adcq %rax, 8(%rdi)
The patch adds more kinds of peephole2 patterns to eliminate the extra mov.
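For illustration, a double-word addition along the lines of the pr79173
tests (the function below is a hypothetical sketch using GCC's
__builtin_addcl) where the adc should use the memory operand directly:
void
uaddc (unsigned long *p, const unsigned long *q)
{
  unsigned long c1, c2;
  p[0] = __builtin_addcl (p[0], q[0], 0, &c1);   /* add, producing carry */
  p[1] = __builtin_addcl (p[1], q[1], c1, &c2);  /* adc consuming carry */
}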
gcc/ChangeLog:
* config/i386/i386.md: Add 4 new peephole2 patterns by swapping the
original patterns' operand order to support the new insn forms.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr79173-13.c: New test.
* gcc.target/i386/pr79173-14.c: Ditto.
* gcc.target/i386/pr79173-15.c: Ditto.
* gcc.target/i386/pr79173-16.c: Ditto.
* gcc.target/i386/pr79173-17.c: Ditto.
* gcc.target/i386/pr79173-18.c: Ditto.
|
|
This patch would like to combine vec_duplicate + vdiv.vv into
vdiv.vx, as in the example code below. The related pattern depends on
the cost of the vec_duplicate from GR2VR: late-combine takes action if
the GR2VR cost is zero, and rejects the combination if it is greater
than zero.
Assume we have example code like below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
DEF_VX_BINARY(int32_t, /)
Before this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
14 │ slli a3,a3,32
15 │ srli a3,a3,32
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
18 │ vle32.v v1,0(a1)
19 │ slli a4,a5,2
20 │ sub a3,a3,a5
21 │ add a1,a1,a4
22 │ vdiv.vv v1,v1,v2
23 │ vse32.v v1,0(a0)
24 │ add a0,a0,a4
25 │ bne a3,zero,.L3
After this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ slli a3,a3,32
13 │ srli a3,a3,32
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
16 │ vle32.v v1,0(a1)
17 │ slli a4,a5,2
18 │ sub a3,a3,a5
19 │ add a1,a1,a4
20 │ vdiv.vx v1,v1,a2
21 │ vse32.v v1,0(a0)
22 │ add a0,a0,a4
23 │ bne a3,zero,.L3
The below test suites passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add new
case for DIV op.
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add new func
to get the cost of vector binary.
(riscv_rtx_costs): Add div rtx match and leverage above wrap to
get cost.
* config/riscv/vector-iterators.md: Add new op div to no_shift_vx_ops.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Since the last patch introduced get_fr2vr_cost () to get the correct cost of
moving data from a floating-point register to a vector register, this patch
replaces the existing uses of the constant FR2VR.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost): Replace
FR2VR with get_fr2vr_cost ().
* config/riscv/riscv.cc (riscv_register_move_cost): Likewise.
(riscv_builtin_vectorization_cost): Likewise.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a plus-mult or minus-mult RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v6,fa0
vfmadd.vv v9,v6,v7
After, we get only one:
vfmadd.vf v9,fa0,v7
On SPEC2017's 503.bwaves_r, depending on the workload, the dynamic
instruction count is reduced by between 4.66% and 4.75%.
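For reference, a hypothetical loop of the shape that can now use
vfmadd.vf (the scalar multiplicand lives in fa0):
void
fmadd (double *restrict out, const double *restrict a, double f, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] * f + out[i];  /* vec_duplicate (f) feeds the fused multiply-add */
}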
PR target/119100
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_vf_<mode>): Add new pattern to
combine vec_duplicate + vfm{add,sub}.vv into vfm{add,sub}.vf.
* config/riscv/riscv-opts.h (FPR2VR_COST_UNPROVIDED): Define.
* config/riscv/riscv-protos.h (get_fr2vr_cost): Declare function.
* config/riscv/riscv.cc (riscv_rtx_costs): Add cost model for MULT with
VEC_DUPLICATE.
(get_fr2vr_cost): New function.
* config/riscv/riscv.opt: Add new option --param=fpr2vr-cost.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_data.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: New test.
|
|
This patch supports the smcntrpmf extension [1], enabling GCC to recognize
and process it correctly at compile time.
[1]https://github.com/riscvarchive/riscv-smcntrpmf
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extension defs.
* config/riscv/riscv-ext.opt: Ditto.
* doc/riscv-ext.texi: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-59.c: New test.
|
|
The -mcmodel=large option was originally added to handle generation of
large binaries with large PLTs. However, when compiling the Linux
kernel with allyesconfig, the output binary is so large that the jump
instruction's 26-bit immediate is not large enough to store the jump
offset to some symbols at link time. Example error:
relocation truncated to fit: R_OR1K_INSN_REL_26 against symbol `do_fpe_trap' defined in .text section in arch/openrisc/kernel/traps.o
We fix this by forcing jump offsets into registers when -mcmodel=large.
Note, to get the Linux kernel allyesconfig configuration to work with
OpenRISC, this patch is needed along with some other patches to the
Linux hand-coded assembly bits.
gcc/ChangeLog:
* config/or1k/predicates.md (call_insn_operand): Add condition
to not allow symbol_ref operands with TARGET_CMODEL_LARGE.
* config/or1k/or1k.opt: Document new -mcmodel=large
implications.
* doc/invoke.texi: Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/or1k/call-1.c: New test.
* gcc.target/or1k/got-1.c: New test.
|
|
Separate the build rules into compile and link stages to make sure
BUILD_LINKERFLAGS and BUILD_LDFLAGS are applied correctly.
We hit this issue when trying to build GCC with a non-system-default g++
that uses a newer libstdc++; we then got errors from using the older system
libstdc++, which should not happen if we link with -static-libgcc and
-static-libstdc++.
gcc/ChangeLog:
* config/riscv/t-riscv: Adjust build rule for gen-riscv-ext-opt
and gen-riscv-ext-texi.
|
|
This commit implements a full-featured iterator for riscv_subset_list
so that it can be used with range-based for loops.
That could simplify the code in the future, make it more readable,
and make the type more compatible with standard C++ containers.
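A minimal sketch of the resulting usage (the loop body is a hypothetical
placeholder):
  for (const riscv_subset_t &subset : *subset_list)
    handle_subset (subset);
rather than hand-walking the linked list through the next pointer.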
gcc/ChangeLog:
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Use
range-based-for-loop.
* config/riscv/riscv-subset.h (riscv_subset_list::iterator):
New.
(riscv_subset_list::const_iterator): New.
|
|
Inspired by the avg_ceil patches, I noticed there were even more
over-long lines in autovec.md. So fix those formatting issues.
gcc/ChangeLog:
* config/riscv/autovec.md: Fix over-long lines in various
patterns.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
As one of the last steps in removing old reload.
gcc/ChangeLog:
* config/xtensa/xtensa.cc: Remove include of reload.h.
|
|
In this case, there is no need to consider reloading when memory is the
destination. On the other hand, when memory is the source, reloading a
read from the constant pool becomes a double indirection and should
obviously be avoided.
gcc/ChangeLog:
* config/xtensa/xtensa.md (movsf_internal):
Remove destination side constraint modifier '^' in the third
alternative.
|
|
Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS in order to avoid using
the ALL_REGS rclass, as is done on other targets, rather than
overestimating the move costs between integer and FP registers.
gcc/ChangeLog:
* config/xtensa/xtensa.cc
(xtensa_ira_change_pseudo_allocno_class):
New prototype and function.
(TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define macro.
(xtensa_register_move_cost):
Change between integer and FP register move cost to a value
based on actual behavior, i.e. 2, the default and the same as
the move cost between integer registers.
|
|
The avg_ceil operation rounds towards +inf, while vaadd.vv has the
rnu rounding mode, which matches those semantics exactly. From the
RVV spec, for the fixed-point vaadd.vv with rnu,
roundoff_signed(v, d) = (signed(v) >> d) + r
r = v[d - 1]
For vaadd, d = 1, then we have
roundoff_signed(v, 1) = (signed(v) >> 1) + v[0]
If bit v[0] is 0, nothing needs to be done as there is no rounding.
If bit v[0] is 1, rounding occurs, with 2 cases.
Case 1: v is positive.
roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf
roundoff_signed(2 + 3, 1) = (5 >> 1) + 1 = 3
Case 2: v is negative.
roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf
roundoff_signed(-9 + 2, 1) = (-7 >> 1) + 1 = -4 + 1 = -3
Thus, we can leverage the vaadd with rnu directly for avg_ceil.
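A scalar sketch of the avg_ceil semantics that vaadd.vv with rnu
implements per element (the element type here is illustrative):
#include <stdint.h>
static inline int32_t
avg_ceil (int32_t a, int32_t b)
{
  int64_t s = (int64_t) a + b;            /* no overflow in 64 bits */
  return (int32_t) ((s >> 1) + (s & 1));  /* shift, then add r = v[0] (rnu) */
}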
The below test suites passed for this patch series.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (avg<v_double_trunc>3_ceil): Add insn
expand to leverage vaadd with rnu directly.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
autofdo tests currently run only on x86. This patch makes them run
on aarch64 too. Verified that perf and create_gcov run as expected.
gcc/ChangeLog:
* config/aarch64/gcc-auto-profile: Make script executable.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Enable autofdo tests for aarch64.
Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
On AMD znver4 and znver5 targets, vpshufd and vpsrldq have latencies of 1 and
2 and throughputs of 4 (2 for znver4) and 2, respectively. It is better to
generate shuffles instead of shifts wherever possible. In this patch we try to
generate an appropriate shuffle instruction to copy the higher half to the
lower half, instead of a simple right shift, during horizontal vector reduction.
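For illustration, a V4SI horizontal add could copy the high half down with
shuffles roughly like this (a sketch, not the exact emitted sequence):
vpshufd $0xee, %xmm0, %xmm1    # bring elements {2,3} down
vpaddd  %xmm1, %xmm0, %xmm0
vpshufd $0x55, %xmm0, %xmm1    # bring element 1 down
vpaddd  %xmm1, %xmm0, %xmm0    # element 0 now holds the sum
instead of shifting with vpsrldq $8 and vpsrldq $4.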
gcc/ChangeLog:
* config/i386/i386-expand.cc (emit_reduc_half): Use shuffles to
generate reduc half for V4SI, similar modes.
* config/i386/i386.h (TARGET_SSE_REDUCTION_PREFER_PSHUF): New macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_REDUCTION_PREFER_PSHUF):
New tuning.
gcc/testsuite/ChangeLog:
* gcc.target/i386/reduc-pshuf.c: New test.
|
|
Add support for the double trap extensions [1], enabling GCC
to recognize the following extensions at compile time.
New extensions:
- ssdbltrp
- smdbltrp
[1] https://github.com/riscv/riscv-double-trap/releases/download/v1.0/riscv-double-trap.pdf
gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extensions.
* config/riscv/riscv-ext.opt: Auto re-generated.
* doc/riscv-ext.texi: Auto re-generated.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-57.c: New test.
* gcc.target/riscv/arch-58.c: New test.
Signed-off-by: Jerry Zhang Jian <jerry.zhangjian@sifive.com>
|
|
This patch would like to combine vec_duplicate + vmul.vv into
vmul.vx, as in the example code below. The related pattern depends on
the cost of the vec_duplicate from GR2VR: late-combine takes action if
the GR2VR cost is zero, and rejects the combination if it is greater
than zero.
Assume we have example code like below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
DEF_VX_BINARY(int32_t, |)
Before this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
14 │ slli a3,a3,32
15 │ srli a3,a3,32
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
18 │ vle32.v v1,0(a1)
19 │ slli a4,a5,2
20 │ sub a3,a3,a5
21 │ add a1,a1,a4
22 │ vmul.vv v1,v1,v2
23 │ vse32.v v1,0(a0)
24 │ add a0,a0,a4
25 │ bne a3,zero,.L3
After this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ slli a3,a3,32
13 │ srli a3,a3,32
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
16 │ vle32.v v1,0(a1)
17 │ slli a4,a5,2
18 │ sub a3,a3,a5
19 │ add a1,a1,a4
20 │ vmul.vx v1,v1,a2
21 │ vse32.v v1,0(a0)
22 │ add a0,a0,a4
23 │ bne a3,zero,.L3
The below test suites passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case for MULT op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op mult to no_shift_vx_ops.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Since uses of addss for purposes other than modelling FP addition/subtraction
should be gone now, this patch sets the addss cost back to 2.
gcc/ChangeLog:
PR target/119298
* config/i386/x86-tune-costs.h (struct processor_costs): Set addss cost
back to 2.
|
|
In check_builtin_call we eventually perform a division by zero when no
vector modes are present. This patch just avoids the division in that
case.
PR target/120436
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-shapes.cc (struct vset_def):
Avoid division by zero.
(struct vget_def): Ditto.
* config/riscv/riscv-vector-builtins.h (struct function_group_info):
Use required_extensions_specified instead of duplicating code.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr120436.c: New test.
|
|
The signed avg_floor exactly matches the semantics of the fixed-point
RVV insn vaadd with round-down. Thus, leverage it directly
to implement avg_floor.
The RVV spec is somewhat unclear about the difference
between floating point and fixed point for the rounding that
discards least-significant information.
For floating point, which is not two's complement, "discard
least-significant information" indicates truncation (rounding
toward zero). For example:
* 3.5 -> 3
* -2.3 -> -2
For fixed point, which is two's complement, "discard
least-significant information" indicates rounding down. For
example:
* 3.5 -> 3
* -2.3 -> -3
And vaadd rounds down, which exactly matches the semantics
of avg_floor.
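A scalar sketch of the avg_floor semantics (the element type here is
illustrative):
#include <stdint.h>
static inline int32_t
avg_floor (int32_t a, int32_t b)
{
  int64_t s = (int64_t) a + b;  /* no overflow in 64 bits */
  return (int32_t) (s >> 1);    /* arithmetic shift rounds down */
}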
The below test suites passed for this patch series.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/autovec.md (avg<v_double_trunc>3_floor): Add insn
expand to leverage vaadd directly.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch enables newly implemented features in GCC (FAMINMAX, FP8FMA,
FP8DOT2, FP8DOT4, LUT) for the FUJITSU-MONAKA
processor (-mcpu=fujitsu-monaka).
2025-05-23 Yuta Mukai <mukai.yuta@fujitsu.com>
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (fujitsu-monaka): Update ISA
features.
|
|
contrib/ChangeLog:
* gen_autofdo_event.py: Add support for AMD Zen 3 and
later CPUs.
gcc/ChangeLog:
* config/i386/gcc-auto-profile: Regenerate.
|
|
A typo in the mnemonic attribute caused a failed bootstrap. Not sure
how that passed the bootstrap done before committing.
gcc/ChangeLog:
* config/s390/vector.md (*vec_extract<mode>): Fix mnemonic.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
|
|
So this patch from Shreya adds the ability to use andi + a series of bclr insns
to synthesize a logical AND, much like we're doing for IOR/XOR using ori+bset
or their xor equivalents.
This would regress from a code quality standpoint if we didn't make some
adjustments to a handful of define_insn_and_split patterns in the riscv backend
which support the same kind of idioms.
Essentially we turn those define_insn_and_split patterns into the simple
define_splits they always should have been. That's been the plan since we
started down this path -- now is the time to make that change for a subset of
patterns. It may be the case that when we're finished we may not even need
those patterns. That's still TBD.
I'm aware of one minor regression in xalan. As seen elsewhere, combine
reconstructs the mask value, uses mvconst_internal to load it into a reg then
an and instruction. That looks better than the operation synthesis, but only
because of the mvconst_internal little white lie.
This patch does help in a variety of places. It's fairly common in gimple.c
from 502.gcc to see cases where we'd use bclr to clear a bit, then set the
exact same bit a few instructions later. That was an artifact of using a
define_insn_and_split -- it wasn't obvious to combine that we had two
instructions manipulating the same bit. Now that is obvious to combine and the
redundant operation gets removed.
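A hypothetical example of the synthesis (constants picked purely for
illustration):
unsigned long
clear_bits (unsigned long x)
{
  return x & ~((1UL << 20) | 0xcUL);  /* clear bits 2, 3 and 20 */
}
which can now become something like
  andi  a0,a0,-13    # clears bits 2 and 3
  bclri a0,a0,20     # clears bit 20 (Zbs)
instead of materializing the full constant in a temporary.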
This has spun in my tester with no regressions on riscv32-elf and riscv64-elf.
Hopefully the baseline for the tester has stepped forward 🙂
gcc/
* config/riscv/bitmanip.md (andi+bclr splits): Simplified from
prior define_insn_and_splits.
* config/riscv/riscv.cc (synthesize_and): Add support for andi+bclr
sequences.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
This patch would like to combine vec_duplicate + vxor.vv into
vxor.vx, as in the example code below. The related pattern depends on
the cost of the vec_duplicate from GR2VR: late-combine takes action if
the GR2VR cost is zero, and rejects the combination if it is greater
than zero.
Assume we have example code like below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
DEF_VX_BINARY(int32_t, |)
Before this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
14 │ slli a3,a3,32
15 │ srli a3,a3,32
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
18 │ vle32.v v1,0(a1)
19 │ slli a4,a5,2
20 │ sub a3,a3,a5
21 │ add a1,a1,a4
22 │ vxor.vv v1,v1,v2
23 │ vse32.v v1,0(a0)
24 │ add a0,a0,a4
25 │ bne a3,zero,.L3
After this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ slli a3,a3,32
13 │ srli a3,a3,32
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
16 │ vle32.v v1,0(a1)
17 │ slli a4,a5,2
18 │ sub a3,a3,a5
19 │ add a1,a1,a4
20 │ vxor.vx v1,v1,a2
21 │ vse32.v v1,0(a0)
22 │ add a0,a0,a4
23 │ bne a3,zero,.L3
The below test suites passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case for XOR op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op xor to no_shift_vx_ops.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Since floating point and vector registers overlap on s390, more
efficient code can be generated to extract FPRs from VRs.
Additionally, for double vectors, more efficient code can be generated
to load specific lanes.
gcc/ChangeLog:
* config/s390/vector.md (VF): New mode iterator.
(VEC_SET_NONFLOAT): New mode iterator.
(VEC_SET_SINGLEFLOAT): New mode iterator.
(*vec_set<mode>): Split pattern in two.
(*vec_setv2df): Extract special handling for V2DF mode.
(*vec_extract<mode>): Split pattern in two.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-extract-1.c: New test.
* gcc.target/s390/vector/vec-set-1.c: New test.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
|
|
Add support for autoprofiledbootstrap in aarch64.
This is similar to what is done for i386. Added
gcc/config/aarch64/gcc-auto-profile for aarch64 profile
creation.
How to run:
configure --with-build-config=bootstrap-lto
make autoprofiledbootstrap
ChangeLog:
* Makefile.def: AUTO_PROFILE based on cpu_type.
* Makefile.in: Likewise.
* configure: Regenerate.
* configure.ac: Set autofdo_target.
gcc/ChangeLog:
* config/aarch64/gcc-auto-profile: New file.
Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
As one of the last steps in removing old reload, I'll delete the reload.h
header file. It would be a bit embarrassing if that stopped the target I am
responsible for from working, so let's prevent that.
We do not actually use anything from this header file (checked by building
with this patch, and make check has identical results as well), so it was
easy for our port. Many other ports will be like this, but some will need
some adjustments. I'll do cross builds of many ports before it is all over,
but it would be good if other ports tried to remove reload.h from their
includes as well :-)
2025-06-26 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.cc: Remove include of reload.h.
|
|
gcc/
PR target/86772
Tracking CVE-2017-5753
* config/microblaze/microblaze.cc (TARGET_HAVE_SPECULATION_SAFE_VALUE):
Define to speculation_safe_value_not_needed.
|
|
this patch attempts to make vectorizer costs of vector constructions more
realistic. Currently we account one integer_to_sse cost for integer vector
construction but we over-estimate 256 and 512bit vinserts by using addss
instead of sse_op. This is because in reality, especially on AMD machines,
vectorization of constructors may get expensive due to quite large
integer<->sse move costs.
Estimating real integer<->sse register traffic is quite hard since some
integer non-vector arithmetic can be done in SSE registers (for example,
if there is no real arithmetic, just memory loads, or any code that can be
converted by the scalar-to-vector RTL pass).
I think to fix the situation we need to proceed with Richi's recent patch on
adding extra info to the cost hooks and pattern-match what can eventually be
STV converted. Towards that we however also need to fix current STV limitations
(such as the lack of int->sse conversion) and make the cost model more meaningful.
This patch removes the hack using addss to "add extra cost" to 256 and 512bit
constructors. Instead I use the integer_to_sse cost in add_stmt_cost. We already
account 1 conversion for all constructors (no matter the size). I made it
2 conversions for 256 and 3 for 512 since that is closest to what we do now.
Current cost tables do not match reality for zens:
1) SSE loads (which are pushed down from 10 cycles to 3 cycles)
2) SSE stores
3) SSE->integer conversion cost (which is 3 cycles instead of 5)
Similarly we do not have realistic values for Intel chips, especially
the artificially increased SSE->integer costs.
The reason is that changing those values regressed benchmarks. This was mostly
because these costs were accounted wrongly in multiple spots and we kind of
fine-tuned for SPECs.
Another reason is that at the time the tables were merged with the register
allocator, increasing those costs led to IRA using integer registers to spill
SSE values and vice versa, which does not work that well in practice. I think
one of the problems there is the missing model for memory renaming, which
makes integer spilling significantly cheaper than modelled.
In previous patches I fixed multiple issues in accounting loads and stores,
and with this change I hope I will be able to make the tables more realistic
and incrementally fix issues with individual benchmarks.
I benchmarked the patch with -Ofast -march=native -flto on znver5 and skylake.
It seems in the noise for skylake; for znver5 I got what seems an off-noise
improvement for xalancbmk, 8.73->8.81 (rate). The rest seems in the noise too,
however the change affects quite a few SLP decisions when the sequence is
just loads followed by a vector store.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_builtin_vectorization_cost):
Use sse_op instead of addss to cost vinserti128 and vinserti64x4;
compute the correct mode of vinserti128.
(ix86_vector_costs::add_stmt_cost): For integer 256bit and 512bit
vector constructions account more integer_to_sse moves.
|
|
With `-masm=intel`, GCC generates registers without % prefixes. If a
user-declared symbol happens to match a register, it will confuse the
assembler. User-defined symbols should be quoted, so they are not to
be mistaken for registers or operators.
Support for quoted symbols was added in Binutils 2.26, originally
for ARM assembly, where registers are also unprefixed:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d02603dc201f80cd9d2a1f4b1a16110b1e04222b
This change is required for `@SECREL32` to work in Intel syntax when
targeting Windows, where `@` is allowed as part of a symbol. GNU AS
fails to parse a plain symbol with that suffix:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881#c79
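A hypothetical clash illustrating the problem:
int eax;  /* a global that happens to share a register's name */
int
get (void)
{
  return eax;  /* with -masm=intel the reference must be quoted as "eax" */
}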
gcc/ChangeLog:
PR target/53929
PR target/80881
* config/i386/i386-protos.h (ix86_asm_output_labelref): Declare new
function for quoting user-defined symbols in Intel syntax.
* config/i386/i386.cc (ix86_asm_output_labelref): Implement it.
* config/i386/i386.h (ASM_OUTPUT_LABELREF): Use it.
* config/i386/cygming.h (ASM_OUTPUT_LABELREF): Use it.
|
|
The next chunk of Shreya's work.
For this expansion we want to detect cases when the mask fits in a simm12 after
shifting right by the number of trailing zeros in the mask.
In that case we can synthesize the AND with a shift right, andi and shift left.
I saw this case come up when doing some experimentation with mvconst_internal
removed.
This doesn't make any difference in spec right now, mvconst_internal will turn
the sequence back into a constant load + and with register. But Shreya and I
have reviewed the .expand dump on hand written tests and verified we're getting
the synthesis we want.
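A hypothetical example of a qualifying mask (0x7ff000 has 12 trailing
zeros and 0x7ff fits in a simm12):
unsigned long
mask_middle_bits (unsigned long x)
{
  return x & 0x7ff000UL;
}
which should expand along the lines of
  srli a0,a0,12
  andi a0,a0,2047
  slli a0,a0,12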
Tested on riscv32-elf and riscv64-elf. Waiting on upstream CI's verdict before
moving forward.
gcc/
* config/riscv/riscv.cc (synthesize_and): Use a srl+andi+sll
sequence when the mask fits in a simm12 after shifting by the
number of trailing zeros.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
This patch would like to combine vec_duplicate + vor.vv into
vor.vx, as in the example code below. The related pattern depends on
the cost of the vec_duplicate from GR2VR: late-combine takes action if
the GR2VR cost is zero, and rejects the combination if it is greater
than zero.
Assume we have example code like below, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
DEF_VX_BINARY(int32_t, |)
Before this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
14 │ slli a3,a3,32
15 │ srli a3,a3,32
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
18 │ vle32.v v1,0(a1)
19 │ slli a4,a5,2
20 │ sub a3,a3,a5
21 │ add a1,a1,a4
22 │ vor.vv v1,v1,v2
23 │ vse32.v v1,0(a0)
24 │ add a0,a0,a4
25 │ bne a3,zero,.L3
After this patch:
10 │ test_vx_binary_or_int32_t_case_0:
11 │ beq a3,zero,.L8
12 │ slli a3,a3,32
13 │ srli a3,a3,32
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
16 │ vle32.v v1,0(a1)
17 │ slli a4,a5,2
18 │ sub a3,a3,a5
19 │ add a1,a1,a4
20 │ vor.vx v1,v1,a2
21 │ vse32.v v1,0(a0)
22 │ add a0,a0,a4
23 │ bne a3,zero,.L3
The below test suites passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case for IOR op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op or to no_shift_vx_ops.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch adds a --param=autovec-mode=<MODE_NAME>. When the param is
specified we make autovectorize_vector_modes return exactly this mode if
it is available. This helps when testing different vectorizer settings.
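For example, pinning the vectorizer to a single mode (the mode name below
is illustrative; it must be a mode the backend actually provides):
  riscv64-unknown-elf-gcc -march=rv64gcv -O3 --param=autovec-mode=RVVM1SI vec.c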
gcc/ChangeLog:
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Return
user-specified mode if available.
* config/riscv/riscv.opt: New param.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/param-autovec-mode.c: New test.
|
|
This patch initializes saved_vxrm_mode to VXRM_MODE_NONE. The
uninitialized variable triggers a warning (but not an error) when
building the compiler, so better to fix it.
gcc/ChangeLog:
* config/riscv/riscv.cc (singleton_vxrm_need): Init
saved_vxrm_mode.
|
|
With all-SLP we annotate statements slightly differently. This patch
uses STMT_VINFO_RELEVANT_P in order to walk through potential program
points.
It also makes the LMUL estimate always use the same path. This helps
fix a number of test cases that regressed since GCC 14.
There are still some failing ones but it appears to me that the chosen
LMUL is still correct and we just expect different log messages.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul):
Always use vect_vf_for_cost and TARGET_MIN_VLEN.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Adjust
expectations.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
|
|
This patch folds the following pattern:
lsl <y>, <x>, <shift>
lsr <z>, <x>, <shift>
orr <r>, <y>, <z>
to:
revb/h/w <r>, <x>
when the shift amount is equal to half the bitwidth of the <x>
register.
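For illustration, a hypothetical rotate by half the lane width that should
now fold (for 16-bit lanes, a rotate by 8 is a per-lane byte swap, i.e. revb):
#include <stdint.h>
void
rot8 (uint16_t *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = (uint16_t) ((a[i] << 8) | (a[i] >> 8));
}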
Bootstrapped and regtested on aarch64-linux-gnu.
Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if the
target already provided the result in the expected register.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
Avoid forcing subregs into fresh registers unnecessarily.
* config/aarch64/aarch64-sve.md: Add define_split for rotate.
(*v_revvnx8hi): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/shift_rev_1.c: New test.
* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
|