|
zce must imply zcf, but this rule was broken by the
refactoring in 9e12010b5e724277ea. This can be observed
after generating an .s file from any source code file with the
-mriscv-attribute -march=rv32if_zce -mabi=ilp32 -S
options. The full march string is then emitted in the arch attribute:
rv32i2p1_f2p2_zicsr2p0_zca1p0_zcb1p0_zce1p0_zcmp1p0_zcmt1p0
As you can see, zcf is missing here even though the f_zce pair is
passed in -march. According to The RISC-V Instruction
Set Manual:
Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp,
Zcmt and Zcf.
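The implication rule quoted above can be sketched in C as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical and are not the data structures used in riscv-common.cc.

```c
#include <stdbool.h>

/* Hypothetical flags for the compressed sub-extensions; these are
   illustration-only names, not GCC's internal representation.  */
struct zc_exts
{
  bool zca, zcb, zcmp, zcmt, zcf;
};

/* Expand zce for a given XLEN (32 or 64) and presence of F.  */
struct zc_exts
expand_zce (int xlen, bool has_f)
{
  /* Zca, Zcb, Zcmp and Zcmt are always implied by Zce.  */
  struct zc_exts e = { true, true, true, true, false };
  /* Zcf is implied only on RV32 when F is present.  */
  e.zcf = (xlen == 32 && has_f);
  return e;
}
```

With this sketch, expand_zce (32, true) sets zcf, while RV32 without F and RV64 leave it unset.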
PR target/118906
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Fix zce to zcf
implication.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/attribute-zce-1.c: New test.
* gcc.target/riscv/attribute-zce-2.c: New test.
* gcc.target/riscv/attribute-zce-3.c: New test.
* gcc.target/riscv/attribute-zce-4.c: New test.
|
|
This patch fixes a bug when expanding a const vector for the
interleave case. For example, we have:
base1 = 151
step = 121
For vec_series, we generate a vector of the form v[i] = base + i * step.
The vec_series then has the result below for HImode, and we can see
that the result overflows into the highest 8 bits of HImode.
v1.b = {151, 255, 7, 0, 119, 0, 231, 0, 87, 1, 199, 1, 55, 2, 167, 2}
That is, we expect v1.b to be:
v1.b = {151, 0, 7, 0, 119, 0, 231, 0, 87, 0, 199, 0, 55, 0, 167, 0}
After that, an IOR with v2 for base2 (i.e. the other series) is performed.
v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}
Unfortunately, base1 + i * step1 in HImode may overflow into the high
8 bits, and those high 8 bits then pollute v2 and result in an incorrect
value in the const_vector.
This patch performs the overflow to smode check before the
optimized interleave code generation. On overflow or VLA, it falls
back to the default merge approach.
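The kind of check described above can be sketched as follows. This is a minimal illustration, assuming 8-bit elements interleaved into 16-bit lanes; the function names are hypothetical and the actual check in expand_const_vector may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if every element base + i * step, for 0 <= i < n,
   lies in [0, 255], i.e. stays within the low 8 bits of a lane.  */
bool
series_fits_in_low_byte (int64_t base, int64_t step, int n)
{
  for (int i = 0; i < n; i++)
    {
      int64_t v = base + (int64_t) i * step;
      if (v < 0 || v > UINT8_MAX)
        return false;
    }
  return true;
}

/* Combine one element of each series into a 16-bit lane using IOR:
   the low byte comes from series 1, the high byte from series 2.
   This is only correct if series 1 never spills into the high byte.  */
uint16_t
interleave_lane (uint8_t lo, uint8_t hi)
{
  return (uint16_t) (lo | ((uint16_t) hi << 8));
}
```

For base1 = 151 and step = 121 the check fails for 8 elements, so a caller would take the fallback path instead of the IOR-based interleave.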
The following test suites pass with this patch:
* The rv64gcv full regression test.
PR target/118931
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Add overflow to
smode check and clean up highest bits if overflow.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118931-run-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
In PR118950 we do not zero masked elements in a gather load.
While recognizing a gather/scatter pattern we do not use the original
type of the LHS. This matters because the type can differ with bool
patterns (e.g. _Bool vs unsigned char) and we don't notice the need
for zeroing out the padding bytes.
This patch just uses the original LHS's type.
PR middle-end/118950
gcc/ChangeLog:
* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Use
original LHS's type.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr118950.c: New test.
|
|
The incorrect CFI directive info breaks stack unwinding in try/catch/cxa.
Before patch:
cm.push {ra, s0-s2}, -16
.cfi_offset 1, -12
.cfi_offset 8, -8
.cfi_offset 18, -4
After patch:
cm.push {ra, s0-s2}, -16
.cfi_offset 1, -16
.cfi_offset 8, -12
.cfi_offset 9, -8
.cfi_offset 18, -4
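The corrected offsets follow a simple pattern, sketched below. This is an illustration only: the function names are hypothetical, and it assumes RV32 (4-byte registers) with the CFA at the stack pointer value before cm.push, as in the example above.

```c
/* DWARF register numbers of the registers cm.push can save, in
   register-list order: ra, s0, s1, s2 ... s11.  */
static const int push_regs[] =
  { 1, 8, 9, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 };

/* DWARF register number of the IDX-th register in the list.  */
int
push_dwarf_regno (int idx)
{
  return push_regs[idx];
}

/* CFA-relative offset of the IDX-th of NREGS pushed registers, each
   XLEN_BYTES wide.  The last register in the list sits closest to
   the CFA, the first one (ra) furthest away.  */
int
push_cfi_offset (int idx, int nregs, int xlen_bytes)
{
  return -(nregs - idx) * xlen_bytes;
}
```

For {ra, s0-s2} this gives registers 1, 8, 9, 18 at offsets -16, -12, -8, -4, matching the "After patch" listing.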
gcc/ChangeLog:
* config/riscv/riscv.cc: Set multi push regs bits.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zcmp_push_gpr.c: New test.
|
|
dynamic stack allocation not supported'
In Subversion r217296 (Git commit e2acc079ff125a869159be45371dc0a29b230e92)
"Testsuite alloca fixes for ptx", effective-target 'alloca' was added to mark
up test cases that run into the nvptx back end's non-support of dynamic stack
allocation. (Later, nvptx gained conditional support for that in
commit 3861d362ec7e3c50742fc43833fe9d8674f4070e
"nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]", but on the
other hand, in commit f93a612fc4567652b75ffc916d31a446378e6613
"bpf: liberate R9 for general register allocation", the BPF back end joined
"the list of targets that do not support alloca in target-support.exp".)
Manually maintaining the list of test cases requiring effective-target 'alloca'
is notoriously hard and gets out of date quickly: new test cases added to the test
suite may need to be analyzed and annotated, and over time annotations also may
need to be removed, in cases where the compiler learns to optimize out
'alloca'/VLA usage, for example. This commit replaces (99 % of) the manual
annotations with an automatic scheme: turn test cases into UNSUPPORTED if
running into 'sorry, unimplemented: dynamic stack allocation not supported'.
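As an illustration, a test case under the automatic scheme might look like the sketch below. The file content and the function name are hypothetical and not taken from the test suite; the point is only that no 'dg-require-effective-target alloca' line is needed.

```c
/* { dg-do compile } */
/* Hypothetical test: no 'dg-require-effective-target alloca' here.
   On a target without dynamic stack allocation, the resulting
   'sorry, unimplemented' diagnostic is expected to make the test
   UNSUPPORTED instead of FAIL.  */

/* Sum 0 .. n-1 through a variable-length array; n must be positive.  */
int
sum_first (int n)
{
  int buf[n];   /* VLA: needs dynamic stack allocation.  */
  int sum = 0;

  for (int i = 0; i < n; i++)
    buf[i] = i;
  for (int i = 0; i < n; i++)
    sum += buf[i];
  return sum;
}
```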
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_alloca):
Gracefully handle the case that we've not been called (indirectly)
from 'dg-test'.
* lib/gcc-dg.exp (proc gcc-dg-prune): Turn
'sorry, unimplemented: dynamic stack allocation not supported' into
UNSUPPORTED.
* c-c++-common/Walloca-larger-than.c: Don't
'dg-require-effective-target alloca'.
* c-c++-common/Warray-bounds-9.c: Likewise.
* c-c++-common/Warray-bounds.c: Likewise.
* c-c++-common/Wdangling-pointer-2.c: Likewise.
* c-c++-common/Wdangling-pointer-4.c: Likewise.
* c-c++-common/Wdangling-pointer-5.c: Likewise.
* c-c++-common/Wdangling-pointer.c: Likewise.
* c-c++-common/Wimplicit-fallthrough-7.c: Likewise.
* c-c++-common/Wsizeof-pointer-memaccess1.c: Likewise.
* c-c++-common/Wsizeof-pointer-memaccess2.c: Likewise.
* c-c++-common/Wstringop-truncation.c: Likewise.
* c-c++-common/Wunused-var-6.c: Likewise.
* c-c++-common/Wunused-var-8.c: Likewise.
* c-c++-common/analyzer/alloca-leak.c: Likewise.
* c-c++-common/analyzer/allocation-size-multiline-2.c: Likewise.
* c-c++-common/analyzer/allocation-size-multiline-3.c: Likewise.
* c-c++-common/analyzer/capacity-1.c: Likewise.
* c-c++-common/analyzer/capacity-3.c: Likewise.
* c-c++-common/analyzer/imprecise-floating-point-1.c: Likewise.
* c-c++-common/analyzer/infinite-recursion-alloca.c: Likewise.
* c-c++-common/analyzer/malloc-callbacks.c: Likewise.
* c-c++-common/analyzer/malloc-paths-8.c: Likewise.
* c-c++-common/analyzer/out-of-bounds-5.c: Likewise.
* c-c++-common/analyzer/out-of-bounds-diagram-11.c: Likewise.
* c-c++-common/analyzer/uninit-alloca.c: Likewise.
* c-c++-common/analyzer/write-to-string-literal-5.c: Likewise.
* c-c++-common/asan/alloca_loop_unpoisoning.c: Likewise.
* c-c++-common/auto-init-11.c: Likewise.
* c-c++-common/auto-init-12.c: Likewise.
* c-c++-common/auto-init-15.c: Likewise.
* c-c++-common/auto-init-16.c: Likewise.
* c-c++-common/builtins.c: Likewise.
* c-c++-common/dwarf2/vla1.c: Likewise.
* c-c++-common/gomp/pr61486-2.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
* c-c++-common/torture/strub-run3.c: Likewise.
* c-c++-common/torture/strub-run4.c: Likewise.
* c-c++-common/torture/strub-run4c.c: Likewise.
* c-c++-common/torture/strub-run4d.c: Likewise.
* c-c++-common/torture/strub-run4i.c: Likewise.
* g++.dg/Walloca1.C: Likewise.
* g++.dg/Walloca2.C: Likewise.
* g++.dg/cpp0x/pr70338.C: Likewise.
* g++.dg/cpp1y/lambda-generic-vla1.C: Likewise.
* g++.dg/cpp1y/vla10.C: Likewise.
* g++.dg/cpp1y/vla2.C: Likewise.
* g++.dg/cpp1y/vla6.C: Likewise.
* g++.dg/cpp1y/vla8.C: Likewise.
* g++.dg/debug/debug5.C: Likewise.
* g++.dg/debug/debug6.C: Likewise.
* g++.dg/debug/pr54828.C: Likewise.
* g++.dg/diagnostic/pr70105.C: Likewise.
* g++.dg/eh/cleanup5.C: Likewise.
* g++.dg/eh/spbp.C: Likewise.
* g++.dg/ext/builtin_alloca.C: Likewise.
* g++.dg/ext/tmplattr9.C: Likewise.
* g++.dg/ext/vla10.C: Likewise.
* g++.dg/ext/vla11.C: Likewise.
* g++.dg/ext/vla12.C: Likewise.
* g++.dg/ext/vla15.C: Likewise.
* g++.dg/ext/vla16.C: Likewise.
* g++.dg/ext/vla17.C: Likewise.
* g++.dg/ext/vla23.C: Likewise.
* g++.dg/ext/vla3.C: Likewise.
* g++.dg/ext/vla6.C: Likewise.
* g++.dg/ext/vla7.C: Likewise.
* g++.dg/init/array24.C: Likewise.
* g++.dg/init/new47.C: Likewise.
* g++.dg/init/pr55497.C: Likewise.
* g++.dg/opt/pr78201.C: Likewise.
* g++.dg/template/vla2.C: Likewise.
* g++.dg/torture/Wsizeof-pointer-memaccess1.C: Likewise.
* g++.dg/torture/Wsizeof-pointer-memaccess2.C: Likewise.
* g++.dg/torture/pr62127.C: Likewise.
* g++.dg/torture/pr67055.C: Likewise.
* g++.dg/torture/stackalign/eh-alloca-1.C: Likewise.
* g++.dg/torture/stackalign/eh-inline-2.C: Likewise.
* g++.dg/torture/stackalign/eh-vararg-1.C: Likewise.
* g++.dg/torture/stackalign/eh-vararg-2.C: Likewise.
* g++.dg/warn/Wplacement-new-size-5.C: Likewise.
* g++.dg/warn/Wsizeof-pointer-memaccess-1.C: Likewise.
* g++.dg/warn/Wvla-1.C: Likewise.
* g++.dg/warn/Wvla-3.C: Likewise.
* g++.old-deja/g++.ext/array2.C: Likewise.
* g++.old-deja/g++.ext/constructor.C: Likewise.
* g++.old-deja/g++.law/builtin1.C: Likewise.
* g++.old-deja/g++.other/crash12.C: Likewise.
* g++.old-deja/g++.other/eh3.C: Likewise.
* g++.old-deja/g++.pt/array6.C: Likewise.
* g++.old-deja/g++.pt/dynarray.C: Likewise.
* gcc.c-torture/compile/20000923-1.c: Likewise.
* gcc.c-torture/compile/20030224-1.c: Likewise.
* gcc.c-torture/compile/20071108-1.c: Likewise.
* gcc.c-torture/compile/20071117-1.c: Likewise.
* gcc.c-torture/compile/900313-1.c: Likewise.
* gcc.c-torture/compile/parms.c: Likewise.
* gcc.c-torture/compile/pr17397.c: Likewise.
* gcc.c-torture/compile/pr35006.c: Likewise.
* gcc.c-torture/compile/pr42956.c: Likewise.
* gcc.c-torture/compile/pr51354.c: Likewise.
* gcc.c-torture/compile/pr52714.c: Likewise.
* gcc.c-torture/compile/pr55851.c: Likewise.
* gcc.c-torture/compile/pr77754-1.c: Likewise.
* gcc.c-torture/compile/pr77754-2.c: Likewise.
* gcc.c-torture/compile/pr77754-3.c: Likewise.
* gcc.c-torture/compile/pr77754-4.c: Likewise.
* gcc.c-torture/compile/pr77754-5.c: Likewise.
* gcc.c-torture/compile/pr77754-6.c: Likewise.
* gcc.c-torture/compile/pr78439.c: Likewise.
* gcc.c-torture/compile/pr79413.c: Likewise.
* gcc.c-torture/compile/pr82564.c: Likewise.
* gcc.c-torture/compile/pr87110.c: Likewise.
* gcc.c-torture/compile/pr99787-1.c: Likewise.
* gcc.c-torture/compile/vla-const-1.c: Likewise.
* gcc.c-torture/compile/vla-const-2.c: Likewise.
* gcc.c-torture/execute/20010209-1.c: Likewise.
* gcc.c-torture/execute/20020314-1.c: Likewise.
* gcc.c-torture/execute/20020412-1.c: Likewise.
* gcc.c-torture/execute/20021113-1.c: Likewise.
* gcc.c-torture/execute/20040223-1.c: Likewise.
* gcc.c-torture/execute/20040308-1.c: Likewise.
* gcc.c-torture/execute/20040811-1.c: Likewise.
* gcc.c-torture/execute/20070824-1.c: Likewise.
* gcc.c-torture/execute/20070919-1.c: Likewise.
* gcc.c-torture/execute/built-in-setjmp.c: Likewise.
* gcc.c-torture/execute/pr22061-1.c: Likewise.
* gcc.c-torture/execute/pr43220.c: Likewise.
* gcc.c-torture/execute/pr82210.c: Likewise.
* gcc.c-torture/execute/pr86528.c: Likewise.
* gcc.c-torture/execute/vla-dealloc-1.c: Likewise.
* gcc.dg/20001012-2.c: Likewise.
* gcc.dg/20020415-1.c: Likewise.
* gcc.dg/20030331-2.c: Likewise.
* gcc.dg/20101010-1.c: Likewise.
* gcc.dg/Walloca-1.c: Likewise.
* gcc.dg/Walloca-10.c: Likewise.
* gcc.dg/Walloca-11.c: Likewise.
* gcc.dg/Walloca-12.c: Likewise.
* gcc.dg/Walloca-13.c: Likewise.
* gcc.dg/Walloca-14.c: Likewise.
* gcc.dg/Walloca-15.c: Likewise.
* gcc.dg/Walloca-2.c: Likewise.
* gcc.dg/Walloca-3.c: Likewise.
* gcc.dg/Walloca-4.c: Likewise.
* gcc.dg/Walloca-5.c: Likewise.
* gcc.dg/Walloca-6.c: Likewise.
* gcc.dg/Walloca-7.c: Likewise.
* gcc.dg/Walloca-8.c: Likewise.
* gcc.dg/Walloca-9.c: Likewise.
* gcc.dg/Walloca-larger-than-2.c: Likewise.
* gcc.dg/Walloca-larger-than-3.c: Likewise.
* gcc.dg/Walloca-larger-than-4.c: Likewise.
* gcc.dg/Walloca-larger-than.c: Likewise.
* gcc.dg/Warray-bounds-22.c: Likewise.
* gcc.dg/Warray-bounds-41.c: Likewise.
* gcc.dg/Warray-bounds-46.c: Likewise.
* gcc.dg/Warray-bounds-48-novec.c: Likewise.
* gcc.dg/Warray-bounds-48.c: Likewise.
* gcc.dg/Warray-bounds-50.c: Likewise.
* gcc.dg/Warray-bounds-63.c: Likewise.
* gcc.dg/Warray-bounds-66.c: Likewise.
* gcc.dg/Wdangling-pointer.c: Likewise.
* gcc.dg/Wfree-nonheap-object-2.c: Likewise.
* gcc.dg/Wfree-nonheap-object.c: Likewise.
* gcc.dg/Wrestrict-17.c: Likewise.
* gcc.dg/Wrestrict.c: Likewise.
* gcc.dg/Wreturn-local-addr-2.c: Likewise.
* gcc.dg/Wreturn-local-addr-3.c: Likewise.
* gcc.dg/Wreturn-local-addr-4.c: Likewise.
* gcc.dg/Wreturn-local-addr-6.c: Likewise.
* gcc.dg/Wsizeof-pointer-memaccess1.c: Likewise.
* gcc.dg/Wstack-usage.c: Likewise.
* gcc.dg/Wstrict-aliasing-bogus-vla-1.c: Likewise.
* gcc.dg/Wstrict-overflow-27.c: Likewise.
* gcc.dg/Wstringop-overflow-15.c: Likewise.
* gcc.dg/Wstringop-overflow-23.c: Likewise.
* gcc.dg/Wstringop-overflow-25.c: Likewise.
* gcc.dg/Wstringop-overflow-27.c: Likewise.
* gcc.dg/Wstringop-overflow-3.c: Likewise.
* gcc.dg/Wstringop-overflow-39.c: Likewise.
* gcc.dg/Wstringop-overflow-56.c: Likewise.
* gcc.dg/Wstringop-overflow-57.c: Likewise.
* gcc.dg/Wstringop-overflow-67.c: Likewise.
* gcc.dg/Wstringop-overflow-71.c: Likewise.
* gcc.dg/Wstringop-truncation-3.c: Likewise.
* gcc.dg/Wvla-larger-than-1.c: Likewise.
* gcc.dg/Wvla-larger-than-2.c: Likewise.
* gcc.dg/Wvla-larger-than-3.c: Likewise.
* gcc.dg/Wvla-larger-than-4.c: Likewise.
* gcc.dg/Wvla-larger-than-5.c: Likewise.
* gcc.dg/analyzer/boxed-malloc-1.c: Likewise.
* gcc.dg/analyzer/call-summaries-2.c: Likewise.
* gcc.dg/analyzer/malloc-1.c: Likewise.
* gcc.dg/analyzer/malloc-reuse.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-diagram-12.c: Likewise.
* gcc.dg/analyzer/pr93355-localealias.c: Likewise.
* gcc.dg/analyzer/putenv-1.c: Likewise.
* gcc.dg/analyzer/taint-alloc-1.c: Likewise.
* gcc.dg/analyzer/torture/pr93373.c: Likewise.
* gcc.dg/analyzer/torture/ubsan-1.c: Likewise.
* gcc.dg/analyzer/vla-1.c: Likewise.
* gcc.dg/atomic/stdatomic-vm.c: Likewise.
* gcc.dg/attr-alloc_size-6.c: Likewise.
* gcc.dg/attr-alloc_size-7.c: Likewise.
* gcc.dg/attr-alloc_size-8.c: Likewise.
* gcc.dg/attr-alloc_size-9.c: Likewise.
* gcc.dg/attr-noipa.c: Likewise.
* gcc.dg/auto-init-uninit-36.c: Likewise.
* gcc.dg/auto-init-uninit-9.c: Likewise.
* gcc.dg/auto-type-1.c: Likewise.
* gcc.dg/builtin-alloc-size.c: Likewise.
* gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Likewise.
* gcc.dg/builtin-object-size-2.c: Likewise.
* gcc.dg/builtin-object-size-3.c: Likewise.
* gcc.dg/builtin-object-size-4.c: Likewise.
* gcc.dg/builtins-64.c: Likewise.
* gcc.dg/builtins-68.c: Likewise.
* gcc.dg/c23-auto-2.c: Likewise.
* gcc.dg/c99-const-expr-13.c: Likewise.
* gcc.dg/c99-vla-1.c: Likewise.
* gcc.dg/fold-alloca-1.c: Likewise.
* gcc.dg/gomp/pr30494.c: Likewise.
* gcc.dg/gomp/vla-2.c: Likewise.
* gcc.dg/gomp/vla-3.c: Likewise.
* gcc.dg/gomp/vla-4.c: Likewise.
* gcc.dg/gomp/vla-5.c: Likewise.
* gcc.dg/graphite/pr99085.c: Likewise.
* gcc.dg/guality/guality.c: Likewise.
* gcc.dg/lto/pr80778_0.c: Likewise.
* gcc.dg/nested-func-10.c: Likewise.
* gcc.dg/nested-func-12.c: Likewise.
* gcc.dg/nested-func-13.c: Likewise.
* gcc.dg/nested-func-14.c: Likewise.
* gcc.dg/nested-func-15.c: Likewise.
* gcc.dg/nested-func-16.c: Likewise.
* gcc.dg/nested-func-17.c: Likewise.
* gcc.dg/nested-func-9.c: Likewise.
* gcc.dg/packed-vla.c: Likewise.
* gcc.dg/pr100225.c: Likewise.
* gcc.dg/pr25682.c: Likewise.
* gcc.dg/pr27301.c: Likewise.
* gcc.dg/pr31507-1.c: Likewise.
* gcc.dg/pr33238.c: Likewise.
* gcc.dg/pr41470.c: Likewise.
* gcc.dg/pr49120.c: Likewise.
* gcc.dg/pr50764.c: Likewise.
* gcc.dg/pr51491-2.c: Likewise.
* gcc.dg/pr51990-2.c: Likewise.
* gcc.dg/pr51990.c: Likewise.
* gcc.dg/pr59011.c: Likewise.
* gcc.dg/pr59523.c: Likewise.
* gcc.dg/pr61561.c: Likewise.
* gcc.dg/pr78468.c: Likewise.
* gcc.dg/pr78902.c: Likewise.
* gcc.dg/pr79972.c: Likewise.
* gcc.dg/pr82875.c: Likewise.
* gcc.dg/pr83844.c: Likewise.
* gcc.dg/pr84131.c: Likewise.
* gcc.dg/pr87099.c: Likewise.
* gcc.dg/pr87320.c: Likewise.
* gcc.dg/pr89045.c: Likewise.
* gcc.dg/pr91014.c: Likewise.
* gcc.dg/pr93986.c: Likewise.
* gcc.dg/pr98721-1.c: Likewise.
* gcc.dg/pr99122-2.c: Likewise.
* gcc.dg/shrink-wrap-alloca.c: Likewise.
* gcc.dg/sso-14.c: Likewise.
* gcc.dg/strlenopt-62.c: Likewise.
* gcc.dg/strlenopt-83.c: Likewise.
* gcc.dg/strlenopt-84.c: Likewise.
* gcc.dg/strlenopt-91.c: Likewise.
* gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Likewise.
* gcc.dg/torture/calleesave-sse.c: Likewise.
* gcc.dg/torture/pr48953.c: Likewise.
* gcc.dg/torture/pr71881.c: Likewise.
* gcc.dg/torture/pr71901.c: Likewise.
* gcc.dg/torture/pr78742.c: Likewise.
* gcc.dg/torture/pr92088-1.c: Likewise.
* gcc.dg/torture/pr92088-2.c: Likewise.
* gcc.dg/torture/pr93124.c: Likewise.
* gcc.dg/torture/pr94479.c: Likewise.
* gcc.dg/torture/stackalign/alloca-1.c: Likewise.
* gcc.dg/torture/stackalign/inline-2.c: Likewise.
* gcc.dg/torture/stackalign/nested-3.c: Likewise.
* gcc.dg/torture/stackalign/vararg-1.c: Likewise.
* gcc.dg/torture/stackalign/vararg-2.c: Likewise.
* gcc.dg/tree-ssa/20030807-2.c: Likewise.
* gcc.dg/tree-ssa/20080530.c: Likewise.
* gcc.dg/tree-ssa/alias-37.c: Likewise.
* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: Likewise.
* gcc.dg/tree-ssa/builtin-sprintf-warn-25.c: Likewise.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Likewise.
* gcc.dg/tree-ssa/loop-interchange-15.c: Likewise.
* gcc.dg/tree-ssa/pr23848-1.c: Likewise.
* gcc.dg/tree-ssa/pr23848-2.c: Likewise.
* gcc.dg/tree-ssa/pr23848-3.c: Likewise.
* gcc.dg/tree-ssa/pr23848-4.c: Likewise.
* gcc.dg/uninit-32.c: Likewise.
* gcc.dg/uninit-36.c: Likewise.
* gcc.dg/uninit-39.c: Likewise.
* gcc.dg/uninit-41.c: Likewise.
* gcc.dg/uninit-9-O0.c: Likewise.
* gcc.dg/uninit-9.c: Likewise.
* gcc.dg/uninit-pr100250.c: Likewise.
* gcc.dg/uninit-pr101300.c: Likewise.
* gcc.dg/uninit-pr101494.c: Likewise.
* gcc.dg/uninit-pr98583.c: Likewise.
* gcc.dg/vla-2.c: Likewise.
* gcc.dg/vla-22.c: Likewise.
* gcc.dg/vla-24.c: Likewise.
* gcc.dg/vla-3.c: Likewise.
* gcc.dg/vla-4.c: Likewise.
* gcc.dg/vla-stexp-1.c: Likewise.
* gcc.dg/vla-stexp-2.c: Likewise.
* gcc.dg/vla-stexp-4.c: Likewise.
* gcc.dg/vla-stexp-5.c: Likewise.
* gcc.dg/winline-7.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-1.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-10.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-2.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-3.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-4.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-5.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-6.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-7.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-8.c: Likewise.
* gcc.target/aarch64/stack-check-alloca-9.c: Likewise.
* gcc.target/arc/interrupt-6.c: Likewise.
* gcc.target/i386/pr80969-3.c: Likewise.
* gcc.target/loongarch/stack-check-alloca-1.c: Likewise.
* gcc.target/loongarch/stack-check-alloca-2.c: Likewise.
* gcc.target/loongarch/stack-check-alloca-3.c: Likewise.
* gcc.target/loongarch/stack-check-alloca-4.c: Likewise.
* gcc.target/loongarch/stack-check-alloca-5.c: Likewise.
* gcc.target/loongarch/stack-check-alloca-6.c: Likewise.
* gcc.target/riscv/stack-check-alloca-1.c: Likewise.
* gcc.target/riscv/stack-check-alloca-10.c: Likewise.
* gcc.target/riscv/stack-check-alloca-2.c: Likewise.
* gcc.target/riscv/stack-check-alloca-3.c: Likewise.
* gcc.target/riscv/stack-check-alloca-4.c: Likewise.
* gcc.target/riscv/stack-check-alloca-5.c: Likewise.
* gcc.target/riscv/stack-check-alloca-6.c: Likewise.
* gcc.target/riscv/stack-check-alloca-7.c: Likewise.
* gcc.target/riscv/stack-check-alloca-8.c: Likewise.
* gcc.target/riscv/stack-check-alloca-9.c: Likewise.
* gcc.target/sparc/setjmp-1.c: Likewise.
* gcc.target/x86_64/abi/ms-sysv/ms-sysv.c: Likewise.
* gcc.c-torture/compile/20001221-1.c: Don't 'dg-skip-if'
for '! alloca'.
* gcc.c-torture/compile/20020807-1.c: Likewise.
* gcc.c-torture/compile/20050801-2.c: Likewise.
* gcc.c-torture/compile/920428-4.c: Likewise.
* gcc.c-torture/compile/debugvlafunction-1.c: Likewise.
* gcc.c-torture/compile/pr41469.c: Likewise.
* gcc.c-torture/execute/920721-2.c: Likewise.
* gcc.c-torture/execute/920929-1.c: Likewise.
* gcc.c-torture/execute/921017-1.c: Likewise.
* gcc.c-torture/execute/941202-1.c: Likewise.
* gcc.c-torture/execute/align-nest.c: Likewise.
* gcc.c-torture/execute/alloca-1.c: Likewise.
* gcc.c-torture/execute/pr22061-4.c: Likewise.
* gcc.c-torture/execute/pr36321.c: Likewise.
* gcc.dg/torture/pr8081.c: Likewise.
* gcc.dg/analyzer/data-model-1.c: Don't
'dg-require-effective-target alloca'. XFAIL relevant
'dg-warning's for '! alloca'.
* gcc.dg/uninit-38.c: Likewise.
* gcc.dg/uninit-pr98578.c: Likewise.
* gcc.dg/compat/struct-by-value-22_main.c: Comment on
'dg-require-effective-target alloca'.
libstdc++-v3/
* testsuite/lib/prune.exp (proc libstdc++-dg-prune): Turn
'sorry, unimplemented: dynamic stack allocation not supported' into
UNSUPPORTED.
|
|
This patch fixes an ICE similar to the one below. Assume we have the
sample code:
int a, b, c;
short d, e, f;
long g (long h) { return h; }

void i () {
  for (; b; ++b) {
    f = 5 >> a ? d : d << a;
    e &= c | g(f);
  }
}
It will ICE when compiled with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl
during GIMPLE pass: vect
pr116351-1.c: In function ‘i’:
pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode,
at optabs-tree.cc:655
8 | void i () {
| ^
0x44d6b9d internal_error(char const*, ...)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
0x44a26a6 fancy_abort(char const*, int, char const*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, vec<int, va_heap, vl_ptr>*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655
0x1fada40 vect_verify_loop_lens
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566
0x1fb2b07 vect_analyze_loop_2
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037
0x1fb4302 vect_analyze_loop_1
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478
0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638
0x203c2dc try_vectorize_loop_1
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095
0x203c839 try_vectorize_loop
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212
0x203cb2c execute
During vectorization the over-widening pattern matches and loop_info then gets
DImode as its vector_mode. After that the loop_vinfo steps into vect_analyze_xx
with the flow below:
vect_analyze_loop_2
|- vect_pattern_recog // over-widening and set loop_vinfo->vector_mode to DImode
|- ...
|- vect_analyze_loop_operations
|- stmt_info->def_type == vect_reduction_def
|- stmt_info->slp_type == pure_slp
|- vectorizable_lc_phi // Not Hit
|- vectorizable_induction // Not Hit
|- vectorizable_reduction // Not Hit
|- vectorizable_recurr // Not Hit
|- vectorizable_live_operation // Not Hit
|- vect_analyze_stmt
|- stmt_info->relevant == vect_unused_in_scope
|- stmt_info->live == false
|- p pattern_stmt_info == (stmt_vec_info) 0x0
|- return opt_result::success ();
OR
|- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n"
|- Early return opt_result::success ();
|- vectorizable_load/store/call_convert/... // Not Hit
|- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS(loop_vinfo).is_empty ()
|- vect_verify_loop_lens (loop_vinfo)
|- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hit assert, resulting in ICE
Finally, the DImode in loop_vinfo hits the assert (VECTOR_MODE_P (mode))
in vect_verify_loop_lens. This patch returns false
directly if the loop_vinfo has a relevant mode like DImode, which fixes
the ICE, but similar cases may still be mis-optimized. We will try
to cover that in separate patches.
The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
PR middle-end/116351
gcc/ChangeLog:
* tree-vect-loop.cc (vect_verify_loop_lens): Return false if the
loop_vinfo has relevant mode such as DImode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr116351-1.c: New test.
* gcc.target/riscv/rvv/base/pr116351-2.c: New test.
* gcc.target/riscv/rvv/base/pr116351.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
In PR115703 we fuse two vsetvls:
Fuse curr info since prev info compatible with it:
prev_info: VALID (insn 438, bb 2)
Demand fields: demand_ge_sew demand_non_zero_avl
SEW=32, VLMUL=m1, RATIO=32, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(reg:DI 0 zero)
VL=(reg:DI 9 s1 [312])
curr_info: VALID (insn 92, bb 20)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 4 [0x4])
VL=(nil)
prev_info after fused: VALID (insn 438, bb 2)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=64, VLMUL=mf2, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(const_int 4 [0x4])
VL=(nil).
The result is vsetvl zero, zero, e64, mf2, ta, ma. The previous vsetvl
set vl = 4 but here we wrongly set it to vl = 2. As all the following
vsetvls only ever change the ratio we never recover.
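The effect of the wrong LMUL can be checked with the usual relations ratio = SEW / LMUL and VLMAX = VLEN * LMUL / SEW. The sketch below is an illustration only: the function names are hypothetical, LMUL is passed as a fraction, and VLEN = 256 is an assumption that is not stated in the dump above.

```c
/* SEW/LMUL ratio, with LMUL given as the fraction NUM/DEN
   (m1 = 1/1, mf2 = 1/2, m2 = 2/1, ...).  */
int
sew_lmul_ratio (int sew, int lmul_num, int lmul_den)
{
  return sew * lmul_den / lmul_num;
}

/* VLMAX = VLEN * LMUL / SEW for a vector register length of VLEN bits.  */
int
vlmax (int vlen, int sew, int lmul_num, int lmul_den)
{
  return vlen * lmul_num / (sew * lmul_den);
}
```

Under the VLEN = 256 assumption, e64 with m1 gives VLMAX = 4, while e64 with mf2 gives VLMAX = 2, which is consistent with vl dropping from 4 to 2 as described above.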
The issue is quite difficult to trigger because we can often
deduce the value of d at runtime. Then every check for the value of
d will be optimized away.
The last known bad commit is r15-3458-g5326306e7d9d36. With that commit
the output is wrong but -fno-schedule-insns makes it correct. From the
next commit on the issue is latent. I still added the PR's test as scan
and run check even if they don't trigger right now. Not sure if the
run test will ever fail but well. I verified that the
patch fixes the issue when applied on top of r15-3458-g5326306e7d9d36.
PR target/115703
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc: Use max_sew for calculating the
new LMUL.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr115703-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr115703.c: New test.
|
|
Ref:
https://github.com/ewlu/gcc-precommit-ci/issues/3096#issue-2854419069
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/bug-9.c: Added new failure check.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-17.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-18.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-19.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-20.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-21.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-22.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-23.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-24.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-25.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-26.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-27.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-28.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-29.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c: Likewise.
|
|
This patch avoids the ICE when the target attribute
specifies an xlen different from the command line, e.g. compiling with rv64gc
but with a target attribute of rv32gcv_zbb. For example:
long foo (long a, long b)
  __attribute__((target("arch=rv32gcv_zbb")));

long foo (long a, long b)
{
  return a + (b * 2);
}
When compiled with rv64gc -O3, it will ICE similar to the below
during RTL pass: fwprop1
test.c: In function ‘foo’:
test.c:10:1: internal compiler error: in add_use, at
rtl-ssa/accesses.cc:1234
10 | }
| ^
0x44d6b9d internal_error(char const*, ...)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
0x44a26a6 fancy_abort(char const*, int, char const*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
0x408fac9 rtl_ssa::function_info::add_use(rtl_ssa::use_info*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/accesses.cc:1234
0x40a5eea
rtl_ssa::function_info::create_reg_use(rtl_ssa::function_info::build_info&,
rtl_ssa::insn_info*, rtl_ssa::resource_info)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/insns.cc:496
0x4456738
rtl_ssa::function_info::add_artificial_accesses(rtl_ssa::function_info::build_info&,
df_ref_flags)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:900
0x4457297
rtl_ssa::function_info::start_block(rtl_ssa::function_info::build_info&,
rtl_ssa::bb_info*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1082
0x4453627
rtl_ssa::function_info::bb_walker::before_dom_children(basic_block_def*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:118
0x3e9f3fb dom_walker::walk(basic_block_def*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:311
0x445806f rtl_ssa::function_info::process_all_blocks()
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1298
0x40a22d3 rtl_ssa::function_info::function_info(function*)
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/functions.cc:51
0x3ec3f80 fwprop_init
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:893
0x3ec420d fwprop
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:963
0x3ec43ad execute
Considering we are in stage 4, we just report an error for the above scenario when
we detect that the cmd xlen differs from that of the target attribute, in the
implementation of the target hook TARGET_OPTION_VALID_ATTRIBUTE_P.
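The comparison itself can be sketched as below. This is a simplified illustration that only looks at the "rv32"/"rv64" prefix of the arch strings; the function names are hypothetical and this is not how riscv_target_attr_parser::parse_arch is implemented.

```c
#include <string.h>

/* Return 32 or 64 for arch strings starting with "rv32" or "rv64",
   and 0 for anything else.  */
int
arch_xlen (const char *arch)
{
  if (strncmp (arch, "rv32", 4) == 0)
    return 32;
  if (strncmp (arch, "rv64", 4) == 0)
    return 64;
  return 0;
}

/* Return nonzero if the command-line arch and the target-attribute
   arch agree on the xlen; a mismatch would be reported as an error.  */
int
xlen_matches (const char *cmd_arch, const char *attr_arch)
{
  int cmd = arch_xlen (cmd_arch);
  return cmd != 0 && cmd == arch_xlen (attr_arch);
}
```

For the example above, xlen_matches ("rv64gc", "rv32gcv_zbb") is zero, so the attribute would be rejected.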
PR target/118540
gcc/ChangeLog:
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Report error when cmd xlen differs from the target attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118540-1.c: New test.
* gcc.target/riscv/rvv/base/pr118540-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This is Jakub's patch and Ian's testcase for the slightly vexing fault building
the D runtime with an s390x-x-riscv cross compiler.
The core issue is we're allocating a vector to hold temporary registers
unconditionally, including cases where the vector isn't needed because the loop
isn't going to iterate.
In the cases where the vector isn't needed the length is computed with an
expression (x / y) - 1 where x / y will be zero. The alloca(-1) on the s390
platform triggers a fault. We haven't seen the fault with an x86 cross, but we
can certainly see the bogus value being passed to alloca with a debugger.
Jakub's patch just conditionalizes the whole block in a sensible way. So it
looks larger than it really is. I thought it might be better to do a bit of
manual CSE on this code to make it even more obvious, but I think we're
ultimately OK here.
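The bogus length can be reproduced with plain integer arithmetic. The sketch below is an illustration only: x and y stand for the operands of the (x / y) - 1 expression mentioned above and are assumed positive, the function names are hypothetical, and the real patch conditionalizes the whole block rather than clamping the count.

```c
#include <stddef.h>

/* Unguarded element count as described: (x / y) - 1.  When x < y the
   division yields 0 and the result is -1.  */
long
temp_count (long x, long y)
{
  return x / y - 1;
}

/* Guarded variant: report 0 elements when the loop would not iterate,
   so that no buffer needs to be allocated at all.  */
size_t
guarded_temp_count (long x, long y)
{
  long n = x / y;
  return n > 0 ? (size_t) (n - 1) : 0;
}
```

Passing the unguarded -1 to alloca converts it to a huge unsigned size, which is the value visible in the debugger.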
Ian provided the testcase, collapsed down into equivalent C code. Again, it
doesn't fault on an x86-x-riscv, but I can see the incorrect behavior with a
debugger.
And a shout-out to Stefan for providing a docker based reproducer, it really
helped track this down.
PR target/118248
gcc/
* config/riscv/riscv-string.cc (riscv_block_move_straight): Only
allocate REGS buffer if it will be needed.
gcc/testsuite
* gcc.target/riscv/pr118248.c: New test.
|
|
The test scanned for vmin and vmax instead of vminu and vmaxu.
This patch fixes that.
Will commit as obvious once the CI is OK with it.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr117722.c: Scan for vminu and
vmaxu.
|
|
My last fix wasn't sufficient. This patch just scans for the scalar
insns now.
Going to commit as obvious if the CI is happy.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Scan for add.
* gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: Scan for fadd.
|
|
When using riscv_v_abi, the return and arguments of the function should
be adequately checked to avoid ICE.
PR target/118872
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_fntype_abi): Strengthen the logic
of the check to avoid missing the error report.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118872.c: New test.
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
|
|
Hi,
in PR118832 we have another instance of the problem already noticed in
PR117878. We sometimes use e.g. expand_simple_binop for vector
operations like shift or and. While this is usually OK, it causes
problems when doing it late, e.g. during LRA.
In particular, we might rematerialize a const_vector during LRA, which
then leaves an insn lying around that cannot be split any more if it
requires a pseudo. Therefore we should only use the split variants
in expand_const_vector.
This patch fixed the issue in the PR and also pre-emptively rewrites two
other spots that might be prone to the same issue.
Regtested on rv64gcv_zvl512b. As the two other cases don't have a test
(so might not even trigger) I unconditionally enabled them for my testsuite
run.
Regards
Robin
PR target/118832
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Expand as
vlmax insn during lra.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr118832.c: New test.
|
|
A couple of Vector pseudoinstructions use x0 scalar which could be
inefficient on wider uarches due to regfile crossing.
Instead use the imm 0 form, which should be functionally equivalent.
pseudoinsn             orig insn with x0       this patch
--------------------   ---------------------   -------------------
vneg.v vd,vs           vrsub.vx vd,vs,x0       vrsub.vi vd,vs,0
vncvt.x.x.w vd,vs,vm   vnsrl.wx vd,vs,x0,vm    vnsrl.wi vd,vs,0,vm
vwcvt.x.x.v vd,vs,vm   vwadd.vx vd,vs,x0,vm    (imm not supported)
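As a rough per-element model of why the x0 and immediate-0 forms are interchangeable (a sketch only, assuming SEW=32 for vrsub and a 32-bit to 16-bit narrowing for vnsrl; the function names are made up for illustration):

```c
#include <stdint.h>

/* One element of vrsub: scalar - vs with wraparound.  The scalar is 0
   whether it comes from x0 or from the immediate.  The final conversion
   back to int32_t wraps on GCC (implementation-defined in ISO C).  */
int32_t vrsub_elem (int32_t vs, int32_t scalar)
{
  return (int32_t) ((uint32_t) scalar - (uint32_t) vs);
}

/* One element of vnsrl: shift the wide source right, keep the low half.
   Only the low 5 bits of the shift amount are used for a 32-bit source.  */
uint16_t vnsrl_elem (uint32_t vs, uint32_t shift)
{
  return (uint16_t) (vs >> (shift & 31));
}
```

With scalar = 0 the first function is a plain negate, and with shift = 0 the second is a plain truncation, matching vneg.v and vncvt.x.x.w.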
gcc/ChangeLog:
* config/riscv/vector.md: vncvt substitute vnsrl.
vnsrl with x0 replace with immediate 0.
vneg substitute vrsub.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change
expected pattern.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
* gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
This is a follow-up to the patch below to avoid generating unrecognized
vsetivl instructions for XTheadVector.
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html
PR target/118601
gcc/ChangeLog:
* config/riscv/riscv-string.cc (expand_block_move): Check with new
constraint 'vl' instead of 'K'.
(expand_vec_setmem): Likewise.
(expand_vec_cmpmem): Likewise.
* config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
(expand_load_store): Likewise.
(expand_strided_load): Likewise.
(expand_strided_store): Likewise.
(expand_lanes_load_store): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.
Reported-by: Edwin Lu <ewlu@rivosinc.com>
|
|
Code sinking is just semantic preserving code motions, so it's a lot like
scheduling in that code motions can change the vector configuration needed at
various program points. That in turn can also change the number of vsetvls as
we may or may not be able to merge them after the code motions.
The sinking heuristics were twiddled several months ago resulting in a handful
of scan-asm failures. This patch adjusts the tests appropriately fixing
pr115123 (P3 regression).
PR target/115123
gcc/testsuite
* gcc.target/riscv/rvv/base/pr114352-3.c: Adjust expected output.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-91.c: Likewise.
* gcc.target/riscv/rvv/vsetvl/avl_single-92.c: Likewise.
|
|
There's some special case code in the risc-v move expander to try and optimize
cases where the source is a subreg of a vector and the destination is a scalar
mode.
The code works fine except when we have no support for the given mode. ie HF or
BF when those extensions aren't enabled. We'll end up tripping an assert in
that case when we should have just let standard expansion do its thing.
Tested in my system for rv32 and rv64, but I'll wait for the pre-commit tester
to render a verdict before moving forward.
PR target/118146
gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Handle subreg
of vector source better to avoid ICE.
gcc/testsuite
* gcc.target/riscv/pr118146-1.c: New test.
* gcc.target/riscv/pr118146-2.c: New test.
|
|
Inspired by PR118103, the VXRM register should be treated almost the
same as the FRM register, aka cooperatively-managed global register.
Thus, add the VXRM to global_regs to avoid the elimination by the
late-combine pass.
For example as below code:
void compute ()
{
  size_t vl = __riscv_vsetvl_e16m1 (N);
  vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
  vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
  vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

  __riscv_vse16_v_u16m1 (c, vc, vl);
}

int main ()
{
  initialize ();
  compute();

  return 0;
}
After compile with -march=rv64gcv -O3, we will have:
compute:
  csrwi vxrm,2
  lui a3,%hi(a)
  lui a4,%hi(b)
  addi a4,a4,%lo(b)
  vsetivli zero,4,e16,m1,ta,ma
  addi a3,a3,%lo(a)
  vle16.v v2,0(a4)
  vle16.v v1,0(a3)
  lui a4,%hi(c)
  addi a4,a4,%lo(c)
  vaaddu.vv v1,v1,v2
  vse16.v v1,0(a4)
  ret
  .size compute, .-compute
  .section .text.startup,"ax",@progbits
  .align 1
  .globl main
  .type main, @function
main:
  // csrwi vxrm,2 deleted after inline
  addi sp,sp,-16
  sd ra,8(sp)
  call initialize
  lui a3,%hi(a)
  lui a4,%hi(b)
  vsetivli zero,4,e16,m1,ta,ma
  addi a4,a4,%lo(b)
  addi a3,a3,%lo(a)
  vle16.v v2,0(a4)
  vle16.v v1,0(a3)
  lui a4,%hi(c)
  addi a4,a4,%lo(c)
  li a0,0
  vaaddu.vv v1,v1,v2
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/118103
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the VXRM as the global_regs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Richard S's recent change to iv increment insertion removed a reg->reg move
(which was its intent AFAICT). This triggered a failure on a riscv test.
That test was meant to verify that we didn't have an extraneous reg->reg move
due to a buglet in the risc-v splitters. Before the 2023 change we had two
vector reg->reg moves and after the 2023 fix we had just one. With Richard's
change we have none ;-) Adjusting test accordingly.
Pushed to the trunk.
gcc/testsuite
* gcc.target/riscv/rvv/autovec/madd-split2-1.c: Update expected
output.
|
|
The following test ICEs on RISC-V at least latently since
r14-1622-g99bfdb072e67fa3fe294d86b4b2a9f686f8d9705 which added
RISC-V specific case to get_biv_step_1 to recognize also
({zero,sign}_extend:DI (plus:SI op0 op1))
The reason for the ICE is that op1 in this case is CONST_POLY_INT
which unlike the really expected VOIDmode CONST_INTs has its own
mode and still satisfies CONSTANT_P.
GET_MODE (rhs) (SImode) is different from outer_mode (DImode), so
the function later does
*inner_step = simplify_gen_binary (code, outer_mode,
*inner_step, op1);
but that obviously ICEs because while *inner_step is either VOIDmode
or DImode, op1 has SImode.
The following patch fixes it by extending op1 using code so that
simplify_gen_binary can handle it. Another option would be
to change the !CONSTANT_P (op1) 3 lines above this to
!CONST_INT_P (op1), I think it isn't very likely that we get something
useful from other constants there.
2025-02-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117506
* loop-iv.cc (get_biv_step_1): For {ZERO,SIGN}_EXTEND
of PLUS apply {ZERO,SIGN}_EXTEND to op1.
* gcc.dg/pr117506.c: New test.
* gcc.target/riscv/pr117506.c: New test.
|
|
The -mcpu=tt-ascalon-d8 option for the test implies D extension, which
is not compatible with the ILP32E and ILP64E ABIs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr118170.c: Ignore for E ABI.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
multiplication against boolean value
Andrew, Raphael and I have all poked at it in various ways over the last year
or so. I think when Raphael and I first looked at it I sent us down a bit of a
rathole.
In particular it's odd that we're using a multiply to implement a select and it
seemed like recognizing the idiom and rewriting into a conditional move was the
right path. That looked reasonably good for the test, but runs into problems
with min/max detection elsewhere.
I think that initial investigation somewhat polluted our thinking. The
regression can be fixed with a fairly simple match.pd pattern.
Essentially we want to handle
x * (x || b) -> x
x * !(x || b) -> 0
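A quick way to convince yourself the two rewrites are safe is to check them exhaustively over a small range in C. This is only a sanity sketch, not the match.pd pattern itself; the function names are hypothetical.

```c
#include <stdbool.h>

/* Multiplication by a boolean used as a select.  */
int mul_or (int x, int b)
{
  return x * (x || b);
}

int mul_not_or (int x, int b)
{
  return x * !(x || b);
}

/* Compare against the simplified forms: x * (x || b) == x and
   x * !(x || b) == 0 for every pair in a small range.  */
bool identities_hold (void)
{
  for (int x = -4; x <= 4; x++)
    for (int b = -4; b <= 4; b++)
      if (mul_or (x, b) != x || mul_not_or (x, b) != 0)
        return false;
  return true;
}
```

The reasoning is the same for any x: if x is nonzero then (x || b) is 1, and if x is zero the product is zero regardless of b.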
There's simplifications that can be made for "&&" cases, but I haven't seen
them in practice. Rather than drop in untested patterns, I'm leaving that as a
future todo.
My original was two match.pd patterns. Andrew combined them into a single
pattern. I've made this conditional on GIMPLE as an earlier version that
simplified to a conditional move showed that when applied on GENERIC we could
drop an operand with a side effect which is clearly not good.
I've bootstrapped and regression tested this on x86. I've also tested on the
various embedded targets in my tester.
PR tree-optimization/114277
gcc/
* match.pd (a * (a || b) -> a): New pattern.
(a * !(a || b) -> 0): Likewise.
gcc/testsuite
* gcc.target/i386/pr114277.c: New test.
* gcc.target/riscv/pr114277.c: Likewise.
Co-author: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This patch would like to fix the wrong code generation for the scalar
signed SAT_TRUNC. The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode. Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like add. The gen_lowpart
will generate something like lbu which has all zero for highest bits.
For example, when truncating 0xff7f (-129 for HImode) to QImode, we actually
want to compare -129 to -128, but without a sign extend (e.g. after a
zero-extending load like lbu) we will compare 0xff7f to 0xffffffffffffff80
(assuming Xmode is DImode). Thus, we have to sign extend 0xff7f (HImode) to
0xffffffffffffff7f (assuming Xmode is DImode) before the compare in Xmode.
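The effect can be reproduced with a small C model. This is a sketch of the arithmetic only, not of what riscv_expand_sstrunc emits; the helper names are hypothetical and Xmode is assumed to be 64 bits wide.

```c
#include <stdint.h>

/* Signed saturating truncation of a 64-bit register value to 8 bits.  */
int8_t sat_trunc_s8 (int64_t xreg)
{
  if (xreg < INT8_MIN)
    return INT8_MIN;
  if (xreg > INT8_MAX)
    return INT8_MAX;
  return (int8_t) xreg;
}

/* Two ways of widening a 16-bit input to the 64-bit register.  */
int64_t widen_zero (int16_t v)
{
  return (int64_t) (uint16_t) v;   /* upper bits all zero */
}

int64_t widen_sign (int16_t v)
{
  return (int64_t) v;              /* upper bits copy the sign */
}
```

For the input -129, the zero-extended register holds 0xff7f, which saturates to 127; the sign-extended register holds -129, which saturates to the expected -128.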
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/117688
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117688.h: Add test helper macros.
* gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to fix the wrong code generation for the scalar
signed SAT_SUB. The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode. Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like sub. The gen_lowpart
will generate something like lbu which has all zero for highest bits.
For example, when 0xff (-1 for QImode) sub 0x1 (1 for QImode), we actually
want -1 - 1 = -2, but without a sign extend (e.g. after lbu) we will get
0xff - 1 = 0xfe, which is incorrect. Thus, we have to sign extend 0xff (QImode)
to 0xffffffffffffffff (assuming Xmode is DImode) before the sub in Xmode.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/117688
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117688.h: Add test helper macro.
* gcc.target/riscv/pr117688-sub-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to fix the wrong code generation for the scalar
signed SAT_ADD. The input can be QI/HI/SI/DI while the alu like sub
can only work on Xmode. Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU like add. The gen_lowpart
will generate something like lbu which has all zero for highest bits.
For example, when 0xff (-1 for QImode) plus 0x2 (2 for QImode), we actually
want -1 + 2 = 1, but without a sign extend (e.g. after lbu) we will get
0xff + 2 = 0x101, which is incorrect. Thus, we have to sign extend 0xff (QImode)
to 0xffffffffffffffff (assuming Xmode is DImode) before the plus in Xmode.
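The arithmetic for the add case can be checked with a few lines of C. This models an 8-bit saturating add done in a 64-bit register and is not the sequence riscv_expand_ssadd generates; all names are hypothetical.

```c
#include <stdint.h>

/* 8-bit signed saturating add carried out in a 64-bit register.  */
int8_t sat_add_s8_in_xmode (int64_t a, int64_t b)
{
  int64_t sum = a + b;
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return (int8_t) sum;
}

/* Zero extension and sign extension of a QImode value to 64 bits.  */
int64_t zext8 (int8_t v)
{
  return (int64_t) (uint8_t) v;
}

int64_t sext8 (int8_t v)
{
  return (int64_t) v;
}
```

With zero extension, -1 + 2 is computed as 0xff + 2 = 0x101 and the result wrongly saturates to 127; with sign extension the sum is 1.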
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/117688
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117688-add-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s8.c: New test.
* gcc.target/riscv/pr117688.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
In both tests we expect a VEC_SHL_INSERT expression but we now add the
initial value at the end. Just remove that scan check.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Remove
VEC_SHL_INSERT check.
* gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: Ditto.
|
|
The test fails with _zvfh because we vectorize more. Just adjust the
test expectations.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c:
Distinguish between zvfh and !zvfh.
|
|
After we enabled the late-combine pass after the mode-switching pass, it
will try to combine below insn patterns into op. Aka:
(insn 40 5 41 2 (set (reg:SI 11 a1 [151])
(reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi}
(nil))
(insn 41 40 7 2 (set (reg:SI 69 frm)
(const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore}
(nil))
(insn 42 10 11 2 (set (reg:SI 69 frm)
(reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore}
(nil))
trying to combine definition of r11 in:
40: a1:SI=frm:SI
into:
42: frm:SI=a1:SI
instruction becomes a no-op:
(set (reg:SI 69 frm)
(reg:SI 69 frm))
original cost = 4 + 4 (weighted: 8.000000), replacement cost =
2147483647; keeping replacement
rescanning insn with uid = 42.
updating insn 42 in-place
verify found no changes in insn with uid = 42.
deleting insn 40
For example we have code as below:
int test_exampe () {
  test ();

  size_t vl = 4;
  vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
  va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
  va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);

  __riscv_vse16_v_f16m1(b, va, vl);

  return 0;
}
it will be compiled to:
main:
  addi sp,sp,-16
  sd ra,8(sp)
  call initialize
  lui a6,%hi(b)
  lui a2,%hi(a)
  addi a3,a6,%lo(b)
  addi a2,a2,%lo(a)
  li a4,4
.L8:
  fsrmi 2
  vsetvli a5,a4,e16,m1,ta,ma
  vle16.v v1,0(a2)
  slli a1,a5,1
  subw a4,a4,a5
  add a2,a2,a1
  vfnmadd.vv v1,v1,v1
>> The fsrm a0 insn is deleted by late-combine <<
  vfmsub.vv v1,v1,v1
  vse16.v v1,0(a3)
  add a3,a3,a1
  bgt a4,zero,.L8
  lh a4,%lo(b)(a6)
  li a5,-20480
  addi a5,a5,-1382
  bne a4,a5,.L14
  ld ra,8(sp)
  li a0,0
  addi sp,sp,16
  jr ra
This patch would like to add the FRM register to the global_regs as it
is a cooperatively-managed global register. And then the fsrm insn will
not be eliminated by late-combine. The related spec17 cam4 failure may
also be caused by this issue.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/118103
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the FRM as the global_regs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-1.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This reverts commit b22d9c8f8216d15773dee4f9677c6b26aff507fd.
|
|
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Enable testsuite of
XTheadVector.
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly.
* gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.
|
|
I've had a long standing TODO to review the RISC-V testsuite regressions from
enabling the late-combine pass (pr116256). I adjusted a few cases months ago,
this adjusts a couple more where it looks like the right thing to do.
All that's left after this are the vls/dup-? tests which regress in meaningful
ways and I'm still investigating reasonable approaches to fix them (they play
into the whole mvconst_internal pattern situation), late-combine isn't doing
anything wrong.
PR target/116256
gcc/testsuite
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-37.c: Update expected
output.
* gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Likewise.
|
|
The RISC-V backend has checks to verify that every used insn has an associated
type and that every insn type maps to some reservation in the DFA model. If
either test fails we ICE.
With the cpu/isa allowed to vary independently from the tune/scheduler model,
it's entirely possible (in fact trivial) to trigger those kinds of ICEs.
This patch "fixes" the ICEs for xiangshan-nanhu by throwing every unknown insn
type into a special bucket. I wouldn't be surprised if a few of them are
implemented (like rotates as the chip seems to have other bitmanip extensions).
But I know nothing about this design and the DFA author hasn't responded to
requests to update the DFA in ~6 months.
This should dramatically reduce the number of ICEs in the testsuite if someone
were to turn on xiangshan-nanhu scheduling.
Not strictly a regression, but a bugfix and highly isolated to the
xiangshan-nanhu tuning in the RISC-V backend. So I'm gating this into gcc-15,
assuming pre-commit doesn't balk.
PR target/114442
gcc/
* config/riscv/xiangshan.md: Add missing insn types to a
new dummy insn reservation.
gcc/testsuite
* gcc.target/riscv/pr114442.c: New test.
|
|
For XTheadCondMov, the bit width of rs2 should always be XLEN-sized, otherwise
the program logic will be wrong.
Reference from
https://github.com/XUANTIE-RV/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf
Synopsis
Move if equal zero.
Mnemonic
th.mveqz rd, rs1, rs2
Description
This instruction moves the content of register rs1 into rd if the content of rs2 is 0x0.
Otherwise, the value of rd does not change.
Operation
if (reg[rs2] == 0x0)
reg[rd] := reg[rs1]
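To illustrate why the width of rs2 matters, here is a small C model of the operation quoted above on RV64. It is a sketch only: the function names are hypothetical, and the "SImode view" helper just stands in for a condition that was only computed to be correct in the low 32 bits.

```c
#include <stdint.h>

/* Model of th.mveqz on RV64: the whole 64-bit rs2 is compared with 0.  */
uint64_t th_mveqz (uint64_t rd, uint64_t rs1, uint64_t rs2)
{
  return rs2 == 0 ? rs1 : rd;
}

/* What a 32-bit (SImode) view of the same register would conclude.  */
int simode_is_zero (uint64_t rs2)
{
  return (uint32_t) rs2 == 0;
}
```

If rs2 holds 0x100000000, the SImode view says "zero" but the instruction sees a nonzero register and leaves rd unchanged, so the move does not happen.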
gcc/ChangeLog:
* config/riscv/thead.md (*th_cond_mov<GPR:mode><GPR2:mode>):
Change GPR2 to X.
(*th_cond_mov<GPR:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadcondmov-bug.c: New test.
|
|
While this wasn't originally marked as a regression, it almost certainly is
given that older versions of GCC would have used libatomic and would not have
ICE'd on this code.
Basically this is another case where we directly used simplify_gen_subreg when
we should have used gen_lowpart.
When I fixed a similar bug a while back I noted the code in question as needing
another looksie. I think at that time my brain saw the mixed modes (SI & QI)
and locked up. But the QI stuff is just the shift count, not some deeper
issue. So fixing is trivial.
We just replace the simplify_gen_subreg with a gen_lowpart and get on with our
lives.
Tested on rv64 and rv32 in my tester. Waiting on pre-commit testing for final
verdict.
PR target/116308
gcc/
* config/riscv/riscv.cc (riscv_lshift_subword): Use gen_lowpart
rather than simplify_gen_subreg.
gcc/testsuite/
* gcc.target/riscv/pr116308.c: New test.
|
|
These testcases require RV64 targets. They fail when -march=rv32* is
specified while using an riscv64* compiler.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/crc-21-rv64-zbc.c: Disallow rv32 targets.
* gcc.target/riscv/crc-21-rv64-zbkc.c: Ditto.
|
|
VSETVL_VTYPE_CHANGE_ONLY for XTheadVector.
In RVV 1.0, the instruction "vsetvli zero,zero,*" indicates that the
application vector length (avl) does not change. However, in XTheadVector,
this same instruction signifies that the avl should take the maximum value.
Consequently, when fusing vsetvl instructions, the optimization labeled
"VSETVL_VTYPE_CHANGE_ONLY" is disabled for XTheadVector.
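The difference described above can be written down as a tiny C model. This is a sketch based only on the description in this message; the function name and the `xtheadvector` flag are hypothetical, and the model ignores every other vsetvl form.

```c
/* Resulting vl after "vsetvli zero,zero,<vtype>".
   RVV 1.0: the current vl is kept.
   XTheadVector: vl is set to the maximum for the new vtype.  */
unsigned long vl_after_vsetvli_zero_zero (unsigned long old_vl,
                                          unsigned long vlmax,
                                          int xtheadvector)
{
  return xtheadvector ? vlmax : old_vl;
}
```

A fusion that rewrites a full vsetvl into the "change vtype only" form is therefore only length-preserving under the RVV 1.0 behavior.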
PR target/118357
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc: Function change_vtype_only_p always
returns false for XTheadVector.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xtheadvector/pr118357.c: New test.
|
|
gcc/ChangeLog:
* config/riscv/riscv.cc
(is_zicfilp_p): New function.
(is_zicfiss_p): New function.
* config/riscv/riscv-zicfilp.cc: Update.
* config/riscv/riscv.h: Update.
* config/riscv/riscv.md: Update.
* config/riscv/riscv-c.cc: Add CFI predefine macro.
gcc/testsuite/ChangeLog:
* c-c++-common/fcf-protection-1.c: Update.
* c-c++-common/fcf-protection-2.c: Update.
* c-c++-common/fcf-protection-3.c: Update.
* c-c++-common/fcf-protection-4.c: Update.
* c-c++-common/fcf-protection-5.c: Update.
* c-c++-common/fcf-protection-6.c: Update.
* c-c++-common/fcf-protection-7.c: Update.
* gcc.target/riscv/ssp-1.c: Update.
* gcc.target/riscv/ssp-2.c: Update.
* gcc.target/riscv/zicfilp-call.c: Update.
* gcc.target/riscv/interrupt-no-lpad.c: Update.
|
|
This patch only supports a landing pad value of 0.
The next version will implement function signature based labeling
scheme.
RISC-V CFI SPEC: https://github.com/riscv/riscv-cfi
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add ZICFILP ISA
string.
* config.gcc: Add riscv-zicfilp.o
* config/riscv/riscv-passes.def (INSERT_PASS_BEFORE):
Insert landing pad instructions.
* config/riscv/riscv-protos.h (make_pass_insert_landing_pad):
Declare.
* config/riscv/riscv-zicfilp.cc: New file.
* config/riscv/riscv.cc
(riscv_trampoline_init): Add landing pad instructions.
(riscv_legitimize_call_address): Likewise.
(riscv_output_mi_thunk): Likewise.
* config/riscv/riscv.h: Update.
* config/riscv/riscv.md: Add landing pad patterns.
* config/riscv/riscv.opt (TARGET_ZICFILP): Define.
* config/riscv/t-riscv: Add build rule for
riscv-zicfilp.o
gcc/testsuite/ChangeLog:
* gcc.target/riscv/interrupt-no-lpad.c: New test.
* gcc.target/riscv/zicfilp-call.c: New test.
Co-Developed-by: Greg McGary <gkm@rivosinc.com>,
Kito Cheng <kito.cheng@gmail.com>
|
|
This patch is implemented according to the RISC-V CFI specification.
It supports the generation of shadow stack instructions in the prologue,
epilogue, non-local gotos, and unwinding.
RISC-V CFI SPEC: https://github.com/riscv/riscv-cfi
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add ZICFISS ISA string.
* config/riscv/predicates.md: New predicate x1x5_operand.
* config/riscv/riscv.cc
(riscv_expand_prologue): Insert shadow stack instructions.
(riscv_expand_epilogue): Likewise.
(riscv_for_each_saved_reg): Assign t0 or ra register for
sspopchk instruction.
(need_shadow_stack_push_pop_p): New function. Omit shadow
stack operation on leaf function.
* config/riscv/riscv.h
(need_shadow_stack_push_pop_p): Define.
* config/riscv/riscv.md: Add shadow stack patterns.
(save_stack_nonlocal): Add shadow stack instructions for setjmp.
(restore_stack_nonlocal): Add shadow stack instructions for longjmp.
* config/riscv/riscv.opt (TARGET_ZICFISS): Define.
libgcc/ChangeLog:
* config/riscv/linux-unwind.h: Include shadow-stack-unwind.h.
* config/riscv/shadow-stack-unwind.h
(_Unwind_Frames_Extra): Define.
(_Unwind_Frames_Increment): Define.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/ssp-1.c: New test.
* gcc.target/riscv/ssp-2.c: New test.
Co-Developed-by: Greg McGary <gkm@rivosinc.com>,
Kito Cheng <kito.cheng@gmail.com>
|
|
Update the SiFive Xsfvqmacc and Xsfvfnrclip extensions' testcases.
version log:
Synchronize LMUL settings with return type.
gcc/ChangeLog:
* config/riscv/vector.md: New attr set.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c: Add vsetivli checking.
* gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: Ditto.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: Ditto.
|
|
`.MASK_LEN_FOLD_LEFT_PLUS` (or `mask_len_fold_left_plus_m`) expects the
return value to be the start value even if the length is 0.
However, the current code gen in the RISC-V backend does not meet that
semantic: it will produce a random garbage value if the length is 0.
Take the current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64 as an example:
# _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
vsetvli zero,a5,e64,m1,ta,ma
vfmv.s.f v2,fa5 # insn 1
vfredosum.vs v1,v1,v2 # insn 2
vfmv.f.s fa5,v1 # insn 3
insn 1:
- vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value.
insn 2:
- vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA.
(the v-spec says: `If vl=0, no operation is performed and the destination register
is not updated.`)
insn 3:
- vfmv.f.s will move the value from v1 even if VL=0, so this is safe.
So how do we fix that? We need two fixes:
1. insn 1: always execute with VL=1, so that we can guarantee it will
always work as expected.
2. insn 2: add a new pattern to force `vd` to use the same reg as `vs1` (start
value) for all reduction patterns, so that we can guarantee vd[0] will contain
the start value when vl=0.
For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2,
we have to add a _VL0_SAFE variant of the reductions to force `vd` to use the
same reg as `vs1` (start value).
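For reference, the semantic the backend has to match can be written as a scalar loop in C. This is a sketch of the expected behavior only, with a hypothetical function name; the trailing operand seen in the dump above is not modeled.

```c
#include <stddef.h>

/* Scalar reference for an in-order, masked, length-limited sum.
   With len == 0 the loop never runs and the start value is returned,
   which is exactly the case the VL=0 code gen got wrong.  */
double mask_len_fold_left_plus (double start, const double *v,
                                const unsigned char *mask, size_t len)
{
  double acc = start;
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      acc += v[i];
  return acc;
}
```

Calling it with len == 0 returns `start` untouched, no matter what the vector and mask contain.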
Change since V3:
- Rename _AV to _VL0_SAFE for readability.
- Use non-VL0_SAFE version if VL is const or VLMAX.
- Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX.
- Two more testcases.
gcc/ChangeLog:
PR target/118182
* config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust
argument for expand_reduction.
(*widen_reduc_plus_scal_<mode>): Ditto.
(*fold_left_widen_plus_<mode>): Ditto.
(*mask_len_fold_left_widen_plus_<mode>): Ditto.
(*cond_widen_reduc_plus_scal_<mode>): Ditto.
(*cond_len_widen_reduc_plus_scal_<mode>): Ditto.
(*cond_widen_reduc_plus_scal_<mode>): Ditto.
* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for
expand_reduction.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_and_scal_<mode>): Ditto.
(reduc_ior_scal_<mode>): Ditto.
(reduc_xor_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_smax_scal_<mode>): Ditto.
(reduc_smin_scal_<mode>): Ditto.
(reduc_fmax_scal_<mode>): Ditto.
(reduc_fmin_scal_<mode>): Ditto.
(fold_left_plus_<mode>): Ditto.
(mask_len_fold_left_plus_<mode>): Ditto.
* config/riscv/riscv-v.cc (expand_reduction): Add one more
argument for reduction code for vl0-safe.
* config/riscv/riscv-protos.h (expand_reduction): Ditto.
* config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of
reduction.
(ANY_REDUC_VL0_SAFE): New.
(ANY_WREDUC_VL0_SAFE): Ditto.
(ANY_FREDUC_VL0_SAFE): Ditto.
(ANY_FREDUC_SUM_VL0_SAFE): Ditto.
(ANY_FWREDUC_SUM_VL0_SAFE): Ditto.
(reduc_op): Add _VL0_SAFE variant of reduction.
(order): Ditto.
* config/riscv/vector.md (@pred_<reduc_op><mode>): New.
gcc/testsuite/ChangeLog:
PR target/118182
* gfortran.target/riscv/rvv/pr118182.f: New.
* gcc.target/riscv/rvv/autovec/pr118182-1.c: New.
* gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
|
|
Clearly an oversight in the generic-ooo model caught by the checking code. I
should have realized it was generic-ooo as we don't have a pipeline description
for the tenstorrent design yet, just the costing model.
The patch was extracted from the BZ which indicated Anton was the author, so I
kept that. I'm listed as co-author just in case someone wants to complain
about the testcase in the future. I didn't do any notable lifting here.
Thanks Peter and Anton!
PR target/118170
gcc/
* config/riscv/generic-ooo.md (generic_ooo_float_div_half): New
reservation.
gcc/testsuite
* gcc.target/riscv/pr118170.c: New test.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
> The BZ in question is a failure to recognize a pair of shifts as a sign
> extension.
>
> I originally thought simplify-rtx would be the right framework to
> address this problem, but fwprop is actually better. We can write the
> recognizer much simpler in that framework.
>
> fwprop already simplifies nested shifts/extensions to the desired RTL,
> but it's not considered profitable and we throw away the good work done
> by fwprop & simplifiers.
>
> It's hard to see a scenario where nested shifts or nested extensions
> that simplify down to a single sign/zero extension isn't a profitable
> transformation. So when fwprop has nested shifts/extensions that
> simplify to an extension, we consider it profitable.
>
> This allows us to simplify the testcase on rv64 with ZBB enabled from a
> pair of shifts to a single byte or half-word sign extension.
Hmm. So just to summarise something that was discussed in the PR
comments, this is a case where combine's expand_compound_operation/
make_compound_operation wrangler hurts us, because the process isn't
idempotent, and combine produces two complex instructions:
(insn 6 3 7 2 (set (reg:DI 137 [ _3 ])
(ashift:DI (reg:DI 139 [ x ])
(const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3}
(expr_list:REG_DEAD (reg:DI 139 [ x ])
(nil)))
(insn 12 7 13 2 (set (reg/i:DI 10 a0)
(sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0)
(const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend}
(expr_list:REG_DEAD (reg:DI 137 [ _3 ])
(nil)))
given two simple instructions:
(insn 6 3 7 2 (set (reg:SI 137 [ _3 ])
(sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip}
(expr_list:REG_DEAD (reg/v:DI 136 [ x ])
(nil)))
(insn 7 6 12 2 (set (reg:DI 138 [ _3 ])
(sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal}
(expr_list:REG_DEAD (reg:SI 137 [ _3 ])
(nil)))
If I run with -fdisable-rtl-combine then late_combine1 already does the
expected transformation.
Although it would be nice to fix combine, that might be difficult.
If we treat combine as immutable then the options are:
(1) Teach simplify-rtx to simplify combine's output into a single sign_extend.
(2) Allow fwprop1 to get in first, before combine has a chance to mess
things up.
The patch goes for (2).
Is that a fair summary?
Playing devil's advocate, I suppose one advantage of (1) is that it
would allow the optimisation even if the original rtl looked like
combine's output. And fwprop1 doesn't distinguish between cases in
which the source instruction disappears from cases in which the source
instruction is kept. Thus we could transform:
(set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
(set (reg:DI R3) (sign_extend:DI (reg:SI R2)))
into:
(set (reg:SI R2) (sign_extend:SI (reg:QI R1)))
(set (reg:DI R3) (sign_extend:DI (reg:QI R1)))
which increases the register pressure between the two instructions
(since R2 and R1 are both now live). In general, there could be
quite a gap between the two instructions.
On the other hand, even in that case, fwprop1 would be parallelising
the extensions. And since we're talking about unary operations,
even two-address targets would allow R1 to be extended without
tying the source and destination.
Also, it seems relatively unlikely that expand would produce code
that looks like combine's, since the gimple optimisers should have
simplified it into conversions.
So initially I was going to agree that it's worth trying in fwprop. But...
[ commentary on Jeff's original approach dropped. ]
So it seems like it's a bit of a mess 🙁
If we do try to fix combine, I think something like the attached
would fit within the current scheme. It is a pure shift-for-shift
transformation, avoiding any extensions.
Will think more about it, but wanted to get the above stream of
consciousness out before I finish for the day 🙂
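For reference, the source-level idiom behind the pair of shifts discussed
above can be sketched in C. This is only an illustration: the function names
are hypothetical, the real testcase is not reproduced here, and the sketch
relies on GCC's implementation-defined behaviour for narrowing conversions
and for right shifts of negative values.

```c
#include <stdint.h>

/* Sign-extend the low byte of X with a shift pair: shift left by 24 in a
   32-bit type, then arithmetic shift right by 24.  The left shift is done
   on an unsigned value to avoid signed-overflow undefined behaviour.  */
static int64_t
sext_byte_via_shifts (int64_t x)
{
  int32_t shifted = (int32_t) ((uint32_t) x << 24);
  return shifted >> 24;
}

/* The same result expressed as a single byte sign extension.  */
static int64_t
sext_byte_direct (int64_t x)
{
  return (int8_t) x;
}
```

Both functions return the same value for every input, which is why a single
sign_extend is the desired form for the shift pair.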
PR rtl-optimization/109592
gcc/
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify nested shifts with subregs.
gcc/testsuite
* gcc.target/riscv/pr109592.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: Adjust expected output.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
In PR118154 we emit strided stores but the first of those does not
always have the proper VTYPE. That's because we erroneously delete
a necessary vsetvl.
In order to determine whether to elide
(1)
Expr[7]: VALID (insn 116, bb 17)
Demand fields: demand_ratio_and_ge_sew demand_avl
SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(reg:DI 0 zero)
when e.g.
(2)
Expr[3]: VALID (insn 360, bb 15)
Demand fields: demand_sew_lmul demand_avl
SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
TAIL_POLICY=agnostic, MASK_POLICY=agnostic
AVL=(reg:DI 0 zero)
VL=(reg:DI 13 a3 [345])
is already available, we use
sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p.
(1) requires RATIO = SEW/LMUL = 16 and an SEW >= 8. (2) has ratio = 64,
though, so we cannot directly elide (1).
This patch uses ratio_eq_p instead of next_ratio_valid_for_prev_sew_p.
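The ratio arithmetic for (1) and (2) can be checked with a small C sketch.
The helper names and the representation of LMUL as a fraction are
assumptions made for illustration; they are not taken from riscv-vsetvl.cc,
and the real predicate checks more than the ratio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: LMUL is passed as the fraction lmul_num / lmul_den,
   so mf2 is 1/2 and m1 is 1/1.  Returns RATIO = SEW / LMUL.  */
static unsigned
sew_lmul_ratio (unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  assert (lmul_num != 0 && lmul_den != 0);
  return sew * lmul_den / lmul_num;
}

/* Sketch of the ratio part of the stricter condition: the available
   vsetvl may only stand in for the demanded one if the ratios match.  */
static bool
ratios_equal (unsigned demanded_ratio, unsigned available_ratio)
{
  return demanded_ratio == available_ratio;
}
```

With SEW=8, VLMUL=mf2 the ratio is 16 and with SEW=64, VLMUL=m1 it is 64, so
an equal-ratio check rejects the elision of (1).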
PR target/118154
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (MAX_LMUL): New define.
(pre_vsetvl::earliest_fuse_vsetvl_info): Use.
(pre_vsetvl::pre_global_vsetvl_info): New predicate with equal
ratio.
* config/riscv/riscv-vsetvl.def: Use.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.
|
|
In PR118140 we simplify
_ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);
to 1:
Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1
when _46 == 1. This happens by removing the conditional and applying
a | 1 = 1. Normally we re-introduce the conditional and its else value
if needed but that does not happen here as we're not dealing with a
vector type. For correctness's sake, we must not remove the conditional
even for non-vector types.
This patch re-introduces a COND_EXPR in such cases. For PR118140 this
results in a non-vectorized loop.
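Why the conditional must survive can be shown with a scalar model in C.
This is a sketch under assumptions: the operands are modelled as bool
(the a | 1 = 1 fold suggests a one-bit type, but the log does not state
it), and all function names are hypothetical.

```c
#include <stdbool.h>

/* Model of .COND_IOR (cond, a, b, else): a | b if cond, otherwise else.  */
static bool
cond_ior (bool cond, bool a, bool b, bool else_val)
{
  return cond ? (bool) (a | b) : else_val;
}

/* Folding a | 1 to 1 and dropping the condition loses the else value.  */
static bool
folded_without_cond (void)
{
  return true;
}

/* Folding a | 1 to 1 while keeping the condition, as a COND_EXPR would.  */
static bool
folded_with_cond (bool cond, bool else_val)
{
  return cond ? true : else_val;
}
```

When cond is false and the else value is 0, the unconditional fold yields 1
while the original operation yields 0; the conditional form agrees with the
original in both cases.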
PR middle-end/118140
gcc/ChangeLog:
* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add
COND_EXPR when we simplified to a scalar gimple value but still
have an else value.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr118140.c: New test.
* gcc.target/riscv/rvv/autovec/pr118140.c: New test.
|
|
Hi,
in PR117682 we build an interleaving pattern
{ 1, 201, 209, 25, 161, 105, 113, 185, 65, 9,
17, 89, 225, 169, 177, 249, 129, 73, 81, 153,
33, 233, 241, 57, 193, 137, 145, 217, 97, 41,
49, 121 };
with negative step expecting wraparound semantics due to -fwrapv.
For building interleaved patterns we have an optimization that
does e.g.
{1, 209, ...} = { 1, 0, 209, 0, ...}
and
{201, 25, ...} >> 8 = { 0, 201, 0, 25, ...}
and IORs those.
The optimization only works if the lowpart bits are zero. When
overflowing e.g. with a negative step we cannot guarantee this.
This patch makes us fall back to the generic merge handling for negative
steps.
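The pollution can be reproduced with a scalar C model of one 16-bit lane.
This is only a sketch: the function names are hypothetical, and the steps
-48 and 80 used in the note below are inferred from the listed pattern
modulo 256, not taken from the PR.

```c
#include <stdint.h>

/* Build one 16-bit lane holding two interleaved 8-bit elements: element
   2*i in the low byte and element 2*i+1 in the high byte.  Both series
   are evaluated in the wider mode and IORed without any masking.  */
static uint16_t
interleave_lane (int base1, int step1, int base2, int step2, unsigned i)
{
  uint16_t lo = (uint16_t) (base1 + (int) i * step1);
  uint16_t hi = (uint16_t) ((uint16_t) (base2 + (int) i * step2) << 8);
  return lo | hi;
}

/* The value the high byte should have under 8-bit wraparound.  */
static uint8_t
expected_high_byte (int base2, int step2, unsigned i)
{
  return (uint8_t) (base2 + (int) i * step2);
}
```

For base1 = 1, step1 = -48, base2 = 201, step2 = 80, lane 0 is 0xC901 as
expected. In lane 1 the first series evaluates to -47, i.e. 0xFFD1 in 16
bits, so the high byte becomes 255 instead of the expected 25.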
I'm not 100% certain we're good even for positive steps. If the
step or the vector length is large enough we'd still overflow and
have non-zero lower bits. I haven't seen this happen during my
testing, though, and the patch doesn't make things worse, so...
Regtested on rv64gcv_zvl512b. Let's see what the CI says.
Regards
Robin
PR target/117682
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Fall back to
merging if either step is negative.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr117682.c: New test.
|
|
Hi,
the zbb-rol-ror and stack_save_restore tests use the -fno-lto option and
scan the final assembly. For an invocation like -flto ... -fno-lto the
output file we scan is still something like
zbb-rol-ror-09.ltrans0.ltrans.s.
Therefore skip the tests when "-flto" is present. This gets rid
of a few UNRESOLVED tests.
Regtested on rv64gcv_zvl512b. Going to push if the CI agrees.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack_save_restore_1.c: Skip for -flto.
* gcc.target/riscv/stack_save_restore_2.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-04.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-05.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-06.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-07.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-08.c: Ditto.
* gcc.target/riscv/zbb-rol-ror-09.c: Ditto.
|
|
The test case
long
test (long x, long y)
{
return ((x | 0x1ff) << 3) + y;
}
is now compiled (-O2 -march=rv64g_zba) to
li a4,4096
slliw a5,a0,3
addi a4,a4,-8
or a5,a5,a4
addw a0,a5,a1
ret
Although this check was originally intended to make better use of zba,
removing it now actually enables the use of zba for this test case (thanks
to late combine):
ori a5,a0,511
sh3add a0,a5,a1
ret
Obviously, bitmanip.md does not cover
(any_or (ashift (reg) (imm123)) imm) at all, and even for "and" it
seems more natural to split to (ashift (and (reg) (imm')) (imm123))
first and then let late combine merge the outer ashift and the plus.
I've not found any test case regressed by the removal.
And "make check-gcc RUNTESTFLAGS=riscv.exp='zba-*.c'" also reports no
failure.
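The two instruction sequences above differ only in whether the OR happens
before or after the shift. A C sketch of the underlying identity follows.
It uses unsigned 64-bit arithmetic to avoid signed-shift pitfalls and does
not model the w-suffixed instructions in the first listing; the function
names are hypothetical.

```c
#include <stdint.h>

/* Shift first, then OR in the pre-shifted constant
   (4096 - 8 == 0x1ff << 3), then add.  */
static uint64_t
shift_then_or (uint64_t x, uint64_t y)
{
  return ((x << 3) | (UINT64_C (4096) - 8)) + y;
}

/* OR first (ori), then shift by 3 and add (sh3add).  */
static uint64_t
or_then_shift_add (uint64_t x, uint64_t y)
{
  return ((x | 0x1ff) << 3) + y;
}
```

Because (x | c) << 3 equals (x << 3) | (c << 3), both forms agree for all
inputs, and the second maps directly onto ori followed by sh3add.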
gcc/ChangeLog:
PR target/115921
* config/riscv/riscv.md (<optab>_shift_reverse): Remove
check for TARGET_ZBA.
gcc/testsuite/ChangeLog:
PR target/115921
* gcc.target/riscv/zba-shNadd-08.c: New test.
|
|
"use_max_sew" to merge vsetvl
When the vsetvl instructions of two RVV instructions are merged
using "use_max_sew", the sew of prev may be updated if
prev.sew < next.sew while the original ratio is kept, which is obviously
wrong. When subsequent instructions match this wrong ratio, a wrong
"vsetvli zero,zero" instruction may be generated, which leads to an
unknown avl.
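The inconsistency can be illustrated with a simplified C model. The struct
and function below are hypothetical stand-ins, not GCC's vsetvl_info or
demand_system; in particular, copying NEXT's ratio is an assumption about
how the ratio is brought back in sync.

```c
#include <assert.h>

/* Simplified stand-in for a vsetvl demand; ratio holds SEW / LMUL.  */
struct demand
{
  unsigned sew;
  unsigned ratio;
};

/* Merge NEXT into PREV by taking the larger SEW.  Updating the ratio
   together with the SEW keeps the pair consistent; updating only the SEW
   would leave PREV with the ratio of its old, smaller SEW.  */
static void
merge_use_max_sew (struct demand *prev, const struct demand *next)
{
  assert (prev && next);
  if (prev->sew < next->sew)
    {
      prev->sew = next->sew;
      prev->ratio = next->ratio;
    }
}
```

With prev = {sew 8, ratio 16} and next = {sew 32, ratio 32}, the merged prev
becomes {32, 32}; leaving the ratio at 16 would let a later instruction with
ratio 16 wrongly match it.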
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (demand_system::use_max_sew): Also
set the ratio for PREV.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/bug-10.c: New test.
|