path: root/gcc/testsuite/gcc.target/riscv
Age | Commit message | Author | Files | Lines
11 hours | [PR target/118906] [PATCH v2] RISC-V: Fix a typo in zce to zcf implication | Yuriy Kolerov | 4 | -0/+24
zce must imply zcf, but this rule was corrupted by the refactoring in 9e12010b5e724277ea. This can be observed after generating an .s file from any source file with the options -mriscv-attribute -march=rv32if_zce -mabi=ilp32 -S. The full march is emitted in the arch attribute:

    rv32i2p1_f2p2_zicsr2p0_zca1p0_zcb1p0_zce1p0_zcmp1p0_zcmt1p0

As you can see, zcf is missing here even though the f_zce pair is passed in -march. According to The RISC-V Instruction Set Manual:

    Specifying Zce on RV32 with F includes Zca, Zcb, Zcmp, Zcmt and Zcf.

PR target/118906

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Fix zce to zcf implication.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-zce-1.c: New test.
* gcc.target/riscv/attribute-zce-2.c: New test.
* gcc.target/riscv/attribute-zce-3.c: New test.
* gcc.target/riscv/attribute-zce-4.c: New test.
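The implication rule quoted from the manual can be sketched in C as follows. This is only an illustration: the enum and the function `zce_implied` are hypothetical names and do not reflect how riscv-common.cc represents extensions.

```c
#include <stdbool.h>

/* Hypothetical bit flags for the compressed sub-extensions; these names
   are for illustration only and are not GCC's internal representation.  */
enum c_ext
{
  EXT_ZCA = 1u << 0,
  EXT_ZCB = 1u << 1,
  EXT_ZCMP = 1u << 2,
  EXT_ZCMT = 1u << 3,
  EXT_ZCF = 1u << 4
};

/* Return the set of extensions implied by zce for a given XLEN and
   presence of the F extension.  Zcf is implied only on RV32 with F.  */
static unsigned
zce_implied (int xlen, bool has_f)
{
  unsigned implied = EXT_ZCA | EXT_ZCB | EXT_ZCMP | EXT_ZCMT;
  if (xlen == 32 && has_f)
    implied |= EXT_ZCF;
  return implied;
}
```

With this model, `zce_implied (32, true)` contains `EXT_ZCF`, which corresponds to the zcf entry that was missing from the arch attribute for -march=rv32if_zce.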
2 days | RISC-V: Fix bug for expand_const_vector interleave [PR118931] | Pan Li | 1 | -0/+19
This patch fixes a bug when expanding a const vector for the interleave case. For example, we have:

    base1 = 151
    step = 121

For vec_series, we generate the vector in the format v[i] = base + i * step. The vec_series then has the result below for HImode, and we can see that the result overflows into the highest 8 bits of HImode.

    v1.b = {151, 255, 7, 0, 119, 0, 231, 0, 87, 1, 199, 1, 55, 2, 167, 2}

Whereas we expect v1.b to be:

    v1.b = {151, 0, 7, 0, 119, 0, 231, 0, 87, 0, 199, 0, 55, 0, 167, 0}

After that, it performs the IOR with v2 for base2 (i.e. the other series).

    v2.b = {0, 17, 0, 33, 0, 49, 0, 65, 0, 81, 0, 97, 0, 113, 0, 129}

Unfortunately, base1 + i * step1 in HImode may overflow into the high 8 bits, and the high 8 bits then pollute v2 and result in incorrect values in the const_vector.

This patch performs an overflow-to-smode check before the optimized interleave code generation. On overflow or VLA, it falls back to the default merge approach.

The test suites below passed for this patch.
* The rv64gcv full regression test.

PR target/118931

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Add overflow to smode check and clean up highest bits if overflow.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118931-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
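The overflow can be reproduced with a short C sketch. Note the assumptions: the byte pattern quoted for v1.b is obtained with a base of -105 (151 read as a signed 8-bit value) and a per-element step of 112, so those are the values suggested in the usage note below rather than the step of 121 stated in the message. The helper names are hypothetical, and the actual check in riscv-v.cc may be shaped differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill OUT with the little-endian bytes of an N-element 16-bit series
   v[i] = base + i * step.  OUT must have room for 2 * N bytes.  */
static void
series_bytes (int base, int step, int n, uint8_t *out)
{
  for (int i = 0; i < n; i++)
    {
      uint16_t v = (uint16_t) (base + i * step);
      out[2 * i] = (uint8_t) (v & 0xff);   /* Low byte: the wanted element.  */
      out[2 * i + 1] = (uint8_t) (v >> 8); /* High byte: must stay zero.  */
    }
}

/* Return true if every element of the series fits in an unsigned 8-bit
   value, i.e. all high bytes are zero and a later IOR with the second
   series cannot be polluted.  */
static bool
series_fits_in_byte (int base, int step, int n)
{
  for (int i = 0; i < n; i++)
    {
      int v = base + i * step;
      if (v < 0 || v > 0xff)
        return false;
    }
  return true;
}
```

Calling `series_bytes (-105, 112, 8, buf)` yields the polluted v1.b bytes listed above, and `series_fits_in_byte (-105, 112, 8)` returns false, which is the situation in which a fallback to the merge approach would be needed.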
6 days | vect: Use original LHS type for gather pattern [PR118950]. | Robin Dapp | 1 | -0/+29
In PR118950 we do not zero masked elements in a gather load. While recognizing a gather/scatter pattern we do not use the original type of the LHS. This matters because the type can differ with bool patterns (e.g. _Bool vs unsigned char) and we don't notice the need for zeroing out the padding bytes.

This patch just uses the original LHS's type.

PR middle-end/118950

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Use original LHS's type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr118950.c: New test.
6 days | RISC-V: Fix .cfi_offset directive when push/pop in zcmp | Lino Hsing-Yu Peng | 1 | -0/+12
The incorrect cfi directive info breaks stack unwind in try/catch/cxa.

Before patch:

    cm.push {ra, s0-s2}, -16
    .cfi_offset 1, -12
    .cfi_offset 8, -8
    .cfi_offset 18, -4

After patch:

    cm.push {ra, s0-s2}, -16
    .cfi_offset 1, -16
    .cfi_offset 8, -12
    .cfi_offset 9, -8
    .cfi_offset 18, -4

gcc/ChangeLog:

* config/riscv/riscv.cc: Set multi push regs bits.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zcmp_push_gpr.c: New test.
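The corrected offsets follow a simple pattern: the last register in the list is stored just below the CFA and each earlier one sits one XLEN-sized slot lower. A minimal C sketch of that arithmetic, with the hypothetical helper name `push_cfi_offsets` (this is not the code in riscv.cc):

```c
/* Compute the CFA-relative offset of each register saved by a
   cm.push-style instruction.  Index 0 is the first register in the
   list (ra), index N_REGS - 1 the last one; XLEN_BYTES is 4 on RV32.
   The last register is stored directly below the CFA.  */
static void
push_cfi_offsets (int n_regs, int xlen_bytes, int *offsets)
{
  for (int i = 0; i < n_regs; i++)
    offsets[i] = -(n_regs - i) * xlen_bytes;
}
```

For the four registers of `cm.push {ra, s0-s2}, -16` on RV32 (DWARF numbers 1, 8, 9 and 18) this gives -16, -12, -8 and -4, matching the "After patch" listing. The "Before patch" listing had skipped register 9 (s1), so the offsets for ra and s0 were shifted.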
7 days | Turn test cases into UNSUPPORTED if running into 'sorry, unimplemented: ↵ | Thomas Schwinge | 10 | -10/+0
dynamic stack allocation not supported' In Subversion r217296 (Git commit e2acc079ff125a869159be45371dc0a29b230e92) "Testsuite alloca fixes for ptx", effective-target 'alloca' was added to mark up test cases that run into the nvptx back end's non-support of dynamic stack allocation. (Later, nvptx gained conditional support for that in commit 3861d362ec7e3c50742fc43833fe9d8674f4070e "nvptx: PTX 'alloca' for '-mptx=7.3'+, '-march=sm_52'+ [PR65181]", but on the other hand, in commit f93a612fc4567652b75ffc916d31a446378e6613 "bpf: liberate R9 for general register allocation", the BPF back end joined "the list of targets that do not support alloca in target-support.exp". Manually maintaining the list of test cases requiring effective-target 'alloca' is notoriously hard, gets out of date quickly: new test cases added to the test suite may need to be analyzed and annotated, and over time annotations also may need to be removed, in cases where the compiler learns to optimize out 'alloca'/VLA usage, for example. This commit replaces (99 % of) the manual annotations with an automatic scheme: turn test cases into UNSUPPORTED if running into 'sorry, unimplemented: dynamic stack allocation not supported'. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_alloca): Gracefully handle the case that we've not be called (indirectly) from 'dg-test'. * lib/gcc-dg.exp (proc gcc-dg-prune): Turn 'sorry, unimplemented: dynamic stack allocation not supported' into UNSUPPORTED. * c-c++-common/Walloca-larger-than.c: Don't 'dg-require-effective-target alloca'. * c-c++-common/Warray-bounds-9.c: Likewise. * c-c++-common/Warray-bounds.c: Likewise. * c-c++-common/Wdangling-pointer-2.c: Likewise. * c-c++-common/Wdangling-pointer-4.c: Likewise. * c-c++-common/Wdangling-pointer-5.c: Likewise. * c-c++-common/Wdangling-pointer.c: Likewise. * c-c++-common/Wimplicit-fallthrough-7.c: Likewise. * c-c++-common/Wsizeof-pointer-memaccess1.c: Likewise. 
* c-c++-common/Wsizeof-pointer-memaccess2.c: Likewise. * c-c++-common/Wstringop-truncation.c: Likewise. * c-c++-common/Wunused-var-6.c: Likewise. * c-c++-common/Wunused-var-8.c: Likewise. * c-c++-common/analyzer/alloca-leak.c: Likewise. * c-c++-common/analyzer/allocation-size-multiline-2.c: Likewise. * c-c++-common/analyzer/allocation-size-multiline-3.c: Likewise. * c-c++-common/analyzer/capacity-1.c: Likewise. * c-c++-common/analyzer/capacity-3.c: Likewise. * c-c++-common/analyzer/imprecise-floating-point-1.c: Likewise. * c-c++-common/analyzer/infinite-recursion-alloca.c: Likewise. * c-c++-common/analyzer/malloc-callbacks.c: Likewise. * c-c++-common/analyzer/malloc-paths-8.c: Likewise. * c-c++-common/analyzer/out-of-bounds-5.c: Likewise. * c-c++-common/analyzer/out-of-bounds-diagram-11.c: Likewise. * c-c++-common/analyzer/uninit-alloca.c: Likewise. * c-c++-common/analyzer/write-to-string-literal-5.c: Likewise. * c-c++-common/asan/alloca_loop_unpoisoning.c: Likewise. * c-c++-common/auto-init-11.c: Likewise. * c-c++-common/auto-init-12.c: Likewise. * c-c++-common/auto-init-15.c: Likewise. * c-c++-common/auto-init-16.c: Likewise. * c-c++-common/builtins.c: Likewise. * c-c++-common/dwarf2/vla1.c: Likewise. * c-c++-common/gomp/pr61486-2.c: Likewise. * c-c++-common/torture/builtin-clear-padding-4.c: Likewise. * c-c++-common/torture/strub-run3.c: Likewise. * c-c++-common/torture/strub-run4.c: Likewise. * c-c++-common/torture/strub-run4c.c: Likewise. * c-c++-common/torture/strub-run4d.c: Likewise. * c-c++-common/torture/strub-run4i.c: Likewise. * g++.dg/Walloca1.C: Likewise. * g++.dg/Walloca2.C: Likewise. * g++.dg/cpp0x/pr70338.C: Likewise. * g++.dg/cpp1y/lambda-generic-vla1.C: Likewise. * g++.dg/cpp1y/vla10.C: Likewise. * g++.dg/cpp1y/vla2.C: Likewise. * g++.dg/cpp1y/vla6.C: Likewise. * g++.dg/cpp1y/vla8.C: Likewise. * g++.dg/debug/debug5.C: Likewise. * g++.dg/debug/debug6.C: Likewise. * g++.dg/debug/pr54828.C: Likewise. * g++.dg/diagnostic/pr70105.C: Likewise. 
* g++.dg/eh/cleanup5.C: Likewise. * g++.dg/eh/spbp.C: Likewise. * g++.dg/ext/builtin_alloca.C: Likewise. * g++.dg/ext/tmplattr9.C: Likewise. * g++.dg/ext/vla10.C: Likewise. * g++.dg/ext/vla11.C: Likewise. * g++.dg/ext/vla12.C: Likewise. * g++.dg/ext/vla15.C: Likewise. * g++.dg/ext/vla16.C: Likewise. * g++.dg/ext/vla17.C: Likewise. * g++.dg/ext/vla23.C: Likewise. * g++.dg/ext/vla3.C: Likewise. * g++.dg/ext/vla6.C: Likewise. * g++.dg/ext/vla7.C: Likewise. * g++.dg/init/array24.C: Likewise. * g++.dg/init/new47.C: Likewise. * g++.dg/init/pr55497.C: Likewise. * g++.dg/opt/pr78201.C: Likewise. * g++.dg/template/vla2.C: Likewise. * g++.dg/torture/Wsizeof-pointer-memaccess1.C: Likewise. * g++.dg/torture/Wsizeof-pointer-memaccess2.C: Likewise. * g++.dg/torture/pr62127.C: Likewise. * g++.dg/torture/pr67055.C: Likewise. * g++.dg/torture/stackalign/eh-alloca-1.C: Likewise. * g++.dg/torture/stackalign/eh-inline-2.C: Likewise. * g++.dg/torture/stackalign/eh-vararg-1.C: Likewise. * g++.dg/torture/stackalign/eh-vararg-2.C: Likewise. * g++.dg/warn/Wplacement-new-size-5.C: Likewise. * g++.dg/warn/Wsizeof-pointer-memaccess-1.C: Likewise. * g++.dg/warn/Wvla-1.C: Likewise. * g++.dg/warn/Wvla-3.C: Likewise. * g++.old-deja/g++.ext/array2.C: Likewise. * g++.old-deja/g++.ext/constructor.C: Likewise. * g++.old-deja/g++.law/builtin1.C: Likewise. * g++.old-deja/g++.other/crash12.C: Likewise. * g++.old-deja/g++.other/eh3.C: Likewise. * g++.old-deja/g++.pt/array6.C: Likewise. * g++.old-deja/g++.pt/dynarray.C: Likewise. * gcc.c-torture/compile/20000923-1.c: Likewise. * gcc.c-torture/compile/20030224-1.c: Likewise. * gcc.c-torture/compile/20071108-1.c: Likewise. * gcc.c-torture/compile/20071117-1.c: Likewise. * gcc.c-torture/compile/900313-1.c: Likewise. * gcc.c-torture/compile/parms.c: Likewise. * gcc.c-torture/compile/pr17397.c: Likewise. * gcc.c-torture/compile/pr35006.c: Likewise. * gcc.c-torture/compile/pr42956.c: Likewise. * gcc.c-torture/compile/pr51354.c: Likewise. 
* gcc.c-torture/compile/pr52714.c: Likewise. * gcc.c-torture/compile/pr55851.c: Likewise. * gcc.c-torture/compile/pr77754-1.c: Likewise. * gcc.c-torture/compile/pr77754-2.c: Likewise. * gcc.c-torture/compile/pr77754-3.c: Likewise. * gcc.c-torture/compile/pr77754-4.c: Likewise. * gcc.c-torture/compile/pr77754-5.c: Likewise. * gcc.c-torture/compile/pr77754-6.c: Likewise. * gcc.c-torture/compile/pr78439.c: Likewise. * gcc.c-torture/compile/pr79413.c: Likewise. * gcc.c-torture/compile/pr82564.c: Likewise. * gcc.c-torture/compile/pr87110.c: Likewise. * gcc.c-torture/compile/pr99787-1.c: Likewise. * gcc.c-torture/compile/vla-const-1.c: Likewise. * gcc.c-torture/compile/vla-const-2.c: Likewise. * gcc.c-torture/execute/20010209-1.c: Likewise. * gcc.c-torture/execute/20020314-1.c: Likewise. * gcc.c-torture/execute/20020412-1.c: Likewise. * gcc.c-torture/execute/20021113-1.c: Likewise. * gcc.c-torture/execute/20040223-1.c: Likewise. * gcc.c-torture/execute/20040308-1.c: Likewise. * gcc.c-torture/execute/20040811-1.c: Likewise. * gcc.c-torture/execute/20070824-1.c: Likewise. * gcc.c-torture/execute/20070919-1.c: Likewise. * gcc.c-torture/execute/built-in-setjmp.c: Likewise. * gcc.c-torture/execute/pr22061-1.c: Likewise. * gcc.c-torture/execute/pr43220.c: Likewise. * gcc.c-torture/execute/pr82210.c: Likewise. * gcc.c-torture/execute/pr86528.c: Likewise. * gcc.c-torture/execute/vla-dealloc-1.c: Likewise. * gcc.dg/20001012-2.c: Likewise. * gcc.dg/20020415-1.c: Likewise. * gcc.dg/20030331-2.c: Likewise. * gcc.dg/20101010-1.c: Likewise. * gcc.dg/Walloca-1.c: Likewise. * gcc.dg/Walloca-10.c: Likewise. * gcc.dg/Walloca-11.c: Likewise. * gcc.dg/Walloca-12.c: Likewise. * gcc.dg/Walloca-13.c: Likewise. * gcc.dg/Walloca-14.c: Likewise. * gcc.dg/Walloca-15.c: Likewise. * gcc.dg/Walloca-2.c: Likewise. * gcc.dg/Walloca-3.c: Likewise. * gcc.dg/Walloca-4.c: Likewise. * gcc.dg/Walloca-5.c: Likewise. * gcc.dg/Walloca-6.c: Likewise. * gcc.dg/Walloca-7.c: Likewise. 
* gcc.dg/Walloca-8.c: Likewise. * gcc.dg/Walloca-9.c: Likewise. * gcc.dg/Walloca-larger-than-2.c: Likewise. * gcc.dg/Walloca-larger-than-3.c: Likewise. * gcc.dg/Walloca-larger-than-4.c: Likewise. * gcc.dg/Walloca-larger-than.c: Likewise. * gcc.dg/Warray-bounds-22.c: Likewise. * gcc.dg/Warray-bounds-41.c: Likewise. * gcc.dg/Warray-bounds-46.c: Likewise. * gcc.dg/Warray-bounds-48-novec.c: Likewise. * gcc.dg/Warray-bounds-48.c: Likewise. * gcc.dg/Warray-bounds-50.c: Likewise. * gcc.dg/Warray-bounds-63.c: Likewise. * gcc.dg/Warray-bounds-66.c: Likewise. * gcc.dg/Wdangling-pointer.c: Likewise. * gcc.dg/Wfree-nonheap-object-2.c: Likewise. * gcc.dg/Wfree-nonheap-object.c: Likewise. * gcc.dg/Wrestrict-17.c: Likewise. * gcc.dg/Wrestrict.c: Likewise. * gcc.dg/Wreturn-local-addr-2.c: Likewise. * gcc.dg/Wreturn-local-addr-3.c: Likewise. * gcc.dg/Wreturn-local-addr-4.c: Likewise. * gcc.dg/Wreturn-local-addr-6.c: Likewise. * gcc.dg/Wsizeof-pointer-memaccess1.c: Likewise. * gcc.dg/Wstack-usage.c: Likewise. * gcc.dg/Wstrict-aliasing-bogus-vla-1.c: Likewise. * gcc.dg/Wstrict-overflow-27.c: Likewise. * gcc.dg/Wstringop-overflow-15.c: Likewise. * gcc.dg/Wstringop-overflow-23.c: Likewise. * gcc.dg/Wstringop-overflow-25.c: Likewise. * gcc.dg/Wstringop-overflow-27.c: Likewise. * gcc.dg/Wstringop-overflow-3.c: Likewise. * gcc.dg/Wstringop-overflow-39.c: Likewise. * gcc.dg/Wstringop-overflow-56.c: Likewise. * gcc.dg/Wstringop-overflow-57.c: Likewise. * gcc.dg/Wstringop-overflow-67.c: Likewise. * gcc.dg/Wstringop-overflow-71.c: Likewise. * gcc.dg/Wstringop-truncation-3.c: Likewise. * gcc.dg/Wvla-larger-than-1.c: Likewise. * gcc.dg/Wvla-larger-than-2.c: Likewise. * gcc.dg/Wvla-larger-than-3.c: Likewise. * gcc.dg/Wvla-larger-than-4.c: Likewise. * gcc.dg/Wvla-larger-than-5.c: Likewise. * gcc.dg/analyzer/boxed-malloc-1.c: Likewise. * gcc.dg/analyzer/call-summaries-2.c: Likewise. * gcc.dg/analyzer/malloc-1.c: Likewise. * gcc.dg/analyzer/malloc-reuse.c: Likewise. 
* gcc.dg/analyzer/out-of-bounds-diagram-12.c: Likewise. * gcc.dg/analyzer/pr93355-localealias.c: Likewise. * gcc.dg/analyzer/putenv-1.c: Likewise. * gcc.dg/analyzer/taint-alloc-1.c: Likewise. * gcc.dg/analyzer/torture/pr93373.c: Likewise. * gcc.dg/analyzer/torture/ubsan-1.c: Likewise. * gcc.dg/analyzer/vla-1.c: Likewise. * gcc.dg/atomic/stdatomic-vm.c: Likewise. * gcc.dg/attr-alloc_size-6.c: Likewise. * gcc.dg/attr-alloc_size-7.c: Likewise. * gcc.dg/attr-alloc_size-8.c: Likewise. * gcc.dg/attr-alloc_size-9.c: Likewise. * gcc.dg/attr-noipa.c: Likewise. * gcc.dg/auto-init-uninit-36.c: Likewise. * gcc.dg/auto-init-uninit-9.c: Likewise. * gcc.dg/auto-type-1.c: Likewise. * gcc.dg/builtin-alloc-size.c: Likewise. * gcc.dg/builtin-dynamic-alloc-size.c: Likewise. * gcc.dg/builtin-dynamic-object-size-1.c: Likewise. * gcc.dg/builtin-dynamic-object-size-2.c: Likewise. * gcc.dg/builtin-dynamic-object-size-3.c: Likewise. * gcc.dg/builtin-dynamic-object-size-4.c: Likewise. * gcc.dg/builtin-object-size-1.c: Likewise. * gcc.dg/builtin-object-size-2.c: Likewise. * gcc.dg/builtin-object-size-3.c: Likewise. * gcc.dg/builtin-object-size-4.c: Likewise. * gcc.dg/builtins-64.c: Likewise. * gcc.dg/builtins-68.c: Likewise. * gcc.dg/c23-auto-2.c: Likewise. * gcc.dg/c99-const-expr-13.c: Likewise. * gcc.dg/c99-vla-1.c: Likewise. * gcc.dg/fold-alloca-1.c: Likewise. * gcc.dg/gomp/pr30494.c: Likewise. * gcc.dg/gomp/vla-2.c: Likewise. * gcc.dg/gomp/vla-3.c: Likewise. * gcc.dg/gomp/vla-4.c: Likewise. * gcc.dg/gomp/vla-5.c: Likewise. * gcc.dg/graphite/pr99085.c: Likewise. * gcc.dg/guality/guality.c: Likewise. * gcc.dg/lto/pr80778_0.c: Likewise. * gcc.dg/nested-func-10.c: Likewise. * gcc.dg/nested-func-12.c: Likewise. * gcc.dg/nested-func-13.c: Likewise. * gcc.dg/nested-func-14.c: Likewise. * gcc.dg/nested-func-15.c: Likewise. * gcc.dg/nested-func-16.c: Likewise. * gcc.dg/nested-func-17.c: Likewise. * gcc.dg/nested-func-9.c: Likewise. * gcc.dg/packed-vla.c: Likewise. * gcc.dg/pr100225.c: Likewise. 
* gcc.dg/pr25682.c: Likewise. * gcc.dg/pr27301.c: Likewise. * gcc.dg/pr31507-1.c: Likewise. * gcc.dg/pr33238.c: Likewise. * gcc.dg/pr41470.c: Likewise. * gcc.dg/pr49120.c: Likewise. * gcc.dg/pr50764.c: Likewise. * gcc.dg/pr51491-2.c: Likewise. * gcc.dg/pr51990-2.c: Likewise. * gcc.dg/pr51990.c: Likewise. * gcc.dg/pr59011.c: Likewise. * gcc.dg/pr59523.c: Likewise. * gcc.dg/pr61561.c: Likewise. * gcc.dg/pr78468.c: Likewise. * gcc.dg/pr78902.c: Likewise. * gcc.dg/pr79972.c: Likewise. * gcc.dg/pr82875.c: Likewise. * gcc.dg/pr83844.c: Likewise. * gcc.dg/pr84131.c: Likewise. * gcc.dg/pr87099.c: Likewise. * gcc.dg/pr87320.c: Likewise. * gcc.dg/pr89045.c: Likewise. * gcc.dg/pr91014.c: Likewise. * gcc.dg/pr93986.c: Likewise. * gcc.dg/pr98721-1.c: Likewise. * gcc.dg/pr99122-2.c: Likewise. * gcc.dg/shrink-wrap-alloca.c: Likewise. * gcc.dg/sso-14.c: Likewise. * gcc.dg/strlenopt-62.c: Likewise. * gcc.dg/strlenopt-83.c: Likewise. * gcc.dg/strlenopt-84.c: Likewise. * gcc.dg/strlenopt-91.c: Likewise. * gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Likewise. * gcc.dg/torture/calleesave-sse.c: Likewise. * gcc.dg/torture/pr48953.c: Likewise. * gcc.dg/torture/pr71881.c: Likewise. * gcc.dg/torture/pr71901.c: Likewise. * gcc.dg/torture/pr78742.c: Likewise. * gcc.dg/torture/pr92088-1.c: Likewise. * gcc.dg/torture/pr92088-2.c: Likewise. * gcc.dg/torture/pr93124.c: Likewise. * gcc.dg/torture/pr94479.c: Likewise. * gcc.dg/torture/stackalign/alloca-1.c: Likewise. * gcc.dg/torture/stackalign/inline-2.c: Likewise. * gcc.dg/torture/stackalign/nested-3.c: Likewise. * gcc.dg/torture/stackalign/vararg-1.c: Likewise. * gcc.dg/torture/stackalign/vararg-2.c: Likewise. * gcc.dg/tree-ssa/20030807-2.c: Likewise. * gcc.dg/tree-ssa/20080530.c: Likewise. * gcc.dg/tree-ssa/alias-37.c: Likewise. * gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: Likewise. * gcc.dg/tree-ssa/builtin-sprintf-warn-25.c: Likewise. * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-15.c: Likewise. 
* gcc.dg/tree-ssa/pr23848-1.c: Likewise. * gcc.dg/tree-ssa/pr23848-2.c: Likewise. * gcc.dg/tree-ssa/pr23848-3.c: Likewise. * gcc.dg/tree-ssa/pr23848-4.c: Likewise. * gcc.dg/uninit-32.c: Likewise. * gcc.dg/uninit-36.c: Likewise. * gcc.dg/uninit-39.c: Likewise. * gcc.dg/uninit-41.c: Likewise. * gcc.dg/uninit-9-O0.c: Likewise. * gcc.dg/uninit-9.c: Likewise. * gcc.dg/uninit-pr100250.c: Likewise. * gcc.dg/uninit-pr101300.c: Likewise. * gcc.dg/uninit-pr101494.c: Likewise. * gcc.dg/uninit-pr98583.c: Likewise. * gcc.dg/vla-2.c: Likewise. * gcc.dg/vla-22.c: Likewise. * gcc.dg/vla-24.c: Likewise. * gcc.dg/vla-3.c: Likewise. * gcc.dg/vla-4.c: Likewise. * gcc.dg/vla-stexp-1.c: Likewise. * gcc.dg/vla-stexp-2.c: Likewise. * gcc.dg/vla-stexp-4.c: Likewise. * gcc.dg/vla-stexp-5.c: Likewise. * gcc.dg/winline-7.c: Likewise. * gcc.target/aarch64/stack-check-alloca-1.c: Likewise. * gcc.target/aarch64/stack-check-alloca-10.c: Likewise. * gcc.target/aarch64/stack-check-alloca-2.c: Likewise. * gcc.target/aarch64/stack-check-alloca-3.c: Likewise. * gcc.target/aarch64/stack-check-alloca-4.c: Likewise. * gcc.target/aarch64/stack-check-alloca-5.c: Likewise. * gcc.target/aarch64/stack-check-alloca-6.c: Likewise. * gcc.target/aarch64/stack-check-alloca-7.c: Likewise. * gcc.target/aarch64/stack-check-alloca-8.c: Likewise. * gcc.target/aarch64/stack-check-alloca-9.c: Likewise. * gcc.target/arc/interrupt-6.c: Likewise. * gcc.target/i386/pr80969-3.c: Likewise. * gcc.target/loongarch/stack-check-alloca-1.c: Likewise. * gcc.target/loongarch/stack-check-alloca-2.c: Likewise. * gcc.target/loongarch/stack-check-alloca-3.c: Likewise. * gcc.target/loongarch/stack-check-alloca-4.c: Likewise. * gcc.target/loongarch/stack-check-alloca-5.c: Likewise. * gcc.target/loongarch/stack-check-alloca-6.c: Likewise. * gcc.target/riscv/stack-check-alloca-1.c: Likewise. * gcc.target/riscv/stack-check-alloca-10.c: Likewise. * gcc.target/riscv/stack-check-alloca-2.c: Likewise. 
* gcc.target/riscv/stack-check-alloca-3.c: Likewise. * gcc.target/riscv/stack-check-alloca-4.c: Likewise. * gcc.target/riscv/stack-check-alloca-5.c: Likewise. * gcc.target/riscv/stack-check-alloca-6.c: Likewise. * gcc.target/riscv/stack-check-alloca-7.c: Likewise. * gcc.target/riscv/stack-check-alloca-8.c: Likewise. * gcc.target/riscv/stack-check-alloca-9.c: Likewise. * gcc.target/sparc/setjmp-1.c: Likewise. * gcc.target/x86_64/abi/ms-sysv/ms-sysv.c: Likewise. * gcc.c-torture/compile/20001221-1.c: Don't 'dg-skip-if' for '! alloca'. * gcc.c-torture/compile/20020807-1.c: Likewise. * gcc.c-torture/compile/20050801-2.c: Likewise. * gcc.c-torture/compile/920428-4.c: Likewise. * gcc.c-torture/compile/debugvlafunction-1.c: Likewise. * gcc.c-torture/compile/pr41469.c: Likewise. * gcc.c-torture/execute/920721-2.c: Likewise. * gcc.c-torture/execute/920929-1.c: Likewise. * gcc.c-torture/execute/921017-1.c: Likewise. * gcc.c-torture/execute/941202-1.c: Likewise. * gcc.c-torture/execute/align-nest.c: Likewise. * gcc.c-torture/execute/alloca-1.c: Likewise. * gcc.c-torture/execute/pr22061-4.c: Likewise. * gcc.c-torture/execute/pr36321.c: Likewise. * gcc.dg/torture/pr8081.c: Likewise. * gcc.dg/analyzer/data-model-1.c: Don't 'dg-require-effective-target alloca'. XFAIL relevant 'dg-warning's for '! alloca'. * gcc.dg/uninit-38.c: Likewise. * gcc.dg/uninit-pr98578.c: Likewise. * gcc.dg/compat/struct-by-value-22_main.c: Comment on 'dg-require-effective-target alloca'. libstdc++-v3/ * testsuite/lib/prune.exp (proc libstdc++-dg-prune): Turn 'sorry, unimplemented: dynamic stack allocation not supported' into UNSUPPORTED.
11 days | Vect: Fix ICE when vect_verify_loop_lens acts on relevant mode [PR116351] | Pan Li | 3 | -0/+28
This patch fixes an ICE similar to the one below. Assume we have the sample code:

    int a, b, c;
    short d, e, f;
    long g (long h) { return h; }

    void i () {
      for (; b; ++b) {
        f = 5 >> a ? d : d << a;
        e &= c | g(f);
      }
    }

It will ICE when compiled with -O3 -march=rv64gc_zve64f -mrvv-vector-bits=zvl

    during GIMPLE pass: vect
    pr116351-1.c: In function ‘i’:
    pr116351-1.c:8:6: internal compiler error: in get_len_load_store_mode, at optabs-tree.cc:655
        8 | void i () {
          |      ^
    0x44d6b9d internal_error(char const*, ...)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
    0x44a26a6 fancy_abort(char const*, int, char const*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
    0x19e4309 get_len_load_store_mode(machine_mode, bool, internal_fn*, vec<int, va_heap, vl_ptr>*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/optabs-tree.cc:655
    0x1fada40 vect_verify_loop_lens
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:1566
    0x1fb2b07 vect_analyze_loop_2
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3037
    0x1fb4302 vect_analyze_loop_1
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3478
    0x1fb4e9a vect_analyze_loop(loop*, gimple*, vec_info_shared*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vect-loop.cc:3638
    0x203c2dc try_vectorize_loop_1
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1095
    0x203c839 try_vectorize_loop
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/tree-vectorizer.cc:1212
    0x203cb2c execute

During vectorization the override_widen pattern matched and then will get DImode as vector_mode in loop_info.
After that the loop_vinfo will step into vect_analyze_xx with the flow below:

    vect_analyze_loop_2
    |- vect_pattern_recog // over-widening and set loop_vinfo->vector_mode to DImode
    |- ...
    |- vect_analyze_loop_operations
    |- stmt_info->def_type == vect_reduction_def
    |- stmt_info->slp_type == pure_slp
    |- vectorizable_lc_phi // Not Hit
    |- vectorizable_induction // Not Hit
    |- vectorizable_reduction // Not Hit
    |- vectorizable_recurr // Not Hit
    |- vectorizable_live_operation // Not Hit
    |- vect_analyze_stmt
    |- stmt_info->relevant == vect_unused_in_scope
    |- stmt_info->live == false
    |- p pattern_stmt_info == (stmt_vec_info) 0x0
    |- return opt_result::success ();
    OR
    |- PURE_SLP_STMT (stmt_info) && !node then dump "handled only by SLP analysis\n"
    |- Early return opt_result::success ();
    |- vectorizable_load/store/call_convert/... // Not Hit
    |- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P && !LOOP_VINFO_MASKS(loop_vinfo).is_empty ()
    |- vect_verify_loop_lens (loop_vinfo)
    |- assert (VECTOR_MODE_P (loop_vinfo->vector_mode)); // Hit assert result in ICE

Finally, the DImode in loop_vinfo hits the assert (VECTOR_MODE_P (mode)) in vect_verify_loop_lens. This patch returns false directly if the loop_vinfo has a relevant mode like DImode, which fixes the ICE, but similar cases may still be mis-optimized. We will try to cover that in separate patches.

The test suites below passed for this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

PR middle-end/116351

gcc/ChangeLog:

* tree-vect-loop.cc (vect_verify_loop_lens): Return false if the loop_vinfo has relevant mode such as DImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr116351-1.c: New test.
* gcc.target/riscv/rvv/base/pr116351-2.c: New test.
* gcc.target/riscv/rvv/base/pr116351.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
11 days | RISC-V: Fix ratio in vsetvl fuse rule [PR115703]. | Robin Dapp | 2 | -0/+82
In PR115703 we fuse two vsetvls:

    Fuse curr info since prev info compatible with it:
      prev_info: VALID (insn 438, bb 2)
        Demand fields: demand_ge_sew demand_non_zero_avl
        SEW=32, VLMUL=m1, RATIO=32, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(reg:DI 0 zero)
        VL=(reg:DI 9 s1 [312])
      curr_info: VALID (insn 92, bb 20)
        Demand fields: demand_ratio_and_ge_sew demand_avl
        SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 4 [0x4])
        VL=(nil)
      prev_info after fused: VALID (insn 438, bb 2)
        Demand fields: demand_ratio_and_ge_sew demand_avl
        SEW=64, VLMUL=mf2, RATIO=64, MAX_SEW=64
        TAIL_POLICY=agnostic, MASK_POLICY=agnostic
        AVL=(const_int 4 [0x4])
        VL=(nil).

The result is vsetvl zero, zero, e64, mf2, ta, ma. The previous vsetvl set vl = 4 but here we wrongly set it to vl = 2. As all the following vsetvls only ever change the ratio we never recover.

The issue is quite difficult to trigger because we can often deduce the value of d at runtime. Then every check for the value of d is optimized away.

The last known bad commit is r15-3458-g5326306e7d9d36. With that commit the output is wrong but -fno-schedule-insns makes it correct. From the next commit on the issue is latent. I still added the PR's test as scan and run checks even if they don't trigger right now. Not sure if the run test will ever fail but well. I verified that the patch fixes the issue when applied on top of r15-3458-g5326306e7d9d36.

PR target/115703

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc: Use max_sew for calculating the new LMUL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr115703-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr115703.c: New test.
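The arithmetic behind the wrong vl can be sketched in C. The relations ratio = SEW / LMUL and VLMAX = VLEN * LMUL / SEW come from the RISC-V vector specification; the helper names are hypothetical, and VLEN = 256 is an assumption that is not stated in the message (it is the value for which VLMAX is 4 at e64, m1). One plausible reading of the dump is that mf2 results from combining the requested ratio of 64 with the previous SEW of 32, whereas using the maximum SEW of 64 gives m1.

```c
/* LMUL is kept in eighths so that fractional values stay integral:
   8 = m1, 4 = mf2, 2 = mf4, 16 = m2, and so on.  */
static int
lmul_x8_from_ratio (int sew, int ratio)
{
  /* ratio = SEW / LMUL, hence LMUL = SEW / ratio.  */
  return 8 * sew / ratio;
}

/* VLMAX = VLEN * LMUL / SEW, with LMUL given in eighths.  */
static int
vlmax (int vlen, int sew, int lmul_x8)
{
  return vlen * lmul_x8 / (8 * sew);
}
```

Under these assumptions, `lmul_x8_from_ratio (32, 64)` is 4 (mf2) and `vlmax (256, 64, 4)` is 2, while `lmul_x8_from_ratio (64, 64)` is 8 (m1) and `vlmax (256, 64, 8)` is 4, mirroring the vl = 2 versus vl = 4 discrepancy described above.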
12 days | RISC-V: Fix failed tests for regression due to fix ICE patch | Jin Ma | 15 | -0/+15
Ref:
https://github.com/ewlu/gcc-precommit-ci/issues/3096#issue-2854419069

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-9.c: Added new failure check.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-17.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-18.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-19.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-20.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-21.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-22.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-23.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-24.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-25.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-26.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-27.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-28.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-29.c: Likewise.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c: Likewise.
12 days | RISC-V: Fix ICE for target attributes has different xlen size | Pan Li | 2 | -0/+24
This patch avoids the ICE when the target attribute specifies an xlen different from the one on the command line, i.e. compiling with rv64gc but with a target attribute of rv32gcv_zbb. For example:

    long foo (long a, long b)
      __attribute__((target("arch=rv32gcv_zbb")));

    long foo (long a, long b)
    {
      return a + (b * 2);
    }

When compiled with rv64gc -O3, it will ICE similar to the below:

    during RTL pass: fwprop1
    test.c: In function ‘foo’:
    test.c:10:1: internal compiler error: in add_use, at rtl-ssa/accesses.cc:1234
       10 | }
          | ^
    0x44d6b9d internal_error(char const*, ...)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic-global-context.cc:517
    0x44a26a6 fancy_abort(char const*, int, char const*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/diagnostic.cc:1722
    0x408fac9 rtl_ssa::function_info::add_use(rtl_ssa::use_info*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/accesses.cc:1234
    0x40a5eea rtl_ssa::function_info::create_reg_use(rtl_ssa::function_info::build_info&, rtl_ssa::insn_info*, rtl_ssa::resource_info)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/insns.cc:496
    0x4456738 rtl_ssa::function_info::add_artificial_accesses(rtl_ssa::function_info::build_info&, df_ref_flags)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:900
    0x4457297 rtl_ssa::function_info::start_block(rtl_ssa::function_info::build_info&, rtl_ssa::bb_info*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1082
    0x4453627 rtl_ssa::function_info::bb_walker::before_dom_children(basic_block_def*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:118
    0x3e9f3fb dom_walker::walk(basic_block_def*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/domwalk.cc:311
    0x445806f rtl_ssa::function_info::process_all_blocks()
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/blocks.cc:1298
    0x40a22d3 rtl_ssa::function_info::function_info(function*)
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/rtl-ssa/functions.cc:51
    0x3ec3f80 fwprop_init
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:893
    0x3ec420d fwprop
            /home/pli/gcc/111/riscv-gnu-toolchain/gcc/__RISC-V_BUILD__/../gcc/fwprop.cc:963
    0x3ec43ad execute

Considering that we are in stage 4, we just report an error for the above scenario when the command-line xlen is detected to differ from the target attribute during the target hook TARGET_OPTION_VALID_ATTRIBUTE_P implementation.

PR target/118540

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch): Report error when cmd xlen is different with target attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr118540-1.c: New test.
* gcc.target/riscv/rvv/base/pr118540-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
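The kind of check described here, rejecting an attribute whose xlen differs from the command line, might look roughly like the C sketch below. The helpers `arch_xlen` and `attr_xlen_compatible` are hypothetical and only compare the rv32/rv64 prefix of the two arch strings; the real parser in riscv-target-attr.cc works on parsed subset lists rather than raw strings.

```c
#include <string.h>

/* Return the XLEN encoded at the start of an arch string such as
   "rv32gcv_zbb" or "rv64gc", or 0 if the prefix is not recognised.  */
static int
arch_xlen (const char *arch)
{
  if (strncmp (arch, "rv32", 4) == 0)
    return 32;
  if (strncmp (arch, "rv64", 4) == 0)
    return 64;
  return 0;
}

/* Return nonzero if the attribute's arch string may be combined with
   the command-line arch string, i.e. both name the same known XLEN.  */
static int
attr_xlen_compatible (const char *cmdline_arch, const char *attr_arch)
{
  int cmd = arch_xlen (cmdline_arch);
  return cmd != 0 && cmd == arch_xlen (attr_arch);
}
```

For the example in the message, `attr_xlen_compatible ("rv64gc", "rv32gcv_zbb")` returns 0, which is the case where an error would be reported instead of continuing into the RTL passes.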
13 days | [PR target/118248] Avoid bogus alloca call in RISC-V backend | Jakub Jelinek | 1 | -0/+26
This is Jakub's patch and Ian's testcase for the slightly vexing fault building the D runtime with an s390x-x-riscv cross compiler.

The core issue is we're allocating a vector to hold temporary registers unconditionally, including cases where the vector isn't needed because the loop isn't going to iterate. In the cases where the vector isn't needed the length is computed with an expression (x / y) - 1 where x / y will be zero. The alloca(-1) on the s390 platform triggers a fault. We haven't seen the fault with an x86 cross, but we can certainly see the bogus value being passed to alloca with a debugger.

Jakub's patch just conditionalizes the whole block in a sensible way. So it looks larger than it really is. I thought it might be better to do a bit of manual CSE on this code to make it even more obvious, but I think we're ultimately OK here.

Ian provided the testcase, collapsed down into equivalent C code. Again, it doesn't fault on an x86-x-riscv, but I can see the incorrect behavior with a debugger.

And a shout-out to Stefan for providing a docker based reproducer, it really helped track this down.

PR target/118248

gcc/
* config/riscv/riscv-string.cc (riscv_block_move_straight): Only allocate REGS buffer if it will be needed.

gcc/testsuite
* gcc.target/riscv/pr118248.c: New test.
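The bogus size is easy to see in isolation. The following C sketch uses hypothetical helper names and plain `size_t` arithmetic; it only illustrates the (x / y) - 1 shape described above and is not the code from riscv-string.cc. Both helpers assume `delta` is nonzero.

```c
#include <stddef.h>

/* Unguarded element count of the form (x / y) - 1.  When LENGTH is
   smaller than DELTA the division yields 0 and the subtraction wraps
   around to the largest size_t value.  */
static size_t
unguarded_slots (size_t length, size_t delta)
{
  return length / delta - 1;
}

/* Guarded variant: report zero slots when the loop would not iterate,
   so that the caller can skip the allocation entirely.  */
static size_t
guarded_slots (size_t length, size_t delta)
{
  size_t chunks = length / delta;
  return chunks > 0 ? chunks - 1 : 0;
}
```

With `length = 2` and `delta = 4`, `unguarded_slots` returns `(size_t) -1`, the value that ended up being passed to alloca, while `guarded_slots` returns 0.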
2025-02-15[PATCH] RISC-V: testsuite: Adjust pr117722.c scan.Robin Dapp1-2/+2
The test scanned for vmin and vmax instead of vminu and vmaxu. This patch fixes that. Will commit as obvious once the CI is OK with it. Regards Robin gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr117722.c: Scan for vminu and vmaxu.
2025-02-15RISC-V: testsuite: Fix reduc-[89].c again.Robin Dapp2-2/+2
My last fix wasn't sufficient. This patch just scans for the scalar insns now. Going to commit as obvious if the CI is happy. Regards Robin gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Scan for add. * gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: Scan for fadd.
2025-02-15RISC-V: Bugfix ICE for RVV intrinsic when using no-extension parametersJin Ma1-0/+13
When using riscv_v_abi, the return and arguments of the function should be adequately checked to avoid ICE. PR target/118872 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_fntype_abi): Strengthen the logic of the check to avoid missing the error report. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr118872.c: New test. Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
2025-02-13RISC-V: Avoid more unsplit insns in const expander [PR118832].Robin Dapp1-0/+13
Hi, in PR118832 we have another instance of the problem already noticed in PR117878. We sometimes use e.g. expand_simple_binop for vector operations like shift or and. While this is usually OK, it causes problems when doing it late, e.g. during LRA. In particular, we might rematerialize a const_vector during LRA, which then leaves an insn lying around that cannot be split any more if it requires a pseudo. Therefore we should only use the split variants in expand_const_vector. This patch fixes the issue in the PR and also pre-emptively rewrites two other spots that might be prone to the same issue. Regtested on rv64gcv_zvl512b. As the two other cases don't have a test (so might not even trigger) I unconditionally enabled them for my testsuite run. Regards Robin PR target/118832 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Expand as vlmax insn during lra. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr118832.c: New test.
2025-02-12RISC-V: Vector pseudoinsns with x0 operand to use imm 0Vineet Gupta36-67/+67
A couple of Vector pseudoinstructions use x0 scalar which could be inefficient on wider uarches due to regfile crossing. Instead use the imm 0 form, which should be functionally equivalent.

pseudoinsn               orig insn with x0          this patch
--------------------     --------------------       -------------------
vneg.v vd,vs             vrsub.vx vd,vs,x0          vrsub.vi vd,vs,0
vncvt.x.x.w vd,vs,vm     vnsrl.wx vd,vs,x0,vm       vnsrl.wi vd,vs,0,vm
vwcvt.x.x.v vd,vs,vm     vwadd.vx vd,vs,x0,vm       (imm not supported)

gcc/ChangeLog: * config/riscv/vector.md: vncvt substitute vnsrl. vnsrl with x0 replace with immediate 0. vneg substitute vrsub. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change expected pattern. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto. 
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto. * gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto. * gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
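Why the immediate 0 form is functionally equivalent to the x0 form can be seen from a per-element scalar model of the two instructions involved. The sketch below is only an illustration, assuming 32-bit source elements; the function names are hypothetical and nothing here is taken from vector.md.

```c
#include <stdint.h>

/* Per-element model of vrsub: the result is SCALAR - VS, with
   wrap-around arithmetic on the raw 32-bit pattern.  With SCALAR == 0
   (either read from x0 or given as immediate 0) this is a negation.  */
uint32_t
vrsub_elem (uint32_t vs, uint32_t scalar)
{
  return scalar - vs;
}

/* Per-element model of vnsrl with a 32-bit source and a 16-bit
   destination: shift right logically, then keep the low half.  Only
   the low 5 bits of SHIFT are used.  With SHIFT == 0 this is a plain
   truncation, which is what the vncvt pseudoinstruction means.  */
uint16_t
vnsrl_elem (uint32_t vs, unsigned shift)
{
  return (uint16_t) (vs >> (shift & 31));
}
```

In this model the second operand is just the value 0 in both encodings, so the result cannot differ; only the register file the operand comes from changes.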
2025-02-12RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targetsJin Ma3-3/+71
This is a follow-up to the patch below to avoid generating unrecognized vsetivl instructions for XTheadVector. https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html PR target/118601 gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Check with new constraint 'vl' instead of 'K'. (expand_vec_setmem): Likewise. (expand_vec_cmpmem): Likewise. * config/riscv/riscv-v.cc (force_vector_length_operand): Likewise. (expand_load_store): Likewise. (expand_strided_load): Likewise. (expand_strided_store): Likewise. (expand_lanes_load_store): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to... * gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here. * gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test. * gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test. Reported-by: Edwin Lu <ewlu@rivosinc.com>
2025-02-09[PR target/115123] Fix testsuite fallout from sinking heuristic changeJeff Law11-13/+11
Code sinking is just semantic preserving code motions, so it's a lot like scheduling in that code motions can change the vector configuration needed at various program points. That in turn can also change the number of vsetvls as we may or may not be able to merge them after the code motions. The sinking heuristics were twiddled several months ago resulting in a handful of scan-asm failures. This patch adjusts the tests appropriately fixing pr115123 (P3 regression). PR target/115123 gcc/testsuite * gcc.target/riscv/rvv/base/pr114352-3.c: Adjust expected output. * gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-66.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-82.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-83.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-86.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-88.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-90.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-91.c: Likewise. * gcc.target/riscv/rvv/vsetvl/avl_single-92.c: Likewise.
2025-02-08[RISC-V][PR target/118146] Fix ICE for unsupported modesJeff Law2-0/+31
There's some special case code in the risc-v move expander to try and optimize cases where the source is a subreg of a vector and the destination is a scalar mode. The code works fine except when we have no support for the given mode. ie HF or BF when those extensions aren't enabled. We'll end up tripping an assert in that case when we should have just let standard expansion do its thing. Tested in my system for rv32 and rv64, but I'll wait for the pre-commit tester to render a verdict before moving forward. PR target/118146 gcc/ * config/riscv/riscv.cc (riscv_legitimize_move): Handle subreg of vector source better to avoid ICE. gcc/testsuite * gcc.target/riscv/pr118146-1.c: New test. * gcc.target/riscv/pr118146-2.c: New test.
2025-02-07RISC-V: Make VXRM as global register [PR118103]Pan Li2-0/+84
Inspired by PR118103, the VXRM register should be treated almost the same as the FRM register, aka a cooperatively-managed global register. Thus, add the VXRM to global_regs to avoid its elimination by the late-combine pass. For example, take the code below:

  void compute ()
  {
    size_t vl = __riscv_vsetvl_e16m1 (N);
    vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
    vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
    vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

    __riscv_vse16_v_u16m1 (c, vc, vl);
  }

  int main ()
  {
    initialize ();
    compute();

    return 0;
  }

After compiling with -march=rv64gcv -O3, we will have:

  compute:
    csrwi vxrm,2
    lui a3,%hi(a)
    lui a4,%hi(b)
    addi a4,a4,%lo(b)
    vsetivli zero,4,e16,m1,ta,ma
    addi a3,a3,%lo(a)
    vle16.v v2,0(a4)
    vle16.v v1,0(a3)
    lui a4,%hi(c)
    addi a4,a4,%lo(c)
    vaaddu.vv v1,v1,v2
    vse16.v v1,0(a4)
    ret
    .size compute, .-compute
    .section .text.startup,"ax",@progbits
    .align 1
    .globl main
    .type main, @function
  main:                     | // csrwi vxrm,2 deleted after inline
    addi sp,sp,-16
    sd ra,8(sp)
    call initialize
    lui a3,%hi(a)
    lui a4,%hi(b)
    vsetivli zero,4,e16,m1,ta,ma
    addi a4,a4,%lo(b)
    addi a3,a3,%lo(a)
    vle16.v v2,0(a4)
    vle16.v v1,0(a3)
    lui a4,%hi(c)
    addi a4,a4,%lo(c)
    li a0,0
    vaaddu.vv v1,v1,v2

The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/118103 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_conditional_register_usage): Add the VXRM as the global_regs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr118103-2.c: New test. * gcc.target/riscv/rvv/base/pr118103-run-2.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-02-06[RISC-V] Fix risc-v expected test output after recent iv changesJeff Law1-1/+1
Richard S's recent change to iv increment insertion removed a reg->reg move (which was its intent AFAICT). This triggered a failure on a riscv test. That test was meant to verify that we didn't have an extraneous reg->reg move due to a buglet in the risc-v splitters. Before the 2023 change we had two vector reg->reg moves and after the 2023 fix we had just one. With Richard's change we have none ;-) Adjusting test accordingly. Pushed to the trunk. gcc/testsuite * gcc.target/riscv/rvv/autovec/madd-split2-1.c: Update expected output.
2025-02-06loop-iv, riscv: Fix get_biv_step_1 for RISC-V [PR117506]Jakub Jelinek1-0/+5
The following test ICEs on RISC-V at least latently since r14-1622-g99bfdb072e67fa3fe294d86b4b2a9f686f8d9705 which added RISC-V specific case to get_biv_step_1 to recognize also ({zero,sign}_extend:DI (plus:SI op0 op1)) The reason for the ICE is that op1 in this case is CONST_POLY_INT which unlike the really expected VOIDmode CONST_INTs has its own mode and still satisfies CONSTANT_P. GET_MODE (rhs) (SImode) is different from outer_mode (DImode), so the function later does *inner_step = simplify_gen_binary (code, outer_mode, *inner_step, op1); but that obviously ICEs because while *inner_step is either VOIDmode or DImode, op1 has SImode. The following patch fixes it by extending op1 using code so that simplify_gen_binary can handle it. Another option would be to change the !CONSTANT_P (op1) 3 lines above this to !CONST_INT_P (op1), I think it isn't very likely that we get something useful from other constants there. 2025-02-06 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/117506 * loop-iv.cc (get_biv_step_1): For {ZERO,SIGN}_EXTEND of PLUS apply {ZERO,SIGN}_EXTEND to op1. * gcc.dg/pr117506.c: New test. * gcc.target/riscv/pr117506.c: New test.
2025-02-04testsuite: RISC-V: Ignore pr118170.c for E ABIDimitar Dimitrov1-1/+1
The -mcpu=tt-ascalon-d8 option for the test implies D extension, which is not compatible with the ILP32E and ILP64E ABIs. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr118170.c: Ignore for E ABI. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2025-01-31[committed][PR tree-optimization/114277] Fix missed optimization for multiplication against boolean valueJeff Law1-0/+9
Andrew, Raphael and I have all poked at it in various ways over the last year or so. I think when Raphael and I first looked at it I sent us down a bit of a rathole. In particular it's odd that we're using a multiply to implement a select and it seemed like recognizing the idiom and rewriting into a conditional move was the right path. That looked reasonably good for the test, but runs into problems with min/max detection elsewhere. I think that initial investigation somewhat polluted our thinking. The regression can be fixed with a fairly simple match.pd pattern. Essentially we want to handle
  x * (x || b) -> x
  x * !(x || b) -> 0
There are simplifications that can be made for "&&" cases, but I haven't seen them in practice. Rather than drop in untested patterns, I'm leaving that as a future todo. My original was two match.pd patterns. Andrew combined them into a single pattern. I've made this conditional on GIMPLE as an earlier version that simplified to a conditional move showed that when applied on GENERIC we could drop an operand with a side effect which is clearly not good. I've bootstrapped and regression tested this on x86. I've also tested on the various embedded targets in my tester. PR tree-optimization/114277 gcc/ * match.pd (a * (a || b) -> a): New pattern. (a * !(a || b) -> 0): Likewise. gcc/testsuite * gcc.target/i386/pr114277.c: New test. * gcc.target/riscv/pr114277.c: Likewise. Co-author: Andrew Pinski <quic_apinski@quicinc.com>
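The two identities the pattern relies on are easy to check in plain C. The sketch below is only an illustration of the arithmetic, not the match.pd pattern and not the testcase from the PR; the function names are hypothetical.

```c
/* x * (x || b): when x is nonzero, (x || b) is 1 and the product is x;
   when x is zero, the product is 0, which is again x.  */
int
mul_by_or (int x, int b)
{
  return x * (x || b);
}

/* x * !(x || b): when x is nonzero, !(x || b) is 0; when x is zero,
   the product is 0 regardless of b.  The result is always 0.  */
int
mul_by_not_or (int x, int b)
{
  return x * !(x || b);
}
```

Both operands here are side-effect free, which matters: as noted above, dropping b is only safe when evaluating it has no side effects.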
2025-01-29RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]Pan Li7-0/+58
This patch would like to fix the wrong code generation for the scalar signed SAT_TRUNC. The input can be QI/HI/SI/DI while an ALU insn like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before an ALU insn like add. The gen_lowpart will generate something like lbu which has all zero for the highest bits. For example, when 0xff7f(-129 for HImode) is truncated to QImode, we actually want to compare -129 to -128, but if the value is only zero extended, as with lbu, we will compare 0xff7f to 0xffffffffffffff80(assume Xmode is DImode). Thus, we have to sign extend 0xff7f(HImode) to 0xffffffffffffff7f(assume Xmode is DImode) before the compare in Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688.h: Add test helper macros. * gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-01-29RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]Pan Li5-0/+45
This patch would like to fix the wrong code generation for the scalar signed SAT_SUB. The input can be QI/HI/SI/DI while an ALU insn like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before an ALU insn like sub. The gen_lowpart will generate something like lbu which has all zero for the highest bits. For example, when 0xff(-1 for QImode) sub 0x1(1 for QImode), we actually want -1 - 1 = -2, but if the value is only zero extended, as with lbu, we will get 0xff - 1 = 0xfe which is incorrect. Thus, we have to sign extend 0xff(QImode) to 0xffffffffffffffff(assume Xmode is DImode) before the sub in Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688.h: Add test helper macro. * gcc.target/riscv/pr117688-sub-run-1-s16.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s32.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s64.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-01-29RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]Pan Li5-0/+51
This patch would like to fix the wrong code generation for the scalar signed SAT_ADD. The input can be QI/HI/SI/DI while an ALU insn like sub can only work on Xmode. Unfortunately we don't have sub/add for non-Xmode like QImode in scalar, thus we need to sign extend to Xmode to ensure we have the correct value before an ALU insn like add. The gen_lowpart will generate something like lbu which has all zero for the highest bits. For example, when 0xff(-1 for QImode) plus 0x2(2 for QImode), we actually want -1 + 2 = 1, but if the value is only zero extended, as with lbu, we will get 0xff + 2 = 0x101 which is incorrect. Thus, we have to sign extend 0xff(QImode) to 0xffffffffffffffff(assume Xmode is DImode) before the plus in Xmode. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688-add-run-1-s16.c: New test. * gcc.target/riscv/pr117688-add-run-1-s32.c: New test. * gcc.target/riscv/pr117688-add-run-1-s64.c: New test. * gcc.target/riscv/pr117688-add-run-1-s8.c: New test. * gcc.target/riscv/pr117688.h: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
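The difference between zero and sign extension in the three PR117688 fixes can be reproduced with a small reference model, assuming Xmode is 64 bits wide. This is only a sketch: the function names are hypothetical, and the clamping below is a plain reference implementation, not the instruction sequence emitted by riscv_expand_ssadd.

```c
#include <stdint.h>

/* Zero extension of a QImode bit pattern into a 64-bit register,
   which is what a load such as lbu produces.  */
int64_t
zero_extend_qi (uint8_t bits)
{
  return (int64_t) bits;
}

/* Sign extension of the same bit pattern, written without relying on
   implementation-defined narrowing conversions.  */
int64_t
sign_extend_qi (uint8_t bits)
{
  return bits >= 0x80 ? (int64_t) bits - 0x100 : (int64_t) bits;
}

/* Reference signed saturating add on QImode bit patterns, computed in
   the 64-bit domain after sign extension.  */
int64_t
sat_add_s8 (uint8_t a, uint8_t b)
{
  int64_t sum = sign_extend_qi (a) + sign_extend_qi (b);
  if (sum > INT8_MAX)
    return INT8_MAX;
  if (sum < INT8_MIN)
    return INT8_MIN;
  return sum;
}
```

With the bit pattern 0xff, the zero-extended operand gives 0xff + 2 = 0x101, while the sign-extended operand gives -1 + 2 = 1, matching the example in the commit message.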
2025-01-27RISC-V: testsuite: Fix reduc-8.c and reduc-9.cRobin Dapp2-2/+0
In both tests we expect a VEC_SHL_INSERT expression but we now add the initial value at the end. Just remove that scan check. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Remove VEC_SHL_INSERT check. * gcc.target/riscv/rvv/autovec/reduc/reduc-9.c: Ditto.
2025-01-27RISC-V: testsuite: Fix gather_load_64-12-zvbb.cRobin Dapp1-1/+2
The test fails with _zvfh because we vectorize more. Just adjust the test expectations. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c: Distinguish between zvfh and !zvfh.
2025-01-26RISC-V: Make FRM as global register [PR118103]Pan Li2-0/+77
After we enabled the late-combine pass after the mode-switching pass, it will try to combine below insn patterns into op. Aka:

  (insn 40 5 41 2 (set (reg:SI 11 a1 [151]) (reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi} (nil))
  (insn 41 40 7 2 (set (reg:SI 69 frm) (const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore} (nil))
  (insn 42 10 11 2 (set (reg:SI 69 frm) (reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore} (nil))

  trying to combine definition of r11 in:
    40: a1:SI=frm:SI
  into:
    42: frm:SI=a1:SI
  instruction becomes a no-op:
    (set (reg:SI 69 frm) (reg:SI 69 frm))
  original cost = 4 + 4 (weighted: 8.000000), replacement cost = 2147483647; keeping replacement
  rescanning insn with uid = 42.
  updating insn 42 in-place
  verify found no changes in insn with uid = 42.
  deleting insn 40

For example we have code as below:

  int test_exampe () {
    test ();

    size_t vl = 4;
    vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
    va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
    va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);

    __riscv_vse16_v_f16m1(b, va, vl);

    return 0;
  }

it will be compiled to:

  main:
    addi sp,sp,-16
    sd ra,8(sp)
    call initialize
    lui a6,%hi(b)
    lui a2,%hi(a)
    addi a3,a6,%lo(b)
    addi a2,a2,%lo(a)
    li a4,4
  .L8:
    fsrmi 2
    vsetvli a5,a4,e16,m1,ta,ma
    vle16.v v1,0(a2)
    slli a1,a5,1
    subw a4,a4,a5
    add a2,a2,a1
    vfnmadd.vv v1,v1,v1 >> The fsrm a0 insn is deleted by late-combine <<
    vfmsub.vv v1,v1,v1
    vse16.v v1,0(a3)
    add a3,a3,a1
    bgt a4,zero,.L8
    lh a4,%lo(b)(a6)
    li a5,-20480
    addi a5,a5,-1382
    bne a4,a5,.L14
    ld ra,8(sp)
    li a0,0
    addi sp,sp,16
    jr ra

This patch would like to add the FRM register to the global_regs as it is a cooperatively-managed global register. And then the fsrm insn will not be eliminated by late-combine. 
The related spec17 cam4 failure may also be caused by this issue. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/118103 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_conditional_register_usage): Add the FRM as the global_regs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr118103-1.c: New test. * gcc.target/riscv/rvv/base/pr118103-run-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-01-21Revert "[PATCH 2/2] RISC-V:Add intrinsic cases for the CMOs extensions"Jeff Law2-116/+0
This reverts commit b22d9c8f8216d15773dee4f9677c6b26aff507fd.
2025-01-21RISC-V: Enable and adjust the testsuite for XTheadVector.Jin Ma9-59/+79
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Enable testsuite of XTheadVector. * gcc.target/riscv/rvv/xtheadvector/pr114194.c: Adjust correctly. * gcc.target/riscv/rvv/xtheadvector/prefix.c: Likewise. * gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: Likewise. * gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: Likewise. * gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: Likewise. * gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: Likewise. * gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: Likewise. * gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: Likewise.
2025-01-20[PR target/116256] Adjust expected output in a couple testcasesJeff Law2-2/+2
I've had a long standing TODO to review the RISC-V testsuite regressions from enabling the late-combine pass (pr116256). I adjusted a few cases months ago; this adjusts a couple more where it looks like the right thing to do. All that's left after this are the vls/dup-? tests which regress in meaningful ways and I'm still investigating reasonable approaches to fix them (they play into the whole mvconst_internal pattern situation), late-combine isn't doing anything wrong. PR target/116256 gcc/testsuite * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-37.c: Update expected output. * gcc.target/riscv/rvv/vsetvl/vsetvl-15.c: Likewise.
2025-01-20[PR target/114442] Add reservations for all insn types to xiangshan-nanhu modelJeff Law1-0/+3
The RISC-V backend has checks to verify that every used insn has an associated type and that every insn type maps to some reservation in the DFA model. If either test fails we ICE. With the cpu/isa allowed to vary independently from the tune/scheduler model, it's entirely possible (in fact trivial) to trigger those kinds of ICEs. This patch "fixes" the ICEs for xiangshan-nanhu by throwing every unknown insn type into a special bucket. I wouldn't be surprised if a few of them are implemented (like rotates as the chip seems to have other bitmanip extensions). But I know nothing about this design and the DFA author hasn't responded to requests to update the DFA in ~6 months. This should dramatically reduce the number of ICEs in the testsuite if someone were to turn on xiangshan-nanhu scheduling. Not strictly a regression, but a bugfix and highly isolated to the xiangshan-nanhu tuning in the RISC-V backend. So I'm gating this into gcc-15, assuming pre-commit doesn't balk. PR target/114442 gcc/ * config/riscv/xiangshan.md: Add missing insn types to a new dummy insn reservation. gcc/testsuite * gcc.target/riscv/pr114442.c: New test.
2025-01-20RISC-V: Correct the mode that is causing the program to fail for XTheadCondMovJin Ma1-0/+12
For XTheadCondMov, the bit width of rs2 should always be XLEN-sized, otherwise the program logic will be wrong. Reference from https://github.com/XUANTIE-RV/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf
  Synopsis: Move if equal zero.
  Mnemonic: th.mveqz rd, rs1, rs2
  Description: This instruction moves the content of register rs1 into rd if the content of rs2 is 0x0. Otherwise, the value of rd does not change.
  Operation: if (reg[rs2] == 0x0) reg[rd] := reg[rs1]
gcc/ChangeLog: * config/riscv/thead.md (*th_cond_mov<GPR:mode><GPR2:mode>): Change GPR2 to X. (*th_cond_mov<GPR:mode>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadcondmov-bug.c: New test.
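One way to see why the condition operand has to be treated as XLEN-wide is to compare the quoted operation with a narrower view of rs2. The sketch below assumes XLEN is 64; the function names are hypothetical, and the second function only models a 32-bit interpretation of the condition for contrast, it is not a description of the old pattern's exact behaviour.

```c
#include <stdint.h>

/* Model of the quoted operation on a 64-bit register file:
   rd receives rs1 when the whole rs2 register is zero, and keeps
   its old value otherwise.  */
uint64_t
th_mveqz (uint64_t rd, uint64_t rs1, uint64_t rs2)
{
  return rs2 == 0 ? rs1 : rd;
}

/* Contrast: a model that looks only at the low 32 bits of rs2.
   It disagrees with th_mveqz whenever the upper half of rs2 holds
   nonzero bits while the lower half is zero.  */
uint64_t
mveqz_low32_view (uint64_t rd, uint64_t rs1, uint64_t rs2)
{
  return (uint32_t) rs2 == 0 ? rs1 : rd;
}
```

For rs2 = 0x100000000 the two models return different values, which is the kind of mismatch that arises when the condition is described in a mode narrower than the register the instruction actually tests.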
2025-01-18[RISC-V][PR target/116308] Fix generation of initial RTL for atomicsJeff Law1-0/+9
While this wasn't originally marked as a regression, it almost certainly is given that older versions of GCC would have used libatomic and would not have ICE'd on this code. Basically this is another case where we directly used simplify_gen_subreg when we should have used gen_lowpart. When I fixed a similar bug a while back I noted the code in question as needing another looksie. I think at that time my brain saw the mixed modes (SI & QI) and locked up. But the QI stuff is just the shift count, not some deeper issue. So fixing is trivial. We just replace the simplify_gen_subreg with a gen_lowpart and get on with our lives. Tested on rv64 and rv32 in my tester. Waiting on pre-commit testing for final verdict. PR target/116308 gcc/ * config/riscv/riscv.cc (riscv_lshift_subword): Use gen_lowpart rather than simplify_gen_subreg. gcc/testsuite/ * gcc.target/riscv/pr116308.c: New test.
2025-01-18RISC-V: Disable RV64-only crc testcases for RV32Bohan Lei2-6/+4
These testcases require RV64 targets. They fail when -march=rv32* is specified while using a riscv64* compiler. gcc/testsuite/ChangeLog: * gcc.target/riscv/crc-21-rv64-zbc.c: Disallow rv32 targets. * gcc.target/riscv/crc-21-rv64-zbkc.c: Ditto.
2025-01-18[PR target/118357] RISC-V: Disable fusing vsetvl instructions by VSETVL_VTYPE_CHANGE_ONLY for XTheadVector.Jin Ma1-0/+13
In RVV 1.0, the instruction "vsetvli zero,zero,*" indicates that the available vector length (avl) does not change. However, in XTheadVector, this same instruction signifies that the avl should take the maximum value. Consequently, when fusing vsetvl instructions, the optimization labeled "VSETVL_VTYPE_CHANGE_ONLY" is disabled for XTheadVector. PR target/118357 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Function change_vtype_only_p always returns false for XTheadVector. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr118357.c: New test.
2025-01-17RISC-V: Add -fcf-protection=[full|branch|return] to enable zicfiss, zicfilp.Monk Chiang4-4/+4
gcc/ChangeLog: * config/riscv/riscv.cc (is_zicfilp_p): New function. (is_zicfiss_p): New function. * config/riscv/riscv-zicfilp.cc: Update. * config/riscv/riscv.h: Update. * config/riscv/riscv.md: Update. * config/riscv/riscv-c.cc: Add CFI predefined macro. gcc/testsuite/ChangeLog: * c-c++-common/fcf-protection-1.c: Update. * c-c++-common/fcf-protection-2.c: Update. * c-c++-common/fcf-protection-3.c: Update. * c-c++-common/fcf-protection-4.c: Update. * c-c++-common/fcf-protection-5.c: Update. * c-c++-common/fcf-protection-6.c: Update. * c-c++-common/fcf-protection-7.c: Update. * gcc.target/riscv/ssp-1.c: Update. * gcc.target/riscv/ssp-2.c: Update. * gcc.target/riscv/zicfilp-call.c: Update. * gcc.target/riscv/interrupt-no-lpad.c: Update.
2025-01-17RISC-V: Add Zicfilp ISA extension.Monk Chiang2-0/+21
This patch only supports a landing pad value of 0. The next version will implement the function signature based labeling scheme. RISC-V CFI SPEC: https://github.com/riscv/riscv-cfi gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add ZICFILP ISA string. * config.gcc: Add riscv-zicfilp.o * config/riscv/riscv-passes.def (INSERT_PASS_BEFORE): Insert landing pad instructions. * config/riscv/riscv-protos.h (make_pass_insert_landing_pad): Declare. * config/riscv/riscv-zicfilp.cc: New file. * config/riscv/riscv.cc (riscv_trampoline_init): Add landing pad instructions. (riscv_legitimize_call_address): Likewise. (riscv_output_mi_thunk): Likewise. * config/riscv/riscv.h: Update. * config/riscv/riscv.md: Add landing pad patterns. * config/riscv/riscv.opt (TARGET_ZICFILP): Define. * config/riscv/t-riscv: Add build rule for riscv-zicfilp.o gcc/testsuite/ChangeLog: * gcc.target/riscv/interrupt-no-lpad.c: New test. * gcc.target/riscv/zicfilp-call.c: New test. Co-Developed-by: Greg McGary <gkm@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>
2025-01-17RISC-V: Add Zicfiss ISA extension.Monk Chiang2-0/+51
This patch is implemented according to the RISC-V CFI specification. It supports the generation of shadow stack instructions in the prologue, epilogue, non-local gotos, and unwinding. RISC-V CFI SPEC: https://github.com/riscv/riscv-cfi gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add ZICFISS ISA string. * config/riscv/predicates.md: New predicate x1x5_operand. * config/riscv/riscv.cc (riscv_expand_prologue): Insert shadow stack instructions. (riscv_expand_epilogue): Likewise. (riscv_for_each_saved_reg): Assign t0 or ra register for sspopchk instruction. (need_shadow_stack_push_pop_p): New function. Omit shadow stack operation on leaf function. * config/riscv/riscv.h (need_shadow_stack_push_pop_p): Define. * config/riscv/riscv.md: Add shadow stack patterns. (save_stack_nonlocal): Add shadow stack instructions for setjump. (restore_stack_nonlocal): Add shadow stack instructions for longjump. * config/riscv/riscv.opt (TARGET_ZICFISS): Define. libgcc/ChangeLog: * config/riscv/linux-unwind.h: Include shadow-stack-unwind.h. * config/riscv/shadow-stack-unwind.h (_Unwind_Frames_Extra): Define. (_Unwind_Frames_Increment): Define. gcc/testsuite/ChangeLog: * gcc.target/riscv/ssp-1.c: New test. * gcc.target/riscv/ssp-2.c: New test. Co-Developed-by: Greg McGary <gkm@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>
2025-01-16RISC-V: Update Xsfvqmacc and Xsfvfnrclip's testcasesLiao Shihua10-1/+255
Update the SiFive Xsfvqmacc and Xsfvfnrclip extensions' testcases. Version log: synchronize LMUL settings with the return type. gcc/ChangeLog: * config/riscv/vector.md: New attr set. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c: Add vsetivli checking. * gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_xu_f_qf.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: Ditto. * gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: Ditto.
2025-01-15RISC-V: Fix code gen for reduction with length 0 [PR118182]Kito Cheng2-0/+55
`.MASK_LEN_FOLD_LEFT_PLUS`(or `mask_len_fold_left_plus_m`) expects the return value to be the start value even if the length is 0. However, the current code gen in the RISC-V backend does not meet that semantic; it results in a random garbage value if the length is 0. Take the current code gen for MASK_LEN_FOLD_LEFT_PLUS with f64 as an example:

  # _148 = .MASK_LEN_FOLD_LEFT_PLUS (stmp__148.33_134, vect__70.32_138, { -1, ... }, loop_len_161, 0);
  vsetvli zero,a5,e64,m1,ta,ma
  vfmv.s.f v2,fa5 # insn 1
  vfredosum.vs v1,v1,v2 # insn 2
  vfmv.f.s fa5,v1 # insn 3

insn 1: - vfmv.s.f won't do anything if VL=0, which means v2 will contain a garbage value.
insn 2: - vfredosum.vs won't do anything if VL=0, and keeps vd unchanged even with TA. (The v-spec says: `If vl=0, no operation is performed and the destination register is not updated.`)
insn 3: - vfmv.f.s will move the value from v1 even if VL=0, so this is safe.

So how do we fix that? We need two fixes:
1. insn 1: always execute it with VL=1, so that we can guarantee it will always work as expected.
2. insn 2: add a new pattern to force `vd` to use the same reg as `vs1` (start value) for all reduction patterns, then we can guarantee vd[0] will contain the start value when vl=0.

For 1, it's just a simple change to riscv_vector::expand_reduction, but for 2, we have to add a _VL0_SAFE variant of the reductions to force `vd` to use the same reg as `vs1` (start value).

Change since V3: - Rename _AV to _VL0_SAFE for readability. - Use non-VL0_SAFE version if VL is const or VLMAX. - Only force VL=1 for vfmv.s.f when VL is non-const and non-VLMAX. - Two more testcase. gcc/ChangeLog: PR target/118182 * config/riscv/autovec-opt.md (*widen_reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (*widen_reduc_plus_scal_<mode>): Ditto. (*fold_left_widen_plus_<mode>): Ditto. (*mask_len_fold_left_widen_plus_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. (*cond_len_widen_reduc_plus_scal_<mode>): Ditto. (*cond_widen_reduc_plus_scal_<mode>): Ditto. 
* config/riscv/autovec.md (reduc_plus_scal_<mode>): Adjust argument for expand_reduction. (reduc_smax_scal_<mode>): Ditto. (reduc_umax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_umin_scal_<mode>): Ditto. (reduc_and_scal_<mode>): Ditto. (reduc_ior_scal_<mode>): Ditto. (reduc_xor_scal_<mode>): Ditto. (reduc_plus_scal_<mode>): Ditto. (reduc_smax_scal_<mode>): Ditto. (reduc_smin_scal_<mode>): Ditto. (reduc_fmax_scal_<mode>): Ditto. (reduc_fmin_scal_<mode>): Ditto. (fold_left_plus_<mode>): Ditto. (mask_len_fold_left_plus_<mode>): Ditto. * config/riscv/riscv-v.cc (expand_reduction): Add one more argument for reduction code for vl0-safe. * config/riscv/riscv-protos.h (expand_reduction): Ditto. * config/riscv/vector-iterators.md (unspec): Add _VL0_SAFE variant of reduction. (ANY_REDUC_VL0_SAFE): New. (ANY_WREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_VL0_SAFE): Ditto. (ANY_FREDUC_SUM_VL0_SAFE): Ditto. (ANY_FWREDUC_SUM_VL0_SAFE): Ditto. (reduc_op): Add _VL0_SAFE variant of reduction. (order) Ditto. * config/riscv/vector.md (@pred_<reduc_op><mode>): New. gcc/testsuite/ChangeLog: PR target/118182 * gfortran.target/riscv/rvv/pr118182.f: New. * gcc.target/riscv/rvv/autovec/pr118182-1.c: New. * gcc.target/riscv/rvv/autovec/pr118182-2.c: New.
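For illustration only, the semantic that the PR118182 fix restores can be modelled in scalar C. This is a minimal sketch, not code from the patch: the function name `fold_left_plus_ref` is hypothetical, and the mask operand is assumed to be all-ones (as in the `{ -1, ... }` example above), so it is omitted.

```c
#include <stddef.h>

/* Hypothetical scalar reference model of a length-controlled, in-order
   sum reduction.  Only the first LEN elements of V are folded into the
   start value; the mask is assumed to be all-ones and is left out.  */
double
fold_left_plus_ref (double start, const double *v, size_t len)
{
  double acc = start;
  for (size_t i = 0; i < len; i++)
    acc += v[i];
  /* With len == 0 the loop body never runs, so the start value is
     returned unchanged.  This is the case the commit is about.  */
  return acc;
}
```

A call such as `fold_left_plus_ref (5.0, v, 0)` returns 5.0 regardless of the contents of `v`, which is the result the vector sequence also has to produce when VL=0.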
2025-01-14[RISC-V][PR target/118170] Add HF div/sqrt reservationAnton Blanchard1-0/+9
Clearly an oversight in the generic-ooo model caught by the checking code. I should have realized it was generic-ooo as we don't have a pipeline description for the tenstorrent design yet, just the costing model. The patch was extracted from the BZ which indicated Anton was the author, so I kept that. I'm listed as co-author just in case someone wants to complain about the testcase in the future. I didn't do any notable lifting here. Thanks Peter and Anton! PR target/118170 gcc/ * config/riscv/generic-ooo.md (generic_ooo_float_div_half): New reservation. gcc/testsuite * gcc.target/riscv/pr118170.c: New test. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2025-01-14[PR rtl-optimization/109592] Simplify nested shiftsRichard Sandiford2-4/+13
> The BZ in question is a failure to recognize a pair of shifts as a sign > extension. > > I originally thought simplify-rtx would be the right framework to > address this problem, but fwprop is actually better. We can write the > recognizer much simpler in that framework. > > fwprop already simplifies nested shifts/extensions to the desired RTL, > but it's not considered profitable and we throw away the good work done > by fwprop & simplifiers. > > It's hard to see a scenario where nested shifts or nested extensions > that simplify down to a single sign/zero extension isn't a profitable > transformation. So when fwprop has nested shifts/extensions that > simplifies to an extension, we consider it profitable. > > This allow us to simplify the testcase on rv64 with ZBB enabled from a > pair of shifts to a single byte or half-word sign extension. Hmm. So just to summarise something that was discussed in the PR comments, this is a case where combine's expand_compound_operation/ make_compound_operation wrangler hurts us, because the process isn't idempotent, and combine produces two complex instructions: (insn 6 3 7 2 (set (reg:DI 137 [ _3 ]) (ashift:DI (reg:DI 139 [ x ]) (const_int 24 [0x18]))) "foo.c":2:20 305 {ashldi3} (expr_list:REG_DEAD (reg:DI 139 [ x ]) (nil))) (insn 12 7 13 2 (set (reg/i:DI 10 a0) (sign_extend:DI (ashiftrt:SI (subreg:SI (reg:DI 137 [ _3 ]) 0) (const_int 24 [0x18])))) "foo.c":2:27 321 {ashrsi3_extend} (expr_list:REG_DEAD (reg:DI 137 [ _3 ]) (nil))) given two simple instructions: (insn 6 3 7 2 (set (reg:SI 137 [ _3 ]) (sign_extend:SI (subreg:QI (reg/v:DI 136 [ x ]) 0))) "foo.c":2:20 533 {*extendqisi2_bitmanip} (expr_list:REG_DEAD (reg/v:DI 136 [ x ]) (nil))) (insn 7 6 12 2 (set (reg:DI 138 [ _3 ]) (sign_extend:DI (reg:SI 137 [ _3 ]))) "foo.c":2:20 discrim 1 133 {*extendsidi2_internal} (expr_list:REG_DEAD (reg:SI 137 [ _3 ]) (nil))) If I run with -fdisable-rtl-combine then late_combine1 already does the expected transformation. 
Although it would be nice to fix combine, that might be difficult. If we treat combine as immutable then the options are: (1) Teach simplify-rtx to simplify combine's output into a single sign_extend. (2) Allow fwprop1 to get in first, before combine has a chance to mess things up. The patch goes for (2). Is that a fair summary? Playing devil's advocate, I suppose one advantage of (1) is that it would allow the optimisation even if the original rtl looked like combine's output. And fwprop1 doesn't distinguish between cases in which the source instruction disappears from cases in which the source instruction is kept. Thus we could transform: (set (reg:SI R2) (sign_extend:SI (reg:QI R1))) (set (reg:DI R3) (sign_extend:DI (reg:SI R2))) into: (set (reg:SI R2) (sign_extend:SI (reg:QI R1))) (set (reg:DI R3) (sign_extend:DI (reg:QI R1))) which increases the register pressure between the two instructions (since R2 and R1 are both now live). In general, there could be quite a gap between the two instructions. On the other hand, even in that case, fwprop1 would be parallelising the extensions. And since we're talking about unary operations, even two-address targets would allow R1 to be extended without tying the source and destination. Also, it seems relatively unlikely that expand would produce code that looks like combine's, since the gimple optimisers should have simplified it into conversions. So initially I was going to agree that it's worth trying in fwprop. But... [ commentary on Jeff's original approach dropped. ] So it seems like it's a bit of a mess 🙁 If we do try to fix combine, I think something like the attached would fit within the current scheme. It is a pure shift-for-shift transformation, avoiding any extensions. Will think more about it, but wanted to get the above stream of consciousness out before I finish for the day 🙂 PR rtl-optimization/109592 gcc/ * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Simplify nested shifts with subregs. 
gcc/testsuite * gcc.target/riscv/pr109592.c: New test. * gcc.target/riscv/sign-extend-rshift.c: Adjust expected output Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
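As a rough source-level picture of the "pair of shifts as a sign extension" idea discussed for PR109592, the two hypothetical functions below compute the same value. The names are made up for this sketch and it is not the test case from the patch. It relies on GCC's documented choices for implementation-defined behaviour: arithmetic right shift of negative values, and modulo reduction when converting to a narrower signed type.

```c
#include <stdint.h>

/* Shift pair: move the low byte to the top of a 32-bit value, then
   shift it back down arithmetically.  The left shift is done on an
   unsigned value to avoid signed-overflow undefined behaviour.  */
int32_t
sext_byte_by_shifts (int32_t x)
{
  return (int32_t) ((uint32_t) x << 24) >> 24;
}

/* The same result expressed as a single byte sign extension.  */
int32_t
sext_byte_by_cast (int32_t x)
{
  return (int8_t) x;
}
```

Both forms keep only the low 8 bits of `x` and replicate bit 7 upwards, which is why a single byte sign extension can replace the two shifts.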
2025-01-14RISC-V: Fix vsetvl compatibility predicate [PR118154].Robin Dapp2-0/+54
In PR118154 we emit strided stores but the first of those does not always have the proper VTYPE. That's because we erroneously delete a necessary vsetvl.

In order to determine whether to elide

(1)
  Expr[7]: VALID (insn 116, bb 17)
    Demand fields: demand_ratio_and_ge_sew demand_avl
    SEW=8, VLMUL=mf2, RATIO=16, MAX_SEW=64
    TAIL_POLICY=agnostic, MASK_POLICY=agnostic
    AVL=(reg:DI 0 zero)

when e.g.

(2)
  Expr[3]: VALID (insn 360, bb 15)
    Demand fields: demand_sew_lmul demand_avl
    SEW=64, VLMUL=m1, RATIO=64, MAX_SEW=64
    TAIL_POLICY=agnostic, MASK_POLICY=agnostic
    AVL=(reg:DI 0 zero)
    VL=(reg:DI 13 a3 [345])

is already available, we use sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p. (1) requires RATIO = SEW/LMUL = 16 and an SEW >= 8. (2) has ratio = 64, though, so we cannot directly elide (1). This patch uses ratio_eq_p instead of next_ratio_valid_for_prev_sew_p.

PR target/118154

gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (MAX_LMUL): New define. (pre_vsetvl::earliest_fuse_vsetvl_info): Use. (pre_vsetvl::pre_global_vsetvl_info): New predicate with equal ratio.
* config/riscv/riscv-vsetvl.def: Use.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr118154-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr118154-2.c: New test.
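The ratio arithmetic in the two dumps above can be checked with a small sketch. The helpers `vtype_ratio` and `ratios_match` are hypothetical names for this illustration and are not the predicates in riscv-vsetvl.cc. LMUL is passed as a fraction so that mf2 can be written as 1/2.

```c
/* Hypothetical helper: RATIO = SEW / LMUL, with LMUL given as the
   fraction lmul_num / lmul_den (mf2 is 1/2, m1 is 1/1).  */
unsigned
vtype_ratio (unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return sew * lmul_den / lmul_num;
}

/* Hypothetical equal-ratio test in the spirit of the stricter
   condition the commit message describes.  */
int
ratios_match (unsigned prev_ratio, unsigned next_ratio)
{
  return prev_ratio == next_ratio;
}
```

With these definitions, (1) gives `vtype_ratio (8, 1, 2) == 16` and (2) gives `vtype_ratio (64, 1, 1) == 64`, so an equal-ratio test rejects the pair.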
2025-01-14match: Keep conditional in simplification to constant [PR118140].Robin Dapp1-0/+29
In PR118140 we simplify

  _ifc__33 = .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11);

to 1:

  Match-and-simplified .COND_IOR (_41, d_lsm.7_11, _46, d_lsm.7_11) to 1

when _46 == 1. This happens by removing the conditional and applying a | 1 = 1. Normally we re-introduce the conditional and its else value if needed, but that does not happen here as we're not dealing with a vector type. For correctness's sake, we must not remove the conditional even for non-vector types. This patch re-introduces a COND_EXPR in such cases.

For PR118140 this results in a non-vectorized loop.

PR middle-end/118140

gcc/ChangeLog:
* gimple-match-exports.cc (maybe_resimplify_conditional_op): Add COND_EXPR when we simplified to a scalar gimple value but still have an else value.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr118140.c: New test.
* gcc.target/riscv/rvv/autovec/pr118140.c: New test.
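To see why dropping the conditional is wrong, here is a scalar sketch of the conditional IOR. The function names are hypothetical, and the operands are modelled as `bool`, which is an assumption made so that `a | 1 = 1` holds as in the message above.

```c
#include <stdbool.h>

/* Hypothetical scalar model of a conditional IOR with operands
   (cond, a, b, else): the IOR result is used only when COND holds,
   otherwise the else value is returned.  */
bool
cond_ior_ref (bool cond, bool a, bool b, bool els)
{
  return cond ? (a | b) : els;
}

/* Valid simplification when b is known to be 1: the IOR folds to 1,
   but the selection on COND has to stay.  */
bool
cond_ior_b_is_one (bool cond, bool els)
{
  return cond ? true : els;
}
```

Folding the whole expression to the constant 1 differs from both functions whenever `cond` is false and the else value is 0. That is the case the re-introduced COND_EXPR preserves.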
2025-01-13RISC-V: Disallow negative step for interleaving [PR117682]Robin Dapp1-0/+15
Hi, in PR117682 we build an interleaving pattern { 1, 201, 209, 25, 161, 105, 113, 185, 65, 9, 17, 89, 225, 169, 177, 249, 129, 73, 81, 153, 33, 233, 241, 57, 193, 137, 145, 217, 97, 41, 49, 121 }; with negative step expecting wraparound semantics due to -fwrapv. For building interleaved patterns we have an optimization that does e.g. {1, 209, ...} = { 1, 0, 209, 0, ...} and {201, 25, ...} >> 8 = { 0, 201, 0, 25, ...} and IORs those. The optimization only works if the lowpart bits are zero. When overflowing e.g. with a negative step we cannot guarantee this. This patch makes us fall back to the generic merge handling for negative steps. I'm not 100% certain we're good even for positive steps. If the step or the vector length is large enough we'd still overflow and have non-zero lower bits. I haven't seen this happen during my testing, though and the patch doesn't make things worse, so... Regtested on rv64gcv_zvl512b. Let's see what the CI says. Regards Robin PR target/117682 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Fall back to merging if either step is negative. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr117682.c: New test.
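The failure mode can be reproduced with a scalar sketch of one 16-bit lane holding two interleaved 8-bit elements. Both function names are hypothetical and this is not the code in expand_const_vector. The steps -48 and 80 used in the note below are inferred from the first elements of the pattern above (1, 209, 161, ... and 201, 25, 105, ...), taken modulo 256.

```c
#include <stdint.h>

/* Hypothetical model of the IOR-based interleave: series 1 is computed
   directly in the wide 16-bit mode, series 2 is moved into the upper
   byte, and the two are ORed together.  */
uint16_t
interleave_lane_ior (int base1, int step1, int base2, int step2, int i)
{
  uint16_t lo = (uint16_t) (base1 + i * step1);
  uint16_t hi = (uint16_t) ((uint16_t) (base2 + i * step2) << 8);
  return lo | hi;
}

/* Expected lane: each series is truncated to 8 bits before packing.  */
uint16_t
interleave_lane_expected (int base1, int step1, int base2, int step2, int i)
{
  uint8_t lo = (uint8_t) (base1 + i * step1);
  uint8_t hi = (uint8_t) (base2 + i * step2);
  return (uint16_t) (lo | (hi << 8));
}
```

For `base1 = 1, step1 = -48, base2 = 201, step2 = 80` and `i = 1`, the expected lane is 0x19D1 (bytes 209 and 25), while the IOR variant yields 0xFFD1 because the negative series wraps in 16 bits and sets all upper bits.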
2025-01-13RISC-V: testsuite: Skip test with -fltoRobin Dapp8-14/+16
Hi, the zbb-rol-ror and stack_save_restore tests use the -fno-lto option and scan the final assembly. For an invocation like -flto ... -fno-lto the output file we scan is still something like zbb-rol-ror-09.ltrans0.ltrans.s. Therefore skip the tests when "-flto" is present. This gets rid of a few UNRESOLVED tests. Regtested on rv64gcv_zvl512b. Going to push if the CI agrees. Regards Robin gcc/testsuite/ChangeLog: * gcc.target/riscv/stack_save_restore_1.c: Skip for -flto. * gcc.target/riscv/stack_save_restore_2.c: Ditto. * gcc.target/riscv/zbb-rol-ror-04.c: Ditto. * gcc.target/riscv/zbb-rol-ror-05.c: Ditto. * gcc.target/riscv/zbb-rol-ror-06.c: Ditto. * gcc.target/riscv/zbb-rol-ror-07.c: Ditto. * gcc.target/riscv/zbb-rol-ror-08.c: Ditto. * gcc.target/riscv/zbb-rol-ror-09.c: Ditto.
2025-01-13RISC-V: Remove zba check in bitwise and ashift reassociation [PR 115921]Xi Ruoyao1-0/+9
The test case

  long
  test (long x, long y)
  {
    return ((x | 0x1ff) << 3) + y;
  }

is now compiled (-O2 -march=rv64g_zba) to

  li a4,4096
  slliw a5,a0,3
  addi a4,a4,-8
  or a5,a5,a4
  addw a0,a5,a1
  ret

Although this check was originally intended to make better use of zba, removing it now actually enables the use of zba for this test case (thanks to late combine):

  ori a5,a0,511
  sh3add a0,a5,a1
  ret

Obviously, bitmanip.md does not cover (any_or (ashift (reg) (imm123)) imm) at all, and even for and it just seems more natural to split to (ashift (and (reg) (imm')) (imm123)) first, then let late combine combine the outer ashift and the plus.

I've not found any test case regressed by the removal. And "make check-gcc RUNTESTFLAGS=riscv.exp='zba-*.c'" also reports no failure.

gcc/ChangeLog:
PR target/115921
* config/riscv/riscv.md (<optab>_shift_reverse): Remove check for TARGET_ZBA.

gcc/testsuite/ChangeLog:
PR target/115921
* gcc.target/riscv/zba-shNadd-08.c: New test.
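The two instruction sequences correspond to two algebraically equivalent forms of the expression. The sketch below uses hypothetical function names and unsigned arithmetic to avoid signed-overflow concerns. It spells out the reassociation, with the constant `0x1ff << 3 = 0xff8 = 4096 - 8` matching the `li`/`addi` pair in the first sequence.

```c
/* OR first, then shift: the shape that maps onto ori + sh3add.  */
unsigned long
or_then_shift (unsigned long x, unsigned long y)
{
  return ((x | 0x1ff) << 3) + y;
}

/* Shift first, then OR with the pre-shifted constant 0xff8: the
   reassociated shape seen in the longer sequence.  */
unsigned long
shift_then_or (unsigned long x, unsigned long y)
{
  return ((x << 3) | 0xff8) + y;
}
```

Both functions return the same value for every input, since a left shift distributes over a bitwise OR.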
2025-01-13RISC-V: Fix the result error caused by not updating ratio when using "use_max_sew" to merge vsetvlJin Ma1-0/+14
When the vsetvl instructions of two RVV instructions are merged using "use_max_sew", the sew of prev may be updated if prev.sew < next.sew while the original ratio is kept, which is wrong. When subsequent instructions then match the wrong ratio, a wrong "vsetvli zero,zero" instruction may be generated, which leads to an unknown avl. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (demand_system::use_max_sew): Also set the ratio for PREV. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-10.c: New test.