Age | Commit message (Collapse) | Author | Files | Lines |
|
For below pattern, can be treated as a simple move because floating point
and vector share a common register on loongarch64.
(set (reg/v:SF 32 $f0 [orig:93 res ] [93])
(vec_select:SF (reg:V8SF 32 $f0 [115])
(parallel [
(const_int 0 [0])
])))
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_extract<mode>_0):
New define_insn_and_split patten.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-extract.c: New test.
|
|
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0, for a short-circuit branch, use the
short-circuit operation instead of the non-short-circuit operation.
SPEC2017 performance evaluation shows 1% performance improvement for fprate
GEOMEAN and no obvious regression for others. Especially, 526.blender_r +10.6%
on 3A6000.
This modification will introduce the following FAIL items:
FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Conditional combines static and invariant" 1
FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Will duplicate bb" 2
FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid sum" 0
gcc/ChangeLog:
* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/short-circuit.c: New test.
|
|
approximate division.
We found that in the spec17 521.wrf program, some loop invariant code generated
from single-precision floating-point approximate division calculation failed to
propose a loop. This is because the pseudo-register that stores the
intermediate temporary calculation results is rewritten in the implementation
of single-precision floating-point approximate division, failing to propose
invariants in the loop2_invariant pass. To this end, the intermediate temporary
calculation results are stored in new pseudo-registers without destroying the
read-write dependency, so that they could be recognized as loop invariants in
the loop2_invariant pass.
After optimization, the number of instructions of 521.wrf is reduced by 0.18%
compared with before optimization (1716612948501 -> 1713471771364).
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/invariant-recip.c: New test.
|
|
This patch fixes the recent noticed bug in RV32 glibc.
We incorrectly deleted a vsetvl:
...
and a4,a4,a3
vmv.v.i v1,0 ---> Missed vsetvl cause illegal instruction report.
vse8.v v1,0(a5)
The root cause the laterin in LCM is incorrect.
BB 358:
avloc: n_bits = 2, set = {}
kill: n_bits = 2, set = {}
antloc: n_bits = 2, set = {}
transp: n_bits = 2, set = {}
avin: n_bits = 2, set = {}
avout: n_bits = 2, set = {}
del: n_bits = 2, set = {}
cause LCM let BB 360 delete the vsetvl:
BB 360:
avloc: n_bits = 2, set = {}
kill: n_bits = 2, set = {}
antloc: n_bits = 2, set = {}
transp: n_bits = 2, set = {0 1 }
avin: n_bits = 2, set = {}
avout: n_bits = 2, set = {}
del: n_bits = 2, set = {1}
Also, remove unknown vsetvl info into local computation since it is unnecessary.
Tested on both RV32/RV64 no regression.
PR target/113469
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Fix bug.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113469.c: New test.
|
|
The problem here is we don't check the return value of exact_log2
and always use that result as shifter. This fixes the issue by avoiding
the shift if the value was `-1` (which means the value was not exact a power of 2);
in this case we could either check if the values was equal to -1 or not equal to because
we then assign -1 to shift if the constant value was not equal. I chose `!=` as
it seemed to be more obvious of what the code is doing.
Committed as obvious after a build/test for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/100212
* config/aarch64/aarch64.cc (aarch64_classify_index): Avoid
undefined shift after the call to exact_log2.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The J constraint can invoke undefined behavior due to it taking the
negative of the ival if ival was HWI_MIN. The fix is simple as casting
to `unsigned HOST_WIDE_INT` before doing the negative of it. This
does that.
Committed as obvious after build/test for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/100204
* config/aarch64/constraints.md (J): Cast to `unsigned HOST_WIDE_INT`
before taking the negative of it.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
gcc/
PR target/113601
* config/avr/avr-mcus.def (atmega3208, atmega3209): Fix data_section_start.
|
|
Recent commit introduced a conditional branch in eh_return epilogues
that is not compatible with speculation tracking:
commit 426fddcbdad6746fe70e031f707fb07f55dfb405
Author: Szabolcs Nagy <szabolcs.nagy@arm.com>
CommitDate: 2023-11-27 15:52:48 +0000
aarch64: Use br instead of ret for eh_return
Refactor the compare zero and jump pattern and use it to fix the issue.
gcc/ChangeLog:
PR target/112987
* config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): New.
(aarch64_expand_epilogue): Use the new function.
(aarch64_split_compare_and_swap): Likewise.
(aarch64_split_atomic_op): Likewise.
|
|
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md
Contributors:
Mary Bennett <mary.bennett@embecosm.com>
Nandni Jamnadas <nandni.jamnadas@embecosm.com>
Pietra Ferreira <pietra.ferreira@embecosm.com>
Charlie Keaney
Jessica Mills
Craig Blackmore <craig.blackmore@embecosm.com>
Simon Cook <simon.cook@embecosm.com>
Jeremy Bennett <jeremy.bennett@embecosm.com>
Helene Chelin <helene.chelin@embecosm.com>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add XCVbitmanip.
* config/riscv/constraints.md: Likewise.
* config/riscv/corev.def: Likewise.
* config/riscv/corev.md: Likewise.
* config/riscv/predicates.md: Likewise.
* config/riscv/riscv-builtins.cc (AVAIL): Likewise.
* config/riscv/riscv-ftypes.def: Likewise.
* config/riscv/riscv.opt: Likewise.
* config/riscv/riscv.cc (riscv_print_operand): Add new operand 'Y'.
* doc/extend.texi: Add XCVbitmanip builtin documentation.
* doc/sourcebuild.texi: Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cv-simd-abs-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-abs-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-div2-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-div4-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-div8-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-add-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-and-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-and-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-and-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-and-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avg-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avg-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avg-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avg-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avgu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avgu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avgu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-avgu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpeq-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpeq-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpeq-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpeq-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpge-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpge-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpge-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpge-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgeu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgeu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgeu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgeu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgt-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgt-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgt-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgt-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgtu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgtu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgtu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpgtu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmple-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmple-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmple-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmple-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpleu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpleu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpleu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpleu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmplt-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmplt-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmplt-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmplt-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpltu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpltu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpltu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpltu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpne-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpne-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpne-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cmpne-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxconj-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-i-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-i-div2-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-i-div4-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-i-div8-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-r-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-r-div2-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-r-div4-compile-1.c: New test.
* gcc.target/riscv/cv-simd-cplxmul-r-div8-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotsp-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotsp-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotsp-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotsp-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotup-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotup-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotup-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotup-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotusp-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotusp-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotusp-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-dotusp-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-extract-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-extract-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-extractu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-extractu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-insert-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-insert-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-march-compile-1.c: New test.
* gcc.target/riscv/cv-simd-max-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-max-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-max-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-max-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-maxu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-maxu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-maxu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-maxu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-min-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-min-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-min-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-min-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-minu-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-minu-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-minu-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-minu-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-neg-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-neg-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-or-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-or-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-or-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-or-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-pack-compile-1.c: New test.
* gcc.target/riscv/cv-simd-pack-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-packhi-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-packlo-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotsp-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotsp-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotsp-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotsp-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotup-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotup-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotup-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotup-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotusp-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotusp-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotusp-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sdotusp-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shuffle-sci-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shuffle2-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shuffle2-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shufflei0-sci-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shufflei1-sci-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shufflei2-sci-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-shufflei3-sci-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sll-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sll-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sll-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sll-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sra-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sra-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sra-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sra-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-srl-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-srl-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-srl-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-srl-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-div2-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-div4-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-div8-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-sub-sc-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-subrotmj-compile-1.c: New test.
* gcc.target/riscv/cv-simd-subrotmj-div2-compile-1.c: New test.
* gcc.target/riscv/cv-simd-subrotmj-div4-compile-1.c: New test.
* gcc.target/riscv/cv-simd-subrotmj-div8-compile-1.c: New test.
* gcc.target/riscv/cv-simd-xor-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-xor-h-compile-1.c: New test.
* gcc.target/riscv/cv-simd-xor-sc-b-compile-1.c: New test.
* gcc.target/riscv/cv-simd-xor-sc-h-compile-1.c: New test.
* lib/target-supports.exp: Add proc for XCVsimd extension.
|
|
gcc/
* config/gcn/gcn-hsa.h (ASM_SPEC): Add space after -mxnack= argument.
|
|
Also adjust some of the tests for scan-assembly. The behavior is the
same as --param=riscv-vector-abi before.
gcc/ChangeLog:
PR target/113538
* config/riscv/riscv.cc (riscv_get_arg_info): Remove the flag.
(riscv_fntype_abi): Ditto.
* config/riscv/riscv.opt: Ditto.
gcc/testsuite/ChangeLog:
PR target/113538
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Fix the asm
check.
* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-return.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-69.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-70.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-71.c: Ditto.
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv32_vadd.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv32_vfadd.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv32_vget_vset.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv32_vloxseg2ei16.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv32_vreinterpret.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv64_vadd.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv64_vfadd.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv64_vget_vset.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv64_vloxseg2ei16.c: Ditto.
* gcc.target/riscv/rvv/base/overloaded_rv64_vreinterpret.c: Ditto.
* gcc.target/riscv/rvv/base/spill-10.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.
* gcc.target/riscv/rvv/base/tuple_vundefined.c: Ditto.
* gcc.target/riscv/rvv/base/vcreate.c: Ditto.
* gcc.target/riscv/rvv/base/vlmul_ext-1.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Ditto.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
* lib/target-supports.exp: Remove the flag.
Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
|
|
When generalising vector_cst_all_same, I'd forgotten to update
VECTOR_CST_ENCODED_ELT to VECTOR_CST_ELT. The check deliberately
looks at implicitly encoded elements in some cases.
gcc/
PR target/113572
* config/aarch64/aarch64-sve-builtins.cc (vector_cst_all_same):
Check VECTOR_CST_ELT instead of VECTOR_CST_ENCODED_ELT
gcc/testsuite/
PR target/113572
* gcc.target/aarch64/sve/pr113572.c: New test.
|
|
The LS64 movv8di pattern didn't handle loads that overlapped with
the address register (unless the overlap happened to be in the
last subload).
gcc/
PR target/113550
* config/aarch64/aarch64-simd.md: In the movv8di splitter, check
whether each split instruction is a load that clobbers the source
address. Emit that instruction last if so.
gcc/testsuite/
PR target/113550
* gcc.target/aarch64/pr113550.c: New test.
|
|
g:74e3e839ab2d36841320 handled the UXTL{,2}-ZIP[12] optimisation
in split1. The UXTL input is a 64-bit vector of N-bit elements
and the result is a 128-bit vector of 2N-bit elements. The
corresponding ZIP1 operates on 128-bit vectors of N-bit elements.
This meant that the ZIP1 input had to be a 128-bit paradoxical subreg
of the 64-bit UXTL input. In the PRs, it wasn't possible to generate
this subreg because the inputs were already subregs of a x[234]
structure of 64-bit vectors.
I don't think the same thing can happen for UXTL2->ZIP2 because
UXTL2 input is a 128-bit vector rather than a 64-bit vector.
It isn't really necessary for ZIP1 to take 128-bit inputs,
since the upper 64 bits are ignored. This patch therefore adds
a pattern for 64-bit → 128-bit ZIP1s.
In principle, we should probably use this form for all ZIP1s.
But in practice, that creates an awkward special case, and
would be quite invasive for stage 4.
gcc/
PR target/113485
* config/aarch64/aarch64-simd.md (aarch64_zip1<mode>_low): New
pattern.
(<optab><Vnarrowq><mode>2): Use it instead of generating a
paradoxical subreg for the input.
gcc/testsuite/
PR target/113485
* gcc.target/aarch64/pr113485.c: New test.
* gcc.target/aarch64/pr113573.c: Likewise.
|
|
While looking into PR113469, I notice the LCM delete a vsetvl incorrectly.
This patch add dump information of all predecessors for LCM delete vsetvl block
for better debugging.
Tested no regression.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): New function.
(pre_vsetvl::pre_global_vsetvl_info): Add LCM delete block all
predecessors dump information.
|
|
Notice full available is computed evey round of earliest fusion which is redundant.
Actually we only need to compute it once in phase 3.
It's NFC patch and tested no regression. Committed.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_vsetvl_def_data): Remove
redundant full available computation.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.
|
|
This patch adds no fusion compile option to disable phase 2 global fusion.
It can help us to analyze the compile-time and debugging.
Committed.
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum vsetvl_strategy_enum): Add optim-no-fusion option.
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::lazy_vsetvl): Ditto.
(pass_vsetvl::execute): Ditto.
* config/riscv/riscv.opt: Ditto.
|
|
It is incorrect to use vld/vori to implement the vec_concatz<mode> because when the LSX
instruction is used to update the value of the vector register, the upper 128 bits of
the vector register will not be zeroed.
gcc/ChangeLog:
* config/loongarch/lasx.md (@vec_concatz<mode>): Remove this define_insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): Use vec_concat<mode>.
|
|
TLS gd ld and ie type symbols will generate corresponding GOT entries,
so non-zero offsets cannot be generated.
The address of TLS le type symbol+addend is not implemented in binutils,
so non-zero offset is not generated here for the time being.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For symbols of type tls, non-zero Offset is not generated.
|
|
gcc/
* config/rs6000/rs6000-string.cc (expand_block_compare): Enable
P9 with m32 and mpowerpc64.
gcc/testsuite/
* gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64.
* gcc.target/powerpc/block-cmp-4.c: Likewise.
* gcc.target/powerpc/block-cmp-8.c: New.
|
|
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_option_override_internal):
Enable -mlam=u57 by default when compiled with
-fsanitize=hwaddress.
|
|
The problem here is the builtin apply mechanism thinks the FP registers
are to be used due to get_raw_arg_mode not returning VOIDmode. This
fixes that oversight and the backend now returns VOIDmode for non-general-regs
if TARGET_GENERAL_REGS_ONLY is true.
Built and tested for aarch64-linux-gnu with no regressions.
PR target/113486
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_get_reg_raw_mode): For
TARGET_GENERAL_REGS_ONLY, return VOIDmode for non-GP_REGNUM_P regno.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/builtin_apply-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Since the match.pd transforms (zero_one == 0) ? y : z <op> y,
into ((typeof(y))zero_one * z) <op> y. Add splitters to recongize
this expression to generate SFB instructions.
gcc/ChangeLog:
PR target/113095
* config/riscv/sfb.md: New splitters to rewrite single bit
sign extension as the condition to SFB instructions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sfb.c: New test.
* gcc.target/riscv/pr113095.c: New test.
|
|
As suggested in the ticket this replaces the expansion by converting the
Advanced SIMD types to SVE types by simply printing out an SVE register for
these instructions.
This fixes the subreg issues since there are no subregs involved anymore.
gcc/ChangeLog:
PR target/109636
* config/aarch64/aarch64-simd.md (<su_optab>div<mode>3,
mulv2di3): Remove.
* config/aarch64/iterators.md (VQDIV): Remove.
(SVE_FULL_SDI_SIMD, SVE_FULL_HSDI_SIMD_DI,
SVE_I_SIMD_DI): New.
(VPRED, sve_lane_con): Add V4SI and V2DI.
* config/aarch64/aarch64-sve.md (<optab><mode>3,
@aarch64_pred_<optab><mode>): Support Advanced SIMD types.
(mul<mode>3): New, split from <optab><mode>3.
(@aarch64_pred_<optab><mode>, *post_ra_<optab><mode>3): New.
* config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_<mode>,
*aarch64_mul_unpredicated_<mode>): Change SVE_FULL_HSDI to
SVE_FULL_HSDI_SIMD_DI.
gcc/testsuite/ChangeLog:
PR target/109636
* gcc.target/aarch64/sve/pr109636_1.c: New test.
* gcc.target/aarch64/sve/pr109636_2.c: New test.
* gcc.target/aarch64/sve2/pr109636_1.c: New test.
|
|
The AArch64 vector PCS does not allow simd calls with simdlen 1,
however due to a bug we currently do allow it for num == 0.
This causes us to emit a symbol that doesn't exist and we fail to link.
gcc/ChangeLog:
PR tree-optimization/113552
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Block simdlen 1.
gcc/testsuite/ChangeLog:
PR tree-optimization/113552
* gcc.target/aarch64/pr113552.c: New test.
* gcc.target/aarch64/simd_pcs_attribute-3.c: Remove bogus check.
|
|
GAS introduced explicit relocs since 2001, and %pcrel_hi/low were
introduced in 2014. In future, we may introduce more.
Let's convert -mexplicit-relocs option, and accpet options:
none, base, pcrel.
We also update gcc/configure.ac to set the value to option
the gas support when GCC itself is built.
gcc
* configure.ac: Detect the explicit relocs support for
mips, and define C macro MIPS_EXPLICIT_RELOCS.
* config.in: Regenerated.
* configure: Regenerated.
* doc/invoke.texi(MIPS Options): Add -mexplicit-relocs.
* config/mips/mips-opts.h: Define enum mips_explicit_relocs.
* config/mips/mips.cc(mips_set_compression_mode): Sorry if
!TARGET_EXPLICIT_RELOCS instead of just set it.
* config/mips/mips.h: Define TARGET_EXPLICIT_RELOCS and
TARGET_EXPLICIT_RELOCS_PCREL with mips_opt_explicit_relocs.
* config/mips/mips.opt: Introduce -mexplicit-relocs= option
and define -m(no-)explicit-relocs as aliases.
|
|
Since, to the best of my knowledge, all reported regressions related to
the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with
--enable-languages=all is working again with the passes enabled, this
patch turns the passes back on by default, as agreed with Jakub here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html
gcc/ChangeLog:
* config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
to 1.
(-mlate-ldp-fusion): Likewise.
|
|
SPEC 2017 wrf benchmark expose unreasonble memory usage of VSETVL PASS
that is, VSETVL PASS consume over 33 GB memory which make use impossible
to compile SPEC 2017 wrf in a laptop.
The root cause is wasting-memory variables:
unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);
I find that compute_avl_def_data can be achieved by RTL_SSA framework.
Replace the code implementation base on RTL_SSA framework.
After this patch, the memory-hog issue is fixed.
simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
is 1.673 GB.
lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
is 2.441 GB.
Tested on both RV32 and RV64, no regression.
gcc/ChangeLog:
PR target/113495
* config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
(get_regno): Ditto.
(get_bb_index): Ditto.
(pre_vsetvl::compute_avl_def_data): Ditto.
(pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.
gcc/testsuite/ChangeLog:
PR target/113495
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.
|
|
As the PR shows, we were missing code to update debug uses in the
load/store pair fusion pass. This patch fixes that.
The patch tries to give a complete treatment of the debug uses that will
be affected by the changes we make, and in particular makes an effort to
preserve debug info where possible, e.g. when re-ordering an update of
a base register by a constant over a debug use of that register. When
re-ordering loads over a debug use of a transfer register, we reset the
debug insn. Likewise when re-ordering stores over debug uses of mem.
While doing this I noticed that try_promote_writeback used a strange
choice of move_range for the pair insn, in that it chose the previous
nondebug insn instead of the insn itself. Since the insn is being
changed, these move ranges are equivalent (at least in terms of nondebug
insn placement as far as RTL-SSA is concerned), but I think it is more
natural to choose the pair insn itself. This is needed to avoid
incorrectly updating some debug uses.
gcc/ChangeLog:
PR target/113089
* config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New.
(fixup_debug_use): New.
(fixup_debug_uses_trailing_add): New.
(fixup_debug_uses): New. Use it ...
(ldp_bb_info::fuse_pair): ... here.
(try_promote_writeback): Call fixup_debug_uses_trailing_add to
fix up debug uses of the base register that are affected by
folding in the trailing add insn.
gcc/testsuite/ChangeLog:
PR target/113089
* gcc.c-torture/compile/pr113089.c: New test.
|
|
While working on PR113089, I realised we where missing code to re-parent
trailing nondebug uses of the base register in the case of cancelling
writeback in the load/store pair pass. This patch fixes that.
gcc/ChangeLog:
PR target/113089
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Update trailing nondebug uses of the base register in the case
of cancelling writeback.
|
|
For the testcase in the PR, we try to pair insns where the first has
writeback and the second uses the updated base register. This causes us
to record a hazard against the second insn, thus narrowing the move
range away from the end of the BB.
However, it isn't meaningful to record hazards against the other insn
in the pair, as this doesn't change which pairs can be formed, and also
doesn't change where the pair is formed (from the perspective of
nondebug insns).
To see why this is the case, consider the two cases:
- Suppoe we are finding hazards for insns[0]. If we record a hazard
against insns[1], then range.last becomes
insns[1]->prev_nondebug_insn (), but note that this is equivalent to
inserting after insns[1] (since insns[1] is being changed).
- Now consider finding hazards for insns[1]. Suppose we record
insns[0] as a hazard. Then we set range.first = insns[0], which is a
no-op.
As such, it seems better to never record hazards against the other insn
in the pair, as we check whether the insns themselves are suitable for
combination separately (e.g. for ldp checking that they use distinct
transfer registers). Avoiding unnecessarily narrowing the move range
avoids unnecessarily re-ordering over debug insns.
This should also mean that we can only narrow the move range away from
the end of the BB in the case that we record a hazard for insns[0]
against insns[1]->prev_nondebug_insn () or earlier. This means that for
the non-call-exceptions case, either the move range includes insns[1],
or we reject the pair (thus the assert tripped in the PR should always
hold).
gcc/ChangeLog:
PR target/113356
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::try_fuse_pair):
Don't record hazards against the opposite insn in the pair.
gcc/testsuite/ChangeLog:
PR target/113356
* gcc.target/aarch64/pr113356.C: New test.
|
|
As the PR shows (specifically #c7) we are missing updating uses of mem
when inserting an stp in the aarch64 load/store pair fusion pass. This
patch fixes that.
RTL-SSA has a simple view of memory and by default doesn't allow stores
to be re-ordered w.r.t. other stores. In the ldp fusion pass, we do our
own alias analysis and so can re-order stores over other accesses when
we deem this is safe. If neither store can be re-purposed (moved into
the required position to form the stp while respecting the RTL-SSA
constraints), then we turn both the candidate stores into "tombstone"
insns (logically delete them) and insert a new stp insn.
As it stands, we implement the insert case separately (after dealing
with the candidate stores) in fuse_pair by inserting into the middle of
the vector of changes. This is OK when we only have to insert one
change, but with this fix we would need to insert the change for the new
stp plus multiple changes to fix up uses of mem (note the number of
fix-ups is naturally bounded by the alias limit param to prevent
quadratic behaviour). If we kept the code structured as is and inserted
into the middle of the vector, that would lead to repeated moving of
elements in the vector which seems inefficient. The structure of the
code would also be a little unwieldy.
To improve on that situation, this patch introduces a helper class,
stp_change_builder, which implements a state machine that helps to build
the required changes directly in program order. That state machine is
reponsible for deciding what changes need to be made in what order, and
the code in fuse_pair then simply follows those steps.
Together with the fix in the previous patch for installing new defs
correctly in RTL-SSA, this fixes PR113070.
We take the opportunity to rename the function decide_stp_strategy to
try_repurpose_store, as that seems more descriptive of what it actually
does, since stp_change_builder is now responsible for the overall change
strategy.
gcc/ChangeLog:
PR target/113070
* config/aarch64/aarch64-ldp-fusion.cc
(struct stp_change_builder): New.
(decide_stp_strategy): Reanme to ...
(try_repurpose_store): ... this.
(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
construct stp changes. Fix up uses when inserting new stp insns.
|
|
Since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b
there is a warning
../../gcc/config/ia64/ia64.cc: In function ‘void ia64_start_function(FILE*, const char*, tree)’:
../../gcc/config/ia64/ia64.cc:3889:59: warning: unused parameter ‘decl’ [-Wunused-parameter]
3889 | ia64_start_function (FILE *file, const char *fnname, tree decl)
| ~~~~~^~~~
which presumably for bootstraps breaks the bootstrap.
While the decl parameter is passed to the ASM_OUTPUT_FUNCTION_LABEL macro,
that macro actually doesn't use that argument, so the removal of
ATTRIBUTE_UNUSED was incorrect.
This patch reverts the first ia64.cc hunk from r14-6945.
2024-01-23 Jeff Law <jlaw@ventanamicro.com>
Jakub Jelinek <jakub@redhat.com>
* config/ia64/ia64.cc (ia64_start_function): Add ATTRIBUTE_UNUSED to
decl.
|
|
In the original fix for this PR, I'd made sure that
including <arm_sme.h> didn't reach the final return in
simulate_builtin_function_decl (which would indicate duplicate
function definitions). But it seems I forgot to do the same
thing for C++, which defines all of its overloads directly.
This patch fixes a case where we still recorded duplicate
functions for C++. Thanks to Iain for reporting the resulting
GC ICE and for help with reproducing it.
gcc/
PR target/112989
* config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Skip
MODE_single variants of functions that don't take tuple arguments.
|
|
The PR shows two different cases where try_promote_writeback produces an
RTL pattern which isn't recognized. Currently this leads to an ICE, as
we assert recog success, but I think it's better just to back out of the
changes gracefully if recog fails (as we do in the main fuse_pair case).
In theory since we check the ranges here recog shouldn't fail (which is
why I had the assert in the first place), but the PR shows an edge case
in the patterns where if we form a pre-writeback pair where the
writeback offset is exactly -S, where S is the size in bytes of one
transfer register, we fail to match the expected pattern as the patterns
look explicitly for plus operands in the mems. I think fixing this
would require adding at least four new special-case patterns to
aarch64.md for what doesn't seem to be a particularly useful variant of
the insns. Even if we were to do that, I think it would be GCC 15
material, and it's better to just punt for GCC 14.
The ILP32 case in the PR is a bit different, as that shows us trying to
combine a pair with DImode base register operands in the mems together
with an SImode trailing update of the base register. This leads to us
forming an RTL pattern which references the base register in both SImode
and DImode, which also fails to recog. Again, I think it's best just to
take the missed optimization for now. If we really want to make this
(try_promote_writeback) work for ILP32, we can try to do it for GCC 15.
gcc/ChangeLog:
PR target/113114
* config/aarch64/aarch64-ldp-fusion.cc (try_promote_writeback):
Don't assert recog success, just punt if the writeback pair
isn't recognized.
gcc/testsuite/ChangeLog:
PR target/113114
* gcc.c-torture/compile/pr113114.c: New test.
* gcc.target/aarch64/pr113114.c: New test.
|
|
I see
../../gcc/config/gcn/gcn.cc: In function ‘void gcn_hsa_declare_function_name(FILE*, const char*, tree)’:
../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’ [-Wunused-parameter]
6568 | gcn_hsa_declare_function_name (FILE *file, const char *name, tree decl)
| ~~~~~^~~~
warning presumably since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b
Previously, the argument was anonymous, but now it is passed to a macro
which ignores it, so I think we should go with ATTRIBUTE_UNUSED.
2024-01-23 Jakub Jelinek <jakub@redhat.com>
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Add
ATTRIBUTE_UNUSED to decl.
|
|
Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
(loongarch_call_tls_get_addr): Do not split symbols of
SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
EXPLICIT_RELOCS_AUTO.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
for la.tls.ld and la.tls.gd.
|
|
So the problem here is the 2 functions check_cpu and check_arch use
the wrong variable to check if an alias is valid for that cpu/arch.
check_cpu uses cpu_optaliases instead of cpu_opt_alias. cpu_optaliases
is an array of index'ed by the cpuname that contains all of the valid aliases
for that cpu but cpu_opt_alias is an double index array which is index'ed
by cpuname and the alias which provides what is the alias for that option.
Similar thing happens for check_arch and arch_optaliases vs arch_optaliases.
Tested by running:
```
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+simd" config/arm/arm-cpus.in
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon" config/arm/arm-cpus.in
awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon-vfpv3" config/arm/arm-cpus.in
```
And they don't return error back.
gcc/ChangeLog:
PR target/113030
* config/arm/parsecpu.awk (check_cpu): Use cpu_opt_alias
instead of cpu_optaliases.
(check_arch): Use arch_opt_alias instead of arch_optaliases.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Notice there is a AI benchmark, GCC vs Clang has 3% performance drop.
It's because Clang/LLVM has a simplification transform vmv.v.x (avl = 1) into vmv.s.x.
Since vmv.s.x has more flexible vsetvl demand than vmv.v.x that can allow us to have
better chances to fuse vsetvl.
Consider this following case:
void
foo (uint32_t *outputMat, uint32_t *inputMat)
{
vuint32m1_t matRegIn0 = __riscv_vle32_v_u32m1 (inputMat, 4);
vuint32m1_t matRegIn1 = __riscv_vle32_v_u32m1 (inputMat + 4, 4);
vuint32m1_t matRegIn2 = __riscv_vle32_v_u32m1 (inputMat + 8, 4);
vuint32m1_t matRegIn3 = __riscv_vle32_v_u32m1 (inputMat + 12, 4);
vbool32_t oddMask
= __riscv_vreinterpret_v_u32m1_b32 (__riscv_vmv_v_x_u32m1 (0xaaaa, 1));
vuint32m1_t smallTransposeMat0
= __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn0, matRegIn1, 1, 4);
vuint32m1_t smallTransposeMat2
= __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn2, matRegIn3, 1, 4);
vuint32m1_t outMat0 = __riscv_vslideup_vx_u32m1_tu (smallTransposeMat0,
smallTransposeMat2, 2, 4);
__riscv_vse32_v_u32m1 (outputMat, outMat0, 4);
}
Before this patch:
vsetivli zero,4,e32,m1,ta,ma
li a5,45056
addi a2,a1,16
addi a3,a1,32
addi a4,a1,48
vle32.v v1,0(a1)
vle32.v v4,0(a2)
vle32.v v2,0(a3)
vle32.v v3,0(a4)
addiw a5,a5,-1366
vsetivli zero,1,e32,m1,ta,ma
vmv.v.x v0,a5 ---> Since it avl = 1, we can transform it into vmv.s.x
vsetivli zero,4,e32,m1,tu,mu
vslideup.vi v1,v4,1,v0.t
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret
After this patch:
li a5,45056
addi a2,a1,16
vsetivli zero,4,e32,m1,tu,mu
addiw a5,a5,-1366
vle32.v v3,0(a2)
addi a3,a1,32
addi a4,a1,48
vle32.v v1,0(a1)
vmv.s.x v0,a5
vle32.v v2,0(a3)
vslideup.vi v1,v3,1,v0.t
vle32.v v3,0(a4)
vslideup.vi v2,v3,1,v0.t
vslideup.vi v1,v2,2
vse32.v v1,0(a0)
ret
Tested on both RV32 and RV64 no regression.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (splat_to_scalar_move_p): New function.
* config/riscv/riscv-v.cc (splat_to_scalar_move_p): Ditto.
* config/riscv/vector.md: Simplify vmv.v.x. into vmv.s.x.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/attribute-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/attribute-3.c: New test.
|
|
This patch fixes the recent regression:
FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c -Os (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c -Os (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c -O2 (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c -O3 -g (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c -O3 -g (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c -Os (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c -Os (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -O1 (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -O2 (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -O2 (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -O3 -g (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -O3 -g (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c -Os (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c -Os (test for excess errors)
due to commit 86de9b66480b710202a2898cf513db105d8c432f.
The root cause is register_operand and reg_or_subregno are consistent so we reach the assertion fail.
We shouldn't worry about subreg:...VL_REGNUM since it's impossible that we can have such situation,
that is, we only have (set (reg) (reg:VL_REGNUM)) which generate "csrr vl" ASM for first fault load instructions (vleff).
So, using REG_P and REGNO must be totally solid and robostic.
Since we don't allow VL_RENUM involved into register allocation and we don't have such constraint, we always use this
following pattern to generate "csrr vl" ASM:
(define_insn "read_vlsi"
[(set (match_operand:SI 0 "register_operand" "=r")
(reg:SI VL_REGNUM))]
"TARGET_VECTOR"
"csrr\t%0,vl"
[(set_attr "type" "rdvl")
(set_attr "mode" "SI")])
So the check in riscv.md is to disallow such situation fall into move pattern in riscv.md
Tested on both RV32/RV64 no regression.
PR target/109092
gcc/ChangeLog:
* config/riscv/riscv.md: Use reg instead of subreg.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr109092.c: New test.
|
|
diagnostic [PR111966]
With debugging enabled, '*.mkoffload.dbg.o' files are generated. The e_flags
header of all *.o files must be the same - otherwise, the linker complains.
Since r14-4734-g56ed1055b2f40ac162ae8d382280ac07a33f789f the -march= default
is now gfx900. If compiling without any -march= flag, the default value is
used by the compiler but not passed to mkoffload. Hence, mkoffload.cc's uses
its own default for march - unfortunately, it still had gfx803/fiji as default,
leading to the linker error: 'incompatible mach'. Solution: Update the
default to gfx900.
While debugging it, I saw that /tmp/cc*.mkoffload.dbg.o kept accumulating;
there were a couple of issues with the handling:
* dbgobj was always added to files_to_cleanup
* If copy_early_debug_info returned true, dbgobj was added again
-> pointless and in theory a race if the same file was added in the
faction of a second.
* If copy_early_debug_info returned false,
- In exactly one case, it already deleted the file it self
(same potential race as above)
- The pointer dbgobj was freed - such that files_to_cleanup contained
a dangling pointer - probably the reason that stale files remained.
Solution: Only if copy_early_debug_info returns true, dbgobj is added to
files_to_cleanup. If it returns false, the file is unlinked before freeing
the pointer.
When compiling, GCC warned about several fatal_error messages as having
no %<...%> or %qs quotes. This patch now silences several of those warnings
by using those quotes.
gcc/ChangeLog:
PR other/111966
* config/gcn/mkoffload.cc (elf_arch): Change default to gfx900
to match the compiler default.
(simple_object_copy_lto_debug_sections): Never unlink the outfile
on error as the caller does so.
(maybe_unlink, compile_native): Use %<...%> and %qs in fatal_error.
(main): Likewise. Fix 'mkoffload.dbg.o' cleanup.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
|
|
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of demanding sew_lmul.
But my previous typo makes VSETVL PASS miss honor the risc-v v spec.
Consider this following simple case:
int foo4 (void * in, void * out)
{
vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4);
v = __riscv_vadd_vv_i32m1 (v, v, 4);
vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32(v);
mask = __riscv_vmsof_m_b32(mask, 4);
return __riscv_vfirst_m_b32(mask, 4);
}
Before this patch:
foo4:
vsetivli zero,4,e32,m1,ta,ma
vle32.v v1,0(a0)
vadd.vv v1,v1,v1
vsetvli zero,zero,e8,mf4,ta,ma ----> redundant.
vmsof.m v2,v1
vfirst.m a0,v2
ret
After this patch:
foo4:
vsetivli zero,4,e32,m1,ta,ma
vle32.v v1,0(a0)
vadd.vv v1,v1,v1
vmsof.m v2,v1
vfirst.m a0,v2
ret
Confirm RVV spec and Clang, this patch makes VSETVL PASS match the correct behavior.
Tested on both RV32/RV64, no regression.
gcc/ChangeLog:
* config/riscv/vector.md: Fix vfirst/vmsbf/vmsof ratio attributes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/attribute-1.c: New test.
|
|
v2:
Avoid internal ICE for the case below.
vint8mf8_t test_vle8_v_i8mf8_m(vbool64_t vm, const int32_t *rs1, size_t vl) {
return __riscv_vle8(vm, rs1, vl);
}
v1:
Change the hash value of overloaded intrinsic from considering
all parameter types to:
1. Encoding vector data type
2. In order to distinguish vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, size_t vl)
and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl), encode the pointer type
3. In order to distinguish vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl)
and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl), encode the number of
parameters. The same goes for the vxrm intrinsics.
PR target/113420
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p):remove.
(registered_function::overloaded_hash):refactor.
(resolve_overloaded_builtin):avoid internal ICE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr113420-1.c: New test.
* gcc.target/riscv/rvv/base/pr113420-2.c: New test.
|
|
gcc/
* config/riscv/riscv.cc (riscv_init_cumulative_args): Install
correcction version of last change.
|
|
The signature was still using ATTRIBUTE_UNUSED and actually marked one
of the used arguments with ATTRIBUTE_UNUSED.
This patch drops the decorations and instead remove the name of arguments
which are actually unused which is the preferred way to handle this now
when we can.
Bootstrapped. I didn't have test results on the platform where I
bootstrapped, so no results to compare against. Given its NFC, I
think we're OK without the regression results.
gcc/
* config/riscv/riscv.cc (riscv_init_cumulative_args): Update and
fix bugs in signature.
|
|
../../gcc/config/riscv/riscv.cc: In function 'void riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)':
../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl' [-Werror=unused-parameter]
4879 | tree fndecl,
| ~~~~~^~~~~~
../../gcc/config/riscv/riscv.cc: In function 'bool riscv_vector_mode_supported_any_target_p(machine_mode)':
../../gcc/config/riscv/riscv.cc:10537:56: error: unused parameter 'mode' [-Werror=unused-parameter]
10537 | riscv_vector_mode_supported_any_target_p (machine_mode mode)
| ~~~~~~~~~~~~~^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2559: riscv.o] Error 1
Suppress these warnings.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_init_cumulative_args): Suppress warning.
(riscv_vector_mode_supported_any_target_p): Ditto.
|
|
PR110934 is a problem on m68k where -fzero-call-used-regs -fpic ICEs
when clearing an FP register.
The generic code generates an XFmode move of zero to that register,
which becomes an XFmode load from initialized data, which due to -fpic
uses a non-constant address, which the backend rejects. The
zero-call-used-regs pass runs very late, after register allocation and
frame layout, and at that point we can't allow new uses of the PIC
register or new pseudos.
To clear an FP register on m68k it's enough to do the move in SFmode,
but the generic code can't be told to do that, so this patch updates
m68k to use its own TARGET_ZERO_CALL_USED_REGS.
Bootstrapped and regression tested on m68k-linux-gnu.
Ok for master? (I don't have commit rights.)
gcc/
PR target/110934
* config/m68k/m68k.cc (m68k_zero_call_used_regs): New function.
(TARGET_ZERO_CALL_USED_REGS): Define.
gcc/testsuite/
PR target/110934
* gcc.target/m68k/pr110934.c: New test.
|
|
When generating RMW logical operations on m68k, the backend
recognizes single-bit operations and rewrites them as bit
instructions on operands adjusted to address the intended byte.
When offsetting the addresses the backend keeps the modes as
SImode, even though the actual access will be in QImode.
The uclinux target defines M68K_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
which adds a check that the adjusted operand is within the bounds
of the original object. Since the address has been offset it is
not, and the compiler ICEs.
The bug is that the modes of the adjusted operands should have been
narrowed to QImode, which is that this patch does. Nearby code
which narrows to HImode gets that right.
Bootstrapped and regression tested on m68k-linux-gnu.
Ok for master? (Note: I don't have commit rights.)
gcc/
PR target/108640
* config/m68k/m68k.cc (output_andsi3): Use QImode for
address adjusted for 1-byte RMW access.
(output_iorsi3): Likewise.
(output_xorsi3): Likewise.
gcc/testsuite/
PR target/108640
* gcc.target/m68k/pr108640.c: New test.
|
|
This patch fixes memory hog found in SPEC2017 wrf benchmark which caused by
RVV_VLMAX since RVV_VLMAX generate brand new rtx by gen_rtx_REG (Pmode, X0_REGNUM)
every time we call RVV_VLMAX, that is, we are always generating garbage and redundant
(reg:DI 0 zero) rtx.
After this patch fix, the memory hog is gone.
Time variable usr sys wall GGC
machine dep reorg : 1.99 ( 9%) 0.35 ( 56%) 2.33 ( 10%) 939M ( 80%) [Before this patch]
machine dep reorg : 1.71 ( 6%) 0.16 ( 27%) 3.77 ( 6%) 659k ( 0%) [After this patch]
Time variable usr sys wall GGC
machine dep reorg : 75.93 ( 18%) 14.23 ( 88%) 90.15 ( 21%) 33383M ( 95%) [Before this patch]
machine dep reorg : 56.00 ( 14%) 7.92 ( 77%) 63.93 ( 15%) 4361k ( 0%) [After this patch]
Test is running. Ok for trunk if I passed the test with no regresion ?
PR target/113495
gcc/ChangeLog:
* config/riscv/riscv-protos.h (RVV_VLMAX): Change to regno_reg_rtx[X0_REGNUM].
(RVV_VUNDEF): Ditto.
* config/riscv/riscv-vsetvl.cc: Add timevar.
|
|
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_std_ext): Remove.
(riscv_subset_list::parse_multiletter_ext): Remove.
* config/riscv/riscv-subset.h
(riscv_subset_list::parse_std_ext): Remove.
(riscv_subset_list::parse_multiletter_ext): Remove.
|