path: root/gcc/config
Age  Commit message  Author  Files  Lines
2024-01-26LoongArch: Split vec_selects of bottom elements into simple moveJiahao Xu1-0/+15
The pattern below can be treated as a simple move, because floating-point and vector registers share a common register on loongarch64. (set (reg/v:SF 32 $f0 [orig:93 res ] [93]) (vec_select:SF (reg:V8SF 32 $f0 [115]) (parallel [ (const_int 0 [0]) ]))) gcc/ChangeLog: * config/loongarch/lasx.md (vec_extract<mode>_0): New define_insn_and_split pattern. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-extract.c: New test.
2024-01-26LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUITJiahao Xu1-0/+1
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0 so that a short-circuit branch uses the short-circuit operation instead of the non-short-circuit operation. SPEC2017 performance evaluation shows a 1% performance improvement for fprate GEOMEAN and no obvious regression for others. In particular, 526.blender_r improves by 10.6% on 3A6000. This modification will introduce the following FAIL items: FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Conditional combines static and invariant" 1 FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Will duplicate bb" 2 FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid sum" 0 gcc/ChangeLog: * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define. gcc/testsuite/ChangeLog: * gcc.target/loongarch/short-circuit.c: New test.
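The difference between the two forms can be illustrated at source level. The sketch below is only an illustration of the idea, not GCC's internal transformation; the function names and the range-test example are assumptions made for this note.

```c
#include <assert.h>
#include <stdbool.h>

/* Short-circuit form: the second comparison is evaluated only when the
   first one does not already decide the result, so a branch is involved.  */
bool in_range_short_circuit(int x, int lo, int hi)
{
    if (x < lo)
        return false;
    return x <= hi;
}

/* Non-short-circuit form: both comparisons are always evaluated and the
   results are combined with a bitwise AND, so no branch is required.  */
bool in_range_non_short_circuit(int x, int lo, int hi)
{
    return (x >= lo) & (x <= hi);
}

/* Hypothetical self-check: both forms agree on every input tried here.  */
void check_range_forms(void)
{
    for (int x = -3; x <= 12; x++)
        assert(in_range_short_circuit(x, 1, 10)
               == in_range_non_short_circuit(x, 1, 10));
}
```

Both functions compute the same value; which one is faster depends on branch cost and predictability on the target, which is the trade-off the macro controls.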
2024-01-26LoongArch: Optimize implementation of single-precision floating-point approximate division.Li Wei1-6/+13
We found that in the SPEC2017 521.wrf program, some loop-invariant code generated from the single-precision floating-point approximate division calculation was not hoisted out of the loop. This is because the pseudo-register that stores the intermediate temporary results is overwritten in the implementation of single-precision floating-point approximate division, so the loop2_invariant pass fails to identify the invariants. To fix this, the intermediate temporary results are stored in new pseudo-registers without destroying the read-write dependency, so that they can be recognized as loop invariants in the loop2_invariant pass. After optimization, the number of instructions of 521.wrf is reduced by 0.18% compared with before optimization (1716612948501 -> 1713471771364). gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust. gcc/testsuite/ChangeLog: * gcc.target/loongarch/invariant-recip.c: New test.
2024-01-26RISC-V: Fix incorrect LCM delete bug [VSETVL PASS]Juzhe-Zhong1-9/+10
This patch fixes a recently noticed bug in RV32 glibc. We incorrectly deleted a vsetvl: ... and a4,a4,a3 vmv.v.i v1,0 ---> The missing vsetvl causes an illegal instruction report. vse8.v v1,0(a5) The root cause is that the laterin in LCM is incorrect. BB 358: avloc: n_bits = 2, set = {} kill: n_bits = 2, set = {} antloc: n_bits = 2, set = {} transp: n_bits = 2, set = {} avin: n_bits = 2, set = {} avout: n_bits = 2, set = {} del: n_bits = 2, set = {} which causes LCM to let BB 360 delete the vsetvl: BB 360: avloc: n_bits = 2, set = {} kill: n_bits = 2, set = {} antloc: n_bits = 2, set = {} transp: n_bits = 2, set = {0 1 } avin: n_bits = 2, set = {} avout: n_bits = 2, set = {} del: n_bits = 2, set = {1} Also, stop putting unknown vsetvl info into the local computation, since it is unnecessary. Tested on both RV32 and RV64 with no regressions. PR target/113469 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_lcm_local_properties): Fix bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr113469.c: New test.
2024-01-25aarch64: Fix/avoid undefinedness in aarch64_classify_index [PR100212]Andrew Pinski1-1/+3
The problem here is that we don't check the return value of exact_log2 and always use that result as the shift amount. This fixes the issue by avoiding the shift if the value was `-1` (which means the value was not exactly a power of 2); in this case we could check either whether the value was equal to -1 or whether it was not equal, because we then assign -1 to shift if the constant value was not equal. I chose `!=` as it seemed to make it more obvious what the code is doing. Committed as obvious after a build/test for aarch64-linux-gnu. gcc/ChangeLog: PR target/100212 * config/aarch64/aarch64.cc (aarch64_classify_index): Avoid undefined shift after the call to exact_log2. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
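A minimal sketch of the guard described above, assuming a stand-in for exact_log2 that returns -1 for values that are not a power of 2. The helper names below are hypothetical and this is not the code in aarch64_classify_index; it only shows why the result must be checked before it is used as a shift count.

```c
#include <assert.h>

/* Hypothetical stand-in for exact_log2: returns log2(x) when x is a
   power of two, and -1 otherwise (including for 0).  */
int exact_log2_sketch(unsigned long long x)
{
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    int n = 0;
    while ((x >>= 1) != 0)
        n++;
    return n;
}

/* Returns 1 if offset is a multiple of the power-of-two scale, else 0.
   The shift is performed only when the log is valid; shifting by -1
   would be undefined behavior in C.  */
int offset_matches_scale(unsigned long long offset, unsigned long long scale)
{
    int shift = exact_log2_sketch(scale);
    if (shift != -1)
        return ((offset >> shift) << shift) == offset;
    return 0;
}

/* Hypothetical self-check of the two helpers.  */
void check_exact_log2_guard(void)
{
    assert(exact_log2_sketch(8) == 3);
    assert(exact_log2_sketch(12) == -1);
    assert(offset_matches_scale(16, 8) == 1);
    assert(offset_matches_scale(16, 12) == 0);
}
```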
2024-01-25aarch64: Fix undefinedness while testing the J constraint [PR100204]Andrew Pinski1-1/+1
The J constraint can invoke undefined behavior because it takes the negative of ival, which overflows if ival is HWI_MIN. The fix is as simple as casting to `unsigned HOST_WIDE_INT` before negating it. This patch does that. Committed as obvious after build/test for aarch64-linux-gnu. gcc/ChangeLog: PR target/100204 * config/aarch64/constraints.md (J): Cast to `unsigned HOST_WIDE_INT` before taking the negative of it. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
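The cast can be illustrated in plain C. The sketch below models HOST_WIDE_INT as `long long` (an assumption for this note), and the 12-bit immediate check is a hypothetical simplification, not the predicate used in constraints.md.

```c
#include <assert.h>
#include <limits.h>

/* Model of HWI_MIN, assuming HOST_WIDE_INT is a 64-bit long long.  */
const long long hwi_min_model = LLONG_MIN;

/* Negating a signed value equal to LLONG_MIN overflows, which is undefined
   behavior.  Converting to unsigned first makes the negation well defined:
   unsigned arithmetic wraps modulo 2^64.  */
unsigned long long negate_as_unsigned(long long ival)
{
    return -(unsigned long long) ival;
}

/* Hypothetical check: does the negated value fit a 12-bit unsigned
   immediate, optionally shifted left by 12 bits?  */
int negated_fits_uimm12_shift(long long ival)
{
    unsigned long long neg = negate_as_unsigned(ival);
    return (neg & ~0xfffULL) == 0 || (neg & ~0xfff000ULL) == 0;
}

/* Hypothetical self-check, including the former problem value.  */
void check_negation(void)
{
    assert(negate_as_unsigned(hwi_min_model) == 0x8000000000000000ULL);
    assert(negated_fits_uimm12_shift(hwi_min_model) == 0);
    assert(negated_fits_uimm12_shift(-1) == 1);
}
```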
2024-01-25AVR: target/113601 - Fix wrong data start for ATmega3208 and ATmega3209.Georg-Johann Lay1-2/+2
gcc/ PR target/113601 * config/avr/avr-mcus.def (atmega3208, atmega3209): Fix data_section_start.
2024-01-25aarch64: Fix eh_return for -mtrack-speculation [PR112987]Szabolcs Nagy1-43/+32
A recent commit introduced a conditional branch in eh_return epilogues that is not compatible with speculation tracking: commit 426fddcbdad6746fe70e031f707fb07f55dfb405 Author: Szabolcs Nagy <szabolcs.nagy@arm.com> CommitDate: 2023-11-27 15:52:48 +0000 aarch64: Use br instead of ret for eh_return Refactor the compare zero and jump pattern and use it to fix the issue. gcc/ChangeLog: PR target/112987 * config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): New. (aarch64_expand_epilogue): Use the new function. (aarch64_split_compare_and_swap): Likewise. (aarch64_split_atomic_op): Likewise.
2024-01-25RISC-V: Add support for XCVsimd extension in CV32E40PMary Bennett8-0/+2133
Spec: github.com/openhwgroup/core-v-sw/blob/master/specifications/corev-builtin-spec.md Contributors: Mary Bennett <mary.bennett@embecosm.com> Nandni Jamnadas <nandni.jamnadas@embecosm.com> Pietra Ferreira <pietra.ferreira@embecosm.com> Charlie Keaney Jessica Mills Craig Blackmore <craig.blackmore@embecosm.com> Simon Cook <simon.cook@embecosm.com> Jeremy Bennett <jeremy.bennett@embecosm.com> Helene Chelin <helene.chelin@embecosm.com> gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add XCVbitmanip. * config/riscv/constraints.md: Likewise. * config/riscv/corev.def: Likewise. * config/riscv/corev.md: Likewise. * config/riscv/predicates.md: Likewise. * config/riscv/riscv-builtins.cc (AVAIL): Likewise. * config/riscv/riscv-ftypes.def: Likewise. * config/riscv/riscv.opt: Likewise. * config/riscv/riscv.cc (riscv_print_operand): Add new operand 'Y'. * doc/extend.texi: Add XCVbitmanip builtin documentation. * doc/sourcebuild.texi: Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/cv-simd-abs-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-abs-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-div2-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-div4-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-div8-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-add-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-and-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-and-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-and-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-and-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-avg-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-avg-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-avg-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-avg-sc-h-compile-1.c: New test. 
* gcc.target/riscv/cv-simd-avgu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-avgu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-avgu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-avgu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpeq-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpeq-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpeq-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpeq-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpge-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpge-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpge-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpge-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgeu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgeu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgeu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgeu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgt-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgt-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgt-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgt-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgtu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgtu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgtu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpgtu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmple-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmple-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmple-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmple-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpleu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpleu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpleu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpleu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmplt-b-compile-1.c: New test. 
* gcc.target/riscv/cv-simd-cmplt-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmplt-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmplt-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpltu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpltu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpltu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpltu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpne-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpne-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpne-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-cmpne-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxconj-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-i-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-i-div2-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-i-div4-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-i-div8-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-r-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-r-div2-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-r-div4-compile-1.c: New test. * gcc.target/riscv/cv-simd-cplxmul-r-div8-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotsp-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotsp-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotsp-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotsp-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotup-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotup-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotup-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotup-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotusp-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotusp-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotusp-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-dotusp-sc-h-compile-1.c: New test. 
* gcc.target/riscv/cv-simd-extract-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-extract-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-extractu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-extractu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-insert-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-insert-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-march-compile-1.c: New test. * gcc.target/riscv/cv-simd-max-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-max-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-max-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-max-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-maxu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-maxu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-maxu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-maxu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-min-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-min-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-min-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-min-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-minu-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-minu-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-minu-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-minu-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-neg-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-neg-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-or-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-or-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-or-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-or-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-pack-compile-1.c: New test. * gcc.target/riscv/cv-simd-pack-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-packhi-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-packlo-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotsp-b-compile-1.c: New test. 
* gcc.target/riscv/cv-simd-sdotsp-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotsp-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotsp-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotup-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotup-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotup-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotup-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotusp-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotusp-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotusp-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sdotusp-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-shuffle-sci-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-shuffle2-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-shuffle2-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-shufflei0-sci-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-shufflei1-sci-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-shufflei2-sci-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-shufflei3-sci-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sll-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sll-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sll-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sll-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sra-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sra-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sra-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sra-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-srl-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-srl-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-srl-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-srl-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sub-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sub-div2-compile-1.c: New test. * gcc.target/riscv/cv-simd-sub-div4-compile-1.c: New test. 
* gcc.target/riscv/cv-simd-sub-div8-compile-1.c: New test. * gcc.target/riscv/cv-simd-sub-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-sub-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-sub-sc-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-subrotmj-compile-1.c: New test. * gcc.target/riscv/cv-simd-subrotmj-div2-compile-1.c: New test. * gcc.target/riscv/cv-simd-subrotmj-div4-compile-1.c: New test. * gcc.target/riscv/cv-simd-subrotmj-div8-compile-1.c: New test. * gcc.target/riscv/cv-simd-xor-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-xor-h-compile-1.c: New test. * gcc.target/riscv/cv-simd-xor-sc-b-compile-1.c: New test. * gcc.target/riscv/cv-simd-xor-sc-h-compile-1.c: New test. * lib/target-supports.exp: Add proc for XCVsimd extension.
2024-01-25gcn: Add missing space to ASM_SPEC in gcn-hsa.hTobias Burnus1-1/+1
gcc/ * config/gcn/gcn-hsa.h (ASM_SPEC): Add space after -mxnack= argument.
2024-01-25RISC-V: remove param riscv-vector-abi. [PR113538]Yanzhang Wang2-9/+3
Also adjust some of the tests for scan-assembly. The behavior is the same as --param=riscv-vector-abi before. gcc/ChangeLog: PR target/113538 * config/riscv/riscv.cc (riscv_get_arg_info): Remove the flag. (riscv_fntype_abi): Ditto. * config/riscv/riscv.opt: Ditto. gcc/testsuite/ChangeLog: PR target/113538 * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-7.c: Fix the asm check. * gcc.target/riscv/rvv/base/abi-call-args-1-run.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-1.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-2-run.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-2.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-3-run.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-3.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-4-run.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-4.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-error-1.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-return-run.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-return.c: Ditto. * gcc.target/riscv/rvv/base/abi-call-variant_cc.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-1.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Ditto. * gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Ditto. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-69.c: Ditto. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-70.c: Ditto. * gcc.target/riscv/rvv/base/float-point-dynamic-frm-71.c: Ditto. * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv32_vadd.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv32_vfadd.c: Ditto. 
* gcc.target/riscv/rvv/base/overloaded_rv32_vget_vset.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv32_vloxseg2ei16.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv32_vreinterpret.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv64_vadd.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv64_vfadd.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv64_vget_vset.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv64_vloxseg2ei16.c: Ditto. * gcc.target/riscv/rvv/base/overloaded_rv64_vreinterpret.c: Ditto. * gcc.target/riscv/rvv/base/spill-10.c: Ditto. * gcc.target/riscv/rvv/base/spill-11.c: Ditto. * gcc.target/riscv/rvv/base/spill-9.c: Ditto. * gcc.target/riscv/rvv/base/tuple_vundefined.c: Ditto. * gcc.target/riscv/rvv/base/vcreate.c: Ditto. * gcc.target/riscv/rvv/base/vlmul_ext-1.c: Ditto. * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Ditto. * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto. * lib/target-supports.exp: Remove the flag. Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
2024-01-25aarch64: Fix out-of-bounds ENCODED_ELT access [PR113572]Richard Sandiford1-1/+1
When generalising vector_cst_all_same, I'd forgotten to update VECTOR_CST_ENCODED_ELT to VECTOR_CST_ELT. The check deliberately looks at implicitly encoded elements in some cases. gcc/ PR target/113572 * config/aarch64/aarch64-sve-builtins.cc (vector_cst_all_same): Check VECTOR_CST_ELT instead of VECTOR_CST_ENCODED_ELT gcc/testsuite/ PR target/113572 * gcc.target/aarch64/sve/pr113572.c: New test.
2024-01-25aarch64: Handle overlapping registers in movv8di [PR113550]Richard Sandiford1-4/+14
The LS64 movv8di pattern didn't handle loads that overlapped with the address register (unless the overlap happened to be in the last subload). gcc/ PR target/113550 * config/aarch64/aarch64-simd.md: In the movv8di splitter, check whether each split instruction is a load that clobbers the source address. Emit that instruction last if so. gcc/testsuite/ PR target/113550 * gcc.target/aarch64/pr113550.c: New test.
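The reordering described in the ChangeLog entry can be sketched as follows. Register numbers, the array representation and the function name are assumptions made for illustration; the real splitter works on RTL operands, not on integer arrays.

```c
#include <assert.h>

/* Fill order[0..n-1] with the indices of the sub-loads in emission order.
   A sub-load whose destination register is also the base (address)
   register is deferred to the end, so the address is still intact while
   the other sub-loads execute.  This sketch assumes at most one such
   sub-load.  Returns the deferred index, or -1 if there is none.  */
int order_subloads(int base_reg, const int *dest_regs, int n, int *order)
{
    int deferred = -1;
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (dest_regs[i] == base_reg && deferred == -1)
            deferred = i;           /* clobbers the address: emit last */
        else
            order[out++] = i;
    }
    if (deferred != -1)
        order[out++] = deferred;
    return deferred;
}

/* Hypothetical self-check: the load into register 1 overlaps the base.  */
void check_order_subloads(void)
{
    const int dest[4] = { 0, 1, 2, 3 };
    int order[4];
    assert(order_subloads(1, dest, 4, order) == 1);
    assert(order[0] == 0 && order[1] == 2 && order[2] == 3 && order[3] == 1);
}
```

When the overlapping sub-load is already the last one, the order is unchanged, which matches the case the old pattern happened to handle.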
2024-01-25aarch64: Avoid paradoxical subregs in UXTL split [PR113485]Richard Sandiford1-3/+14
g:74e3e839ab2d36841320 handled the UXTL{,2}-ZIP[12] optimisation in split1. The UXTL input is a 64-bit vector of N-bit elements and the result is a 128-bit vector of 2N-bit elements. The corresponding ZIP1 operates on 128-bit vectors of N-bit elements. This meant that the ZIP1 input had to be a 128-bit paradoxical subreg of the 64-bit UXTL input. In the PRs, it wasn't possible to generate this subreg because the inputs were already subregs of a x[234] structure of 64-bit vectors. I don't think the same thing can happen for UXTL2->ZIP2 because UXTL2 input is a 128-bit vector rather than a 64-bit vector. It isn't really necessary for ZIP1 to take 128-bit inputs, since the upper 64 bits are ignored. This patch therefore adds a pattern for 64-bit → 128-bit ZIP1s. In principle, we should probably use this form for all ZIP1s. But in practice, that creates an awkward special case, and would be quite invasive for stage 4. gcc/ PR target/113485 * config/aarch64/aarch64-simd.md (aarch64_zip1<mode>_low): New pattern. (<optab><Vnarrowq><mode>2): Use it instead of generating a paradoxical subreg for the input. gcc/testsuite/ PR target/113485 * gcc.target/aarch64/pr113485.c: New test. * gcc.target/aarch64/pr113573.c: Likewise.
2024-01-25RISC-V: Add LCM delete block predecessors dump informationJuzhe-Zhong1-0/+42
While looking into PR113469, I noticed that LCM deleted a vsetvl incorrectly. This patch adds dump information for all predecessors of the block in which LCM deletes a vsetvl, for better debugging. Tested with no regressions. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (get_all_predecessors): New function. (pre_vsetvl::pre_global_vsetvl_info): Add LCM delete block all predecessors dump information.
2024-01-25RISC-V: Remove redundant full available computation [NFC]Juzhe-Zhong1-34/+23
Notice that full available is computed in every round of earliest fusion, which is redundant. We only need to compute it once, in phase 3. This is an NFC patch, tested with no regressions. Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::compute_vsetvl_def_data): Remove redundant full available computation. (pre_vsetvl::pre_global_vsetvl_info): Ditto.
2024-01-25RISC-V: Add optim-no-fusion compile option [VSETVL PASS]Juzhe-Zhong3-14/+21
This patch adds a no-fusion compile option to disable phase 2 global fusion. It can help us analyze compile time and aid debugging. Committed. gcc/ChangeLog: * config/riscv/riscv-opts.h (enum vsetvl_strategy_enum): Add optim-no-fusion option. * config/riscv/riscv-vsetvl.cc (pass_vsetvl::lazy_vsetvl): Ditto. (pass_vsetvl::execute): Ditto. * config/riscv/riscv.opt: Ditto.
2024-01-25LoongArch: Remove vec_concatz<mode> pattern.Jiahao Xu2-26/+6
It is incorrect to use vld/vori to implement the vec_concatz<mode> because when the LSX instruction is used to update the value of the vector register, the upper 128 bits of the vector register will not be zeroed. gcc/ChangeLog: * config/loongarch/lasx.md (@vec_concatz<mode>): Remove this define_insn pattern. * config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): Use vec_concat<mode>.
2024-01-25LoongArch: Disable TLS type symbols from generating non-zero offsets.Lulu Cheng1-9/+9
TLS gd, ld and ie type symbols generate corresponding GOT entries, so non-zero offsets cannot be generated. The address of a TLS le type symbol+addend is not implemented in binutils, so a non-zero offset is not generated here for the time being. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_symbolic_constant_p): For symbols of type tls, non-zero offset is not generated.
2024-01-25rs6000: Enable block compare expand on P9 with m32 and mpowerpc64Haochen Gui1-5/+7
gcc/ * config/rs6000/rs6000-string.cc (expand_block_compare): Enable P9 with m32 and mpowerpc64. gcc/testsuite/ * gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64. * gcc.target/powerpc/block-cmp-4.c: Likewise. * gcc.target/powerpc/block-cmp-8.c: New.
2024-01-25Enable -mlam=u57 by default when compiled with -fsanitize=hwaddress.liuhongt1-0/+9
gcc/ChangeLog: * config/i386/i386-options.cc (ix86_option_override_internal): Enable -mlam=u57 by default when compiled with -fsanitize=hwaddress.
2024-01-24aarch64: Fix __builtin_apply with -mgeneral-regs-only [PR113486]Andrew Pinski1-0/+4
The problem here is the builtin apply mechanism thinks the FP registers are to be used due to get_raw_arg_mode not returning VOIDmode. This fixes that oversight and the backend now returns VOIDmode for non-general-regs if TARGET_GENERAL_REGS_ONLY is true. Built and tested for aarch64-linux-gnu with no regressions. PR target/113486 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_get_reg_raw_mode): For TARGET_GENERAL_REGS_ONLY, return VOIDmode for non-GP_REGNUM_P regno. gcc/testsuite/ChangeLog: * gcc.target/aarch64/builtin_apply-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-24[PATCH v3] RISC-V: Add split pattern to generate SFB instructions. [PR113095]Monk Chiang1-0/+32
match.pd transforms (zero_one == 0) ? y : z <op> y into ((typeof(y))zero_one * z) <op> y. Add splitters to recognize this expression and generate SFB instructions. gcc/ChangeLog: PR target/113095 * config/riscv/sfb.md: New splitters to rewrite single bit sign extension as the condition to SFB instructions. gcc/testsuite/ChangeLog: * gcc.target/riscv/sfb.c: New test. * gcc.target/riscv/pr113095.c: New test.
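The equivalence behind the match.pd rewrite can be checked in plain C. The sketch below picks bitwise OR as the operation and assumes zero_one is always 0 or 1; the function names are hypothetical, and the new splitters themselves operate on RTL rather than on C source.

```c
#include <assert.h>

/* Conditional form: (zero_one == 0) ? y : (z | y).  */
unsigned select_form(unsigned zero_one, unsigned y, unsigned z)
{
    return (zero_one == 0) ? y : (z | y);
}

/* Multiplication form: zero_one * z is 0 when zero_one is 0 and z when
   zero_one is 1, and OR-ing 0 into y leaves y unchanged.  */
unsigned multiply_form(unsigned zero_one, unsigned y, unsigned z)
{
    return (zero_one * z) | y;
}

/* Hypothetical self-check: both forms agree for zero_one in {0, 1}.  */
void check_sfb_forms(void)
{
    for (unsigned zero_one = 0; zero_one <= 1; zero_one++)
        assert(select_form(zero_one, 0x0fu, 0xf0u)
               == multiply_form(zero_one, 0x0fu, 0xf0u));
}
```

The same reasoning holds for any operation for which 0 is an identity on the left operand, such as addition or exclusive OR.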
2024-01-24AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]Tamar Christina4-87/+79
As suggested in the ticket this replaces the expansion by converting the Advanced SIMD types to SVE types by simply printing out an SVE register for these instructions. This fixes the subreg issues since there are no subregs involved anymore. gcc/ChangeLog: PR target/109636 * config/aarch64/aarch64-simd.md (<su_optab>div<mode>3, mulv2di3): Remove. * config/aarch64/iterators.md (VQDIV): Remove. (SVE_FULL_SDI_SIMD, SVE_FULL_HSDI_SIMD_DI, SVE_I_SIMD_DI): New. (VPRED, sve_lane_con): Add V4SI and V2DI. * config/aarch64/aarch64-sve.md (<optab><mode>3, @aarch64_pred_<optab><mode>): Support Advanced SIMD types. (mul<mode>3): New, split from <optab><mode>3. (@aarch64_pred_<optab><mode>, *post_ra_<optab><mode>3): New. * config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_<mode>, *aarch64_mul_unpredicated_<mode>): Change SVE_FULL_HSDI to SVE_FULL_HSDI_SIMD_DI. gcc/testsuite/ChangeLog: PR target/109636 * gcc.target/aarch64/sve/pr109636_1.c: New test. * gcc.target/aarch64/sve/pr109636_2.c: New test. * gcc.target/aarch64/sve2/pr109636_1.c: New test.
2024-01-24AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]Tamar Christina1-1/+2
The AArch64 vector PCS does not allow simd calls with simdlen 1, however due to a bug we currently do allow it for num == 0. This causes us to emit a symbol that doesn't exist and we fail to link. gcc/ChangeLog: PR tree-optimization/113552 * config/aarch64/aarch64.cc (aarch64_simd_clone_compute_vecsize_and_simdlen): Block simdlen 1. gcc/testsuite/ChangeLog: PR tree-optimization/113552 * gcc.target/aarch64/pr113552.c: New test. * gcc.target/aarch64/simd_pcs_attribute-3.c: Remove bogus check.
2024-01-24MIPS: Accept arguments for -mexplicit-relocsYunQiang Su4-4/+41
GAS has supported explicit relocs since 2001, and %pcrel_hi/low were introduced in 2014. In future, we may introduce more. Let's convert the -mexplicit-relocs option so that it accepts the arguments none, base and pcrel. We also update gcc/configure.ac to set the default value of the option according to what gas supports when GCC itself is built. gcc * configure.ac: Detect the explicit relocs support for mips, and define C macro MIPS_EXPLICIT_RELOCS. * config.in: Regenerated. * configure: Regenerated. * doc/invoke.texi(MIPS Options): Add -mexplicit-relocs. * config/mips/mips-opts.h: Define enum mips_explicit_relocs. * config/mips/mips.cc(mips_set_compression_mode): Sorry if !TARGET_EXPLICIT_RELOCS instead of just set it. * config/mips/mips.h: Define TARGET_EXPLICIT_RELOCS and TARGET_EXPLICIT_RELOCS_PCREL with mips_opt_explicit_relocs. * config/mips/mips.opt: Introduce -mexplicit-relocs= option and define -m(no-)explicit-relocs as aliases.
2024-01-24aarch64: Re-enable ldp/stp fusion passAlex Coplan1-2/+2
Since, to the best of my knowledge, all reported regressions related to the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with --enable-languages=all is working again with the passes enabled, this patch turns the passes back on by default, as agreed with Jakub here: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html gcc/ChangeLog: * config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default to 1. (-mlate-ldp-fusion): Likewise.
2024-01-24RISC-V: Fix large memory usage of VSETVL PASS [PR113495]Juzhe-Zhong1-182/+51
The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL pass: the pass consumes over 33 GB of memory, which makes it impossible to compile SPEC 2017 wrf on a laptop. The root cause is these memory-wasting variables: unsigned num_exprs = num_bbs * num_regs; sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs); sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs); m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs); m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs); I find that compute_avl_def_data can be achieved with the RTL_SSA framework. Replace the implementation with one based on the RTL_SSA framework. After this patch, the memory-hog issue is fixed. simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out) is 1.673 GB. lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out) is 2.441 GB. Tested on both RV32 and RV64, no regression. gcc/ChangeLog: PR target/113495 * config/riscv/riscv-vsetvl.cc (get_expr_id): Remove. (get_regno): Ditto. (get_bb_index): Ditto. (pre_vsetvl::compute_avl_def_data): Ditto. (pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage. (pre_vsetvl::pre_global_vsetvl_info): Ditto. gcc/testsuite/ChangeLog: PR target/113495 * gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.
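A rough back-of-the-envelope sketch of why those four allocations grow so quickly: each one holds num_bbs bitmaps of num_exprs bits, and num_exprs is itself num_bbs * num_regs, so the total is quadratic in the number of basic blocks. The helper names are hypothetical, and the estimate ignores per-bitmap header and alignment overhead.

```c
#include <assert.h>

/* Approximate payload size in bytes of a vector of bitmaps, each holding
   bits_per_bitmap bits (rounded up to whole bytes).  */
unsigned long long bitmap_vector_bytes(unsigned long long num_bitmaps,
                                       unsigned long long bits_per_bitmap)
{
    return num_bitmaps * ((bits_per_bitmap + 7) / 8);
}

/* Approximate total for the four vectors quoted above:
   num_exprs = num_bbs * num_regs bits per bitmap, num_bbs bitmaps each.  */
unsigned long long avl_def_bitmap_bytes(unsigned long long num_bbs,
                                        unsigned long long num_regs)
{
    unsigned long long num_exprs = num_bbs * num_regs;
    return 4 * bitmap_vector_bytes(num_bbs, num_exprs);
}

/* Hypothetical self-check: doubling num_bbs quadruples the estimate.  */
void check_bitmap_growth(void)
{
    assert(avl_def_bitmap_bytes(10, 8) == 400);
    assert(avl_def_bitmap_bytes(20, 8) == 4 * avl_def_bitmap_bytes(10, 8));
}
```

The inputs in the self-check are made-up small numbers chosen to keep the arithmetic easy to follow; they are not measurements from wrf.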
2024-01-23aarch64: Fix up debug uses in ldp/stp pass [PR113089]Alex Coplan1-7/+327
As the PR shows, we were missing code to update debug uses in the load/store pair fusion pass. This patch fixes that. The patch tries to give a complete treatment of the debug uses that will be affected by the changes we make, and in particular makes an effort to preserve debug info where possible, e.g. when re-ordering an update of a base register by a constant over a debug use of that register. When re-ordering loads over a debug use of a transfer register, we reset the debug insn. Likewise when re-ordering stores over debug uses of mem. While doing this I noticed that try_promote_writeback used a strange choice of move_range for the pair insn, in that it chose the previous nondebug insn instead of the insn itself. Since the insn is being changed, these move ranges are equivalent (at least in terms of nondebug insn placement as far as RTL-SSA is concerned), but I think it is more natural to choose the pair insn itself. This is needed to avoid incorrectly updating some debug uses. gcc/ChangeLog: PR target/113089 * config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New. (fixup_debug_use): New. (fixup_debug_uses_trailing_add): New. (fixup_debug_uses): New. Use it ... (ldp_bb_info::fuse_pair): ... here. (try_promote_writeback): Call fixup_debug_uses_trailing_add to fix up debug uses of the base register that are affected by folding in the trailing add insn. gcc/testsuite/ChangeLog: PR target/113089 * gcc.c-torture/compile/pr113089.c: New test.
2024-01-23aarch64: Re-parent trailing nondebug base reg uses [PR113089]Alex Coplan1-0/+24
While working on PR113089, I realised we were missing code to re-parent trailing nondebug uses of the base register in the case of cancelling writeback in the load/store pair pass. This patch fixes that. gcc/ChangeLog: PR target/113089 * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair): Update trailing nondebug uses of the base register in the case of cancelling writeback.
2024-01-23aarch64: Don't record hazards against paired insns [PR113356]Alex Coplan1-2/+2
For the testcase in the PR, we try to pair insns where the first has writeback and the second uses the updated base register. This causes us to record a hazard against the second insn, thus narrowing the move range away from the end of the BB. However, it isn't meaningful to record hazards against the other insn in the pair, as this doesn't change which pairs can be formed, and also doesn't change where the pair is formed (from the perspective of nondebug insns). To see why this is the case, consider the two cases: - Suppose we are finding hazards for insns[0]. If we record a hazard against insns[1], then range.last becomes insns[1]->prev_nondebug_insn (), but note that this is equivalent to inserting after insns[1] (since insns[1] is being changed). - Now consider finding hazards for insns[1]. Suppose we record insns[0] as a hazard. Then we set range.first = insns[0], which is a no-op. As such, it seems better to never record hazards against the other insn in the pair, as we check whether the insns themselves are suitable for combination separately (e.g. for ldp checking that they use distinct transfer registers). Avoiding unnecessarily narrowing the move range avoids unnecessarily re-ordering over debug insns. This should also mean that we can only narrow the move range away from the end of the BB in the case that we record a hazard for insns[0] against insns[1]->prev_nondebug_insn () or earlier. This means that for the non-call-exceptions case, either the move range includes insns[1], or we reject the pair (thus the assert tripped in the PR should always hold). gcc/ChangeLog: PR target/113356 * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::try_fuse_pair): Don't record hazards against the opposite insn in the pair. gcc/testsuite/ChangeLog: PR target/113356 * gcc.target/aarch64/pr113356.C: New test.
2024-01-23aarch64: Fix up uses of mem following stp insert [PR113070]Alex Coplan1-55/+195
As the PR shows (specifically #c7) we are missing updating uses of mem when inserting an stp in the aarch64 load/store pair fusion pass. This patch fixes that. RTL-SSA has a simple view of memory and by default doesn't allow stores to be re-ordered w.r.t. other stores. In the ldp fusion pass, we do our own alias analysis and so can re-order stores over other accesses when we deem this is safe. If neither store can be re-purposed (moved into the required position to form the stp while respecting the RTL-SSA constraints), then we turn both the candidate stores into "tombstone" insns (logically delete them) and insert a new stp insn. As it stands, we implement the insert case separately (after dealing with the candidate stores) in fuse_pair by inserting into the middle of the vector of changes. This is OK when we only have to insert one change, but with this fix we would need to insert the change for the new stp plus multiple changes to fix up uses of mem (note the number of fix-ups is naturally bounded by the alias limit param to prevent quadratic behaviour). If we kept the code structured as is and inserted into the middle of the vector, that would lead to repeated moving of elements in the vector which seems inefficient. The structure of the code would also be a little unwieldy. To improve on that situation, this patch introduces a helper class, stp_change_builder, which implements a state machine that helps to build the required changes directly in program order. That state machine is responsible for deciding what changes need to be made in what order, and the code in fuse_pair then simply follows those steps. Together with the fix in the previous patch for installing new defs correctly in RTL-SSA, this fixes PR113070. We take the opportunity to rename the function decide_stp_strategy to try_repurpose_store, as that seems more descriptive of what it actually does, since stp_change_builder is now responsible for the overall change strategy.
gcc/ChangeLog: PR target/113070 * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New. (decide_stp_strategy): Rename to ... (try_repurpose_store): ... this. (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to construct stp changes. Fix up uses when inserting new stp insns.
2024-01-23ia64: Fix up -Wunused-parameter warningJakub Jelinek1-1/+2
Since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b there is a warning ../../gcc/config/ia64/ia64.cc: In function ‘void ia64_start_function(FILE*, const char*, tree)’: ../../gcc/config/ia64/ia64.cc:3889:59: warning: unused parameter ‘decl’ [-Wunused-parameter] 3889 | ia64_start_function (FILE *file, const char *fnname, tree decl) | ~~~~~^~~~ which presumably breaks bootstrapped builds. While the decl parameter is passed to the ASM_OUTPUT_FUNCTION_LABEL macro, that macro actually doesn't use that argument, so the removal of ATTRIBUTE_UNUSED was incorrect. This patch reverts the first ia64.cc hunk from r14-6945. 2024-01-23 Jeff Law <jlaw@ventanamicro.com> Jakub Jelinek <jakub@redhat.com> * config/ia64/ia64.cc (ia64_start_function): Add ATTRIBUTE_UNUSED to decl.
2024-01-23aarch64: Avoid registering duplicate C++ overloads [PR112989]Richard Sandiford1-0/+8
In the original fix for this PR, I'd made sure that including <arm_sme.h> didn't reach the final return in simulate_builtin_function_decl (which would indicate duplicate function definitions). But it seems I forgot to do the same thing for C++, which defines all of its overloads directly. This patch fixes a case where we still recorded duplicate functions for C++. Thanks to Iain for reporting the resulting GC ICE and for help with reproducing it. gcc/ PR target/112989 * config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Skip MODE_single variants of functions that don't take tuple arguments.
2024-01-23aarch64: Don't assert recog success in ldp/stp pass [PR113114]Alex Coplan1-1/+9
The PR shows two different cases where try_promote_writeback produces an RTL pattern which isn't recognized. Currently this leads to an ICE, as we assert recog success, but I think it's better just to back out of the changes gracefully if recog fails (as we do in the main fuse_pair case). In theory since we check the ranges here recog shouldn't fail (which is why I had the assert in the first place), but the PR shows an edge case in the patterns where if we form a pre-writeback pair where the writeback offset is exactly -S, where S is the size in bytes of one transfer register, we fail to match the expected pattern as the patterns look explicitly for plus operands in the mems. I think fixing this would require adding at least four new special-case patterns to aarch64.md for what doesn't seem to be a particularly useful variant of the insns. Even if we were to do that, I think it would be GCC 15 material, and it's better to just punt for GCC 14. The ILP32 case in the PR is a bit different, as that shows us trying to combine a pair with DImode base register operands in the mems together with an SImode trailing update of the base register. This leads to us forming an RTL pattern which references the base register in both SImode and DImode, which also fails to recog. Again, I think it's best just to take the missed optimization for now. If we really want to make this (try_promote_writeback) work for ILP32, we can try to do it for GCC 15. gcc/ChangeLog: PR target/113114 * config/aarch64/aarch64-ldp-fusion.cc (try_promote_writeback): Don't assert recog success, just punt if the writeback pair isn't recognized. gcc/testsuite/ChangeLog: PR target/113114 * gcc.c-torture/compile/pr113114.c: New test. * gcc.target/aarch64/pr113114.c: New test.
2024-01-23gcn: Fix a warningJakub Jelinek1-1/+2
I see ../../gcc/config/gcn/gcn.cc: In function ‘void gcn_hsa_declare_function_name(FILE*, const char*, tree)’: ../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’ [-Wunused-parameter] 6568 | gcn_hsa_declare_function_name (FILE *file, const char *name, tree decl) | ~~~~~^~~~ warning presumably since r14-6945-gc659dd8bfb55e02a1b97407c1c28f7a0e8f7f09b Previously, the argument was anonymous, but now it is passed to a macro which ignores it, so I think we should go with ATTRIBUTE_UNUSED. 2024-01-23 Jakub Jelinek <jakub@redhat.com> * config/gcn/gcn.cc (gcn_hsa_declare_function_name): Add ATTRIBUTE_UNUSED to decl.
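The situation described here can be reproduced in miniature. In the sketch below the macro and function names are hypothetical stand-ins, and the standard [[maybe_unused]] attribute plays the role that ATTRIBUTE_UNUSED plays in GCC's own sources: the parameter has to keep its name because it is passed to a macro, but the macro never expands it, so without the attribute -Wunused-parameter fires.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a target macro that accepts a decl argument
// but never expands it.
#define OUTPUT_FUNCTION_LABEL(stream, name, decl) ((stream) << (name) << ":\n")

// `decl` must stay named because it is passed to the macro, yet after
// preprocessing nothing refers to it, so it is marked as possibly unused.
inline void declare_function_name(std::ostream &out, const char *name,
                                  [[maybe_unused]] int decl)
{
  OUTPUT_FUNCTION_LABEL(out, name, decl);
}
```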
2024-01-23LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=autoXi Ruoyao1-5/+5
Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler macro. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p): If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false for SYMBOL_TLS_LDM and SYMBOL_TLS_GD. (loongarch_call_tls_get_addr): Do not split symbols of SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check for la.tls.ld and la.tls.gd.
2024-01-22arm: Fix parsecpu.awk for aliases [PR113030]Andrew Pinski1-2/+2
The problem here is that the two functions check_cpu and check_arch use the wrong variable to check whether an alias is valid for that cpu/arch. check_cpu uses cpu_optaliases instead of cpu_opt_alias. cpu_optaliases is an array indexed by the cpu name that contains all of the valid aliases for that cpu, whereas cpu_opt_alias is a doubly indexed array, indexed by cpu name and alias, which gives the option that the alias stands for. The same thing happens in check_arch with arch_optaliases vs arch_opt_alias. Tested by running: ``` awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+simd" config/arm/arm-cpus.in awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon" config/arm/arm-cpus.in awk -f config/arm/parsecpu.awk -v cmd="chkarch armv7-a+neon-vfpv3" config/arm/arm-cpus.in ``` None of them reports an error. gcc/ChangeLog: PR target/113030 * config/arm/parsecpu.awk (check_cpu): Use cpu_opt_alias instead of cpu_optaliases. (check_arch): Use arch_opt_alias instead of arch_optaliases. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
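The difference between the two awk arrays can be modelled with two C++ maps. This is only an illustration of the data shapes: the type and function names are hypothetical, and the alias data used with it is made up rather than read from arm-cpus.in.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Shape of arch_optaliases: arch name -> list of alias names.
using AliasList = std::map<std::string, std::vector<std::string>>;

// Shape of arch_opt_alias: (arch name, alias) -> option the alias stands for.
using AliasTable = std::map<std::pair<std::string, std::string>, std::string>;

// Record that `alias` is another spelling of `option` for `arch`.
inline void add_alias(AliasList &list, AliasTable &table,
                      const std::string &arch, const std::string &alias,
                      const std::string &option)
{
  list[arch].push_back(alias);
  table[std::make_pair(arch, alias)] = option;
}

// The validity check needs the two-key table; a key made of (arch, alias)
// can never be found in the single-key list.
inline bool alias_valid(const AliasTable &table, const std::string &arch,
                        const std::string &alias)
{
  return table.count(std::make_pair(arch, alias)) != 0;
}
```

Looking up an (arch, alias) pair in the single-key container is the C++ analogue of the bug: the lookup is well-formed in awk but can never succeed.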
2024-01-22RISC-V: Lower vmv.v.x (avl = 1) into vmv.s.xJuzhe-Zhong3-1/+21
We noticed an AI benchmark on which GCC is about 3% slower than Clang. This is because Clang/LLVM has a simplification that transforms vmv.v.x (avl = 1) into vmv.s.x. Since vmv.s.x has a more flexible vsetvl demand than vmv.v.x, this gives us better chances to fuse vsetvl. Consider the following case: void foo (uint32_t *outputMat, uint32_t *inputMat) { vuint32m1_t matRegIn0 = __riscv_vle32_v_u32m1 (inputMat, 4); vuint32m1_t matRegIn1 = __riscv_vle32_v_u32m1 (inputMat + 4, 4); vuint32m1_t matRegIn2 = __riscv_vle32_v_u32m1 (inputMat + 8, 4); vuint32m1_t matRegIn3 = __riscv_vle32_v_u32m1 (inputMat + 12, 4); vbool32_t oddMask = __riscv_vreinterpret_v_u32m1_b32 (__riscv_vmv_v_x_u32m1 (0xaaaa, 1)); vuint32m1_t smallTransposeMat0 = __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn0, matRegIn1, 1, 4); vuint32m1_t smallTransposeMat2 = __riscv_vslideup_vx_u32m1_tumu (oddMask, matRegIn2, matRegIn3, 1, 4); vuint32m1_t outMat0 = __riscv_vslideup_vx_u32m1_tu (smallTransposeMat0, smallTransposeMat2, 2, 4); __riscv_vse32_v_u32m1 (outputMat, outMat0, 4); } Before this patch: vsetivli zero,4,e32,m1,ta,ma li a5,45056 addi a2,a1,16 addi a3,a1,32 addi a4,a1,48 vle32.v v1,0(a1) vle32.v v4,0(a2) vle32.v v2,0(a3) vle32.v v3,0(a4) addiw a5,a5,-1366 vsetivli zero,1,e32,m1,ta,ma vmv.v.x v0,a5 ---> Since avl = 1, we can transform it into vmv.s.x vsetivli zero,4,e32,m1,tu,mu vslideup.vi v1,v4,1,v0.t vslideup.vi v2,v3,1,v0.t vslideup.vi v1,v2,2 vse32.v v1,0(a0) ret After this patch: li a5,45056 addi a2,a1,16 vsetivli zero,4,e32,m1,tu,mu addiw a5,a5,-1366 vle32.v v3,0(a2) addi a3,a1,32 addi a4,a1,48 vle32.v v1,0(a1) vmv.s.x v0,a5 vle32.v v2,0(a3) vslideup.vi v1,v3,1,v0.t vle32.v v3,0(a4) vslideup.vi v2,v3,1,v0.t vslideup.vi v1,v2,2 vse32.v v1,0(a0) ret Tested on both RV32 and RV64, no regression. gcc/ChangeLog: * config/riscv/riscv-protos.h (splat_to_scalar_move_p): New function. * config/riscv/riscv-v.cc (splat_to_scalar_move_p): Ditto. * config/riscv/vector.md: Simplify vmv.v.x into vmv.s.x. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/attribute-2.c: New test. * gcc.target/riscv/rvv/vsetvl/attribute-3.c: New test.
2024-01-22RISC-V: Fix regressions due to 86de9b66480b710202a2898cf513db105d8c432fJuzhe-Zhong1-4/+2
This patch fixes the recent regression: FAIL: gcc.dg/torture/float32-tg-2.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O1 (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -O2 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O2 (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -O3 -g (test for excess errors) FAIL: gcc.dg/torture/float32-tg-2.c -Os (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg-2.c -Os (test for excess errors) FAIL: gcc.dg/torture/float32-tg.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg.c -O1 (test for excess errors) FAIL: gcc.dg/torture/float32-tg.c -O2 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg.c -O2 (test for excess errors) FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: 
gcc.dg/torture/float32-tg.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: gcc.dg/torture/float32-tg.c -O3 -g (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg.c -O3 -g (test for excess errors) FAIL: gcc.dg/torture/float32-tg.c -Os (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/float32-tg.c -Os (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -O1 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -O1 (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -O2 (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -O2 (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -O3 -g (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -O3 -g (test for excess errors) FAIL: gcc.dg/torture/pr48124-4.c -Os (internal compiler error: in reg_or_subregno, at jump.cc:1895) FAIL: gcc.dg/torture/pr48124-4.c -Os (test for excess errors) due to commit 86de9b66480b710202a2898cf513db105d8c432f. 
The root cause is that register_operand and reg_or_subregno are inconsistent, so we hit the assertion failure. We don't need to worry about subreg:...VL_REGNUM since that situation cannot occur: we only ever have (set (reg) (reg:VL_REGNUM)), which generates the "csrr vl" asm for fault-only-first load instructions (vleff). So using REG_P and REGNO is entirely safe and robust. Since we don't allow VL_REGNUM to take part in register allocation and we have no such constraint, we always use the following pattern to generate the "csrr vl" asm: (define_insn "read_vlsi" [(set (match_operand:SI 0 "register_operand" "=r") (reg:SI VL_REGNUM))] "TARGET_VECTOR" "csrr\t%0,vl" [(set_attr "type" "rdvl") (set_attr "mode" "SI")]) So the check in riscv.md is there to keep such a situation from falling into the move patterns in riscv.md. Tested on both RV32/RV64, no regression. PR target/109092 gcc/ChangeLog: * config/riscv/riscv.md: Use reg instead of subreg. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr109092.c: New test.
2024-01-22[gcn] mkoffload: Fix linking with "-g"; fix file deletion; improve ↵Tobias Burnus1-16/+16
diagnostic [PR111966] With debugging enabled, '*.mkoffload.dbg.o' files are generated. The e_flags header of all *.o files must be the same - otherwise, the linker complains. Since r14-4734-g56ed1055b2f40ac162ae8d382280ac07a33f789f the -march= default is now gfx900. If compiling without any -march= flag, the default value is used by the compiler but not passed to mkoffload. Hence, mkoffload.cc uses its own default for march - unfortunately, it still had gfx803/fiji as default, leading to the linker error: 'incompatible mach'. Solution: Update the default to gfx900. While debugging it, I saw that /tmp/cc*.mkoffload.dbg.o kept accumulating; there were a couple of issues with the handling: * dbgobj was always added to files_to_cleanup * If copy_early_debug_info returned true, dbgobj was added again -> pointless and in theory a race if the same file was added in the fraction of a second. * If copy_early_debug_info returned false, - In exactly one case, it already deleted the file itself (same potential race as above) - The pointer dbgobj was freed - such that files_to_cleanup contained a dangling pointer - probably the reason that stale files remained. Solution: Only if copy_early_debug_info returns true, dbgobj is added to files_to_cleanup. If it returns false, the file is unlinked before freeing the pointer. When compiling, GCC warned about several fatal_error messages as having no %<...%> or %qs quotes. This patch now silences several of those warnings by using those quotes. gcc/ChangeLog: PR other/111966 * config/gcn/mkoffload.cc (elf_arch): Change default to gfx900 to match the compiler default. (simple_object_copy_lto_debug_sections): Never unlink the outfile on error as the caller does so. (maybe_unlink, compile_native): Use %<...%> and %qs in fatal_error. (main): Likewise. Fix 'mkoffload.dbg.o' cleanup. Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
2024-01-22RISC-V: Fix vfirst/vmsbf/vmsif/vmsof ratio attributesJuzhe-Zhong1-1/+1
vfirst/vmsbf/vmsif/vmsof instructions are supposed to demand ratio instead of demanding sew_lmul. But a typo of mine in an earlier change makes the VSETVL pass fail to honor the RISC-V V spec. Consider the following simple case: int foo4 (void * in, void * out) { vint32m1_t v = __riscv_vle32_v_i32m1 (in, 4); v = __riscv_vadd_vv_i32m1 (v, v, 4); vbool32_t mask = __riscv_vreinterpret_v_i32m1_b32(v); mask = __riscv_vmsof_m_b32(mask, 4); return __riscv_vfirst_m_b32(mask, 4); } Before this patch: foo4: vsetivli zero,4,e32,m1,ta,ma vle32.v v1,0(a0) vadd.vv v1,v1,v1 vsetvli zero,zero,e8,mf4,ta,ma ----> redundant. vmsof.m v2,v1 vfirst.m a0,v2 ret After this patch: foo4: vsetivli zero,4,e32,m1,ta,ma vle32.v v1,0(a0) vadd.vv v1,v1,v1 vmsof.m v2,v1 vfirst.m a0,v2 ret As confirmed against the RVV spec and Clang, this patch makes the VSETVL pass match the correct behavior. Tested on both RV32/RV64, no regression. gcc/ChangeLog: * config/riscv/vector.md: Fix vfirst/vmsbf/vmsof ratio attributes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/attribute-1.c: New test.
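To see why the extra vsetvli in the "before" code is redundant, compare the SEW/LMUL ratio of the two configurations. A minimal sketch, with a hypothetical Vtype struct and helper names: e32,m1 gives 32/1 = 32 and e8,mf4 gives 8/(1/4) = 32, so an instruction that only demands the ratio is satisfied by either setting.

```cpp
#include <cassert>

// Hypothetical model of the vtype fields that matter here:
// element width in bits and LMUL as a fraction lmul_num / lmul_den.
struct Vtype {
  unsigned sew;
  unsigned lmul_num;
  unsigned lmul_den;
};

// SEW/LMUL ratio, computed as sew * den / num to stay in integers.
constexpr unsigned ratio(Vtype v)
{
  return v.sew * v.lmul_den / v.lmul_num;
}

// An instruction that demands only the ratio can reuse the current
// configuration whenever the ratios agree.
constexpr bool same_ratio(Vtype a, Vtype b)
{
  return ratio(a) == ratio(b);
}

constexpr Vtype e32m1{32, 1, 1};  // vsetivli zero,4,e32,m1,ta,ma
constexpr Vtype e8mf4{8, 1, 4};   // vsetvli zero,zero,e8,mf4,ta,ma
constexpr Vtype e8m1{8, 1, 1};    // a configuration with a different ratio
```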
2024-01-22RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]xuli1-77/+16
v2: Avoid internal ICE for the case below. vint8mf8_t test_vle8_v_i8mf8_m(vbool64_t vm, const int32_t *rs1, size_t vl) { return __riscv_vle8(vm, rs1, vl); } v1: Change the hash value of overloaded intrinsic from considering all parameter types to: 1. Encoding vector data type 2. In order to distinguish vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, size_t vl) and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl), encode the pointer type 3. In order to distinguish vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl) and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl), encode the number of parameters. The same goes for the vxrm intrinsics. PR target/113420 gcc/ChangeLog: * config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p): Remove. (registered_function::overloaded_hash): Refactor. (resolve_overloaded_builtin): Avoid internal ICE. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr113420-1.c: New test. * gcc.target/riscv/rvv/base/pr113420-2.c: New test.
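The three encoding rules in v1 can be sketched with a string key in place of the real hash value. Everything here is hypothetical (the Arg struct, overload_key, resolve, and the table contents); it only shows that vector types, pointer types and the argument count are enough to tell these overloads apart, and that a failed lookup can be reported instead of asserting.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class ArgKind { Vector, Pointer, Scalar };

struct Arg {
  ArgKind kind;
  std::string type;
};

// Build a lookup key from the overloaded name, the vector and pointer
// argument types, and the total number of arguments. Scalar argument
// types are deliberately left out.
inline std::string overload_key(const std::string &base,
                                const std::vector<Arg> &args)
{
  std::string key = base;
  for (const Arg &a : args)
    if (a.kind != ArgKind::Scalar)
      key += "|" + a.type;
  key += "#" + std::to_string(args.size());
  return key;
}

// Return the registered non-overloaded name, or nullptr when no overload
// matches, so the caller can emit a diagnostic rather than crash.
inline const std::string *resolve(const std::map<std::string, std::string> &table,
                                  const std::string &base,
                                  const std::vector<Arg> &args)
{
  auto it = table.find(overload_key(base, args));
  return it == table.end() ? nullptr : &it->second;
}
```

With this shape, the v2 test case (a const int32_t * passed to __riscv_vle8 with a vbool64_t mask) simply finds no entry.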
2024-01-21Install right version of last change.Jeff Law1-0/+1
gcc/ * config/riscv/riscv.cc (riscv_init_cumulative_args): Install correct version of last change.
2024-01-21[committed] [NFC] Fix riscv_init_cumulative_args for unused argumentsJeff Law1-6/+1
The signature was still using ATTRIBUTE_UNUSED and actually marked one of the used arguments with ATTRIBUTE_UNUSED. This patch drops the decorations and instead removes the names of the arguments which are actually unused, which is the preferred way to handle this now when we can. Bootstrapped. I didn't have test results on the platform where I bootstrapped, so no results to compare against. Given it's NFC, I think we're OK without the regression results. gcc/ * config/riscv/riscv.cc (riscv_init_cumulative_args): Update and fix bugs in signature.
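A minimal sketch of the style this message prefers, using a hypothetical function rather than the real riscv_init_cumulative_args signature: a parameter that is required by the interface but never read simply has no name, so no attribute is needed and -Wunused-parameter stays quiet.

```cpp
#include <cassert>

// Hypothetical function: the third parameter is part of the interface
// callers expect, but the body never reads it, so its name is omitted.
inline int init_cumulative_args(int num_fixed, int num_named, int /* caller */)
{
  return num_fixed + num_named;
}
```

This only works when the parameter is genuinely never referenced; if it is passed to a macro that discards it, the name has to stay and an unused attribute is needed instead.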
2024-01-20RISC-V: Suppress warningJuzhe-Zhong1-2/+2
../../gcc/config/riscv/riscv.cc: In function 'void riscv_init_cumulative_args(CUMULATIVE_ARGS*, tree, rtx, tree, int)': ../../gcc/config/riscv/riscv.cc:4879:34: error: unused parameter 'fndecl' [-Werror=unused-parameter] 4879 | tree fndecl, | ~~~~~^~~~~~ ../../gcc/config/riscv/riscv.cc: In function 'bool riscv_vector_mode_supported_any_target_p(machine_mode)': ../../gcc/config/riscv/riscv.cc:10537:56: error: unused parameter 'mode' [-Werror=unused-parameter] 10537 | riscv_vector_mode_supported_any_target_p (machine_mode mode) | ~~~~~~~~~~~~~^~~~ cc1plus: all warnings being treated as errors make[3]: *** [Makefile:2559: riscv.o] Error 1 Suppress these warnings. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_init_cumulative_args): Suppress warning. (riscv_vector_mode_supported_any_target_p): Ditto.
2024-01-19[PATCH] Avoid ICE on m68k -fzero-call-used-regs -fpic [PR110934]Mikael Pettersson1-0/+46
PR110934 is a problem on m68k where -fzero-call-used-regs -fpic ICEs when clearing an FP register. The generic code generates an XFmode move of zero to that register, which becomes an XFmode load from initialized data, which due to -fpic uses a non-constant address, which the backend rejects. The zero-call-used-regs pass runs very late, after register allocation and frame layout, and at that point we can't allow new uses of the PIC register or new pseudos. To clear an FP register on m68k it's enough to do the move in SFmode, but the generic code can't be told to do that, so this patch updates m68k to use its own TARGET_ZERO_CALL_USED_REGS. Bootstrapped and regression tested on m68k-linux-gnu. Ok for master? (I don't have commit rights.) gcc/ PR target/110934 * config/m68k/m68k.cc (m68k_zero_call_used_regs): New function. (TARGET_ZERO_CALL_USED_REGS): Define. gcc/testsuite/ PR target/110934 * gcc.target/m68k/pr110934.c: New test.
2024-01-19[PATCH] Avoid ICE in single-bit logical RMWs on m68k-uclinux [PR108640]Mikael Pettersson1-3/+3
When generating RMW logical operations on m68k, the backend recognizes single-bit operations and rewrites them as bit instructions on operands adjusted to address the intended byte. When offsetting the addresses the backend keeps the modes as SImode, even though the actual access will be in QImode. The uclinux target defines M68K_OFFSETS_MUST_BE_WITHIN_SECTIONS_P which adds a check that the adjusted operand is within the bounds of the original object. Since the address has been offset it is not, and the compiler ICEs. The bug is that the modes of the adjusted operands should have been narrowed to QImode, which is what this patch does. Nearby code which narrows to HImode gets that right. Bootstrapped and regression tested on m68k-linux-gnu. Ok for master? (Note: I don't have commit rights.) gcc/ PR target/108640 * config/m68k/m68k.cc (output_andsi3): Use QImode for address adjusted for 1-byte RMW access. (output_iorsi3): Likewise. (output_xorsi3): Likewise. gcc/testsuite/ PR target/108640 * gcc.target/m68k/pr108640.c: New test.
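The address adjustment described above can be sketched as plain arithmetic. Assuming a big-endian 32-bit word (as on m68k) and hypothetical helper names, bit n of the word (0 being the least significant) lives in the byte at offset 3 - n/8 and is bit n % 8 of that byte, which is why the rewritten access touches a single byte.

```cpp
#include <cassert>
#include <cstdint>

// Where a single bit of a 32-bit big-endian memory word lives.
struct ByteBit {
  unsigned byte_offset;  // offset from the word's address, 0..3
  unsigned bit;          // bit number within that byte, 0..7
};

// Index of the only set bit in mask, or -1 if mask is not a single bit.
constexpr int single_bit_index(std::uint32_t mask)
{
  if (mask == 0 || (mask & (mask - 1)) != 0)
    return -1;
  int n = 0;
  while ((mask >> n) != 1u)
    ++n;
  return n;
}

// Map word bit n (0 = least significant) to its byte and bit-in-byte.
constexpr ByteBit locate_bit(unsigned n)
{
  return ByteBit{3 - n / 8, n % 8};
}
```

For an OR with 0x200, single_bit_index gives 9, and locate_bit(9) selects byte offset 2, bit 1: a one-byte access, hence QImode rather than SImode for the adjusted operand.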
2024-01-19RISC-V: Fix RVV_VLMAXJuzhe-Zhong2-4/+3
This patch fixes a memory hog found in the SPEC2017 wrf benchmark, which is caused by RVV_VLMAX: RVV_VLMAX generates a brand new rtx via gen_rtx_REG (Pmode, X0_REGNUM) every time it is used, that is, we are always generating garbage, redundant (reg:DI 0 zero) rtxes. After this patch, the memory hog is gone. Time variable usr sys wall GGC machine dep reorg : 1.99 ( 9%) 0.35 ( 56%) 2.33 ( 10%) 939M ( 80%) [Before this patch] machine dep reorg : 1.71 ( 6%) 0.16 ( 27%) 3.77 ( 6%) 659k ( 0%) [After this patch] Time variable usr sys wall GGC machine dep reorg : 75.93 ( 18%) 14.23 ( 88%) 90.15 ( 21%) 33383M ( 95%) [Before this patch] machine dep reorg : 56.00 ( 14%) 7.92 ( 77%) 63.93 ( 15%) 4361k ( 0%) [After this patch] Testing is in progress. OK for trunk if the tests pass with no regression? PR target/113495 gcc/ChangeLog: * config/riscv/riscv-protos.h (RVV_VLMAX): Change to regno_reg_rtx[X0_REGNUM]. (RVV_VUNDEF): Ditto. * config/riscv/riscv-vsetvl.cc: Add timevar.
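The effect of switching from a fresh gen_rtx_REG on every use to a shared regno_reg_rtx entry can be modelled with a small pool. This is a toy model with hypothetical names (Rtx, RtxPool), not GCC's allocator: it only counts how many objects each strategy creates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a register rtx.
struct Rtx {
  int regno;
};

class RtxPool {
public:
  // Models the old macro: every use allocates a new object.
  const Rtx *fresh_reg(int regno)
  {
    objects_.push_back(std::make_unique<Rtx>(Rtx{regno}));
    return objects_.back().get();
  }

  // Models a per-register table: allocate on first use, then reuse.
  const Rtx *cached_reg(int regno)
  {
    const std::size_t idx = static_cast<std::size_t>(regno);
    if (cache_.size() <= idx)
      cache_.resize(idx + 1, nullptr);
    if (cache_[idx] == nullptr)
      cache_[idx] = fresh_reg(regno);
    return cache_[idx];
  }

  // Number of objects created so far.
  std::size_t allocations() const { return objects_.size(); }

private:
  std::vector<std::unique_ptr<Rtx>> objects_;
  std::vector<const Rtx *> cache_;
};
```

A thousand cached lookups of the same register create one object, while a thousand fresh ones create a thousand, which matches the shape of the GGC numbers above.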
2024-01-19RISC-V: Remove unused function in riscv_subset_list [NFC]Kito Cheng1-4/+0
gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_subset_list::parse_std_ext): Remove. (riscv_subset_list::parse_multiletter_ext): Remove. * config/riscv/riscv-subset.h (riscv_subset_list::parse_std_ext): Remove. (riscv_subset_list::parse_multiletter_ext): Remove.